Public Network Security Datasets

from: http://users.cis.fiu.edu/~lpeng/Datasets_detail.html
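DARPA Intrusion Detection Datasets
The DARPA datasets have long served as the standard datasets in network intrusion detection. They comprise three datasets: DARPA 1998, DARPA 1999, and DARPA 2000.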

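DARPA 1998 Dataset

It contains nine weeks of TCPDUMP network connection data and system audit data, split into seven weeks of training data and two weeks of test data, and covers four major attack categories: Probe, DoS, R2L, and U2R.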

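DARPA 1999 Dataset

DARPA 1999 covers 58 typical attacks in five major categories: Probe, DoS, R2L, U2R, and Data. It is regarded as one of the most comprehensive attack test datasets and is a benchmark widely recognized and used by the research community. The DARPA 1999 evaluation provides five weeks of simulated data. The first three weeks are training data for evaluation participants: weeks 1 and 3 are normal data containing no attacks, while week 2 contains 43 attack instances belonging to 18 attack types. Weeks 4 and 5 are used for testing.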
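
The week layout described above can be written down as a small lookup table, which is handy when deciding which capture files to load for training or testing. This is only a sketch: the dictionary and helper names are made up for illustration and are not part of the dataset.

```python
# Week layout of the DARPA 1999 evaluation as described in the text.
# DARPA99_WEEKS and the helpers below are illustrative names only.
DARPA99_WEEKS = {
    1: ("train", "attack-free"),
    2: ("train", "attacks inserted"),
    3: ("train", "attack-free"),
    4: ("test", "not detailed in the text"),
    5: ("test", "not detailed in the text"),
}


def weeks_in(split):
    """Return the week numbers that belong to the given split."""
    return [w for w, (s, _) in sorted(DARPA99_WEEKS.items()) if s == split]


def attack_free_training_weeks():
    """Return the training weeks that contain only normal traffic."""
    return [
        w
        for w, (s, content) in sorted(DARPA99_WEEKS.items())
        if s == "train" and content == "attack-free"
    ]


if __name__ == "__main__":
    print("train:", weeks_in("train"))
    print("test:", weeks_in("test"))
    print("attack-free training weeks:", attack_free_training_weeks())
```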

DARPA 2000 Dataset

Building on DARPA 1999, DARPA 2000 adds DDoS (Distributed Denial of Service) attacks to the attack data, along with insider attacks, internal sniffing data, and Windows NT traffic and attacks.

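KDD Cup 99 Dataset

Professor Sal Stolfo of Columbia University and Professor Wenke Lee of North Carolina State University applied data mining and related techniques to the DARPA 98 and DARPA 99 datasets, performing feature analysis and data preprocessing to produce a new dataset. It was used in the KDD CUP competition held in 1999 and became the well-known KDD CUP 99 dataset. Although dated, KDD CUP 99 remains the de facto benchmark in network intrusion detection and laid the groundwork for research on intrusion detection based on computational intelligence.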

Download online
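
As a rough illustration of how records from this dataset are usually consumed, the sketch below maps the label of a KDD-style record to one of the four attack categories named earlier (Probe, DoS, R2L, U2R). It assumes the commonly distributed layout of 41 comma-separated features followed by a label that ends with a period. The label table is partial and stated from memory of the public files, and the sample lines are illustrative, so check both against the file you actually download.

```python
# Partial label-to-category table (an assumption; extend it as needed).
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}

# Illustrative record: 41 features plus the label in the last field.
SAMPLE_NORMAL = (
    "0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,"
    "8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    "9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal."
)
# Shortened illustrative record: only the last field matters here.
SAMPLE_ATTACK = "0,icmp,ecr_i,SF,1032,0,smurf."


def categorize(record_line):
    """Return 'Normal', an attack category, or 'Unknown' for one record."""
    label = record_line.strip().split(",")[-1].rstrip(".")
    if label == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(label, "Unknown")


if __name__ == "__main__":
    print(len(SAMPLE_NORMAL.split(",")), "fields")
    print(categorize(SAMPLE_NORMAL))
    print(categorize(SAMPLE_ATTACK))
```

Labels missing from the table fall through to "Unknown" instead of raising, so gaps in the mapping show up in the output rather than stopping a long parse.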

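NSL-KDD Dataset

To address the shortcomings of KDD CUP 99, the NSL-KDD dataset removes its redundant records. This keeps classifiers from being biased toward frequently repeated records, which would otherwise distort the measured performance of learning methods. In addition, the ratio of normal to anomalous records was chosen deliberately, and the sizes of the training and test sets are more reasonable, which makes the dataset better suited to effective and accurate comparison of different machine learning techniques.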

Local download or online download
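
To make the redundancy problem concrete, the sketch below removes exact duplicate records from a toy list and compares the label distribution before and after. It illustrates only the duplicate-removal idea mentioned above, not the full procedure used to build NSL-KDD. The records and function names are hypothetical.

```python
from collections import Counter


def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def label_distribution(records):
    """Count records per label; the label is the last comma-separated field."""
    return Counter(rec.split(",")[-1] for rec in records)


# Toy records (hypothetical): one flood record repeated many times.
RECORDS = ["icmp,ecr_i,1032,smurf"] * 5 + [
    "tcp,http,181,normal",
    "tcp,http,239,normal",
]

if __name__ == "__main__":
    print("before:", label_distribution(RECORDS))
    print("after: ", label_distribution(drop_duplicates(RECORDS)))
```

Before deduplication the repeated flood record dominates the counts. Afterwards each distinct record contributes once, which is the effect the paragraph above describes.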

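Honeynet Dataset

The Honeynet dataset is a collection of hacker attack data gathered by the HoneyNet organization, and it reflects attack patterns reasonably well. It contains 11 months of Snort alert data, from April 2000 to February 2001, with roughly 60 to more than 3,000 Snort alert records per month. The network consisted of eight IP addresses connected to the ISP over ISDN, which is broadly consistent with the network environment of most home and business users. The operating systems in use included Solaris Sparc, WinNT, Win98, and Linux Red Hat.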

Local download or online download

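Challenge 2013 Dataset

Challenge 2013 is the competition dataset on visual analytics of network security data from VAST Challenge 2013, the visual analytics challenge held by IEEE Visualization. It provides two weeks of operational logs from the internal network of a fictitious multinational company. There are three log types: NetFlow network traffic logs, Big Brother network health and status data, and intrusion prevention system (IPS) logs. The data includes NetFlow and Big Brother logs for weeks one and two, and IPS logs for week two. Analyzing the logs reveals anomalies in the network. The network contains about 1,100 hosts and servers, the raw logs total nearly 10 GB, and there are more than 90 million records. An email address must be entered before downloading.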

Download online

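Adult Dataset

This dataset comes from UCI and is also known as the census dataset. It is drawn from the 1994 US census database and has 48,842 records in TEXT format with 14 attributes: age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country. The dataset is suitable for machine learning, data mining, and privacy protection research.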

Local download or online download
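
A minimal sketch of reading such records with the 14 attribute names listed above. It assumes the text file has no header row, separates fields with a comma followed by a space, and carries an income label as a 15th column. These are assumptions about the UCI file rather than facts stated here, and the sample row is included only for illustration.

```python
import csv
import io

# The 14 attributes from the text, plus an assumed 15th label column.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

SAMPLE = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)


def read_adult(stream):
    """Parse Adult-style rows into dictionaries keyed by column name."""
    reader = csv.DictReader(
        stream, fieldnames=ADULT_COLUMNS, skipinitialspace=True
    )
    # Drop rows that are too short to carry a label.
    return [row for row in reader if row.get("income")]


if __name__ == "__main__":
    rows = read_adult(io.StringIO(SAMPLE))
    print(rows[0]["age"], rows[0]["education"], rows[0]["income"])
```

All values come back as strings. Numeric attributes such as age or hours-per-week would need an explicit `int()` conversion before modeling.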

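Malware Dataset

This dataset is provided by Yanfang Ye of West Virginia University and consists of two parts. The first is for malware detection and contains 50,000 instances, half of them features extracted from malware and the other half features extracted from benign files. With this dataset, malware detection can be performed on feature sets extracted from Win API calls, using data mining and big data modeling techniques.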

Local download or online download

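The second is for malware clustering based on file descriptions and contains 69,165 file samples, of which 3,095 are malware, 22,583 are benign files, and the remaining 45,487 are unknown files.

Local download or online download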

Some Open Network Security Datasets: Trojans, Worms, Botnets, and More

ISCX-2016-SlowDos
Name    Type
slowbody2
slowread
ddossim    DoS GET
goldeneye    DoS improved GET
slowheaders
rudy    slow send body
hulk    DoS GET
slowloris    slow-send headers
Slowhttptest    slow-read, slow-send headers, slow send body

URL: https://www.unb.ca/cic/datasets/dos-dataset.html
ISCX-Bot-2014

    To keep the botnet traffic close to real-world conditions, the dataset mixes subsets of the
    ISOT dataset, the ISCX 2012 IDS dataset, and botnet traffic generated by the Malware Capture Facility Project.

Name    Type
Neris    IRC
Rbot    IRC
Menti    IRC
Sogou    HTTP
Murlo    IRC
Virut    HTTP
NSIS    P2P
Zeus    P2P
SMTP Spam    P2P
UDP Storm    P2P
Tbot    IRC
Zero Access    P2P
Weasel    P2P
Smoke Bot    P2P
Zeus Control (C&C)    P2P
ISCX IRC bot    P2P

URL: https://www.unb.ca/cic/datasets/botnet.html
isot_app_and_botnet_dataset

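Category: HTTP botnets
Scope: DNS

Composition: a botnet dataset of malicious DNS traffic generated by different botnets, and a benign dataset of DNS traffic generated by different well-known software applications.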

https://www.uvic.ca/engineering/ece/isot/datasets/

Alenazi A., Traore I., Ganame K., Woungang I. (2017) Holistic Model for HTTP Botnet Detection Based on DNS Traffic Analysis. In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham
CTU-13 DATASET

Contains botnet traffic data captured in 13 scenarios.

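URL: https://mcfp.weebly.com/the-ctu-13-dataset-a-labeled-dataset-with-botnet-normal-and-background-traffic.html
MSNBC.com Anonymous Web Data

The data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see the description) and in chronological order.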

https://kdd.ics.uci.edu/databases/msnbc/msnbc.html
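
The sketch below shows one way to turn such category-level sequences into per-category page-view counts. The file layout (one user per line, each visit a 1-based category number, with non-numeric header lines) and the category list are recalled from the dataset description and should be treated as assumptions. The sample lines are illustrative.

```python
from collections import Counter

# Category names in index order (assumed; verify against the description).
CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc",
    "weather", "health", "living", "business", "sports", "summary",
    "bbs", "travel", "msn-news", "msn-sports",
]

SAMPLE_LINES = [
    "% header lines are skipped",
    "1 1",
    "2",
    "3 2 2 4 2 2 2 3 3",
]


def parse_sessions(lines):
    """Convert each numeric line into a time-ordered list of category names."""
    sessions = []
    for line in lines:
        line = line.strip()
        if not line or not line[0].isdigit():
            continue  # skip blank lines and header text
        sessions.append([CATEGORIES[int(tok) - 1] for tok in line.split()])
    return sessions


def page_views(sessions):
    """Count page views per category across all users."""
    return Counter(cat for session in sessions for cat in session)


if __name__ == "__main__":
    sessions = parse_sessions(SAMPLE_LINES)
    print(len(sessions), "users")
    print(page_views(sessions).most_common(3))
```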
UNINA traffic traces

Traffic traces and time series from a real network.

http://traffic.comics.unina.it/Traces/ttraces.php
USC ISI web server

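Covers TCP, IP, DNS, HTTP, and ICMP traffic.
Includes normal traffic as well as anomaly detection, trojan, worm, and botnet datasets.
The application process, however, is quite cumbersome.

Datasets: http://www.isi.edu/ant/traces/dataset_list.html

Request form: https://ant.isi.edu/datasets/requests.html

Original link: https://blog.csdn.net/qq_29857719/article/details/89211420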

Datasets

Canadian Institute for Cybersecurity datasets are used around the world by universities, private industry, and independent researchers.

The following datasets are currently available:

Example: DDoS Evaluation Dataset (CIC-DDoS2019)

2. Dataset

CICDDoS2019 contains benign traffic and the most up-to-date common DDoS attacks, and resembles true real-world data (PCAPs). It also includes the results of network traffic analysis using CICFlowMeter-V3, with flows labeled based on the timestamp, source and destination IPs, source and destination ports, protocols, and attack (CSV files).

Generating realistic background traffic was our top priority in building this dataset. We have used our proposed B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate naturalistic benign background traffic in the proposed testbed (Figure 2). For this dataset, we built the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.

Machine	OS	IPs
Server	Ubuntu 16.04 (Web Server)	192.168.50.1 (first day), 192.168.50.4 (second day)
Firewall	Fortinet	205.174.165.81
PC (first day)	Win 7	192.168.50.8
PC (first day)	Win Vista	192.168.50.5
PC (first day)	Win 8.1	192.168.50.6
PC (first day)	Win 10	192.168.50.7
PC (second day)	Win 7	192.168.50.9
PC (second day)	Win Vista	192.168.50.6
PC (second day)	Win 8.1	192.168.50.7
PC (second day)	Win 10	192.168.50.8

In this dataset, we have different modern reflective DDoS attacks such as PortMap, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS, and SNMP. Attacks were subsequently executed during this period. As Table III shows, we executed 12 DDoS attacks (NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, SSDP, UDP, UDP-Lag, WebDDoS, SYN, and TFTP) on the training day and 7 attacks (PortScan, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, and SYN) on the testing day. The traffic volume for WebDDoS was very low, and PortScan was executed only on the testing day, so it will be unknown when evaluating the proposed model.

Day	Attack	Attack Time
First Day	PortMap	9:43 - 9:51
First Day	NetBIOS	10:00 - 10:09
First Day	LDAP	10:21 - 10:30
First Day	MSSQL	10:33 - 10:42
First Day	UDP	10:53 - 11:03
First Day	UDP-Lag	11:14 - 11:24
First Day	SYN	11:28 - 17:35
Second Day	NTP	10:35 - 10:45
Second Day	DNS	10:52 - 11:05
Second Day	LDAP	11:22 - 11:32
Second Day	MSSQL	11:36 - 11:45
Second Day	NetBIOS	11:50 - 12:00
Second Day	SNMP	12:12 - 12:23
Second Day	SSDP	12:27 - 12:37
Second Day	UDP	12:45 - 13:09
Second Day	UDP-Lag	13:11 - 13:15
Second Day	WebDDoS	13:18 - 13:29
Second Day	SYN	13:29 - 13:34
Second Day	TFTP	13:35 - 17:15
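
The first-day schedule in the table above can be used to look up which attack, if any, was running at a given time of day, for example when sanity-checking flow labels. This is a sketch under assumptions: treating the windows as inclusive is my choice, the names are illustrative, and the dataset's own labels are based on more than the timestamp.

```python
from datetime import time

# First-day attack windows copied from the table (start and end inclusive).
FIRST_DAY_SCHEDULE = [
    ("PortMap", time(9, 43), time(9, 51)),
    ("NetBIOS", time(10, 0), time(10, 9)),
    ("LDAP", time(10, 21), time(10, 30)),
    ("MSSQL", time(10, 33), time(10, 42)),
    ("UDP", time(10, 53), time(11, 3)),
    ("UDP-Lag", time(11, 14), time(11, 24)),
    ("SYN", time(11, 28), time(17, 35)),
]


def attack_at(moment, schedule=FIRST_DAY_SCHEDULE):
    """Return the attack whose window contains `moment`, or None."""
    for name, start, end in schedule:
        if start <= moment <= end:
            return name
    return None


if __name__ == "__main__":
    for t in (time(9, 45), time(9, 55), time(10, 5), time(12, 0)):
        print(t.strftime("%H:%M"), attack_at(t))
```

Gaps between windows return `None`. Flows seen in those gaps are not necessarily benign, so the lookup is a consistency check rather than a labeling rule.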

3. Using the dataset

The dataset is organized by day. For each day, we recorded the raw data, including the network traffic (PCAPs) and event logs (Windows and Ubuntu event logs), per machine. In the feature extraction process, we used CICFlowMeter-V3 to extract more than 80 traffic features from the raw data and saved them as a CSV file per machine.

If you want to use AI techniques for analysis, you can download our generated data (CSV) files and analyze the network traffic.

If you want to use a new feature extractor, you can extract your own features from the raw captured files (PCAP) and then analyze the generated data with data mining techniques.
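
A common first step with the generated CSV files is to check how many flows carry each label. The sketch below does that with the standard library only. The column names, the leading spaces in the header, and all sample values are hypothetical stand-ins, so adjust them to the header of the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of a flow CSV; not real dataset rows.
SAMPLE_CSV = "\n".join([
    " Source IP, Destination IP, Protocol, Flow Duration, Label",
    "172.16.0.5,192.168.50.1,17,1,LDAP",
    "172.16.0.5,192.168.50.1,17,2,LDAP",
    "192.168.50.8,192.168.50.1,6,5310,BENIGN",
])


def label_counts(stream, label_column="Label"):
    """Count flows per label, tolerating padded header names and values."""
    reader = csv.reader(stream)
    header = [name.strip() for name in next(reader)]
    idx = header.index(label_column)
    return Counter(row[idx].strip() for row in reader if row)


if __name__ == "__main__":
    counts = label_counts(io.StringIO(SAMPLE_CSV))
    for label, n in counts.most_common():
        print(label, n)
```

For a real file, pass an open file object instead of `io.StringIO`, for example `open(path, newline="")`. The function streams row by row, so it also copes with large CSVs.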

4. License

You may redistribute, republish, and mirror the CICDDoS2019 dataset in any form. However, any use or redistribution of the data must include a citation to the CICDDoS2019 dataset and related published paper. A research paper outlining the details of analyzing the similar IDS/IPS dataset and related principles:

Iman Sharafaldin, Arash Habibi Lashkari, Saqib Hakak, and Ali A. Ghorbani, "Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy", IEEE 53rd International Carnahan Conference on Security Technology, Chennai, India, 2019

posted @ 2020-11-27 16:18 bonelee

Hi, I'm Li Zhihua, a security AI algorithm expert at Huawei. Welcome to the fascinating world of security attack and defense.