From: Terry Brugger
Date: 15 Sep 2007
Subject: KDD Cup '99 dataset (Network Intrusion) considered harmful
Oftentimes in the scientific community, we become interested in new techniques or approaches because of characteristics of the technique or approach itself. While such investigation may be informative from a pure research standpoint, the general public, and particularly most research sponsors, tends to be more interested in the application of this technology. To this end, the KDD Cup Challenge has, for over ten years, provided the KDD community with datasets from real-world problems to demonstrate the applicability and performance of different knowledge discovery techniques. Researchers in the computer security community (judging by the tone of papers published at the time) were initially excited to see a problem from their domain adopted for the 1999 KDD Cup Challenge. Since then, however, the dataset has become widely discredited. This letter briefly outlines the problems that have been cited with the KDD Cup '99 dataset and discourages its further use.
The KDD Cup '99 dataset was created by processing the tcpdump portions of the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset, created by Lincoln Lab under contract to DARPA [Lippmann et al.]. Since one cannot know the intention (benign or malicious) of every connection on a real-world network (if we could, we would not need research in intrusion detection), the artificial data was generated using a closed network, some proprietary network traffic generators, and hand-injected attacks. It was intended to simulate the traffic seen at a medium-sized US Air Force base (and was created in collaboration with the AFRL in Rome, NY, which could itself be characterized as a medium-sized US Air Force base).
Based on the published description of how the data was generated, McHugh published a fairly harsh criticism of the dataset [McHugh]. Among the issues raised, the most important seemed to be that no validation was ever performed to show that the DARPA dataset actually resembled real network traffic. Indeed, even a cursory examination of the data showed that the data rates were far below what would be experienced on a real medium-sized network. Nevertheless, IDS researchers continued to use the dataset (and the KDD Cup dataset derived from it) for lack of anything better.
In 2003, Mahoney and Chan built a trivial intrusion detection system and ran it against the DARPA tcpdump data [Mahoney and Chan]. They found numerous irregularities, including that, due to the way the data was generated, all the malicious packets had a TTL of 126 or 253, whereas almost all the benign packets had a TTL of 127 or 254. This demonstrated to most people in the network security research community that the DARPA dataset (and by extension, the KDD Cup '99 dataset) was fundamentally broken, and that no conclusions could be drawn from experiments run using them. Numerous researchers told us (in personal conversations) that if they were reviewing a paper based solely on the DARPA dataset, they would reject it on that basis alone.
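To make concrete why this artifact is so damaging, here is a minimal Python sketch of a "detector" that inspects nothing but the IP TTL field and applies the values reported above. It assumes packets have already been parsed into integer TTL values; the names `ATTACK_TTLS` and `ttl_rule` are hypothetical, and this is not Mahoney and Chan's actual system.

```python
# Hypothetical illustration: a "detector" that looks only at the IP TTL field.
# It encodes the artifact reported by Mahoney and Chan: in the DARPA tcpdump
# data, attack packets carried TTL 126 or 253, while almost all benign packets
# carried TTL 127 or 254.
ATTACK_TTLS = frozenset({126, 253})


def ttl_rule(ttl: int) -> bool:
    """Return True (flag as malicious) if the TTL is one of the attack values."""
    return ttl in ATTACK_TTLS


if __name__ == "__main__":
    # Illustrative TTL values only; these are not drawn from the dataset.
    for ttl in (126, 127, 253, 254, 64):
        print(ttl, "malicious" if ttl_rule(ttl) else "benign")
```

Such a rule encodes nothing about the attacks themselves; any success it has on this data reflects how the traffic was generated, not a real detection capability.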
At the time, we were conducting our own assessment of the DARPA dataset using Snort [Caswell and Roesch]. Trivial detection using the TTL aside, we found that the dataset was still useful for evaluating the true positive performance of a network IDS; however, any false positive results were meaningless [Brugger and Chow]. Anonymous reviewers at respectable information security conferences were unimpressed; one noted, "is there any interest to study the capacities of SNORT on such data?". A reviewer from another conference summarized their review with "The content of the paper is really out of date. If this paper appears five years ago, there is some value, but not much now."
While the DARPA (and KDD Cup '99) dataset has fallen from grace in the network security community, we still see it widely used in the greater KDD community. Examples from the past couple of years include [Kayacik et al.], [Sarasamma et al.], [Gao et al.], [Chan et al.], and [Zhang et al.]. While this sample doesn't necessarily represent the top-tier journals and conferences in the KDD community, to the best of our knowledge these are respectable, peer-reviewed publications. The knowledge discovery researchers are no doubt well intentioned in wanting to show the usefulness of every technique imaginable in the network intrusion detection domain. Unfortunately, due to the problems with the dataset, such conclusions cannot be drawn. As a result, we strongly recommend that (1) all researchers stop using the KDD Cup '99 dataset, (2) the KDD Cup and UCI websites include a warning on the KDD Cup '99 dataset webpage informing researchers that there are known problems with the dataset, and (3) peer reviewers for conferences and journals ding papers (or even outright reject them, as is common in the network security community) with results drawn solely from the KDD Cup '99 dataset.
S. Terry Brugger, zow at acm dot org
UC Davis, Department of Computer Science
- Brugger, S. T. and J. Chow (January 2007). An assessment of the DARPA IDS Evaluation Dataset using Snort. Technical Report CSE-2007-1, University of California, Davis, Department of Computer Science, Davis, CA. http://www.cs.ucdavis.edu/research/tech-reports/2007/CSE-2007-1.pdf.
- Caswell, B. and M. Roesch (16 May 2004). Snort: The open source network intrusion detection system. http://www.snort.org/.
- Chan, A. P., W. W. Y. Ng, D. S. Yeung, and E. C. C. Tsang (19-21 August 2005). Comparison of different fusion approaches for network intrusion detection using ensemble of RBFNN. In Proc. of 2005 Intl. Conf. on Machine Learning and Cybernetics, Volume 6, Guangzhou, China, pp. 3846-3851. IEEE.
- Gao, H.-H., H.-H. Yang, and X.-Y. W. (27-29 August 2005). Principal component neural networks based intrusion feature extraction and detection using SVM. In Advances in Natural Computation, Volume 3611 of Lecture Notes in Computer Science, Changsha, China, pp. 21-27. Springer.
- Kayacik, H. G., A. N. Zincir-Heywood, and M. I. Heywood (June 2007). A hierarchical SOM-based intrusion detection system. Engineering Applications of Artificial Intelligence 20 (4), 439-451. Full text not available; analysis based on detailed abstract.
- Lippmann, R. P., D. J. Fried, I. Graf, J. W. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. K. Cunningham, and M. Zissman (January 2000). Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation. In Proc. of the DARPA Information Survivability Conference and Exposition, Los Alamitos, CA. IEEE Computer Society Press.
- Mahoney, M. V. and P. K. Chan (8-10 September 2003). An analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for network anomaly detection. In G. Vigna, E. Jonsson, and C. Kruegel (Eds.), Proc. 6th Intl. Symp. on Recent Advances in Intrusion Detection (RAID 2003), Volume 2820 of Lecture Notes in Computer Science, Pittsburgh, PA, pp. 220-237. Springer.
- McHugh, J. (2000). Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans. Information and System Security 3 (4), 262-294.
- Sarasamma, S. T., Q. A. Zhu, and J. Huff (April 2005). Hierarchical Kohonenen net for anomaly detection in network security. IEEE Trans. Syst., Man, Cybern. B 35 (2), 302-312.
- Zhang, C., J. Jiang, and M. Kamel (May 2005). Intrusion detection using hierarchical neural networks. Pattern Recognition Letters 26 (6), 779-791.
All opinions expressed are solely the view of the author(s), and are not necessarily shared or endorsed by The University of California, Davis, or their employer(s).