The following data sets are in the SiLK Flow record format. These files are provided as reference data or sample data for use with the SiLK tool suite.
This data set is from a Cyber Exercise conducted by the Software Engineering Institute at Carnegie Mellon University in June 2015. The exercise was conducted in a virtual environment whose topology is shown in the FlamingCupcakeChallengeNet.pdf file included in both archives. Network traffic was captured during the exercise, and pcap files and a SiLK flow repository are provided here for use in training, documents, analytic testing, or other uses in accordance with the copyright specified in the DistributionStatementDM-0002956.txt file, also included in both archives.
The FCCX data is separated into packet data and SiLK data.
(SHA256=18a0bf030866b2712953593edbcd46de975c1c3bddce14bf4cb855550b00a704)
(SHA256=ffa086cd3ea707cb38d91a57befa9dd3123cc2ea89441568b9112b90d12f6c42)
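Before unpacking, you can compare each downloaded archive against its published SHA256 value. A minimal sketch using only Python's standard hashlib module (the helper name sha256_of is illustrative, not part of SiLK); command-line tools such as sha256sum do the same job.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path.

    The file is read in 1 MiB chunks, so multi-gigabyte archives do not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of("/tmp/FCCX-silk.tar.gz") returns a 64-character hex string that should match the SHA256 value listed for that archive; a mismatch suggests a truncated or corrupted download.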
To unpack the files, run
gzip -d -c FCCX-pcap.tar.gz | tar xf -
gzip -d -c FCCX-silk.tar.gz | tar xf -
The FCCX-pcap.tar.gz file unpacks into an FCCX-data subdirectory of your current working directory. This data may be analyzed using a tool like Wireshark. The FlamingCupcakeChallengeNet.pdf file provides network diagrams that help explain the network topology used during the exercise.
The FCCX-silk.tar.gz file unpacks into a FCCX-silk subdirectory of your current working directory. Under the FCCX-silk directory is a silk.conf file and a tree of subdirectories and SiLK Flow files that comprise a SiLK data repository which may be queried using rwfilter. The easiest way to use this data repository is to set the environment variable SILK_DATA_ROOTDIR to that directory's location.
export SILK_DATA_ROOTDIR=/full/path/FCCX-silk
Rather than naming individual files on the rwfilter command line, use the --start-date and --end-date switches to select the hours rwfilter searches in the repository (without them, rwfilter searches only the current day, which contains no data here). The first hour for which data is available is 2015/06/02T13 UTC, and the final hour is 2015/06/18T18 UTC.
The traffic was captured by multiple sensors. Use the rwsiteinfo command and its --fields switch to explore the repository configuration described in silk.conf. The following rwsiteinfo command lists the ID and description of every available sensor.
rwsiteinfo --fields=id-sensor,describe-sensor
The sensor descriptions are the names of the switches that served as capture points, as shown in the network diagrams in the FlamingCupcakeChallengeNet.pdf file contained in the repository root.
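There are six possible values for rwfilter's --type switch: in, inweb, out, outweb, int2int, and ext2ext. ICMP traffic is contained in the in, out, int2int, and ext2ext types, and SiLK records an ICMP flow's type and code in its dPort field, so ICMP records appear as destination "ports" in those types. Keep this in mind when doing port analysis on those types; for example, adding --protocol=6,17 restricts rwfilter to TCP and UDP records.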
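SiLK packs an ICMP record's type into the high byte of dPort and its code into the low byte (dPort = type × 256 + code). The small Python helper below (a hypothetical name, for illustration only) decodes such values, which can help when interpreting dPort bins that come from ICMP records.

```python
def icmp_type_code(dport):
    """Split the dPort value of a SiLK ICMP record into (type, code).

    The ICMP type occupies the high byte and the code the low byte,
    i.e. dport == type * 256 + code.
    """
    if not 0 <= dport <= 0xFFFF:
        raise ValueError(f"dPort out of range: {dport}")
    return dport >> 8, dport & 0xFF


# Common values as they would appear in a dPort column:
# 0 = echo reply, 771 = port unreachable, 2048 = echo request.
for dport in (0, 771, 2048):
    icmp_type, icmp_code = icmp_type_code(dport)
    print(f"dPort {dport:>5} -> ICMP type {icmp_type}, code {icmp_code}")
```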
Below are sample commands you may invoke in your shell to unpack the data sets and use them. The leading dollar sign ($) represents your shell's prompt. Lines that end in a backslash (\) have been wrapped for improved readability. These commands assume you have downloaded the FCCX-silk.tar.gz file to the /tmp directory and that the SiLK tools are installed on your PATH.
The sample rwfilter command reads all outbound (out and outweb) flow records for every protocol and stores them in the file sample.rw. Because rwfilter needs at least one partitioning switch to decide which records pass, the command uses --protocol=0-, which matches every protocol. The sample.rw file is the input to the rwstats commands. The first rwstats command bins the flow records by protocol and prints the top 5 protocols by flow record count. The second rwstats command bins the flow records by destination port and prints the top 5 destination ports by flow record count. The final rwstats command bins the flow records by source port and prints the top 5 source ports by flow record count.
$ cd /tmp
$ gzip -d -c FCCX-silk.tar.gz | tar xf -
$ export SILK_DATA_ROOTDIR=/tmp/FCCX-silk
$ rwfilter --type=out,outweb --start-date=2015/06/02T13 \
      --end-date=2015/06/18T18 --protocol=0- --pass=sample.rw
$ rwstats sample.rw --fields=protocol --count=5
INPUT: 21216409 Records for 5 Bins and 21216409 Total Records
OUTPUT: Top 5 Bins by Records
pro|   Records|  %Records|   cumul_%|
 17|  14135688| 66.626204| 66.626204|
  6|   6891630| 32.482547| 99.108751|
  1|    187788|  0.885107| 99.993859|
 89|      1279|  0.006028| 99.999887|
  2|        24|  0.000113|100.000000|
$ rwstats sample.rw --fields=dport --count=5
INPUT: 21216409 Records for 37556 Bins and 21216409 Total Records
OUTPUT: Top 5 Bins by Records
dPort|   Records|  %Records|   cumul_%|
   53|  10741659| 50.629015| 50.629015|
  443|   2085773|  9.830943| 60.459958|
   80|   1254747|  5.914040| 66.373999|
 5723|    697874|  3.289313| 69.663311|
11009|    379771|  1.789987| 71.453298|
$ rwstats sample.rw --fields=sport --count=5
INPUT: 21216409 Records for 64527 Bins and 21216409 Total Records
OUTPUT: Top 5 Bins by Records
sPort|   Records|  %Records|   cumul_%|
   53|   2826909| 13.324163| 13.324163|
  443|    728080|  3.431683| 16.755847|
  137|    280973|  1.324319| 18.080166|
 5723|    273755|  1.290298| 19.370465|
    0|    189215|  0.891833| 20.262298|
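The %Records and cumul_% columns are plain arithmetic on the record counts: each bin's share of the total, and the running sum of those shares in descending order. The Python sketch below (function name illustrative) reproduces the protocol table from the counts shown above; it models the columns, not rwstats itself.

```python
def percentage_table(counts, total):
    """Return (key, records, pct, cumulative_pct) rows, largest bin first.

    pct is the bin's share of all records and cumulative_pct is the
    running sum of those shares, like rwstats' %Records and cumul_%.
    """
    rows = []
    cumulative = 0.0
    for key, records in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * records / total
        cumulative += pct
        rows.append((key, records, pct, cumulative))
    return rows


# Protocol counts from the first rwstats command above.
protocol_counts = {17: 14135688, 6: 6891630, 1: 187788, 89: 1279, 2: 24}
for proto, records, pct, cumul in percentage_table(protocol_counts, 21216409):
    print(f"{proto:>3}|{records:>10}|{pct:>10.6f}|{cumul:>10.6f}|")
```

Because these five bins contain every record, the final cumulative value reaches 100%; in the port tables, the top 5 bins cover only part of the records, so cumul_% stops well short of 100.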
This sample data is derived from anonymized enterprise packet header traces obtained from Lawrence Berkeley National Laboratory and ICSI, and is used here with their permission. This data covers selected hours on selected dates in late 2004 and early 2005. For more information on the source of this data set, see http://www.icir.org/enterprise-tracing/Overview.html
The packet capture files were processed with yaf to create IPFIX flow records. The rwflowpack tool read the IPFIX records and created a data repository of 452 hourly SiLK Flow files. The LBNL/ICSI packet data is separated into non-scanning traffic and scanning traffic, and the SiLK data is packaged similarly.
(SHA256=fe353d647346a069ac357ad70525f73b0f8556e5aec3d67368b8008bf32b3187)
(SHA256=f89ea3b15dd02a08a0a539a5b508c98575781192fd3163acf1986a2ac203d0ea)
To unpack the files, run
gzip -d -c SiLK-LBNL-05-nonscan.tar.gz | tar xf -
gzip -d -c SiLK-LBNL-05-scanners.tar.gz | tar xf -
Each file unpacks into the SiLK-LBNL-05 subdirectory of your current working directory. Under the SiLK-LBNL-05 directory is a silk.conf file and a tree of subdirectories and SiLK Flow files that comprise a SiLK data repository which may be queried using rwfilter. The easiest way to use this data repository is to set the environment variable SILK_DATA_ROOTDIR to that directory's location.
export SILK_DATA_ROOTDIR=/full/path/SiLK-LBNL-05
Rather than naming individual files on the rwfilter command line, use the --start-date and --end-date switches to select the hours rwfilter searches in the repository. The first hour for which data is available is 2004/10/04:20 UTC, and the final hour is 2005/01/08:05 UTC.
When the data was packaged by rwflowpack, the non-scanning traffic was assigned to sensor S0 and the scanning traffic to sensor S1. Use the --sensor switch on rwfilter to select a single type of traffic for analysis. (When analyzing data from both sensors simultaneously, note that LBNL anonymized the two sets of data differently, such that an individual real host address was mapped to different IPs in the two data sets.)
rwflowpack split the data into incoming and outgoing flows by treating the CIDR blocks listed below as the internal network. TCP traffic on common web ports (80, 443, and 8080) was assigned to the separate web types. Thus, there are four possible values for rwfilter's --type switch: in, inweb, out, and outweb.
128.3.0.0/16
128.55.0.0/16
131.243.0.0/16
198.125.133.0/24
198.128.24.0/22
198.129.88.0/22
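To make the in/out split concrete, here is a simplified Python model built on the standard ipaddress module and the CIDR blocks above. It is a sketch, not rwflowpack's code: the helper names are hypothetical, treating a TCP flow as web traffic when either its source or destination port is 80, 443, or 8080 is an assumption about how the web types were assigned, and internal-to-internal or external-to-external flows are simply left unclassified (None).

```python
import ipaddress

# Internal network for the LBNL/ICSI repository, as listed above.
INTERNAL_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "128.3.0.0/16",
        "128.55.0.0/16",
        "131.243.0.0/16",
        "198.125.133.0/24",
        "198.128.24.0/22",
        "198.129.88.0/22",
    )
]
WEB_PORTS = {80, 443, 8080}
TCP = 6


def is_internal(addr):
    """True if addr falls inside any of the internal CIDR blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def flow_type(sip, dip, proto, sport, dport):
    """Approximate type for one flow: "in", "inweb", "out", "outweb", or None.

    Assumption: external -> internal is "in", internal -> external is "out",
    and TCP flows with a web port on either side get the "web" suffix.
    """
    src_internal, dst_internal = is_internal(sip), is_internal(dip)
    if src_internal == dst_internal:
        return None  # both internal or both external: not modeled here
    base = "out" if src_internal else "in"
    is_web = proto == TCP and (sport in WEB_PORTS or dport in WEB_PORTS)
    return base + "web" if is_web else base


print(flow_type("131.243.1.1", "192.0.2.10", 17, 5000, 53))
print(flow_type("192.0.2.10", "128.3.5.5", 6, 443, 51000))
```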
The following rwfilter command visits all the hourly data files and pipes all the flow records to some other SiLK tool. (Specifying --type=all would have a similar effect, and the --sensor switch is not necessary since rwfilter processes data for all sensors by default.) Use a file name as the argument to --all-destination if you want to merge all the hourly files into a single file for easier analysis.
rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 \
    --sensor=S0,S1 --type=in,inweb,out,outweb \
    --all-destination=stdout \
  | ...
Below are sample commands you may invoke in your shell to unpack the data sets and use them. The leading dollar sign ($) represents your shell's prompt. Lines that end in a backslash (\) have been wrapped for improved readability. These commands assume you have downloaded both archives to the /tmp directory and that the SiLK tools are installed on your PATH.
The sample rwfilter command scans all files to find flow records whose protocol is 1 (ICMP), 6 (TCP), or 17 (UDP). The binary flow records are written to the standard output, and the --print-volume switch writes a summary of rwfilter's findings to the standard error. The output from rwfilter is the input to the rwuniq command, which bins the flow records by protocol and, for each bin, prints the number of records, their byte and packet counts, the starting time of the earliest record, and the ending time of the latest record.
$ cd /tmp
$ gzip -d -c SiLK-LBNL-05-nonscan.tar.gz | tar xf -
$ gzip -d -c SiLK-LBNL-05-scanners.tar.gz | tar xf -
$ export SILK_DATA_ROOTDIR=/tmp/SiLK-LBNL-05
$ rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 \
      --sensor=S0,S1 --type=all --proto=1,6,17 --print-volume \
      --threads=4 --pass-destination=stdout \
  | rwuniq --fields=proto --sort-output \
      --values=records,bytes,packets,stime,etime
     |        Recs|     Packets|         Bytes|     Files|
Total|     5866314|   155520999|   88858102591|       452|
 Pass|     5851584|   155228649|   88779771406|          |
 Fail|       14730|      292350|      78331185|          |
pro|Records|      Bytes|  Packets|     sTime-Earliest|       eTime-Latest|
  1| 321678|   58471992|   865991|2004/10/04T20:03:44|2005/01/08T05:28:34|
  6|1935300|75022603954|127277668|2004/10/04T20:03:41|2005/01/08T05:28:37|
 17|3594606|13698695460| 27084990|2004/10/04T20:03:41|2005/01/08T05:28:37|
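The two summaries in the sample output should agree with each other: the rwuniq bins partition the records that passed rwfilter, so each volume column should sum to the Pass line, and Pass plus Fail should equal Total. The arithmetic in Python, with the figures copied from the output above (the helper name is illustrative):

```python
# Volume figures copied from the rwfilter --print-volume summary above.
total = {"records": 5866314, "packets": 155520999, "bytes": 88858102591}
passed = {"records": 5851584, "packets": 155228649, "bytes": 88779771406}
failed = {"records": 14730, "packets": 292350, "bytes": 78331185}

# rwuniq bins for protocols 1 (ICMP), 6 (TCP), and 17 (UDP).
by_protocol = {
    1: {"records": 321678, "packets": 865991, "bytes": 58471992},
    6: {"records": 1935300, "packets": 127277668, "bytes": 75022603954},
    17: {"records": 3594606, "packets": 27084990, "bytes": 13698695460},
}


def bin_sum(bins, column):
    """Add up one volume column across all rwuniq bins."""
    return sum(values[column] for values in bins.values())


for column in ("records", "packets", "bytes"):
    # Each passing record lands in exactly one protocol bin, and every
    # record rwfilter read either passed or failed.
    bins_match_pass = bin_sum(by_protocol, column) == passed[column]
    pass_fail_match_total = passed[column] + failed[column] == total[column]
    print(column, bins_match_pass, pass_fail_match_total)
```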