The following data sets are in the SiLK Flow record format. These files are provided as reference data or sample data for use with the SiLK tool suite.

LBNL-05

This sample data is derived from anonymized enterprise packet header traces obtained from Lawrence Berkeley National Laboratory and ICSI, and is used here with their permission. This data covers selected hours on selected dates in late 2004 and early 2005. For more information on the source of this data set, see http://www.icir.org/enterprise-tracing/Overview.html

The packet capture files were processed with yaf to create IPFIX flow records. The rwflowpack tool read the IPFIX records and created a data repository of 452 hourly SiLK Flow files. The LBNL/ICSI packet data is separated into non-scanning traffic and scanning traffic, and the SiLK data is packaged similarly.

Non-scanning traffic, 278 hourly SiLK Flow files

(MD5=9b173ada1a3ffdcf94edee6df5dc1ec3)

(SHA1=d5b375408727414ec30acb8a87cc6b1bd556601f)

Scanning traffic, 174 hourly SiLK Flow files

(MD5=0fe65a7d8008546473db7c1d48cd528f)

(SHA1=aa6d7457772645d0e37107dc39ab8e90790981f8)
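Before unpacking, you can confirm that each download is intact by comparing its digests with the values above. The sketch below assumes GNU coreutils' md5sum and sha1sum are installed (on macOS, `md5 -r` and `shasum -a 1` are the usual substitutes). It also assumes the archives keep the file names used in the unpack commands that follow. The function name verify_archive is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_archive FILE EXPECTED_MD5 EXPECTED_SHA1
# Succeeds only if both digests of FILE match the expected values.
# Assumes GNU md5sum and sha1sum (coreutils) are on the PATH.
verify_archive() {
  local file=$1 want_md5=$2 want_sha1=$3 got_md5 got_sha1
  # Hash via stdin so the output is "DIGEST  -"; keep only the digest.
  # A missing file yields an empty digest, which then fails the comparison.
  got_md5=$(md5sum < "$file" | awk '{print $1}')
  got_sha1=$(sha1sum < "$file" | awk '{print $1}')
  if [ "$got_md5" = "$want_md5" ] && [ "$got_sha1" = "$want_sha1" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

Usage, with the checksums listed above:

      verify_archive SiLK-LBNL-05-nonscan.tar.gz \
          9b173ada1a3ffdcf94edee6df5dc1ec3 d5b375408727414ec30acb8a87cc6b1bd556601f
      verify_archive SiLK-LBNL-05-scanners.tar.gz \
          0fe65a7d8008546473db7c1d48cd528f aa6d7457772645d0e37107dc39ab8e90790981f8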

To unpack the files, run

      gzip -d -c SiLK-LBNL-05-nonscan.tar.gz | tar xf -
      gzip -d -c SiLK-LBNL-05-scanners.tar.gz | tar xf -

Each file unpacks into the SiLK-LBNL-05 subdirectory of your current working directory. That directory contains a silk.conf file and a tree of subdirectories and SiLK Flow files that together form a SiLK data repository, which can be queried using rwfilter. The easiest way to use this data repository is to set the environment variable SILK_DATA_ROOTDIR to that directory's full path.

      export SILK_DATA_ROOTDIR=/full/path/SiLK-LBNL-05
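A quick sanity check is to confirm that the variable points at the unpacked repository, that is, that a silk.conf file sits directly under it, as described above. This is a minimal sketch; the function name check_silk_root is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_silk_root [DIR]
# Succeeds if DIR (default: $SILK_DATA_ROOTDIR) contains a silk.conf file.
check_silk_root() {
  local root=${1:-$SILK_DATA_ROOTDIR}
  if [ -z "$root" ]; then
    echo "SILK_DATA_ROOTDIR is not set" >&2
    return 1
  fi
  if [ ! -f "$root/silk.conf" ]; then
    echo "no silk.conf under $root" >&2
    return 1
  fi
  echo "repository root: $root"
}
```

After the export above, running check_silk_root with no argument checks the exported value.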

To query the repository without listing individual file names on the rwfilter command line, provide the --start-date and --end-date switches to specify the time range these data files span. The first hour for which data is available is 2004/10/04:20 UTC, and the final hour is 2005/01/08:05 UTC.

When rwflowpack packaged the data, it assigned the non-scanning traffic to sensor S0 and the scanning traffic to sensor S1. Use rwfilter's --sensor switch to select a single category of traffic for analysis. (If you analyze data from both sensors together, note that LBNL anonymized the two data sets differently, so the same real host address maps to different IP addresses in the two data sets.)
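Because the two sensors hold differently anonymized data, it is often simplest to analyze them one at a time. The sketch below wraps the same rwfilter | rwuniq pipeline used in the sample session at the end of this section, restricted to a single sensor. The function name sensor_proto_summary is hypothetical. It assumes SILK_DATA_ROOTDIR is already set and the SiLK tools are on your PATH. The `--proto=0-255` range simply passes every protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sensor_proto_summary SENSOR
# Summarizes one sensor's flow records (S0 = non-scanning, S1 = scanning)
# by protocol. Assumes SILK_DATA_ROOTDIR points at SiLK-LBNL-05.
sensor_proto_summary() {
  local sensor=$1
  rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 \
      --sensor="$sensor" --type=all --proto=0-255 \
      --pass-destination=stdout \
    | rwuniq --fields=proto --sort-output \
        --values=records,bytes,packets
}
```

For example, run `sensor_proto_summary S0` for the non-scanning traffic and `sensor_proto_summary S1` for the scanning traffic.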

rwflowpack split the data into incoming and outgoing flows by treating the CIDR blocks listed below as the internal network. TCP traffic on common HTTP ports (80, 443, and 8080) was split into the web subtype. Thus, there are four possible values for rwfilter's --type switch: in, inweb, out, and outweb.

      128.3.0.0/16
      128.55.0.0/16
      131.243.0.0/16
      198.125.133.0/24
      198.128.24.0/22
      198.129.88.0/22
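To make the in/out split concrete, the sketch below tests whether a single IPv4 address falls inside any of the six internal blocks listed above. Roughly, a flow from outside these blocks to an address inside them counts as incoming, and the reverse counts as outgoing. This is plain bash written to illustrate the idea. It is not rwflowpack's actual code, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Internal network blocks, copied from the list above.
INTERNAL_BLOCKS=(128.3.0.0/16 128.55.0.0/16 131.243.0.0/16
                 198.125.133.0/24 198.128.24.0/22 198.129.88.0/22)

# ip_to_int A.B.C.D: print the dotted-quad address as a 32-bit integer.
ip_to_int() {
  local -a o
  IFS=. read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# in_cidr IP NET/BITS: succeed if IP lies inside the CIDR block.
in_cidr() {
  local net=${2%/*} bits=${2#*/} mask
  # Build a netmask with the top BITS bits set, e.g. /22 -> 0xFFFFFC00.
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

# is_internal IP: succeed if IP is in any of the internal blocks.
is_internal() {
  local block
  for block in "${INTERNAL_BLOCKS[@]}"; do
    in_cidr "$1" "$block" && return 0
  done
  return 1
}
```

For example, `is_internal 198.128.27.255` succeeds because that address lies in 198.128.24.0/22, which spans 198.128.24.0 through 198.128.27.255. `is_internal 198.128.28.0` fails. Within SiLK itself, rwfilter's --scidr and --dcidr switches apply this kind of CIDR test directly to flow records.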

The following rwfilter command visits all the hourly data files and pipes all the flow records to some other SiLK tool. (Specifying --type=all would have a similar effect, and the --sensor switch is not necessary since rwfilter processes data for all sensors by default.) Use a file name as the argument to --all-destination if you want to merge all the hourly files into a single file for easier analysis.

      rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05   \
            --sensor=S0,S1 --type=in,inweb,out,outweb                \
            --all-destination=stdout                                 \
        | ...
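As an example of the single-file approach, the sketch below writes every record to one file and then counts the records per sensor and type with rwuniq. The function name merge_lbnl05 and the default output path /tmp/lbnl05-all.rw are hypothetical. The SiLK tools normally refuse to overwrite an existing file, so the sketch removes any earlier copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge_lbnl05 [OUTFILE]
# Copies every flow record in the repository into OUTFILE, then counts
# records per sensor and type. Assumes SILK_DATA_ROOTDIR is set.
merge_lbnl05() {
  local out=${1:-/tmp/lbnl05-all.rw}
  # SiLK tools normally will not overwrite an existing output file.
  rm -f "$out"
  rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 \
      --sensor=S0,S1 --type=in,inweb,out,outweb \
      --all-destination="$out" || return 1
  rwuniq --fields=sensor,type --values=records --sort-output "$out"
}
```

Later analyses can then read the merged file directly instead of scanning all 452 hourly files again.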

Below are sample commands you can invoke in your shell to unpack the data sets and use them. The leading dollar sign ($) represents your shell's prompt. Lines that end in a backslash (\) have been wrapped for improved readability. These commands assume that you have downloaded both files to the /tmp directory and that the SiLK tools are installed on your PATH.

The sample rwfilter command scans all files to find flow records where the protocol is 1 (ICMP), 6 (TCP), or 17 (UDP). The binary flow records are written to the standard output, and a summary of rwfilter's findings is written to the standard error. The output from rwfilter is the input to the rwuniq command, which bins the flow records by protocol and, for each bin, prints the number of records, their byte and packet counts, the starting time of the earliest record, and the ending time of the latest record.

$ cd /tmp
$ gzip -d -c SiLK-LBNL-05-nonscan.tar.gz | tar xf -
$ gzip -d -c SiLK-LBNL-05-scanners.tar.gz | tar xf -
$ export SILK_DATA_ROOTDIR=/tmp/SiLK-LBNL-05
$ rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05  \
        --sensor=S0,S1 --type=all --proto=1,6,17 --print-volume \
        --threads=4 --pass-destination=stdout                   \
  | rwuniq --fields=proto --sort-output                         \
        --values=records,bytes,packets,stime,etime

     |     Recs|    Packets|        Bytes|  Files|
Total|  5866314|  155520999|  88858102591|    452|
 Pass|  5851584|  155228649|  88779771406|       |
 Fail|    14730|     292350|     78331185|       |
pro|Records|      Bytes|  Packets|     sTime-Earliest|       eTime-Latest|
  1| 321678|   58471992|   865991|2004/10/04T20:03:44|2005/01/08T05:28:34|
  6|1935300|75022603954|127277668|2004/10/04T20:03:41|2005/01/08T05:28:37|
 17|3594606|13698695460| 27084990|2004/10/04T20:03:41|2005/01/08T05:28:37|
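The rwuniq table lends itself to quick follow-up arithmetic. As one sketch, the awk filter below reads that pipe-separated table, skips the title row, and prints the mean bytes per packet for each protocol. For TCP, that is 75022603954 / 127277668, or about 589.4 bytes. The function name bytes_per_packet is hypothetical. The --print-volume summary goes to the standard error, so it does not enter the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean bytes per packet from rwuniq output produced
# with --fields=proto --values=records,bytes,packets,...
# Input columns: 1=proto 2=records 3=bytes 4=packets.
bytes_per_packet() {
  awk -F'|' '
    # Keep only data rows, whose first column is a protocol number.
    $1 ~ /^ *[0-9]+ *$/ && $4 > 0 {
      printf "%d %.1f\n", $1, $3 / $4
    }'
}
```

Appending `| bytes_per_packet` to the rwuniq command above would yield 67.5 for ICMP, 589.4 for TCP, and 505.8 for UDP with this data.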