CERT NetSA Security Suite 
Open Source Tools for Network Monitoring 
SiLK - F.A.Q.

BACKGROUND

What is SiLK?

SiLK is a collection of traffic analysis tools developed by the CERT Network Situational Awareness Team (CERT NetSA) to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets.

Does SiLK support IPv6?

SiLK 1.0 provides IPv6 support in the following tools: rwfilter, rwuniq, and rwcut. To use IPv6, SiLK must be configured for IPv6 by using the --enable-ipv6 switch to the configure script when you are building SiLK.

What platforms does SiLK run on?

SiLK should run on most UNIX-like operating systems. It is most heavily tested on Linux, Solaris, and Mac OS X.

What license is SiLK released under?

SiLK is released under two licenses:

  • GNU General Public License (GPL) Rights pursuant to Version 2, June 1991
  • Government Purpose License Rights (GPLR) pursuant to DFARS 252.225-7013

Where can I download SiLK?

The latest Open Source version of SiLK and selected previous releases are available from http://tools.netsa.cert.org/silk/silk_download.html.

Something is not working as expected, where can I check for errors?

The applications that make up the packing system (flowcap, rwflowpack, rwflowappend, rwsender, and rwreceiver) write error messages to log files. The location of these log files is set when the daemon is started, with the default location being /usr/local/var/silk.

All other applications write error messages to the standard error (stderr).

Whom do I contact for support?

Your primary support person should be the person or group that installs and maintains SiLK at your site. Also see the answer to the following question.

How do I report a bug?

Send a detailed bug report to contact_email. Your bug report should include:

  • the operating system you are using (the output of uname -a)
  • the version of the tool that is causing the bug. You can determine this by running TOOL --version, e.g., rwfilter --version. Include the entire output so we will know what optional features the tool may be using.
  • If you cannot run TOOL --version or it exits without printing anything, send the output of ldd TOOL (or the ldd equivalent on your operating system).
  • If you cannot build the tool, the version of SiLK you are attempting to install and the complete error message that make gives you.
  • the exact command line that caused the problem. If the command is part of a pipeline, include the entire pipeline since the bug may be caused by something happening upstream. You may obfuscate IP addresses or sensor names in the command, but please let us know that you've modified the command.
  • the complete error message you receive
  • If the command is reading SiLK data files, the output of running rwfileinfo on those files may be helpful.

This page provides a good description of writing an effective bug report.

How do I contribute a patch or fix?

You may send bug fixes and patches to contact_email.

CONFIGURATION

What applications and hardware can generate the flows for use in SiLK?

SiLK accepts flows in the NetFlow v5 format from a router. These flows are sometimes called Protocol Data Units (PDU).

SiLK also accepts flows in the IPFIX (Internet Protocol Flow Information eXport) format. You can use the YAF flow meter to generate IPFIX flows from libpcap (tcpdump) data or by live capture.

Why does SiLK create unidirectional flows?

SiLK's origins are in processing NetFlow v5 data, which is unidirectional. Changing SiLK to support bidirectional flows would be a major change to the software. Even if SiLK supported bidirectional flows, you would still face the task of mating flows, since a site with many access points to the Internet will display asymmetric routing (where each half of a conversation goes through a different border router).

Can I make it bidirectional?

No, SiLK does not support bidirectional flows. You will need to mate the unidirectional flows.
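
As an illustration of what mating involves, here is a minimal Python sketch that pairs each flow with a flow whose addresses and ports are reversed. The Flow record, its field names, and the mate_flows function are hypothetical and are not part of SiLK; the sketch ignores timing, so it only shows the matching idea.

```python
from collections import namedtuple

# Hypothetical minimal flow record; real SiLK Flow records carry more fields.
Flow = namedtuple("Flow", "sip dip sport dport proto")


def reverse_key(flow):
    """Return the 5-tuple that the opposite direction of this flow would have."""
    return (flow.dip, flow.sip, flow.dport, flow.sport, flow.proto)


def mate_flows(flows):
    """Pair each flow with an earlier, still unmatched flow in the reverse direction.

    Returns (pairs, unmatched). Start and end times are ignored in this sketch.
    """
    waiting = {}  # 5-tuple -> flows still waiting for a mate
    pairs = []
    for flow in flows:
        candidates = waiting.get(reverse_key(flow))
        if candidates:
            pairs.append((candidates.pop(0), flow))
        else:
            waiting.setdefault(tuple(flow), []).append(flow)
    unmatched = [f for group in waiting.values() for f in group]
    return pairs, unmatched


flows = [
    Flow("10.0.0.1", "192.0.2.7", 40000, 80, 6),
    Flow("192.0.2.7", "10.0.0.1", 80, 40000, 6),
    Flow("10.0.0.2", "192.0.2.9", 40001, 53, 17),
]
pairs, unmatched = mate_flows(flows)
print(len(pairs), "mated pair(s);", len(unmatched), "unmatched flow(s)")
```

With asymmetric routing, the two halves of a conversation may be recorded by different sensors, so the input to a matcher like this would have to combine data from every border router.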

I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?

There are four ways to handle pcap files:

  1. Use the yaf program (from the YAF suite) to convert the pcap data to the IPFIX format, and use the rwipfix2silk program from SiLK to convert from IPFIX to a stream of SiLK Flow records. For maximum compatibility, you should pass the --silk switch to yaf. SiLK provides the rwp2yaf2silk Perl script to make this task easier. The rwipfix2silk program is only available when SiLK has been configured with fixbuf support; see the Installation Handbook for details.
  2. Use the yaf program to convert the pcap data to the IPFIX format, and use rwflowpack to convert from IPFIX to a repository of SiLK Flow data. This also requires that SiLK be configured with fixbuf support.
  3. Use the rwptoflow program, included with SiLK, to convert each packet to a SiLK Flow. Note that this tool does not combine packets into a flow; it simply converts each pcap record into a 1-packet SiLK Flow record.
  4. Search the web for software to convert the pcap data to NetFlow v5 data, and use rwflowpack to convert the NetFlow v5 data to a repository of SiLK Flow data.

What is IPFIX?

IPFIX is the Internet Protocol Flow Information eXport format. Based on the NetFlow v9 format from Cisco, IPFIX is the draft IETF standard for representing flow data. The rwipfix2silk and rwsilk2ipfix programs in SiLK, which are available when SiLK has been configured with fixbuf support, convert between the SiLK Flow format and the IPFIX format.

How many sensors does SiLK support?

The SiLK Flow format is capable of representing 65534 unique sensors.

Can I copy SiLK data between machines?

Yes, a binary file produced by a SiLK application will store its format, version, byte order, and compression method near the beginning of the file (in the file's header). (You can use the rwfileinfo tool to get a description of the contents of the file's header.) Any release of SiLK that understands that file version should be able to read the file. However, note that if the file's data is compressed, the SiLK tools on the second machine must have been compiled with support for that compression library. The SiLK tools will print an error and exit if they are unable to read a file because the tool does not understand the file's format, version, or compression method.

What ports do I need to open in a firewall?

SiLK does not use any hard-coded ports. All SiLK tools that do network communication (flowcap, rwflowpack, rwsender, and rwreceiver) have some way to specify which ports to use for communication.

When flowcap or rwflowpack collects flows from a router, you will need to open a port for UDP traffic between the router and the collection machine.

When flowcap or rwflowpack collects flows from a yaf sensor running on a different machine, you will need to open a port for TCP (or SCTP) traffic between these two machines.

Finally, when you are using flowcap on remote sensor(s) that feed data to rwflowpack running on a central data repository, you will need to open a port between each sensor and your repository. Configure flowcap or rwsender on the sensor and rwflowpack or rwreceiver on the repository to use that port.

See the Installation Handbook for details on specifying ports.

Can I split flows seen by one flow meter into different sensors?

Currently this is not possible. Each flow collection point (called a probe in the SiLK documentation) corresponds to one unique sensor.

When I configure --with-python, I get error messages saying, "warning: Not importing directory 'site': missing __init__.py". How do I fix this?

This error message happens if you are running Python >= 2.5, and the PYTHONPATH environment variable includes the current working directory. Examples of bad PYTHONPATH values are any path beginning or ending with a colon (':'), and any path including a period ('.') as an element.

The solution to this problem is to either unset the PYTHONPATH before running configure, or to ensure that all references to the current working directory are removed from PYTHONPATH before running configure.
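
The check described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the sketch assumes the UNIX colon separator and treats only an empty element or a bare '.' as a reference to the current working directory.

```python
import os

SEP = ":"  # PYTHONPATH separator on UNIX-like systems


def cwd_entries(pythonpath):
    """Return the PYTHONPATH elements that refer to the current directory.

    An empty element (from a leading, trailing, or doubled colon) and a
    bare '.' both mean "current working directory".
    """
    if not pythonpath:
        return []
    return [p for p in pythonpath.split(SEP) if p in ("", ".")]


def cleaned(pythonpath):
    """Return PYTHONPATH with the current-directory elements removed."""
    return SEP.join(p for p in pythonpath.split(SEP) if p not in ("", "."))


current = os.environ.get("PYTHONPATH", "")
if cwd_entries(current):
    print("Problematic PYTHONPATH; try:", cleaned(current) or "(unset it)")
else:
    print("PYTHONPATH does not reference the current directory")
```

If the script reports a problematic value, export the cleaned value (or unset PYTHONPATH) in the shell before running configure.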

OPERATIONS

How are the SiLK Flow files organized and written to disk?

SiLK Flows are stored in binary files, where each file corresponds to a unique class-type-sensor-hour tuple. Multiple data repositories may exist on a machine; however, rwfilter is only capable of examining a single data repository per invocation.

A default repository location is compiled into rwfilter. (This default is set by the --enable-data-rootdir=DIR switch to configure and defaults to /data.) You may tell rwfilter to use a different repository by setting the SILK_DATA_ROOTDIR environment variable or specifying the --data-rootdir switch to rwfilter.

The directory tree for each repository is determined by the path-format entry in the silk.conf file. Traditionally, the directory structure has been /DATA_ROOTDIR/class/type/year/month/day/hourly-files
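
The following Python sketch shows how the traditional layout and the root directory settings fit together. It makes several assumptions that the text above does not confirm: that the switch takes precedence over the environment variable, that month and day are zero-padded, and that "all" and "in" are plausible class and type names. The function names are hypothetical, and a site whose silk.conf sets a different path-format will have a different layout.

```python
import os
import posixpath
from datetime import date

DEFAULT_ROOTDIR = "/data"  # compiled-in default unless changed at configure time


def resolve_rootdir(switch_value=None, environ=None):
    """Pick the repository root: switch, then SILK_DATA_ROOTDIR, then default.

    The precedence order is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    return switch_value or environ.get("SILK_DATA_ROOTDIR") or DEFAULT_ROOTDIR


def day_directory(rootdir, flow_class, flow_type, day):
    """Build DATA_ROOTDIR/class/type/year/month/day (zero-padding assumed)."""
    return posixpath.join(
        rootdir, flow_class, flow_type,
        f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}",
    )


# "all" and "in" are illustrative class and type names.
print(day_directory(resolve_rootdir(environ={}), "all", "in", date(2008, 3, 5)))
```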

How much disk space does a single SiLK Flow record occupy?

A fully-expanded, uncompressed, SiLK Flow record requires 52 bytes (this is 88 bytes for IPv6 records). These records are written by rwcat --compression=none.

Records in the SiLK data repository require less space since common attributes (sensor, class, type, hour) are stored once in the file's header. The smallest uncompressed record in the data repository represents a web flow and requires only 22 bytes.

In addition, one can enable data compression in an individual SiLK application (with the --compression-method switch) or in all SiLK applications when SiLK is configured (pass --enable-output-compression to the configure script). Compression with the lzo1x algorithm reduces the overall file size by about 50%. Using zlib gives a better compression ratio, but at the cost of access time.

The rwfileinfo command will tell you the (uncompressed) size of records in a SiLK file.
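
For rough capacity planning, the record sizes above can be turned into a back-of-the-envelope estimate. The Python sketch below uses only the figures quoted in this answer; the function name is hypothetical, the file header is ignored, and the 50% figure for lzo1x is approximate, so treat the results as estimates.

```python
FULL_IPV4_BYTES = 52        # fully-expanded, uncompressed IPv4 record
FULL_IPV6_BYTES = 88        # fully-expanded, uncompressed IPv6 record
SMALLEST_PACKED_BYTES = 22  # smallest repository record (a web flow)
LZO1X_RATIO = 0.5           # "about 50%" reduction, per the text above


def estimate_bytes(record_count, bytes_per_record, compression_ratio=1.0):
    """Estimate the space used by record_count records, ignoring file headers."""
    return int(record_count * bytes_per_record * compression_ratio)


one_million = 1_000_000
print(estimate_bytes(one_million, FULL_IPV4_BYTES))                     # expanded IPv4
print(estimate_bytes(one_million, SMALLEST_PACKED_BYTES, LZO1X_RATIO))  # packed web flows with lzo1x
```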

Where is the SiLK Flow file format documented?

Currently, the SiLK Flow file formats are only documented in the comments of the source files. See the rw*io.c files in the silk/src/libsilk directory.

What confidentiality and integrity properties are provided for SiLK data sent across machines?

The rwsender and rwreceiver programs can use GnuTLS to provide a secure layer over a reliable transport layer. For this support to be available, SiLK's configure script must have found v1.4.1 or later of the GnuTLS library. Using GnuTLS also requires creating certificates, which is described in an appendix of the Installation Handbook.

When GnuTLS is not used or not available, communication between rwsender and rwreceiver has no confidentiality or integrity checking beyond that provided by standard TCP.

Legacy systems that use a direct connection between flowcap and rwflowpack have no confidentiality or integrity checking beyond that provided by standard TCP, and there is no way to secure this communication without using some outside method (such as creating an ssh tunnel).

If communication between the sensor and the packer goes down, are flows lost?

It depends on what you mean by "sensor". If the "sensor" is the flow generator (that is, a router or an IPFIX sensor) which is communicating directly with rwflowpack, the flows are lost when the connection goes down.

To avoid this, you can run flowcap on the sensor. flowcap acts as a flow capacitor, storing flows on the sensor until the communication link between the sensor and packer is restored. Flows will still be lost if the connection between the flow generator and flowcap goes down, but by running flowcap on a machine near the flow generator (or running both on the same machine), the communication between the generator and flowcap should be more reliable, leading to fewer dropped connections.

Can flowcap function as a "tee", both storing files and forwarding the flow stream onto some place else?

The flowcap program cannot do this itself; however, the rwsender program can send files to multiple rwreceivers. To get the "tee" functionality, have flowcap drop its files into a directory for processing by rwsender.

How do I list all sensors that are installed for a deployment?

The mapsid command will print all the sensors that have been defined at your site.

How do I rotate the SiLK log files?

If you invoke a SiLK daemon with the --log-destination=syslog switch, the daemon will use the syslog(3) facility to write log messages, and syslog will manage log rotation.

If you pass the --log-directory switch to a daemon, the daemon will manage the log files itself. The first message received after midnight local time will cause the daemon to close the current log file, compress it, and open a new log file.
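
Note that rotation under --log-directory is driven by log messages rather than by a timer. The Python sketch below models only that trigger logic; the class name is hypothetical, and the file handling and compression that the daemon performs are deliberately left out.

```python
from datetime import datetime


class DailyRotation:
    """Model of "rotate on the first message after local midnight"."""

    def __init__(self):
        self.current_day = None  # local date of the log file now in use
        self.rotations = 0

    def on_message(self, now):
        """Record a message logged at local time `now`; return True if it rotates."""
        today = now.date()
        rotate = self.current_day is not None and today != self.current_day
        if rotate:
            self.rotations += 1  # the daemon would close, compress, and reopen here
        self.current_day = today
        return rotate


log = DailyRotation()
events = [
    datetime(2008, 3, 5, 23, 59),  # last message of the day
    datetime(2008, 3, 6, 0, 5),    # first message after midnight
    datetime(2008, 3, 6, 0, 6),    # same day, no further rotation
]
results = [log.on_message(t) for t in events]
print(results)
```

One consequence of this model is that a daemon that logs nothing after midnight keeps its old log file open until the next message arrives.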

ANALYSIS

How do I import flow data into Excel?

To get SiLK Flow data into Excel, use the rwcut command to convert the binary SiLK data to a textual CSV (comma-separated values) file, and import the file into Excel. Pass two switches to rwcut: --delimited=, (which sets the delimiter to a comma) and --legacy-timestamps. Use the --output-path=FILE.csv switch to have rwcut write its output to a file.
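
If you script the export, the switches named above can be assembled as in this Python sketch. The file names flows.rw and flows.csv and both function names are hypothetical, and passing the input file as a trailing argument is an assumption about how you invoke rwcut at your site.

```python
import subprocess


def rwcut_csv_command(input_path, output_path):
    """Build an rwcut argument list using the switches named in this answer."""
    return [
        "rwcut",
        "--delimited=,",                  # comma as the field delimiter
        "--legacy-timestamps",
        f"--output-path={output_path}",
        input_path,                       # assumed: input file given as an argument
    ]


def export_csv(input_path, output_path):
    """Run rwcut; raises CalledProcessError if rwcut exits with an error."""
    subprocess.run(rwcut_csv_command(input_path, output_path), check=True)


command = rwcut_csv_command("flows.rw", "flows.csv")
print(" ".join(command))
```

The sketch only prints the command; call export_csv on a machine where rwcut is installed to produce the file.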

How do I convert packet data to flows?

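See the FAQ entry "I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?" in the Configuration section.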

How can I create obfuscated (anonymized) data?

The rwrandomizeip application will obfuscate the source and destination IP addresses in a SiLK data file. It can operate in one of two modes:

  1. In default mode, rwrandomizeip substitutes a pseudo-random, non-routable IP address for each source and destination IP address it sees. An IP address that appears multiple times in the input will be mapped to a different output address each time, and no structural information in the input will be maintained.
  2. In consistent mode, rwrandomizeip creates four shuffle tables, each having 256 entries where the value is a pseudo-random value from 0 to 255. These tables represent the possible values for each octet in an IPv4 address. rwrandomizeip uses the tables to modify the IP addresses in a consistent way, which allows a conversation between two IP addresses to be visible in the anonymized data.

How secure is the anonymized data?

Anonymizing/Obfuscating data is hard. You should be cautious of how widely you distribute data that rwrandomizeip has processed:

  • The rwrandomizeip program only anonymizes the source and destination IP address. Any additional information in the data (such as the existence of services that run on well known ports or protocols) is still visible.
  • In consistent mode, the data is much less random, since the value in an octet is always mapped to the same value. Given the structure of IP addresses on the Internet, reversing the mapping would not be difficult.
  • The default mode does not suffer from that problem, but you cannot do any meaningful traffic analysis on the anonymized data since the mapping is not consistent.
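
To make the weakness of consistent mode concrete, here is a minimal Python sketch of per-octet shuffle tables. It is not rwrandomizeip's implementation: the function names are hypothetical, each table is assumed to be a permutation of 0 to 255, and Python's random module is not a cryptographic generator.

```python
import random


def make_shuffle_tables(seed=None):
    """Create four tables, one per IPv4 octet, each a permutation of 0..255."""
    rng = random.Random(seed)
    tables = []
    for _ in range(4):
        table = list(range(256))
        rng.shuffle(table)
        tables.append(table)
    return tables


def anonymize(ip, tables):
    """Map each octet of a dotted-quad IPv4 address through its own table."""
    octets = [int(part) for part in ip.split(".")]
    return ".".join(str(tables[i][octet]) for i, octet in enumerate(octets))


tables = make_shuffle_tables(seed=1)
a = anonymize("10.1.2.3", tables)
b = anonymize("10.1.9.9", tables)
# Addresses that share leading octets still share them after mapping.
print(a.split(".")[:2] == b.split(".")[:2])
```

Because every occurrence of an octet value maps to the same output, network structure survives the mapping, which is what makes conversations visible and also what makes the mapping reversible by someone who knows the common address blocks.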