Use of the SiLK system and related source code is subject to the terms of the following licenses:
GNU Public License (GPL) Rights pursuant to Version 2, June 1991
Government Purpose License Rights (GPLR) pursuant to DFARS 252.225-7013 NO WARRANTY ANY INFORMATION, MATERIALS, SERVICES, INTELLECTUAL PROPERTY OR OTHER PROPERTY OR RIGHTS GRANTED OR PROVIDED BY CARNEGIE MELLON UNIVERSITY PURSUANT TO THIS LICENSE (HEREINAFTER THE "DELIVERABLES") ARE ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, INFORMATIONAL CONTENT, NONINFRINGEMENT, OR ERROR-FREE OPERATION. CARNEGIE MELLON UNIVERSITY SHALL NOT BE LIABLE FOR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES, SUCH AS LOSS OF PROFITS OR INABILITY TO USE SAID INTELLECTUAL PROPERTY, UNDER THIS LICENSE, REGARDLESS OF WHETHER SUCH PARTY WAS AWARE OF THE POSSIBILITY OF SUCH DAMAGES. LICENSEE AGREES THAT IT WILL NOT MAKE ANY WARRANTY ON BEHALF OF CARNEGIE MELLON UNIVERSITY, EXPRESS OR IMPLIED, TO ANY PERSON CONCERNING THE APPLICATION OF OR THE RESULTS TO BE OBTAINED WITH THE DELIVERABLES UNDER THIS LICENSE. Licensee hereby agrees to defend, indemnify, and hold harmless Carnegie Mellon University, its trustees, officers, employees, and agents from all claims or demands made against them (and any related losses, expenses, or attorney’s fees) arising out of, or relating to Licensee’s and/or its sub licensees’ negligent use or willful misuse of or negligent conduct or willful misconduct regarding the Software, facilities, or other rights or assistance granted by Carnegie Mellon University under this License, including, but not limited to, any claims of product liability, personal injury, death, damage to property, or violation of any laws or regulations. Carnegie Mellon University Software Engineering Institute authored documents are sponsored by the U.S. Department of Defense under Contract F19628-00-C-0003. Carnegie Mellon University retains copyrights in all material produced under this contract. The U.S. 
Government retains a non-exclusive, royalty-free license to publish or reproduce these documents, or allow others to do so, for U.S. Government purposes only pursuant to the copyright license under the contract clause at 252.227.7013.
SiLK, the System for Internet-Level Knowledge, is a collection of traffic analysis tools developed by the CERT Network Situational Awareness Team (CERT NetSA) to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets. SiLK is ideally suited for analyzing traffic on the backbone or border of a large, distributed enterprise or mid-sized ISP.
SiLK supports the collection of the following types of flow data:
This handbook provides instructions to configure and install the SiLK Collection and Analysis Suite. It is intended for individuals comfortable with the following tasks:
Additionally, if SiLK will be accepting NetFlow data from a router, the installer should be comfortable with router configuration.
In order to build SiLK, you will need to have:
To get the full functionality of SiLK, these additional libraries and their header files are recommended:
Note that many Linux systems have one package for the run-time shared libraries and another for the header files, and both must be installed when building SiLK from source. For example, to build SiLK with zlib support on a Red Hat Enterprise Linux AS release 4 system, you will need to install both the zlib-1.2.1.2-1.2 and the zlib-devel-1.2.1.2-1.2 RPMs (your version numbers may be different).
New releases of SiLK are always capable of reading SiLK Flow data files created by previous releases of SiLK, and support for nearly all other SiLK file formats is maintained in newer releases. When upgrading to a new release of SiLK in an enterprise that uses separate collection, packing, and analysis machines, you should upgrade the analysis host(s) first, then the packing host(s), and finally the collectors. You may also choose to only upgrade the analysis hosts, and leave the packing and collection hosts at previous releases.
In addition, note that any change to the SiLK file formats will always require a change in the minor version number of SiLK (the SiLK version number follows the pattern major.minor.revision). Practically, this means that you can upgrade a collection machine to a newer release, say SiLK-0.13.9, and yet maintain the packing machines at an older release, SiLK-0.13.2. (These version numbers are for illustrative purposes only.) However, a bump in the minor version number does not always signal a change to the SiLK file formats. An analysis host at SiLK-0.13.2 may be able to read files created by SiLK-0.14.1 on the packing host; it depends on whether the SiLK file formats changed at SiLK-0.14.0. Changes to the SiLK file formats are always documented in the release notes, which are included in the source distribution and are available on the web site (http://tools.netsa.cert.org/silk/).
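The version rule above can be checked mechanically. The sketch below uses a hypothetical helper (the name same_minor is not part of SiLK) that reports whether two release numbers share the same major.minor prefix. When they do, the rule above implies that the file formats cannot differ. When they do not, the release notes remain the authority.

```shell
# same_minor A B: succeed when versions A and B (major.minor.revision)
# share the same major.minor prefix.  Hypothetical helper, not a SiLK tool.
same_minor() {
    a=${1%.*}    # strip the trailing ".revision"
    b=${2%.*}
    [ "$a" = "$b" ]
}

# Illustrative version numbers taken from the paragraph above.
if same_minor 0.13.9 0.13.2; then
    echo "same minor version: file formats cannot differ"
else
    echo "minor versions differ: check the release notes"
fi
```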
There are two categories of applications that comprise a SiLK installation:
Installation of the analysis tools is relatively straightforward since they are installed on systems that have direct access to the SiLK data files and require little configuration.
Installing the packing tools is more complex: the tools run as background processes (with every operating system having a unique way to start these processes) that must cooperate with each other and with additional software and/or network devices. The packing tools are designed to provide a great amount of flexibility in their installation, and with this flexibility comes additional complexity. The tools that make up the SiLK packing system are:
This chapter introduces several possible configurations of the SiLK system; the detailed installation instructions are presented in subsequent chapters. In the subsections that follow, the term “remote” is with respect to the machine where rwflowpack is running.
In the single machine (all-in-one) configuration, all processing occurs on a single machine: You configure the rwflowpack program to collect flows, convert them to the SiLK Flow format, categorize them, and store the SiLK Flow records to the local disk. The analysis tools are installed on this same machine and read the files from local disk. Figure 1.1 shows how this configuration would look when flows are collected from a NetFlow router, and Figure 1.2 shows this configuration when the YAF flow collector is used.
This is the simplest complete installation. To use it, follow the instructions in Section 2 to configure and build the source code, Section 3 to customize the analysis tools, and Section 4 to configure rwflowpack.
It is not uncommon to have a situation in which the sensor(s) generating the flow records are not close to the data storage location. You could configure the flow generators to send the data to the data storage location; however, due to network reliability and bandwidth issues, it is desirable to collect flow data as close to where it is produced as possible. (This is especially true if the flow generator uses an unreliable transport protocol, such as UDP-based NetFlow generated by a router.) In these situations, the flowcap daemon can be installed on a machine close to the sensor where it will collect, compress, and forward the data to rwflowpack for packing.
Also, suppose the machine where rwflowpack is running is not the same machine on which you are storing the SiLK Flow files, or perhaps you want the SiLK files to be available on multiple machines for use by groups of analysts. In such cases, you configure rwflowpack to write the SiLK Flows into small files called incremental files, and these incremental files are distributed over the network to machine(s) where the rwflowappend daemon writes the SiLK Flow records to their final location. The analysis tools read the records from this final location.
This configuration is the most complex. Figure 1.3 illustrates it collecting NetFlow; when the YAF flow collector is used, the top third of the drawing would resemble Figure 1.4.
In this configuration, the rwsender and rwreceiver daemons transfer files between the machines. rwsender monitors a directory and transfers the files it finds there to one or more rwreceivers on the downstream side. rwreceiver accepts files from one or more rwsenders and places the files into a directory where the next tool in the packing chain can process them.
rwsender and rwreceiver only transfer files; they do not consider the contents of the files. Instead of using rwsender and rwreceiver, you could (with some stipulations) use other software, such as rsync or scp, to transfer the files between the machines.
If this describes your installation, follow the instructions in Section 2 to install SiLK on each machine, in Section 3 to customize the analysis tools on each machine where analysis occurs, and in Section 5 to configure the daemons on all the machines where the packing tools run.
This configuration is a subset of the previous one: flowcap is used to capture the flows near the point where they are generated, and the rwsender and rwreceiver daemons transfer the flows to the machine where rwflowpack packs them and the analysis tools process them. Figure 1.5 depicts this configuration with a NetFlow router. When a YAF sensor is used, the top half of the figure would be replaced with Figure 1.4.
This installation will largely follow the same instructions as those described previously; however, the configuration of rwflowpack is slightly different, as described in Section 6. That section will refer you to the parts of Section 5 you must follow to configure flowcap. You will use Section 3 to configure the analysis tools on the machine where rwflowpack is installed.
This configuration, shown in Figure 1.6, is also a subset of that described in Section 1.3.2, except that rwflowpack is used to collect the flows instead of flowcap.
For this configuration, you will install the source code on the packing machine and the analysis machine (Section 2), customize the analysis tools on the machine where rwflowappend is to run (Section 3), and configure rwflowpack and rwflowappend (Section 7).
Finally, if you only plan to use the software to analyze existing SiLK Flow files and/or packet capture (pcap) data such as that created by tcpdump, you would use this configuration (Figure 1.7). For this configuration, you need to build the source code (Section 2) and customize the analysis tools (Section 3).
The instructions in the next two sections of this handbook will allow you to use SiLK to analyze existing SiLK files and analyze packet capture (pcap) data such as that created by tcpdump: Section 2 describes how to configure and install the SiLK software from source, and Section 3 describes how to customize the analysis tools to get the most use from the system.
The other sections of the handbook describe how to use SiLK to capture flow data, categorize the flows as incoming or outgoing, convert the data to the SiLK format, and store the SiLK Flows in binary flat files indexed by hour, sensor, and direction. The simplest configuration is the single machine configuration (Section 4), where one machine collects the flow records, packs them, and stores them locally for use by the analysis tools. Having collection, categorization, and storage on separate machines is the most complex configuration (Section 5), and other configurations are possible (Sections 6 and 7).
Section 8 describes how to configure the flow generator to send its data to the SiLK collector(s).
To assist you in the configuration process, Appendix A describes how SiLK categorizes flows as incoming or outgoing (including a description of the data storage hierarchy), and Appendix B provides instructions on how to collect NetFlow data from the router and use that data as part of the configuration.
This handbook describes the installation of SiLK. For a discussion of the analysis tools, see their individual manual pages, the complete set of manual pages in The SiLK Reference Guide, and the tutorial information in Using SiLK for Network Traffic Analysis: Analysts’ Handbook. These documents are available at http://tools.netsa.cert.org/silk/silk_docs.html.
In this section you will
(You may need to become the root user to install the software.)
You may continue to Section 3.
Note: As of SiLK-1.0.0, you no longer specify the site when you configure the software since the packing logic is (normally) determined by a run-time plug-in loaded by rwflowpack.
Download and unpack the source code distribution:
For the remainder of these instructions, the full path to the top of the source tree (i.e., the silk-1.0.1 directory, which contains the configure file) will be referred to as $SUITEROOT; it may be set in your (Bourne-compatible) shell by entering the command:
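As a sketch of these two steps, assume the distribution was downloaded into the current directory as a gzipped tar file named silk-1.0.1.tar.gz. The file name is an assumption; substitute the name of the file you actually downloaded.

```shell
# Assumed archive name; adjust to match the file you downloaded.
TARBALL=silk-1.0.1.tar.gz

# Unpack the distribution if the archive is present in this directory.
if [ -f "$TARBALL" ]; then
    gzip -d -c "$TARBALL" | tar xf -
fi

# Record the top of the source tree for the rest of the instructions.
SUITEROOT="$(pwd)/silk-1.0.1"
export SUITEROOT
```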
You should decide where to install the tools and where your SiLK Flow data files will reside, and specify this information to the configure script. Some of these locations are compiled into the code, and others are used to initialize the start-up scripts and configuration files for rwflowpack and the other packing tool daemons.
This value will be compiled into the analysis tools, and it will be the default location that rwfilter uses when looking for the hourly data files. This directory must be accessible by the final program in the packing chain (typically rwflowpack) which writes the packed SiLK flow files and by the analysis machine(s) which reads them. The path to the directory tree can be different on the analysis and packing machines, as long as the actual physical location is the same.
When running the tools, the value of the SILK_DATA_ROOTDIR environment variable will override this compiled-in value. In addition, rwfilter allows you to override this value with the --data-rootdir switch.
For historical reasons, the default value for this location is /data. We use a separate disk for the SiLK flow data since the space it requires can be large and depends on the size of the monitored network, the amount of traffic the network sees, and the aging policy for historical data.
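The lookup order described above can be sketched as a small shell function. This is only an illustration: resolve_rootdir is a hypothetical name, /mnt/flows is a made-up path, and the assumption that the --data-rootdir switch takes priority over the environment variable is an inference from the text rather than something it states outright.

```shell
# resolve_rootdir [SWITCH_VALUE]: print the data root that would be used.
# Illustrative only; assumes switch > environment > compiled-in default.
resolve_rootdir() {
    compiled_default=/data    # the handbook's default location
    if [ -n "${1:-}" ]; then
        echo "$1"                       # value given with --data-rootdir
    elif [ -n "${SILK_DATA_ROOTDIR:-}" ]; then
        echo "$SILK_DATA_ROOTDIR"       # environment variable
    else
        echo "$compiled_default"        # value compiled into the tools
    fi
}

# Point the tools at a different repository for this shell session.
SILK_DATA_ROOTDIR=/mnt/flows
export SILK_DATA_ROOTDIR
resolve_rootdir
```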
The following table shows the subdirectories of $SILK_PATH where files are normally installed, but you can change these by specifying switches to configure. Use configure’s --help switch to see the full list of directory choices.
| bin | analysis tools, such as rwfilter |
| sbin | system administrator tools, for example rwflowpack |
| share/man | manual pages |
| lib/silk | optional plug-in support, such as the country code support |
| share/silk | support files, such as the country-code mapping file |
| share/silk/etc | sample configuration files and scripts to assist the system administrator in running the packing system daemons |
| etc | configuration files used by the packing system daemons (see SCRIPT_CONFIG_LOCATION below) |
| var | directory root used by packing tools (see DAEMON_STATE_DIRECTORY below) |
| var/log | log files generated by the packing system daemons |
| var/lib | incomplete data files generated by the packing tools and files awaiting processing |
| lib | libraries required to run the tools and used to build end-user plug-ins |
| include/silk | header files used to build end-user plug-ins |
Note: The applications work best when they have access to configuration files and plug-ins, and the code that searches for these files depends on the directory tree as it will be upon installation. If you do not plan to use the tools outside of your own tree, you may want to specify --prefix=`pwd` (note the back quotes) to the configure script. When you run make install, the tools will be installed into the top of the source tree.
To adapt the source code to your operating system and environment, the configure shell script will run several tests to check for various features. By giving command line switches to configure, you can include additional features or instruct configure to use libraries from particular locations. You can also control where SiLK will be installed. You can display the full list of switches that configure accepts by running configure --help. The remainder of this section describes many of these switches.
SiLK-1.0 provides support for accessing SiLK from within Python and for using Python code as part of an rwfilter invocation. This support is called PySiLK and it requires Python 2.4 or later.
To include PySiLK support, you must provide the --with-python switch to configure. To use a particular Python interpreter, you may use --with-python=path. For information on using PySiLK, see the rwfilter man page and SiLK in Python, available from http://tools.netsa.cert.org/silk/silk_docs.html.
(Unless Python and SiLK are installed in the same directory, you will need to follow the instructions in Section 3.2 to allow Python to find the PySiLK modules.)
Some SiLK applications have been modified to support handling IPv6 addresses. To enable this behavior, specify the --enable-ipv6 switch on the configure command line. Currently, SiLK supports collecting IPv6 data from IPFIX data, which requires that you build and install libfixbuf v0.7.3 or later (see Section 2.3.5) before installing SiLK.
To reduce the size of the data files, the rwflowpack daemon and many analysis tools have the ability to use an external library to automatically compress their binary output when writing and uncompress their input when reading. (This compression occurs on the ‘data’ section of the file; the file’s header remains uncompressed.) You can specify whether a particular tool uses this external compression via a switch on the tool’s command line. The default setting for this behavior is determined by the --enable-output-compression=type switch to configure. SiLK supports the following parameters to the switch:
| none | use no compression; this is the default |
| zlib | use the widely available zlib general compression library |
| lzo1x | use the LZO (LZO 1.08 or LZO 2.02) real-time data compression library |
The latter two options require the support of external libraries as described next.
The configure script will attempt to find the zlib general compression library and its header file. Specifying the --with-zlib=dir switch tells configure that the header and library are located in dir/include/zlib.h and dir/lib/libz.a, respectively.
Note: Several operating system vendors distribute the libraries and header files in separate packages. To take zlib on RedHat as an example, the zlib package contains the zlib library, and the header file (and manual page) is in the separate zlib-devel package. In order to build SiLK from source, you need to have both packages installed.
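Before passing --with-zlib=dir, it can help to confirm that dir has the layout configure is described as expecting. The function below is a hypothetical helper, not part of SiLK, and /usr/local is only a guess at where zlib might live. It tests for the static library libz.a alone, so a system that ships only a shared library would need the test adjusted.

```shell
# zlib_prefix_ok DIR: succeed when DIR holds include/zlib.h and lib/libz.a,
# the two paths that --with-zlib=DIR is described as using.
zlib_prefix_ok() {
    [ -f "$1/include/zlib.h" ] && [ -f "$1/lib/libz.a" ]
}

# Example: /usr/local is an assumed location, not a SiLK requirement.
if zlib_prefix_ok /usr/local; then
    echo "zlib header and library found under /usr/local"
else
    echo "zlib header or library missing under /usr/local"
fi
```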
The configure script will also attempt to find the LZO (http://www.oberhumer.com/opensource/lzo/) real-time data compression library and headers. SiLK will work with either LZO 1.08 or LZO 2.02. You may use the --with-lzo=dir switch to specify the location of LZO.
If SiLK is compiled with libfixbuf support, the SiLK packer can read flow data generated by an IPFIX (Internet Protocol Flow Information eXport) compliant flow generator such as the YAF v1.0 flow sensor technology (http://tools.netsa.cert.org/yaf/). (libfixbuf is not part of SiLK; you must download it from http://tools.netsa.cert.org/fixbuf/ and install it prior to installing SiLK. To use this feature, SiLK requires libfixbuf-0.7.3 or later.)
In addition, if configure finds libfixbuf, the rwipfix2silk and rwsilk2ipfix command line tools will be built. These tools support converting between the SiLK Flow record format and IPFIX.
When libfixbuf support is included, the SiLK data files contain additional information: the TCP flags are broken into two fields, one containing the flags on the first packet of a flow and the other containing the flags on all other packets in the flow. This feature is automatically enabled when libfixbuf is found. Specifying --enable-initial-tcpflags also enables this feature, but note that the separate TCP flag fields will only contain valid values when used with the enhanced flow collection software.
The configure script will look for the pkg-config(1) specification file for libfixbuf (libfixbuf.pc) in the standard pkg-config directories, and if libfixbuf is installed in a standard location, configure should be able to locate it. If you have installed libfixbuf but configure does not find it, you can run configure with the --with-libfixbuf=dir switch to add the directory dir to pkg-config’s search path (configure will add dir to the PKG_CONFIG_PATH environment variable). The libfixbuf.pc file is normally installed in the lib/pkgconfig subdirectory of the location where libfixbuf was installed.
As of SiLK-1.0.0, the packing logic used by rwflowpack to categorize flow records as incoming or outgoing, web or non-web, et cetera, is determined by a plug-in that is loaded when rwflowpack is invoked. The name of this plug-in must be passed to rwflowpack via the --packing-logic switch.
Using a plug-in for flow categorization makes it easier to change the packing logic or to test new categorization schemes. However, it requires that the plug-in be available and that you not have disabled plug-in support by building statically-linked applications (Section 2.3.8).
If you wish to compile the packing-logic into rwflowpack, you must specify the --enable-packing-logic switch when you run configure. The argument to this switch is the C source file containing the packing logic to use for this SiLK installation. For example, if you wish to use the twoway packing logic described in Appendix A, run
All of the SiLK applications (i.e., both the analysis tools and the packing [flow collection and storage] daemons) and their associated manual pages will be built and installed unless the --disable-packing-tools or --disable-analysis-tools switches are passed to configure. You can speed the building of the software if you disable the parts of the system you do not require. For example, a remote collection machine does not need the analysis tools (though they can be useful to have for debugging).
The configure script will build SiLK with support for dynamic-linking, where the common library functions of SiLK are maintained in separate files that the operating system automatically loads when you invoke an application. (The alternative is called static-linking.) While dynamic-linking allows the kernel to maintain one image of the library for simultaneous invocations of SiLK tools, it makes moving the binaries almost impossible since the libraries must move as well, and often the binaries are configured to look in a particular location for the libraries.
If you wish to build without dynamic-linking support, give configure the --enable-static-applications switch, which forces the applications to be statically linked. However, this may result in some plug-ins not working correctly.
An alternative is to specify the --disable-shared switch to configure, but note that this results in the plug-ins not being compiled at all.
If you specify --enable-static-applications or --disable-shared to configure, you also need to specify the --enable-packing-logic switch since rwflowpack will not be able to load the packing logic as a plug-in. See Section 2.3.6 for a description of the --enable-packing-logic switch and the argument the switch requires.
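The dependency between these switches is easy to overlook when assembling a long configure command. As a sketch, the hypothetical function below (not part of SiLK) scans a list of switches and succeeds when a static build was requested without a compiled-in packing logic. FILE.c in the comment stands for whichever packing-logic source file you would supply.

```shell
# missing_packing_logic SWITCH...: succeed (exit 0) when the switches ask
# for a static build but do not include --enable-packing-logic=FILE.c.
missing_packing_logic() {
    static=0
    logic=0
    for sw in "$@"; do
        case $sw in
            --enable-static-applications|--disable-shared) static=1 ;;
            --enable-packing-logic=*) logic=1 ;;
        esac
    done
    [ "$static" -eq 1 ] && [ "$logic" -eq 0 ]
}

if missing_packing_logic --prefix=/usr/local --disable-shared; then
    echo "warning: static build requested without --enable-packing-logic"
fi
```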
If SiLK is compiled with GnuTLS support, the communication between rwsender and rwreceiver can be encrypted and authenticated once the appropriate certificates have been created and distributed. GnuTLS is the GNU Project’s Transport Layer Security Library, and it is available from http://www.gnu.org/software/gnutls/. Note that SiLK requires GnuTLS v1.4.1 or greater.
The configure script will look for the pkg-config(1) specification file for GnuTLS (gnutls.pc) in the standard pkg-config directories, and if GnuTLS is installed in a standard location, configure should be able to locate it. If you have installed GnuTLS but configure does not find it, you can run configure with the --with-gnutls=dir switch to add the directory dir to pkg-config’s search path (configure will add dir to the PKG_CONFIG_PATH environment variable). The gnutls.pc file is normally installed in the lib/pkgconfig subdirectory of the location where GnuTLS was installed.
By default, SiLK uses UTC when printing timestamps to the user, and it expects timestamps from the user to be in UTC. Giving configure the --enable-localtime switch will modify SiLK to print and expect times in the local timezone. (Data files are always indexed by UTC.)
The configure script will attempt to locate the pcap library and header files. If they are not found or if they do not have the required functions, SiLK will be built without support for the packet-flow conversion tools rwptoflow and rwpmatch.
If you wish to specify that SiLK use a particular version of the pcap library, pass the --with-pcap=dir switch to configure, where dir contains include/pcap.h and lib/libpcap.a (or a shared version of the library).
If SiLK is compiled with libipa support, the rwipaimport and rwipaexport programs will be compiled. These tools interact with an IPA (IP Association) database, which stores information about IP addresses. rwipaimport takes an existing SiLK IPset, Bag, or Prefix Map and stores it in the database; rwipaexport reads data from the IPA database to create a SiLK IPset, Bag, or Prefix Map. libipa is a separate library available from http://tools.netsa.cert.org/ipa/. SiLK-1.0 requires libipa-0.3.0 or greater.
The configure script will look for the pkg-config(1) specification file for libipa (libipa.pc) in the standard pkg-config directories, and if libipa is installed in a standard location, configure should be able to locate it. If you have installed libipa but configure does not find it, you can run configure with the --with-libipa=dir switch to add the directory dir to pkg-config’s search path (configure will add dir to the PKG_CONFIG_PATH environment variable). The libipa.pc file is normally installed in the lib/pkgconfig subdirectory of the location where libipa was installed.
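Because configure relies on pkg-config to locate libfixbuf, GnuTLS, and libipa, you can ask pkg-config the same question before running configure. The following is a minimal sketch: check_pkg is a hypothetical helper, and the minimum versions are the ones stated in this handbook.

```shell
# check_pkg NAME MINVERSION: report whether pkg-config can find the
# package at the required version.  Hypothetical helper, not a SiLK tool.
check_pkg() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config is not installed"
        return 2
    fi
    if pkg-config --atleast-version="$2" "$1"; then
        echo "$1: found, version $(pkg-config --modversion "$1")"
    else
        echo "$1: missing or older than $2"
        return 1
    fi
}

# Minimum versions are those stated in this handbook.
check_pkg libfixbuf 0.7.3 || true
check_pkg gnutls    1.4.1 || true
check_pkg libipa    0.3.0 || true
```

If a library is installed in a non-standard location, exporting PKG_CONFIG_PATH with the library's lib/pkgconfig directory before calling check_pkg mirrors what the --with-libfixbuf, --with-gnutls, and --with-libipa switches are described as doing.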
By default, SiLK is built with full optimization (assuming the compiler accepts -O3 for optimization), with no debugging, and with assert()s disabled. Pass the --disable-optimization, --enable-debugging, and --enable-assert switches to configure to modify these settings. If your compiler uses a different switch to enable optimization (such as -xO4 for Solaris’ cc), you may specify it with --enable-optimization=-xO4.
You will need to configure the source code for each machine that runs any part of the SiLK Collection and Analysis Suite.
Run the configure script to configure the SiLK source code. The following command would configure the software to use /data as the location of the data repository and to expect to be installed into the /usr/local directory:
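A sketch of such an invocation follows. The --prefix switch is standard for configure scripts, but the name --enable-data-rootdir is an assumption that does not appear in the surrounding text; confirm it against the output of ./configure --help before relying on it.

```shell
# Assumption: --enable-data-rootdir is the switch that sets the data
# repository location; confirm the name with ./configure --help.
CONFIGURE_ARGS="--enable-data-rootdir=/data --prefix=/usr/local"

# Run from the top of the source tree ($SUITEROOT).
if [ -x ./configure ]; then
    ./configure $CONFIGURE_ARGS
else
    echo "configure not found; would run: ./configure $CONFIGURE_ARGS"
fi
```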
Consult the previous section for additional switches that you may need or wish to pass to configure to help it find a library or to enable an optional feature.
configure will run several tests on your platform and use the results of these tests to create several files. When configure has finished, it will print a summary of how it has configured the SiLK source code:
The above message is also written to the silk-summary.txt file in the directory where you ran configure.
Verify that the configuration matches your expectations. The configure script does not complain when it is given a switch it does not recognize, which makes it easy for a simple “typo” to go unnoticed.
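Since configure accepts unknown switches silently, one way to catch a typo is to compare each switch against the saved output of configure --help. The function below is a hypothetical sketch, not part of SiLK. It looks only for the literal switch name, so a legitimate switch that the help text lists in its opposite form (for example, a --disable- variant of a feature listed as --enable-) will be reported too; treat a report as a prompt to look again, not as proof of an error.

```shell
# check_switches HELPFILE SWITCH...: report each switch whose name does not
# appear in HELPFILE (the saved output of "./configure --help").
check_switches() {
    helpfile=$1
    shift
    status=0
    for sw in "$@"; do
        name=${sw%%=*}    # drop any "=value" part
        # -F: literal match; -w: reject partial names such as --enable-ipv
        if ! grep -q -w -F -e "$name" "$helpfile"; then
            echo "not listed by configure --help: $name" >&2
            status=1
        fi
    done
    return $status
}

# Typical use, from the top of the source tree:
#   ./configure --help > configure-help.txt
#   check_switches configure-help.txt --enable-ipv6 --with-python
```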
To build SiLK, run make from the top of the source tree ($SUITEROOT).
You can then install the software. Depending on where you chose to install, you may need to become the root user first. Running make install will install the applications, the support libraries, the plug-ins, and the manual pages.
This section describes the customization of the analysis tools. The manual page for each tool will be installed under $SILK_PATH/share/man/man1/ when you install SiLK. In addition, http://tools.netsa.cert.org/silk/silk_docs.html provides the manual pages as individual web pages and as a single volume in The SiLK Reference Guide. The web site also contains a tutorial on using the analysis suite: Using SiLK for Network Traffic Analysis: Analysts’ Handbook.
While nothing in this section is required to use SiLK, these steps will enhance the utility of the software.
In addition to the information contained in the NetFlow or IPFIX flow record (e.g., source and destination addresses and ports, IP protocol, time stamps, data volume), every SiLK flow record has two additional pieces of information:
The purpose of the SiLK site configuration file, silk.conf, is to define the sensors, classes, and types to use when packing and accessing the SiLK flow data. The first time you install SiLK, and any time you add new sensors (IPFIX or NetFlow generators) to a deployment, you will need to update silk.conf.
Note: If you are upgrading from SiLK-0.11.x to SiLK-1.0, no changes to the silk.conf file are required, though you may want to read about the new packing-logic statement that silk.conf supports.
Once you have made the changes, rename the file silk.conf and save it in the root of your data repository, normally /data.
You may continue to Section 3.2.
When you install SiLK, sample site configuration files are installed in $SILK_PATH/share/silk/SITE-silk.conf. The various files provide different sets of classes and types, and must coordinate with the packing rules that you will use at your site. For information on the twoway and generic site files, see Appendix A. We recommend use of the twoway-silk.conf file.
Copy the twoway-silk.conf file to a temporary location, renaming the file silk.conf, and open silk.conf in a text editor. If you are using the twoway-silk.conf file, you will see the following near the beginning of the file:
Each line of form
defines a sensor, where
As distributed, the twoway-silk.conf is configured with 15 sensors having names S0, S1, through S14. (If you have 15 or fewer sensors and these names are satisfactory, you may save the silk.conf file to the root of your data repository, typically /data, and skip ahead to Section 3.2.)
You may add, remove, or rename the sensors. Often the sensor names reflect the location of a router or the ISP the router connects to. There are some important things to keep in mind when modifying the list of sensors:
Once you have edited the sensor definitions, you must update the sensors command in the same file (line 10) to contain the list of sensor names.
For example, if you had three routers Alpha, Bravo, and Charlie you would edit the site configuration file to read:
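A sketch of the relevant portion of silk.conf for this example is shown below. The sensor numbers 0 through 2, the class name all, and the indentation are assumptions based on the usual layout of the distributed file; keep whatever identifiers and class names your copy of twoway-silk.conf already uses, and leave the rest of the file unchanged.

```
sensor 0 Alpha
sensor 1 Bravo
sensor 2 Charlie

class all
    sensors Alpha Bravo Charlie
end class
```

Each name on the sensors line must match a name defined by one of the sensor lines above it.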