SiLK is a suite of network traffic collection and analysis tools developed and maintained by the CERT Network Situational Awareness Team (CERT NetSA) at Carnegie Mellon University to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets.
SiLK 2.x has IPv6 support in the tools listed below. The next major release of SiLK will have IPv6 support nearly everywhere. To use IPv6, SiLK must be configured for IPv6 by specifying the --enable-ipv6 switch to the configure script when you are building SiLK. See the Installation Handbook for details.
| IPv6-capable tools | | |
|---|---|---|
| flowcap | rwflowappend | rwsilk2ipfix |
| rwappend | rwflowpack | rwsort |
| rwcat | rwgroup | rwstats |
| rwcount | rwipfix2silk | rwtuc |
| rwcut | rwnetmask | rwuniq |
| rwfilter | | |
SiLK should run on most UNIX-like operating systems. It is most heavily tested on Linux, Solaris, and Mac OS X.
SiLK is released under two licenses:
The applications that make up the packing system (flowcap, rwflowpack, rwflowappend, rwsender, and rwreceiver) write error messages to log files. The location of these log files is set when the daemon is started, with the default location being /usr/local/var/silk.
All other applications write error messages to the standard error (stderr).
Your primary support person should be the person or group that
installs and maintains SiLK at your site. You may also send
email to
.
You may email a detailed bug report to
. Your bug
report should include:
You can help us help you by writing an effective bug report.
We welcome bug fixes and patches. You may send them to
.
The BibTeX entry format would be:
@MISC{SiLK,
author = "{CERT/NetSA at Carnegie Mellon University}",
title = "{SiLK (System for Internet-Level Knowledge)}",
howpublished = "[Online]. Available:
\url{http://tools.netsa.cert.org/silk}.",
note = "[Accessed: July 13, 2009]"}
Update the "Accessed" date to the day you accessed the SiLK
website, and then you can cite the software in a LaTeX document
using \cite{SiLK}.
The final output should look like this:
CERT/NetSA at Carnegie Mellon University. SiLK (System for Internet-Level Knowledge). [Online]. Available: http://tools.netsa.cert.org/silk. [Accessed: July 13, 2009].
(Taken from Chapter 2 of the SiLK Analysts' Handbook.) NetFlow is a traffic-summarization format that was first implemented by Cisco Systems, primarily for billing purposes. Network flow data (or network flow) is a generalization of NetFlow.
Network flow collection differs from direct packet capture, such as tcpdump, in that it builds a summary of communications between sources and destinations on a network. This summary covers all traffic matching seven particular keys that are relevant for addressing: the source and destination IP addresses, the source and destination ports, the protocol type, the type of service, and the interface on the router. We use five of these attributes to constitute the flow label in SiLK: the source and destination addresses, the source and destination ports, and the protocol. These attributes (sometimes called the 5-tuple), together with the start time of each network flow, distinguish network flows from each other.
A network flow often covers multiple packets, which are grouped together under a common flow label. A flow record thus provides the label and statistics on the packets that the network flow covers, including the number of packets covered by the flow, the total number of bytes, and the duration and timing of those packets. Because network flow is a summary of traffic, it does not contain packet payload data.
SiLK accepts flows in the NetFlow v5 format from a router. These flows are sometimes called Protocol Data Units (PDU). You can also find software that will generate NetFlow v5 records from various types of input.
When compiled with libfixbuf support, SiLK can accept NetFlow v9 and flows in the IPFIX (Internet Protocol Flow Information eXport) format. You can use the yaf flow meter to generate IPFIX flows from libpcap (tcpdump) data or by live capture.
The NetFlow v5 format is defined in the following tables, copied from Cisco (October 2009). A NetFlow v5 packet has a 24-byte header followed by up to thirty 48-byte records, so the maximum NetFlow v5 packet is 1464 bytes (24 + 30 × 48). The record table also lists the corresponding SiLK field name, where applicable; note, however, that SiLK packs the fields differently than NetFlow does.
| Bytes | Contents | Description |
|---|---|---|
| 0-1 | version | NetFlow export format version number |
| 2-3 | count | Number of flows exported in this packet (1-30) |
| 4-7 | SysUptime | Current time in milliseconds since the export device booted |
| 8-11 | unix_secs | Current count of seconds since 0000 UTC 1970 |
| 12-15 | unix_nsecs | Residual nanoseconds since 0000 UTC 1970 |
| 16-19 | flow_sequence | Sequence counter of total flows seen |
| 20 | engine_type | Type of flow-switching engine |
| 21 | engine_id | Slot number of the flow-switching engine |
| 22-23 | sampling_interval | First two bits hold the sampling mode; remaining 14 bits hold value of sampling interval |
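The header table maps directly onto a fixed-layout binary structure. The following is a minimal Python sketch (not part of SiLK) that decodes the 24-byte header with the standard `struct` module, assuming the packet bytes are in network byte order as exported; the function name `parse_v5_header` is hypothetical.

```python
import struct

# NetFlow v5 header, big-endian: version, count, SysUptime, unix_secs,
# unix_nsecs, flow_sequence, engine_type, engine_id, sampling_interval.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD_SIZE = 48
assert V5_HEADER.size == 24


def parse_v5_header(packet: bytes) -> dict:
    """Decode the 24-byte header at the start of a NetFlow v5 packet."""
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(packet, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 packet (version={version})")
    if not 1 <= count <= 30:
        raise ValueError(f"invalid record count {count}")
    return {
        "count": count,
        "sys_uptime_ms": sys_uptime,
        "unix_secs": unix_secs,
        "unix_nsecs": unix_nsecs,
        "flow_sequence": flow_sequence,
        "engine_type": engine_type,
        "engine_id": engine_id,
        # First two bits hold the sampling mode; the other 14 the interval.
        "sampling_mode": sampling >> 14,
        "sampling_interval": sampling & 0x3FFF,
        # Expected size of the whole packet: header plus `count` records.
        "packet_length": V5_HEADER.size + V5_RECORD_SIZE * count,
    }


if __name__ == "__main__":
    hdr = V5_HEADER.pack(5, 30, 60000, 1247443200, 0, 42, 0, 0, (1 << 14) | 100)
    print(parse_v5_header(hdr))
```

Checking `packet_length` against the number of bytes actually received is a cheap way to reject truncated datagrams.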
| Bytes | Contents | Description | SiLK Field |
|---|---|---|---|
| 0-3 | srcaddr | Source IP address | sIP |
| 4-7 | dstaddr | Destination IP address | dIP |
| 8-11 | nexthop | IP address of next hop router | nhIP |
| 12-13 | input | SNMP index of input interface | in |
| 14-15 | output | SNMP index of output interface | out |
| 16-19 | dPkts | Packets in the flow | packets |
| 20-23 | dOctets | Total number of Layer 3 bytes in the packets of the flow | bytes |
| 24-27 | First | SysUptime at start of flow | sTime |
| 28-31 | Last | SysUptime at the time the last packet of the flow was received | eTime |
| 32-33 | srcport | TCP/UDP source port number or equivalent | sPort |
| 34-35 | dstport | TCP/UDP destination port number or equivalent | dPort |
| 36 | pad1 | Unused (zero) bytes | - |
| 37 | tcp_flags | Cumulative OR of TCP flags | flags |
| 38 | prot | IP protocol type (for example, TCP = 6; UDP = 17) | protocol |
| 39 | tos | IP type of service (ToS) | n/a |
| 40-41 | src_as | Autonomous system number of the source, either origin or peer | n/a |
| 42-43 | dst_as | Autonomous system number of the destination, either origin or peer | n/a |
| 44 | src_mask | Source address prefix mask bits | n/a |
| 45 | dst_mask | Destination address prefix mask bits | n/a |
| 46-47 | pad2 | Unused (zero) bytes | - |
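The 48-byte record can be decoded the same way. The sketch below is illustrative only (the function name is hypothetical, and SiLK's own packer is written in C). It also shows one common way to turn the router-relative `First` and `Last` SysUptime values into wall-clock milliseconds, using the `SysUptime`, `unix_secs`, and `unix_nsecs` values from the packet header. It assumes the uptime counter has not wrapped between the flow and the export.

```python
import ipaddress
import struct

# 48-byte NetFlow v5 record, big-endian; "x" skips pad1 and "2x" skips pad2.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBB2x")
assert V5_RECORD.size == 48


def parse_v5_record(data: bytes, offset: int, sys_uptime_ms: int,
                    unix_secs: int, unix_nsecs: int) -> dict:
    """Decode one record and map it onto SiLK-style field names."""
    (src, dst, nexthop, in_if, out_if, pkts, octets, first, last,
     sport, dport, tcp_flags, prot, tos, src_as, dst_as,
     src_mask, dst_mask) = V5_RECORD.unpack_from(data, offset)
    # tos, the AS numbers, and the masks have no SiLK field ("n/a" above).
    export_ms = unix_secs * 1000 + unix_nsecs // 1_000_000
    # First/Last are SysUptime readings; offset them from the export time.
    # Assumption: SysUptime did not wrap between First and the export.
    start_ms = export_ms - (sys_uptime_ms - first)
    end_ms = export_ms - (sys_uptime_ms - last)
    return {
        "sIP": str(ipaddress.IPv4Address(src)),
        "dIP": str(ipaddress.IPv4Address(dst)),
        "nhIP": str(ipaddress.IPv4Address(nexthop)),
        "in": in_if,
        "out": out_if,
        "packets": pkts,
        "bytes": octets,
        "sTime_ms": start_ms,
        "duration_ms": end_ms - start_ms,
        "sPort": sport,
        "dPort": dport,
        "flags": tcp_flags,
        "protocol": prot,
    }
```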
IPFIX is the Internet Protocol Flow Information eXport format. Based on Cisco's NetFlow v9 format, IPFIX is the draft IETF standard for representing flow data. The rwipfix2silk and rwsilk2ipfix programs in SiLK (available when SiLK has been configured with libfixbuf support) convert between the SiLK Flow format and the IPFIX format.
For input, the IPFIX information elements supported by SiLK are listed in the following table. (The SiLK tools that read IPFIX are flowcap, rwflowpack, and rwipfix2silk.) Elements marked with "(P)" are defined in CERT's Private Enterprise space, PEN 6871. The third column denotes whether the element is reversible. Internally, SiLK stores flow duration instead of end time.
| IPFIX Element (ID) | IE Length (octets) | Rev | SiLK Field |
|---|---|---|---|
| octetDeltaCount (1), octetTotalCount (85) | 8, 8 | R, R | bytes |
| packetDeltaCount (2), packetTotalCount (86) | 8, 8 | R, R | packets |
| protocolIdentifier (4) | 1 | | protocol |
| tcpControlBits (6) | 1 | R | flags |
| sourceTransportPort (7) | 2 | | sPort |
| sourceIPv4Address (8), sourceIPv6Address (27) | 4, 16 | | sIP |
| ingressInterface (10), vlanId (58) | 4, 2 | R | in |
| destinationTransportPort (11) | 2 | | dPort |
| destinationIPv4Address (12), destinationIPv6Address (30) | 4, 16 | | dIP |
| egressInterface (14), postVlanId (59) | 4, 2 | R | out |
| ipNextHopIPv4Address (15), ipNextHopIPv6Address (62) | 4, 16 | | nhIP |
| flowEndSysUpTime (21), flowEndSeconds (151), flowEndMilliseconds (153), flowEndMicroseconds (155), flowEndDeltaMicroseconds (159), flowDurationMilliseconds (161), flowDurationMicroseconds (162) | 4, 4, 8, 8, 4, 4, 4 | | duration |
| flowStartSysUpTime (22), flowStartSeconds (150), flowStartMilliseconds (152), flowStartMicroseconds (154), flowStartDeltaMicroseconds (158), systemInitTimeMilliseconds (160), reverseFlowDeltaMilliseconds (P, 21) | 4, 4, 8, 8, 4, 8, 4 | | sTime |
| flowEndReason (136), silkTCPState (P, 32) | 1, 1 | | attributes |
| initialTCPFlags (P, 14) | 1 | R | initialFlags |
| unionTCPFlags (P, 15) | 1 | R | sessionFlags |
| silkFlowType (P, 30) | 1 | | class & type |
| silkFlowSensor (P, 31) | 2 | | sensor |
| silkAppLabel (P, 33) | 2 | | application |
On output, rwsilk2ipfix writes the IPFIX information elements specified in the following table when producing IPFIX from SiLK flow records. The output includes both IPv4 and IPv6 addresses, but only one set of IP addresses will contain valid values; the other set will contain only 0s. Elements marked "(P)" are defined in CERT's Private Enterprise space, PEN 6871.
| SiLK Field | IPFIX Element (ID) | IE Length (Octets) |
|---|---|---|
| sTime | flowStartMilliseconds (152) | 8 |
| sTime + duration | flowEndMilliseconds (153) | 8 |
| sIP | sourceIPv6Address (27) | 16 |
| dIP | destinationIPv6Address (30) | 16 |
| sIP | sourceIPv4Address (8) | 4 |
| dIP | destinationIPv4Address (12) | 4 |
| sPort | sourceTransportPort (7) | 2 |
| dPort | destinationTransportPort (11) | 2 |
| nhIP | ipNextHopIPv4Address (15) | 4 |
| nhIP | ipNextHopIPv6Address (62) | 16 |
| in | ingressInterface (10) | 4 |
| out | egressInterface (14) | 4 |
| packets | packetDeltaCount (2) | 8 |
| bytes | octetDeltaCount (1) | 8 |
| protocol | protocolIdentifier (4) | 1 |
| class & type | silkFlowType (P, 30) | 1 |
| sensor | silkFlowSensor (P, 31) | 2 |
| flags | tcpControlBits (6) | 1 |
| initialFlags | initialTCPFlags (P, 14) | 1 |
| sessionFlags | unionTCPFlags (P, 15) | 1 |
| attributes | silkTCPState (P, 32) | 1 |
| application | silkAppLabel (P, 33) | 2 |
| - | paddingOctets (210) | 6 |
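Because every element in this template has a fixed length, each data record rwsilk2ipfix writes is simply the sum of the IE Length column. The short sketch below does that arithmetic. It is not part of SiLK, and the count excludes the IPFIX message and set headers. It also shows how much of each record is taken up by the unused address family.

```python
# Elements written by rwsilk2ipfix, in table order: (IPFIX element, octets).
RWSILK2IPFIX_FIELDS = [
    ("flowStartMilliseconds", 8), ("flowEndMilliseconds", 8),
    ("sourceIPv6Address", 16), ("destinationIPv6Address", 16),
    ("sourceIPv4Address", 4), ("destinationIPv4Address", 4),
    ("sourceTransportPort", 2), ("destinationTransportPort", 2),
    ("ipNextHopIPv4Address", 4), ("ipNextHopIPv6Address", 16),
    ("ingressInterface", 4), ("egressInterface", 4),
    ("packetDeltaCount", 8), ("octetDeltaCount", 8),
    ("protocolIdentifier", 1), ("silkFlowType", 1), ("silkFlowSensor", 2),
    ("tcpControlBits", 1), ("initialTCPFlags", 1), ("unionTCPFlags", 1),
    ("silkTCPState", 1), ("silkAppLabel", 2), ("paddingOctets", 6),
]

# With only fixed-length elements, a data record is the sum of the lengths.
record_octets = sum(length for _, length in RWSILK2IPFIX_FIELDS)
ipv6_octets = sum(length for name, length in RWSILK2IPFIX_FIELDS
                  if name.endswith("IPv6Address"))
print(f"{len(RWSILK2IPFIX_FIELDS)} elements, {record_octets} octets per record, "
      f"{ipv6_octets} of them IPv6 address fields")
```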
SiLK's origins are in processing NetFlow v5 data, which is unidirectional. Changing SiLK to support bidirectional flows would be a major change to the software. Even if SiLK supported bidirectional flows, you would still face the task of mating flows, since a site with many access points to the Internet will often display asymmetric routing (where each half of a conversation passes through different border routers).
No, SiLK does not support bidirectional flows. You will need to mate the unidirectional flows, as described in the FAQ entry How do I mate unidirectional flows to get both sides of the conversation?.
Yes, you can. Please see the answer to How do I convert packet data (pcap) to flows?.
There are a variety of reasons that rwflowpack (or flowcap) may fail to receive NetFlow v9 flow records, and since NetFlow v9 uses UDP (which is a connectionless protocol), problems receiving NetFlow v9 can be hard to diagnose. Here are potential issues and solutions, from the minor to the substantial:
In the sensor.conf file for rwflowpack, you may have configured the probe as netflow, which is an alias for netflow-v5. You must use netflow-v9 for rwflowpack to accept NetFlow v9 flow records.
Ensure that the listen-on-port and listen-as-host values in the sensor.conf file for rwflowpack match the ip flow-export values you used when you configured the router.
In the ip flow-export command on the router, use the IPv4 address of the host where rwflowpack is running.
Check the template data timeout setting of the router. A NetFlow v9 collector cannot decode data records until it has received the template that describes them, and the router resends its templates at the interval given by this setting.
The router may not be sending the packetTotalCount element, which means rwflowpack treats the record as having zero packets. (Cisco considers the missing packetTotalCount element a low-priority bug.)
Due to the way that flow records are stored in SiLK, rwflowpack ignores records that have a packet count of zero. In SiLK-2.4.5, rwflowpack would print a warning about the records it was dropping ("Record's packet count is zero while writing to file..."), while flowcap would store the records and ship them to rwflowpack (where they would be ignored). In SiLK-2.4.7, records with a zero-packet count are dropped at a different location, and there is no indication that rwflowpack or flowcap has ignored these records. We are currently investigating the best way to handle these flow records.
The options templates do not affect the collection of flow data, and they are ignored by SiLK. That message is generated by libfixbuf when it ignores the options template.
The likely cause for these messages is that the flow generator is putting the number of FlowSets into the NetFlow v9 message header. According to RFC-3954, the message header is supposed to contain the number of Flow Records, not FlowSets.
Other than being a nuisance in the log file, the messages are harmless. The NetFlow v9 processing library, libfixbuf, processes the entire packet, so it is reading all the flow records, despite the header having an incorrect count.
The messages are generated by libfixbuf, and currently there is no way to suppress the messages (other than editing the libfixbuf sources yourself to comment out the message).
In our experience, the flow interfaces (or SNMP interfaces) and Next Hop IP do not provide much useful information for security analysis, and by default SiLK does not include them in its packed data files. However, if you wish to store these values or use them for debugging your packing configuration, you can instruct rwflowpack to store the SNMP interfaces and Next Hop IP by giving it the --pack-interfaces switch.
The SiLK Flow format is capable of representing 65534 unique sensors.
Yes, a binary file produced by a SiLK application will store its format, version, byte order, and compression method near the beginning of the file (in the file's header). (You can use the rwfileinfo tool to get a description of the contents of the file's header.) Any release of SiLK that understands that file version should be able to read the file. However, note that if the file's data is compressed, the SiLK tools on the second machine must have been compiled with support for that compression library. The SiLK tools will print an error and exit if they are unable to read a file because the tool does not understand the file's format, version, or compression method.
SiLK does not use any hard-coded ports. All SiLK tools that do network communication (flowcap, rwflowpack, rwsender, and rwreceiver) have some way to specify which ports to use for communication.
When flowcap or rwflowpack collect flows from a router, you will need to open a port for UDP traffic between the router and the collection machine.
When flowcap or rwflowpack collect flows from a yaf sensor running on a different machine, you will need to open a port for TCP (or SCTP) traffic between these two machines.
Finally, when you are using flowcap on remote sensor(s) that feed data to rwflowpack running on a central data repository, you will need to open a port between each sensor and your repository. Configure flowcap or rwsender on the sensor and rwflowpack or rwreceiver on the repository to use that port.
See the tools' manual pages and the Installation Handbook for details on specifying ports.
In the rwflowpack configuration file sensor.conf, a flow collection point is called a probe. In that file, you may have two sensor blocks process data collected by a single probe.
You may want to use the discard-when or
discard-unless keywords to avoid storing duplicate
flow records for each sensor, as shown in the One Probe to
Two Sensors example configuration.
The latest Open Source version of SiLK and selected previous releases are available from http://tools.netsa.cert.org/silk/download.html.
Although we would like to provide you with SiLK-3, you may access SiLK-3 only if you are an employee of the U.S. federal government.
Due to changes in the oversight of the SEI that are outside of our control, major new releases of all NetSA software are required to go through release review by the Office of the Secretary of Defense (OSD) before the software may be given to anyone who is not a federal government employee. New releases of SiLK are not available to state governments or to universities despite what you may have read elsewhere. Unfortunately, many NetSA software packages have been stuck in this process for a long time, and currently there is no estimate as to when the release review will be completed.
Because there are many configuration options for SiLK, we recommend that you build your own RPMs as described in the "Create RPMs" section of the SiLK Installation Handbook.
That said, the CERT Forensics Team has a Linux Tools Repository that includes RPMs of SiLK and other NetSA tools.
A Live CD image is also available, which contains SiLK and additional NetSA tools.
Currently, the PySiLK extension requires Python 2.x, where x is 4 or greater. The Python 3.x series is not yet supported.
This error message occurs because Python is attempting to treat the site directory in the SiLK source tree as a Python module directory. This happens when you are running Python 2.5 or later and the PYTHONPATH environment variable includes the current working directory, which is the case when its value begins or ends with a colon (':') or when any element of the value is a single period ('.').
The solution is either to unset PYTHONPATH before running configure or to remove all references to the current working directory from PYTHONPATH before running configure.
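A quick way to test for the problem forms described above is sketched below. It is a hypothetical helper, not part of SiLK, and it only looks for the forms named in this entry (an empty element, such as one produced by a leading or trailing colon, or an element that is exactly "."). It does not detect an absolute path that happens to equal the current directory.

```python
import os


def pythonpath_includes_cwd(value: str) -> bool:
    """True if a PYTHONPATH value contains an empty or "." element."""
    if not value:
        return False
    return any(element in ("", ".") for element in value.split(":"))


def cleaned_pythonpath(value: str) -> str:
    """Return the value with empty and "." elements removed."""
    return ":".join(e for e in value.split(":") if e not in ("", "."))


if __name__ == "__main__":
    current = os.environ.get("PYTHONPATH", "")
    if pythonpath_includes_cwd(current):
        print("PYTHONPATH refers to the current directory; before configure, try:")
        print(f"  export PYTHONPATH='{cleaned_pythonpath(current)}'")
```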
This is a difficult question to answer, because there are so many variables that will affect the results.
On a beefy machine, rwfilter was invoked using the --any-addr switch to look for a /16 (IPv4-only). rwfilter was told only to print the number of records that matched; it did not produce any other output. Therefore, the times below are only for scanning the input.
rwfilter was invoked with --threads=12 to query a data store of 3260 files that contained 12.886 billion IPv4 records, and rwfilter took 19:18 minutes to run the query. That corresponds to a scan rate of 11.1 million records per second, or 0.927 million records per thread per second.
When the query was run a second time, rwfilter completed in 6:28 minutes, or 2.77 million records per thread per second. This machine has a large disk cache, which is why the second run was so much faster than the first.
For another run, rwfilter was run with a single thread to query 4996 files that contained 3.27 billion IPv4 records, and rwfilter completed the query in 9:10 minutes. That is a scan rate of 5.95 million records per second, which would require approximately 28 minutes to scan 10 billion records.
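The rates quoted above follow from dividing the record count by the elapsed time (and then by the thread count). The small sketch below redoes that arithmetic so the same calculation can be applied to your own timings; `scan_rate` is a hypothetical helper name.

```python
def scan_rate(records, minutes, seconds, threads=1):
    """Return (records per second, records per thread per second)."""
    elapsed = minutes * 60 + seconds
    per_second = records / elapsed
    return per_second, per_second / threads


# 12-thread query over 12.886 billion records: cold, then warm disk cache.
cold, cold_per_thread = scan_rate(12.886e9, 19, 18, threads=12)
warm, warm_per_thread = scan_rate(12.886e9, 6, 28, threads=12)
# Single-threaded query over 3.27 billion records.
single, _ = scan_rate(3.27e9, 9, 10)
minutes_for_10_billion = 10e9 / single / 60

print(f"cold cache: {cold / 1e6:.1f}M rec/s, {cold_per_thread / 1e6:.3f}M per thread")
print(f"warm cache: {warm / 1e6:.1f}M rec/s, {warm_per_thread / 1e6:.2f}M per thread")
print(f"one thread: {single / 1e6:.2f}M rec/s, about {minutes_for_10_billion:.0f} min per 10 billion")
```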
As seen in this simple example, there are many things that can affect performance. Some items that will affect the run time are:
As analysts, it seems we spend a lot of time waiting for rwfilter to pull data from the repository. One way to reduce the wait time is to write efficient queries. Here are some good practices to follow:
$ rwfilter --protocol=6,17 --pass=temp.rwf ...
$ rwfilter --proto=6 --pass=tcp.rwf --fail=udp.rwf temp.rwf
$ rwsetbuild myips.txt myset.set
$ rwfilter ... --dipset=myset.set
SiLK Flows are stored in binary files, where each file corresponds to a unique class-type-sensor-hour tuple. Multiple data repositories may exist on a machine; however, rwfilter is only capable of examining a single data repository per invocation.
A default repository location is compiled into rwfilter. (This default is set by the --enable-data-rootdir=DIR switch to configure and defaults to /data). You may tell rwfilter to use a different repository by setting the SILK_DATA_ROOTDIR environment variable or specifying the --data-rootdir switch to rwfilter.
The structure of the directory tree beneath the root is determined by the path-format entry in the silk.conf file for each data repository. Traditionally, the directory structure has been /DATA_ROOTDIR/class/type/year/month/day/hourly-files
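As an illustration of the traditional layout only, the sketch below builds the directory that would hold one day's hourly files for a given class and type. The function name is hypothetical, and zero-padding the month and day is an assumption. A real repository's layout is whatever its silk.conf path-format entry says.

```python
from datetime import datetime
from pathlib import PurePosixPath


def traditional_hour_dir(data_rootdir, flow_class, flow_type, hour):
    """Directory for one class/type/day in the traditional layout.

    Mirrors /DATA_ROOTDIR/class/type/year/month/day; zero-padded month and
    day are an assumption of this sketch, not read from silk.conf.
    """
    return PurePosixPath(data_rootdir, flow_class, flow_type,
                         f"{hour:%Y}", f"{hour:%m}", f"{hour:%d}")


if __name__ == "__main__":
    print(traditional_hour_dir("/data", "all", "in", datetime(2009, 7, 13, 12)))
```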
A fully expanded, uncompressed SiLK Flow record requires 52 bytes (88 bytes for IPv6 records). These records are written by rwcat --compression=none.
Records in the SiLK data repository require less space, since common attributes (sensor, class, type, hour) are stored once in the file's header. The smallest (uncompressed) record in the data repository is the one representing a web flow, which requires only 22 bytes.
In addition, one can enable data compression in an individual SiLK application (with the --compression-method switch) or in all SiLK applications when SiLK is configured (by specifying the --enable-output-compression switch when you invoke the configure script). Compression with the lzo1x algorithm reduces the overall file size by about 50%. Using zlib gives a better compression ratio, but at the cost of access time.
The rwfileinfo command will tell you the (uncompressed) size of records in a SiLK file.
SiLK uses many different file formats: There are file formats for IPsets, for Bags, for Prefix Maps, and for SiLK Flow records. The files that contain SiLK Flow records come in several different formats as well, where the differences include whether
In addition to various file and record formats, the records in a file may be stored in big endian or little endian byte order. Finally, groups of flow records may be written as a block, where the block is compressed with the zlib or LZO compression libraries.
The recommended way to put one or more files of SiLK Flow
records into a known format is to use the rwcat tool. The rwcat
command to use is:
rwcat --compression=none --byte-order=big [--ipv4-output] FILE1 FILE2 ...
That command will produce an output stream/file having a standard SiLK header followed by 0 or more records in the format given in the following table. The length of the SiLK header is the same as the size of the records in the file.
When SiLK is not compiled with IPv6 support or the --ipv4-output switch is given, each record will be 52 bytes long, and the header is 52 bytes; otherwise each record is 88 bytes and the file's header is 88 bytes.
The other SiLK Flow file formats are only documented in the comments of the source files. See the rw*io.c files in the silk/src/libsilk directory.
| IPv4 Bytes | IPv6 Bytes | Field | Description |
|---|---|---|---|
| 0-7 | 0-7 | sTime | Flow start time as milliseconds since UNIX epoch |
| 8-11 | 8-11 | dur | Duration of flow in milliseconds (allows for a 49 day flow) |
| 12-13 | 12-13 | sPort | Source port |
| 14-15 | 14-15 | dPort | Destination port |
| 16 | 16 | protocol | IP protocol |
| 17 | 17 | class,type | Class & Type (Flowtype) value as set by SiLK packer (integer to name mapping determined by silk.conf) |
| 18-19 | 18-19 | sensor | Sensor ID as set by SiLK packer (integer to name mapping determined by silk.conf) |
| 20 | 20 | flags | Cumulative OR of all TCP flags (NetFlow flags) |
| 21 | 21 | initialFlags | TCP flags in first packet or blank |
| 22 | 22 | sessionFlags | Cumulative OR of TCP flags on all but initial packet or blank |
| 23 | 23 | attributes | Specifies various attributes of the flow record |
| 24-25 | 24-25 | application | Guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). |
| 26-27 | 26-27 | n/a | Unused |
| 28-29 | 28-29 | in | Router incoming SNMP interface |
| 30-31 | 30-31 | out | Router outgoing SNMP interface |
| 32-35 | 32-35 | packets | Count of packets in the flow |
| 36-39 | 36-39 | bytes | Count of bytes on all packets in the flow |
| 40-43 | 40-55 | sIP | Source IP |
| 44-47 | 56-71 | dIP | Destination IP |
| 48-51 | 72-87 | nhIP | Router Next Hop IP |
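Given the table above, the output of `rwcat --compression=none --byte-order=big --ipv4-output` can be read with a fixed-size structure. This is a minimal sketch, not a SiLK API. It relies on the statement that the header is exactly as long as one record (52 bytes here), and it ignores a trailing partial record. The generator name `read_rwcat_ipv4` is hypothetical.

```python
import ipaddress
import struct

# One 52-byte big-endian IPv4 record; "2x" skips the unused bytes 26-27.
RWCAT_V4 = struct.Struct("!QIHHBBHBBBBH2xHHII4s4s4s")
assert RWCAT_V4.size == 52

FIELDS = ("sTime", "dur", "sPort", "dPort", "protocol", "flowtype", "sensor",
          "flags", "initialFlags", "sessionFlags", "attributes", "application",
          "in", "out", "packets", "bytes", "sIP", "dIP", "nhIP")


def read_rwcat_ipv4(data: bytes):
    """Yield one dict per record from rwcat IPv4, big-endian, uncompressed output."""
    size = RWCAT_V4.size
    body = data[size:]  # the SiLK header is as long as one record
    for offset in range(0, len(body) - size + 1, size):
        rec = dict(zip(FIELDS, RWCAT_V4.unpack_from(body, offset)))
        for key in ("sIP", "dIP", "nhIP"):
            rec[key] = str(ipaddress.IPv4Address(rec[key]))
        yield rec


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for rec in read_rwcat_ipv4(f.read()):
            print(rec["sIP"], rec["dIP"], rec["packets"], rec["bytes"])
```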
Every binary file produced by SiLK (including flow files, IPsets, Bags) begins with a header describing the contents of the file. The header information can be displayed using the rwfileinfo utility. The remainder of this entry describes the binary header that has existed since SiLK 1.0. (This FAQ entry does not apply to the output of rwsilk2ipfix, which is an IPFIX stream.)
The header begins with 16 bytes that have well-defined values. (All values that appear in the header are in network byte order; the header is not compressed.)
| Offset | Length | Field | Description |
|---|---|---|---|
| 0 | 4 | Magic Number | A value to identify the file as a SiLK binary file. The SiLK magic number is 0xDEADBEEF. |
| 4 | 1 | File Flags | Bit flags describing the file. Currently one flag exists: The least significant bit will be high if the data section of the file is encoded in network (big endian) byte order, and it will be low if the data is little endian. |
| 5 | 1 | Record Format | The format of the data section of the file; i.e., the type of data that this file contains. This will be one of the fileOutputFormats values defined in the silk_files.h header file. For a file containing IPv4 records produced by rwcat, the value is 0x16 (decimal 22, FT_RWGENERIC). For an IPv6 file, the value is 0x0C (decimal 12, FT_RWIPV6ROUTING). |
| 6 | 1 | File Version | This describes the overall format of the file, and it is always 0x10 (decimal 16) for any file produced by SiLK 1.0 or later. (The version of the records in the file is at byte offset 14.) |
| 7 | 1 | Compression | This value describes how the data section of the file is compressed. |
| 8 | 4 | SiLK Version | The version of SiLK that produced this file. This value is computed by transforming a SiLK version, X.Y.Z, as X*1,000,000 + Y*1,000 + Z. For SiLK 1.2.3, the value is 1,002,003. |
| 12 | 2 | Record Size | Number of bytes required per record in this file. This is 52 (0x0034) for the current version of FT_RWGENERIC records, and 88 (0x0058) for the current version of FT_RWIPV6ROUTING records. For some files, this value is unused and it is set to 1. |
| 14 | 2 | Record Version | The version of the record format used in this file. Currently this is 5 for FT_RWGENERIC records and 1 for FT_RWIPV6ROUTING records. |
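The 16 well-defined bytes can be decoded directly from the offsets above. The sketch below is illustrative Python, not SiLK's own C code (use rwfileinfo for authoritative output), and the function name is hypothetical. It checks the magic number, reads the byte-order flag, and reverses the X*1,000,000 + Y*1,000 + Z version encoding.

```python
import struct

SILK_MAGIC = 0xDEADBEEF
# magic, file flags, record format, file version, compression,
# SiLK version, record size, record version (always network byte order).
FIXED_HEADER = struct.Struct("!IBBBBIHH")
assert FIXED_HEADER.size == 16


def parse_fixed_header(data: bytes) -> dict:
    """Decode the first 16 bytes of a SiLK binary file."""
    (magic, flags, rec_format, file_version, compression,
     silk_version, rec_size, rec_version) = FIXED_HEADER.unpack_from(data, 0)
    if magic != SILK_MAGIC:
        raise ValueError("not a SiLK binary file")
    return {
        "big_endian_data": bool(flags & 0x01),
        "record_format": rec_format,
        "file_version": file_version,
        "compression": compression,
        # Undo X*1,000,000 + Y*1,000 + Z.
        "silk_version": "{}.{}.{}".format(silk_version // 1_000_000,
                                          silk_version // 1_000 % 1_000,
                                          silk_version % 1_000),
        "record_size": rec_size,
        "record_version": rec_version,
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(parse_fixed_header(f.read(FIXED_HEADER.size)))
```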
Following those 16 bytes are one or more variable-length header entries. Each header entry begins with two 4-byte values: the header entry's identifier and the byte length of the header entry (this length includes the two 4-byte values). The content of the header entry follows those 8 bytes. Currently there is no restriction that a header entry begin at a particular offset. The following header entries exist:
| ID | Length | Description |
|---|---|---|
| 0 | variable | This is the final header entry, and it marks the end of the header. Every SiLK binary file contains this header entry immediately before the data section of the file. The length of this header entry will include padding so that the size of the complete file header is an integer multiple of the record size. Any padding bytes will be set to 0x00. |
| 1 | 24 | Used by the hourly files located in the data store (/data). This entry contains the starting hour, flowtype, and sensor for the records in that file. |
| 2 | variable | Contains an invocation line, like those captured by rwfilter. This header entry may appear multiple times. |
| 3 | variable | Contains an annotation that was created using the --notes-add switch on several tools. This header entry may appear multiple times. |
| 4 | variable | Used by flowcap to store the name of the probe where flow records were collected. |
| 5 | variable | Used by prefix map files to record the map-name. |
| 6 | variable | UNUSED. Reserved for use by Bag files. |
| 7 | variable | UNUSED. Reserved for use by IPset files. |
The minimum SiLK header is 24 bytes: 16 bytes of well-defined values followed by the end-of-header header entry containing no padding.
rwcat removes all header entries from a file and leaves only the end-of-header header entry, which is padded so that the entire SiLK header is either 52 bytes for IPv4 (FT_RWGENERIC) files or 88 bytes for IPv6 (FT_RWIPV6ROUTING) files.
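Since every entry carries its own identifier and total length, the header can be walked without knowing each entry's contents. The sketch below does this and returns the offset at which the data section begins. It is a hypothetical helper, not a SiLK API. It returns each payload as raw bytes and does not interpret any entry's contents.

```python
import struct

ENTRY_PREFIX = struct.Struct("!II")  # entry identifier, entry length
FIXED_HEADER_LEN = 16


def walk_header_entries(data: bytes):
    """Return ([(entry_id, payload), ...], offset_of_data_section)."""
    entries = []
    offset = FIXED_HEADER_LEN
    while True:
        entry_id, length = ENTRY_PREFIX.unpack_from(data, offset)
        # The length includes the 8-byte prefix, so it can never be smaller.
        if length < ENTRY_PREFIX.size:
            raise ValueError(f"corrupt header entry at offset {offset}")
        payload = data[offset + ENTRY_PREFIX.size:offset + length]
        entries.append((entry_id, payload))
        offset += length
        if entry_id == 0:  # end-of-header entry (including any padding)
            return entries, offset
```

For the minimum 24-byte header (an end-of-header entry with no padding), the function returns a single entry with ID 0 and a data offset of 24.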
The rwsender and rwreceiver daemons are indifferent to the types of files they transfer. However, you must ensure that files are added to rwsender's incoming-directory in accordance with SiLK's directory polling logic.
The SiLK daemons that use directory polling (including rwsender) treat any file whose name does not begin with a dot and whose size is non-zero as a potential candidate for processing. To become an actual candidate for processing, the file must have the same size as on the previous directory poll. Once the file becomes an actual candidate for processing, the daemon will not notice if the file's size and/or timestamp changes.
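The polling rules can be modeled in a few lines. The following Python sketch of one polling pass is only a model of the rules stated here (the function name is hypothetical, and the real daemons are written in C). Names beginning with a dot and zero-length files are skipped, and a file is reported as ready only when its size matches the size seen on the previous pass.

```python
import os
import tempfile


def poll_directory(directory, previous_sizes):
    """Run one polling pass; return (ready_names, sizes_seen_this_pass)."""
    sizes = {}
    ready = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Names beginning with a dot (working files) are never candidates.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        # Zero-length files (placeholders) are not candidates either.
        if size == 0:
            continue
        sizes[name] = size
        # A potential candidate becomes an actual candidate once its size
        # is unchanged since the previous poll.
        if previous_sizes.get(name) == size:
            ready.append(name)
    return ready, sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as incoming:
        with open(os.path.join(incoming, "flows.rw"), "wb") as f:
            f.write(b"\x00" * 64)
        ready, seen = poll_directory(incoming, {})    # first pass: not yet stable
        ready, seen = poll_directory(incoming, seen)  # second pass: size unchanged
        print(ready)
```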
To work with directory polling, SiLK daemons that write files normally create a zero length placeholder file, create a working file whose name begins with a dot followed by the name of the placeholder file, write the data into the working file, and replace the placeholder file with the working file once writing is complete.
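A process that is not part of SiLK can follow the same placeholder, dot-file, and rename sequence. The following Python sketch assumes the working file and the placeholder are on the same filesystem, so that the final rename atomically replaces the placeholder. The function name `write_for_polling` is hypothetical.

```python
import os
import tempfile


def write_for_polling(directory: str, name: str, data: bytes) -> str:
    """Write DATA to DIRECTORY/NAME so directory pollers never see a partial file."""
    final_path = os.path.join(directory, name)
    work_path = os.path.join(directory, "." + name)
    # 1. Zero-length placeholder: ignored by pollers because its size is 0.
    #    Exclusive create ("x") avoids clobbering an existing file.
    with open(final_path, "xb"):
        pass
    # 2. Working file whose name begins with a dot: also ignored by pollers.
    with open(work_path, "wb") as work:
        work.write(data)
        work.flush()
        os.fsync(work.fileno())
    # 3. Replace the placeholder with the completed working file.
    os.replace(work_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as incoming:
        print(write_for_polling(incoming, "flows.rw", b"example"))
```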
Any process that follows a similar procedure will interoperate correctly with SiLK. Any that does not risks having its files removed out from under it.
The yaf daemon does not follow this procedure; instead, it uses .lock files. When yaf is invoked with the --lock switch, it creates a flows.yaf.lock file while it is writing data to flows.yaf, and yaf removes flows.yaf.lock once it closes flows.yaf.
For yaf and rwsender to interoperate correctly, an intermediate process is required. The suggested process is the filedaemon program that comes as part of the libairframe library that is bundled with yaf. filedaemon supports the .lock extension, and it can move the completed files from yaf's output directory to rwsender's incoming directory. The important parts of the tool chain resemble the following:
Tell yaf to use the .lock suffix, and rotate files every 900 seconds:
yaf --out /var/yaf/output/foo --lock --rotate 900 ...
Have filedaemon watch that directory, respect *.lock files, move the files it processes to /var/rwsender/incoming, and run the "no-op" command /bin/true on those files:
filedaemon --in '/var/yaf/output/foo*yaf' --lock \
--next /var/rwsender/incoming ... \
-- /bin/true
Tell rwsender to watch filedaemon's next directory:
rwsender --incoming-directory /var/rwsender/incoming ...
There are many factors that determine the amount of space required, including (1) the size of the link being monitored, (2) the link's utilization, (3) the type of traffic being collected and stored (NetFlow-v5, IPFIX-IPv4, or IPFIX-IPv6), (4) the amount of legacy data to store, and (5) the number of flow records generated from the data. The SiLK Provisioning Spreadsheet allows one to see how modifying the first four factors affects the disk space required. (The spreadsheet specifies a value for the fifth factor based on our experience.)
The factors that affect the bandwidth rwsender requires to transfer flows collected by a flowcap daemon running near a sensor to the storage center are nearly identical to those that determine the amount of disk space required (see the previous entry). The SiLK Provisioning Spreadsheet includes bandwidth calculations.
The latency of the packing system (the time from a flow being collected to it being available for analysis in the SiLK data repository) depends on how the packing system has been configured and additional factors. It can be a few seconds for a simple configuration or a few minutes for a complex one.
Before the SiLK packing system sees the flow record, the act of generating a flow record itself involves latency. For a long-lived connection (e.g., ssh), the flow generator (a router or yaf) may generate the flow record 30 minutes after the first packets for that session were seen. The active timeout is defined as the amount of time a flow generator waits before creating a flow record for an active connection. As described in the SiLK Installation Handbook, there are numerous ways the SiLK packing system can be configured, and the latency will depend on the number of steps in your particular collection system.
For each type of configuration, we give a summary, a table itemizing the contributions to the total, and an explanation of those numbers.
Latency: typically small, but up to 120 seconds
| Description | Min | Max |
|---|---|---|
| rwflowpack buffering | 0 | 120 |
| TOTAL | 0 | 120 |
For a configuration where rwflowpack collects the flow records itself and packs them directly into the data repository, the latency is typically small, but with the default settings it can be as large as two minutes. As rwflowpack creates SiLK records, it buffers them in memory until it has a 64 KB block of them, and then writes that block to disk. (Buffering improves performance because there is less interaction with the disk. When compression is enabled, the 64 KB blocks can also provide better overall compression.)
If the flow collector is monitoring a busy link, flows arrive quickly and the 64 KB buffers will fill quickly and be written to disk, making the latency small. However, on a less-busy link, the buffers will be slower to fill. In addition, depending on the flow collector's active timeout setting, the flow collector may generate flow records that have a start time in the previous hour. These flows become less frequent as time passes, slowing the rate at which the 64 KB buffers associated with the previous hour's files are filled.
To make certain that flows reach the disk in a timely fashion and to reduce the number of flows that would potentially be lost due to a sudden shutdown of rwflowpack, rwflowpack flushes all its open files every so often. By default, this occurs every 120 seconds. The default can be changed by specifying the --flush-timeout switch on the rwflowpack command line.
If a flow arrives just before rwflowpack flushes the file, it will appear almost instantly, so the minimum latency is 0 seconds. A flow arriving just after the files are flushed could be delayed by 120 seconds.
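The 0 to 120 second range follows from the flush schedule, and a small model makes that explicit. This is a simplified sketch, not rwflowpack's code: it assumes flushes happen on a fixed schedule at multiples of the flush timeout, and that a record reaches disk either at the next flush or when its 64 KB block fills (the hypothetical `block_full_at` parameter), whichever comes first.

```python
import math
from typing import Optional


def flush_latency(arrival: float, flush_timeout: float = 120.0,
                  block_full_at: Optional[float] = None) -> float:
    """Seconds a record waits in memory before reaching disk (simplified model).

    Assumes flushes occur at t = 0, flush_timeout, 2 * flush_timeout, ...
    block_full_at (hypothetical) is when the record's 64 KB block fills,
    or None if the block does not fill before the next flush.
    """
    # Next scheduled flush at or after the arrival time.
    next_flush = math.ceil(arrival / flush_timeout) * flush_timeout
    write_time = next_flush
    if block_full_at is not None:
        # A full block is written right away, possibly before the flush.
        write_time = min(next_flush, max(block_full_at, arrival))
    return write_time - arrival


# Arriving just after a flush: wait almost the whole interval.
print(round(flush_latency(0.5), 1))                 # 119.5
# Arriving just before a flush: almost no wait.
print(round(flush_latency(119.5), 1))               # 0.5
# Busy link: the block fills first, so the wait is short.
print(flush_latency(10.0, block_full_at=12.0))      # 2.0
```

Lowering the flush timeout in this model shrinks the worst case proportionally, which mirrors the effect of a smaller --flush-timeout value.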
Latency: 30 seconds to 255 seconds or more
| Description | Min | Max |
|---|---|---|
| flowcap accumulation | 0 | 60 |
| rwsender directory polling | 15 | 30 |
| waiting for other files to be sent | 0 | d1 |
| rwsender transmission to rwreceiver | 0 | 15 |
| rwflowpack directory polling | 15 | 30 |
| waiting for other files to be packed | 0 | d2 |
| rwflowpack buffering | 0 | 120 |
| TOTAL | 30 | 255 + d1 + d2 |
When flowcap is added to the collection configuration, the latency will be larger. In this configuration, flowcap is used to collect the flows from the flow generator, an rwsender/rwreceiver pair moves the flows from flowcap to rwflowpack, and rwflowpack packs the flows and writes them to the data repository.
Once the flow collector generates the flow record, it should arrive at flowcap in negligible time. flowcap accumulates the flows into files for transport to a packing location. The files are released to rwsender once they reach a particular size or after a certain amount of time, whichever occurs first. By default, the timeout is 60 seconds; it can be specified with the --timeout switch on the flowcap command line. Decreasing the timeout has two effects:
Once flowcap releases the file of accumulated flows, it gets moved to a directory being monitored by an rwsender process. rwsender checks this directory every 15 seconds (by default) to see what files are present. (Specify the --polling-interval switch to change the setting from the default.) If a file's size has not changed since the previous check, rwsender will accept the file for sending to an rwreceiver process. In the best case, a file will be accepted in just over 15 seconds; in the worst case, it can take up to 30 seconds before the file is accepted. In addition, if the directory has a large number of files (a few thousand), the time to scan the directory and determine the size of each file will add measurable overhead to each rwsender directory poll.
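The "just over 15 seconds" to "30 seconds" range comes from needing two polls that see the same file size. The following is a minimal sketch of that rule, assuming polls at exact multiples of the polling interval and a file that is complete when it appears (flowcap releases finished files); it is not rwsender's implementation.

```python
import math


def polling_delay(ready_at: float, interval: float = 15.0) -> float:
    """Seconds from a file appearing in the directory to it being accepted.

    Simplified model: the first poll after the file appears records its
    size; the next poll accepts it because the size has not changed.
    """
    first_poll = (math.floor(ready_at / interval) + 1) * interval
    second_poll = first_poll + interval
    return second_poll - ready_at


print(round(polling_delay(14.9), 1))  # 15.1  (appeared just before a poll)
print(round(polling_delay(15.1), 1))  # 29.9  (appeared just after a poll)
```

The same model applies to any later stage that polls a directory, such as rwflowpack picking up files from rwreceiver.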
Files in the rwsender queue may not be sent immediately if other files are backlogged. This delay is hard to quantify, so we call it d1. Under most circumstances, we expect it to be a few seconds at most.
Transmission of a file from rwsender to rwreceiver can be quick if network lag is low, or slow if it is high. This time is hard to determine without empirical data, and it varies as the load on the network varies. We have no hard data, but in our experience on our networks, most files from flowcap travel from rwsender to rwreceiver in less than 15 seconds.
The rwsender process may be configured to send its data to multiple rwreceivers. Although these transfers can happen simultaneously, they may add latency:
The administrator can also configure rwsender to prioritize files by filename. For example, if certain sensors contain more time-sensitive (important) data, they can be set to a higher priority. This will cause these files to "jump the queue" over other files, and it will increase the delay of the lower priority files.
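The "jump the queue" effect can be modeled with a priority queue. This is a sketch, not rwsender's implementation: the filename patterns, the priority values, the default of 50, and the convention that higher numbers are sent first are all assumptions chosen for illustration.

```python
import heapq
import itertools
import re
from typing import List, Tuple

# Hypothetical rules: (pattern matched against the filename, priority).
PRIORITY_RULES = [(re.compile(r"^dmz-"), 90), (re.compile(r"^lab-"), 10)]
DEFAULT_PRIORITY = 50  # hypothetical default


def priority_of(filename: str) -> int:
    for pattern, prio in PRIORITY_RULES:
        if pattern.search(filename):
            return prio
    return DEFAULT_PRIORITY


class SendQueue:
    """Higher priority leaves first; equal priorities leave in arrival order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()

    def add(self, filename: str) -> None:
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap,
                       (-priority_of(filename), next(self._counter), filename))

    def next_file(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = SendQueue()
for name in ["lab-001", "core-001", "dmz-001", "core-002"]:
    queue.add(name)
print([queue.next_file() for _ in range(4)])
# ['dmz-001', 'core-001', 'core-002', 'lab-001']
```

Note how lab-001 arrived first but is sent last: every higher-priority arrival adds to its delay.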
After the file has arrived at rwreceiver, the file is handed off to rwflowpack via another round of directory polling. The same issues exist here that exist for rwsender:
When a single rwflowpack process is packing files from multiple flowcap processes, the directory scan overhead can become large. In addition, the value of d2 is much harder to quantify, as it is an aggregation point from multiple sensors.
Finally, there is the latency associated with rwflowpack itself, as described in the previous section.
Latency: 30 seconds to 195 seconds or more
| Description | Min | Max |
|---|---|---|
| rwflowpack accumulation | 0 | 120 |
| rwsender directory polling | 15 | 30 |
| waiting for other files to be sent | 0 | d3 |
| rwsender transmission to rwreceiver | 0 | 15 |
| rwflowappend directory polling | 15 | 30 |
| waiting for other files to be written | 0 | d4 |
| TOTAL | 30 | 195 + d3 + d4 |
Some configurations of the SiLK packing system do not use rwflowpack to write to the data repository, but instead use an rwsender/rwreceiver pair between rwflowpack and another tool that writes the SiLK flows to the data repository: rwflowappend.
In this configuration, rwflowpack collects the flows directly from the flow generator (yaf or a router) and writes the flow records to small files called "incremental" files. Periodically, rwflowpack releases the incremental files to an rwsender process; rwflowpack's --flush-timeout switch controls this interval, and the default is 120 seconds.
The issues detailed above for rwsender/rwreceiver exist here as well, and this rwsender process is more likely to experience the issues related to handling many small files. We call the time that rwsender holds the files before transferring them to rwreceiver delay d3. The network transfer from rwsender to one or more rwreceiver processes was discussed above; although this value is hard to quantify and can vary, we again use 15 seconds for this delay.
rwreceiver places the incremental files into a directory that rwflowappend polls, which adds another 15 to 30 seconds. The time that rwflowappend holds the files before processing them is hard to quantify; we use d4 for this value.
Once rwflowappend begins to process an incremental file, it writes its contents to the appropriate data file in the repository, and then closes the repository file. There should be very little time required for this operation.
Latency: 60 seconds to 330 seconds or more
| Description | Min | Max |
|---|---|---|
| flowcap accumulation | 0 | 60 |
| rwsender directory polling | 15 | 30 |
| waiting for other files to be sent | 0 | d1 |
| rwsender transmission to rwreceiver | 0 | 15 |
| rwflowpack directory polling | 15 | 30 |
| waiting for other files to be packed | 0 | d2 |
| rwflowpack accumulation | 0 | 120 |
| directory polling by rwsender | 15 | 30 |
| waiting for other files to be sent | 0 | d3 |
| rwsender transmission to rwreceiver | 0 | 15 |
| rwflowappend directory polling | 15 | 30 |
| waiting for other files to be written | 0 | d4 |
| TOTAL | 60 | 330 + d1 + d2 + d3 + d4 |
For this configuration, we combine the analysis of the previous two configurations. One item to note: Since rwflowpack splits the flows it receives from flowcap into files based on the flowtype (class/type pair) and the hour, a single file rwflowpack receives from flowcap can generate many incremental files to be sent to rwflowappend.
This configuration is also subject to the "flooding" problem when processing is restarted after a stoppage.
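The totals in the four latency tables can be reproduced, and extended with your own estimates for the queueing delays, with a short calculation. The per-step values are taken from the tables; the configuration names are labels chosen for this sketch, and the combined queueing estimate is added only to the maximum, as in the tables.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, int]  # (min seconds, max seconds)

FLUSH = (0, 120)    # rwflowpack buffering/accumulation (--flush-timeout default)
FLOWCAP = (0, 60)   # flowcap accumulation (--timeout default)
POLL = (15, 30)     # one directory-polling cycle
SEND = (0, 15)      # rwsender -> rwreceiver transfer (experience-based figure)

CONFIGS: Dict[str, List[Step]] = {
    "rwflowpack": [FLUSH],
    "flowcap + rwflowpack": [FLOWCAP, POLL, SEND, POLL, FLUSH],
    "rwflowpack + rwflowappend": [FLUSH, POLL, SEND, POLL],
    "flowcap + rwflowpack + rwflowappend":
        [FLOWCAP, POLL, SEND, POLL, FLUSH, POLL, SEND, POLL],
}


def latency_bounds(steps: List[Step], queue_delays: float = 0) -> Tuple[float, float]:
    """Return (min, max) seconds; queue_delays is your estimate of d1 + d2 + ..."""
    low = sum(lo for lo, _ in steps)
    high = sum(hi for _, hi in steps) + queue_delays
    return low, high


for name, steps in CONFIGS.items():
    print(f"{name}: {latency_bounds(steps)}")
```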
The rwsender and rwreceiver programs can use GnuTLS to provide a secure layer over a reliable transport layer. For this support to be available, SiLK's configure script must have found v1.4.1 or later of the GnuTLS library. Using GnuTLS also requires creating certificates, which is described in an appendix of the Installation Handbook.
We recommend creating a local certificate authority (CA) file, and creating program-specific certificates signed by that local CA. The local CA and program-specific certificates are copied onto the machines where rwsender and rwreceiver are running. The local CA acts as a shared secret: it is on both machines and it is used to verify the asymmetric keys between the rwsender and rwreceiver certificates.
If someone else has access to the local CA, they would still not be able to decipher the conversation, since the conversation is encrypted with a session key that was negotiated during the initialization of the TLS session.
However, anyone with access to the CA would be able to set up a new session with an rwsender (to download files) or an rwreceiver (to spoof files). The certificates should be one part of your security; additional measures (such as firewall rules) should be enabled to mitigate these issues.
When GnuTLS is not used or not available, communication between rwsender and rwreceiver has no confidentiality or integrity checking beyond that provided by standard TCP.
Legacy systems that use a direct connection between flowcap and rwflowpack have no confidentiality or integrity checking beyond that provided by standard TCP, and there is no way to secure this communication without using some outside method (such as creating an ssh tunnel).
It depends on what you mean by "sensor". If the "sensor" is the flow generator (that is, a router or an IPFIX sensor) which is communicating directly with rwflowpack, the flows are lost when the connection goes down.
To avoid this, you can run flowcap on the sensor. flowcap acts as a flow capacitor, storing flows on the sensor until the communication link between the sensor and packer is restored. Flows will still be lost if the connection between the flow generator and flowcap goes down, but by running flowcap on a machine near the flow generator (or running both on the same machine), the communication between the generator and flowcap should be more reliable, leading to fewer dropped connections.
The flowcap program cannot do this itself; however, the rwsender program can send files to multiple rwreceivers. To get the "tee" functionality, have flowcap drop its files into a directory for processing by rwsender.
The mapsid command will print all the sensors that have been defined at your site.
If you invoke a SiLK daemon with the --log-destination=syslog switch, the daemon will use the syslog(3) function to write log messages, and syslog will manage log rotation.
If you pass the --log-directory switch to a daemon, the daemon will manage the log files itself. The first message received after midnight local time will cause the daemon to close the current log file, compress it, and open a new log file.
PySiLK support involves loading several shared object files, and a misconfiguration can cause PySiLK support to be unavailable. There are several issues that may cause problems when using the --python-file switch.
The time switches on rwfilter can cause confusion. The --start-date and --end-date switches are selection switches, while the --stime, --etime, and --active-time switches are partitioning switches.
The --start-date and --end-date switches are used only to select hourly files from the data repository, and these switches cannot be used when processing files specified on the command line. Each switch takes a single date, with an optional hour, as its argument. Since the switches select hourly files, any precision you specify finer than the hour is ignored. The switches cause rwfilter to select the hourly files between start-date and end-date inclusive. See the rwfilter manual page for what happens when only --start-date is specified.
The --stime, --etime, and --active-time switches partition flow records. The switches operate on a per-record basis, and they write the record to the --pass or --fail stream depending on the result of the test. These switches take a date-time range as an argument. --stime asks whether the flow record started within the specified range, --etime asks whether the flow record ended within the specified range, and --active-time asks whether any part of the flow record overlaps with the specified range. When a single time is given as the argument, the range contains a single millisecond. The time arguments must have at least day precision and may have up to millisecond precision. When the start of the range is coarser than millisecond precision, the missing values are set to 0. When the end of the range is coarser than millisecond precision, the missing values are set to their maximum values.
To query the repository for records that were active during a particular 10-minute window, you need to specify not only the --start-date switch for the hour but also the --active-time switch that covers the 10 minutes of interest. In addition, note that the repository stores flow records by their start time, so when using --etime or --active-time, you may need to include the previous hour's files. Flows active during the first 10 minutes of July 2009 can be found by:
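The range-expansion rules and the three tests can be sketched as follows. This is an illustrative model, not rwfilter's parser: it accepts only the YYYY/MM/DD[:HH[:MM[:SS[.mmm]]]] form, ignores time zones and other formats rwfilter may accept, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Size of the least significant field, indexed by precision level.
_UNITS = [timedelta(days=1), timedelta(hours=1), timedelta(minutes=1),
          timedelta(seconds=1), timedelta(milliseconds=1)]
_MS = timedelta(milliseconds=1)


def parse_time(spec: str) -> Tuple[datetime, timedelta]:
    """Return (earliest instant, precision unit); missing fields become 0."""
    parts = spec.split(":")
    year, month, day = (int(x) for x in parts[0].split("/"))
    hour = int(parts[1]) if len(parts) > 1 else 0
    minute = int(parts[2]) if len(parts) > 2 else 0
    second, msec = 0, 0
    precision = len(parts) - 1  # 0=day, 1=hour, 2=minute, 3=second
    if len(parts) > 3:
        sec_text, _, frac = parts[3].partition(".")
        second = int(sec_text)
        if frac:
            msec = int(frac.ljust(3, "0")[:3])
            precision = 4  # millisecond
    start = datetime(year, month, day, hour, minute, second, msec * 1000)
    return start, _UNITS[precision]


def parse_range(arg: str) -> Tuple[datetime, datetime]:
    """Expand START[-END]; missing END fields are set to their maximum."""
    if "-" not in arg:
        start, _ = parse_time(arg)
        return start, start  # a single time is a one-millisecond range
    first, second = arg.split("-")
    lo, _ = parse_time(first)
    end_start, unit = parse_time(second)
    return lo, end_start + unit - _MS


def stime_matches(stime: datetime, rng: Tuple[datetime, datetime]) -> bool:
    return rng[0] <= stime <= rng[1]


def etime_matches(etime: datetime, rng: Tuple[datetime, datetime]) -> bool:
    return rng[0] <= etime <= rng[1]


def active_matches(stime: datetime, etime: datetime,
                   rng: Tuple[datetime, datetime]) -> bool:
    # Any overlap between [stime, etime] and the range.
    return stime <= rng[1] and etime >= rng[0]


lo, hi = parse_range("2009/07/01:00-2009/07/01:00:10")
print(lo, "|", hi)  # 2009-07-01 00:00:00 | 2009-07-01 00:10:59.999000
```

Under these rules, an end time given to minute precision covers that whole minute, so the range 2009/07/01:00-2009/07/01:00:10 runs through 00:10:59.999.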
rwfilter --start-date=2009/06/30:23 --end-date=2009/07/01:00 \
--active-time=2009/07/01:00-2009/07/01:00:10 ...
To summarize, it is important to remember the distinction between selection switches and partitioning switches. rwfilter works by first determining which hourly files it needs to process, which it does using the selection switches. Once it has the files, rwfilter then goes through each flow record in the files and uses the partitioning switches to decide whether to pass or fail it.
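The two stages can be illustrated with an in-memory model. In this sketch, a dictionary keyed by hour stands in for the hourly files (it is not the repository's on-disk layout), both date arguments are assumed to have hour precision, the partitioning test is the --active-time overlap test, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class Flow:
    stime: datetime
    etime: datetime


def hour_of(t: datetime) -> datetime:
    """Truncate to the hour; finer precision is ignored for file selection."""
    return t.replace(minute=0, second=0, microsecond=0)


def build_repository(flows: List[Flow]) -> Dict[datetime, List[Flow]]:
    """Store each record in the hourly 'file' for the hour of its start time."""
    repo: Dict[datetime, List[Flow]] = {}
    for flow in flows:
        repo.setdefault(hour_of(flow.stime), []).append(flow)
    return repo


def select_hours(start_date: datetime, end_date: datetime) -> List[datetime]:
    """Selection stage: every hour from start_date to end_date, inclusive."""
    hours, hour = [], hour_of(start_date)
    while hour <= hour_of(end_date):
        hours.append(hour)
        hour += timedelta(hours=1)
    return hours


def filter_active(repo: Dict[datetime, List[Flow]], start_date: datetime,
                  end_date: datetime, lo: datetime,
                  hi: datetime) -> Tuple[List[Flow], List[Flow]]:
    """Partitioning stage: pass records whose [stime, etime] overlaps [lo, hi]."""
    passed: List[Flow] = []
    failed: List[Flow] = []
    for hour in select_hours(start_date, end_date):
        for flow in repo.get(hour, []):
            if flow.stime <= hi and flow.etime >= lo:
                passed.append(flow)
            else:
                failed.append(flow)
    return passed, failed


flows = [
    Flow(datetime(2009, 6, 30, 23, 50), datetime(2009, 7, 1, 0, 5)),  # starts in June
    Flow(datetime(2009, 7, 1, 0, 3), datetime(2009, 7, 1, 0, 4)),
    Flow(datetime(2009, 7, 1, 0, 20), datetime(2009, 7, 1, 0, 21)),
]
repo = build_repository(flows)
lo, hi = datetime(2009, 7, 1, 0, 0), datetime(2009, 7, 1, 0, 10, 59, 999000)

with_prev, _ = filter_active(repo, datetime(2009, 6, 30, 23), datetime(2009, 7, 1, 0), lo, hi)
only_hour, _ = filter_active(repo, datetime(2009, 7, 1, 0), datetime(2009, 7, 1, 0), lo, hi)
print(len(with_prev), len(only_hour))  # 2 1
```

The June flow is active in the window but lives in the 23:00 file, so it is found only when the selection stage includes the previous hour.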
SiLK categorizes a flow as web if the protocol is TCP and either the source port or destination port is one of 80, 443, or 8080. Since SiLK does not inspect the contents of packets, it cannot ensure that only HTTP traffic is written to this type, nor can it find HTTP traffic on other ports.
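The port-based rule can be written as a one-line predicate. This is a sketch of the rule as stated above, not SiLK's packing code; it uses the IP protocol number for TCP (6), and the function name is hypothetical.

```python
WEB_PORTS = frozenset({80, 443, 8080})
TCP = 6  # IP protocol number for TCP


def is_web(protocol: int, sport: int, dport: int) -> bool:
    """True if the flow would be typed as web; the payload is never examined."""
    return protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS)


print(is_web(6, 51515, 443))   # True
print(is_web(6, 51515, 8080))  # True, whatever the payload actually is
print(is_web(6, 51515, 8000))  # False, even if it carries HTTP
print(is_web(17, 51515, 443))  # False: UDP is never web
```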
Using the default settings, rwfilter examines only incoming data unless you specify the --types or --flowtypes switch on its command line. To have rwfilter always examine incoming and outgoing data, modify the silk.conf file at your site: find the default-types statement in that file and add the types out, outweb, and outicmp to it.
To get SiLK Flow data into Excel, use the rwcut command to convert the binary SiLK data to a textual CSV (comma-separated values) file, and import that file into Excel. Give rwcut the --legacy-timestamps switch and the --delimited=, switch (the comma after the equals sign sets the delimiter). Use the --output-path=FILE.csv switch to have rwcut write its output to a file.
Several of the SiLK tools support extending their capabilities by writing code and including that code into the application:
The code for these extensions can be written either in C or in Python. (To use Python, SiLK must have been built with the Python extension, PySiLK. See the Installation Handbook for the instructions.)
To use C, one writes the code, compiles it into a shared object, and loads the shared object into the application using the --plugin switch. This process is documented in the silk-plugin(3) manual page.
To use Python, one writes the code and loads it into the application using the --python-file switch. This process is documented in the silkpython(3) manual page.
There are four ways to handle pcap files.
$ rwptoflow --flow-output=my-data.rwf my-data.pcap
$ yaf --silk --in=my-data.pcap --out=- | rwipfix2silk > my-data.rwf
To make this task easier, SiLK provides the rwp2yaf2silk Perl script, which is a wrapper around the calls to those two tools. (For rwp2yaf2silk to work, both yaf and rwipfix2silk must be on your $PATH.)
$ rwp2yaf2silk --in=my-data.pcap --out=my-data.rwf
probe S0 ipfix
poll-directory /tmp/rwflowpack/incoming
end probe
sensor S0
ipfix-probes S0
source-network external
destination-network external
end sensor
Have yaf write the IPFIX files into the directory specified in the sensor.conf file:
$ yaf --silk --in=my-data.pcap \
--out=/tmp/rwflowpack/incoming/my-data.yaf
The invocation of rwflowpack will resemble:
$ rwflowpack --sensor-conf=sensor.conf --root-directory=/data \
--log-directory=/tmp/rwflowpack/log
Both rwp2yaf2silk and rwptoflow read a packet capture file and produce SiLK Flow records. The primary difference is that rwp2yaf2silk assembles multiple packets into a single flow record, whereas rwptoflow does not; it simply creates a 1-packet flow record for every packet it reads.
If both tools are available, rwp2yaf2silk is usually the better tool, but rwptoflow can be useful if you want to use the SiLK Flow records as an index into the pcap file (for example, when using rwpmatch).
Behind the scenes, rwp2yaf2silk is a Perl script that invokes the yaf and rwipfix2silk programs, so both of those programs must exist on your PATH. rwptoflow is a compiled C program that uses libpcap directly to read the pcap file.
Please see the Converting data to SiLK format entry on our Tooltips wiki.
A prefix map file in SiLK provides a label for every IPv4 address. (We have not yet extended prefix map files to support IPv6 addresses.) Use the rwpmapbuild tool to convert a text file of CIDR-block/label pairs to a binary prefix map file. The rwcut, rwfilter, rwuniq, and rwsort tools provide support for printing, partitioning by, binning by, and sorting by the labels you defined.
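The lookup a prefix map performs can be sketched with Python's ipaddress module. This illustrates the idea only: the PrefixMap class and its "UNKNOWN" default label are hypothetical, overlapping blocks are resolved here by the most specific (longest) prefix, and rwpmapbuild's actual input syntax and handling of overlaps are described in its manual page.

```python
import ipaddress
from typing import List, Tuple


class PrefixMap:
    """Give every IPv4 address a label from the most specific enclosing block."""

    def __init__(self, blocks: List[Tuple[str, str]], default: str = "UNKNOWN") -> None:
        self.default = default
        # Most specific (longest) prefixes are checked first.
        self.blocks = sorted(
            ((ipaddress.IPv4Network(cidr), label) for cidr, label in blocks),
            key=lambda item: item[0].prefixlen,
            reverse=True,
        )

    def label(self, addr: str) -> str:
        ip = ipaddress.IPv4Address(addr)
        for net, label in self.blocks:
            if ip in net:
                return label
        return self.default


pmap = PrefixMap([("10.0.0.0/8", "internal"), ("10.1.0.0/16", "lab")])
print(pmap.label("10.1.2.3"))   # lab
print(pmap.label("10.2.0.1"))   # internal
print(pmap.label("192.0.2.1"))  # UNKNOWN
```

A linear scan keeps the sketch short; a real prefix map uses a compact tree so lookups stay fast for large tables.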
The rwmatch program can be used to mate flows. Create two files that contain the data you are interested in mating. Use rwsort to order the records in each file. (When matching TCP and/or UDP flows, the recommended sort order is shown below.) Run rwmatch over the sorted files to mate the flows. rwmatch writes a match parameter into the next hop IP field on each record that it matches. When using rwcut to display the output file produced by rwmatch, consider using the cutmatch.so plug-in to display the match parameter that rwmatch writes into the next hop IP field.
$ rwsort --fields=1,4,2,3,5,9 incoming.rwf > incoming-query.rwf
$ rwsort --fields=2,3,1,4,5,9 outgoing.rwf > outgoing-response.rwf
$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4 \
incoming-query.rwf outgoing-response.rwf mated.rwf
$ rwcut --plugin=cutmatch.so --fields=1,3,match,2,4,5 mated.rwf
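The relationship expressed by the four --relate pairs above (field 1 is sIP, 2 is dIP, 3 is sPort, 4 is dPort) can be sketched in Python. This is a deliberately simplified model, not rwmatch's algorithm: it compares only the four related fields and ignores record times, which rwmatch also takes into account. The Rec type and mate() function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rec(NamedTuple):
    sip: str
    dip: str
    sport: int
    dport: int


def mate(queries: List[Rec], responses: List[Rec]) -> List[Tuple[int, Rec, Rec]]:
    """Pair queries with responses whose addresses and ports are reversed."""
    by_key: Dict[Tuple[str, int, str, int], List[Rec]] = defaultdict(list)
    for r in responses:
        # --relate=1,2 / 4,3 / 2,1 / 3,4: response (dIP, sPort, sIP, dPort)
        # must equal query (sIP, dPort, dIP, sPort).
        by_key[(r.dip, r.sport, r.sip, r.dport)].append(r)
    matches: List[Tuple[int, Rec, Rec]] = []
    match_id = 0
    for q in queries:
        found = by_key.get((q.sip, q.dport, q.dip, q.sport))
        if found:
            match_id += 1  # one match parameter per mated group
            for r in found:
                matches.append((match_id, q, r))
    return matches


queries = [Rec("10.0.0.1", "192.0.2.5", 40000, 53),
           Rec("10.0.0.2", "192.0.2.5", 40001, 53)]
responses = [Rec("192.0.2.5", "10.0.0.1", 53, 40000)]
print(len(mate(queries, responses)))  # 1
```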
Yes, you can use the rwmatch program as described in the previous FAQ entry to mate across sensors.
The rwrandomizeip application will obfuscate the source and destination IP addresses in a SiLK data file. It can operate in one of two modes:
In addition, note that the file's header may contain information that you would rather not make public (such as a history of commands). You can use rwfileinfo to see these headers. To remove the headers, invoke rwcat on the file.
For a different approach, consider converting the data to text with rwcut, obfuscating the IPs, and then converting back to SiLK format with rwtuc. The procedure is documented in this tooltip.
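The "obfuscating the IPs" step in that procedure can be sketched as a consistent mapping applied to the text. This is an illustration under stated assumptions: the text is assumed to have no title line and to carry sIP and dIP in the first two '|'-delimited columns, replacement addresses are drawn from 10.0.0.0/8 as an arbitrary choice, and the IPMapper class is hypothetical. Python's random.Random is not a cryptographic generator, so treat the seed and the mapping as secrets.

```python
import ipaddress
import random
from typing import Dict, Iterable, Iterator, Set


class IPMapper:
    """Consistently replace each distinct address with a random, unique one."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)
        self._map: Dict[str, str] = {}
        self._used: Set[str] = set()

    def __call__(self, addr: str) -> str:
        if addr not in self._map:
            while True:
                # Random address in 10.0.0.0/8 (arbitrary choice for this sketch).
                fake = str(ipaddress.IPv4Address((10 << 24) | self._rng.getrandbits(24)))
                if fake not in self._used:
                    break
            self._used.add(fake)
            self._map[addr] = fake
        return self._map[addr]


def obfuscate_lines(lines: Iterable[str], mapper: IPMapper) -> Iterator[str]:
    """Rewrite the first two '|'-delimited columns (assumed to be sIP and dIP)."""
    for line in lines:
        cols = line.rstrip("\n").split("|")
        for i in (0, 1):
            cols[i] = mapper(cols[i].strip())
        yield "|".join(cols)


mapper = IPMapper(seed=2009)
for out in obfuscate_lines(["10.1.1.1|192.0.2.9|80|", "192.0.2.9|10.1.1.1|51000|"], mapper):
    print(out)
```

Because the mapping is consistent, the two lines still show the same pair of hosts talking in opposite directions, which preserves the structure an analyst needs while hiding the real addresses.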
Anonymizing/Obfuscating data is hard. You should be cautious of how widely you distribute data that rwrandomizeip has processed: