Background
1. What is SiLK?
2. Does SiLK support IPv6?
3. What platforms does SiLK run on?
4. What license is SiLK released under?
5. Something is not working as expected, where do I check for errors?
6. Whom do I contact for support?
7. How do I report a bug?
8. How do I contribute a patch or fix?
9. How do I reference SiLK in a publication?
Configuration
10. What is network flow data?
11. What applications and hardware can generate the flows for use in SiLK?
12. What is the NetFlow v5 format?
13. What is IPFIX?
14. What IPFIX information elements does SiLK support?
15. Does SiLK support sFlow?
16. Why does SiLK create unidirectional flows?
17. Can I make it bidirectional?
18. I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?
19. How can I process data from a Cisco ASA (Adaptive Security Appliance)?
20. Why is rwflowpack (or flowcap) ignoring NetFlow v9 flow records?
21. Why do I see the following log message in rwflowpack (or flowcap): NetFlow V9 Option Templates are NOT Supported, Flow Set was Removed.?
22. Why do I see the following log message in rwflowpack (or flowcap): NetFlow V9 Record Count Discrepancy. Reported: 1. Found: 15.?
23. Why is rwflowpack discarding the flow interfaces and Next Hop IP?
24. How many sensors does SiLK support?
25. Can I copy SiLK data between machines?
26. What ports do I need to open in a firewall?
27. How do I split flows seen by one flow meter into different sensors?
28. How do I create and use my own classes and types that can be used with a SiLK repository's storing and packing logic?
Building and Installing
29. Where can I download SiLK?
30. Where can I find RPMs for SiLK?
31. What release of Python do I need if I want to use the PySiLK extension?
32. When I configure --with-python, I get the error message warning: Not importing directory 'site': missing __init__.py. How do I fix this?
Operations
33. How long would it take to find all the flow records to or from an IP address, when your data size is 10 billion records?
34. How can I improve the performance of the SiLK queries?
35. How are the SiLK Flow files organized and written to disk?
36. How many bytes does a single SiLK Flow record occupy on disk?
37. Where is the SiLK Flow file format documented?
38. What is the format of the header of a binary SiLK file?
39. How can I use rwsender to transfer files created by yaf?
40. How much disk do I need to store data for a link of a particular size?
41. How much bandwidth will be used by rwsender?
42. What is the latency of the SiLK packing system?
43. What confidentiality and integrity properties are provided for SiLK data sent across machines?
44. If communication between the sensor and the packer goes down, are flows lost?
45. Can flowcap function as a "tee", both storing files and forwarding the flow stream onto some place else?
46. How do I list all sensors that are installed for a deployment?
47. How do I rotate the SiLK log files?
Analysis
48. I get an error when I try to use the --python-file switch in the SiLK analysis applications. What is wrong?
49. Someone gave me an IPset file, and my version of the IPset tools will not read the file. What is wrong?
50. What do all these time switches on rwfilter do?
51. How do the --start-date and --end-date switches on rwfilter affect which files rwfilter examines?
52. Why does --type=inweb contain non-web data?
53. How can I make rwfilter always process incoming and outgoing data?
54. Why do different installations of SiLK show different timestamps and how can I fix this?
55. How do I import flow data into Excel?
56. How can I use plug-ins (or dynamic-libraries) to extend the SiLK tools?
57. How do I convert packet data to flows?
58. What is the difference between rwp2yaf2silk and rwptoflow?
59. I have data in some other format. How do I incorporate that into SiLK?
60. How do I make lists of IP addresses and label them?
61. How do I mate unidirectional flows to get both sides of the conversation?
62. I have SiLK deployed in an asymmetric routing environment, can I mate across sensors?
63. How can I create obfuscated (anonymized) data?
64. How secure is the anonymized data?

Background

1. What is SiLK?

SiLK is a suite of network traffic collection and analysis tools developed and maintained by the CERT Network Situational Awareness Team (CERT NetSA) at Carnegie Mellon University to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets.

2. Does SiLK support IPv6?

As of SiLK 3.0.0, IPv6 support is available in most of the SiLK tool suite, including in IPsets, Bags, and Prefix Maps. To process, store, and query IPv6 flow records, SiLK must be configured for IPv6 by specifying the --enable-ipv6 switch to the configure script when you are building SiLK. See the Installation Handbook for details.
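
For example, a minimal sketch of a build with IPv6 support enabled (the installation prefix is illustrative):

  $ ./configure --enable-ipv6 --prefix=/usr/local
  $ make
  $ make install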

3. What platforms does SiLK run on?

SiLK should run on most UNIX-like operating systems. It is most heavily tested on Linux, Solaris, and Mac OS X.

4. What license is SiLK released under?

SiLK is released under two licenses:

  • GNU General Public License (GPL) Rights pursuant to Version 2, June 1991
  • Government Purpose License Rights (GPLR) pursuant to DFARS 252.227.7013

5. Something is not working as expected, where do I check for errors?

The applications that make up the packing system (flowcap, rwflowpack, rwflowappend, rwsender, and rwreceiver) write error messages to log files. The location of these log files is set when the daemon is started, with the default location being /usr/local/var/silk.

All other applications write error messages to the standard error (stderr).

6. Whom do I contact for support?

Your primary support person should be the person or group that installs and maintains SiLK at your site. You may also send email to contact_email.

In Spring 2014, the netsa-tools-discuss public mailing list was created for questions about and discussion of the NetSA tools. You may subscribe to the list and read its archives.

7. How do I report a bug?

If some behavior in SiLK is different than what you expect, please write an email specifying what you did, what happened, and how that differed from what you expected. Send your email to contact_email.

The following pieces of information may help us to diagnose the issue, and we ask that you please include them in your bug report.

  • The exact command that caused the problem. If the failing tool is part of a UNIX pipe (e.g., rwfilter ... | rwuniq ...), please include the entire command, since the bug may be caused by something happening upstream. You may obfuscate IP addresses or sensor names in the command, but please let us know that you have modified the command.
  • The complete error message you receive.
  • For daemons (rwflowpack, rwsender, rwreceiver, flowcap, rwflowappend, rwpollexec), please include the relevant portions of the log file or syslog entries. If the behavior is repeatable, getting it to happen while using the "debug" log-level may give additional information.
  • If the error is related to data collection in rwflowpack or flowcap, please include the portions of the sensor.conf file related to the probe or sensor that is causing problems. You may obfuscate IP addresses. Also, please mention the version of the libfixbuf library you are using.
  • The version of the tool that is causing the bug. You can determine this by running TOOL --version, e.g., rwfilter --version. Include the entire output so we will know what optional features the tool may be using.
  • If you cannot run TOOL --version or it exits without printing anything, send the output of ldd TOOL (or the ldd equivalent on your operating system).
  • If you cannot build the tool, the version of SiLK you are attempting to install and the complete error message that make gives you.
  • If the configure script fails, include the config.log file, which includes additional information as to why configure failed.
  • If the command is reading SiLK data files, the output of running rwfileinfo on those files may be helpful.
  • The operating system you are using (for example, the distribution of Linux and its version).

You can help us help you by writing an effective bug report.

8. How do I contribute a patch or fix?

We welcome bug fixes and patches. You may send them to contact_email.

9. How do I reference SiLK in a publication?

The BibTeX entry format would be:

@MISC{SiLK,
 author = "{CERT/NetSA at Carnegie Mellon University}",
 title = "{SiLK (System for Internet-Level Knowledge)}",
 howpublished = "[Online]. Available:
    \url{http://tools.netsa.cert.org/silk}.",
 note = "[Accessed: July 13, 2009]"}

Update the "Accessed" date to the day you accessed the SiLK website, and then you can cite the software in a LaTeX document using \cite{SiLK}.

The final output should look like this:

CERT/NetSA at Carnegie Mellon University. SiLK (System for Internet-Level Knowledge). [Online]. Available: http://tools.netsa.cert.org/silk. [Accessed: July 13, 2009].

Configuration

10. What is network flow data?

(Taken from Chapter 2 of the SiLK Analysts' Handbook.) NetFlow is a traffic-summarization format that was first implemented by Cisco Systems, primarily for billing purposes. Network flow data (or network flow) is a generalization of NetFlow.

Network flow collection differs from direct packet capture, such as tcpdump, in that it builds a summary of communications between sources and destinations on a network. This summary covers all traffic matching seven particular keys that are relevant for addressing: the source and destination IP addresses, the source and destination ports, the protocol type, the type of service, and the interface on the router. We use five of these attributes to constitute the flow label in SiLK: the source and destination addresses, the source and destination ports, and the protocol. These attributes (sometimes called the 5-tuple), together with the start time of each network flow, distinguish network flows from each other.

A network flow often covers multiple packets, which are grouped together under a common flow label. A flow record thus provides the label and statistics on the packets that the network flow covers, including the number of packets covered by the flow, the total number of bytes, and the duration and timing of those packets. Because network flow is a summary of traffic, it does not contain packet payload data.

11. What applications and hardware can generate the flows for use in SiLK?

SiLK accepts flows in the NetFlow v5 format from a router. These flows are sometimes called Protocol Data Units (PDU). You can also find software that will generate NetFlow v5 records from various types of input.

When compiled with libfixbuf support, SiLK can accept NetFlow v9, flows in the IPFIX (Internet Protocol Flow Information eXport) format, and sFlow v5 records. You can use the yaf flow meter to generate IPFIX flows from libpcap (tcpdump) data or by live capture.
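
As a sketch, yaf can read a pcap file or capture live traffic and export IPFIX to a SiLK collector. The file names, interface, host, and port below are hypothetical; see the yaf manual page for the authoritative switches:

  $ yaf --in packets.pcap --out flows.yaf
  $ yaf --live pcap --in eth0 --out collector.example.com \
        --ipfix tcp --ipfix-port 18001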

12. What is the NetFlow v5 format?

The definition of the NetFlow v5 format is given in the following tables, copied from Cisco (October 2009). A NetFlow v5 packet has a 24-byte header and up to thirty 48-byte records, so the maximum NetFlow v5 packet is 1464 bytes. The record table also lists the SiLK field name, where applicable, but note that SiLK packs the fields differently than NetFlow.

NetFlow v5 packet header

  Count  Contents           Octets  Length  Description
  1      version            0-1     2       NetFlow export format version number
  2      count              2-3     2       Number of flows exported in this packet (1-30)
  3      SysUptime          4-7     4       Current time in milliseconds since the export device booted
  4      unix_secs          8-11    4       Current count of seconds since 0000 UTC 1970
  5      unix_nsecs         12-15   4       Residual nanoseconds since 0000 UTC 1970
  6      flow_sequence      16-19   4       Sequence counter of total flows seen
  7      engine_type        20      1       Type of flow-switching engine
  8      engine_id          21      1       Slot number of the flow-switching engine
  9      sampling_interval  22-23   2       First two bits hold the sampling mode; remaining 14 bits hold value of sampling interval

NetFlow v5 flow record

  Count  Contents   Octets  Length  SiLK Field  Description
  1      srcaddr    0-3     4       sIP         Source IP address
  2      dstaddr    4-7     4       dIP         Destination IP address
  3      nexthop    8-11    4       nhIP        IP address of next hop router
  4      input      12-13   2       in          SNMP index of input interface
  5      output     14-15   2       out         SNMP index of output interface
  6      dPkts      16-19   4       packets     Packets in the flow
  7      dOctets    20-23   4       bytes       Total number of Layer 3 bytes in the packets of the flow
  8      First      24-27   4       sTime       SysUptime at start of flow
  9      Last       28-31   4       eTime       SysUptime at the time the last packet of the flow was received
  10     srcport    32-33   2       sPort       TCP/UDP source port number or equivalent
  11     dstport    34-35   2       dPort       TCP/UDP destination port number or equivalent
  12     pad1       36      1       -           Unused (zero) bytes
  13     tcp_flags  37      1       flags       Cumulative OR of TCP flags
  14     prot       38      1       protocol    IP protocol type (for example, TCP = 6; UDP = 17)
  15     tos        39      1       n/a         IP type of service (ToS)
  16     src_as     40-41   2       n/a         Autonomous system number of the source, either origin or peer
  17     dst_as     42-43   2       n/a         Autonomous system number of the destination, either origin or peer
  18     src_mask   44      1       n/a         Source address prefix mask bits
  19     dst_mask   45      1       n/a         Destination address prefix mask bits
  20     pad2       46-47   2       -           Unused (zero) bytes

13. What is IPFIX?

IPFIX is the Internet Protocol Flow Information eXport format. Based on the NetFlow v9 format from Cisco, IPFIX is the draft IETF standard for representing flow data. The rwipfix2silk and rwsilk2ipfix programs in SiLK (available when SiLK has been configured with libfixbuf support) convert between the SiLK Flow format and the IPFIX format.
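
For example (a sketch; see the two tools' manual pages for the exact switch names):

  $ rwsilk2ipfix --ipfix-output=flows.ipfix flows.rw
  $ rwipfix2silk --silk-output=flows.rw flows.ipfix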

14. What IPFIX information elements does SiLK support?

For input, the IPFIX information elements supported by SiLK are listed in the following table. (The SiLK tools that read IPFIX are flowcap, rwflowpack, and rwipfix2silk.) Elements marked with "(P)" are defined in CERT's Private Enterprise space, PEN 6871. The third column denotes whether the element is reversible. Internally, SiLK stores flow duration instead of end time.

IPFIX information elements read by SiLK

  IPFIX Element (ID)                    Octets  Rev  SiLK Field
  octetDeltaCount (1)                   8       R    bytes
  octetTotalCount (85)                  8       R    bytes
  initiatorOctets (231)                 8            bytes
  responderOctets (232)                 8            bytes
  packetDeltaCount (2)                  8       R    packets
  packetTotalCount (86)                 8       R    packets
  initiatorPackets (298)                8            packets
  responderPackets (299)                8            packets
  protocolIdentifier (4)                1            protocol
  tcpControlBits (6)                    1       R    flags
  sourceTransportPort (7)               2            sPort
  sourceIPv4Address (8)                 4            sIP
  sourceIPv6Address (27)                16           sIP
  ingressInterface (10)                 4            in
  vlanId (58)                           2       R    in
  destinationTransportPort (11)         2            dPort
  destinationIPv4Address (12)           4            dIP
  destinationIPv6Address (28)           16           dIP
  egressInterface (14)                  4            out
  postVlanId (59)                       2       R    out
  ipNextHopIPv4Address (15)             4            nhIP
  ipNextHopIPv6Address (62)             16           nhIP
  flowEndSysUpTime (21)                 4            duration
  flowEndSeconds (151)                  4            duration
  flowEndMilliseconds (153)             8            duration
  flowEndMicroseconds (155)             8            duration
  flowEndDeltaMicroseconds (159)        4            duration
  flowDurationMilliseconds (161)        4            duration
  flowDurationMicroseconds (162)        4            duration
  flowStartSysUpTime (22)               4            sTime
  flowStartSeconds (150)                4            sTime
  flowStartMilliseconds (152)           8            sTime
  flowStartMicroseconds (154)           8            sTime
  flowStartDeltaMicroseconds (158)      4            sTime
  systemInitTimeMilliseconds (160)      8            sTime
  reverseFlowDeltaMilliseconds (P, 21)  4            sTime
  flowEndReason (136)                   1            attributes
  silkTCPState (P, 32)                  1            attributes
  flowAttributes (P, 40)                2       R    attributes
  initialTCPFlags (P, 14)               1       R    initialFlags
  unionTCPFlags (P, 15)                 1       R    sessionFlags
  silkFlowType (P, 30)                  1            class & type
  silkFlowSensor (P, 31)                2            sensor
  silkAppLabel (P, 33)                  2            application

On output, rwsilk2ipfix writes the IPFIX information elements specified in the following table when producing IPFIX from SiLK flow records. The output includes both IPv4 and IPv6 addresses, but only one set of IP addresses will contain valid values; the other set will contain only 0s. Elements marked "(P)" are defined in CERT's Private Enterprise space, PEN 6871.

IPFIX information elements written by SiLK

  Count  SiLK Field        IPFIX Element (ID)             Octets  Position
  1      sTime             flowStartMilliseconds (152)    8       0-7
  2      sTime + duration  flowEndMilliseconds (153)      8       8-15
  3      sIP               sourceIPv6Address (27)         16      16-31
  4      dIP               destinationIPv6Address (28)    16      32-47
  5      sIP               sourceIPv4Address (8)          4       48-51
  6      dIP               destinationIPv4Address (12)    4       52-55
  7      sPort             sourceTransportPort (7)        2       56-57
  8      dPort             destinationTransportPort (11)  2       58-59
  9      nhIP              ipNextHopIPv4Address (15)      4       60-63
  10     nhIP              ipNextHopIPv6Address (62)      16      64-79
  11     in                ingressInterface (10)          4       80-83
  12     out               egressInterface (14)           4       84-87
  13     packets           packetDeltaCount (2)           8       88-95
  14     bytes             octetDeltaCount (1)            8       96-103
  15     protocol          protocolIdentifier (4)         1       104
  16     class & type      silkFlowType (P, 30)           1       105
  17     sensor            silkFlowSensor (P, 31)         2       106-107
  18     flags             tcpControlBits (6)             1       108
  19     initialFlags      initialTCPFlags (P, 14)        1       109
  20     sessionFlags      unionTCPFlags (P, 15)          1       110
  21     attributes        silkTCPState (P, 32)           1       111
  22     application       silkAppLabel (P, 33)           2       112-113
  23     -                 paddingOctets (210)            6       114-119

15. Does SiLK support sFlow?

Support for sFlow v5 is available as of SiLK 3.9.0 when you configure and build SiLK to use v1.6.0 or later of the libfixbuf library.

16. Why does SiLK create unidirectional flows?

SiLK's origins are in processing NetFlow v5 data, which is unidirectional. Changing SiLK to support bidirectional flows would be a major change to the software. Even if SiLK supported bidirectional flows, you would still face the task of mating flows, since a site with many access points to the Internet will often display asymmetric routing (where each half of a conversation passes through different border routers).

17. Can I make it bidirectional?

No, SiLK does not support bidirectional flows. You will need to mate the unidirectional flows, as described in the FAQ entry How do I mate unidirectional flows to get both sides of the conversation?.

18. I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?

Yes, you can. Please see the answer to How do I convert packet data (pcap) to flows?.

19. How can I process data from a Cisco ASA (Adaptive Security Appliance)?

When configuring rwflowpack or flowcap to capture data from a Cisco ASA, you must include a quirks statement in the probe block of the sensor.conf file. The quirks statement must include firewall-event and zero-packets, as shown in this example probe:

  probe S20 netflow-v9
      listen-on-port 9988
      protocol udp
      quirks firewall-event zero-packets
  end probe

There are several things to keep in mind when analyzing flow records that originated from a Cisco ASA.

  • The NetFlow v9 templates do not include an IE that gives the TCP flags for the record, and the flags field is always empty.
  • The NetFlow v9 templates used by many ASAs do not include an information element (IE) that provides the number of packets in the flow record. Normally SiLK would treat these records as having a packet count of 0, but the zero-packets quirk causes SiLK to set packet count to 1 for these flow records.
  • The IEs exported by the ASA that SiLK uses for the bytes field are different from what SiLK traditionally expects. The bytes field in SiLK is based on the dOctets field in the NetFlow v5 record. This field counts the number of Layer 3 octets, which includes IP headers and IP payload. (The IPFIX version of this field is octetDeltaCount, IE#1.) The ASA exports initiatorOctets and responderOctets (IE#231 and IE#232), which count only Layer 4 (payload) bytes. It is possible for the ASA to create a flow record that has a byte count of zero (consider a SYN packet to a closed port). As of SiLK 3.11.0, SiLK sets the byte count of such a record to 1. (Previous releases of SiLK ignored these records.)

20. Why is rwflowpack (or flowcap) ignoring NetFlow v9 flow records?

There are a variety of reasons that rwflowpack (or flowcap) may fail to receive NetFlow v9 flow records, and since NetFlow v9 uses UDP (which is a connectionless protocol), problems receiving NetFlow v9 can be hard to diagnose. Here are potential issues and solutions, from the minor to the substantial:

  • In the sensor.conf file for rwflowpack, you may have configured the probe as netflow, which is an alias for netflow-v5. You must use netflow-v9 for rwflowpack to accept NetFlow v9 flow records.
  • Your firewall may be blocking the packets. Check the settings of your firewall (e.g., iptables) to ensure the router may connect to the host and port where rwflowpack is listening.
  • Your router may be sending the records to a host or port other than the one where rwflowpack is listening. Ensure the listen-on-port and listen-as-host values in the sensor.conf file for rwflowpack match the ip flow-export values you used when you configured the router.
  • There may be an IPv4/IPv6 mismatch between the address where rwflowpack is listening and the destination used by the router. For best results, use an IP address in the listen-as-host setting in sensor.conf and the ip flow-export setting in the router.
  • Perhaps you are being affected by the template timeout. NetFlow v9 and IPFIX are template based, where a template describes the flow records. SiLK (via libfixbuf) cannot process the data stream until it has seen the templates (the data stream is just "random" data until libfixbuf has seen the template that describes it). For an IPFIX session over TCP, the templates are sent at the beginning of the session, and libfixbuf can process the data stream immediately. For UDP, the templates are sent periodically by the router, and when the router is started before rwflowpack, rwflowpack ignores the data until the router resends the templates. (The data is not entirely ignored: there may be error messages in rwflowpack's log regarding "No Template Present for Domain".) For some devices the resend timeout is large, and you may want to reduce it using the template data timeout setting of the router.
  • You may be using a Cisco ASA router. See the answer to this question to configure rwflowpack or flowcap to receive data from an ASA.

21. Why do I see the following log message in rwflowpack (or flowcap): NetFlow V9 Option Templates are NOT Supported, Flow Set was Removed.?

This message occurs when using a version of libfixbuf that does not have support for NetFlow v9 Option Templates. As of libfixbuf-1.4.0, NetFlow v9 Option Templates and Records are collected and translated to IPFIX.

22. Why do I see the following log message in rwflowpack (or flowcap): NetFlow V9 Record Count Discrepancy. Reported: 1. Found: 15.?

The likely cause for these messages is that the flow generator is putting the number of FlowSets into the NetFlow v9 message header. According to RFC 3954, the message header is supposed to contain the number of Flow Records, not FlowSets.

Other than being a nuisance in the log file, the messages are harmless. The NetFlow v9 processing library, libfixbuf, processes the entire packet, and it is reading all the flow records despite the header having an incorrect count.

The messages are generated by libfixbuf. Currently the only way to suppress the messages is by disabling all warnings from libfixbuf, which you may do by setting the SILK_LIBFIXBUF_SUPPRESS_WARNINGS environment variable to 1 prior to starting rwflowpack or flowcap.
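
For example, when starting the daemon from a Bourne-style shell:

  $ export SILK_LIBFIXBUF_SUPPRESS_WARNINGS=1
  $ rwflowpack ...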

23. Why is rwflowpack discarding the flow interfaces and Next Hop IP?

In our experience, the flow interfaces (or SNMP interfaces, ifIndex values) and the Next Hop IP do not provide much useful information for security analysis, and by default SiLK does not include them in the packed data files. If you wish to store these values or use them for debugging your packing configuration, you can instruct rwflowpack to store the SNMP interfaces and Next Hop IP by giving it the --pack-interfaces switch. If you are using the rwflowpack.conf file, set the PACK_INTERFACES value to 1 and restart rwflowpack. The change will be noticeable once rwflowpack creates new hourly files, since flow records that are appended to existing files use the format of that file.
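
For example (the daemon invocation is abbreviated):

  # Either pass the switch on the command line:
  $ rwflowpack --pack-interfaces ...

  # Or set this in the rwflowpack.conf file and restart the daemon:
  PACK_INTERFACES=1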

24. How many sensors does SiLK support?

The SiLK Flow format is capable of representing 65534 unique sensors.

25. Can I copy SiLK data between machines?

Yes, a binary file produced by a SiLK application will store its format, version, byte order, and compression method near the beginning of the file (in the file's header). (You can use the rwfileinfo tool to get a description of the contents of the file's header.) Any release of SiLK that understands that file version should be able to read the file. However, note that if the file's data is compressed, the SiLK tools on the second machine must have been compiled with support for that compression library. The SiLK tools will print an error and exit if they are unable to read a file because the tool does not understand the file's format, version, or compression method.

26. What ports do I need to open in a firewall?

SiLK does not use any hard-coded ports. All SiLK tools that do network communication (flowcap, rwflowpack, rwsender, and rwreceiver) have some way to specify which ports to use for communication.

When flowcap or rwflowpack collect flows from a router, you will need to open a port for UDP traffic between the router and the collection machine.

When flowcap or rwflowpack collect flows from a yaf sensor running on a different machine, you will need to open a port for TCP (or SCTP) traffic between these two machines.

Finally, when you are using flowcap on remote sensor(s) that feed data to rwflowpack running on a central data repository, you will need to open a port between each sensor and your repository. Configure flowcap or rwsender on the sensor and rwflowpack or rwreceiver on the repository to use that port.

See the tools' manual pages and the Installation Handbook for details on specifying ports.

27. How do I split flows seen by one flow meter into different sensors?

In the rwflowpack configuration file sensor.conf, a flow collection point is called a probe. In that file, you may have two sensor blocks process data collected by a single probe.

You may want to use the discard-when or discard-unless keywords to avoid storing duplicate flow records for each sensor, as shown in the Single Source Becoming Multiple Sensors example configuration.
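
As an illustrative sketch (the probe name, sensor names, and address blocks below are hypothetical; see the sensor.conf manual page and that example configuration for the authoritative syntax), one probe can feed two sensor blocks that partition traffic by source address:

  probe P0 netflow-v9
      listen-on-port 9988
      protocol udp
  end probe

  sensor S1
      netflow-v9-probes P0
      discard-unless source-ipblocks 10.1.0.0/16
  end sensor

  sensor S2
      netflow-v9-probes P0
      discard-unless source-ipblocks 10.2.0.0/16
  end sensor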

28. How do I create and use my own classes and types that can be used with a SiLK repository's storing and packing logic?

The classes and types in SiLK are defined in the silk.conf configuration file. Adding a new type to that file allows all of the analysis tools in SiLK to recognize that type as valid.

For that type to be populated with flow records, you need to have rwflowpack categorize records as that type and store those records in the data repository so rwfilter can find them. The code that categorizes flow records is called the packing logic, and packing logic is normally loaded into rwflowpack as a plug-in.

SiLK uses the term site to denote the combination of a silk.conf file and a packing logic plug-in. The SiLK source code has two sites named generic and twoway.

While you may modify one of these sites, we suggest that you create a new site for your customization so that your changes are not overwritten when you update your SiLK installation.

Since you must write C code, creating a new type in SiLK takes a fair amount of effort. It is not necessarily difficult, but there are several details to handle.

The following uses silk to denote the top-level directory of the SiLK source code and $prefix to denote the directory where SiLK is installed.

There are four major steps to customizing SiLK's packing logic: (A) Create a site, (B) modify the silk.conf file, (C) modify the packing logic C code, and (D) build and install SiLK.

  1. Create a site (this step may be skipped).
    1. To create a new site named enhanced, create a directory silk/site/enhanced. Copy the files from the silk/site/twoway directory into the silk/site/enhanced directory, and then, for each file in that directory, replace all instances of twoway with enhanced.
    2. To integrate the site/enhanced directory into the build system, you must have the GNU autotools (autoconf, automake, and libtool) installed; the next step uses them to regenerate the build files.
    3. Go into the top-level silk directory and run autoreconf -fiv. That command should regenerate the silk/site/enhanced/Makefile.in file and the silk/configure script.
  2. Modify the silk.conf file.
    1. Next you need to modify the silk.conf file. Assuming you have created the enhanced site, open silk/site/enhanced/silk.conf in a text editor.
    2. If you choose to create a new site, you may delete all the existing types and start clean. If you are modifying the twoway or generic site and you have existing data you want to maintain access to, you should only add new types.
    3. Each type is defined with a type statement inside a class block. A sample type statement is
      type 2 inweb iw
      where
      • The first argument is the numeric ID that is stored on each flow record associated with this type; that ID must be unique across all class/type pairs within a site. (These values may be displayed by specifying --fields=id-flowtype to the rwsiteinfo utility.)
      • The second argument is the type name used in SiLK's interface. Each type name must be unique within a class.
      • The final (optional) argument is the prefix given to these files in the data repository, and it is the flowtype name. When this argument is not specified, the flowtype name is created by joining the class name and the type name. (These values are displayed by specifying --fields=flowtype to rwsiteinfo.)
    4. The default-types statement in that block tells rwfilter which types to select when the user does not specify any on rwfilter's command line. Update that statement as you desire.
    5. The packing-logic statement specifies the name of the plug-in that rwflowpack should load. If you did a global replace of twoway with enhanced, it should say
      packing-logic "packlogic-enhanced.so"
    6. Once you have made your changes, save the silk.conf file.
    7. To test that the syntax of this file is correct, run the rwsiteinfo tool with its --site-configuration switch pointing at the silk.conf file you modified.
  3. Modify the packing logic C file.
    1. To modify the packing logic, open the silk/site/enhanced/packlogic-enhanced.c file in a code editor.
    2. If the goal of your change is to add types similar to the inweb and outweb types, create a macro or a function that determines whether a SiLK Flow record meets your criteria. For example, if you want to store DNS data in the types indns and outdns, you may use the macro
      #define RWREC_IS_DNS(r)                                         \
          ((6 == rwRecGetProto(r) || 17 == rwRecGetProto(r))          \
           && (53 == rwRecGetSPort(r) || 53 == rwRecGetDPort(r)))
    3. To make the packing logic easier to follow, we recommend #define-ing macros that reflect the numeric values of the types you defined in Step B.3, such as
      #define RW_IN_WEB 2
    4. Depending on what you are trying to accomplish with your packing logic, you may want to define additional networks. A network is a name that reflects a set of IP addresses or SNMP interfaces. The IPs or interfaces for a network are specified in the sensor.conf file, and the packing logic code compares the record's values to those specified for the network. The values of the NETWORK_ macros and the names in the net_names[] array must be in agreement.
    5. The filetypeFormats[] array reflects the fact that sometimes flow records for a class/type pair use a specific data file format. The number of entries in that array must be equal to the number of types you defined in the silk.conf file. The values in the array are ignored when SiLK is compiled with IPv6 support.
    6. Part of the job of the packLogicSetup() function is to ensure that packing logic plug-in loaded by rwflowpack is in agreement with the silk.conf file. For each type in Steps B.3 and C.3, there should be a statement similar to
      FT_ASSERT(RW_IN_WEB,   "inweb");
      That statement causes rwflowpack to exit with an error if the numeric ID of the inweb type from the silk.conf file is not 2.
    7. The FT_ASSERT macro assumes the class of the data is all. If you define a new class, you will need to replace FT_ASSERT() with a call to sksiteFlowtypeAssert().
    8. The packLogicSetup() function also ensures that filetypeFormats[] array contains the correct number of entries. If your configuration is going to require additional information (say from an external file), the packLogicSetup() function is the best place to load or set that information.
    9. The packLogicTeardown() function is used to clean up any state or memory that the plug-in owns.
    10. The job of the packLogicVerifySensor() function is to ensure that the packing logic code has everything it needs to work correctly by verifying that the user specified the correct values in the sensor.conf file. The function returns 0 to denote okay and non-zero for error. Whether you need to make changes to this function depends on the changes you make elsewhere in the file and how much checking of users' input you wish to do.
    11. The meat of the packing logic is defined in the packLogicDetermineFlowtype() function. The function is called on an individual record, rwrec, that was collected at probe. The function must fill the ftypes and sensorids arrays with the numeric flowtype(s) and numeric sensor ID(s) into which the flow record should be categorized, and it returns the number of entries it added to each array.
      Examine the code in the packLogicDetermineFlowtype() function in both the twoway and generic sites to see examples of how that function is used. The helper functions that start with skpc are defined in the C files in the silk/src/libflowsource directory.
    12. The packLogicDetermineFileFormat() function specifies the file format to use when rwflowpack writes the record to disk. Typically no changes will be required to this function.
    13. Save the packlogic-enhanced.c file.
  4. Build and install
    1. Run the new configure script you created in Step A.3 and verify that the silk/site/enhanced/Makefile file is created.
    2. Run make to compile your code.
    3. Run make install to install the code.
    4. You should be able to run
      $prefix/sbin/rwflowpack \
          --site-conf=$prefix/share/silk/enhanced-silk.conf
      to test the loading of your packing logic.
    5. If necessary, update the sensor.conf file to define and use the new networks you defined in Step C.4.
    6. Use the instructions in the SiLK Installation Handbook as a guide for configuring and running rwflowpack.
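
As a combined sketch of Steps B.3 and C.3/C.6, a hypothetical enhanced site that adds indns and outdns types might contain entries like these (the numeric IDs 12 and 13 are illustrative and must be unique across the site's class/type pairs):

  # In silk/site/enhanced/silk.conf, inside the class block:
  type 12 indns
  type 13 outdns

  /* In packlogic-enhanced.c, IDs matching silk.conf: */
  #define RW_IN_DNS   12
  #define RW_OUT_DNS  13

  /* In packLogicSetup(), verify the plug-in and silk.conf agree: */
  FT_ASSERT(RW_IN_DNS,  "indns");
  FT_ASSERT(RW_OUT_DNS, "outdns");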

Building and Installing

29. Where can I download SiLK?

The latest Open Source version of SiLK and selected previous releases are available from http://tools.netsa.cert.org/silk/download.html.

30. Where can I find RPMs for SiLK?

Because there are many configuration options for SiLK, we recommend that you build your own RPMs as described in the "Create RPMs" section of the SiLK Installation Handbook.

That said, the CERT Forensics Team has a Linux Tools Repository that includes RPMs of SiLK and other NetSA tools.

31. What release of Python do I need if I want to use the PySiLK extension?

The PySiLK extension requires Python 2.4 or later, and Python 2.6 or later is highly recommended. PySiLK is known to work with Python releases up to Python 3.3.

32. When I configure --with-python, I get the error message warning: Not importing directory 'site': missing __init__.py. How do I fix this?

This error message occurs because Python is attempting to treat the site directory in the SiLK source tree as a Python module directory. This happens when you are running Python 2.5 or later and the PYTHONPATH environment variable includes the current working directory. PYTHONPATH values that can cause this error include values that begin or end with a colon (':') and values where some element is a single period ('.').

The solution to this problem is to either unset the PYTHONPATH before running configure, or to ensure that all references to the current working directory are removed from PYTHONPATH before running configure.
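
For example, from a Bourne-style shell:

  $ unset PYTHONPATH
  $ ./configure --with-python ...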

Operations

33. How long would it take to find all the flow records to or from an IP address, when your data size is 10 billion records?

This is a difficult question to answer, because there are so many variables that will affect the results.

On a beefy machine, rwfilter was invoked using the --any-addr switch to look for a /16 (IPv4-only). rwfilter was told only to print the number of records that matched---rwfilter did not produce any other output. Therefore, the times below are only for scanning the input.

rwfilter was invoked with --threads=12 to query a data store of 3260 files that contained 12.886 billion IPv4 records, and rwfilter took 19:18 minutes to run the query. That corresponds to a scan rate of 11.1 million records per second, or 0.927 million records per thread per second.

When the query was run a second time, rwfilter completed in 6:28 minutes, or 2.76 million records per thread per second. This machine has a large disk cache, which is why the second run was so much faster than the first.

For another run, rwfilter was run with a single thread to query 4996 files that contained 3.27 billion IPv4 records, and rwfilter completed the query in 9:10 minutes. That is a scan rate of 5.95 million records per second, which would require approximately 28 minutes to scan 10 billion records.

As seen in this simple example, there are many things that can affect performance. Some items that will affect the run time are:

  • The speed of your processors and your disks, and how many other tasks they are performing.
  • Whether the files being queried are in the machine's disk cache or are being read "cold".
  • The number of threads you tell rwfilter to use. Additional threads can speed rwfilter's processing, but eventually you reach the point of diminishing returns. When we first tested the threading in rwfilter several years ago, we found a sweet spot of about three threads per processor (before the days of commodity multi-core processors).
  • The source of the input. In these test runs, there were a few thousand files to process, and the threading in rwfilter was able to assign the input files to the different threads. If the input was coming from a single source, rwfilter would run in single threaded mode.
  • How much output rwfilter produces. These test runs only reported the number of matching records, but you probably want to output those flow records for further analysis. Consider the two extremes: When the IP address you are searching for does not match any records, the performance of rwfilter will be similar to these test runs. When the IP address matches every record, rwfilter must write all the input records to its output. Producing output slows rwfilter in two ways: first, in writing bytes to the output; second, in increased thread contention as the threads vie for the output stream mutex.

34. How can I improve the performance of the SiLK queries?

As analysts, it seems we spend a lot of time waiting for rwfilter to pull data from the repository. One way to reduce the wait time is to write efficient queries. Here are some good practices to follow:

  1. Only look at the files that have the data you are interested in (see the example after this list).
    • Specify the hour to the --start-date and --end-date switches to reduce the time window.
    • If traffic for the IPs you are interested in normally passes through particular border routers, use the --sensor switch to limit your search to those sensors.
    • Limit the query to the relevant class(es) and type(s). For example, when looking at DNS traffic you do not need the web traffic, so specify --type=in or --type=out to eliminate the web traffic from your data pull.
  2. Instead of repeating the same rwfilter command multiple times and piping the results to different applications, save the rwfilter results to a local file, and use the file as input to the different applications.
  3. Rather than querying the same time range multiple times with slightly different parameters, consolidate the query into a single rwfilter invocation, and then split the result. For example:
    • Instead of issuing two rwfilter commands to pull TCP and then UDP traffic, pull both protocols at once and then split the result:
      $ rwfilter --protocol=6,17 --pass=temp.rw ...
      $ rwfilter --proto=6 --pass=tcp.rw --fail=udp.rw temp.rw
    • If you want to pull data for a set of IP addresses, build an IPset with rwsetbuild, and use one of the set switches on rwfilter:
      $ rwsetbuild myips.txt myset.set
      $ rwfilter ... --dipset=myset.set
  4. Take advantage of additional filtering options for your initial pull to restrict the query to the traffic of interest.
    • You can use country code and protocol to restrict the traffic in a coarse-grained way; i.e., cast a sufficiently broad net so you don't have to re-issue queries for the same time period.
    • If you are only interested in completed TCP connections, you can filter using TCP flags (e.g., --flags-initial) and byte and packet counts (e.g., flows with at least 5 packets: --packets=5-).
    • Outgoing traffic is usually smaller than incoming, due to incoming scan traffic. If you are looking at TCP traffic and you just need evidence of communication, consider specifying the outgoing types (--type=out,outweb) rather than incoming.
  5. Instead of using IPsets, consider using the --tuple options to rwfilter. The tuple options allow you to search both directions at once and to limit your search to traffic between particular IP addresses and/or particular ports.
  6. Sometimes it is easier to specify what you don't need. Use the --fail switch on rwfilter to select the flows that don't match the partitioning parameters.
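
For example, a pull that follows the first practice, restricting a DNS query to a two-hour window, a single sensor, and the non-web incoming type (the dates and sensor name are hypothetical):

  $ rwfilter --start-date=2014/03/05:14 --end-date=2014/03/05:15 \
        --sensor=S3 --type=in --protocol=17 --dport=53           \
        --pass=dns.rw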

35. How are the SiLK Flow files organized and written to disk?

SiLK Flows are stored in binary files, where each file corresponds to a unique class-type-sensor-hour tuple. Multiple data repositories may exist on a machine; however, rwfilter is only capable of examining a single data repository per invocation.

A default repository location is compiled into rwfilter. (This default is set by the --enable-data-rootdir=DIR switch to configure and defaults to /data). You may tell rwfilter to use a different repository by setting the SILK_DATA_ROOTDIR environment variable or specifying the --data-rootdir switch to rwfilter.

The structure of the directory tree beneath the root is determined by the path-format entry in the silk.conf file for each data repository. Traditionally, the directory structure has been /DATA_ROOTDIR/class/type/year/month/day/hourly-files
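
For example, either of the following directs rwfilter at an alternate repository (the path is hypothetical):

  $ export SILK_DATA_ROOTDIR=/data2
  $ rwfilter ...

  $ rwfilter --data-rootdir=/data2 ...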

36. How many bytes does a single SiLK Flow record occupy on disk?

A fully expanded, uncompressed SiLK Flow record requires 52 bytes (88 bytes for IPv6 records). These records are written by rwcat --compression=none.

Records in the SiLK data repository require less space since common attributes (sensor, class, type, hour) are stored once in the file's header. The smallest record (uncompressed) in the data repository is that representing a web flow, which requires only 22 bytes.

In addition, one can enable data compression in an individual SiLK application (with the --compression-method switch) or in all SiLK applications when SiLK is configured (specify the --enable-output-compression switch when you invoke the configure script). Compression with the lzo1x algorithm reduces the overall file size by about 50%. Using zlib gives a better compression ratio, but at the cost of access time.

The rwfileinfo command will tell you the (uncompressed) size of records in a SiLK file.

37. Where is the SiLK Flow file format documented?

SiLK uses many different file formats: There are file formats for IPsets, for Bags, for Prefix Maps, and for SiLK Flow records. The files that contain SiLK Flow records come in several different formats as well, where the differences include whether

  • the sensor and class/type information is stored on every record or in the file's header
  • the records support the additional flow information that yaf provides
  • the records contain the next hop IP and the router's input and output interface numbers
  • the file contains only flow records on ports 80/tcp, 443/tcp, and 8080/tcp

In addition to various file and record formats, the records in a file may be stored in big endian or little endian byte order. Finally, groups of flow records may be written as a block, where the block is compressed with the zlib or LZO compression libraries.

The recommended way to put one or more files of SiLK Flow records into a known format is to use the rwcat tool. The rwcat command to use is:
rwcat --compression=none --byte-order=big [--ipv4-output] FILE1 FILE2 ...

That command will produce an output stream/file having a standard SiLK header followed by 0 or more records in the format given in the following table. The length of the SiLK header is the same as the size of the records in the file.

When SiLK is not compiled with IPv6 support or the --ipv4-output switch is given, each record will be 52 bytes long, and the header is 52 bytes; otherwise each record is 88 bytes and the file's header is 88 bytes.

The other SiLK Flow file formats are only documented in the comments of the source files. See the rw*io.c files in the silk/src/libsilk directory.

  IPv4 Bytes  IPv6 Bytes  Field         Description
  0-7         0-7         sTime         Flow start time as milliseconds since UNIX epoch
  8-11        8-11        duration      Duration of flow in milliseconds (allows for a 49 day flow)
  12-13       12-13       sPort         Source port
  14-15       14-15       dPort         Destination port
  16          16          protocol      IP protocol
  17          17          class,type    Class & Type (Flowtype) value as set by SiLK packer (integer to name mapping determined by silk.conf)
  18-19       18-19       sensor        Sensor ID as set by SiLK packer (integer to name mapping determined by silk.conf)
  20          20          flags         Cumulative OR of all TCP flags (NetFlow flags)
  21          21          initialFlags  TCP flags in first packet or blank
  22          22          sessionFlags  Cumulative OR of TCP flags on all but initial packet or blank
  23          23          attributes    Specifies various attributes of the flow record
  24-25       24-25       application   Guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems).
  26-27       26-27       n/a           Unused
  28-29       28-29       in            Router incoming SNMP interface
  30-31       30-31       out           Router outgoing SNMP interface
  32-35       32-35       packets       Count of packets in the flow
  36-39       36-39       bytes         Count of bytes on all packets in the flow
  40-43       40-55       sIP           Source IP
  44-47       56-71       dIP           Destination IP
  48-51       72-87       nhIP          Router Next Hop IP

38. What is the format of the header of a binary SiLK file?

Every binary file produced by SiLK (including flow files, IPsets, Bags) begins with a header describing the contents of the file. The header information can be displayed using the rwfileinfo utility. The remainder of this entry describes the binary header that has existed since SiLK 1.0. (This FAQ entry does not apply to the output of rwsilk2ipfix, which is an IPFIX stream.)

The header begins with 16 bytes that have well-defined values. (All values that appear in the header are in network byte order; the header is not compressed.)

  Offset  Length  Field           Description
  0       4       Magic Number    A value to identify the file as a SiLK binary file. The SiLK magic number is 0xDEADBEEF.
  4       1       File Flags      Bit flags describing the file. Currently one flag exists: The least significant bit will be high if the data section of the file is encoded in network (big endian) byte order, and it will be low if the data is little endian.
  5       1       Record Format   The format of the data section of the file; i.e., the type of data that this file contains. This will be one of the fileOutputFormats values defined in the silk_files.h header file. For a file containing IPv4 records produced by rwcat, the value is 0x16 (decimal 22, FT_RWGENERIC). For an IPv6 file, the value is 0x0C (decimal 12, FT_RWIPV6ROUTING).
  6       1       File Version    This describes the overall format of the file, and it is always 0x10 (decimal 16) for any file produced by SiLK 1.0 or later. (The version of the records in the file is at byte offset 14.)
  7       1       Compression     This value describes how the data section of the file is compressed:
                                  0  SK_COMPMETHOD_NONE   no compression
                                  1  SK_COMPMETHOD_ZLIB   libz (gzip) using default compression level
                                  2  SK_COMPMETHOD_LZO1X  lzo1x() method from LZO
  8       4       SiLK Version    The version of SiLK that produced this file. This value is computed by transforming a SiLK version, X.Y.Z, as X*1,000,000 + Y*1,000 + Z. For SiLK 1.2.3, the value is 1,002,003.
  12      2       Record Size     Number of bytes required per record in this file. This is 52 (0x0034) for the current version of FT_RWGENERIC records, and 88 (0x0058) for the current version of FT_RWIPV6ROUTING records. For some files, this value is unused and it is set to 1.
  14      2       Record Version  The version of the record format used in this file. Currently this is 5 for FT_RWGENERIC records and 1 for FT_RWIPV6ROUTING records.
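
As a worked example, you can inspect those 16 bytes with od. The output below is hypothetical, constructed from the table above for an uncompressed, big-endian file of FT_RWGENERIC records written by SiLK 3.8.1 (3,008,001 decimal is 0x002DE601):

  $ od -A d -t x1 -N 16 flows.rw
  0000000 de ad be ef 01 16 10 00 00 2d e6 01 00 34 00 05
  0000016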

Following those 16 bytes are one or more variable-length header entries; each header entry begins with two 4-byte values: the header entry's identifier and the byte length of the header entry (this length includes the two 4-byte values). The content of the header entry follows those 8 bytes. Currently there is no restriction that a header entry begin at a particular offset. The following header entries exist:

  ID  Length    Description
  0   variable  This is the final header entry, and it marks the end of the header. Every SiLK binary file contains this header entry immediately before the data section of the file. The length of this header entry will include padding so that the size of the complete file header is an integer multiple of the record size. Any padding bytes will be set to 0x00.
  1   24        Used by the hourly files located in the data store (/data). This entry contains the starting hour, flowtype, and sensor for the records in that file.
  2   variable  Contains an invocation line, like those captured by rwfilter. This header entry may appear multiple times.
  3   variable  Contains an annotation that was created using the --notes-add switch on several tools. This header entry may appear multiple times.
  4   variable  Used by flowcap to store the name of the probe where flow records were collected.
  5   variable  Used by prefix map files to record the map-name.
  6   16        Used by Bag files (e.g. rwbag) to store the key type, key length, value type, and value length of the entries.
  7   32        Used by some IPset files (e.g. rwset) to describe the structure of the tree that contains the IP addresses.

The minimum SiLK header is 24 bytes: 16 bytes of well-defined values followed by the end-of-header header entry containing no padding.

rwcat will remove all header entries from a file and leave only the end-of-header header entry, which will be padded so that the entire SiLK header is either 52 bytes for IPv4 (FT_RWGENERIC) files or 88 bytes for IPv6 (FT_RWIPV6ROUTING) files.

39. How can I use rwsender to transfer files created by yaf?

The rwsender and rwreceiver daemons are indifferent to the types of files they transfer. However, you must ensure that files are added to rwsender's incoming-directory in accordance with SiLK's directory polling logic.

The SiLK daemons that use directory polling (including rwsender) treat any file whose name does not begin with a dot and whose size is non-zero as a potential candidate for processing. To become an actual candidate for processing, the file must have the same size as on the previous directory poll. Once the file becomes an actual candidate for processing, the daemon will not notice if the file's size and/or timestamp changes.

To work with directory polling, SiLK daemons that write files normally create a zero length placeholder file, create a working file whose name begins with a dot followed by the name of the placeholder file, write the data into the working file, and replace the placeholder file with the working file once writing is complete.

Any process that follows a similar procedure will interoperate correctly with SiLK. Any process that does not risks having its files removed out from under it.

The yaf daemon does not follow this procedure; instead, it uses .lock files. When yaf is invoked with the --lock switch, it creates a flows.yaf.lock file while it is writing data to flows.yaf, and yaf removes flows.yaf.lock once it closes flows.yaf.

For yaf and rwsender to interoperate correctly, an intermediate process is required. The suggested process is the filedaemon program that comes as part of the libairframe library that is bundled with yaf. filedaemon supports the .lock extension, and it can move the completed files from yaf's output directory to rwsender's incoming directory. The important parts of the tool chain resemble:

Tell yaf to use the .lock suffix, and rotate files every 900 seconds:

yaf --out /var/yaf/output/foo --lock --rotate 900 ...

Have filedaemon watch that directory, respect *.lock files, move the files it processes to /var/rwsender/incoming, and run the "no-op" command /bin/true on those files:

filedaemon --in '/var/yaf/output/foo*yaf' --lock   \
    --next /var/rwsender/incoming ...              \
    -- /bin/true

Tell rwsender to watch filedaemon's next directory:

rwsender --incoming-directory /var/rwsender/incoming ...

40. How much disk do I need to store data for a link of a particular size?

There are many factors that determine the amount of space required, including (1) the size of the link being monitored, (2) the link's utilization, (3) the type of traffic being collected and stored (NetFlow-v5, IPFIX-IPv4, or IPFIX-IPv6), (4) the amount of legacy data to store, and (5) the number of flow records generated from the data. The SiLK Provisioning Spreadsheet allows one to see how modifying the first four factors affects the disk space required. (The spreadsheet specifies a value for the fifth factor based on our experience.)

41. How much bandwidth will be used by rwsender?

The factors that affect the bandwidth rwsender requires to transfer flows, collected by a flowcap daemon running near a sensor, to the storage center are nearly identical to those that determine the amount of disk space required (see the previous entry). The SiLK Provisioning Spreadsheet includes bandwidth calculations.

42. What is the latency of the SiLK packing system?

The latency of the packing system (the time from a flow being collected to it being available for analysis in the SiLK data repository) depends on how the packing system has been configured and additional factors. It can be a few seconds for a simple configuration or a few minutes for a complex one.

Before the SiLK packing system sees the flow record, the act of generating a flow record itself involves latency. For a long-lived connection (e.g., ssh), the flow generator (a router or yaf) may generate the flow record 30 minutes after the first packets for that session were seen. The active timeout is defined as the amount of time a flow generator waits before creating a flow record for an active connection.

As described in the SiLK Installation Handbook, there are numerous ways the SiLK packing system can be configured. The latency will depend on the number of steps in your particular collection system.

For each type of configuration, we give a summary, a table itemizing the contributions to the total, and an explanation of those numbers.

rwflowpack only

Latency: typically small, but up to 120 seconds

  Description           Min (s)  Max (s)
  rwflowpack buffering  0        120
  TOTAL                 0        120

For a configuration where rwflowpack collects the flow records itself and packs them directly into the data repository, the latency is typically small, but with the default settings it can be as large as two minutes: As rwflowpack creates SiLK records, it buffers them in memory until it has a 64 KB block of them, and then writes that block to disk. (The buffering improves performance since there is less interaction with the disk. When compression is enabled, the 64 KB blocks can provide for better overall compression.)

If the flow collector is monitoring a busy link, flows arrive quickly and the 64 KB buffers will fill quickly and be written to disk, making the latency small. However, on a less-busy link, the buffers will be slower to fill. In addition, depending on the flow collector's active timeout setting, the flow collector may generate flow records that have a start time in the previous hour. These flows become less frequent as time passes, slowing the rate at which the 64 KB buffers associated with the previous hour's files are filled.

To make certain that flows reach the disk in a timely fashion and to reduce the number of flows that would potentially be lost due to a sudden shutdown of rwflowpack, rwflowpack flushes all its open files every so often. By default, this occurs every 120 seconds. The default can be changed by specifying the --flush-timeout switch on the rwflowpack command line.

If a flow arrives just before rwflowpack flushes the file, it will appear almost instantly, so the minimum latency is 0 seconds. A flow arriving just after the files are flushed could be delayed by 120 seconds.
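
For example, to cap this buffering latency at 30 seconds rather than the default 120, lower the flush timeout (the other switches a complete rwflowpack invocation requires are elided here):

rwflowpack --flush-timeout=30 ...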

flowcap to rwsender/rwreceiver to rwflowpack

Latency: 30 seconds to 255 seconds or more

Description                            Min   Max
flowcap accumulation                     0    60
rwsender directory polling              15    30
waiting for other files to be sent       0    d1
rwsender transmission to rwreceiver      0    15
rwflowpack directory polling            15    30
waiting for other files to be packed     0    d2
rwflowpack buffering                     0   120
TOTAL                                   30   255 + d1 + d2

When flowcap is added to the collection configuration, the latency will be larger. In this configuration, flowcap is used to collect the flows from the flow generator, an rwsender/rwreceiver pair moves the flows from flowcap to rwflowpack, and rwflowpack packs the flows and writes them to the data repository.

flowcap

Once the flow collector generates the flow record, it should arrive at flowcap in negligible time. flowcap accumulates the flows into files for transport to a packing location. The files are released to rwsender once they reach a particular size or after a certain amount of time, whichever occurs first. By default, the timeout is 60 seconds; it can be specified with the --timeout switch on the flowcap command line. Decreasing the timeout has two effects:

  1. Each file has a small header (less than 100 bytes) describing the file. As the file size becomes smaller, the overhead due to the header increases.
  2. Many small files can adversely affect rwsender, as described below.
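
For instance, to have flowcap release files every 30 seconds instead of every 60 (remaining switches elided):

flowcap --timeout=30 ...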

rwsender and rwreceiver

Once flowcap releases the file of accumulated flows, it gets moved to a directory being monitored by an rwsender process. rwsender checks this directory every 15 seconds (by default) to see what files are present. (Specify the --polling-interval switch to change the setting from the default.) If a file's size has not changed since the previous check, rwsender will accept the file for sending to an rwreceiver process. In the best case, a file will be accepted in just over 15 seconds; in the worst case, it can take up to 30 seconds before the file is accepted. In addition, if the directory has a large number of files (a few thousand), the time to scan the directory and determine the size of each file will add measurable overhead to each rwsender directory poll.
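
As an illustration, an rwsender that polls its incoming directory every 5 seconds, trading extra directory scans for lower latency, would be started along these lines:

rwsender --incoming-directory /var/rwsender/incoming \
    --polling-interval 5 ...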

Files in the rwsender queue may not be sent immediately if other files are backlogged, but that number is hard to quantify, so we define it as the delay d1. Under most circumstances, we expect this to be a few seconds at most.

Transmission of a file from rwsender to rwreceiver can be relatively quick if the network lag is low, or slow if there is high network lag. This time is hard to determine without empirical data, and it will vary as the load on the network varies. We do not have any hard data, but our past experiences on our networks say that most files from flowcap make it from rwsender to rwreceiver in less than 15 seconds.

The rwsender process may be configured to send its data to multiple rwreceivers. Although these transfers can happen simultaneously, they may add latency:

  • the increase in traffic from sending to multiple rwreceivers can add load to the network
  • the increase in disk I/O can add load to the system
  • the additional thread(s) may add some small overhead

The administrator can also configure rwsender to prioritize files by filename. For example, if certain sensors contain more time-sensitive (important) data, they can be set to a higher priority. This will cause these files to "jump the queue" over other files, and it will increase the delay of the lower priority files.

rwflowpack

After the file has arrived at rwreceiver, the file is handed off to rwflowpack via another round of directory polling. The same issues exist here that exist for rwsender:

  • It will take two directory scans (up to 30 seconds) for rwflowpack to decide that the file is ready for processing.
  • A large number of files will slow the directory scan.
  • Once accepted, the file could sit in rwflowpack's queue waiting for other files to be processed. We will call this delay d2.

When a single rwflowpack process is packing files from multiple flowcap processes, the directory scan overhead can become large. In addition, the value of d2 is much harder to quantify, as it is an aggregation point from multiple sensors.

Finally, there is the latency associated with rwflowpack itself, as described in the previous section.

The "flooding" problem:
Under most circumstances, the values d1 and d2 should be no more than a few seconds. If part of the system goes down (aside from the flow generator or flowcap, which are injecting flows into the system), or if the network between rwsender and rwreceiver becomes disconnected, the two directory polling locations can act as accumulation points where files pile up (as behind a dam). Once the system is brought back up or the network connection is re-established, the resulting flood can drastically increase d1 and/or d2 and affect downstream latency for all sensors.

rwflowpack to rwsender/rwreceiver to rwflowappend

Latency: 30 seconds to 195 seconds or more

Description                              Min   Max
rwflowpack accumulation                    0   120
rwsender directory polling                15    30
waiting for other files to be sent         0    d3
rwsender transmission to rwreceiver        0    15
rwflowappend directory polling            15    30
waiting for other files to be written      0    d4
TOTAL                                     30   195 + d3 + d4

Some configurations of the SiLK packing system do not use rwflowpack to write to the data repository, but instead use an rwsender/rwreceiver pair between rwflowpack and another tool that writes the SiLK flows to the data repository: rwflowappend.

In this configuration, rwflowpack collects the flows directly from the flow generator (yaf or a router) and writes the flow records to small files called "incremental" files. After some time, rwflowpack releases the incremental files to an rwsender process. rwflowpack's --flush-timeout switch controls this time, and the default is 120 seconds.

The issues detailed above for rwsender/rwreceiver exist here as well, and this rwsender process is more likely to experience the issues related to handling many small files. We call the time that rwsender holds the files prior to transferring them to rwreceiver delay d3. The network transfer from rwsender to one or more rwreceiver processes was discussed above; although this value is hard to quantify and can vary, we will again use 15 seconds for this delay.

rwreceiver places the incremental files into a directory that rwflowappend polls. This could add an additional 30 seconds. The time that rwflowappend holds the files prior to processing them is hard to quantify; we use d4 for this value.

Once rwflowappend begins to process an incremental file, it writes its contents to the appropriate data file in the repository, and then closes the repository file. There should be very little time required for this operation.

flowcap to rwsender/rwreceiver to rwflowpack to rwsender/rwreceiver to rwflowappend

Latency: 60 seconds to 330 seconds or more

Description                              Min   Max
flowcap accumulation                       0    60
rwsender directory polling                15    30
waiting for other files to be sent         0    d1
rwsender transmission to rwreceiver        0    15
rwflowpack directory polling              15    30
waiting for other files to be packed       0    d2
rwflowpack accumulation                    0   120
rwsender directory polling                15    30
waiting for other files to be sent         0    d3
rwsender transmission to rwreceiver        0    15
rwflowappend directory polling            15    30
waiting for other files to be written      0    d4
TOTAL                                     60   330 + d1 + d2 + d3 + d4

For this configuration, we combine the analysis of the previous two configurations. One item to note: Since rwflowpack splits the flows it receives from flowcap into files based on the flowtype (class/type pair) and the hour, a single file rwflowpack receives from flowcap can generate many incremental files to be sent to rwflowappend.

This configuration is also subject to the "flooding" problem when processing is restarted after a stoppage.

43. What confidentiality and integrity properties are provided for SiLK data sent across machines?

The rwsender and rwreceiver programs can use GnuTLS to provide a secure layer over a reliable transport layer. For this support to be available, SiLK's configure script must have found v1.4.1 or later of the GnuTLS library. Using GnuTLS also requires creating certificates, which is described in an appendix of the Installation Handbook.

We recommend creating a local certificate authority (CA) file, and creating program-specific certificates signed by that local CA. The local CA and program-specific certificates are copied onto the machines where rwsender and rwreceiver are running. The local CA acts as a shared secret: it is on both machines and it is used to verify the asymmetric keys between the rwsender and rwreceiver certificates.

If someone else gains access to the local CA, they would not be able to decipher an existing conversation, since each conversation is encrypted with a session key negotiated during the initialization of the TLS session.

However, anyone with access to the CA would be able to set up a new session with an rwsender (to download files) or an rwreceiver (to spoof files). The certificates should be one part of your security; additional measures (such as firewall rules) should be enabled to mitigate these issues.
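
As a sketch of the invocations (paths hypothetical, and switch names as in recent SiLK 3.x releases; confirm against the rwsender and rwreceiver manual pages and the Installation Handbook appendix), each daemon is pointed at the shared CA certificate and its own program-specific credentials:

rwsender --tls-ca /etc/silk/ca-cert.pem \
    --tls-pkcs12 /etc/silk/sender.p12 ...

rwreceiver --tls-ca /etc/silk/ca-cert.pem \
    --tls-pkcs12 /etc/silk/receiver.p12 ...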

When GnuTLS is not used or not available, communication between rwsender and rwreceiver has no confidentiality or integrity checking beyond that provided by standard TCP.

Legacy systems that use a direct connection between flowcap and rwflowpack have no confidentiality or integrity checking beyond that provided by standard TCP, and there is no way to secure this communication without using some outside method (such as creating an ssh tunnel).

44. If communication between the sensor and the packer goes down, are flows lost?

It depends on what you mean by "sensor". If the "sensor" is the flow generator (that is, a router or an IPFIX sensor) which is communicating directly with rwflowpack, the flows are lost when the connection goes down.

To avoid this, you can run flowcap on the sensor. flowcap acts as a flow capacitor, storing flows on the sensor until the communication link between the sensor and packer is restored. Flows will still be lost if the connection between the flow generator and flowcap goes down, but by running flowcap on a machine near the flow generator (or running both on the same machine), the communication between the generator and flowcap should be more reliable, leading to fewer dropped connections.

45. Can flowcap function as a "tee", both storing files and forwarding the flow stream onto some place else?

The flowcap program cannot do this itself; however, the rwsender program can send files to multiple rwreceivers. To get the "tee" functionality, have flowcap drop its files into a directory for processing by rwsender.

46. How do I list all sensors that are installed for a deployment?

The rwsiteinfo command will print information about your site's configuration. To list the sensors and their descriptions, run rwsiteinfo --fields=sensor,describe-sensor.

47. How do I rotate the SiLK log files?

If you invoke a SiLK daemon with the --log-destination=syslog switch, the daemon will use syslog(3) to write its log messages, and syslog will manage log rotation.

If you pass the --log-directory switch to a daemon, the daemon will manage the log files itself. The first message received after midnight local time will cause the daemon to close the current log file, compress it, and open a new log file.
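
For example, the two styles look like the following (the SiLK daemons share these logging switches; the directory shown is hypothetical):

rwflowpack --log-destination=syslog ...

rwflowpack --log-directory=/var/log/silk ...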

Analysis

48. I get an error when I try to use the --python-file switch in the SiLK analysis applications. What is wrong?

PySiLK support involves loading several shared object files, and a misconfiguration can cause PySiLK support to be unavailable. There are several issues that may cause problems when using the --python-file switch.

  1. Make certain the application has PySiLK support. PySiLK support is only available in the following applications: rwfilter, rwcut, rwgroup, rwsort, rwstats, and rwuniq. Note that PySiLK support in rwgroup and rwstats did not exist prior to SiLK 2.0.
  2. Make certain that you compiled SiLK with Python support. To determine if PySiLK support is available, run the command rwcut --version | grep -i pysilk.
    • If the output includes a directory path, PySiLK support was included when you built SiLK. Continue to the next item.
    • If you get the value PySiLK support: no, Python support was not included in your build of SiLK. To get PySiLK support, you need to reconfigure and rebuild SiLK.
  3. Determine whether the application is able to load the silkpython.so plug-in file, which is normally installed in the $prefix/lib/silk/ directory. Run rwcut --help | grep python-file.
    • If there is output from the command, silkpython.so is being properly loaded and you can go to the next item.
    • If there is no output, there is a problem loading the plug-in. To debug the issue, first check to see if other plug-ins are available by running rwcut --plugin=flowrate.so --help | grep payload-rate. If you get output, the problem is limited to PySiLK. Perhaps you need to set the LD_LIBRARY_PATH environment variable to include the location of the Python library (libpython2.so or similar). If you do not get output, there is probably an issue loading all SiLK plug-ins. You may need to set SILK_PATH or set LD_LIBRARY_PATH to include the directory $prefix/lib/silk/. To help debug the issue, you can try running SILK_PLUGIN_DEBUG=1 rwcut --version.
  4. Determine whether the error is in your Python script. Run the command rwcut --python-file=/dev/null --help.
    • If you get the error rwcut: Could not load the "silk.plugin" python module, you need to set the PYTHONPATH environment variable to the directory printed by the rwcut --version command in item 2.
    • If that works, the problem is in your Python file. You may want to set the SILK_PYTHON_TRACEBACK environment variable to get more debugging information.
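
Once PySiLK loads cleanly, a quick end-to-end test is a trivial expression passed to rwfilter via the --python-expr switch, which the same PySiLK plug-in provides (flows.rw is a hypothetical input file):

$ rwfilter --python-expr='rec.dport == 80' \
    --pass=web.rw flows.rw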

49. Someone gave me an IPset file, and my version of the IPset tools will not read the file. What is wrong?

Often an IPset tool (for example, rwsetcat) provides a useful error message when it is unable to read an IPset file (e.g., set1.set), but sometimes the IPset library suppresses the actual error message and you see the generic message "Unable to read IPset from 'set1.set': File header values incompatible with this compile of SiLK".

The tool that can help you determine what is wrong is rwfileinfo. Run rwfileinfo set1.set, and then run rwsetcat --version. There are three things you need to check: the record version, the compression, and IPv6 support.

Record Version: Use the record-version value in the rwfileinfo output and the following table to determine which version of SiLK is required to read the file. The version of SiLK is printed in the first line of the output from rwsetcat --version.

IPset File Version   Minimum SiLK Version
0, 1, 2              any
3                    3.0.0
4                    3.7.0

If your version of SiLK is not new enough to understand the record version, see the end of this answer for possible solutions.

Compression: If SiLK is new enough to understand the record version, next check whether the IPset file is compressed with a library that your version of SiLK does not support. Compare the compression(id) field in the rwfileinfo output with the Available compression methods field in the rwsetcat --version output. If the compression used by the file is not available in your build of SiLK, you will be unable to read the file. See the end of this answer for possible solutions.

(When the compression library is not available in SiLK, running rwfileinfo on set1.set may also report the warning "rwfileinfo: Specified compression method is not available 'set1.set'".)

IPv6: If the record version of the IPset file is 3 or 4, the file may contain IPv6 addresses. To read an IPv6 IPset file, you must use SiLK 3.0.0 or later and your build of SiLK must include support for IPv6 Flow records, which you can determine by checking the IPv6 flow record support field in the output from rwsetcat --version.
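
To pull out just the fields relevant to these checks, recent versions of rwfileinfo accept a --fields switch (treat the exact field names as an assumption and verify them with rwfileinfo --help):

$ rwfileinfo --fields=record-version,compression set1.set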

To check whether an IPset file contains IPv6 addresses, look at the record version and ipset fields of the rwfileinfo output.

Record Version   IPset Field                  Contents
0, 1, 2          not present                  IPv4
3                ...80b nodes...8b leaves     IPv4
3                ...96b nodes...24b leaves    IPv6
4                IPv4                         IPv4
4                IPv6                         IPv6

If the IPset file contains IPv6 addresses, you must use a build of SiLK that includes IPv6 support.

Solutions: There are two solutions to IPset incompatibility.

  • The first is for you to upgrade or rebuild your version of SiLK to include whatever feature is missing. In the output from SiLK's configure script, ensure that the compression library is found or that IPv6 is enabled as necessary.
  • The second is to ask the author of the IPset file to rebuild the file and disable whatever feature is causing issues. If set1.set contains only IPv4 addresses, the author can use the following command to convert it to a file of maximum portability:
    rwsettool --union --record-version=2 --compression-method=none \
        --output-path=set1-new.set set1.set
    If set1.set contains IPv6 addresses, the author should use the following command:
    rwsettool --union --record-version=3 --compression-method=none \
        --output-path=set1-new.set set1.set

50. What do all these time switches on rwfilter do?

The time switches on rwfilter can cause confusion. The --start-date and --end-date switches are selection switches, while the --stime, --etime, and --active-time switches are partitioning switches.

The --start-date and --end-date switches are used only to select hourly files from the data repository, and these switches cannot be used when processing files specified on the command line. The switches take a single date---with an optional hour---as an argument. Since the switches select hourly files, any precision you specify finer than the hour is ignored. The switches cause rwfilter to select hourly files between start-date and end-date inclusive. See the rwfilter manual page for what happens when only --start-date is specified.

The --stime, --etime, and --active-time switches partition flow records. The switches operate on a per-record basis, and they write the record to the --pass or --fail stream depending on the result of the test. These switches take a date-time range as an argument. --stime asks whether the flow record started within the specified range, --etime asks whether the flow record ended within the specified range, and --active-time asks whether any part of the flow record overlaps with the specified range. When a single time is given as the argument, the range contains a single millisecond. The time arguments must have at least day precision and may have up to millisecond precision. When the start of the range is coarser than millisecond precision, the missing values are set to 0. When the end of the range is coarser than millisecond precision, the missing values are set to their maximum.

To query the repository for records that were active during a particular 10 minute window, you would need to specify not only the --start-date switch for the hour but also the --active-time switch that covers the 10 minutes of interest. In addition, note that the repository stores flow records by their start-time, so when using --etime or --active-time, you may need to include the previous hour's files. Flows active during the first 10 minutes of July 2009 can be found by:

rwfilter --start-date=2009/06/30:23 --end-date=2009/07/01:00 \
    --active-time=2009/07/01:00-2009/07/01:00:10 ...

To summarize, it is important to remember the distinction between selection switches and partitioning switches. rwfilter works by first determining which hourly files it needs to process, which it does using the selection switches. Once it has the files, rwfilter then goes through each flow record in the files and uses the partitioning switches to decide whether to pass or fail it.

51. How do the --start-date and --end-date switches on rwfilter affect which files rwfilter examines?

The rules that rwfilter and rwfglob use to select files, given arguments to the --start-date and --end-date switches, can be confusing. The rules are:

  1. When neither start-date nor end-date is given, rwfilter processes files starting from midnight today to the current hour.
  2. When end-date is not specified and start-date is specified as YYYY/MM/DD (to day precision), files for that complete day are processed.
  3. When end-date is not specified and start-date is specified as YYYY/MM/DD:HH (to hour precision) or as seconds since the UNIX epoch, files for that single hour are processed.
  4. When both start-date and end-date are specified as YYYY/MM/DD, files for the hours YYYY/MM/DDstart:00 to YYYY/MM/DDend:23 are processed.
  5. When both start-date and end-date are specified as YYYY/MM/DD:HH, files for all hours within that time range are processed.
  6. When both start-date and end-date are specified as seconds since the UNIX epoch, files for all hours within that time range are processed.
  7. When start-date is specified as YYYY/MM/DD and end-date is specified as YYYY/MM/DD:HH, the hour on the end-date is ignored and Rule 4 is followed.
  8. When start-date is specified as YYYY/MM/DD:HH and end-date is specified as YYYY/MM/DD, the hour of the start-date is used as the hour for the end-date and Rule 5 is followed.
  9. When end-date is specified as seconds since the UNIX epoch, the start-date is considered to be in hour precision and Rule 5 is followed.
  10. When start-date is specified in epoch seconds and end-date is specified as either YYYY/MM/DD or YYYY/MM/DD:HH, the start-date is checked to see if it is evenly divisible by 86400. If it is, the start-date is considered to be in day precision, the hour on the end-date (if any) is ignored, and Rule 4 is followed. If the start-date is not evenly divisible by 86400, the start-date is considered to be in hour precision and either Rule 5 (if end-date includes an hour) or Rule 8 (if end-date has no hour) is followed.
  11. It is an error to specify end-date without specifying start-date.

The following examples may make the rules more clear. An entry such as 20090213.14 denotes the hourly repository files for 2009-02-13, hour 14. The epoch values used below are: 1234483200 is 2009-02-13 00:00:00, 1234533600 is 2009-02-13 14:00:00, 1234540800 is 2009-02-13 16:00:00, and 1234569600 is 2009-02-14 00:00:00.

--start-date omitted:
  • --end-date omitted: today's files
  • any --end-date: Error! May not have end-date without start-date

--start-date=2009/02/13:
  • --end-date omitted:        20090213.00 through 20090213.23
  • --end-date=2009/02/13:     20090213.00 through 20090213.23
  • --end-date=2009/02/14:     20090213.00 through 20090214.23
  • --end-date=1234569600:     20090213.00 through 20090214.00 (epoch end-date forces start-date to hour precision)
  • --end-date=2009/02/13T16:  20090213.00 through 20090213.23 (end-date hour ignored since start-date has no hour)
  • --end-date=1234540800:     20090213.00 through 20090213.16 (epoch end-date forces start-date to hour precision)

--start-date=1234483200 (falls on a day boundary):
  • --end-date omitted:        20090213.00
  • --end-date=2009/02/13:     20090213.00 through 20090213.23
  • --end-date=2009/02/14:     20090213.00 through 20090214.23
  • --end-date=1234569600:     20090213.00 through 20090214.00
  • --end-date=2009/02/13T16:  20090213.00 through 20090213.23 (end-date hour ignored since epoch start-date falls on a day boundary)
  • --end-date=1234540800:     20090213.00 through 20090213.16

--start-date=2009/02/13T00:
  • --end-date omitted:        20090213.00
  • --end-date=2009/02/13:     20090213.00 (end-date hour set to the start-date hour)
  • --end-date=2009/02/14:     20090213.00 through 20090214.00 (end-date hour set to the start-date hour)
  • --end-date=1234569600:     20090213.00 through 20090214.00
  • --end-date=2009/02/13T16:  20090213.00 through 20090213.16
  • --end-date=1234540800:     20090213.00 through 20090213.16

--start-date=2009/02/13T14:
  • --end-date omitted:        20090213.14
  • --end-date=2009/02/13:     20090213.14 (end-date hour set to the start-date hour)
  • --end-date=2009/02/14:     20090213.14 through 20090214.14 (end-date hour set to the start-date hour)
  • --end-date=1234569600:     20090213.14 through 20090214.00
  • --end-date=2009/02/13T16:  20090213.14 through 20090213.16
  • --end-date=1234540800:     20090213.14 through 20090213.16

--start-date=1234533600 (does not fall on a day boundary): same results as --start-date=2009/02/13T14.
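
To see exactly which hourly files a given pair of switches selects, without reading any flow records, pass the same switches to rwfglob; it prints the file names and exits. For example:

$ rwfglob --start-date=2009/02/13T14 --end-date=2009/02/13T16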

52. Why does --type=inweb contain non-web data?

SiLK categorizes a flow as web if the protocol is TCP and either the source port or destination port is one of 80, 443, or 8080. Since SiLK does not inspect the contents of packets, it cannot ensure that only HTTP traffic is written to this type, nor can it find HTTP traffic on other ports.
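
Since the categorization is port-based, you can re-partition the records yourself if you need a stricter notion of web traffic; for example, to keep only the records that used the standard web ports over TCP:

$ rwfilter --type=inweb --start-date=2009/02/13 \
    --proto=6 --aport=80,443,8080 --pass=web-only.rw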

53. How can I make rwfilter always process incoming and outgoing data?

Using the default settings, rwfilter will only examine incoming data unless you specify the --types or --flowtypes switch on its command line. To have rwfilter always examine incoming and outgoing data, modify the silk.conf file at your site. Find the default-types statement in that file, and modify it to include out outweb outicmp.
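
For example, if your site's incoming types are in, inweb, and inicmp, the modified statement would read as follows (the exact type names depend on your site definition):

default-types in inweb inicmp out outweb outicmp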

54. Why do different installations of SiLK show different timestamps when viewing the same file and how can I fix this?

SiLK stores timestamps as seconds since midnight UTC on Jan 1, 1970 (the UNIX epoch), but these timestamps may be displayed differently depending on how SiLK was configured when it was installed, on your environment variable settings, and on command line switches.

When your administrator built SiLK, she configured it to use either UTC or the local timezone by default (the --enable-localtime switch to configure controls this). To see which setting is enabled at your site, check the Timezone support value in the output from rwfilter --version.

If one or more of your different installations of SiLK are configured to use localtime and the timezones are not identical, the displayed timestamps will be different. There are several work-arounds to make the displayed times agree.

  1. When a SiLK installation uses the localtime setting, setting the TZ environment variable modifies the timezone in which timestamps are displayed. In particular, setting TZ to 0 causes timestamps to be displayed in UTC.
  2. The --timestamp-format switch can be used to override the timezone setting in SiLK. Specifying --timestamp-format=utc shows times in UTC, while --timestamp-format=local causes the timestamps to be displayed in the local timezone (subject to modification by the TZ environment variable).
  3. Using --timestamp-format=epoch displays the timestamps using SiLK's internal representation. (For more on the --timestamp-format switch, see rwcut.)
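
For instance, to check how a single record renders under each setting (my-data.rw is a hypothetical file):

$ rwcut --fields=stime --num-recs=1 --timestamp-format=utc my-data.rw
$ rwcut --fields=stime --num-recs=1 --timestamp-format=local my-data.rw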

Finally, note that the timezone setting also affects how tools such as rwfilter parse the timestamps you specify on the command line. If SiLK is configured to use localtime, the timestamps are parsed in the local timezone. In this case, you can use the TZ environment variable to modify which timezone is applied when the times are parsed. Alternatively, you can convert the times to UTC or to seconds since the UNIX epoch yourself. On Linux:

$ date -ud 'TZ="EST5EDT" 2015/12/08 11:33:44' '+%Y/%m/%dT%H'
2015/12/08T16
$ date -ud 'TZ="EST5EDT" 2015/07/08 11:33:44' '+%Y/%m/%dT%H'
2015/07/08T15

With the Perl Time::ParseDate module:

$ /opt/local/bin/perl -MTime::ParseDate -lwe \
    '$,=$/; print scalar gmtime parsedate("2015/08/08 14:15:16", ZONE => "EST");'
Sat Aug 8 19:15:16 2015

With the Python calendar and dateutil modules:

>>> print calendar.timegm(dateutil.parser.parse("2015/08/08 14:15:16").timetuple())
1439043316

55. How do I import flow data into Excel?

To get SiLK Flow data into Excel, use the rwcut command to convert the binary SiLK data to a textual CSV (comma-separated value) file, and import that file into Excel. You need to provide the --delimited=, and --timestamp-format=iso switches to rwcut. Use the --output-path=FILE.csv switch to have rwcut write its output to a file.
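
Putting those switches together (my-data.rw is a hypothetical input file):

$ rwcut --delimited=, --timestamp-format=iso \
    --output-path=my-data.csv my-data.rw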

56. How can I use plug-ins (or dynamic-libraries) to extend the SiLK tools?

Several of the SiLK tools support extending their capabilities by writing code and loading that code into the application:

rwfilter
New ways to partition the flow records into the pass-destination and fail-destination can be defined.
rwcut
New textual column(s) can be displayed for each flow record.
rwsort
Sort-order can be determined by a derived attribute of the flow records.
rwuniq
New fields for binning the flow records can be defined and printed, and new value fields that compute an aggregate value across the bins can be defined and printed.
rwstats
New fields for binning the flow records can be defined and printed, and new value fields that compute an aggregate value across the bins can be defined and printed. In addition, the output can be sorted using the aggregate field.
rwgroup
New fields for binning the flow records can be defined.

The code for these extensions can be written either in C or in Python. (To use Python, SiLK must have been built with the Python extension, PySiLK. See the Installation Handbook for the instructions.)

To use C, one writes the code, compiles it into a shared object, and loads the shared object into the application using the --plugin switch. This process is documented in the silk-plugin(3) manual page.

To use Python, one writes the code and loads it into the application using the --python-file switch. This process is documented in the silkpython(3) manual page.
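
For example, given a compiled plug-in my-fields.so or a PySiLK file my-fields.py (both hypothetical), the two loading mechanisms look like:

$ rwcut --plugin=my-fields.so --fields=... my-data.rw
$ rwcut --python-file=my-fields.py --fields=... my-data.rw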

57. How do I convert packet data (pcap) to flows?

There are four ways to handle pcap files.

  1. The first approach does not require any software outside of SiLK; however, it does require that SiLK is built with pcap support; that is, that libpcap and pcap.h existed when SiLK was compiled. For this approach, use the rwptoflow program to convert each packet to a SiLK Flow record. Note that rwptoflow does not reassemble fragmented packets, and it does not combine packets into a flow. It simply converts each pcap record into a 1-packet SiLK Flow record.
    $ rwptoflow --flow-output=my-data.rw my-data.pcap
  2. The second and third approaches both use the yaf program (from the YAF suite), and they both require that SiLK be built with IPFIX support (provided by libfixbuf; see the Installation Handbook for details on compiling SiLK with libfixbuf). This second approach works well if you have a small number of pcap files covering a fairly small time window. Invoke yaf to convert the pcap data to the IPFIX format, and use rwipfix2silk to convert from IPFIX to SiLK Flow records. For maximum compatibility, you should pass the --silk switch to yaf.
    $ yaf --silk --in=my-data.pcap --out=- | rwipfix2silk > my-data.rw
    To make this task easier, SiLK provides the rwp2yaf2silk Perl script which is a wrapper around the calls to those two tools. (For rwp2yaf2silk to work, both yaf and rwipfix2silk must be on your $PATH.)
    $ rwp2yaf2silk --in=my-data.pcap --out=my-data.rw
  3. The third approach uses rwflowpack to create a repository of SiLK Flow data, and this approach is suggested when you have many pcap files spanning a large time window. For this approach, use yaf to convert the pcap data to files of IPFIX records, and have rwflowpack convert the IPFIX files to a repository of SiLK Flow data. rwflowpack requires a sensor.conf file that describes how to define incoming and outgoing data. The following sensor.conf file categorizes all data as moving between two hosts outside our network at sensor S0. To query this data with rwfilter, specify --type=ext2ext.
    probe S0 ipfix
        poll-directory /tmp/rwflowpack/incoming
    end probe
    sensor S0
        ipfix-probes S0
        source-network external
        destination-network external
    end sensor
    Have yaf write the IPFIX files into the directory specified in the sensor.conf file.
    $ yaf --silk --in=my-data.pcap \
        --out=/tmp/rwflowpack/incoming/my-data.yaf
    The invocation of rwflowpack will resemble
    $ rwflowpack --sensor-conf=sensor.conf --root-directory=/data \
        --log-directory=/tmp/rwflowpack/log
  4. The final approach is to use third-party software to convert the pcap data to NetFlow v5 data, and use rwflowpack to convert the NetFlow v5 data to a repository of SiLK Flow data.

58. What is the difference between rwp2yaf2silk and rwptoflow?

Both rwp2yaf2silk and rwptoflow read a packet capture file and produce SiLK Flow records. The primary difference is that rwp2yaf2silk assembles multiple packets into a single flow record, whereas rwptoflow does not; instead, it simply creates a 1-packet flow record for every packet it reads.

If both tools are available, rwp2yaf2silk is usually the better tool, but rwptoflow can be useful if you want to use the SiLK Flow records as an index into the pcap file (for example, when using rwpmatch).

Behind the scenes, rwp2yaf2silk is a Perl script that invokes the yaf and rwipfix2silk programs, so both of those programs must exist on your PATH. rwptoflow is a compiled C program that uses libpcap directly to read the pcap file.

59. I have data in some other format. How do I incorporate that into SiLK?

Please see the Converting data to SiLK format entry on our Tooltips wiki.

60. How do I make lists of IP addresses and label them?

A prefix map file in SiLK provides a label for every IPv4 address. (We have not yet extended prefix map files to support IPv6 addresses.) Use the rwpmapbuild tool to convert a text file of CIDR-block/label pairs to a binary prefix map file. The rwcut, rwfilter, rwuniq, and rwsort tools provide support for printing, partitioning by, binning by, and sorting by the labels you defined.
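
As a sketch of the workflow (file names hypothetical, and switch names per the rwpmapbuild documentation for SiLK 3.x; verify against rwpmapbuild --help), write the CIDR-block/label pairs to a text file and compile it into a prefix map:

$ cat labels.txt
default           external
10.0.0.0/8        internal
192.168.0.0/16    internal
$ rwpmapbuild --input-path=labels.txt --output-path=labels.pmap

The resulting labels.pmap is then passed to the analysis tools listed above via their prefix-map switches; see the prefix map documentation for the field and switch names.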

61. How do I mate unidirectional flows to get both sides of the conversation?

The rwmatch program can be used to mate flows. Create two files that contain the data you are interested in mating. Use rwsort to order the records in each file. (When matching TCP and/or UDP flows, the recommended sort order is shown below.) Run rwmatch over the sorted files to mate the flows. rwmatch writes a match parameter into the next hop IP field on each record that it matches. When using rwcut to display the output file produced by rwmatch, consider using the cutmatch.so plug-in to display the match parameter that rwmatch writes into the next hop IP field.

$ rwsort --fields=1,4,2,3,5,9  incoming.rw > incoming-query.rw
$ rwsort --fields=2,3,1,4,5,9  outgoing.rw > outgoing-response.rw
$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4  \
    incoming-query.rw outgoing-response.rw mated.rw
$ rwcut --plugin=cutmatch.so --fields=1,3,match,2,4,5 mated.rw

62. I have SiLK deployed in an asymmetric routing environment, can I mate across sensors?

Yes, you can use the rwmatch program as described in the previous FAQ entry to mate across sensors.

63. How can I create obfuscated (anonymized) data?

The rwrandomizeip application will obfuscate the source and destination IP addresses in a SiLK data file. It can operate in one of two modes:

  1. In default mode, rwrandomizeip substitutes a pseudo-random, non-routable IP address for each source and destination IP address it sees. An IP address that appears multiple times in the input will be mapped to a different output address each time, and no structural information in the input will be maintained.
  2. In consistent mode, rwrandomizeip creates four shuffle tables, each having 256 entries where the value is a pseudo-random value from 0 to 255. These tables represent the possible values for each octet in an IPv4 address. rwrandomizeip uses the tables to modify the IP addresses in a consistent way, which allows a conversation between two IP addresses to be visible in the anonymized data.

In addition, note that the file's header may contain information that you would rather not make public (such as a history of commands). You can use rwfileinfo to see these headers. To remove the headers, invoke rwcat on the file.
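
A minimal sketch of the whole procedure (file names hypothetical; check rwrandomizeip(1) for the exact invocation and for the switch that selects consistent mode):

$ rwrandomizeip my-data.rw my-data-anon.rw
$ rwcat --output-path=my-data-clean.rw my-data-anon.rw

The rwcat pass rewrites the records into a new file whose header no longer carries the accumulated command history.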

For a different approach, consider converting the data to text with rwcut, obfuscating the IPs, and then converting back to SiLK format with rwtuc. The procedure is documented in this tooltip.

64. How secure is the anonymized data?

Anonymizing/obfuscating data is hard. You should be cautious about how widely you distribute data that rwrandomizeip has processed:

  • The rwrandomizeip program only anonymizes the source and destination IP address. Any additional information in the data (such as the existence of services that run on well known ports or protocols) is still visible.
  • In consistent mode, the data is much less random, since the value in an octet is always mapped to the same value. Given the structure of IP addresses on the Internet, reversing the mapping would not be difficult.
  • The default mode does not suffer from that problem, but you cannot do any meaningful traffic analysis on the anonymized data since the mapping is not consistent.