CERT NetSA Security Suite 
Open Source Tools for Network Monitoring 
SiLK 2.1.0 | YAF 1.0.0.2 | IPA 0.4.0 | fixbuf 0.8.0 | Portal 0.9.0 | RAVE 1.9.16 | iSiLK 0.1.6
SiLK - Frequently Asked Questions
Background
1. What is SiLK?
2. Does SiLK support IPv6?
3. What platforms does SiLK run on?
4. What license is SiLK released under?
5. Something is not working as expected, where can I check for errors?
6. Whom do I contact for support?
7. How do I report a bug?
8. How do I contribute a patch or fix?
9. How do I reference SiLK in a publication?
Configuration
10. What is network flow data?
11. What applications and hardware can generate the flows for use in SiLK?
12. What is the NetFlow v5 format?
13. Why does SiLK create unidirectional flows?
14. Can I make it bidirectional?
15. I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?
16. Why is rwflowpack discarding the flow interfaces and Next Hop IP?
17. What is IPFIX?
18. How many sensors does SiLK support?
19. Can I copy SiLK data between machines?
20. What ports do I need to open in a firewall?
21. Can I split flows seen by one flow meter into different sensors?
Building and Installing
22. Where can I download SiLK?
23. Where can I find RPMs for SiLK?
24. When I configure --with-python, I get error messages saying, "warning: Not importing directory 'site': missing __init__.py". How do I fix this?
Operations
25. How can I improve the performance of the SiLK queries?
26. How are the SiLK Flow files organized and written to disk?
27. How many bytes does a single SiLK Flow record occupy on disk?
28. Where is the SiLK Flow file format documented?
29. How much disk space do I need to store data from a link of size xxx?
30. How much bandwidth will be used by rwsender?
31. What is the latency of the SiLK packing system?
32. What confidentiality and integrity properties are provided for SiLK data sent across machines?
33. If communication between the sensor and the packer goes down, are flows lost?
34. Can flowcap function as a "tee", both storing files and forwarding the flow stream onto some place else?
35. How do I list all sensors that are installed for a deployment?
36. How do I rotate the SiLK log files?
Analysis
37. Why does --type=inweb contain non-web data?
38. How do I import flow data into Excel?
39. How do I convert packet data (pcap) to flows?
40. I have data in some other format. How do I incorporate that into SiLK?
41. How do I make lists of IP addresses and label them?
42. How do I mate unidirectional flows to get both sides of the conversation?
43. I have SiLK deployed in an asymmetric routing environment, can I mate across sensors?
44. How can I create obfuscated (anonymized) data?
45. How secure is the anonymized data?

Background

^ 1. What is SiLK?

SiLK is a suite of network traffic collection and analysis tools developed and maintained by the CERT Network Situational Awareness Team (CERT NetSA) at Carnegie Mellon University to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets.

^ 2. Does SiLK support IPv6?

Support for IPv6 was added in SiLK 1.0 to the tools listed below. To use IPv6, SiLK must be configured for IPv6 by using the --enable-ipv6 switch to the configure script when you are building SiLK. See the Installation Handbook for details.

rwfilter rwcount rwstats
rwsort rwtuc rwgroup
rwuniq rwappend rwnetmask
rwcut rwcat

^ 3. What platforms does SiLK run on?

SiLK should run on most UNIX-like operating systems. It is most heavily tested on Linux, Solaris, and Mac OS X.

^ 4. What license is SiLK released under?

SiLK is released under two licenses:

  • GNU Public License (GPL) Rights pursuant to Version 2, June 1991
  • Government Purpose License Rights (GPLR) pursuant to DFARS 252.227.7013

^ 5. Something is not working as expected, where can I check for errors?

The applications that make up the packing system (flowcap, rwflowpack, rwflowappend, rwsender, and rwreceiver) write error messages to log files. The location of these log files is set when the daemon is started, with the default location being /usr/local/var/silk.

All other applications write error messages to the standard error (stderr).

^ 6. Whom do I contact for support?

Your primary support person should be the person or group that installs and maintains SiLK at your site. Also see the answer to the following question.

^ 7. How do I report a bug?

Send a detailed bug report to contact_email. Your bug report should include:

  • the operating system you are using (the output of uname -a)
  • the version of the tool that is causing the bug. You can determine this by running TOOL --version, e.g., rwfilter --version. Include the entire output so we will know what optional features the tool may be using.
  • If you cannot run TOOL --version or it exits without printing anything, send the output of ldd TOOL (or the ldd equivalent on your operating system).
  • If you cannot build the tool, the version of SiLK you are attempting to install and the complete error message that make gives you.
  • the exact command line that caused the problem. If the command is part of a pipeline, include the entire pipeline since the bug may be caused by something happening upstream. You may obfuscate IP addresses or sensor names in the command, but please let us know that you have modified the command.
  • the complete error message you receive
  • If the command is reading SiLK data files, the output of running rwfileinfo on those files may be helpful.

You can help us help you by writing an effective bug report.

^ 8. How do I contribute a patch or fix?

We welcome bug fixes and patches. You may send them to contact_email.

^ 9. How do I reference SiLK in a publication?

The BibTeX entry format would be:

@MISC{SiLK,
 author = "CERT/NetSA at Carnegie Mellon University",
 title = "{SiLK (System for Internet-Level Knowledge)}",
 howpublished = "[Online]. Available:
    \url{http://tools.netsa.cert.org/silk}.",
 note = "[Accessed: July 13, 2009]"}

Update the "Accessed" date to the day you accessed the SiLK website, and then you can cite the software in a LaTeX document using \cite{SiLK}.

The final output should look like this:

CERT/NetSA at Carnegie Mellon University. SiLK (System for Internet-Level Knowledge). [Online]. Available: http://tools.netsa.cert.org/silk. [Accessed: July 13, 2009].

Configuration

^ 10. What is network flow data?

(Taken from Chapter 2 of the SiLK Analysts' Handbook [PDF].) NetFlow is a traffic-summarization format that was first implemented by Cisco Systems, primarily for billing purposes. Network flow data (or Network flow) is a generalization of NetFlow.

Network flow collection differs from direct packet capture, such as tcpdump, in that it builds a summary of communications between sources and destinations on a network. This summary covers all traffic matching seven particular keys that are relevant for addressing: the source and destination IP addresses, the source and destination ports, the protocol type, the type of service, and the interface on the router. We use five of these attributes to constitute the flow label in SiLK: the source and destination addresses, the source and destination ports, and the protocol. These attributes (sometimes called the 5-tuple), together with the start time of each network flow, distinguish network flows from each other.

A network flow often covers multiple packets, which are grouped together under a common flow label. A flow record thus provides the label and statistics on the packets that the network flow covers, including the number of packets covered by the flow, the total number of bytes, and the duration and timing of those packets. Because network flow is a summary of traffic, it does not contain packet payload data.
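
As a rough illustration of the summarization described above, the following Python sketch groups packets by their 5-tuple and accumulates the packet count, byte count, and first and last times for each label. The Packet structure and the sample values are hypothetical and are not part of SiLK; a real flow meter also splits flows using timeouts, which this sketch ignores.

```python
from collections import namedtuple

# Hypothetical packet representation used only for this sketch.
Packet = namedtuple("Packet", "time sip dip sport dport proto length")

def summarize(packets):
    """Group packets by 5-tuple and accumulate per-flow statistics."""
    flows = {}
    for p in packets:
        label = (p.sip, p.dip, p.sport, p.dport, p.proto)
        flow = flows.get(label)
        if flow is None:
            # First packet seen for this label starts a new flow record.
            flows[label] = {"start": p.time, "end": p.time,
                            "packets": 1, "bytes": p.length}
        else:
            flow["start"] = min(flow["start"], p.time)
            flow["end"] = max(flow["end"], p.time)
            flow["packets"] += 1
            flow["bytes"] += p.length
    return flows

# Made-up packets: two in one direction, one in the other.
packets = [
    Packet(0.0, "10.0.0.1", "192.0.2.7", 40000, 80, 6, 60),
    Packet(0.1, "192.0.2.7", "10.0.0.1", 80, 40000, 6, 1500),
    Packet(0.2, "10.0.0.1", "192.0.2.7", 40000, 80, 6, 500),
]
flows = summarize(packets)
for label, stats in flows.items():
    print(label, stats)
```

The three packets produce two unidirectional flow records, one per direction, and no payload is retained.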

^ 11. What applications and hardware can generate the flows for use in SiLK?

SiLK accepts flows in the NetFlow v5 format from a router. These flows are sometimes called Protocol Data Units (PDU). You can also find software that will generate NetFlow v5 records from various types of input.

When compiled with libfixbuf support, SiLK can accept NetFlow v9 flows and flows in the IPFIX (Internet Protocol Flow Information eXport) format. You can use the yaf flow meter to generate IPFIX flows from libpcap (tcpdump) data or by live capture.

^ 12. What is the NetFlow v5 format?

The NetFlow v5 header and record formats are given in the following tables, which were copied from Cisco (October 2009). A NetFlow v5 packet has a 24-byte header and up to thirty 48-byte records, so the maximum NetFlow v5 packet is 24 + 30 * 48 = 1464 bytes. The record table also lists the SiLK field name, where applicable, but note that SiLK packs the fields differently than NetFlow.

NetFlow v5 header format:

Bytes  Contents           Description
0-1    version            NetFlow export format version number
2-3    count              Number of flows exported in this packet (1-30)
4-7    SysUptime          Current time in milliseconds since the export device booted
8-11   unix_secs          Current count of seconds since 0000 UTC 1970
12-15  unix_nsecs         Residual nanoseconds since 0000 UTC 1970
16-19  flow_sequence      Sequence counter of total flows seen
20     engine_type        Type of flow-switching engine
21     engine_id          Slot number of the flow-switching engine
22-23  sampling_interval  First two bits hold the sampling mode; remaining 14 bits hold value of sampling interval

NetFlow v5 record format:

Bytes  Contents   SiLK Field  Description
0-3    srcaddr    sIP         Source IP address
4-7    dstaddr    dIP         Destination IP address
8-11   nexthop    nhIP        IP address of next hop router
12-13  input      in          SNMP index of input interface
14-15  output     out         SNMP index of output interface
16-19  dPkts      packets     Packets in the flow
20-23  dOctets    bytes       Total number of Layer 3 bytes in the packets of the flow
24-27  First      sTime       SysUptime at start of flow
28-31  Last       eTime       SysUptime at the time the last packet of the flow was received
32-33  srcport    sPort       TCP/UDP source port number or equivalent
34-35  dstport    dPort       TCP/UDP destination port number or equivalent
36     pad1       -           Unused (zero) bytes
37     tcp_flags  flags       Cumulative OR of TCP flags
38     prot       protocol    IP protocol type (for example, TCP = 6; UDP = 17)
39     tos        n/a         IP type of service (ToS)
40-41  src_as     n/a         Autonomous system number of the source, either origin or peer
42-43  dst_as     n/a         Autonomous system number of the destination, either origin or peer
44     src_mask   n/a         Source address prefix mask bits
45     dst_mask   n/a         Destination address prefix mask bits
46-47  pad2       -           Unused (zero) bytes

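
To make the byte offsets in the tables concrete, here is a minimal Python sketch that unpacks a NetFlow v5 packet with the standard struct module. It is not how SiLK itself reads NetFlow; the function name, the dictionary keys, and the hand-built sample packet are assumptions made for illustration only.

```python
import socket
import struct

# Layouts follow the header and record tables above (network byte order).
HEADER = struct.Struct("!HHIIIIBBH")                 # 24 bytes
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48 bytes
MAX_RECORDS = 30

def parse_netflow_v5(data):
    """Return (header, records) parsed from one NetFlow v5 packet."""
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = HEADER.unpack_from(data, 0)
    if version != 5:
        raise ValueError("not a NetFlow v5 packet")
    if not 1 <= count <= MAX_RECORDS:
        raise ValueError("record count out of range")
    header = {
        "count": count,
        "sys_uptime": sys_uptime,
        "unix_secs": unix_secs,
        "sampling_mode": sampling >> 14,         # first two bits
        "sampling_interval": sampling & 0x3FFF,  # remaining 14 bits
    }
    records = []
    for i in range(count):
        f = RECORD.unpack_from(data, HEADER.size + i * RECORD.size)
        records.append({
            "sIP": socket.inet_ntoa(f[0]),
            "dIP": socket.inet_ntoa(f[1]),
            "nhIP": socket.inet_ntoa(f[2]),
            "in": f[3], "out": f[4],
            "packets": f[5], "bytes": f[6],
            "first": f[7], "last": f[8],   # SysUptime values
            "sPort": f[9], "dPort": f[10],
            "flags": f[12], "protocol": f[13],
        })
    return header, records

# Build a one-record packet by hand to exercise the parser (made-up values).
sample = HEADER.pack(5, 1, 1000, 1247486400, 0, 1, 0, 0, 0) + RECORD.pack(
    socket.inet_aton("10.0.0.1"), socket.inet_aton("192.0.2.7"),
    socket.inet_aton("0.0.0.0"), 1, 2, 3, 180, 100, 900,
    40000, 80, 0, 0x1B, 6, 0, 0, 0, 0, 0, 0)
header, records = parse_netflow_v5(sample)
print(header)
print(records[0])
```

The two struct sizes (24 and 48 bytes) match the sizes stated above, and 24 + 30 * 48 gives the 1464-byte maximum.
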
^ 13. Why does SiLK create unidirectional flows?

SiLK's origins are in processing NetFlow v5 data, which is unidirectional. Changing SiLK to support bidirectional flows would be a major change to the software. Even if SiLK supported bidirectional flows, you would still face the task of mating flows, since a site with many access points to the Internet will display asymmetric routing (where each half of a conversation passes through a different border router).

^ 14. Can I make it bidirectional?

No, SiLK does not support bidirectional flows. You will need to mate the unidirectional flows, as described in question 42 (How do I mate unidirectional flows to get both sides of the conversation?).

^ 15. I have a stack of pcap (tcpdump) files, can I use SiLK to analyze them?

Yes, you can. See question 39 (How do I convert packet data (pcap) to flows?).

^ 16. Why is rwflowpack discarding the flow interfaces and Next Hop IP?

In our experience, the flow interfaces (or SNMP interfaces) and Next Hop IP do not provide much useful information for security analysis, and by default SiLK does not include them in our packed data files. However, if you wish to store these values or use them for debugging your packing configuration, you can instruct rwflowpack to store the SNMP interfaces and Next Hop IP by giving it the --pack-interfaces switch.

^ 17. What is IPFIX?

IPFIX is the Internet Protocol Flow Information eXport format. Based on the NetFlow v9 format from Cisco, IPFIX is the draft IETF standard for representing flow data. The rwipfix2silk and rwsilk2ipfix programs in SiLK, which are available when SiLK has been configured with libfixbuf support, convert between the SiLK Flow format and the IPFIX format.

^ 18. How many sensors does SiLK support?

The SiLK Flow format is capable of representing 65534 unique sensors.

^ 19. Can I copy SiLK data between machines?

Yes, a binary file produced by a SiLK application will store its format, version, byte order, and compression method near the beginning of the file (in the file's header). (You can use the rwfileinfo tool to get a description of the contents of the file's header.) Any release of SiLK that understands that file version should be able to read the file. However, note that if the file's data is compressed, the SiLK tools on the second machine must have been compiled with support for that compression library. The SiLK tools will print an error and exit if they are unable to read a file because the tool does not understand the file's format, version, or compression method.

^ 20. What ports do I need to open in a firewall?

SiLK does not use any hard-coded ports. All SiLK tools that do network communication (flowcap, rwflowpack, rwsender, and rwreceiver) have some way to specify which ports to use for communication.

When flowcap or rwflowpack collects flows from a router, you will need to open a port for UDP traffic between the router and the collection machine.

When flowcap or rwflowpack collects flows from a yaf sensor running on a different machine, you will need to open a port for TCP (or SCTP) traffic between these two machines.

Finally, when you are using flowcap on remote sensor(s) that feed data to rwflowpack running on a central data repository, you will need to open a port between each sensor and your repository. Configure flowcap or rwsender on the sensor and rwflowpack or rwreceiver on the repository to use that port.

See the tools' manual pages and the Installation Handbook for details on specifying ports.

^ 21. Can I split flows seen by one flow meter into different sensors?

Currently this is not possible. Each flow collection point (called a probe in the SiLK documentation) corresponds to one unique sensor.

Building and Installing

^ 22. Where can I download SiLK?

The latest Open Source version of SiLK and selected previous releases are available from http://tools.netsa.cert.org/silk/silk_download.html.

^ 23. Where can I find RPMs for SiLK?

RPMs of SiLK and the other NetSA Tools are available on the NetSA Security Suite Wiki.

^ 24. When I configure --with-python, I get error messages saying, "warning: Not importing directory 'site': missing __init__.py". How do I fix this?

This error message happens if you are running Python >= 2.5, and the PYTHONPATH environment variable includes the current working directory. (Python is attempting to treat the site directory in the SiLK source tree as a Python module directory.) Examples of PYTHONPATH values that can cause this error are any path beginning or ending with a colon (':'), and any path including a period ('.') as an element.

The solution to this problem is to either unset the PYTHONPATH before running configure, or to ensure that all references to the current working directory are removed from PYTHONPATH before running configure.
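
If you are unsure whether your PYTHONPATH refers to the current working directory, a check along the following lines may help. This is only a sketch based on the examples given above (an empty element produced by a leading or trailing colon, or a '.' element); the function names are hypothetical and other spellings of the current directory are not detected.

```python
def cwd_elements(pythonpath):
    """Return the PYTHONPATH elements that refer to the current directory."""
    if not pythonpath:
        return []          # unset or empty: nothing to report
    return [e for e in pythonpath.split(":") if e in ("", ".")]

def cleaned(pythonpath):
    """Return PYTHONPATH with current-directory elements removed."""
    return ":".join(e for e in pythonpath.split(":") if e not in ("", "."))

for value in ["/opt/lib/python:", ".:/opt/lib/python", "/opt/lib/python"]:
    print(repr(value), "->", cwd_elements(value), "cleaned:", repr(cleaned(value)))
```

In practice you would pass in os.environ.get("PYTHONPATH") and export the cleaned value before running configure.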

Operations

^ 25. How can I improve the performance of the SiLK queries?

As analysts, it seems we spend a lot of time waiting for rwfilter to pull data from the repository. One way to reduce the wait time is to write efficient queries. Here are some good practices to follow:

  1. Only look at the files that have the data you are interested in.
    • Specify the hour to the --start-date and --end-date switches to reduce the time window.
    • If traffic for the IPs you are interested in normally passes through particular border routers, use the --sensor switch to limit your search to those sensors.
    • Limit the query to the relevant class(es) and type(s). For example, when looking at DNS traffic you do not need the web traffic, so specify --type=in or --type=out to eliminate the web traffic from your data pull.
  2. Instead of repeating the same rwfilter command multiple times and piping the results to different applications, save the rwfilter results to a local file, and use the file as input to the different applications.
  3. Rather than querying the same time range multiple times with slightly different parameters, consolidate the query into a single rwfilter invocation, and then split the result. For example:
    • Instead of issuing two rwfilter commands to pull TCP and then UDP traffic, pull both protocols at once and then split the result:
      $ rwfilter --protocol=6,17 --pass=temp.rwf ...
      $ rwfilter --proto=6 --pass=tcp.rwf --fail=udp.rwf temp.rwf
    • If you want to pull data for a set of IP addresses, build an IPset with rwsetbuild, and use one of the set switches on rwfilter:
      $ rwsetbuild myips.txt myset.set
      $ rwfilter ... --dipset=myset.set
  4. Take advantage of additional filtering options for your initial pull to restrict the query to the traffic of interest.
    • You can use country code and protocol to restrict the traffic in a coarse grain way--i.e., cast a sufficiently broad net so you don't have to re-issue queries for the same time period.
    • If you are only interested in completed TCP connections, you can filter using TCP flags (e.g., --flags-initial) and byte and packet counts (e.g., --packets=5- selects flows with 5 or more packets).
    • Outgoing traffic is always smaller than incoming, due to incoming scan traffic. If you are looking at TCP traffic and you just need evidence of communication, consider specifying the outgoing types (--type=out,outweb) rather than incoming.
  5. Instead of using IPsets, consider using the --tuple options to rwfilter. The tuple options allow you to search both directions at once and to limit your search to traffic between particular IP addresses and/or particular ports.
  6. Sometimes it is easier to specify what you don't need. Use the --fail switch on rwfilter to select the flows that don't match the partitioning parameters.

^ 26. How are the SiLK Flow files organized and written to disk?

SiLK Flows are stored in binary files, where each file corresponds to a unique class-type-sensor-hour tuple. Multiple data repositories may exist on a machine; however, rwfilter is only capable of examining a single data repository per invocation.

A default repository location is compiled into rwfilter. (This default is set by the --enable-data-rootdir=DIR switch to configure and defaults to /data). You may tell rwfilter to use a different repository by setting the SILK_DATA_ROOTDIR environment variable or specifying the --data-rootdir switch to rwfilter.

The structure of the directory tree beneath the root is determined by the path-format entry in the silk.conf file for each data repository. Traditionally, the directory structure has been /DATA_ROOTDIR/class/type/year/month/day/hourly-files
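
As an illustration of the traditional layout, the sketch below builds the directory that would hold the hourly files for a given class, type, and hour. The function name and the class name "all" are assumptions; the actual layout at your site is whatever path-format in silk.conf specifies, and the names of the hourly files themselves are not modeled here.

```python
import posixpath
from datetime import datetime

def hourly_directory(root, class_name, type_name, hour):
    """Directory that would hold the hourly files for the given hour."""
    # Traditional layout: /DATA_ROOTDIR/class/type/year/month/day
    return posixpath.join(root, class_name, type_name, hour.strftime("%Y/%m/%d"))

print(hourly_directory("/data", "all", "inweb", datetime(2009, 7, 13, 14)))
# prints /data/all/inweb/2009/07/13
```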

^ 27. How many bytes does a single SiLK Flow record occupy on disk?

A fully-expanded, uncompressed, SiLK Flow record requires 52 bytes (this is 88 bytes for IPv6 records). These records are written by rwcat --compression=none.

Records in the SiLK data repository require less space since common attributes (sensor, class, type, hour) are stored once in the file's header. The smallest record (uncompressed) in the data repository is that representing a web flow which requires only 22 bytes.

In addition, one can enable data compression in an individual SiLK application (with the --compression-method switch) or in all SiLK applications when SiLK is configured (specify the --enable-output-compression switch when you invoke the configure script). Compression with the lzo1x algorithm reduces the overall file size by about 50%. Using zlib gives a better compression ratio, but at the cost of access time.

The rwfileinfo command will tell you the (uncompressed) size of records in a SiLK file.
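
For a back-of-the-envelope estimate, the record sizes quoted above can be multiplied out as in the following sketch. The function name and the record count are hypothetical, per-file header overhead is ignored, and the 0.5 ratio is only the approximate lzo1x figure mentioned above.

```python
# Uncompressed record sizes quoted above, in bytes.
FULL_IPV4 = 52
FULL_IPV6 = 88
SMALLEST_WEB = 22

def estimate_bytes(records, record_size, compression_ratio=1.0):
    """Rough data size; per-file header overhead is ignored."""
    return int(records * record_size * compression_ratio)

million = 1_000_000
print(estimate_bytes(million, FULL_IPV4))           # 52000000
print(estimate_bytes(million, SMALLEST_WEB))        # 22000000
print(estimate_bytes(million, SMALLEST_WEB, 0.5))   # 11000000
```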

^ 28. Where is the SiLK Flow file format documented?

SiLK uses many different file formats. There are file formats for IPsets, for Bags, and for Prefix Maps; in addition, there are several file formats for SiLK Flow records. These file formats are used to provide maximum compression of the data in the SiLK Flow repository.

The rwcat tool can be used on any SiLK Flow file(s) to write the Flows into a known format. The rwcat command to use is:

$ rwcat --compression=none --byte-order=big [--ipv4-output] FILE1 FILE2 ...

That command will produce an output stream/file having a standard SiLK header followed by 0 or more records in the format given in the following table. The length of the SiLK header is the same as the size of the records in the file.

When SiLK is not compiled with IPv6 support or the --ipv4-output switch is given, each record will be 52 bytes long, and the header is 52 bytes; otherwise each record is 88 bytes and the file's header is 88 bytes.

The other SiLK Flow file formats are only documented in the comments of the source files. See the rw*io.c files in the silk/src/libsilk directory.

IPv4 Bytes  IPv6 Bytes  Field         Description
0-7         0-7         sTime         Flow start time as milliseconds since UNIX epoch
8-11        8-11        dur           Duration of flow in milliseconds (allows for a 49 day flow)
12-13       12-13       sPort         Source port
14-15       14-15       dPort         Destination port
16          16          protocol      IP protocol
17          17          class,type    Class & Type (Flowtype) value as set by SiLK packer (integer to name mapping determined by silk.conf)
18-19       18-19       sensor        Sensor ID as set by SiLK packer (integer to name mapping determined by silk.conf)
20          20          flags         Cumulative OR of all TCP flags (NetFlow flags)
21          21          initialFlags  TCP flags in first packet or blank
22          22          sessionFlags  Cumulative OR of TCP flags on all but initial packet or blank
23          23          attributes    Specifies various attributes of the flow record
24-25       24-25       application   Guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems).
26-27       26-27       n/a           Unused
28-29       28-29       in            Router incoming SNMP interface
30-31       30-31       out           Router outgoing SNMP interface
32-35       32-35       packets       Count of packets in the flow
36-39       36-39       bytes         Count of bytes on all packets in the flow
40-43       40-55       sIP           Source IP
44-47       56-71       dIP           Destination IP
48-51       72-87       nhIP          Router Next Hop IP

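
The IPv4 column of the table can be read with a few lines of Python, as sketched below. This assumes the data came from the rwcat command shown above with --ipv4-output (uncompressed, big-endian, 52-byte records) and skips the header as an opaque block of the same length as one record, as stated above. The names are hypothetical and the sample bytes are made up; in particular, a real file's header is not all zeros.

```python
import socket
import struct
from datetime import datetime, timezone

# Big-endian layout of the IPv4 column in the table above; 52 bytes in total.
RECORD = struct.Struct(">QIHHBBHBBBBHHHHII4s4s4s")
FIELDS = ("sTime", "dur", "sPort", "dPort", "protocol", "flowtype", "sensor",
          "flags", "initialFlags", "sessionFlags", "attributes", "application",
          "unused", "in", "out", "packets", "bytes", "sIP", "dIP", "nhIP")

def read_records(data):
    """Yield one dict per record, skipping the header."""
    # Per the text above, the header length equals the record size.
    for offset in range(RECORD.size, len(data) - RECORD.size + 1, RECORD.size):
        rec = dict(zip(FIELDS, RECORD.unpack_from(data, offset)))
        # sTime is stored as milliseconds since the UNIX epoch.
        rec["sTime"] = datetime.fromtimestamp(rec["sTime"] / 1000, tz=timezone.utc)
        for name in ("sIP", "dIP", "nhIP"):
            rec[name] = socket.inet_ntoa(rec[name])
        yield rec

# Placeholder header (zeros) followed by one made-up record.
sample = bytes(RECORD.size) + RECORD.pack(
    1247486400000, 1500, 40000, 80, 6, 0, 1, 0x1B, 0x02, 0x19, 0, 80,
    0, 0, 0, 5, 420,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("192.0.2.7"),
    socket.inet_aton("0.0.0.0"))
parsed = list(read_records(sample))
print(parsed[0])
```

For everyday analysis, rwcut is the supported way to print these fields; the sketch is only meant to show how the offsets in the table map onto bytes.
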
^ 29. How much disk space do I need to store data from a link of size xxx?

There are many factors that determine the amount of space required, including (1) the size of the link being monitored, (2) the link's utilization, (3) the type of traffic being collected and stored (NetFlow-v5, IPFIX-IPv4, or IPFIX-IPv6), (4) the amount of legacy data to store, and (5) the number of flow records generated from the data. The SiLK Provisioning Spreadsheet allows one to see how modifying the first four factors affects the disk space required. (The spreadsheet specifies a value for the fifth factor based on our experience.)

^ 30. How much bandwidth will be used by rwsender?

When a flowcap daemon runs near a sensor, rwsender transfers the flows it collects to the storage center. The factors that affect the bandwidth rwsender requires are nearly identical to those that determine the amount of disk space required (see the previous entry). The SiLK Provisioning Spreadsheet includes bandwidth calculations.

^ 31. What is the latency of the SiLK packing system?

The latency of the packing system (the time from a flow being collected to it being available for analysis) depends on how the packing system has been configured and additional factors. It can be a few seconds for a simple configuration or a few minutes for a complex one. A separate page provides more detail.

^ 32. What confidentiality and integrity properties are provided for SiLK data sent across machines?

The rwsender and rwreceiver programs can use GnuTLS to provide a secure layer over a reliable transport layer. For this support to be available, SiLK's configure script must have found v1.4.1 or later of the GnuTLS library. Using GnuTLS also requires creating certificates, which is described in an appendix of the Installation Handbook.

We recommend creating a local certificate authority (CA) file, and creating program-specific certificates signed by that local CA. The local CA and program-specific certificates are copied onto the machines where rwsender and rwreceiver are running. The local CA acts as a shared secret: it is on both machines and it is used to verify the asymmetric keys between the rwsender and rwreceiver certificates.

If someone else has access to the local CA, they would not be able to decipher the conversation, since the conversation is encrypted with a session key that was negotiated during the initialization of the TLS session.

However, anyone with access to the CA would be able to set up a new session with an rwsender (to download files) or an rwreceiver (to spoof files). The certificates should be one part of your security; additional measures (such as firewall rules) should be enabled to mitigate these issues.

When GnuTLS is not used or not available, communication between rwsender and rwreceiver has no confidentiality or integrity checking beyond that provided by standard TCP.

Legacy systems that use a direct connection between flowcap and rwflowpack have no confidentiality or integrity checking beyond that provided by standard TCP, and there is no way to secure this communication without using some outside method (such as creating an ssh tunnel).

^ 33. If communication between the sensor and the packer goes down, are flows lost?

It depends on what you mean by "sensor". If the "sensor" is the flow generator (that is, a router or an IPFIX sensor) which is communicating directly with rwflowpack, the flows are lost when the connection goes down.

To avoid this, you can run flowcap on the sensor. flowcap acts as a flow capacitor, storing flows on the sensor until the communication link between the sensor and packer is restored. Flows will still be lost if the connection between the flow generator and flowcap goes down, but by running flowcap on a machine near the flow generator (or running both on the same machine), the communication between the generator and flowcap should be more reliable, leading to fewer dropped connections.

^ 34. Can flowcap function as a "tee", both storing files and forwarding the flow stream onto some place else?

The flowcap program cannot do this itself; however, the rwsender program can send files to multiple rwreceivers. To get the "tee" functionality, have flowcap drop its files into a directory for processing by rwsender.

^ 35. How do I list all sensors that are installed for a deployment?

The mapsid command will print all the sensors that have been defined at your site.

^ 36. How do I rotate the SiLK log files?

If you invoke a SiLK daemon with the --log-destination=syslog switch, the daemon will use the syslog(3) facility to write log messages, and syslog will manage log rotation.

If you pass the --log-directory switch to a daemon, the daemon will manage the log files itself. The first message received after midnight local time will cause the daemon to close the current log file, compress it, and open a new log file.

Analysis

^ 37. Why does --type=inweb contain non-web data?

SiLK categorizes a flow as web if the protocol is TCP and either the source port or destination port is one of 80, 443, or 8080. Since SiLK does not inspect the contents of packets, it cannot ensure that only HTTP traffic is written to this type, nor can it find HTTP traffic on other ports.
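
The rule can be written out as a small predicate. This Python sketch only restates the rule given above (the function name is hypothetical, and it is not SiLK's packing code); the last two calls show how non-HTTP traffic on a web port is classified as web while HTTP on another port is not.

```python
WEB_PORTS = {80, 443, 8080}
TCP = 6

def is_web(protocol, sport, dport):
    """True when a flow would be categorized as web by the rule above."""
    return protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS)

print(is_web(6, 40000, 443))    # True: TCP to port 443
print(is_web(17, 40000, 80))    # False: UDP is never categorized as web
print(is_web(6, 40000, 8080))   # True, whatever the payload actually is
print(is_web(6, 40000, 8000))   # False, even if the payload is HTTP
```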

^ 38. How do I import flow data into Excel?

To get SiLK Flow data into Excel, use the rwcut command to convert the binary SiLK data to a textual CSV (comma separated value) file, and import the file into Excel. You need to provide the --delimited=, --legacy-timestamps switches to rwcut. Use the --output-path=FILE.csv switch to have rwcut write its output to a file.

^ 39. How do I convert packet data (pcap) to flows?

There are four ways to handle pcap files:

  1. Use the yaf program (from the YAF suite) to convert the pcap data to the IPFIX format, and use the rwipfix2silk program from SiLK to convert from IPFIX to a stream of SiLK Flow records. For maximum compatibility, you should pass the --silk switch to yaf. SiLK provides the rwp2yaf2silk Perl script to make this task easier. The rwipfix2silk program is only available when SiLK has been configured with libfixbuf support; see the Installation Handbook for details.
  2. Use the yaf program to convert the pcap data to the IPFIX format, and send that data over the network to rwflowpack, which will convert from IPFIX to a repository of SiLK Flow data. This also requires that SiLK be configured with libfixbuf support.
  3. Use the rwptoflow program, included with SiLK, to convert each packet to a SiLK Flow. Note that this tool does not combine packets into a flow, it simply converts each pcap record into a 1-packet SiLK Flow record.
  4. Search the web for software to convert the pcap data to NetFlow v5 data, and use rwflowpack to convert the NetFlow v5 data to a repository of SiLK Flow data.

^ 40. I have data in some other format. How do I incorporate that into SiLK?

See this tooltip.

^ 41. How do I make lists of IP addresses and label them?

A prefix map file in SiLK provides a label for every IPv4 address. (We have not yet extended prefix map files to support IPv6 addresses.) Use the rwpmapbuild tool to convert a text file of CIDR-block/label pairs to a binary prefix map file. The rwcut, rwfilter, rwuniq, and rwsort tools provide support for printing, partitioning by, binning by, and sorting by the labels you defined.
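
Conceptually, a prefix map answers the question "which label applies to this address?". The following Python sketch shows one way to model that lookup with the standard ipaddress module, choosing the most specific matching block. It is not the rwpmapbuild input format or the binary prefix map format; the function names, CIDR blocks, and labels are made up for illustration.

```python
import ipaddress

def build_table(pairs):
    """Turn (CIDR, label) pairs into a list ordered most-specific first."""
    table = [(ipaddress.ip_network(cidr), label) for cidr, label in pairs]
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table

def lookup(table, address, default="unknown"):
    """Return the label of the most specific block containing address."""
    ip = ipaddress.ip_address(address)
    for network, label in table:
        if ip in network:
            return label
    return default   # every address gets some label

table = build_table([
    ("10.0.0.0/8", "internal"),
    ("10.1.2.0/24", "dmz"),
    ("192.0.2.0/24", "partner"),
])
for address in ["10.1.2.3", "10.9.9.9", "192.0.2.200", "198.51.100.1"]:
    print(address, lookup(table, address))
```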

^ 42. How do I mate unidirectional flows to get both sides of the conversation?

The rwmatch program can be used to mate flows. Create two files that contain the data you are interested in mating. Use rwsort to order the records in each file. (When matching TCP and/or UDP flows, the recommended sort order is shown below.) Run rwmatch over the sorted files to mate the flows. rwmatch writes a match parameter into the next hop IP field on each record that it matches. When using rwcut to display the output file produced by rwmatch, consider using the cutmatch.so plug-in to display the match parameter that rwmatch writes into the next hop IP field.

$ rwsort --fields=1,4,2,3,5,9  incoming.rwf > incoming-query.rwf
$ rwsort --fields=2,3,1,4,5,9  outgoing.rwf > outgoing-response.rwf
$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4  \
    incoming-query.rwf outgoing-response.rwf mated.rwf
$ rwcut --plugin=cutmatch.so --fields=1,3,match,2,4,5 mated.rwf
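
To show what "mating" means independently of the tools, the Python sketch below pairs each query flow with the response flows whose addresses and ports are reversed, which is the relationship the --relate switches above express. It is a simplified illustration and not rwmatch's algorithm: it additionally requires the protocols to be equal, it ignores start times, and the function name and flow values are hypothetical.

```python
def mate(queries, responses):
    """Pair each query flow with the response flows going the other way."""
    index = {}
    for r in responses:
        # Index responses by the label their matching query would carry.
        key = (r["dIP"], r["sIP"], r["dPort"], r["sPort"], r["protocol"])
        index.setdefault(key, []).append(r)
    pairs = []
    for q in queries:
        key = (q["sIP"], q["dIP"], q["sPort"], q["dPort"], q["protocol"])
        for r in index.get(key, []):
            pairs.append((q, r))
    return pairs

# Made-up flows: two incoming queries, only one of which was answered.
incoming = [
    {"sIP": "198.51.100.9", "dIP": "10.0.0.5", "sPort": 51000, "dPort": 25, "protocol": 6},
    {"sIP": "198.51.100.9", "dIP": "10.0.0.6", "sPort": 51001, "dPort": 25, "protocol": 6},
]
outgoing = [
    {"sIP": "10.0.0.5", "dIP": "198.51.100.9", "sPort": 25, "dPort": 51000, "protocol": 6},
]
pairs = mate(incoming, outgoing)
for query, response in pairs:
    print(query["sIP"], "->", query["dIP"], "answered by", response["sIP"])
```
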
^ 43. I have SiLK deployed in an asymmetric routing environment, can I mate across sensors?

Yes, you can use the rwmatch program as described in the previous FAQ entry to mate across sensors.

^ 44. How can I create obfuscated (anonymized) data?

The rwrandomizeip application will obfuscate the source and destination IP addresses in a SiLK data file. It can operate in one of two modes:

  1. In default mode, rwrandomizeip substitutes a pseudo-random, non-routable IP address for each source and destination IP address it sees. An IP address that appears multiple times in the input will be mapped to a different output address each time, and no structural information in the input will be maintained.
  2. In consistent mode, rwrandomizeip creates four shuffle tables, each having 256 entries where the value is a pseudo-random value from 0 to 255. These tables represent the possible values for each octet in an IPv4 address. rwrandomizeip uses the tables to modify the IP addresses in a consistent way, which allows a conversation between two IP addresses to be visible in the anonymized data.
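
The idea behind consistent mode can be sketched in Python as follows. This is an illustration of per-octet shuffle tables, not rwrandomizeip's implementation: the function names are hypothetical, and the sketch assumes each table is a permutation of 0 to 255 produced with Python's random module.

```python
import random

def make_tables(seed=None):
    """Create four shuffle tables, one per octet of an IPv4 address."""
    rng = random.Random(seed)
    tables = []
    for _ in range(4):
        table = list(range(256))
        rng.shuffle(table)      # assumption: each table is a permutation
        tables.append(table)
    return tables

def anonymize(ip, tables):
    """Map each octet through the table for its position."""
    octets = [int(o) for o in ip.split(".")]
    return ".".join(str(tables[i][o]) for i, o in enumerate(octets))

tables = make_tables(seed=1)
a = anonymize("192.0.2.10", tables)
b = anonymize("192.0.2.99", tables)
print(a, b)
```

Because the same octet value always maps to the same output, an address is anonymized identically every time it appears, and two addresses that share their first three octets still share them after anonymization. That preserved structure is what keeps conversations visible in the output.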

In addition, note that the file's header may contain information that you would rather not make public (such as a history of commands). You can use rwfileinfo to see these headers. To remove the headers, invoke rwcat on the file.

For a different approach, consider converting the data to text with rwcut, obfuscating the IPs, and then converting back to SiLK format with rwtuc. The procedure is documented in this tooltip.

^ 45. How secure is the anonymized data?

Anonymizing/Obfuscating data is hard. You should be cautious of how widely you distribute data that rwrandomizeip has processed:

  • The rwrandomizeip program only anonymizes the source and destination IP address. Any additional information in the data (such as the existence of services that run on well known ports or protocols) is still visible.
  • In consistent mode, the data is much less random, since the value in an octet is always mapped to the same value. Given the structure of IP addresses on the Internet, reversing the mapping would not be difficult.
  • The default mode does not suffer from that problem, but you cannot do any meaningful traffic analysis on the anonymized data since the mapping is not consistent.