The SiLK Reference Guide
(SiLK-2.1.0)

CERT Network Situational Awareness
©2002-2009 Carnegie Mellon University
 
The canonical location for this handbook is
http://tools.netsa.cert.org/silk/reference-guide.pdf

October 28, 2009

Use of the SiLK system and related source code is subject to the terms of the following licenses:

GNU Public License (GPL) Rights pursuant to Version 2, June 1991  
Government Purpose License Rights (GPLR) pursuant to DFARS 252.227.7013  
 
NO WARRANTY  
 
ANY INFORMATION, MATERIALS, SERVICES, INTELLECTUAL PROPERTY OR OTHER  
PROPERTY OR RIGHTS GRANTED OR PROVIDED BY CARNEGIE MELLON UNIVERSITY  
PURSUANT TO THIS LICENSE (HEREINAFTER THE "DELIVERABLES") ARE ON AN  
"AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY  
KIND, EITHER EXPRESS OR IMPLIED AS TO ANY MATTER INCLUDING, BUT NOT  
LIMITED TO, WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE,  
MERCHANTABILITY, INFORMATIONAL CONTENT, NONINFRINGEMENT, OR ERROR-FREE  
OPERATION. CARNEGIE MELLON UNIVERSITY SHALL NOT BE LIABLE FOR INDIRECT,  
SPECIAL OR CONSEQUENTIAL DAMAGES, SUCH AS LOSS OF PROFITS OR INABILITY  
TO USE SAID INTELLECTUAL PROPERTY, UNDER THIS LICENSE, REGARDLESS OF  
WHETHER SUCH PARTY WAS AWARE OF THE POSSIBILITY OF SUCH DAMAGES.  
LICENSEE AGREES THAT IT WILL NOT MAKE ANY WARRANTY ON BEHALF OF  
CARNEGIE MELLON UNIVERSITY, EXPRESS OR IMPLIED, TO ANY PERSON  
CONCERNING THE APPLICATION OF OR THE RESULTS TO BE OBTAINED WITH THE  
DELIVERABLES UNDER THIS LICENSE.  
 
Licensee hereby agrees to defend, indemnify, and hold harmless Carnegie  
Mellon University, its trustees, officers, employees, and agents from  
all claims or demands made against them (and any related losses,  
expenses, or attorney’s fees) arising out of, or relating to Licensee’s  
and/or its sub licensees’ negligent use or willful misuse of or  
negligent conduct or willful misconduct regarding the Software,  
facilities, or other rights or assistance granted by Carnegie Mellon  
University under this License, including, but not limited to, any  
claims of product liability, personal injury, death, damage to  
property, or violation of any laws or regulations.  
 
Carnegie Mellon University Software Engineering Institute authored  
documents are sponsored by the U.S. Department of Defense under  
Contract F19628-00-C-0003. Carnegie Mellon University retains  
copyrights in all material produced under this contract. The U.S.  
Government retains a non-exclusive, royalty-free license to publish or  
reproduce these documents, or allow others to do so, for U.S.  
Government purposes only pursuant to the copyright license under the  
contract clause at 252.227.7013.

Contents

Introduction
1 SiLK Analysis Tools and Utilities
  mapsid
  num2dot
  rwaddrcount
  rwappend
  rwbag
  rwbagbuild
  rwbagcat
  rwbagtool
  rwcat
  rwcompare
  rwcount
  rwcut
  rwdedupe
  rwfglob
  rwfileinfo
  rwfilter
  rwgeoip2ccmap
  rwgroup
  rwidsquery
  rwip2cc
  rwipaexport
  rwipaimport
  rwipfix2silk
  rwmatch
  rwnetmask
  rwp2yaf2silk
  rwpcut
  rwpdedupe
  rwpmapbuild
  rwpmapcat
  rwpmatch
  rwptoflow
  rwrandomizeip
  rwresolve
  rwscan
  rwscanquery
  rwset
  rwsetbuild
  rwsetcat
  rwsetintersect
  rwsetmember
  rwsettool
  rwsetunion
  rwsilk2ipfix
  rwsort
  rwsplit
  rwstats
  rwswapbytes
  rwtotal
  rwtuc
  rwuniq
3 SiLK Plug-Ins
  addrtype
  ccfilter
  flowrate
  pmapfilter
  PySiLK: Silk in Python
  silkpython
5 SiLK File Formats
  sensor.conf
  silk.conf
7 SiLK Miscellaneous Information
  SiLK
8 SiLK Administrator’s Tools
  flowcap
  rwflowappend
  rwflowpack
  rwguess
  rwpackchecker
  rwreceiver
  rwsender

Introduction

The SiLK Reference Guide contains the manual page for each analysis tool, utility, plug-in, file format, and collection facility in the SiLK Collection and Analysis Suite.

This document is meant for reference only. The SiLK Analysis Handbook provides both a tutorial for learning about the tools and examples of how they can be used in analyzing flow data. See the SiLK Installation Handbook for instructions on installing SiLK at your site.

This reference guide is broken into sections like the traditional UNIX manual: end-user analysis tools and utilities are described in Section 1; the plug-ins that augment the behavior of some tools are presented in Section 3; Section 5 contains information about file formats; miscellaneous information is in Section 7; and commands for the installer and administrator of SiLK appear in Section 8.

1 SiLK Analysis Tools and Utilities

This section provides the manual page for each analysis tool and utility that the users of SiLK may employ in their day-to-day work.

mapsid

Map sensor name to sensor number or vice versa

SYNOPSIS

  mapsid [--site-config-file=FILENAME] [--print-classes]  
        [{ <sensor-name> | <sensor-number> } ...]

  mapsid --help

  mapsid --version

DESCRIPTION

mapsid is a utility that maps sensor names to sensor numbers or vice versa depending on the input arguments. When no arguments are given, the mapping of all sensor numbers to names is printed. When a numeric argument is given, the number to name mapping is printed for the specified argument. When a name is given, its numeric id is printed. For convenience when typing in sensor names, the case is irrelevant.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
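The abbreviation rule above can be illustrated with a short Python sketch. This is not SiLK's own option parser; the function name resolve_option and the error messages are hypothetical, and only the rule itself (an exact match wins, otherwise the abbreviation must identify exactly one option) comes from the text.

```python
def resolve_option(given, known):
    """Return the full option name that `given` refers to.

    `given` and the entries of `known` are option names without the
    leading "--". Raises ValueError for unknown or ambiguous names.
    """
    if given in known:
        # An exact match is accepted even if it is also a prefix
        # of some longer option name.
        return given
    matches = [name for name in known if name.startswith(given)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: --{given}")
    raise ValueError(
        f"ambiguous option --{given}: {', '.join(sorted(matches))}")


# The switches listed in the mapsid synopsis.
MAPSID_OPTIONS = ["site-config-file", "print-classes", "help", "version"]
print(resolve_option("print", MAPSID_OPTIONS))
```

Under this rule an abbreviation such as --min would be rejected by a tool that has --min-bytes, --min-packets, and --min-records, because it does not identify a single option.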

--print-classes

For each sensor, print the classes for which the sensor collects data.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.
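The search order described for --site-config-file can be sketched in Python as follows. This is an illustration of the documented order only, not SiLK code: the function name, the compiled_root and app_dir parameters, and the reading of "parallel to the application's directory" as "siblings of the directory holding the executable" are assumptions.

```python
import os
import posixpath as path  # SiLK is a UNIX tool; use POSIX path rules


def silk_conf_candidates(switch_value=None, environ=None,
                         compiled_root=None, app_dir=None):
    """List the silk.conf locations to try, in the documented order.

    compiled_root stands in for the data root compiled into SiLK and
    app_dir for the directory containing the executable (both are
    hypothetical parameters for this sketch).
    """
    environ = os.environ if environ is None else environ
    if switch_value:
        return [switch_value]                    # --site-config-file
    if environ.get("SILK_CONFIG_FILE"):          # set and not empty
        return [environ["SILK_CONFIG_FILE"]]
    candidates = []
    if environ.get("SILK_DATA_ROOTDIR"):
        candidates.append(path.join(environ["SILK_DATA_ROOTDIR"], "silk.conf"))
    if compiled_root:
        candidates.append(path.join(compiled_root, "silk.conf"))
    if environ.get("SILK_PATH"):
        for sub in ("share/silk", "share"):
            candidates.append(path.join(environ["SILK_PATH"], sub, "silk.conf"))
    if app_dir:
        parent = path.dirname(app_dir.rstrip("/"))
        for sub in ("share/silk", "share"):
            candidates.append(path.join(parent, sub, "silk.conf"))
    return candidates
```

A caller would open the first candidate that exists. For example, with SILK_PATH=/opt/silk and no other settings, the sketch yields /opt/silk/share/silk/silk.conf followed by /opt/silk/share/silk.conf.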

EXAMPLES

Name to number mapping:

 $ mapsid beta  
 BETA -> 1

Number to name mapping:

 $ mapsid 3  
 3 -> DELTA

Print all mappings:

 $ mapsid  
  0 -> ALPHA  
  1 -> BETA  
  2 -> GAMMA  
  3 -> DELTA  
  4 -> EPSLN  
  5 -> ZETA  
      ....

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, mapsid looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, mapsid checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwfilter(1), rwcut(1)

num2dot

Convert an integer IP to dotted-decimal notation

SYNOPSIS

  num2dot [--ip-fields=FIELDS] [--delimiter=C]

  num2dot --help

  num2dot --version

DESCRIPTION

num2dot is a filter that speeds up sorting of IP numbers while still producing both a natural order (for example, 29.23.1.1 appears before 192.168.1.1) and readable output (that is, dotted decimal rather than an integer representation of the IP number). The records are sorted on the integer form of the address, and the sorted output is then passed through num2dot for display.

It is designed specifically to deal with the output of rwcut(1). It reads the standard input and converts the specified fields (default field 1), separated by a delimiter (default '|'), from integers into dotted-decimal IP addresses. Up to three IP fields can be specified via the --ip-fields=FIELDS option. The --delimiter option can be used to specify an alternate delimiter.
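The conversion that num2dot performs can be sketched in Python. This is an illustration of the behavior described above, not the tool's implementation: the function names are hypothetical, the sketch does not try to preserve rwcut's column widths, and leaving non-numeric fields (such as column titles) untouched is an assumption.

```python
def int_to_dotted(value):
    """Convert a 32-bit integer to dotted-decimal IPv4 notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit IPv4 value: {value}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def convert_line(line, ip_fields=(1,), delimiter="|"):
    """Rewrite the 1-based columns in ip_fields as dotted-decimal IPs."""
    columns = line.rstrip("\n").split(delimiter)
    for field in ip_fields:
        index = field - 1                  # column numbers start from 1
        if index < len(columns):
            text = columns[index].strip()
            if text.isascii() and text.isdigit():
                columns[index] = int_to_dotted(int(text))
    return delimiter.join(columns)


# Sorting the integers gives the natural order; converting afterwards
# gives the readable form.
for number in sorted([3232235777, 488046849]):
    print(int_to_dotted(number))
```

A plain string sort of the dotted-decimal forms would place 192.168.1.1 before 29.23.1.1, which is why the sort is done on the integers first.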

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--ip-fields=FIELDS

Column numbers of the input that should be treated as IP numbers. Column numbers start from 1. If not specified, the default is 1.

--delimiter=C

The character that separates the columns of the input. The default is '|'.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLE

In addition to the default fields of 1-12 produced by rwcut, you also want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default rwcut output fields of 1-12, you want to see dotted-decimal IP addresses.

  rwfilter ... --pass=stdout | \
        rwcut --integer-ip --fields=2,9,1-12 --epoch-time | \
        num2dot --ip-fields=3,4

The first six columns produced by rwcut will be dIP, sTime, sIP, dIP, sPort, dPort. The --integer-ip switch makes the first, third, and fourth columns be integers, but you only want the first column to be an integer representation. The pipe through num2dot will convert the third and fourth columns to dotted-decimal IP numbers.

SEE ALSO

rwcut(1)

BUGS

num2dot has no support for IPv6 addresses.

rwaddrcount

Count activity by IP address

SYNOPSIS

  rwaddrcount {--print-recs | --print-ips | --print-stat}  
        [--use-dest] [--min-bytes=BYTEMIN] [--max-bytes=BYTEMAX]  
        [--min-records=RECMIN] [--max-records=RECMAX]  
        [--min-packets=PACKMIN] [--max-packets=PACKMAX]  
        [--set-file=PATHNAME] [--sort-ips]  
        [{--integer-ips | --zero-pad-ips}]  
        [--no-titles] [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [{--legacy-timestamps | --legacy-timestamps=NUM}] [FILES...]

  rwaddrcount --help

  rwaddrcount --version

DESCRIPTION

rwaddrcount reads SiLK Flow records from files named on the command line or from the standard input, sums the byte, packet, and record counts by individual source or destination IP address, and maintains the time window during which that IP address was active. At the end of the count operation, the results per IP address are displayed when the --print-recs switch is given. rwaddrcount includes facilities for displaying only those IP addresses whose byte, packet, or flow counts are between specified minima and maxima.
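The per-address binning described above can be sketched in Python. The record layout (dictionaries with sip, dip, bytes, packets, stime, and etime keys) and all names are assumptions made for this illustration; rwaddrcount itself reads binary SiLK Flow records and uses its own hash table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddrCount:
    byte_count: int = 0
    packet_count: int = 0
    record_count: int = 0
    first_start: Optional[int] = None   # earliest start time seen
    last_end: Optional[int] = None      # latest end time seen


def count_by_address(flows, use_dest=False):
    """Sum bytes, packets, and records per source (or destination) IP."""
    bins = {}
    for flow in flows:
        key = flow["dip"] if use_dest else flow["sip"]
        entry = bins.setdefault(key, AddrCount())
        entry.byte_count += flow["bytes"]
        entry.packet_count += flow["packets"]
        entry.record_count += 1
        # Widen the activity window for this address.
        if entry.first_start is None or flow["stime"] < entry.first_start:
            entry.first_start = flow["stime"]
        if entry.last_end is None or flow["etime"] > entry.last_end:
            entry.last_end = flow["etime"]
    return bins


def at_most_records(bins, recmax):
    """Keep only bins to which at most recmax records contributed."""
    return {ip: e for ip, e in bins.items() if e.record_count <= recmax}
```

The at_most_records helper mirrors the wording of the --max-records switch; the other minimum and maximum switches would be applied to the finished bins in the same way.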

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

For the application to operate, one of the three --print options must be chosen.

--print-recs

Print out count records: IP address, number of bytes, number of packets, number of filter records, earliest start time and latest end time.

--print-ips

Print out IP addresses exclusively.

--print-stat

Print the following statistics for all SiLK flows that were read and for those meeting the minima and maxima criteria: byte, packet, and flow record counts and the number of unique IP addresses.

--use-dest

Count by destination IP address in the filter record rather than source IP.

--min-bytes=BYTEMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes exceeds BYTEMIN.

--min-packets=PACKMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets exceeds PACKMIN.

--min-records=RECMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of filter records contributing to that count record exceeds RECMIN.

--max-bytes=BYTEMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes is less than BYTEMAX.

--max-packets=PACKMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets is less than PACKMAX.

--max-records=RECMAX

Filtering criterion; for the final output (stats or printing), only include count records to which at most RECMAX filter records contributed.

--set-file=PATHNAME

Write the IPs into the rwset(1)-style binary IP-set file named PATHNAME. Use rwsetcat(1) to see the contents of this file.

--integer-ips

For the --print-recs and --print-ips output formats, print the IPs as integers. By default, IP addresses are printed as dotted decimal.

--zero-pad-ips

For the --print-recs and --print-ips output formats, print IP addresses as dotted decimal, but use three digits per octet by adding zero-padding, e.g., 000.000.000.000.

--sort-ips

For the --print-recs and --print-ips output formats, present the results sorted by IP address.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-separator=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the --output-path switch has been used to redirect rwaddrcount’s ASCII output.

--output-path=PATH

Determine where the output of rwaddrcount (ASCII text) is written. If this option is not given, output is written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

--legacy-timestamps
--legacy-timestamps=NUM

Specify the format for human-readable timestamps, either the default (new) style, YYYY/MM/DDThh:mm:ss, or the legacy style, MM/DD/YYYY hh:mm:ss. When this switch is not present, the timestamps will be in the default format. When this switch is present and no argument is given, timestamps are in the legacy format. When an argument is supplied, timestamps will be in the new format if the argument begins with 0, and in the old format if the argument begins with 1. Any other argument to the switch is an error.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.
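The rules given earlier in this section for the --legacy-timestamps switch can be sketched in Python. The function names are hypothetical and the sketch only mirrors the documented rules (absent switch, switch without an argument, argument beginning with 0 or 1); it is not taken from the SiLK sources.

```python
from datetime import datetime


def use_legacy_format(switch_present, argument=None):
    """Decide whether timestamps should use the legacy style."""
    if not switch_present:
        return False            # default (new) style
    if argument is None:
        return True             # bare --legacy-timestamps
    if argument.startswith("0"):
        return False
    if argument.startswith("1"):
        return True
    raise ValueError(f"invalid --legacy-timestamps argument: {argument!r}")


def format_timestamp(when, legacy=False):
    """Format a datetime in the new or the legacy style."""
    pattern = "%m/%d/%Y %H:%M:%S" if legacy else "%Y/%m/%dT%H:%M:%S"
    return when.strftime(pattern)


when = datetime(2003, 1, 17, 0, 19, 1)
print(format_timestamp(when))               # new style
print(format_timestamp(when, legacy=True))  # legacy style
```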

Deprecated Switches

The following switches are deprecated.

--byte-min=BYTEMIN

Deprecated alias for --min-bytes.

--packet-min=PACKMIN

Deprecated alias for --min-packets.

--rec-min=RECMIN

Deprecated alias for --min-records.

--byte-max=BYTEMAX

Deprecated alias for --max-bytes.

--packet-max=PACKMAX

Deprecated alias for --max-packets.

--rec-max=RECMAX

Deprecated alias for --max-records.

EXAMPLES

To print the set of IPs with exactly one TCP record during the time period, use:

  rwfilter --start-date=2003/09/01:00 --end-date=2003/09/01:12 \
        --proto=6 --pass=stdout \
        | rwaddrcount --max-records=1 --print-ips

In general, to print out record information, use rwaddrcount with --print-recs:

  rwfilter --start-date=2003/01/17:00 --end-date=2003/01/17:23 \
        --proto=6 --pass=stdout \
        | rwaddrcount --print-recs | head -3

  10.10.10.1|  65792| 147|  21| 2003/01/17T00:19:01| 2003/01/17T02:00:13|  
  10.10.10.2| 110744|  89|   7| 2003/01/17T01:21:42| 2003/01/17T01:39:21|  
  10.10.10.3|    864|  18|   6| 2003/01/17T00:20:33| 2003/01/17T01:25:38|

ENVIRONMENT

SILK_PAGER

When set to a non-empty string, rwaddrcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaddrcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwaddrcount automatically invokes this program to display its output a screen at a time.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwaddrcount looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwaddrcount checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.
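The pager precedence described for the --pager switch and for the SILK_PAGER and PAGER variables can be sketched in Python. The function name and the is_terminal parameter are hypothetical; the sketch only encodes the documented order and the rule that an empty value disables paging.

```python
def choose_pager(switch_value=None, environ=None, is_terminal=True):
    """Return the pager program to run, or None for no paging.

    Precedence: --pager, then SILK_PAGER, then PAGER. An empty string
    at the level that wins means "do not page".
    """
    if not is_terminal:
        return None             # paging applies only to terminal output
    environ = {} if environ is None else environ
    if switch_value is not None:
        pager = switch_value
    elif "SILK_PAGER" in environ:
        pager = environ["SILK_PAGER"]   # may be empty: paging disabled
    else:
        pager = environ.get("PAGER", "")
    return pager or None
```

For example, setting SILK_PAGER to the empty string turns paging off even when PAGER names a program, because PAGER is consulted only when SILK_PAGER is not set.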

SEE ALSO

rwfilter(1), rwset(1), rwsetcat(1), rwstats(1), rwtotal(1), rwuniq(1)

NOTES

When used in an IPv6 environment, rwaddrcount will attempt to convert any IPv6 addresses to IPv4. Records that can be converted will be processed; all other records will be silently ignored.

rwaddrcount uses a fairly large hash table to store data, but as the amount of data grows, the application is likely to take more time to process it.

Similar binning of records is produced by rwstats(1), rwtotal(1), and rwuniq(1).

To generate a list of IP addresses without the volume information, use rwset(1).

rwappend

Append SiLK Flow file(s) to an existing SiLK Flow file

SYNOPSIS

  rwappend [--create=[TEMPLATE_FILE]] [--site-config-file=FILENAME]  
        [--print-statistics] TARGET_FILE SOURCE_FILE [SOURCE_FILE...]

  rwappend --help

  rwappend --version

DESCRIPTION

rwappend reads SiLK Flow records from the specified SOURCE_FILEs and appends them to the TARGET_FILE. If stdin is used as the name of one of the SOURCE_FILEs, SiLK flow records will be read from the standard input.

When the TARGET_FILE does not exist and the --create switch is not provided, rwappend will exit with an error. When --create is specified and TARGET_FILE does not exist, rwappend will create the TARGET_FILE using the same format, version, and byte-order as the specified TEMPLATE_FILE. If no TEMPLATE_FILE is given, the TARGET_FILE is created in the default format and version (the same format that rwcat(1) would produce).

The TARGET_FILE must be an actual file; it cannot be a named pipe or the standard output. In addition, the header of TARGET_FILE must not be compressed; that is, you cannot append to a file whose entire contents have been compressed with gzip (those files normally end in the .gz extension).
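The handling of a missing TARGET_FILE described above reduces to a small decision rule, sketched here in Python. The function name and the returned strings are hypothetical and exist only to make the cases explicit; the sketch performs no file I/O.

```python
def plan_target(target_exists, create_given=False, template=None):
    """Describe what happens to TARGET_FILE before records are appended."""
    if target_exists:
        return "append to existing file"
    if not create_given:
        # No target and no --create: the tool exits with an error.
        raise FileNotFoundError(
            "TARGET_FILE does not exist and --create was not given")
    if template:
        return f"create using format, version, and byte-order of {template}"
    return "create using default format and version"


print(plan_target(False, create_given=True, template="sample1.dat"))
```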

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--create
--create=TEMPLATE_FILE

Create the TARGET_FILE if it does not exist. The file will have the same format, version, and byte-order as the TEMPLATE_FILE if it is provided; otherwise the defaults are used. The TEMPLATE_FILE will NOT be appended to TARGET_FILE unless it also appears as the name of a SOURCE_FILE.

--print-statistics

Print to the standard error the number of records read from each SOURCE_FILE and the total number of records appended to the TARGET_FILE.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

Standard usage where results.dat exists:

  rwappend results.dat sample5.dat sample6.dat

To append files sample*.dat to results.dat, or to create results.dat using the same format as the first file argument (note that sample1.dat must be repeated):

  rwappend results.dat --create=sample1.dat \
        sample1.dat sample2.dat

If results.dat does not exist, the following two commands are equivalent:

  rwappend --create results.dat sample1.dat sample2.dat

  rwcat sample1.dat sample2.dat > results.dat

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwappend looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwappend checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwcat(1)

BUGS

When used in an IPv6 environment, rwappend will convert IP addresses into the form used by the TARGET_FILE. Any records containing IP addresses that cannot be converted will be silently ignored.

rwappend makes some attempts to avoid appending a file to itself (which would eventually exhaust the disk space) by comparing the names of files it is given; it should be smarter about this.

rwbag

Build a binary Bag from SiLK Flow records.

SYNOPSIS

  rwbag [--sip-flows=OUTPUTFILE] [--dip-flows=OUTPUTFILE]  
        [--sport-flows=OUTPUTFILE] [--dport-flows=OUTPUTFILE]  
        [--proto-flows=OUTPUTFILE] [--sensor-flows=OUTPUTFILE]  
        [--input-flows=OUTPUTFILE] [--output-flows=OUTPUTFILE]  
        [--nhip-flows=OUTPUTFILE]  
        [--sip-packets=OUTPUTFILE] [--dip-packets=OUTPUTFILE]  
        [--sport-packets=OUTPUTFILE] [--dport-packets=OUTPUTFILE]  
        [--proto-packets=OUTPUTFILE] [--sensor-packets=OUTPUTFILE]  
        [--input-packets=OUTPUTFILE] [--output-packets=OUTPUTFILE]  
        [--nhip-packets=OUTPUTFILE]  
        [--sip-bytes=OUTPUTFILE] [--dip-bytes=OUTPUTFILE]  
        [--sport-bytes=OUTPUTFILE] [--dport-bytes=OUTPUTFILE]  
        [--proto-bytes=OUTPUTFILE] [--sensor-bytes=OUTPUTFILE]  
        [--input-bytes=OUTPUTFILE] [--output-bytes=OUTPUTFILE]  
        [--nhip-bytes=OUTPUTFILE]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD]  
        [--print-filenames] [--copy-input=PATH]  
        [--site-config-file=FILENAME]  
        [INPUTFILE[ INPUTFILE...]]

  rwbag --help

  rwbag --legacy-help

  rwbag --version

DESCRIPTION

rwbag reads SiLK Flow records and builds a Bag. Source IP address, destination IP address, next hop IP address, source port, destination port, protocol, input interface index, output interface index, or sensor ID may be used as the unique key by which to count volumes. Flows, packets, or bytes may be used as the counter. rwbag attempts to read raw flow records from the standard input or from any INPUTFILE arguments. INPUTFILE may also explicitly be the keyword stdin. If the raw flow records do not contain the proper key and counter fields, rwbag prints an error to stderr and exits abnormally.
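Conceptually, a Bag maps each unique key to a counter. The following Python sketch shows that idea for the key and counter choices listed above; the dictionary record layout and the function name are assumptions for illustration, and the sketch says nothing about the binary Bag file format.

```python
from collections import Counter


def build_bag(flows, key_field, counter):
    """Count flows, packets, or bytes per unique value of key_field.

    counter is "flows", "packets", or "bytes"; key_field names the
    flow attribute used as the key (for example "sip" or "dport").
    """
    if counter not in ("flows", "packets", "bytes"):
        raise ValueError(f"unsupported counter: {counter}")
    bag = Counter()
    for flow in flows:
        # Each record adds 1 when counting flows, else its own volume.
        bag[flow[key_field]] += 1 if counter == "flows" else flow[counter]
    return bag


flows = [
    {"sip": "10.0.0.1", "dport": 80, "packets": 4, "bytes": 600},
    {"sip": "10.0.0.1", "dport": 443, "packets": 2, "bytes": 120},
    {"sip": "10.0.0.2", "dport": 80, "packets": 1, "bytes": 40},
]
print(build_bag(flows, "sip", "flows"))
```

In these terms, --sip-flows corresponds to build_bag(flows, "sip", "flows") and --dport-bytes to build_bag(flows, "dport", "bytes").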

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

At least one of the following output flags must be defined. For each, OUTPUTFILE is the name of a non-existent file, a named pipe, or the keyword stdout to write the binary Bag to the standard output. Only one switch may use the standard output as its output stream.

--sip-flows=OUTPUTFILE

Count number of flows by unique source IP.

--sip-packets=OUTPUTFILE

Count number of packets by unique source IP.

--sip-bytes=OUTPUTFILE

Count number of bytes by unique source IP.

--dip-flows=OUTPUTFILE

Count number of flows by unique destination IP.

--dip-packets=OUTPUTFILE

Count number of packets by unique destination IP.

--dip-bytes=OUTPUTFILE

Count number of bytes by unique destination IP.

--sport-flows=OUTPUTFILE

Count number of flows by unique source port.

--sport-packets=OUTPUTFILE

Count number of packets by unique source port.

--sport-bytes=OUTPUTFILE

Count number of bytes by unique source port.

--dport-flows=OUTPUTFILE

Count number of flows by unique destination port.

--dport-packets=OUTPUTFILE

Count number of packets by unique destination port.

--dport-bytes=OUTPUTFILE

Count number of bytes by unique destination port.

--proto-flows=OUTPUTFILE

Count number of flows by unique protocol.

--proto-packets=OUTPUTFILE

Count number of packets by unique protocol.

--proto-bytes=OUTPUTFILE

Count number of bytes by unique protocol.

--sensor-flows=OUTPUTFILE

Count number of flows by unique sensor ID.

--sensor-packets=OUTPUTFILE

Count number of packets by unique sensor ID.

--sensor-bytes=OUTPUTFILE

Count number of bytes by unique sensor ID.

--input-flows=OUTPUTFILE

Count number of flows by unique input interface index.

--input-packets=OUTPUTFILE

Count number of packets by unique input interface index.

--input-bytes=OUTPUTFILE

Count number of bytes by unique input interface index.

--output-flows=OUTPUTFILE

Count number of flows by unique output interface index.

--output-packets=OUTPUTFILE

Count number of packets by unique output interface index.

--output-bytes=OUTPUTFILE

Count number of bytes by unique output interface index.

--nhip-flows=OUTPUTFILE

Count number of flows by unique next hop IP.

--nhip-packets=OUTPUTFILE

Count number of packets by unique next hop IP.

--nhip-bytes=OUTPUTFILE

Count number of bytes by unique next hop IP.

--note-add=TEXT

Add the specified TEXT to the header of every output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of every output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the --output-path switch has been used to redirect rwbag’s ASCII output.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

--help

Print the available options and exit.

--legacy-help

Print the usage information for rwbag and include the names of the deprecated options in the output, then exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

The following options are deprecated.

--sf-file=OUTPUTFILE

Deprecated alias for --sip-flows.

--sp-file=OUTPUTFILE

Deprecated alias for --sip-packets.

--sb-file=OUTPUTFILE

Deprecated alias for --sip-bytes.

--df-file=OUTPUTFILE

Deprecated alias for --dip-flows.

--dp-file=OUTPUTFILE

Deprecated alias for --dip-packets.

--db-file=OUTPUTFILE

Deprecated alias for --dip-bytes.

--port-sf-file=OUTPUTFILE

Deprecated alias for --sport-flows.

--port-sp-file=OUTPUTFILE

Deprecated alias for --sport-packets.

--port-sb-file=OUTPUTFILE

Deprecated alias for --sport-bytes.

--port-df-file=OUTPUTFILE

Deprecated alias for --dport-flows.

--port-dp-file=OUTPUTFILE

Deprecated alias for --dport-packets.

--port-db-file=OUTPUTFILE

Deprecated alias for --dport-bytes.

--proto-f-file=OUTPUTFILE

Deprecated alias for --proto-flows.

--proto-p-file=OUTPUTFILE

Deprecated alias for --proto-packets.

--proto-b-file=OUTPUTFILE

Deprecated alias for --proto-bytes.

EXAMPLES

To build both source IP and destination IP Bags of flows:

  rwfilter ... | rwbag --sip-flows=sf.bag --dip-flows=df.bag

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwbag looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwbag checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwbagbuild(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwfilter(1)

BUGS

Currently there is no support for Bag files keyed by an IPv6 address.

When used in an IPv6 environment, rwbag will process every record when creating Bags that are not keyed by the IP address. For Bags keyed by the IP address, rwbag will attempt to convert any IPv6 addresses to IPv4. Records that can be converted will be processed; all other records will be silently ignored for the IP-keyed Bags, but will still be used for any non-IP-keyed Bags.

rwbagbuild

Create a binary Bag from non-flow data.

SYNOPSIS

  rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }  
        [--delimiter=C] [--default-count=DEFAULTCOUNT]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD] [--output-path=OUTPUTFILE]

  rwbagbuild --help

  rwbagbuild --version

DESCRIPTION

rwbagbuild builds a binary Bag file from an IPset file or from textual input.

When creating a Bag from an IPset, the value associated with each IP address is the value given by the –default-count switch, or 1 if the switch isn’t provided.

The textual input read from the argument to the –bag-input switch is processed a line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-count pair; whitespace around the key and count is ignored.

If the delimiter character (specified by the –delimiter switch and having pipe (’|’) as its default) is not present, the line must contain only an IP address or an integer key. If the delimiter is present, the line must contain an IP address or integer key before the delimiter and an integer count after the delimiter. These lines may have a second delimiter after the integer count; the second delimiter and any text to the right of it are ignored.

When the –default-count switch is specified, its value will be used as the count for each key, and the count value parsed from each line, if any, is ignored. Otherwise, the parsed count is used, or 1 is used as the count if no delimiter was present.

For each key-count pair, the key will be inserted into the Bag with its count or, if the key is already present in the Bag, its total count will be incremented by the count from this line.

The IP address or integer key must be expressed in one of these formats:

If an IP address or count cannot be parsed, or if a line contains a delimiter character but no count, rwbagbuild prints an error and exits.
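
The line-handling rules above can be sketched in Python. This is only an illustration of the described behavior, not rwbagbuild's code: the function names are hypothetical, and the sketch assumes keys are either plain integers or dotted-decimal IPv4 addresses.

```python
import ipaddress

def parse_key(text):
    """Return the key as an integer (assumes an integer or dotted-decimal IPv4)."""
    text = text.strip()
    if text.isdigit():
        return int(text)
    return int(ipaddress.IPv4Address(text))    # raises ValueError if unparsable

def build_bag(lines, delimiter="|", default_count=None):
    bag = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()     # strip comment and whitespace
        if not line:
            continue                            # blank lines are ignored
        if delimiter in line:
            fields = line.split(delimiter)
            key = parse_key(fields[0])
            count = int(fields[1])              # text after a second delimiter is ignored
        else:
            key, count = parse_key(line), 1     # no delimiter: count is 1
        if default_count is not None:
            count = default_count               # overrides any parsed count
        bag[key] = bag.get(key, 0) + count      # repeated keys accumulate
    return bag

print(build_bag(["# protocols", "  6|  138|", "17", "6|2"]))
```

A line with a delimiter but no count makes `int("")` raise `ValueError`, which loosely mirrors the error exit described above.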

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

The following two switches control the type of input; one and only one must be provided:

–set-input=SETFILE

Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin. Counts have a volume of 1 unless overridden with –default-count.

–bag-input=TEXTFILE

Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin. See the DESCRIPTION section for the syntax of the TEXTFILE.

–delimiter=C

The delimiter to expect between each key-count pair of the TEXTFILE read by the –bag-input switch. The delimiter is ignored if the –set-input switch is specified. Since ’#’ is used to denote comments and newline is used to denote records, neither is a valid delimiter character.

–default-count=DEFAULTCOUNT

Override the counts of all values in the input bag or set with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer.

–note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

–note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

–compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the –help and –version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

–output-path=OUTPUTFILE

Redirect output to OUTPUTFILE. OUTPUTFILE is a filename, named pipe, or the keyword stdout.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

Assume the file mybag.txt contains the following lines (ignore the leading whitespace; every line ends with a newline):

  192.168.0.1|5  
  192.168.0.2|500  
  192.168.0.3|3  
  192.168.0.4|14  
  192.168.0.5|5

To build a bag with it:

  rwbagbuild --bag-input=mybag.txt > mybag.bag

To create a Bag of protocol data from the text file myproto.txt:

    1|      4|  
    6|    138|  
   17|    131|

use

  rwbagbuild --bag-input=myproto.txt > myproto.bag

Given the IP set myset.set, create a bag where every entry in the set has a count of 3:

  rwbagbuild --set-input=myset.set --default-count=3 \  
        --out=mybag2.bag

SEE ALSO

rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwset(1)

rwbagcat

Output a binary Bag as text.

SYNOPSIS

  rwbagcat [--stats[=OUTFILE]] [--tree-stats[=OUTFILE]]  
        [ --network-structure[=STRUCTURE] | --bin-ips[=SCALE] ]  
        [--minkey=VALUE] [--maxkey=VALUE] [--mask-set=PATH]  
        [--mincounter=VALUE] [--maxcounter=VALUE] [--zero-counts]  
        [--integer-keys | --zero-pad-ips] [--output-path=OUTPUTFILE]  
        [--no-columns] [--column-separator=C] [--no-final-delimiter]  
        [{--delimited | --delimited=C}] [--pager=PAGER_PROG]  
        [BAGFILE...]

  rwbagcat --help

  rwbagcat --version

DESCRIPTION

rwbagcat reads a binary Bag as created by rwbag(1) or rwbagbuild(1), converts it to text, and outputs it to the standard output or the specified file. It can also print various statistics and summary information about the Bag.

rwbagcat reads the BAGFILEs specified on the command line; if no BAGFILE arguments are given, rwbagcat attempts to read the Bag from the standard input. BAGFILE may also explicitly be the keyword stdin or a hyphen (-) to allow rwbagcat to combine files and piped input. If any input does not contain a Bag, rwbagcat prints an error to the standard error and exits abnormally.

When multiple BAGFILEs are specified, each is handled individually; to process the combination of the BAGFILEs, invoke rwbagcat on the output from rwbagtool(1).

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–network-structure
–network-structure=STRUCTURE

Print the sum of the counters for each CIDR block of the specified size listed in STRUCTURE. The switch can also, for each CIDR block, print the number of hosts and smaller CIDR blocks that are occupied. STRUCTURE has one of three forms: CIDR_LIST, CIDR_LIST/, or CIDR_LIST/SUMMARY_EXTRAS. CIDR_LIST and SUMMARY_EXTRAS are each a comma separated list of integers from 1 to 32 as well as the following letters:

A comma is not required between adjacent letters. Any combination of integers and the symbols T,A,B,C,X,H may be specified in CIDR_LIST. In addition, if the argument contains the letter S or a slash (/), the output line for a CIDR block will also show the number of hosts and smaller CIDR blocks that are occupied. This list of smaller CIDR blocks to summarize is generated by forming the union of CIDR_LIST and SUMMARY_EXTRAS. By default, SUMMARY_EXTRAS is 8,16,24,27, and this default is used when the argument contains S but no slash. If the argument includes a slash and SUMMARY_EXTRAS is empty, the list of smaller subnets is set exactly to CIDR_LIST. If an argument is provided, the CIDR_LIST must contain at least one element. If no argument is specified to the switch, the default is TS/ABCX. An argument that contains nothing but S and/or slash is illegal. This option disables printing of the individual IPs; specify the H argument to the switch to print the IP addresses and their counters.

–bin-ips
–bin-ips=SCALE

Invert the bag and count the total number of unique IP addresses for a given value of the volume bin. For example, turn a Bag {sip:flow} into {flow:count(sip)}. SCALE is a string containing the value linear, binary, or decimal.

–stats
–stats=OUTFILE

Print a breakdown of the network hosts seen, and print general statistics about the keys and counters.

OUTFILE is a filename, named pipe, or one of the keywords stdout or stderr. Defaults to printing on stderr unless output is being paged, in which case output is to stdout.

–tree-stats
–tree-stats=OUTFILE

Print out metadata about how the bag is performing:

OUTFILE is a filename, named pipe, or one of the keywords stdout or stderr. Defaults to printing on stdout.

–minkey=VALUE

Only output records whose minimum key value is VALUE or higher. The valid range of VALUE is 0 to 4294967295, or 0.0.0.0 to 255.255.255.255. Default is 0 (for port or protocol) or 0.0.0.0 (for IP address). Accepts dotted decimal or integer notation.

–maxkey=VALUE

Only output records whose maximum key value is VALUE or lower. The valid range of VALUE is 0 to 4294967295, or 0.0.0.0 to 255.255.255.255. Default is all ports or protocols, or the maximum IP address 255.255.255.255. Accepts dotted decimal or integer notation.

–mask-set=PATH

Only output records whose key appears in the IPset read from the file PATH. When used with –minkey and/or –maxkey, the key must be in the IPset and within the specified range.

–mincounter=VALUE

Only output records whose minimum counter value is VALUE or higher. The valid range of VALUE is 1 to 18446744073709551615. The default is to print all records with non-zero counter; use –zero-counts to show records whose counter is 0.

–maxcounter=VALUE

Only output records whose maximum counter value is VALUE or lower. The valid range of VALUE is 1 to 18446744073709551615, with the default being the maximum counter value.

–zero-counts

Print keys whose counter is zero. Normally, keys with a counter of zero are suppressed since all keys have a default counter of zero. In order to use this flag, either –mask-set or both –minkey and –maxkey must be specified. When this switch is specified, any counter limit explicitly set by the –maxcounter switch will still be applied.

–output-path=OUTPUTFILE

Redirect output of the –network-structure or –bin-ips options to OUTPUTFILE. OUTPUTFILE is a filename, named pipe, or the keyword stdout.

–zero-pad-ips

Pad IP address octets with zeros so that every octet is three characters wide.

–integer-keys

Print the keys as integers. This flag should be used if the bag is a port or protocol bag.

–no-columns

Disable fixed-width columnar output.

–column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

–no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed. When the network summary is requested (–network-structure=S), the separator is always printed before the summary column and never after that column.

–delimited
–delimited=C

Run as if –no-columns –no-final-delimiter –column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.

–pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

To print the bag:

  $ rwbagcat mybag.bag  
       172.23.1.1|              5|  
       172.23.1.2|            231|  
       172.23.1.3|              9|  
       172.23.1.4|             19|  
    192.168.0.100|              1|  
    192.168.0.101|              1|  
    192.168.0.160|             15|  
   192.168.20.161|              1|  
   192.168.20.162|              5|  
   192.168.20.163|              5|

To print it with the full network structure:

  $ rwbagcat --network-structure=TABCHX mybag.bag  
            172.23.1.1      |              5|  
            172.23.1.2      |            231|  
            172.23.1.3      |              9|  
            172.23.1.4      |             19|  
          172.23.1.0/27     |            264|  
        172.23.1.0/24       |            264|  
      172.23.0.0/16         |            264|  
    172.0.0.0/8             |            264|  
            192.168.0.100   |              1|  
            192.168.0.101   |              1|  
          192.168.0.96/27   |              2|  
            192.168.0.160   |             15|  
          192.168.0.160/27  |             15|  
        192.168.0.0/24      |             17|  
            192.168.20.161  |              1|  
            192.168.20.162  |              5|  
            192.168.20.163  |              5|  
          192.168.20.160/27 |             11|  
        192.168.20.0/24     |             11|  
      192.168.0.0/16        |             28|  
    192.0.0.0/8             |             28|  
  TOTAL                     |            292|

Or an abbreviated network structure by class A and C only, including summary information:

  $ rwbagcat --network-structure=ACS mybag.bag  
      172.23.1.0/24     |            264| 4 hosts in 1 /27  
  172.0.0.0/8           |            264| 4 hosts in 1 /16, 1 /24, and 1 /27  
      192.168.0.0/24    |             17| 3 hosts in 2 /27s  
      192.168.20.0/24   |             11| 3 hosts in 1 /27  
  192.0.0.0/8           |             28| 6 hosts in 1 /16, 2 /24s, and 3 /27s
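
The per-block sums and host counts in the two network-structure listings above can be reproduced with a short Python sketch. It is a model of the arithmetic only; the `summarize` helper is a hypothetical name, and the dictionary simply restates the contents of mybag.bag.

```python
import ipaddress
from collections import defaultdict

# Contents of mybag.bag as shown by rwbagcat.
bag = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}

def summarize(bag, prefix_len):
    """Sum the counters and count the hosts in each CIDR block of this length."""
    sums, hosts = defaultdict(int), defaultdict(int)
    for ip, counter in bag.items():
        block = str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        sums[block] += counter
        hosts[block] += 1
    return dict(sums), dict(hosts)

sums24, hosts24 = summarize(bag, 24)
sums8, hosts8 = summarize(bag, 8)
print(sums24)
print(hosts8)
```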

To bin by number of unique IP addresses by volume:

  $ rwbagcat --bin-ips mybag.bag  
                1|              3|  
                5|              3|  
                9|              1|  
               15|              1|  
               19|              1|  
              231|              1|

This means there were 3 source hosts in the bag that had a single flow; 3 hosts that had 5 flows; and one host each that had 9, 15, 19, and 231 flows.

For a log2 breakdown of the counts:

  $ rwbagcat --bin-ips=binary mybag.bag  
     2^0 to 2^1-1|              3|  
     2^2 to 2^3-1|              3|  
     2^3 to 2^4-1|              2|  
     2^4 to 2^5-1|              1|  
     2^7 to 2^8-1|              1|
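
Both binning listings above follow from the ten counters in mybag.bag. The sketch below is an illustrative Python model, not rwbagcat's code: linear binning counts how many keys share each counter value, and binary binning groups the counters by the power of two they fall under.

```python
from collections import Counter

# Counters of the ten keys in mybag.bag.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

# Linear bins: counter value -> number of keys with that value.
linear = Counter(counters)

# Binary bins: n such that 2^n <= counter <= 2^(n+1)-1.
binary = Counter(c.bit_length() - 1 for c in counters)

for n in sorted(binary):
    print(f"2^{n} to 2^{n + 1}-1|{binary[n]:>15}|")
```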

Statistics:

  $ rwbagcat --stats mybag.bag

  Statistics  
                keys:  10  
     sum of counters:  292  
         minimum key:  172.23.1.1  
         maximum key:  192.168.20.163  
       minimum count:  1  
       maximum count:  231  
                mean:  29.2  
            variance:  5064  
  standard deviation:  71.16  
                skew:  2.246  
            kurtosis:  8.1

  $ rwbagcat --tree-stats mybag.bag  
     nodes allocated:  5 (10240 bytes)  
    leaves allocated:  4 (1024 bytes)  
       keys inserted:  10 (10 unique)  
     counter density:  7.81%
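
The first few figures of the --stats listing above can be checked by hand. The Python sketch below assumes the variance shown is the sample variance (dividing by n - 1), which is what matches the printed value; skew and kurtosis are left out because their exact formulas are not stated here.

```python
import statistics

# Counters of the ten keys in mybag.bag.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

keys = len(counters)
total = sum(counters)
mean = total / keys
variance = statistics.variance(counters)   # sample variance (n - 1 divisor)
stdev = statistics.stdev(counters)

print(f"keys: {keys}  sum of counters: {total}")
print(f"mean: {mean:.1f}  variance: {variance:.0f}  standard deviation: {stdev:.2f}")
```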

ENVIRONMENT

SILK_PAGER

When set to a non-empty string, rwbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwbagcat does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwbagcat automatically invokes this program to display its output a screen at a time.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagtool(1)

rwbagtool

Perform high-level operations on binary Bag files

SYNOPSIS

  rwbagtool [BAGFILE[,BAGFILE...]]  
        { --add | --subtract | --minimize | --maximize | --divide  
          | --scalar-multiply=VALUE  
          | --compare={lt | le | eq | ge | gt} }  
        [--intersect=SETFILE | --complement-intersect=SETFILE]  
        [--mincounter=VALUE] [--maxcounter=VALUE]  
        [--minkey=VALUE] [--maxkey=VALUE]  
        [--invert] [--coverset] [--output-path=OUTPUTFILE]  
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD]

  rwbagtool --help

  rwbagtool --version

DESCRIPTION

rwbagtool performs various operations on Bags. It can add Bags together, subtract a subset of data from a Bag, perform key intersection of a Bag with an IP set, extract the key list of a Bag as an IP set, or filter Bag records based on their counter value.

BAGFILE is the name of a file or a named pipe, or the names stdin or - to have rwbagtool read from the standard input. If no Bag file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. If BAGFILE does not contain a Bag, rwbagtool prints an error to stderr and exits abnormally.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

Operation switches

The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.

–add

Sum the counters for each key for all Bag files given on the command line. If a key does not exist, it has a counter of zero. If no other operation is specified, the add operation is the default.

–subtract

Subtract from the first Bag file all subsequent Bag files. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If any counter subtraction results in a negative number, the key will not appear in the resulting Bag file.

–minimize

Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags will not appear in the output.

–maximize

Cause the output to contain the maximum counter seen for each key. The output will contain each key that appears in any input Bag.

–divide

Divide the first Bag file by the second Bag file. It is an error if more than two Bag files are specified. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys will not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key will not appear in the output.

–scalar-multiply=VALUE

Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551615. This switch accepts a single Bag as input.

–compare=OPERATION

Compare the key/counter pairs in exactly two Bag files. It is an error if more than two Bag files are specified. The keys in the output Bag will only be those whose counter in the first Bag is OPERATION the counter in the second Bag. The counters for all keys in the output will be 1. Any key that does not appear in both input Bag files will not appear in the result. The possible OPERATION values are the strings:

lt

GetCounter(Bag1, key) < GetCounter(Bag2, key)

le

GetCounter(Bag1, key) <= GetCounter(Bag2, key)

eq

GetCounter(Bag1, key) == GetCounter(Bag2, key)

ge

GetCounter(Bag1, key) >= GetCounter(Bag2, key)

gt

GetCounter(Bag1, key) > GetCounter(Bag2, key)

Masking/Limiting switches

The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:

–intersect=SETFILE

Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output.

–complement-intersect=SETFILE

As –intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.

–mincounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value; the default is 1.

–maxcounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.

–minkey=VALUE

Cause the output to contain only those records whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.

–maxkey=VALUE

Cause the output to contain only those records whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.

Output switches

The following switches control the output.

–invert

Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key will be attributed to the maximum key; to prevent this, specify --maxcounter=4294967295.

–coverset

Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.

–output-path=OUTPUTFILE

Redirect output to OUTPUTFILE. OUTPUTFILE is the name of a file or a named pipe, or the name stdout or - to write the result to the standard output.

–note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

–note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

–note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

–compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the –help and –version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

The examples assume the following contents for the files:

 Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set  
  3|  10|     1|   1|     2|   8|     1|   1|          2  
  4|   7|     4|   2|     4|  10|     4|   3|          4  
  6|  14|     7|  32|     6|  14|     6|   4|          6  
  7|  23|     8|   2|     7|  12|     7|   4|          8  
  8|   2|                 9|   8|     8|   6|

Adding Bag files
 $ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag  
 $ rwbagcat --integer-keys Bag-sum.bag  
  1|   1|  
  3|  10|  
  4|   9|  
  6|  14|  
  7|  55|  
  8|   4|

 $ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag  
 $ rwbagcat --integer-keys Bag-sum2.bag  
  1|   1|  
  2|   8|  
  3|  10|  
  4|  19|  
  6|  28|  
  7|  67|  
  8|   4|  
  9|   8|

Subtracting Bag Files
 $ rwbagtool --sub Bag1.bag Bag2.bag > Bag-diff.bag  
 $ rwbagcat --integer-keys Bag-diff.bag  
  3|  10|  
  4|   5|  
  6|  14|

 $ rwbagtool --sub Bag2.bag Bag1.bag > Bag-diff2.bag  
 $ rwbagcat --integer-keys Bag-diff2.bag  
  1|   1|  
  7|   9|
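
The addition and subtraction results above can be modeled with plain Python dictionaries. This is a sketch of the documented semantics rather than rwbagtool's implementation; `bag_add` and `bag_subtract` are hypothetical helper names.

```python
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

def bag_add(*bags):
    """Sum the counters per key; a missing key counts as zero."""
    total = {}
    for bag in bags:
        for key, counter in bag.items():
            total[key] = total.get(key, 0) + counter
    return total

def bag_subtract(first, *rest):
    """Subtract every later Bag from the first; drop zero or negative results."""
    result = dict(first)
    for bag in rest:
        for key, counter in bag.items():
            result[key] = result.get(key, 0) - counter
    return {key: c for key, c in result.items() if c > 0}

print(sorted(bag_add(bag1, bag2).items()))
print(sorted(bag_subtract(bag1, bag2).items()))
print(sorted(bag_subtract(bag2, bag1).items()))
```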

Getting the Minimum Value
 $ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag  
 $ rwbagcat --integer-keys Bag-min.bag  
  4|   2|  
  7|  12|

Getting the Maximum Value
 $ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag  
 $ rwbagcat --integer-keys Bag-max.bag  
  1|   1|  
  2|   8|  
  3|  10|  
  4|  10|  
  6|  14|  
  7|  32|  
  8|   2|  
  9|   8|
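
The minimum and maximum results above differ in which keys survive: the minimum keeps only keys present in every input, while the maximum keeps every key seen in any input. A hedged Python sketch, with hypothetical helper names:

```python
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}

def bag_minimize(*bags):
    """Smallest counter per key, for keys present in all Bags."""
    common = set(bags[0]).intersection(*bags[1:])
    return {key: min(bag[key] for bag in bags) for key in common}

def bag_maximize(*bags):
    """Largest counter per key, for keys present in any Bag."""
    keys = set().union(*bags)
    return {key: max(bag.get(key, 0) for bag in bags) for key in keys}

print(sorted(bag_minimize(bag1, bag2, bag3).items()))
print(sorted(bag_maximize(bag1, bag2, bag3).items()))
```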

Dividing Bag Files
 $ rwbagtool --divide Bag2.bag Bag4.bag > Big-div1.bag  
 $ rwbagcat --integer-keys Big-div1.bag  
   1|   1|  
   4|   1|  
   7|   8|  
 $ rwbagtool --divide Bag4.bag Bag2.bag > Big-div2.bag  
 rwbagtool: Error dividing bags; key 6 not in divisor bag
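
The division example above combines three rules: every key of the first Bag must exist in the second, quotients are rounded to the nearest integer with .5 rounded up, and quotients below 0.5 are dropped. The Python sketch below models those rules with integer arithmetic; `bag_divide` is a hypothetical name and the error text is only an imitation of the message shown.

```python
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}

def bag_divide(dividend, divisor):
    result = {}
    for key, counter in dividend.items():
        if key not in divisor:
            raise ValueError(f"key {key} not in divisor bag")
        # floor(a/b + 1/2) computed without floating point: round half up.
        quotient = (2 * counter + divisor[key]) // (2 * divisor[key])
        if quotient > 0:                 # results below 0.5 are dropped
            result[key] = quotient
    return result

print(sorted(bag_divide(bag2, bag4).items()))

try:
    bag_divide(bag4, bag2)
except ValueError as exc:
    message = str(exc)
    print("Error dividing bags;", message)
```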

Scalar Multiplication
 $ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag  
 $ rwbagcat --integer-keys Bag-multiply.bag  
  3|  70|  
  4|  49|  
  6|  98|  
  7| 161|  
  8|  14|

Comparing Bag Files
 $ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag  
 $ rwbagcat --integer-keys Bag-lt.bag  
  7|   1|

 $ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag  
 $ rwbagcat --integer-keys Bag-le.bag  
  7|   1|  
  8|   1|

 $ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag  
 $ rwbagcat --integer-keys Bag-eq.bag  
  8|   1|

 $ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag  
 $ rwbagcat --integer-keys Bag-ge.bag  
  4|   1|  
  8|   1|

 $ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag  
 $ rwbagcat --integer-keys Bag-gt.bag  
  4|   1|
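
The five comparison results above can be reproduced with Python's `operator` module. This is a sketch of the documented behavior (keys must appear in both Bags, and every output counter is 1); `bag_compare` is a hypothetical name.

```python
import operator

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

OPERATIONS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt,
}

def bag_compare(operation, first, second):
    """Keep keys found in both Bags whose counters satisfy the comparison."""
    test = OPERATIONS[operation]
    return {key: 1 for key in first
            if key in second and test(first[key], second[key])}

for name in OPERATIONS:
    print(name, sorted(bag_compare(name, bag1, bag2)))
```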

Making a Cover Set
 $ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set  
 $ rwsetcat --integer-ips Cover.set  
  1  
  2  
  3  
  4  
  6  
  7  
  8  
  9

Inverting a Bag
 $ rwbagtool --invert Bag1.bag > Bag-inv1.bag  
 $ rwbagcat --integer-keys Bag-inv1.bag  
  2|   1|  
  7|   1|  
 10|   1|  
 14|   1|  
 23|   1|

 $ rwbagtool --invert Bag2.bag > Bag-inv2.bag  
 $ rwbagcat --integer-keys Bag-inv2.bag  
  1|   1|  
  2|   2|  
 32|   1|

 $ rwbagtool --invert Bag3.bag > Bag-inv3.bag  
 $ rwbagcat --integer-keys Bag-inv3.bag  
  8|   2|  
 10|   1|  
 12|   1|  
 14|   1|
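
Inverting a Bag amounts to counting how often each counter value occurs. The Python sketch below models the three examples above and also the documented clamping of oversized counters to the maximum key; `bag_invert` and `MAX_KEY` are hypothetical names.

```python
from collections import Counter

MAX_KEY = 4294967295  # largest possible key

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}

def bag_invert(bag):
    """Map each counter value to the number of keys that carry it."""
    return dict(Counter(min(counter, MAX_KEY) for counter in bag.values()))

for bag in (bag1, bag2, bag3):
    print(sorted(bag_invert(bag).items()))
```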

Masking Bag Files
 $ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag  
 $ rwbagcat --integer-keys Bag-mask.bag  
  4|   7|  
  6|  14|  
  8|   2|

 $ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag  
 $ rwbagcat --integer-keys Bag-mask2.bag  
  3|  10|  
  7|  23|

Restricting the Output
 $ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag  
 $ rwbagcat --integer-keys Bag-res1.bag  
  1|   1|  
  3|  10|  
  4|   9|

 $ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag  
 $ rwbagcat --integer-keys Bag-res2.bag  
  3|  10|  
  4|   7|  
  6|  14|

 $ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag  
 $ rwbagcat --integer-keys Bag-res3.bag  
  7|  55|

 $ rwbagtool --sub --maxcounter=9 Bag1.bag Bag2.bag > Bag-res4.bag  
 $ rwbagcat --integer-keys Bag-res4.bag  
  4|   5|
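
The masking and limiting switches act as filters on the intermediate Bag produced by the operation switch. The Python sketch below models that second stage only; `limit` and its keyword names are hypothetical, and the input dictionaries restate the sum and difference of Bag1.bag and Bag2.bag from the earlier examples.

```python
mask = {2, 4, 6, 8}                                   # Mask.set
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}              # Bag1.bag
bag_sum = {1: 1, 3: 10, 4: 9, 6: 14, 7: 55, 8: 4}     # Bag1 + Bag2
bag_diff = {3: 10, 4: 5, 6: 14}                       # Bag1 - Bag2

def limit(bag, intersect=None, complement=None,
          mincounter=1, maxcounter=2**64 - 1, minkey=0, maxkey=2**32 - 1):
    """Keep only the entries that pass every mask and range that was given."""
    return {
        key: counter for key, counter in bag.items()
        if (intersect is None or key in intersect)
        and (complement is None or key not in complement)
        and mincounter <= counter <= maxcounter
        and minkey <= key <= maxkey
    }

print(limit(bag1, intersect=mask))
print(limit(bag1, complement=mask))
print(limit(bag_sum, maxkey=5))
print(limit(bag_sum, mincounter=20))
print(limit(bag_diff, maxcounter=9))
```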

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetcat(1)

rwcat

Concatenate SiLK Flow files into a single stream

SYNOPSIS

  rwcat [--output-path=FILE] [--note-add=TEXT] [--note-file-add=FILE]  
        [--print-filenames] [--byte-order={big | little | native}]  
        [--ipv4-output] [--compression-method=COMP_METHOD]  
        [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [ input-files ... ]}

  rwcat --help

  rwcat --version

DESCRIPTION

rwcat reads SiLK Flow records from the specified input files and writes the records in the standard binary SiLK format to the specified output-path; rwcat will write the records to the standard output when stdout is not the terminal and –output-path is not provided.

When the –xargs switch is provided, rwcat will read the names of the files to process from the named text file, or from the standard input if no file name argument is provided to the switch. The input should contain one filename per line.

If the input file names end in .gz, they will be uncompressed as they are read. When stdin is provided as an input file name, rwcat will read records from the standard input.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–output-path=FILE

Write the SiLK Flow records to FILE, which must not exist. If the switch is not provided or if FILE is stdout, flows are written to the standard output. If the name ends in .gz, the output will be compressed using gzip(1).

–note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

–note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

–byte-order=ENDIAN

Set the byte order for the output SiLK Flow records. The argument is one of the following:

native

Use the byte order of the machine where rwcat is running. This is the default.

big

Use network byte order (big endian) for the output.

little

Write the output in little endian format.

–ipv4-output

Force the output to contain only IPv4 addresses. When this switch is specified, IPv6 addresses are ignored unless the IPv6 address is an encapsulation of an IPv4 address, in which case the IPv4 address will be written to the output. By default, rwcat writes IP addresses in the same format as the input file. When SiLK has not been compiled with IPv6 support, this switch has no effect.

–compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the –help and –version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

–print-filenames

Print the names of input files and the number of records each file contains as the files are read.

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the –version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.
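The search order described above can be sketched in Python. This is only an illustration of the documented order, not SiLK's own code: the function name candidate_config_paths and the compiled_root and app_dir parameters are hypothetical, and reading "parallel to the application's directory" as the parent of the directory that holds the executable is an assumption.

```python
import os


def candidate_config_paths(switch_value=None, env=None,
                           compiled_root="/data", app_dir="/usr/local/bin"):
    """Return the site configuration locations to try, in documented order.

    compiled_root and app_dir are hypothetical stand-ins for values that
    SiLK knows at run time; they are not SiLK settings.
    """
    env = os.environ if env is None else env

    # 1. An explicit --site-config-file value is used as given.
    if switch_value:
        return [switch_value]

    # 2. A non-empty SILK_CONFIG_FILE names the file itself.
    if env.get("SILK_CONFIG_FILE"):
        return [env["SILK_CONFIG_FILE"]]

    # 3. Otherwise look for a file named silk.conf in each directory.
    dirs = []
    if env.get("SILK_DATA_ROOTDIR"):
        dirs.append(env["SILK_DATA_ROOTDIR"])
    dirs.append(compiled_root)
    if env.get("SILK_PATH"):
        dirs.append(os.path.join(env["SILK_PATH"], "share", "silk"))
        dirs.append(os.path.join(env["SILK_PATH"], "share"))
    # Assumption: "parallel" means a sibling of the executable's directory.
    parent = os.path.dirname(app_dir.rstrip("/"))
    dirs.append(os.path.join(parent, "share", "silk"))
    dirs.append(os.path.join(parent, "share"))
    return [os.path.join(d, "silk.conf") for d in dirs]
```

The first entry of the returned list that exists on disk would be the file used under this reading.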

–xargs
–xargs=FILENAME

Causes rwcat to read file names from FILENAME or from the standard input if FILENAME is not provided. The input should have one file name per line. rwcat will open each file in turn and read records from it, as if the files had been listed on the command line.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

To combine the results of several rwfilter runs, stored in the files run1.rwf, run2.rwf, ... runN.rwf, into a single file, you can use:

  rwcat --output=combined.dat  *.rwf

If the shell complains about too many arguments, you can use the UNIX find(1) command and pipe its output to rwcat:

  find . -name '*.rwf' -print | \
        rwcat --xargs --output=combined.dat

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwcat looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwcat checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwfilter(1), gzip(1), find(1)

BUGS

Although rwcat will read from the standard input, this feature should be used with caution. rwcat will treat the standard input as a single file, as it has no way to know when one file ends and the next begins. The following will not work:

  cat run1.rwf run2.rwf | rwcat --output=combined.dat  # WRONG!

The header of run2.rwf will be treated as data of run1.rwf, resulting in corrupt output.

rwcompare

Compare the records in two SiLK Flow files

SYNOPSIS

  rwcompare [--quiet] FILE1 FILE2

  rwcompare --help

  rwcompare --version

DESCRIPTION

rwcompare opens the two files named on the command line and compares the SiLK Flow records they contain. If the records are identical, rwcompare exits with status 0. If any of the records differ, rwcompare prints a message and exits with status 1. If there is an issue reading either file, an error is printed and the exit status is 2. Use the –quiet switch to suppress all output (error messages included). You may use - or stdin for one of the file names, in which case rwcompare reads from the standard input.
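Because rwcompare reports its result through the exit status, it is convenient to call from a script. The following Python sketch maps the three statuses described above to short descriptions; the names describe_status and compare_flow_files are hypothetical, and compare_flow_files assumes that rwcompare is installed and on the PATH.

```python
import subprocess

# Meanings of the exit statuses described in the text above.
STATUS_MEANING = {
    0: "identical",
    1: "different",
    2: "error reading a file",
}


def describe_status(code):
    """Translate an rwcompare exit status into a short description."""
    return STATUS_MEANING.get(code, "unexpected status %d" % code)


def compare_flow_files(file1, file2):
    """Run rwcompare --quiet on two files and describe the result.

    Requires rwcompare to be installed and on the PATH.
    """
    result = subprocess.run(["rwcompare", "--quiet", file1, file2])
    return describe_status(result.returncode)
```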

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–quiet

Do not print a message if the files differ, and do not print an error message if a file cannot be opened or read.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

SEE ALSO

rwfileinfo(1)

rwcount

Print traffic summary across time

SYNOPSIS

  rwcount [--bin-size=SIZE] [--load-scheme=LOADSTYLE]  
        [--start-epoch=START_TIME] [--end-epoch=END_TIME]  
        [--epoch-slots] [--bin-slots] [--skip-zeroes] [--no-titles]  
        [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [{--legacy-timestamps | --legacy-timestamps=NUM}]  
        [FILES...]

  rwcount --help

  rwcount --version

DESCRIPTION

rwcount summarizes SiLK flow records across time. It counts the records in the input stream, and groups their byte and packet totals into time bins. rwcount produces textual output with a row for each bin.

When input files are not specified on the command line, rwcount will read records from the standard input.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–bin-size=SIZE

Denote the size of each time bin, in seconds; defaults to 30 seconds. rwcount supports millisecond size bins; SIZE may be a floating point value equal to or greater than 0.001.

–load-scheme=LOADSTYLE

Determine how the duration of each flow is mapped onto the time bins. LOADSTYLE can be one of the following:

 0 

Assume the traffic is evenly distributed across the bins that contain any part of the flow’s duration. For a flow whose duration spans five bins, each bin’s packet- and byte-counts will be incremented with 1/5 of the values for the entire flow.

Note that this does NOT distribute the traffic evenly across the flow’s duration: with a bin size of 30 seconds, a particularly placed 32 second flow will span three bins, and each bin will receive 1/3 of the flow. Compare with option 4.

 1 

Assume all of the traffic occurs in the initial millisecond of the flow’s duration. For a flow whose duration spans five bins, the first bin’s packet- and byte-counts will be incremented with the values for the entire flow.

 2 

Assume all of the traffic occurs in the last millisecond of the flow’s duration. For a flow whose duration spans five bins, the fifth bin’s packet- and byte-counts will be incremented with the values for the entire flow.

 3 

Assume all of the traffic occurs in the middle millisecond of the flow’s duration. For a flow whose duration spans five bins, the third bin’s packet- and byte-counts will be incremented with the values for the entire flow.

 4 

Assume the traffic is evenly distributed during each millisecond that the flow is active. For a flow whose duration spans five bins, each bin will receive a portion of the flow-, packet-, and byte-counts weighted by the amount of time the flow spent in each bin.

When using 30 second bins, a particularly placed 32 second flow will add 1/32 of its value to the first and last bins, and 30/32 to the middle bin.

The default LOADSTYLE is 4.
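The five load schemes can be compared with a small Python model. This is a sketch of the rules as described above, not rwcount's implementation; the function name bin_weights is hypothetical, times are integer milliseconds counted from the start of bin 0, and treating the flow's end time as exclusive is an assumption.

```python
from fractions import Fraction


def bin_weights(start_ms, dur_ms, bin_ms, scheme=4):
    """Return {bin_index: fraction of the flow credited to that bin}.

    Times are integer milliseconds measured from the start of bin 0.
    Assumption: the flow's end time is treated as exclusive.
    """
    end_ms = start_ms + dur_ms
    first = start_ms // bin_ms
    last = max(first, (end_ms - 1) // bin_ms)
    bins = range(first, last + 1)

    if scheme == 0:      # equal share for every bin the flow touches
        return {b: Fraction(1, len(bins)) for b in bins}
    if scheme == 1:      # everything in the bin holding the start
        return {first: Fraction(1)}
    if scheme == 2:      # everything in the bin holding the end
        return {last: Fraction(1)}
    if scheme == 3:      # everything in the bin holding the midpoint
        return {(start_ms + dur_ms // 2) // bin_ms: Fraction(1)}
    if scheme == 4:      # share proportional to time spent in each bin
        if dur_ms == 0:
            return {first: Fraction(1)}
        weights = {}
        for b in bins:
            lo = max(start_ms, b * bin_ms)
            hi = min(end_ms, (b + 1) * bin_ms)
            weights[b] = Fraction(hi - lo, dur_ms)
        return weights
    raise ValueError("LOADSTYLE must be 0, 1, 2, 3, or 4")
```

For a 32 second flow that starts 29 seconds into a 30 second bin, bin_weights(29000, 32000, 30000, 4) gives 1/32, 30/32, and 1/32 for the three bins, while scheme 0 gives each bin 1/3. Multiplying a flow's byte or packet count by these fractions yields the amount added to each bin.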

–start-epoch=START_TIME

Denote the time to use for the first bin. START_TIME may be in UNIX epoch seconds or in yyyy/mm/dd:HH[:MM[:SS[.sss]]] format.

–end-epoch=END_TIME

Denote the time to use for the final bin. END_TIME may be in UNIX epoch seconds or in yyyy/mm/dd:HH[:MM[:SS[.sss]]] format. When neither START_TIME nor END_TIME is specified to the millisecond, the ceiling of END_TIME is used. END_TIME will be adjusted so that the number of bins is an integer value. When both START_TIME and END_TIME are used, rwcount will allocate bins for the entire time span before it begins processing data, or exit abnormally if it cannot allocate the required memory.

–epoch-slots

Use the UNIX epoch time as the label for each bin in the output; the default is to label each bin with the time in a human-readable format.

–bin-slots

Use the internal bin index as the label for each bin in the output; the default is to label each bin with the time in a human-readable format.

–skip-zeroes

Disable printing of bins with no traffic. By default, all bins are printed.

–no-titles

Turn off column titles. By default, titles are printed.

–no-columns

Disable fixed-width columnar output.

–column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

–no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

–delimited
–delimited=C

Run as if –no-columns –no-final-delimiter –column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.

–print-filenames

Print to the standard error the names of input files as they are opened.

–copy-input=PATH

Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the –output-path switch has been used to redirect rwcount’s ASCII output.

–output-path=PATH

Determine where the output of rwcount (ASCII text) is written. If this option is not given, output is written to the standard output.

–pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.
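The precedence among the –pager switch and the two environment variables can be sketched as follows. The function name choose_pager is hypothetical, and the sketch covers only the selection of the program; the check that the output is a terminal is not modeled.

```python
import os


def choose_pager(switch_value=None, env=None):
    """Return the pager program to run, or None for no paging."""
    env = os.environ if env is None else env
    if switch_value is not None:        # --pager overrides everything
        pager = switch_value
    elif "SILK_PAGER" in env:           # then SILK_PAGER, even if empty
        pager = env["SILK_PAGER"]
    else:                               # finally PAGER
        pager = env.get("PAGER", "")
    # An empty string at any level means "do not page".
    return pager or None
```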

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the –version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

–legacy-timestamps
–legacy-timestamps=NUM

Specify the format for human readable timestamps, either the default (new) style, YYYY/MM/DDThh:mm:ss , or the legacy style, MM/DD/YYYY hh:mm:ss . When this switch is not present, the timestamps will be in the default format. When this switch is present and no argument is given, timestamps are in the legacy format. When an argument is supplied, timestamps will be in the new format if the argument begins with 0, and in the old format if the argument begins with 1. Any other argument to the switch is an error.
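The two timestamp styles and the rules for the optional argument can be illustrated in Python. The names use_legacy and format_bin_label are hypothetical, and the sketch formats times in UTC for simplicity, which is an assumption about how a given installation is configured.

```python
from datetime import datetime, timezone

NEW_STYLE = "%Y/%m/%dT%H:%M:%S"      # YYYY/MM/DDThh:mm:ss
LEGACY_STYLE = "%m/%d/%Y %H:%M:%S"   # MM/DD/YYYY hh:mm:ss


def use_legacy(switch_present, arg=None):
    """Decide whether the legacy format applies."""
    if not switch_present:
        return False          # switch absent: default (new) style
    if arg is None:
        return True           # switch with no argument: legacy style
    if arg.startswith("0"):
        return False          # argument begins with 0: new style
    if arg.startswith("1"):
        return True           # argument begins with 1: legacy style
    raise ValueError("invalid --legacy-timestamps argument: %r" % arg)


def format_bin_label(epoch_seconds, legacy=False):
    """Format a bin's start time; UTC is assumed here for simplicity."""
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return when.strftime(LEGACY_STYLE if legacy else NEW_STYLE)
```

For the epoch value 1041379200 (midnight UTC on January 1, 2003), the new style gives 2003/01/01T00:00:00 and the legacy style gives 01/01/2003 00:00:00.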

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

To count all web traffic on Jan 1, 2003, into 1 hour bins:

  rwfilter --pass=stdout --start-date=2003/01/01:00 \  
        --end-date=2003/01/01:24 --proto=6 --aport=80 \  
        | rwcount --bin-size=3600  
                 Date|       Records|          Bytes|      Packets|  
  2003/01/01T00:00:00|      12947.00|     1968190.00|     34312.00|  
  2003/01/01T01:00:00|      65318.00|     5783959.00|    100143.00|  
  2003/01/01T02:00:00|      13765.00|     1895933.00|     36121.00|  
  2003/01/01T03:00:00|      69599.00|     7062388.00|    144130.00|  
  2003/01/01T04:00:00|     204717.00|    18491693.00|    385293.00|  
  2003/01/01T05:00:00|      18664.00|     2352966.00|     45296.00|  
  ....

To force the hourly bins in the previous example to run from 30 minutes past the hour, use the –start-epoch switch:

  rwfilter ...| \  
        rwcount --bin-size=3600 --start-epoch=2002/12/31:23:30

ENVIRONMENT

SILK_PAGER

When set to a non-empty string, rwcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcount automatically invokes this program to display its output a screen at a time.

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwcount looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwcount checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwfilter(1), rwuniq(1)

BUGS

rwuniq(1)’s –bin-time switch can do time-binning similar to what rwcount supports, but rwuniq cannot divide a SiLK record among multiple bins; that is, it has no equivalent of rwcount’s –load-scheme switch. Such a feature could greatly increase rwuniq’s already large memory requirements.

rwcut

Print selected fields of binary SiLK Flow records

SYNOPSIS

  rwcut [--fields=FIELDS] [--all-fields] [--plugin=PLUGIN]  
        [--start-rec-num=NUM] [--end-rec-num=NUM] [--num-recs=NUM]  
        [--dry-run] [--icmp-type-and-code] [--epoch-time]  
        [{--integer-ips | --zero-pad-ips}] [--integer-sensors]  
        [--no-titles] [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [--ipv6-policy={ignore,asv4,mix,force,only}]  
        [{--legacy-timestamps | --legacy-timestamps=NUM}]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--pmap-column-width=NUM] [--python-file=PATH ...]  
        [FILES...]

  rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwcut --version

DESCRIPTION

rwcut reads binary SiLK Flow records from files listed on the command line or from the standard input and prints the records to the screen in a textual, bar (|) delimited format. See the EXAMPLES section below for sample output.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–fields=FIELDS

FIELDS contains the list of flow attributes (a.k.a. fields or columns) to print. The columns will be displayed in the order the fields are specified. Fields may be repeated. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

 --fields=stime,10,1-5

If the –fields switch is not given, FIELDS defaults to:

 sIP,dIP,sPort,dPort,protocol,packets,bytes,flags,sTime,dur,eTime,sensor
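The FIELDS syntax (names, integers, and hyphenated ranges, in the order given) can be sketched with a short Python parser. This is an illustration rather than rwcut's parser: the name parse_fields is hypothetical, and only the twelve default fields are included in the lookup table, using the field numbers from the list in this section.

```python
# Field names and numbers for the twelve default fields; the numbers
# are the ones given in the field list in this section.
FIELD_NUMBERS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "dur": 10, "etime": 11, "sensor": 12,
}


def parse_fields(spec):
    """Expand a FIELDS string into an ordered list of field numbers."""
    numbers = []
    for item in spec.split(","):
        item = item.strip()
        start, dash, end = item.partition("-")
        if item.isdigit():                        # single field-integer
            numbers.append(int(item))
        elif dash and start.isdigit() and end.isdigit():
            # range of field-integers, inclusive at both ends
            numbers.extend(range(int(start), int(end) + 1))
        elif item.lower() in FIELD_NUMBERS:       # case-insensitive name
            numbers.append(FIELD_NUMBERS[item.lower()])
        else:
            raise ValueError("unknown field: %r" % item)
    return numbers
```

With this sketch, parse_fields("stime,10,1-5") returns [9, 10, 1, 2, 3, 4, 5]; order is preserved and repeated fields are kept.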

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (millisecond resolution unless the –legacy-timestamps switch is specified)

dur,10

duration of flow (millisecond resolution unless the –legacy-timestamps switch is specified)

eTime,11

end time of flow (millisecond resolution unless the –legacy-timestamps switch is specified)

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

sTime+msec,22

starting time of flow including milliseconds (milliseconds are always displayed)

eTime+msec,23

end time of flow including milliseconds (milliseconds are always displayed)

dur+msec,24

duration of flow including milliseconds (milliseconds are always displayed)

icmpTypeCode,25

include two columns, iType and iCode that contain the ICMP type and code for ICMP flows; for non-ICMP flows, these columns are empty

Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:

in,13

router SNMP input interface

out,14

router SNMP output interface

nhIP,15

router next hop IP

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field’s value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator:

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the –silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity). The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
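The marking pattern in the ssh example can be written out as a small Python function. The name continuation_attributes is hypothetical, and the sketch models only the T and C attributes for a session that the flow generator splits into a known number of records.

```python
def continuation_attributes(record_count):
    """Return the T/C attribute string for each record of one session.

    Only the timeout-related attributes are modeled; F is not considered.
    """
    if record_count <= 1:
        return [""] * record_count       # session fit in a single flow
    marks = ["T"]                         # first record hit the timeout
    marks += ["TC"] * (record_count - 2)  # middle records: both apply
    marks.append("C")                     # final record: continuation only
    return marks
```

A session split into four records would be marked T, TC, TC, C under this model.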

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The list of built-in fields may be augmented by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwcut automatically looks for the following plug-ins:

ADDRESS TYPE (addrtype.so)

stype,16

for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. See addrtype(3).

dtype,17

as stype for the destination IP address

COUNTRY CODE (ccfilter.so)

scc,18

for the source IP, a two-letter country code abbreviation denoting the country that owns that IP address. See ccfilter(3).

dcc,19

as scc for the destination IP

PREFIX MAP (pmapfilter.so)

src-MAPNAME

value determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the –pmap-file switch and the pmapfilter(3) manual page.

dst-MAPNAME

as src-MAPNAME for the destination IP or protocol/destination-port.

sval
dval

These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.

–all-fields

Instruct rwcut to print all known fields. This switch cannot be combined with the –fields switch. This switch suppresses error messages from the plug-ins.

–plugin=PLUGIN

Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The creation of these plug-ins is beyond the scope of this manual page. When PLUGIN contains a slash (/), rwcut assumes the path to PLUGIN is correct. Otherwise, rwcut will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application’s directory: lib/silk, share/lib, and lib. If rwcut does not find the file, it assumes the plug-in is in the current directory. To force rwcut to look in the current directory first, specify –plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwcut prints status messages to the standard error as it tries to open each of its plug-ins.

–start-rec-num=START_NUM

Begin printing with the START_NUM’th record by skipping the first START_NUM-1 records. The default is 1; that is, to start printing at the first record; START_NUM must be a positive integer. If START_NUM is greater than the number of input records, the only output will be the title. This parameter does not affect the records written to the stream specified by –copy-input.

–end-rec-num=END_NUM

Stop printing after the END_NUM’th record. When END_NUM is 0, the default, printing stops once all input records have been printed; that is, END_NUM is effectively infinity. If this value is non-zero, it must not be less than START_NUM. This parameter does not affect the records written to the stream specified by –copy-input.

–num-recs=REC_COUNT

Print no more than REC_COUNT records; however, if both –start-rec-num and –end-rec-num are specified or if END_NUM is less than REC_COUNT, this switch is ignored. Specifying a REC_COUNT of 0 will print all records, which is the default.

–dry-run

Causes rwcut to print the column headers and exit. Useful for testing.

–icmp-type-and-code

Unlike TCP or UDP, ICMP messages do not use ports, but instead have types and codes. Specifying this switch will cause rwcut to print, for ICMP records, the message’s type and code in the sPort and dPort columns, respectively. The use of this switch is discouraged; use the icmpTypeCode field instead.

–epoch-time

Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01).

–integer-ips

Print IPs as integers. By default, IP addresses are printed in their canonical form.

–zero-pad-ips

Print IP addresses in their canonical form, but add zeros to the IP address so it fully fills the width of the column. For IPv4, use three digits per octet, e.g., 127.000.000.001. For IPv6, use four digits per hexadectet and expand empty hexadectets, e.g., 0000:0000:0000:0000:0000:FFFF:FF00:0001.
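The integer and zero-padded forms can be reproduced with Python's standard ipaddress module. This is a sketch of the output formats only; the names zero_pad_ip and integer_ip are hypothetical, and the use of upper-case hexadecimal digits simply follows the example above.

```python
import ipaddress


def zero_pad_ip(text):
    """Return the address with every octet or hexadectet fully padded."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Three decimal digits per octet.
        return ".".join("%03d" % octet for octet in addr.packed)
    # Four hexadecimal digits per hexadectet, nothing abbreviated.
    digits = "%032X" % int(addr)
    return ":".join(digits[i:i + 4] for i in range(0, 32, 4))


def integer_ip(text):
    """Return the address as an integer, as with --integer-ips."""
    return int(ipaddress.ip_address(text))
```

For example, zero_pad_ip("127.0.0.1") returns 127.000.000.001, and integer_ip("10.0.0.1") returns 167772161.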

–integer-sensors

Print the integer ID of the sensor rather than its name.

–no-titles

Turn off column titles. By default, titles are printed.

–no-columns

Disable fixed-width columnar output.

–column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

–no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

–delimited
–delimited=C

Run as if –no-columns –no-final-delimiter –column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.

–print-filenames

Print to the standard error the names of input files as they are opened.

–copy-input=PATH

Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the –output-path switch has been used to redirect rwcut’s ASCII output.

–output-path=PATH

Determines where the output of rwcut (ASCII text) is written. If this option is not given, output is written to the standard output.

–pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.

–ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Completely ignore IPv6 flows. Only IPv4 flows will be printed.

asv4

Convert IPv6 addresses to IPv4 if possible, otherwise ignore the IPv6 flows.

mix

Process the input as a mixture of IPv4 and IPv6 flows.

force

Force IPv4 flows to be converted to IPv6.

only

Only process flows that were marked as IPv6 and completely ignore IPv4 flows.
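The effect of each policy on a single address can be modeled in Python. This is a sketch, not SiLK's code: the names effective_policy and apply_policy are hypothetical, and it assumes that an IPv6 address is convertible to IPv4 exactly when it is an IPv4-mapped address of the form ::ffff:a.b.c.d.

```python
import ipaddress
import os

POLICIES = ("ignore", "asv4", "mix", "force", "only")


def effective_policy(switch_value=None, env=None):
    """Pick the policy: the switch, then SILK_IPV6_POLICY, then mix."""
    env = os.environ if env is None else env
    if switch_value is not None:
        return switch_value
    from_env = env.get("SILK_IPV6_POLICY")
    return from_env if from_env in POLICIES else "mix"


def apply_policy(text, policy):
    """Return the address as it would be processed, or None if skipped."""
    addr = ipaddress.ip_address(text)
    is_v6 = addr.version == 6
    if policy == "ignore":
        return None if is_v6 else addr
    if policy == "asv4":
        # Assumption: "convertible" means an IPv4-mapped ::ffff:a.b.c.d
        return addr.ipv4_mapped if is_v6 else addr
    if policy == "mix":
        return addr
    if policy == "force":
        return addr if is_v6 else ipaddress.IPv6Address("::ffff:%s" % addr)
    if policy == "only":
        return addr if is_v6 else None
    raise ValueError("unknown policy: %r" % policy)
```

A result of None means that a flow carrying that address would not be printed under the given policy.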

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the –version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

–legacy-timestamps
–legacy-timestamps=NUM

Specify the format for human readable timestamps, either the default (new) style, YYYY/MM/DDThh:mm:ss.sss , or the legacy style, MM/DD/YYYY hh:mm:ss . When this switch is not present, the timestamps will be in the default format. When this switch is present and no argument is given, timestamps are in the legacy format. When an argument is supplied, timestamps will be in the new format if the argument begins with 0, and in the old format if the argument begins with 1. Any other argument to the switch is an error.

This switch also controls whether fractional seconds are displayed in the sTime and eTime fields when –epoch-time is requested. If the –legacy-timestamps switch is present with no value or with a value of 1, milliseconds will not be displayed; when not present or specified with a value of 0, milliseconds will be displayed.

–help

Print the available options and exit. Options that add fields can be specified before –help so that the new options appear in the output.

–version

Print the version number and information about how SiLK was configured, then exit the application.

–dynamic-library=PLUGIN

This switch is deprecated. It is an alias for –plugin.

–pmap-file=MAPNAME:PATH
–pmap-file=PATH

When the prefix map plug-in is used, rwcut reads the mapping file located at PATH. When MAPNAME is provided, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwcut will check the prefix map file to see if a map-name was specified when the file was created. Using multiple –pmap-file switches allows additional prefix map files to be read as long as each uses a unique map-name. For more information, see pmapfilter(3).

–pmap-column-width=NUM

When the pmapfilter plug-in is used, this switch gives the maximum number of characters to use when displaying the textual value of any field.

–python-file=PATH

When the SiLK Python plug-in is used, rwcut reads the Python code from the file PATH to define additional fields for possible output. This file should call register_plugin_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

EXAMPLES

A standard rwcut output will look like this (with the text wrapped for readability):

            sIP|            dIP|sPort|dPort|pro|\  
    10.30.30.31|    10.70.70.71|   80|36761|  6|\

        packets|     bytes|          flags|\  
              7|      3227|      FS PA    |\

                    sTime|      dur|                  eTime|senso|  
  2003/01/01T00:00:14.625|    3.959|2003/01/01T00:00:18.584|EDGE1|

The first line of the output is the title line, which shows the selected fields; the –no-titles switch disables the printing of that line. The second and subsequent lines contain data.

The most basic use of rwcut is to read directly from the output of rwfilter(1). For example, to see representative TCP traffic:

 rwfilter --start-date=2002/01/19:00 --end-date=2002/01/19:01 \  
      --proto=6 --pass=stdout | rwcut

To see only a limited set of fields, use the –fields switch. For example, to see only the protocols, use:

 rwcut --fields=5

The silkpython(3) manual page provides examples that use PySiLK to create and print arbitrary fields for rwcut.

The order of the FIELDS is significant, and fields can be repeated. For example, suppose that in addition to the default fields of 1-12, you also want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool easier. However, within the default fields of 1-12, you want to see dotted-decimal IP addresses.

 rwfilter ... --pass=stdout | \  
       rwcut --integer-ip --fields=2,9,1-12 --epoch-time | \  
       num2dot --ip-field=3,4

ENVIRONMENT

SILK_IPV6_POLICY

This environment variable is used as the value for the –ipv6-policy when that switch is not provided.

SILK_PAGER

When set to a non-empty string, rwcut automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcut does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcut automatically invokes this program to display its output a screen at a time.

PYTHONPATH

This environment variable is used by Python to locate modules. When –python-file is specified, rwcut loads Python which in turn loads the PySiLK module which is comprised of several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that the ccfilter(3) plug-in will use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwcut looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwcut checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwcut looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.

SILK_PLUGIN_DEBUG

When set to 1, rwcut prints status messages to the standard error as it tries to open each of its plug-ins.

NOTES

The ordering of the field numbers in –fields is significant; specifying –fields=2,1 will print destination IP, then source IP.

If you are interested in only a few fields, use the –fields option to reduce the volume of data to be processed. For example, if you are checking to see which internal host got hit with the Slammer worm (signature: UDP, destPort 1434, pkt size 404), then the following rwfilter, rwcut combination will be much faster than simply using default values:

  rwfilter --proto=17 --dport=1434 --bytes-per-packet=404-404 \
        --pass=stdout | rwcut --fields=2

To get a mapping from the integer representing a sensor to its name, use the mapsid(1) command.

SEE ALSO

rwfilter(1), mapsid(1), num2dot(1), addrtype(3), ccfilter(3), pmapfilter(3), silkpython(3), pysilk(3), yaf(1)

rwdedupe

Eliminate duplicate SiLK Flow records

SYNOPSIS

  rwdedupe [--ignore-fields=FIELDS] [--packets-delta=NUM]  
        [--bytes-delta=NUM] [--stime-delta=NUM] [--duration-delta=NUM]  
        [--temp-directory=DIR_PATH] [--buffer-size=SIZE]  
        [--compression-method=COMP_METHOD] [--output-path=PATH]  
        [--site-config-file=FILENAME] [FILES ...]

  rwdedupe --help

  rwdedupe --version

DESCRIPTION

rwdedupe reads SiLK Flow records from the files named on the command line or from the standard input. Records that appear in the input file(s) multiple times will only appear in the output stream once; that is, duplicate records are not written to the output. The SiLK Flows are written to the file specified by the –output-path switch or to the standard output when the –output-path switch is not provided and the standard output is not connected to a terminal.

As part of its processing, rwdedupe will re-order the records before writing them.

By default, rwdedupe will consider one record to be a duplicate of another when all the fields in the records match exactly. From another point of view, any difference in two records results in both records appearing in the output. Note that all means every field that exists on a SiLK Flow record. The complete list of fields is specified in the description of –ignore-fields in the OPTIONS section below.

To have rwdedupe ignore fields in the comparison, specify those fields in the –ignore-fields switch. When –ignore-fields=FIELDS is specified, a record is considered a duplicate of another if all fields except those in FIELDS match exactly. rwdedupe will treat FIELDS as being identical across all records. Put another way, if the only difference between two records is in the FIELDS fields, only one of those records will be written to the output.

The –packets-delta, –bytes-delta, –stime-delta and –duration-delta switches allow for "fuzziness" in the input. For example, if –stime-delta=NUM is specified and the only difference between two records is in the sTime fields, and the fields are within NUM milliseconds of each other, only one record will be written to the output.
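The comparison rules described above can be sketched in Python. This is only an illustration of the stated semantics, not rwdedupe's implementation: the function name is_duplicate, the dictionary representation of a record, and the key names are all assumptions made for the example. Which of two fuzzy-matching records rwdedupe keeps is not stated in this page, so the sketch only answers whether two records would be treated as duplicates.

```python
def is_duplicate(a, b, ignore=(), deltas=None):
    """Return True when records a and b count as duplicates.

    a, b   -- dicts with the same keys (hypothetical record layout)
    ignore -- field names to skip, as with --ignore-fields
    deltas -- maps a field name to its allowed difference, as with
              --packets-delta, --bytes-delta, --stime-delta, --duration-delta
    """
    deltas = deltas or {}
    for name, value in a.items():
        if name in ignore:
            continue  # ignored fields never make two records differ
        tolerance = deltas.get(name, 0)
        if tolerance:
            # fuzzy match: values may differ by "NUM or less"
            if abs(value - b[name]) > tolerance:
                return False
        elif value != b[name]:
            return False  # default: every field must match exactly
    return True


first = {"sip": "10.0.0.1", "dip": "10.0.0.2", "packets": 3,
         "bytes": 120, "stime": 1000, "dur": 50}
second = dict(first, stime=1003)  # differs only in start time, by 3 ms
print(is_duplicate(first, second))                       # exact comparison
print(is_duplicate(first, second, deltas={"stime": 5}))  # within 5 ms
```

With no deltas the two sample records differ; with a start-time delta of 5 milliseconds, or with stime listed in ignore, they are treated as the same record.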

During its processing, rwdedupe will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the –buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the –temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.
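The buffer-then-merge strategy in the preceding paragraph resembles a generic external merge sort. The Python sketch below illustrates that general technique under stated assumptions; it is not rwdedupe's code. The names dedupe_external and max_in_memory are hypothetical, records are plain tuples, exact equality is the only duplicate test, and the temporary files use Python's pickle format rather than any SiLK format.

```python
import heapq
import os
import pickle
import tempfile


def _write_run(sorted_records, temp_dir):
    """Store one sorted batch of records in a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=temp_dir)
    with os.fdopen(fd, "wb") as handle:
        for record in sorted_records:
            pickle.dump(record, handle)
    return path


def _read_run(path):
    """Yield the records of one temporary file in stored order."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def dedupe_external(records, max_in_memory, temp_dir=None):
    """Return the sorted, de-duplicated records as a list."""
    run_paths = []
    buffer = []
    try:
        for record in records:
            buffer.append(record)
            if len(buffer) >= max_in_memory:
                # buffer is full: sort it and spill it to disk
                run_paths.append(_write_run(sorted(buffer), temp_dir))
                buffer = []
        streams = [_read_run(path) for path in run_paths]
        streams.append(iter(sorted(buffer)))
        result = []
        previous = object()  # sentinel that equals no record
        # merging sorted runs places equal records next to each other
        for record in heapq.merge(*streams):
            if record != previous:
                result.append(record)
            previous = record
        return result
    finally:
        for path in run_paths:
            os.remove(path)
```

Because the records are sorted before the merge, the output order differs from the input order, which matches the re-ordering noted earlier in this description. Returning a list keeps the sketch short; a real tool would stream the merged records to its output.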

By default, the temporary files are stored in the /tmp directory. Because of the sizes of the temporary files, it is strongly recommended that /tmp not be used as the temporary directory, and rwdedupe will print a warning when /tmp is used. To modify the temporary directory used by rwdedupe, provide the –temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
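The precedence among the switch, the two environment variables, and the default can be written out as a short Python sketch. The function name resolve_temp_dir is hypothetical, and treating an empty value as unset is an assumption that this page does not confirm.

```python
import os


def resolve_temp_dir(switch_value=None, environ=None):
    """Pick the temporary directory using the documented precedence."""
    env = os.environ if environ is None else environ
    # --temp-directory, then SILK_TMPDIR, then TMPDIR
    for candidate in (switch_value, env.get("SILK_TMPDIR"), env.get("TMPDIR")):
        if candidate:  # assumption: an empty string counts as "not set"
            return candidate
    return "/tmp"  # documented default, which rwdedupe warns about


print(resolve_temp_dir(None, {"TMPDIR": "/var/tmp", "SILK_TMPDIR": "/scratch"}))
```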

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–ignore-fields=FIELDS

Ignore the fields listed in FIELDS when determining if two flow records are identical; that is, treat FIELDS as being identical across all flows. By default, all fields are treated as significant.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

  --ignore-fields=stime,12-15

The supported fields are:

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (milliseconds resolution)

dur,10

duration of flow (milliseconds resolution)

sensor,12

name or ID of sensor at the collection point

in,13

router SNMP input interface

out,14

router SNMP output interface

nhIP,15

router next hop IP

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by flow generator

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
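As an illustration of the FIELDS syntax for –ignore-fields (names, integers, and hyphenated ranges, with case-insensitive names), the following Python sketch turns a FIELDS string into a set of field numbers using the table above. The function name parse_fields is hypothetical. Rejecting numbers that do not appear in the table, including those inside a range, is an assumption; this page does not say how rwdedupe treats them.

```python
# Field names and numbers copied from the list above (names lower-cased).
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "dur": 10, "sensor": 12, "in": 13, "out": 14, "nhip": 15,
    "class": 20, "type": 21, "initialflags": 26, "sessionflags": 27,
    "attributes": 28, "application": 29,
}
VALID_IDS = set(FIELD_IDS.values())


def parse_fields(spec):
    """Convert a FIELDS string such as 'stime,12-15' to a set of numbers."""
    selected = set()
    for token in spec.split(","):
        token = token.strip().lower()  # field names are case-insensitive
        if token in FIELD_IDS:
            selected.add(FIELD_IDS[token])
            continue
        if "-" in token:
            low, high = (int(part) for part in token.split("-", 1))
            if low > high:
                raise ValueError("bad range: " + token)
            numbers = range(low, high + 1)
        else:
            numbers = [int(token)]  # raises ValueError for unknown names
        for number in numbers:
            if number not in VALID_IDS:  # assumption: unlisted numbers are errors
                raise ValueError("unsupported field: %d" % number)
            selected.add(number)
    return selected


print(sorted(parse_fields("stime,12-15")))
```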

–packets-delta=NUM

Treat the packets field on two records as being the same if the values differ by NUM packets or less. If not specified, the default is 0.

–bytes-delta=NUM

Treat the bytes field on two records as being the same if the values differ by NUM bytes or less. If not specified, the default is 0.

–stime-delta=NUM

Treat the start-time field on two records as being the same if the values differ by NUM milliseconds or less. If not specified, the default is 0.

–duration-delta=NUM

Treat the duration field on two records as being the same if the values differ by NUM milliseconds or less. If not specified, the default is 0.

–temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.

–buffer-size=SIZE

Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwdedupe will allocate, since additional buffers will be allocated for reading the input and writing the output.)
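The SIZE notation can be illustrated with a small Python sketch that applies the multipliers given above. The function name parse_size is hypothetical; accepting only upper-case suffixes and truncating any fractional byte are assumptions.

```python
MULTIPLIERS = {"K": 1024, "M": 1048576, "G": 1073741824}


def parse_size(text):
    """Convert a SIZE argument such as '1.5K' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in MULTIPLIERS:
        # real number followed by a suffix; fractional bytes are dropped
        return int(float(text[:-1]) * MULTIPLIERS[text[-1]])
    return int(text)  # ordinary integer, already in bytes


print(parse_size("1.5K"))  # the example from the text: 1536
```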

–compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the –help and –version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

–output-path=PATH

Write the SiLK Flow records to the specified file or named pipe. This switch must not name an existing regular file. When the standard output is not a terminal and this switch is not provided or its argument is stdout, the records are written to the standard output.

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the –version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.
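The search order described for –site-config-file can be laid out as a Python sketch that returns the candidate paths in order. This only restates the text above; the function name site_config_candidates is hypothetical, and the default values "/data" (standing in for the compiled-in data root) and "/usr/local/bin" (standing in for the application's directory) are placeholders, not values taken from SiLK.

```python
import os
import posixpath


def site_config_candidates(switch_value=None, environ=None,
                           compiled_rootdir="/data",
                           app_dir="/usr/local/bin"):
    """List the locations that would be tried, in order."""
    env = os.environ if environ is None else environ
    if switch_value:
        return [switch_value]              # --site-config-file wins
    if env.get("SILK_CONFIG_FILE"):
        return [env["SILK_CONFIG_FILE"]]   # value includes the file name
    dirs = []
    if env.get("SILK_DATA_ROOTDIR"):
        dirs.append(env["SILK_DATA_ROOTDIR"])
    dirs.append(compiled_rootdir)          # placeholder for the compiled-in root
    if env.get("SILK_PATH"):
        dirs.append(posixpath.join(env["SILK_PATH"], "share/silk"))
        dirs.append(posixpath.join(env["SILK_PATH"], "share"))
    # share/silk/ and share/ "parallel to" the application's directory
    parent = posixpath.dirname(app_dir.rstrip("/"))
    dirs.append(posixpath.join(parent, "share/silk"))
    dirs.append(posixpath.join(parent, "share"))
    return [posixpath.join(d, "silk.conf") for d in dirs]


for path in site_config_candidates(None, {"SILK_PATH": "/opt/silk"}):
    print(path)
```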

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

LIMITATIONS

When the temporary files and the final output are stored on the same file volume, rwdedupe will require approximately twice as much free disk space as the size of input data.

When the temporary files and the final output are on different volumes, rwdedupe will require between 1 and 1.5 times as much free space on the temporary volume as the size of the input data.

EXAMPLE

Suppose you have made several rwfilter(1) runs to find interesting traffic:

  rwfilter --start-date=2008/02/04 ... --pass=data1.rwf  
  rwfilter --start-date=2008/02/04 ... --pass=data2.rwf  
  rwfilter --start-date=2008/02/04 ... --pass=data3.rwf  
  rwfilter --start-date=2008/02/04 ... --pass=data4.rwf

You now want to merge that traffic into a single output file, but you want to ensure that any records appearing in multiple output files are only counted once. You can use rwdedupe to merge the output:

  rwdedupe data1.rwf data2.rwf data3.rwf data4.rwf --output=data.rwf

ENVIRONMENT

SILK_TMPDIR

When set and –temp-directory is not specified, rwdedupe writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwdedupe writes the temporary files it creates to this directory.

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwdedupe looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwdedupe checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwfilter(1), yaf(1), zlib(3)

rwfglob

Print files that rwfilter’s File Selection switches will access

SYNOPSIS

  rwfglob [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
        { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]  
         | [--flowtype=CLASS/TYPE[,CLASS/TYPE ...]] }  
        [--sensors=SENSOR[,SENSOR ...]]  
        [--data-rootdir=PATH] [--site-config-file=FILENAME]  
        [--print-missing-files] [--no-file-names] [--no-summary]

  rwfglob [--data-rootdir=PATH] [--site-config-file=FILENAME] --help

  rwfglob --version

DESCRIPTION

rwfglob accepts the normal File Selection options of rwfilter(1) and prints, to the standard output, the names of the files that would normally be accessed. At the end, a summary is printed of the number of files that exist and the number of those files that are on tape. (The on tape number is determined by seeing how many files had 0 blocks allocated to them.) By default, rwfglob only prints the names of files that exist; to see the names of files that it did not find, supply the –print-missing-files switch.
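The zero-blocks test mentioned above can be approximated in Python with the st_blocks value that os.stat reports on most UNIX systems. This is a sketch of the idea only, not rwfglob's code: the function name summarize is hypothetical, the stat parameter exists so the example can be exercised without real files, and st_blocks is not available on every platform.

```python
import os


def summarize(paths, stat=os.stat):
    """Count existing files and those with 0 blocks allocated."""
    found = 0
    on_tape = 0
    missing = []
    for path in paths:
        try:
            info = stat(path)
        except FileNotFoundError:
            missing.append(path)  # what --print-missing-files would report
            continue
        found += 1
        if info.st_blocks == 0:   # no blocks on disk: treated as "on tape"
            on_tape += 1
    summary = "globbed %d files; %d on tape" % (found, on_tape)
    return summary, missing
```

An empty file also has 0 blocks allocated, so a check of this kind cannot tell an empty file from a migrated one.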

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–start-date=YYYY/MM/DD[:HH]
–end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates are expressed in YYYY/MM/DD:HH format. For example, 2003/01/18:00 represents the first hour of January 18th, 2003, while 2002/10/01:22 corresponds to 22:00 on October 1st, 2002.

Whether the date strings represent times in GMT or the local timezone depends on how SiLK was compiled. See the output from –help or check the Timezone support setting in the –version output to determine how your version of SiLK was compiled.

When both –start-date and –end-date are specified to hour precision, all hours within that time range are processed.

When –start-date is specified to day precision, the hour specified in –end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.

When –end-date is not specified and –start-date is specified to day precision, files for that complete day are processed.

When –end-date is not specified and –start-date is specified to hour precision, files for that single hour are processed.

It is an error to specify –end-date without specifying –start-date.

When neither –start-date nor –end-date is given, rwfglob prints all files for the current day.
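The rules for combining –start-date and –end-date can be summarized in a Python sketch that returns the list of hours to examine. The function name hours_to_process is hypothetical, timezone handling is left out, and the case of an hour-precision start with a day-precision end is not covered by the rules above, so the sketch refuses it rather than guessing.

```python
from datetime import datetime, timedelta


def _parse(text):
    """Return (datetime, has_hour) for YYYY/MM/DD or YYYY/MM/DD:HH."""
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def hours_to_process(start=None, end=None, now=None):
    """List the hours selected by the two date switches."""
    if start is None:
        if end is not None:
            raise ValueError("--end-date requires --start-date")
        # neither switch given: the current day
        first = (now or datetime.now()).replace(
            hour=0, minute=0, second=0, microsecond=0)
        last = first + timedelta(hours=23)
    else:
        first, start_has_hour = _parse(start)
        if end is None:
            # a single hour, or the complete day
            last = first if start_has_hour else first + timedelta(hours=23)
        else:
            last, end_has_hour = _parse(end)
            if not start_has_hour:
                last = last.replace(hour=23)  # hour in --end-date is ignored
            elif not end_has_hour:
                raise ValueError("combination not described in the text")
    if last < first:
        raise ValueError("end date precedes start date")
    count = int((last - first).total_seconds() // 3600) + 1
    return [first + timedelta(hours=i) for i in range(count)]


print(len(hours_to_process("2003/10/11")))  # one complete day: 24
```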

–class=CLASS

The –class switch is used to specify a group of data to process. Only a single class may be selected. Classes are defined in the silk.conf(5) site configuration file. If the –class option is not given, the default-class as specified in silk.conf is used. Use the –help option to see the list of available classes and the default class.

–type={all | TYPE[,TYPE ...]}

The –type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes a comma-separated list of types or the keyword all which specifies all types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. Classes typically define default-types to use when the –type switch is not specified. Use the –help option to get the list of available types for each class.

–flowtypes=CLASS/TYPE[,CLASS/TYPE ...]

The –flowtype predicate provides an alternate way to specify class/type pairs. The –flowtype switch allows a single rwfglob invocation to print data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types.

–sensors=SENSOR[,SENSOR ...]

The –sensor switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the silk.conf(5) site configuration file, and the mapsid(1) command can be used to print a mapping of sensor names to IDs and classes. When the –sensor switch is not specified, the default is to use all sensors which are valid for the specified class(es).

–data-rootdir=PATH

This option causes rwfglob to use PATH as the root of the data store directory, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which overrides the location that was compiled into rwfglob. The default data store directory will be shown when the –version option is given.

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory (see –data-rootdir); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

–print-missing-files

This option prints to the standard error file names that rwfglob expected to find but did not. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file to permit retrieval of historical data; these data files will be missing even though their absence is expected. Use the output from this switch judiciously.

–no-file-names

This option instructs rwfglob not to print the names of the files that it successfully finds. By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.

–no-summary

This option instructs rwfglob not to print the summary line (that is, the line that shows the number of files found). By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.

–help

Print the available options and exit. The available classes and types will be included in output; you may specify a different root directory or site configuration file before –help to see the classes and types available for that site.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

Looking at a day on a single sensor:

  $ rwfglob --start=2003/10/11 --sensor=2  
  /data/in/2003/10/11/in-GAMMA_20031011.23  
  /data/in/2003/10/11/in-GAMMA_20031011.22  
  /data/in/2003/10/11/in-GAMMA_20031011.21  
  /data/in/2003/10/11/in-GAMMA_20031011.20  
  /data/in/2003/10/11/in-GAMMA_20031011.19  
  /data/in/2003/10/11/in-GAMMA_20031011.18  
  /data/in/2003/10/11/in-GAMMA_20031011.17  
  /data/in/2003/10/11/in-GAMMA_20031011.16  
  /data/in/2003/10/11/in-GAMMA_20031011.15  
  /data/in/2003/10/11/in-GAMMA_20031011.14  
  /data/in/2003/10/11/in-GAMMA_20031011.13  
  /data/in/2003/10/11/in-GAMMA_20031011.12  
  /data/in/2003/10/11/in-GAMMA_20031011.11  
  /data/in/2003/10/11/in-GAMMA_20031011.10  
  /data/in/2003/10/11/in-GAMMA_20031011.09  
  /data/in/2003/10/11/in-GAMMA_20031011.08  
  /data/in/2003/10/11/in-GAMMA_20031011.07  
  /data/in/2003/10/11/in-GAMMA_20031011.06  
  /data/in/2003/10/11/in-GAMMA_20031011.05  
  /data/in/2003/10/11/in-GAMMA_20031011.04  
  /data/in/2003/10/11/in-GAMMA_20031011.03  
  /data/in/2003/10/11/in-GAMMA_20031011.02  
  /data/in/2003/10/11/in-GAMMA_20031011.01  
  /data/in/2003/10/11/in-GAMMA_20031011.00  
  globbed 24 files; 0 on tape

If you only want the summary, specify –no-file-names:

  $ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names  
  globbed 24 files; 0 on tape

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When set, overrides the compiled-in value for the location of the directory tree containing the files of SiLK Flow records collected and stored by the packing system (rwflowpack(8)). In addition, when the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwfglob looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwfglob checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

SEE ALSO

rwfilter(1), mapsid(1), silk.conf(5)

BUGS

The –print-missing-files option needs to be smarter about what files are really missing.

The block size check is of unknown portability across different tape-farm systems.

rwfileinfo

Print information about a SiLK file

SYNOPSIS

  rwfileinfo [--fields=FIELDS] [--summary] [--no-titles] FILE [ FILE ... ]

  rwfileinfo --help

  rwfileinfo --version

DESCRIPTION

rwfileinfo prints information about a SiLK file. The information that may be printed is:

  1. format. The output file format, a string and its hexadecimal equivalent: FT_RWSPLIT(0x12), FT_RWFILTER(0x13), etc.

  2. version. The version of the file, an integer. As of SiLK 1.0, the version of the file is distinct from the version of the records in the file.

  3. byte-order. The byte-order (endian-ness) of the file, a string

  4. compression. The compression library used to compress the data-section of the file, a string and its decimal equivalent (none(0), lzo1x(2)). Does not include any external compression, such as if the entire file has been compressed with gzip(1).

  5. header-length. The length of the header in bytes

  6. record-length. The length of a single record in bytes. This will be 1 if the records do not have a fixed size.

  7. count-records. The number of records in the file. If the record-length is 1, this value is the uncompressed size of the data section of the file.

  8. file-size. The size of the file as it is on disk

  9. command-lines. The command(s) used to generate this file, for tools that support writing that information to the header and for formats that store that information.

  10. record-version. The version of the records contained in the file

  11. silk-version. The release of SiLK that wrote this file, e.g., 1.0.0. This value is 0 for files written by releases of SiLK prior to 1.0.

  12. packed-file-info. The timestamp, flowtype, and sensor for a file in the SiLK data repository.

  13. probe-name. The probe information for files created by flowcap(8)

  14. annotations. The notes (annotations) that have been added to the file with the –note-add and –note-file-add switches

  15. prefix-map. The mapname value for a prefix map file.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

–fields=FIELDS

Determines which information about the file is printed. FIELDS is a list of integers representing fields to print. The FIELDS may be a comma separated list of integers; a range may be specified by separating the start and end of the range with a hyphen (-). The available fields are listed above. Fields are always printed in the order given above. If the –fields option is not given, all fields are printed.

–summary

Prints a summary that lists the number of files processed, the sizes of those files, and the number of records contained in those files.

–no-titles

Suppresses printing of the file name and field names; only the values are printed, left justified and one per line.

–help

Print the available options and exit.

–version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLE

  $ rwfileinfo tcp-data.rwf  
  tcp-data.rwf:  
    format(id)          FT_RWGENERIC(0x16)  
    version             16  
    byte-order          littleEndian  
    compression(id)     none(0)  
    header-length       208  
    record-length       52  
    record-version      5  
    silk-version        1.0.1  
    count-records       7  
    file-size           572  
    command-lines  
                     1  rwfilter --proto=6 --pass=tcp-data.rwf ...  
    annotations  
                     1  This is some interesting TCP data

  $ rwfileinfo --no-titles --field=count-records tcp-data.rwf  
  7
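When the output of rwfileinfo needs to be consumed by a script, the titled layout shown in the example can be parsed into a dictionary. The Python sketch below infers the layout from the single example above (a file name ending in a colon, indented name and value pairs, and numbered entries under command-lines and annotations); the function name parse_rwfileinfo is hypothetical and other layouts may need adjustments.

```python
import re

SAMPLE = (
    "tcp-data.rwf:\n"
    "  format(id)          FT_RWGENERIC(0x16)\n"
    "  version             16\n"
    "  count-records       7\n"
    "  command-lines\n"
    "                   1  rwfilter --proto=6 --pass=tcp-data.rwf ...\n"
    "  annotations\n"
    "                   1  This is some interesting TCP data\n"
)


def parse_rwfileinfo(text):
    """Return {file name: {field name: value or list of entries}}."""
    files = {}
    current = None
    list_key = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not line[0].isspace() and line.endswith(":"):
            current = files.setdefault(line[:-1], {})  # start of a new file
            list_key = None
            continue
        if current is None:
            continue
        stripped = line.strip()
        numbered = re.match(r"(\d+)\s+(.*)", stripped)
        if numbered and list_key is not None:
            current[list_key].append(numbered.group(2))  # numbered entry
            continue
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        if len(parts) == 2:
            current[parts[0]] = parts[1]  # simple "name   value" line
            list_key = None
        else:
            current[parts[0]] = []        # a title followed by entries
            list_key = parts[0]
    return files


info = parse_rwfileinfo(SAMPLE)["tcp-data.rwf"]
print(info["count-records"], info["annotations"])
```

For a single value, the –no-titles form shown above is simpler than parsing the full listing.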

SEE ALSO

rwfilter(1)

rwfilter

Choose which SiLK Flow records to process

SYNOPSIS

  rwfilter [--threads=N] [--plugin=PLUGIN [--plugin=PLUGIN ...]]  
        [--pass-destination=PASS_PATH]  
        [--fail-destination=FAIL_PATH] [--all-destination=ALL_PATH]  
        [--input-pipe=INPUT_PATH] [--xargs=INPUT_STREAM]  
        [{ --print-statistics | --print-volume-statistics }]  
        [--print-filenames] [--print-missing-filenames]  
        [--dry-run] [--max-pass-records=N] [--max-fail-records=N]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD]  
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
        { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]  
         | [--flowtype=CLASS/TYPE[,CLASS/TYPE ...]] }  
        [--sensors=SENSOR[,SENSOR ...]]  
        [--data-rootdir=PATH] [--site-config-file=FILENAME]  
        [--stime=DATE_RANGE] [--etime=DATE_RANGE]  
        [--active-time=DATE_RANGE] [--duration=DECIMAL_RANGE]  
        [--sport=INTEGER_LIST] [--dport=INTEGER_LIST]  
        [--aport=INTEGER_LIST] [--protocol=INTEGER_LIST]  
        [--icmp-type=INTEGER_LIST] [--icmp-code=INTEGER_LIST]  
        [--bytes=INTEGER_RANGE] [--packets=INTEGER_RANGE]  
        [--bytes-per-packet=DECIMAL_RANGE]  
        [{--saddress=IP_ADDR_MASK | --not-saddress=IP_ADDR_MASK}]  
        [{--daddress=IP_ADDR_MASK | --not-daddress=IP_ADDR_MASK}]  
        [{--any-address=IP_ADDR_MASK | --not-any-address=IP_ADDR_MASK}]  
        [{--next-hop-id=IP_ADDR_MASK | --not-next-hop-id=IP_ADDR_MASK}]  
        [{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]  
        [{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]  
        [{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]  
        [{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]  
        [--input-index=INTEGER_LIST] [--output-index=INTEGER_LIST]  
        [--tcp-flags=TCP_FLAGS] [--flags-all=HIGH_MASK_FLAGS_LIST]  
        [--fin-flag=SCALAR] [--syn-flag=SCALAR] [--rst-flag=SCALAR]  
        [--psh-flag=SCALAR] [--ack-flag=SCALAR] [--urg-flag=SCALAR]  
        [--ece-flag=SCALAR] [--cwr-flag=SCALAR]  
        [--flags-initial=HIGH_MASK_FLAGS_LIST]  
        [--flags-session=HIGH_MASK_FLAGS_LIST]  
        [--attributes=ATTRIBUTES_LIST] [--application=INTEGER_LIST]  
        [--ip-version=INTEGER_LIST]  
        [--scc=COUNTRY_CODE_LIST] [--dcc=COUNTRY_CODE_LIST]  
        [--stype=SCALAR] [--dtype=SCALAR]  
        [--ippair-any=FILENAME] [--ipport-any=FILENAME]  
        [--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS]  
                                       [--tuple-direction=DIRECTION]  
                                       [--tuple-delimiter=CHAR] } ]  
        [--python-expr=PYTHON_EXPR]  
        [--python-file=FILENAME [--python-file=FILENAME ...]]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]  
         { [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]  
           [--pmap-any-MAPNAME=LABELS] } ]

  rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH]  
        [--data-rootdir=PATH] [--site-config-file=FILENAME]  
        --help

  rwfilter --version

DESCRIPTION

rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams.

The selection switches let one choose records by where the flow was collected (its sensor), the date of collection, and the flow’s direction.

The partitioning switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). rwfilter identifies records matching or violating the behavior(s), and partitions them into appropriate output streams (i.e., files) as specified.

These output streams from rwfilter are always binary. The output must be passed through another tool in the SiLK Tool Suite for further processing to get human-readable output.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

Output Switches

At least one of the following output switches must be provided:

–pass-destination=PASS_PATH

PASS_PATH refers to a non-existent file, a named pipe, or stdout. The pass-destination will output records which have passed ALL of the partitioning predicates.

–fail-destination=FAIL_PATH

FAIL_PATH refers to a non-existent file, a named pipe, or stdout. The fail-destination will output records which failed ANY of the partitioning predicates.

–all-destination=ALL_PATH

ALL_PATH refers to a file, a named pipe, or stdout. All records read by rwfilter are written to this destination.
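The relationship among the pass, fail, and all destinations can be illustrated with a short Python sketch. It models only the rule stated above (a record passes when it satisfies ALL predicates and fails when it violates ANY of them); the function name partition and the dictionary records are hypothetical and have no connection to rwfilter's binary format.

```python
def partition(records, predicates):
    """Split records into (passed, failed, everything)."""
    passed, failed, everything = [], [], []
    for record in records:
        everything.append(record)  # the "all" destination sees every record
        if all(test(record) for test in predicates):
            passed.append(record)  # satisfied ALL predicates
        else:
            failed.append(record)  # violated at least one predicate
    return passed, failed, everything


flows = [
    {"protocol": 17, "dport": 1434},
    {"protocol": 6, "dport": 80},
    {"protocol": 17, "dport": 53},
]
checks = [lambda r: r["protocol"] == 17, lambda r: r["dport"] == 1434]
passed, failed, everything = partition(flows, checks)
print(len(passed), len(failed), len(everything))  # 1 2 3
```

The three counts printed at the end correspond to the passed, failed, and total figures that –print-statistics reports.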

–print-statistics
–print-statistics=PATH

Prints statistics on the files read: the number of records that passed, the number that failed, and the total read. If a PATH is provided, the statistics will be printed there; otherwise they are printed to the standard error.

–print-volume-statistics
–print-volume-statistics=PATH

An enhanced version of –print-statistics, in that the statistics include the number of records, packets, and bytes that passed and failed the filter.

–help

Print the available options and exit. Options that add fields can be specified before –help so that the new options appear in the output. The available classes and types will be included in output; you may specify a different root directory or site configuration file before –help to see the classes and types available for that site.

–version

Print the version number and information about how SiLK was configured, then exit the application.

Additional Switches

–threads=N

Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter runs with a single thread. With multiple threads, rwfilter performs much better on queries that look at many files but return few records. Preliminary testing has found that performance peaks around four threads per CPU, but performance will vary depending on the type of query and the number of records returned.

–input-pipe=INPUT_PATH

INPUT_PATH is a named pipe or the string stdin. This refers to another source of rwfilter records. Note that rwfilter will not read from the standard input by default; to get this behavior, you must use –input-pipe=stdin.

–xargs=INPUT_PATH

Causes rwfilter to read file names from INPUT_PATH; the input should have one file name per line. rwfilter will open each file in turn and read records from it.

–print-filenames

Print the names of input files as they are read. This can be useful feedback for a long-running rwfilter process.

–dry-run

Perform a sanity check on the input arguments to check that the arguments are acceptable. In addition, prints to the standard output the names of the files that would be accessed (and the names of missing files if –print-missing is specified). rwfglob(1) can also be used to generate the lists of files that rwfilter will access.

–max-pass-records=N

Write N records to each –pass-destination. rwfilter will stop reading input once it has written these N records unless the –fail-destination or –all-destination switches were specified.

–max-fail-records=N

Write N records to each –fail-destination. rwfilter will stop reading input once it has written these N records unless the –pass-destination or –all-destination switches were specified.

–note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

–note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

–compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the –help and –version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

File Selection Options

The following options determine which files are read from the data store to provide the records.

–start-date=YYYY/MM/DD[:HH]
–end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates are expressed in YYYY/MM/DD:HH format. For example, 2003/01/18:00 represents the first hour of January 18th, 2003, while 2002/10/01:22 corresponds to 22:00 on October 1st, 2002.

Whether the date strings represent times in GMT or the local timezone depends on how SiLK was compiled. See the output from –help or check the Timezone support setting in the –version output to determine how your version of SiLK was compiled.

When both –start-date and –end-date are specified to hour precision, all hours within that time range are processed.

When –start-date is specified to day precision, the hour specified in –end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.

When –end-date is not specified and –start-date is specified to day precision, files for that complete day are processed.

When –end-date is not specified and –start-date is specified to hour precision, files for that single hour are processed.

It is an error to specify –end-date without specifying –start-date.

When neither –start-date nor –end-date is given, rwfilter processes all files for the current day.
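The date rules above can be summarized in a short Python sketch. It is only an illustration of the rules as stated here, not SiLK's own code; the function names parse_date and selected_hours are hypothetical, and the case of an hour-precision start with a day-precision end is not described above, so the sketch simply rejects it.

```python
from datetime import datetime, timedelta


def parse_date(text):
    """Parse 'YYYY/MM/DD' or 'YYYY/MM/DD:HH'; return (datetime, has_hour)."""
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def selected_hours(start=None, end=None, today=None):
    """Return the hourly datetimes that the rules above would select."""
    if start is None:
        if end is not None:
            raise ValueError("--end-date given without --start-date")
        # Neither switch given: every hour of the current day.
        now = today or datetime.now()
        first = now.replace(hour=0, minute=0, second=0, microsecond=0)
        last = first + timedelta(hours=23)
    else:
        first, start_has_hour = parse_date(start)
        if end is None:
            # Hour precision: that single hour. Day precision: the whole day.
            last = first if start_has_hour else first + timedelta(hours=23)
        else:
            last, end_has_hour = parse_date(end)
            if not start_has_hour:
                # Day-precision start: any hour on the end date is ignored.
                last = last.replace(hour=23)
            elif not end_has_hour:
                # Assumption: this combination is not covered by the text.
                raise ValueError("mixed precision is not modeled in this sketch")
    hours = []
    current = first
    while current <= last:
        hours.append(current)
        current += timedelta(hours=1)
    return hours
```

For instance, selected_hours("2003/02/19", "2003/02/20:05") yields 48 hourly entries because the :05 on the end date is ignored.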

–class=CLASS

The –class switch is used to specify a group of data to process. Only a single class may be selected. Classes are defined in the silk.conf(5) site configuration file. If the –class option is not given, the default-class as specified in silk.conf is used. Use the –help option to see the list of available classes and the default class.

–type={all | TYPE[,TYPE]}

The –type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes a comma-separated list of types or the keyword all, which specifies all types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. Classes typically define default-types to use when the –type switch is not specified. Use the –help option to get the list of available types for each class.

–flowtypes=CLASS/TYPE[,CLASS/TYPE ...]

The –flowtypes predicate provides an alternate way to specify class/type pairs. The –flowtypes switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types.

–sensors=SENSOR[,SENSOR ...]

The –sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the silk.conf(5) site configuration file, and the mapsid(1) command can be used to print a mapping of sensor names to IDs and classes. When the –sensors switch is not specified, the default is to use all sensors that are valid for the specified class(es).

–data-rootdir=PATH

This option causes rwfilter to use PATH as the root of the data store directory, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which overrides the location that was compiled into rwfilter. The default data store directory will be shown when the –version option is given.

–site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory (see –data-rootdir); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.
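The lookup order described above can be sketched in Python as follows. This is an illustration only: the function name site_config_candidates and its parameters are hypothetical, data_rootdir stands for the already resolved data store root, and "parallel to the application’s directory" is assumed to mean share/ directories next to the directory that holds the executable.

```python
import posixpath


def site_config_candidates(switch_value=None, env=None, data_rootdir=None,
                           app_dir="/usr/local/bin"):
    """List the silk.conf locations to try, in the order described above."""
    env = {} if env is None else env
    # 1. An explicit --site-config-file value wins outright.
    if switch_value:
        return [switch_value]
    # 2. A non-empty SILK_CONFIG_FILE names the file itself.
    if env.get("SILK_CONFIG_FILE"):
        return [env["SILK_CONFIG_FILE"]]
    candidates = []
    # 3. The root of the data directory.
    if data_rootdir:
        candidates.append(posixpath.join(data_rootdir, "silk.conf"))
    # 4. $SILK_PATH/share/silk/ and $SILK_PATH/share/.
    silk_path = env.get("SILK_PATH")
    if silk_path:
        candidates.append(posixpath.join(silk_path, "share", "silk", "silk.conf"))
        candidates.append(posixpath.join(silk_path, "share", "silk.conf"))
    # 5. share/silk/ and share/ beside the application's directory
    #    (assumption: siblings of the directory containing the binary).
    parent = posixpath.dirname(app_dir.rstrip("/"))
    candidates.append(posixpath.join(parent, "share", "silk", "silk.conf"))
    candidates.append(posixpath.join(parent, "share", "silk.conf"))
    return candidates
```

The sketch returns paths without checking that they exist; a caller would open the first one that is present.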

–print-missing-files

This option prints to the standard error file names that rwfilter’s file selection switches expected to find but did not. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file to permit retrieval of historical data; these data files will be missing even though their absence is expected. Use the output from this switch judiciously.

Partitioning Switches

rwfilter supports the following partitioning switches, at least one of which must be specified. The switches are AND’ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass will be sent to the fail-destination(s), if specified.
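As a rough model of the AND behavior described above, the following Python sketch splits records into pass and fail lists. The dictionary-based records and the partition function are hypothetical stand-ins for SiLK Flow records and rwfilter’s internal logic.

```python
def partition(records, tests):
    """Split records: a record passes only if every test accepts it."""
    passed, failed = [], []
    for rec in records:
        if all(test(rec) for test in tests):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed


# Hypothetical flows and two tests that mimic --protocol=6 --dport=80.
flows = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]
tests = [lambda r: r["protocol"] == 6, lambda r: r["dport"] == 80]
passed, failed = partition(flows, tests)
```

Only the first flow satisfies both tests; the other two would go to the fail destination, if one were specified.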

SWITCH PARAMETERS

The forms of the parameters to these partitioning switches are:

SWITCHES

The switches are:

–stime=DATE_RANGE

Pass the record if its starting time is in this DATE_RANGE.

–etime=DATE_RANGE

As –stime for the ending time.

–active-time=DATE_RANGE

Pass the record if the record was active at ANY time during this DATE_RANGE. If a single time is specified, pass the record if it was active at that instant.
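The –active-time test amounts to an interval-overlap check. A minimal Python sketch, assuming times are given as seconds since the epoch and that the range end points are inclusive (the function name was_active is hypothetical):

```python
def was_active(stime, etime, range_start, range_end=None):
    """True if the flow [stime, etime] overlaps [range_start, range_end]."""
    if range_end is None:
        # A single time: test whether the flow was active at that instant.
        range_end = range_start
    return stime <= range_end and etime >= range_start
```

A flow running from 100 to 200 is active during the range 150 to 300 and at the instant 100, but not during 201 to 300.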

–duration=DECIMAL_RANGE

Pass the record if its duration (eTime-sTime) is in this DECIMAL_RANGE. The DECIMAL_RANGE represents the time in seconds; use floating point numbers to specify millisecond ranges.

–sport=INTEGER_LIST

Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.

–dport=INTEGER_LIST

Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.

–aport=INTEGER_LIST

Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use –aport=25 to see all SMTP conversations regardless of where they originated.

–protocol=INTEGER_LIST

Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.

–icmp-type=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values 0-255. This switch will also verify that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a –protocol that does not include 1 and/or 58.

–icmp-code=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values 0-255. This switch will also verify that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a –protocol that does not include 1 and/or 58.

–bytes=INTEGER_RANGE

Pass the record if its byte count is in this INTEGER_RANGE.

–packets=INTEGER_RANGE

Pass the record if its packet count is in this INTEGER_RANGE.

–bytes-per-packet=DECIMAL_RANGE

Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.

–saddress=IP_ADDR_MASK

Pass the record if its source IP address is matched by this IP_ADDR_MASK. To match on multiple IPs, use an IPset (see –sipset).

–daddress=IP_ADDR_MASK

Pass the record if its destination IP address is matched by this IP_ADDR_MASK (see also –dipset).

–any-address=IP_ADDR_MASK

Pass the record if either its source or its destination IP address is matched by this IP_ADDR_MASK (see also –anyset). Does not consider the next-hop IP address.

–not-saddress=IP_ADDR_MASK

Pass the record if its source IP address is not matched by this IP_ADDR_MASK (see also –not-sipset).

–not-daddress=IP_ADDR_MASK

Pass the record if its destination IP address is not matched by this IP_ADDR_MASK (see also –not-dipset).

–not-any-address=IP_ADDR_MASK

Pass the record if neither its source nor its destination IP address is matched by this IP_ADDR_MASK (see also –not-anyset). Does not consider the next-hop IP address.

–sipset=IP_SET_FILENAME

Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME.

–dipset=IP_SET_FILENAME

As –sipset for the destination IP address.

–anyset=IP_SET_FILENAME

Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.

–nhipset=IP_SET_FILENAME

As –sipset for the next-hop IP address.

–not-sipset=IP_SET_FILENAME

Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME.

–not-dipset=IP_SET_FILENAME

As –not-sipset for the destination IP address.

–not-anyset=IP_SET_FILENAME

Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.

–not-nhipset=IP_SET_FILENAME

As –not-sipset for the next-hop IP address.

–tcp-flags=TCP_FLAGS

Pass the record if, for any one of its packets, any of the specified TCP_FLAGS was on.

–flags-all=HIGH_MASK_FLAGS_LIST

HIGH_MASK_FLAGS_LIST is a comma separated list of up to 16 HIGH_FLAGS/MASK_FLAGS pairs, where HIGH_FLAGS and MASK_FLAGS are lists of TCP_FLAGS. HIGH_FLAGS must be a subset of MASK_FLAGS. Pass the record if the flags listed in HIGH_FLAGS are set and the flags listed in MASK_FLAGS but not listed in HIGH_FLAGS are not-set. This switch accepts a list of values, so that --flags-all=S/S,A/A will pass flows that have either only-SYN high or only-ACK high.
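The HIGH/MASK test can be expressed with bit masks. The Python sketch below is an illustration, not SiLK’s implementation; it assumes the flag letters F, S, R, P, A, U, E, and C with the standard TCP header bit values, and the helper names are hypothetical.

```python
# Standard TCP header flag bits; the one-letter names are an assumption.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def to_bits(letters):
    """Convert a string such as 'SA' to its bit mask."""
    bits = 0
    for letter in letters.upper():
        bits |= FLAG_BITS[letter]
    return bits


def parse_high_mask_list(text):
    """Parse 'HIGH/MASK[,HIGH/MASK...]' into a list of (high, mask) pairs."""
    pairs = []
    for item in text.split(","):
        high_text, mask_text = item.split("/")
        high, mask = to_bits(high_text), to_bits(mask_text)
        if high & ~mask:
            raise ValueError("HIGH_FLAGS must be a subset of MASK_FLAGS")
        pairs.append((high, mask))
    if len(pairs) > 16:
        raise ValueError("at most 16 HIGH/MASK pairs are allowed")
    return pairs


def flags_all(record_flags, text):
    """Pass if any pair matches: masked flags must equal the HIGH flags."""
    return any((record_flags & mask) == high
               for high, mask in parse_high_mask_list(text))
```

With these definitions, flags_all(0x02, "S/SA") is true (SYN set, ACK clear), while flags_all(0x12, "S/SA") is false because ACK is also set.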

–fin-flag=SCALAR

When set to 0, pass only records where the FIN flag is low. When set to 1, pass only records where the FIN flag is high.

–syn-flag=SCALAR

As –fin-flag except for the SYN flag.

–rst-flag=SCALAR

As –fin-flag except for the RST flag.

–psh-flag=SCALAR

As –fin-flag except for the PSH flag.

–ack-flag=SCALAR

As –fin-flag except for the ACK flag.

–urg-flag=SCALAR

As –fin-flag except for the URG flag.

–ece-flag=SCALAR

As –fin-flag except for the ECE flag.

–cwr-flag=SCALAR

As –fin-flag except for the CWR flag.

–tuple-file=TUPLE_FILENAME

This switch provides support for partitioning by arbitrary subsets of the basic five-tuple:

 {source-ip,destination-ip,source-port,destination-port,protocol}

A SiLK Flow record will pass the test when the record’s fields match one of the tuples; if the SiLK record does not match any tuple, the record fails. The tuples are read from the text file TUPLE_FILENAME which must contain lines of delimited fields. The default delimiter is |, but may be specified with the –tuple-delimiter switch. Each field contains one member of the tuple; the fields may appear in any order. The fields may represent any subset of the five-tuple, but each line in the file must define the same subset. A field that is present but has no value will generate an error. If you want the field to match any value, it is best that you not include that field in your input.

In addition to the tuple-lines, TUPLE_FILENAME may contain blank lines and comments (which begin with # and continue to the end of the line). The first line of TUPLE_FILENAME may contain a title labeling the fields in the file. This title line will be ignored when the –tuple-fields switch is given.

The IP fields may contain an IPv4 address, an integer, or an IP address in CIDR block notation. Comma-separated lists (80,443) and ranges (0-1023,8080) are supported for the ports and protocol fields. NOTE: Currently the code is not clever in its support for CIDR notation and ranges in that each occurrence is fully expanded. When this occurs, the memory required to hold the search tree will quickly grow.
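To get a feel for how quickly full expansion grows, the following Python sketch counts the entries a single tuple line would produce. It assumes that every address and port combination becomes its own entry, which is one reading of "fully expanded"; the function names are hypothetical.

```python
import ipaddress


def count_port_values(spec):
    """Count the values in a list such as '0-1023,8080'."""
    total = 0
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            total += 1
    return total


def expanded_entries(cidr, ports):
    """Entries after expanding one CIDR block against one port list."""
    addresses = ipaddress.ip_network(cidr).num_addresses
    return addresses * count_port_values(ports)
```

Under that assumption, a single line pairing 10.0.0.0/16 with ports 0-1023,8080 expands to 65536 × 1025 = 67,174,400 entries.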

–tuple-fields=FIELDS

FIELDS contains the list of fields (columns) to parse from the TUPLE_FILENAME in the order in which they appear in the file. When this switch is not provided, rwfilter will treat the first line in TUPLE_FILENAME as a title line and attempt to determine the fields (a la rwtuc(1)); rwfilter will exit if it cannot determine the fields.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated to their shortest unique prefix. The field names and their descriptions are:

sIP,sip,1

source IP address

dIP,dip,2

destination IP address

sPort,sport,3

source port

dPort,dport,4

destination port

protocol,5

IP protocol

–tuple-direction=DIRECTION

Allows you to change the comparison between the tuple and the SiLK Flow record. This switch allows one to look for traffic in the reverse direction (or both directions) without having to write all of the rules twice. The available directions are:

forward

The tuple’s fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol. This is the default.

reverse

The tuple’s fields are compared against the opposite fields on the flow; that is, sIP is compared with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.

both

Both of the above comparisons are performed.
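The three directions can be modeled in a few lines of Python. This sketch uses plain dictionaries for both the tuple and the flow and performs exact comparisons only; the names REVERSED and matches are hypothetical, and CIDR blocks and ranges are not handled.

```python
# Field swapped in for each tuple field when comparing in reverse.
REVERSED = {"sIP": "dIP", "dIP": "sIP",
            "sPort": "dPort", "dPort": "sPort",
            "protocol": "protocol"}


def matches(tuple_fields, flow, direction="forward"):
    """Compare a tuple (any subset of the five fields) against a flow."""
    forward = all(flow[name] == value
                  for name, value in tuple_fields.items())
    reverse = all(flow[REVERSED[name]] == value
                  for name, value in tuple_fields.items())
    if direction == "forward":
        return forward
    if direction == "reverse":
        return reverse
    if direction == "both":
        return forward or reverse
    raise ValueError("direction must be forward, reverse, or both")
```

A tuple {"sIP": "10.0.0.1", "dPort": 80} matches a reply flow whose dIP is 10.0.0.1 and whose sPort is 80 only when the direction is reverse or both.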

–tuple-delimiter=CHAR

Specifies the character separating the input fields. When the switch is not provided, the default of | is used.

–ippair-any=FILENAME

Pass the record if the source IP and destination IP (in either order) match one of the IP-pairs listed in the text file FILENAME. Each line of FILENAME should contain two IP addresses separated by whitespace. This switch is equivalent to –tuple-file=FILENAME –tuple-fields=sIP,dIP –tuple-direction=both –tuple-delimiter=’ ’. You cannot use this switch in conjunction with –tuple-file or –ipport-any. This switch is deprecated and it exists for backward compatibility only; it may be removed in a future release.

–ipport-any=FILENAME

Pass the record if either the source IP and port pair or the destination IP and port pair are listed in the text file FILENAME. Each line in FILENAME should contain an IP address and port list of interest for that IP separated by whitespace. This switch is equivalent to –tuple-file=FILENAME –tuple-fields=sIP,sPort –tuple-direction=both –tuple-delimiter=’ ’. You cannot use this switch in conjunction with –tuple-file or –ippair-any. This switch is deprecated and it exists for backward compatibility only; it may be removed in a future release.

–plugin=PLUGIN

Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is beyond the scope of this manual page; the process is described in Analysts’ Handbook: Using SiLK for Network Traffic Analysis. When multiple Partitioning Switches are given, the code specified by the –plugin switch(es) will be last to be invoked. When PLUGIN contains a slash (/), rwfilter assumes the path to PLUGIN is correct. Otherwise, rwfilter will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application’s directory: lib/silk, share/lib, and lib. If rwfilter does not find the file, it assumes the plug-in is in the current directory. To force rwfilter to look in the current directory first, specify –plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.

–dynamic-library=PLUGIN

This switch is deprecated. It is an alias for –plugin.

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional switches; for flows without this additional information, the field’s value is always 0.

–flags-initial=HIGH_MASK_FLAGS_LIST

As –flags-all, except this switch considers only the initial packet in the flow.

–flags-session=HIGH_MASK_FLAGS_LIST

As –flags-all, except this switch ignores the initial packet in the flow.

–attributes=ATTRIBUTES_LIST

ATTRIBUTES_LIST is a comma separated list of up to 8 HIGH_ATTRIBUTES/MASK_ATTRIBUTES pairs, where HIGH_ATTRIBUTES and MASK_ATTRIBUTES are strings of the ATTRIBUTE characters F,T,C; see above for a description of these values. HIGH_ATTRIBUTES must be a subset of MASK_ATTRIBUTES. Pass the record if the attributes listed in HIGH_ATTRIBUTES are set and the attributes listed in MASK_ATTRIBUTES but not listed in HIGH_ATTRIBUTES are not-set.

–application=INTEGER_LIST

Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80). The flow generator uses a value of 0 if the application cannot be determined. The –application switch passes the flow if the flow’s application value is in the specified INTEGER_LIST. For example, passing a value of 21 to this switch will find traffic that the flow generation software labeled as FTP regardless of which port the traffic actually used.

–ip-version=INTEGER_LIST

Passes the flow if the IP Version is in the specified INTEGER_LIST. INTEGER_LIST can be 4, 6, or 4,6 when SiLK has been compiled with IPv6 support. If SiLK does not have IPv6 support, the only legal value for this switch is 4.

–scc=COUNTRY_CODE_LIST

Pass the record if the country code of its source IP address is in the specified COUNTRY_CODE_LIST. This switch requires that the country code mapping file is installed. See ccfilter(3).

–dcc=COUNTRY_CODE_LIST

As –scc for the destination IP address.

For the following filter tests, some file formats do not store these values, in which case the value is always 0:

–next-hop-id=IP_ADDR_MASK

Pass the record if its next hop IP address is matched by this IP_ADDR_MASK.

–not-next-hop-id=IP_ADDR_MASK

Pass the record if its next hop IP address is not matched by this IP_ADDR_MASK.

–input-index=INTEGER_LIST

Pass the record if its incoming SNMP interface is in this INTEGER_LIST.

–output-index=INTEGER_LIST

Pass the record if its outgoing SNMP interface is in this INTEGER_LIST.

Additional filtering switches are provided by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwfilter automatically looks for the following plug-ins:

ADDRESS TYPE (addrtype.so)

–stype=SCALAR

When SCALAR is 0, pass the record if its source IP address is non-routable. When 1, pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal (non-routable or external). See addrtype(3).

–dtype=SCALAR

As –stype for the destination IP address.

PREFIX MAP (pmapfilter.so)

–pmap-file=MAPNAME:PATH
–pmap-file=PATH

When the prefix map plug-in is used, rwfilter reads the mapping file located at PATH. When MAPNAME is provided, it will be used to refer to the switches specific to that prefix map. If MAPNAME is not provided, rwfilter will check the prefix map file to see if a map-name was specified when the file was created. Using multiple –pmap-file switches allows additional prefix map files to be read as long as each uses a unique map-name. The –pmap-file switch(es) must precede all other –pmap-* switches. For more information, see pmapfilter(3).

–pmap-src-MAPNAME=LABELS

If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IPv4 address that maps to a label contained in the list of labels in LABELS.

If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a protocol and source port combination that maps to a label contained in the list of labels in LABELS.

–pmap-dst-MAPNAME=LABELS

Similar to –pmap-src-MAPNAME, but uses the destination IP or the protocol and destination port.

–pmap-any-MAPNAME=LABELS

If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IP address or a destination IP address that maps to a label contained in the list of labels in LABELS.

If the prefix map associated with MAPNAME is a port/protocol prefix map, this matches records with a protocol and source port or destination port combination that maps to a label contained in the list of labels in LABELS.

–pmap-saddress=LABELS
–pmap-daddress=LABELS
–pmap-any-address=LABELS

These are deprecated switches created by pmapfilter that correspond to –pmap-src-MAPNAME, –pmap-dst-MAPNAME, and –pmap-any-MAPNAME, respectively. These switches are available when an IP prefix map is used that is not associated with a MAPNAME.

–pmap-sport-proto=LABELS
–pmap-dport-proto=LABELS
–pmap-any-port-proto=LABELS

These are deprecated switches created by pmapfilter that correspond to –pmap-src-MAPNAME, –pmap-dst-MAPNAME, and –pmap-any-MAPNAME, respectively. These switches are available when a proto-port prefix map is used that is not associated with a MAPNAME.

PYTHON (silkpython.so)

The SiLK Python plug-in provides support for filtering by expressions or complex functions written in the Python programming language. See the silkpython(3) and pysilk(3) manual pages for information and examples for how to use Python to manipulate SiLK data structures. When multiple Partitioning Switches are given, the Python plug-in will be the next-to-last to be invoked. Only the code specified by the –plugin switch is called after the Python code.

–python-file=FILENAME

Pass the record if the result of processing the flow with the function named rwfilter() in FILENAME is true. The function should take a single silk.RWRec object as an argument. See silkpython(3) for details.

–python-expr=PYTHON_EXPRESSION

Pass the record if the result of processing the flow with the specified PYTHON_EXPRESSION is true. The expression is evaluated as if it appeared in the following context:

 from silk import *  
 def rwfilter(rec):  
     return (PYTHON_EXPRESSION)
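A file given to –python-file follows the same pattern. The sketch below shows what such a file might contain; the selection criteria are purely illustrative, and it assumes the record exposes protocol, dport, and packets attributes as described in pysilk(3).

```python
def rwfilter(rec):
    """Pass short TCP flows destined for common web ports."""
    # rec is expected to be a silk.RWRec; only attribute access is used here.
    is_tcp = rec.protocol == 6
    is_web = rec.dport in (80, 443)
    is_short = rec.packets <= 5
    return is_tcp and is_web and is_short
```

Saved as, for example, smallweb.py (a hypothetical file name), the function would be applied to each record when rwfilter is run with –python-file=smallweb.py.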

EXAMPLES

The most basic filtering involves looking at specific traffic over a specific time. For example:

  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
        --pass=alltcp.rwf --proto=6

will create a file, alltcp.rwf, containing all TCP traffic. This file contains SiLK Flow data in a binary format. To examine the contents, use the command rwcut(1).

Please note that the output file described above could be extremely large.

Once a file is written, rwfilter can filter the file again, for example:

  rwfilter --aport=80 alltcp.rwf --pass=allweb.rwf

will generate allweb.rwf. This progressive filtering can also be done at the command line, but the interim files can be examined with rwcut, rwuniq(1) and other tools.

Multiple filters can be chained at the command line using pipes:

  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
        --proto=6 --pass=stdout | \
        rwfilter --input-pipe=stdin --aport=80 --packets=1-5 \
        --pass=smallweb.rwf

ENVIRONMENT

SILK_RWFILTER_THREADS

The number of threads to use while reading input files or files selected from the data store.

PYTHONPATH

This environment variable is used by Python to locate modules. When –python-file or –python-expr is specified, rwfilter loads Python which in turn loads the PySiLK module which is comprised of several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module. For information on using Python from within rwfilter, see pysilk(3).

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to stderr.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that the –scc and –dcc switches use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.

SILK_CONFIG_FILE

This environment variable is used as the value for the –site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

When set, overrides the compiled-in value for the location of the directory tree containing the files of SiLK Flow records collected and stored by the packing system (rwflowpack(8)). In addition, when the –site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwfilter looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwfilter checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwfilter looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.

SILK_PLUGIN_DEBUG

When set to 1, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.

SILK_LOGSTATS

When set to a non-empty value, rwfilter will treat the value as the path to an external program to execute with information about this rwfilter invocation. If the value in SILK_LOGSTATS does not contain a slash or if it references a file that does not exist, is not a regular file, or is not executable, the SILK_LOGSTATS value is silently ignored. The arguments to the external program are:

SILK_LOGSTATS_RWFILTER

If set, this environment variable overrides the value specified in SILK_LOGSTATS.

SILK_LOGSTATS_DEBUG

If the environment variable is set to a non-empty value, rwfilter will print messages to the standard error about the SILK_LOGSTATS value being used and either the reason why the value cannot be used or the arguments to the external program being executed.

NOTES

rwfilter is the most commonly used application in the suite. It provides access to the data files and performs all the basic queries.

rwfilter supports a variety of I/O options: in addition to reading from the data store, rwfilter results can be chained together with named pipes to output results to multiple files simultaneously. An introduction to named pipes is outside the scope of this document, however.

Two often underused options are –dry-run and –print-statistics.

–dry-run does a sanity check on the input arguments and should be used, especially for complicated arguments, to check that the arguments are acceptable.

–print-statistics used without –pass-destination or –fail-destination simply dumps aggregate statistics to stderr (not stdout) in the following format:

  File <#input files> Read <# of recs read> \  
  Pass <# of recs passing the filter> \  
  Fail <# of recs failing the filter>

and can be used to do a quick pass through the data to get aggregate counts before digging deeper into the phenomenon being investigated.

–print-filenames can be used as a progress meter; during long jobs, it shows which file is currently being read by the application. –print-filenames will not provide meaningful results with piped input.

Filters are applied in the order given on the command line. It is best to apply the biggest filters first.

The switches used to create a filter output file are stored in the file itself. Use the rwfileinfo(1) command to see this information.

SEE ALSO

rwcount(1), rwcut(1), rwfglob(1), rwfileinfo(1), rwset(1), rwsort(1), rwstats(1), rwtotal(1), rwuniq(1), rwtuc(1), rwsetbuild(1), mapsid(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk.conf(5), silk(7), rwflowpack(8), yaf(1), zlib(3), Analysts’ Handbook: Using SiLK for Network Traffic Analysis

rwgeoip2ccmap

Create a country code prefix map from a GeoIP data file

SYNOPSIS

  unzip -p GeoIPCountryCSV.zip | \
      rwgeoip2ccmap --csv-input > country_codes.pmap

  gzip -d -c GeoIP.dat.gz | \
      rwgeoip2ccmap --encoded-input > country_codes.pmap

DESCRIPTION

Prefix maps provide a way to map field values to string labels based on a user-defined map file. The country code prefix map, typically named country_codes.pmap, is a special prefix map that maps an IP address to a two-letter country code. It uses the country codes defined by the Internet Assigned Numbers Authority (http://www.iana.org/root-whois/index.html).

The country code prefix map is used by the ccfilter(3) plug-in to partition by, count by, sort by, and display the country code in SiLK Flow files. The rwip2cc(1) command can use the map file to display the country code for textual IP addresses.

The country code prefix map is based on the GeoIP Country(R) or free GeoLite database created by MaxMind(R) and available from http://www.maxmind.com/. The GeoLite database is a free evaluation copy that is 98% accurate and is updated monthly. MaxMind sells the GeoIP Country database, which has over 99% accuracy and is updated weekly.

The database comes in two forms:

GeoIPCountryCSV.zip

as a compressed (zip) textual file containing the IP range, country name, and country code in a comma separated value (CSV) form

GeoIP.dat.gz

as a compressed (gzip) binary file containing an encoded form of the IP address range and country code

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as –arg=param or –arg param, though the first form is required for options that take optional parameters.

One of the following switches is required:

–csv-input

Treat the standard input as a textual stream containing the CSV (comma separated value) GeoIP country code data.

–encoded-input

Treat the standard input as a binary stream containing the encoded GeoIP country code data.

EXAMPLES

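Obtain your copy of the MaxMind GeoIP Country database, either the comma separated value version (GeoIPCountryCSV.zip) or the binary version (GeoIP.dat.gz). To create the country_codes.pmap data file, run the command shown in the SYNOPSIS that matches the version you obtained: the –csv-input form for the CSV file or the –encoded-input form for the binary file.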

Once you have created the country_codes.pmap file, you will need to copy it to $SILK_PATH/share/silk/country_codes.pmap so that the ccfilter plug-in will use it.

SEE ALSO

ccfilter(3), rwip2cc(1)

rwgroup

Tag similar SiLK records with a common next hop IP value

SYNOPSIS

  rwgroup  
        {--id-fields=KEY | --delta-field=FIELD --delta-value=DELTA}  
        [--objective] [--summarize] [--plugin=PLUGIN]  
        [--rec-threshold=THRESHOLD] [--group-offset=IP]  
        [--note-add=TEXT] [--note-file-add=FILE] [--output-path=PATH]  
        [--copy-input=PATH] [--compression-method=COMP_METHOD]  
        [--site-config-file=FILENAME] [--python-file=PATH ...]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [FILE]

  rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwgroup --version

DESCRIPTION

rwgroup reads sorted SiLK Flow records (cf. rwsort(1)) from the standard input or from a single file name listed on the command line, marks records that form a group with an identifier in the Next Hop IP field, and prints the binary SiLK Flow records to the standard output. In some ways rwgroup is similar to rwuniq(1), but rwgroup writes SiLK Flow records instead of textual output.

Two SiLK records are defined as being in the same group when the fields specified in the --id-fields switch match exactly and when the field listed in the --delta-field switch matches within the value given by the --delta-value switch. Either --id-fields or --delta-field is required; both may be specified. A --delta-value must be given when --delta-field is present.

The records that make up the first group will have the value 0 written into their Next Hop IP field. Each subsequent group will have its Next Hop IP value incremented by 1. The --group-offset switch will change the initial group’s Next Hop IP value.

The --rec-threshold switch may be used to print only groups that contain at least a certain number of records. The --summarize switch attempts to merge records in the same group into a single output record.

rwgroup requires that the records are sorted on the fields listed in the --id-fields and --delta-field switches. For example, a call using

  rwgroup --id-field=2 --delta-field=9 --delta-value=3

should read the output of

  rwsort --field=2,9

otherwise the results are unpredictable.
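The grouping rule described above can be illustrated with a small Python sketch. It is not rwgroup's implementation: the function name assign_groups and the use of dictionaries as records are assumptions made for illustration only. It walks records that are already sorted, compares each record with the previous one, and starts a new group whenever an id field differs or the delta field differs by more than the delta value.

```python
def assign_groups(records, id_fields, delta_field=None, delta_value=0, offset=0):
    """Tag pre-sorted records with a group number (hypothetical helper).

    records     -- list of dicts, already sorted on id_fields then delta_field
    id_fields   -- keys whose values must match exactly
    delta_field -- optional numeric key allowed to differ by delta_value
    offset      -- number given to the first group
    """
    tagged = []
    group_id = offset
    prev = None
    for rec in records:
        if prev is not None:
            same_ids = all(rec[f] == prev[f] for f in id_fields)
            within = (delta_field is None or
                      abs(rec[delta_field] - prev[delta_field]) <= delta_value)
            if not (same_ids and within):
                group_id += 1  # start a new group
        tagged.append((group_id, rec))
        prev = rec
    return tagged


flows = [
    {"dip": "10.0.0.1", "stime": 100},
    {"dip": "10.0.0.1", "stime": 102},
    {"dip": "10.0.0.1", "stime": 110},
    {"dip": "10.0.0.2", "stime": 111},
]
for gid, rec in assign_groups(flows, ["dip"], "stime", 3):
    print(gid, rec["dip"], rec["stime"])
```

Because only neighbouring records are compared, unsorted input would scatter records that belong together across several groups, which is why the sort order matters.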

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

At least one value for --id-fields or --delta-field must be provided; rwgroup will terminate with an error if no fields are specified.

--id-fields=KEY

KEY contains the list of flow attributes (a.k.a. fields or columns) that must match exactly for flows to be considered part of the same group. Each field may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:

 --id-fields=stime,10,1-5

There is no default value for the --id-fields switch.
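As an illustration of the KEY syntax, the following Python sketch expands a KEY string into a list of field numbers. The function parse_key and the FIELD_IDS table are hypothetical; the table covers only a subset of the built-in fields documented in this section, and plug-in field names (which may themselves contain hyphens) are not handled.

```python
# Subset of the built-in field names and numbers documented in this section.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8,
    "stime": 9, "dur": 10, "etime": 11, "sensor": 12,
}


def parse_key(key):
    """Expand a KEY such as 'stime,10,1-5' into field numbers (sketch)."""
    fields = []
    for token in key.split(","):
        token = token.strip()
        start, sep, end = token.partition("-")
        if sep and start.isdigit() and end.isdigit():
            ids = list(range(int(start), int(end) + 1))  # range of integers
        elif token.isdigit():
            ids = [int(token)]                           # single integer
        else:
            ids = [FIELD_IDS[token.lower()]]             # case-insensitive name
        for field_id in ids:
            if field_id in fields:
                raise ValueError("field %d specified more than once" % field_id)
            fields.append(field_id)
    return fields


print(parse_key("stime,10,1-5"))
```

The duplicate check mirrors the rule that each field may be specified once only; a KEY such as sTime,9 names the same field twice and is rejected.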

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (seconds resolution)

dur,10

duration of flow (seconds resolution)

eTime,11

end time of flow (seconds resolution)

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

icmpTypeCode,25

the ICMP type and code

Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:

in,13

router SNMP input interface

out,14

router SNMP output interface

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field’s value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator:

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity). The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
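The marking pattern in the ssh example can be written out as a short Python sketch. The function timeout_attributes is hypothetical and only reproduces the labels described above for a session that was split into a given number of flow records; returning an empty label for a session that was never split is an assumption.

```python
def timeout_attributes(record_count):
    """Return the T/C labels for a session split into record_count flows."""
    if record_count < 1:
        raise ValueError("a session produces at least one flow record")
    if record_count == 1:
        return [""]  # assumption: an unsplit session carries neither flag
    # First record timed out; middle records timed out and are continuations;
    # the last record is a continuation only.
    return ["T"] + ["TC"] * (record_count - 2) + ["C"]


print(timeout_attributes(4))
```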

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The list of built-in fields may be augmented by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwgroup automatically looks for the following plug-ins:

ADDRESS TYPE (addrtype.so)

stype,16

categorize the source IP address as non-routable, internal, or external and group based on the category. See addrtype(3).

dtype,17

as stype for the destination IP address

COUNTRY CODE (ccfilter.so)

scc,18

the country code of the source IP address. See ccfilter(3).

dcc,19

as scc for the destination IP

PREFIX MAP (pmapfilter.so)

src-MAPNAME

value determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the --pmap-file switch and the pmapfilter(3) manual page.

dst-MAPNAME

as src-MAPNAME for the destination IP or protocol/destination-port.

sval
dval

These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.

--delta-field=FIELD

Specify a single field that can differ by a specified delta-value among the SiLK records that make up a group. The FIELD identifiers include most of those specified for --id-fields. The exceptions are that plug-in fields are not supported, nor are fields that do not have numeric values (e.g., class, type, flags). The most common value for this switch is stime, which allows records that are identical in the id-fields but temporally far apart to be in different groups. The switch takes a single argument; multiple delta fields cannot be specified. When this switch is specified, the --delta-value switch is required.

--delta-value=DELTA_VALUE

Specify the acceptable difference between the values of the --delta-field. The --delta-value switch is required when the --delta-field switch is provided. For fields other than those holding IPs, when the values of two consecutive records differ by DELTA_VALUE or less, the records are considered members of the same group. When the delta-field refers to an IP field, DELTA_VALUE is the number of least significant bits of the IPs to remove before comparing them. For example, when --delta-field=sIP --delta-value=8 is specified, two records are in the same group if their source IPv4 addresses belong to the same /24 or if their source IPv6 addresses belong to the same /120. The --objective switch affects the meaning of this switch.
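For IP fields, removing the least significant bits amounts to a right shift before comparison. The Python sketch below illustrates this with the standard ipaddress module; the function same_delta_group is hypothetical, and treating an IPv4 address and an IPv6 address as never matching is an assumption of the sketch, not documented rwgroup behavior.

```python
import ipaddress


def same_delta_group(ip_a, ip_b, delta_value):
    """Compare two IPs after dropping delta_value low-order bits (sketch)."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        return False  # assumption: mixed address families never match
    return (int(a) >> delta_value) == (int(b) >> delta_value)


# With 8 bits removed, IPv4 addresses match when they share a /24
# and IPv6 addresses match when they share a /120.
print(same_delta_group("192.168.1.10", "192.168.1.200", 8))
print(same_delta_group("192.168.1.10", "192.168.2.10", 8))
print(same_delta_group("2001:db8::1", "2001:db8::ff", 8))
```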

--objective

Change the behavior of the --delta-value switch so that a record is considered part of a group if the value of its --delta-field is within the DELTA_VALUE of the first record in the group. (When this switch is not specified, consecutive records are compared.)
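The difference between the two comparison modes shows up when values drift steadily. The following Python sketch (the function split_by_delta is hypothetical and works on a plain list of sorted numbers rather than flow records) compares each value either with the previous value or, in objective mode, with the first value of the current group.

```python
def split_by_delta(values, delta, objective=False):
    """Split ascending values into groups using a delta rule (sketch)."""
    groups = []
    for value in values:
        if groups:
            current = groups[-1]
            # Objective mode measures from the group's first member;
            # the default mode measures from the most recent member.
            reference = current[0] if objective else current[-1]
            if value - reference <= delta:
                current.append(value)
                continue
        groups.append([value])
    return groups


start_times = [0, 8, 16, 24]
print(split_by_delta(start_times, 10))
print(split_by_delta(start_times, 10, objective=True))
```

In the default mode the four values chain into one group because each is within 10 of its predecessor; in objective mode the group is closed once a value lies more than 10 from the group's first member.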

--summarize

Cause rwgroup to print (typically) a single record for each group. By default, all records in each group having at least --rec-threshold members are printed. When --summarize is active, the record that is written for the group is the first record in the group with the following modifications:

Note that multiple records for a group may be printed if the bytes, packets, or elapsed time values are too large to be stored in a SiLK flow record.

--plugin=PLUGIN

Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The creation of these plug-ins is beyond the scope of this manual page. When PLUGIN contains a slash (/), rwgroup assumes the path to PLUGIN is correct. Otherwise, rwgroup will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application’s directory: lib/silk, share/lib, and lib. If rwgroup does not find the file, it assumes the plug-in is in the current directory. To force rwgroup to look in the current directory first, specify --plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwgroup prints status messages to the standard error as it tries to open each of its plug-ins.

--rec-threshold=THRESHOLD

Specify the minimum number of SiLK records a group must contain before the records in the group are written to the output stream. The default is 1; i.e., write all records. The maximum threshold is 65535.

--group-offset=IP

Specify the value to write into the Next Hop IP for the records that comprise the first group. The value IP may be an integer, or an IPv4 or IPv6 address in the canonical presentation form. If not specified, counting begins at 0. The value for each subsequent group is incremented by 1.
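The numbering scheme can be sketched in Python with the standard ipaddress module. The function next_hop_values is hypothetical; it accepts an integer or an address string as the offset and lists the values successive groups would receive. Rendering integers below 2^32 as IPv4 addresses is an assumption of the sketch.

```python
import ipaddress


def next_hop_values(offset, count):
    """List the Next Hop IP values for the first `count` groups (sketch)."""
    if isinstance(offset, str) and offset.isdigit():
        offset = int(offset)  # integer given as text, e.g. "256"
    base = ipaddress.ip_address(offset)
    # Adding an integer to an address object yields the next address.
    return [str(base + n) for n in range(count)]


print(next_hop_values(0, 2))
print(next_hop_values("10.0.0.254", 3))
```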

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--copy-input=PATH

Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the --output-path switch has been used to redirect rwgroup’s output.

--output-path=PATH

Determines where the output of rwgroup is written. If this option is not given, output is written to the standard output.

--compression-method=COMP_METHOD

Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:

none

Do not compress the output using an external library

zlib

Use the zlib(3) library for compressing the output

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression

best

Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.

--help

Print the available options and exit. Options that add fields can be specified before --help so that the new options appear in the output.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--pmap-file=MAPNAME:PATH
--pmap-file=PATH

When the prefix map plug-in is used, rwgroup reads the mapping file located at PATH. When MAPNAME is provided, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwgroup will check the prefix map file to see if a map-name was specified when the file was created. Using multiple --pmap-file switches allows additional prefix map files to be read as long as each uses a unique map-name. For more information, see pmapfilter(3).

--python-file=PATH

When the SiLK Python plug-in is used, rwgroup reads the Python code from the file PATH to define additional fields that can be used as part of the group key. This file should call register_plugin_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

LIMITATIONS

rwgroup requires sorted data. The application works by comparing records in the order that the records are received (similar to the UNIX uniq(1) command); odd orders will produce odd groupings.

EXAMPLES

As a rule of thumb, the --id-fields and --delta-field parameters should match rwsort(1)’s call, with --delta-field being the last parameter. A call to group all web traffic by queries from the same addresses (field=2) within 10 seconds (field=9) of the first query from that address will be:

  rwfilter --proto=6 --dport=80 --pass=stdout | \
        rwsort --field=2,9 | \
        rwgroup --id-field=2 --delta-field=9 --delta-value=10 \
        --objective

ENVIRONMENT

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file is specified, rwgroup loads Python, which in turn loads the PySiLK module, which comprises several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that the ccfilter(3) plug-in will use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwgroup looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwgroup checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwgroup looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.

SILK_PLUGIN_DEBUG

When set to 1, rwgroup prints status messages to the standard error as it tries to open each of its plug-ins.

SEE ALSO

rwfilter(1), rwfileinfo(1), rwsort(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3), silkpython(3), pysilk(3), uniq(1), yaf(1), zlib(3)

rwidsquery

Invoke rwfilter to find flows matching Snort signatures

SYNOPSIS

 rwidsquery --intype=INPUT_TYPE  
        [--output-file=OUTPUT_FILE]  
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
        [--year=YEAR] [--tolerance=SECONDS]  
        [--config-file=CONFIG_FILE]  
        [--mask=PREDICATE_LIST]  
        [--verbose] [--dry-run]  
        [INPUT_FILE | -]  
        [-- EXTRA_RWFILTER_ARGS...]

  rwidsquery --help

  rwidsquery --version

DESCRIPTION

rwidsquery facilitates selection of SiLK flow records that correspond to Snort IDS alerts and signatures. rwidsquery takes as input either a Snort alert log or rule file, analyzes the alert or rule contents, and invokes rwfilter(1) with the appropriate arguments to retrieve flow records that match attributes of the input file. rwidsquery will process the Snort rules or alerts from a single file named on the command line; if no file name is given, rwidsquery will attempt to read the Snort rules or alerts from the standard input, unless the standard input is connected to a terminal. An input file name of - or stdin will force rwidsquery to read from the standard input, even when the standard input is a terminal.

OPTIONS

In addition to the options listed below, you can pass extra options through to rwfilter(1) on the rwidsquery command line. The syntax for doing so is to place a double-hyphen (--) sequence after all valid rwidsquery options, and before all of the options you wish to pass through to rwfilter.

--intype=INPUT_TYPE

Specify the type of input contained in the input file. This switch is required. Two alert formats and one rule format are currently supported. Valid values for this option are:

fast

Input is a Snort "fast" log file entry. Alerts are written in this format when Snort is configured with the snort_fast output module enabled. snort_fast alerts resemble the following:

    Jan  1 01:23:45 hostname snort[1976]: [1:1416:11] ...

full

Input is a Snort "full" log file entry. Alerts are written in this format when Snort is configured with the snort_full output module enabled. snort_full alerts look like the following example:

    [**] [116:151:1] (snort decoder) Bad Traffic  ...

rule

Input is a Snort rule (signature). For example:

    alert tcp $EXTERNAL_NET any -> $HOME_NET any ...

--output-file=OUTPUT_FILE

Specify the output file that flows will be written to. If not specified, the default is to write to stdout. The argument to this option becomes the argument to rwfilter’s --pass switch.

--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]

Used in conjunction with rule file input only. The date predicates indicate which time to start and end the search. See the rwfilter(1) manual page for details of the date format.

--year=YEAR

Used in conjunction with alert file input only. Timestamps in Snort alert files do not contain year information. By default, the current calendar year is used, but this option can be used to override this default behavior.

--tolerance=SECONDS

Used in conjunction with alert file input only. This option is provided to compensate for timing differences between the timestamps in Snort alerts and the start/end time of the corresponding flows. The default --tolerance value is 3600 seconds, which means that flow records +/- one hour from the alert timestamp will be searched.
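The search window implied by the year and tolerance settings can be worked out with a few lines of Python. The sketch below is not rwidsquery's code: the function alert_window is hypothetical, and it assumes the alert timestamp uses English month abbreviations. It combines the alert timestamp with a year and widens it by the tolerance on both sides, producing a range in the style of rwfilter's --stime argument.

```python
from datetime import datetime, timedelta


def alert_window(stamp, year, tolerance=3600):
    """Return 'start-end' around a Snort alert timestamp (sketch).

    stamp     -- timestamp text without a year, e.g. 'Nov 15 00:00:58'
    year      -- calendar year to assume for the alert
    tolerance -- seconds to search before and after the alert
    """
    stamp = " ".join(stamp.split())  # collapse repeated spaces
    when = datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")
    delta = timedelta(seconds=tolerance)
    fmt = "%Y/%m/%d:%H:%M:%S"
    return "%s-%s" % ((when - delta).strftime(fmt),
                      (when + delta).strftime(fmt))


print(alert_window("Nov  15 00:00:58", 2007, tolerance=300))
```

With a tolerance of 300 seconds, an alert stamped Nov 15 00:00:58 in 2007 yields a window from 23:55:58 on November 14 to 00:05:58 on November 15, which also shows why the search can cross a day boundary.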

--config-file=CONFIG_FILE

Used in conjunction with rule file input only. Snort requires a configuration file which, among other things, contains variables that can be used in Snort rule definitions. This option allows you to specify the location of this configuration file so that IP addresses, port numbers, and other information from the Snort configuration file can be used to find matching flows.

--mask=PREDICATE_LIST

Exclude the rwfilter predicates named in PREDICATE_LIST from the selection criteria. This option is provided to widen the scope of queries by making them more general than the Snort rule or alert provided. For instance, --mask=dport will return flows with any destination port, not just those which match the input Snort alert or rule.
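The effect of masking can be pictured as removing entries from the set of predicates derived from the alert or rule. The Python sketch below is illustrative only: the function apply_mask is hypothetical, and it assumes that PREDICATE_LIST is a comma separated list of predicate names, which this manual page does not state explicitly. Extra predicates stand in for arguments passed through after the double-hyphen.

```python
def apply_mask(predicates, mask, extra=None):
    """Drop masked predicates and append pass-through ones (sketch).

    predicates -- dict of rwfilter switch name -> value derived from the alert
    mask       -- assumed comma separated list of switch names to drop
    extra      -- optional dict of additional switches to add afterwards
    """
    masked = {name.strip() for name in mask.split(",") if name.strip()}
    kept = {k: v for k, v in predicates.items() if k not in masked}
    if extra:
        kept.update(extra)
    return ["--%s=%s" % (k, v) for k, v in kept.items()]


derived = {
    "saddress": "192.168.0.1", "sport": "4161",
    "daddress": "127.0.0.1", "dport": "139", "protocol": "6",
}
print(" ".join(apply_mask(derived, "protocol", {"protocol": "17"})))
```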

--verbose

Print the resulting rwfilter(1) command on stderr prior to invoking it.

--dry-run

Print the resulting rwfilter(1) command on stderr but do not actually run it.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

To find SiLK flows matching a Snort alert in snort_fast format:

  $ rwidsquery --intype fast --year 2007 --tolerance 300 alert.fast.txt

For the following Snort alert:

    Nov  15 00:00:58 hostname snort[5214]: [1:1416:11]  
    SNMP broadcast trap [Classification: Attempted Information Leak]  
    [Priority: 2]: {TCP}  
    192.168.0.1:4161 -> 127.0.0.1:139

The resulting rwfilter(1) command would look similar to:

  rwfilter --start-date=2007/11/14:23 --end-date=2007/11/15:00 \
    --stime=2007/11/14:23:55:58-2007/11/15:00:05:58 \
    --saddress=192.168.0.1 --sport=4161 --daddress=127.0.0.1 \
    --dport=139 --protocol=6 --pass=stdout

If you want to find flows matching the same criteria, except you want UDP flows instead of TCP flows, use the following syntax:

  $ rwidsquery --intype fast --year 2007 --tolerance 300 \
      --mask protocol alert.fast.txt -- --protocol=17

which would yield the following rwfilter command line:

  $ rwfilter --start-date=2007/11/14:23 --end-date=2007/11/15:00 \
      --stime=2007/11/14:23:55:58-2007/11/15:00:05:58 \
      --saddress=192.168.0.1 --sport=4161 --daddress=127.0.0.1 \
      --dport=139 --protocol=17 --pass=stdout

To find SiLK flows matching a Snort rule:

  $ rwidsquery --intype rule --start 2008/02/20:00 --end 2008/02/20:02 \
      -c /opt/local/etc/snort/snort.conf -v rule.txt

For the following Snort rule:

  alert icmp $EXTERNAL_NET any -> $HOME_NET any  
  (msg:"ICMP Parameter Problem Bad Length"; icode:2; itype:12;  
  classtype:misc-activity; sid:425; rev:6;)

The resulting rwfilter(1) command would look similar to:

  rwfilter --start-date=2008/02/20:00 --end-date=2008/02/20:02 \
    --stime=2008/02/20:00-2008/02/20:02 --sipset=/tmp/tmpeKIPn2.set \
    --icmp-code=2 --icmp-type=12 --pass=stdout

SEE ALSO

snort(8), rwfilter(1)

rwip2cc

Maps IP addresses to country codes

SYNOPSIS

  rwip2cc { --address=IP_ADDRESS | --input-file=FILE }  
        [--map-file=PMAP_FILE] [--print-ips={0,1}]  
        [{--integer-ips | --zero-pad-ips}] [--no-columns]  
        [--column-separator=CHAR] [--no-final-delimiter]  
        [{--delimited | --delimited=CHAR}]  
        [--output-path=PATH] [--pager=PAGER_PROG]