The SiLK Reference Guide
(SiLK-3.15.0)

CERT Coordination Center
2002–2017 Carnegie Mellon University
License available in Appendix A
 
The canonical location for this handbook is
http://tools.netsa.cert.org/silk/silk-reference-guide.pdf

March 24, 2017

Contents

Introduction
1 SiLK Analysis Tools and Utilities
 mapsid
 num2dot
 rwaddrcount
 rwaggbag
 rwaggbagbuild
 rwaggbagcat
 rwaggbagtool
 rwappend
 rwbag
 rwbagbuild
 rwbagcat
 rwbagtool
 rwcat
 rwcombine
 rwcompare
 rwcount
 rwcut
 rwdedupe
 rwfglob
 rwfileinfo
 rwfilter
 rwgeoip2ccmap
 rwgroup
 rwidsquery
 rwip2cc
 rwipaexport
 rwipaimport
 rwipfix2silk
 rwmatch
 rwnetmask
 rwp2yaf2silk
 rwpcut
 rwpdedupe
 rwpdu2silk
 rwpmapbuild
 rwpmapcat
 rwpmaplookup
 rwpmatch
 rwptoflow
 rwrandomizeip
 rwrecgenerator
 rwresolve
 rwscan
 rwscanquery
 rwset
 rwsetbuild
 rwsetcat
 rwsetmember
 rwsettool
 rwsilk2ipfix
 rwsiteinfo
 rwsort
 rwsplit
 rwstats
 rwswapbytes
 rwtotal
 rwtuc
 rwuniq
 silk_config
3 SiLK Libraries and Plug-Ins
 addrtype
 ccfilter
 flowkey
 flowrate
 int-ext-fields
 ipafilter
 packlogic-generic.so
 packlogic-twoway.so
 pmapfilter
 PySiLK
 silk-plugin
 silkpython
5 SiLK File Formats
 sensor.conf
 silk.conf
7 SiLK Miscellaneous Information
 SiLK
8 SiLK Administrator’s Tools
 flowcap
 rwflowappend
 rwflowpack
 rwguess
 rwpackchecker
 rwpollexec
 rwreceiver
 rwsender
A License

Introduction

The SiLK Reference Guide contains the manual page for each analysis tool, utility, plug-in, file format, and collection facility in the SiLK Collection and Analysis Suite.

This document is meant for reference only. The SiLK Analysis Handbook provides both a tutorial for learning about the tools and examples of how they can be used in analyzing flow data. See the SiLK Installation Handbook for instructions on installing SiLK at your site.

This reference guide is broken into sections like the traditional UNIX manual: end-user analysis tools and utilities are described in Section 1; the libraries and plug-ins that augment the behavior of some tools are presented in Section 3; Section 5 contains information about file formats; miscellaneous information is in Section 7; and commands for the installer and administrator of SiLK appear in Section 8.

1 SiLK Analysis Tools and Utilities

This section provides the manual page for each analysis tool and utility that the users of SiLK may employ in their day-to-day work.

mapsid

Map between sensor names and sensor numbers

SYNOPSIS

  mapsid [--print-classes] [--print-descriptions]  
        [--site-config-file=FILENAME]  
        [{ <sensor-name> | <sensor-number> } ...]

  mapsid --help

  mapsid --version

DESCRIPTION

As of SiLK 3.0, mapsid is deprecated, and it will be removed in the SiLK 4.0 release. Use rwsiteinfo(1) instead; the EXAMPLES section shows how to use rwsiteinfo to get output similar to that produced by mapsid.

mapsid is a utility that maps sensor names to sensor numbers or vice versa depending on the input arguments. Sensors are defined in the silk.conf(5) file.

When no sensor arguments are given to mapsid, the mapping of all sensor numbers to names is printed. When a numeric argument is given, the number to name mapping is printed for the specified argument. When a name is given, its numeric id is printed. For convenience when typing in sensor names, case is ignored.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
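The abbreviation rule can be illustrated with a short Python sketch. This is not the option parser the SiLK tools use; resolve_option is a hypothetical helper that only demonstrates "exact match wins, otherwise the prefix must be unique", using the switch names from the mapsid synopsis.

```python
def resolve_option(typed, known):
    """Return the full option name that `typed` refers to.

    An exact match always wins; otherwise `typed` must be a prefix of
    exactly one known option.
    """
    if typed in known:
        return typed
    matches = [name for name in known if name.startswith(typed)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown option: %s" % typed)
    raise ValueError("ambiguous option %s: could be %s"
                     % (typed, ", ".join(sorted(matches))))


# Switches taken from the mapsid synopsis.
MAPSID_OPTIONS = ["--print-classes", "--print-descriptions",
                  "--site-config-file", "--help", "--version"]

print(resolve_option("--print-c", MAPSID_OPTIONS))  # --print-classes
print(resolve_option("--site", MAPSID_OPTIONS))     # --site-config-file
```

In this sketch, --print alone is rejected because it is a prefix of both --print-classes and --print-descriptions.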

--print-classes

For each sensor, print the classes for which the sensor collects data. The classes are enclosed in square brackets, [].

--print-descriptions

For each sensor, print the description of the sensor as defined in the silk.conf file (if any).

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, mapsid searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

The following examples demonstrate the use of mapsid. In addition, each example shows how to get similar output using rwsiteinfo(1).

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Name to number mapping
 $ mapsid beta  
 BETA ->     1

 $ rwsiteinfo --fields=sensor,id-sensor --sensors=BETA  
 Sensor|Sensor-ID|  
   BETA|        1|

Unlike mapsid, matching of the sensor name is case-sensitive in rwsiteinfo.

Number to name mapping
 $ mapsid 3  
     3 -> DELTA

 $ rwsiteinfo --fields=id-sensor,sensor --sensors=3 --delimited=,  
 Sensor-ID,Sensor  
 3,DELTA

Print all mappings
 $ mapsid  
     0 -> ALPHA  
     1 -> BETA  
     2 -> GAMMA  
     3 -> DELTA  
     4 -> EPSLN  
     5 -> ZETA  
      ....

 $ rwsiteinfo --fields=id-sensor,sensor --no-titles  
   0| ALPHA|  
   1|  BETA|  
   2| GAMMA|  
   3| DELTA|  
   4| EPSLN|  
   5|  ZETA|  
   ...

Print the class
 $ mapsid --print-classes 3 ZETA  
     3 -> DELTA  [all]  
 ZETA  ->     5  [all]

 $ rwsiteinfo --fields=id-sensor,sensor,class:list --sensors=3,ZETA  
 Sensor-ID|Sensor|Class:list|  
         3| DELTA|       all|  
         5|  ZETA|       all|

Print the class and description
 $ mapsid --print-classes --print-description 0 1  
     0 -> ALPHA  [all]  "Primary gateway"  
     1 -> BETA   [all]  "Secondary gateway"

rwsiteinfo supports using an integer range when specifying sensors.

 $ rwsiteinfo --fields=id-sensor,sensor,class:list,describe-sensor \  
       --sensors=0-1  
 Sensor-ID|Sensor|Class:list|Sensor-Description|  
         0| ALPHA|       all|   Primary gateway|  
         1|  BETA|       all| Secondary gateway|

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, mapsid may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, mapsid may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
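The lookup can be pictured with the following Python sketch. It assumes the locations are tried in the order listed above and that entries depending on an unset environment variable are skipped; neither assumption is stated by this manual page, and candidate_paths and find_site_config are hypothetical names.

```python
import os
import posixpath


def candidate_paths(env):
    """List the silk.conf locations from the FILES section, in listed order."""
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share", "silk", "silk.conf"))
        paths.append(posixpath.join(root, "share", "silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths


def find_site_config(env=os.environ, exists=os.path.isfile):
    """Return the first candidate that exists, or None (assumed order)."""
    for path in candidate_paths(env):
        if exists(path):
            return path
    return None


print(find_site_config())
```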

SEE ALSO

rwsiteinfo(1), silk.conf(5), silk(7)

NOTES

As of SiLK 3.0, mapsid is deprecated; use rwsiteinfo(1) instead.

num2dot

Convert an integer IP to dotted-decimal notation

SYNOPSIS

  num2dot [--ip-fields=FIELDS] [--delimiter=C]

  num2dot --help

  num2dot --version

DESCRIPTION

num2dot is a filter that speeds up sorting of IP numbers while still producing both a natural order (i.e., 29.23.1.1 appears before 192.168.1.1) and readable output (i.e., dotted decimal rather than an integer representation of the IP number).

It is designed specifically to deal with the output of rwcut(1). Its job is to read stdin and convert specified fields (default field 1) separated by a delimiter (default ’|’) from an integer number into a dotted decimal IP address. Up to three IP fields can be specified via the --ip-fields=FIELDS option. The --delimiter option can be used to specify an alternate delimiter.

num2dot does not support IPv6 addresses. The EXAMPLES section below includes an example PySiLK script to handle IPv6.
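The conversion itself is simple, as the following Python sketch suggests. It is an illustration only, not the source of num2dot: num2dot_line is a hypothetical function, it uses the standard ipaddress module, and unlike the output of rwcut it makes no attempt to preserve column widths.

```python
import ipaddress


def num2dot_line(line, ip_fields=(1,), delimiter="|"):
    """Convert the 1-based columns in ip_fields from integers to dotted decimal."""
    columns = line.rstrip("\r\n").split(delimiter)
    for number in ip_fields:
        index = number - 1
        text = columns[index].strip()
        if text.isdigit():  # leave titles and empty cells untouched
            columns[index] = str(ipaddress.IPv4Address(int(text)))
    return delimiter.join(columns)


print(num2dot_line("3232235777|80|6|"))  # 192.168.1.1|80|6|
```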

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--ip-fields=FIELDS

Column numbers of the input that should be treated as IP numbers. Column numbers start from 1. If not specified, the default is 1.

--delimiter=C

The character that separates the columns of the input. Default is ’|’.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Suppose in addition to the default fields of 1-12 produced by rwcut(1), you want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default rwcut output fields of 1-12, you want to see dotted-decimal IP addresses. You could use the following command:

 $ rwfilter ... --pass=stdout                               \  
   | rwcut --fields=dip,stime,1-12 --ip-format=decimal      \  
        --timestamp-format=epoch                            \  
   | num2dot --ip-field=3,4

In the rwcut invocation, you prepend the fields of interest (dip and stime) before the standard fields. The first six columns produced by rwcut will be dIP, sTime, sIP, dIP, sPort, dPort. The --ip-format switch causes the first, third, and fourth columns to be printed as integers, but you only want the first column to have an integer representation. The pipe through num2dot will convert the third and fourth columns to dotted-decimal IP numbers.

num2dot does not support converting integers to IPv6 addresses. The following PySiLK script (see pysilk(3)) could be used as a starting-point to create a version of num2dot that supports IPv6 addresses:

 #! /usr/bin/env python  
 from __future__ import print_function  
 import sys  
 import silk  
 # The IPv6 fields to process; the ID of the first field is 0  
 ip_fields = (0, 1)  
 # The delimiter between fields  
 delim = '|'  
 # The width of the IPv6 fields  
 width = 39  
 # The file to process; this script processes standard input  
 f = sys.stdin  
 try:  
     for line in f:  
         # Strip only the line terminator so that any padding inside  
         # the final field is preserved  
         fields = line.rstrip("\r\n").split(delim)  
         for i in ip_fields:  
             # Convert the integer to an IPv6 address, right-justified  
             fields[i] = "%*s" % (width, silk.IPv6Addr(int(fields[i])))  
         print(delim.join(fields))  
 finally:  
     f.close()

SEE ALSO

rwcut(1), pysilk(3), silk(7)

BUGS

num2dot has no support for IPv6 addresses.

rwaddrcount

Count activity by IP address

SYNOPSIS

  rwaddrcount {--print-recs | --print-ips | --print-stat}  
        [--use-dest] [--min-bytes=BYTEMIN] [--max-bytes=BYTEMAX]  
        [--min-records=RECMIN] [--max-records=RECMAX]  
        [--min-packets=PACKMIN] [--max-packets=PACKMAX]  
        [--set-file=PATHNAME] [--sort-ips] [--timestamp-format=FORMAT]  
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]  
        [--no-titles] [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [{--legacy-timestamps | --legacy-timestamps=NUM}]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwaddrcount --help

  rwaddrcount --version

DESCRIPTION

rwaddrcount reads SiLK Flow records, sums the byte-, packet-, and record-counts on those records by individual source or destination IP address and maintains the time window during which that IP address was active. At the end of the count operation, the results per IP address are displayed when the --print-recs switch is given. rwaddrcount includes facilities for displaying only those IP addresses whose byte-, packet- or flow-counts are between specified minima and maxima.

rwaddrcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaddrcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
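The per-address binning described above can be sketched in Python as follows. This is a simplified illustration, not the implementation of rwaddrcount: the Flow tuple is a hypothetical stand-in for a SiLK Flow record, times are plain integers, and the minima and maxima filters are left out.

```python
from collections import namedtuple

# Hypothetical, simplified flow record; real SiLK records carry more fields.
Flow = namedtuple("Flow", "sip dip bytes packets stime etime")


def count_by_address(flows, use_dest=False):
    """Sum bytes, packets, and records per IP and track the active window."""
    bins = {}
    for flow in flows:
        key = flow.dip if use_dest else flow.sip
        if key not in bins:
            bins[key] = {"bytes": 0, "packets": 0, "records": 0,
                         "start": flow.stime, "end": flow.etime}
        entry = bins[key]
        entry["bytes"] += flow.bytes
        entry["packets"] += flow.packets
        entry["records"] += 1
        entry["start"] = min(entry["start"], flow.stime)  # earliest start time
        entry["end"] = max(entry["end"], flow.etime)      # latest end time
    return bins


flows = [
    Flow("10.10.10.1", "10.20.0.5", 400, 4, 100, 130),
    Flow("10.10.10.2", "10.20.0.5", 900, 9, 110, 115),
    Flow("10.10.10.1", "10.20.0.9", 600, 6, 90, 120),
]
for ip, entry in sorted(count_by_address(flows).items()):
    print(ip, entry["bytes"], entry["packets"], entry["records"],
          entry["start"], entry["end"])
```

Passing use_dest=True bins by destination address instead, which mirrors the role of the --use-dest switch.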

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

For the application to operate, one of the three --print options must be chosen.

--print-recs

Print one row for each bin that meets the minima/maxima criteria. Each bin contains the IP address, number of bytes, number of packets, number of flow records, earliest start time, and latest end time.

--print-ips

Print a single column containing the IP addresses for each bin that meets the minima/maxima criteria.

--print-stat

Print a one- or two-line summary (plus a title line) that summarizes the bins. The first line is a summary across all bins, and it contains the number of unique IP addresses and the sums of the bytes, packets, and flow records. The second line is printed only when one or more minima or maxima are specified. This second line contains the same columns as the first, and its values are the sums across those bins that meet the criteria.

--use-dest

Count by destination IP address in the filter record rather than source IP.

--min-bytes=BYTEMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes exceeds BYTEMIN.

--min-packets=PACKMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets exceeds PACKMIN.

--min-records=RECMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of filter records contributing to that count record exceeds RECMIN.

--max-bytes=BYTEMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes is less than BYTEMAX.

--max-packets=PACKMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets is less than PACKMAX.

--max-records=RECMAX

Filtering criterion; for the final output (stats or printing), only include count records to which at most RECMAX filter records contributed.

--set-file=PATHNAME

Write the IPs into the rwset(1)-style binary IP-set file named PATHNAME. Use rwsetcat(1) to see the contents of this file.

--timestamp-format=FORMAT

Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
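To make the four formats concrete, the following Python sketch renders one UTC timestamp in each of them. It is an illustration using the standard datetime module, not SiLK code; format_timestamp is a hypothetical helper and the timezone handling of the utc and local keywords is not modeled.

```python
from datetime import datetime, timezone


def format_timestamp(when, fmt="default"):
    """Render a timezone-aware datetime in one of the four listed formats."""
    if fmt == "default":
        return when.strftime("%Y/%m/%dT%H:%M:%S")
    if fmt == "iso":
        return when.strftime("%Y-%m-%d %H:%M:%S")
    if fmt == "m/d/y":
        return when.strftime("%m/%d/%Y %H:%M:%S")
    if fmt == "epoch":
        return str(int(when.timestamp()))
    raise ValueError("unknown timestamp format: %s" % fmt)


stamp = datetime(2003, 1, 17, 0, 19, 1, tzinfo=timezone.utc)
for name in ("default", "iso", "m/d/y", "epoch"):
    print("%-8s %s" % (name, format_timestamp(stamp, name)))
```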

--ip-format=FORMAT

For the --print-recs and --print-ips output formats, specify how IP addresses are printed. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a format. If it is empty or contains an invalid format, IPs are printed in the canonical format. The FORMAT is one of:

canonical

Print IP addresses in their canonical form, 127.0.0.1.

zero-padded

Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width of the column. The address 127.0.0.1 is printed as 127.000.000.001.

decimal

Print IP addresses as integers in decimal format. The address 127.0.0.1 is printed as 2130706433.

hexadecimal

Print IP addresses as integers in hexadecimal format. The address 127.0.0.1 is printed as 7f000001.

force-ipv6

Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4 address is mapped into the ::ffff:0:0/96 netblock. The address 127.0.0.1 is printed as ::ffff:7f00:1.
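The five formats can be reproduced for an IPv4 address with a few lines of Python. The sketch below is only an illustration built on the standard ipaddress module; format_ip is a hypothetical helper, and it handles IPv4 input only.

```python
import ipaddress


def format_ip(address, fmt="canonical"):
    """Render an IPv4 address in one of the --ip-format styles."""
    addr = ipaddress.IPv4Address(address)
    number = int(addr)
    if fmt == "canonical":
        return str(addr)
    if fmt == "zero-padded":
        return ".".join("%03d" % octet for octet in addr.packed)
    if fmt == "decimal":
        return str(number)
    if fmt == "hexadecimal":
        return "%x" % number
    if fmt == "force-ipv6":
        # Map into ::ffff:0:0/96 and print the low 32 bits as two hex groups.
        return "::ffff:%x:%x" % (number >> 16, number & 0xFFFF)
    raise ValueError("unknown IP format: %s" % fmt)


for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
    print("%-12s %s" % (name, format_ip("127.0.0.1", name)))
```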

--integer-ips

Print IP addresses as integers. This switch is equivalent to --ip-format=decimal. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--zero-pad-ips

Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--sort-ips

For the --print-recs and --print-ips output formats, the results are presented sorted by IP address.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaddrcount’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaddrcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
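The precedence among the switch and the two environment variables can be summarized in a short Python sketch. It reflects one reading of the text above and of the ENVIRONMENT section, not the actual SiLK code; choose_pager and its parameters are hypothetical.

```python
def choose_pager(pager_switch=None, env=None, to_terminal=True, output_path=None):
    """Return the pager program to run, or None when output is not paged."""
    env = {} if env is None else env
    # Paging applies only to terminal output with no --output-path.
    if not to_terminal or output_path is not None:
        return None
    if pager_switch is not None:      # --pager overrides both variables
        pager = pager_switch
    elif "SILK_PAGER" in env:         # SILK_PAGER overrides PAGER, even if empty
        pager = env["SILK_PAGER"]
    else:
        pager = env.get("PAGER", "")
    return pager or None              # an empty string disables paging


print(choose_pager(env={"SILK_PAGER": "less -S", "PAGER": "more"}))  # less -S
print(choose_pager(env={"SILK_PAGER": "", "PAGER": "more"}))         # None
```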

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaddrcount searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps
--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaddrcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

Deprecated Switches

The following switches are deprecated. They will be removed in SiLK 4.0.

--byte-min=BYTEMIN

Deprecated alias for --min-bytes.

--packet-min=PACKMIN

Deprecated alias for --min-packets.

--rec-min=RECMIN

Deprecated alias for --min-records.

--byte-max=BYTEMAX

Deprecated alias for --max-bytes.

--packet-max=PACKMAX

Deprecated alias for --max-packets.

--rec-max=RECMAX

Deprecated alias for --max-records.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To print out a set of IPs with exactly one TCP record during the time period, use:

 $ rwfilter --start-date=2003/09/01:00 --end-date=2003/09/01:12     \  
        --proto=6 --pass=stdout                                     \  
   | rwaddrcount --max-records=1 --print-ips

In general, to print out record information, use rwaddrcount with --print-recs:

 $ rwfilter --start-date=2003/01/17:00 --end-date=2003/01/17:23     \  
        --proto=6 --pass=stdout                                     \  
   | rwaddrcount --print-rec | head -3

  10.10.10.1|  65792| 147|  21| 2003/01/17T00:19:01| 2003/01/17T02:00:13|  
  10.10.10.2| 110744|  89|   7| 2003/01/17T01:21:42| 2003/01/17T01:39:21|  
  10.10.10.3|    864|  18|   6| 2003/01/17T00:20:33| 2003/01/17T01:25:38|

ENVIRONMENT

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwaddrcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaddrcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwaddrcount automatically invokes this program to display its output a screen at a time.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaddrcount may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaddrcount may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaddrcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaddrcount --version.)

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwset(1), rwsetcat(1), rwstats(1), rwtotal(1), rwuniq(1), silk(7), tzset(3), environ(7)

NOTES

rwaddrcount only supports IPv4 addresses, and it will not be modified to support IPv6 addresses. To produce output similar to rwaddrcount for IPv6 addresses, use rwuniq(1):

 rwuniq --fields=sip --values=bytes,packets,records,stime,etime

When used in an IPv6 environment, rwaddrcount converts IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and processes them. IPv6 records having addresses outside of that prefix are ignored.

rwaddrcount uses a fairly large hashtable to store data, but it is likely that as the amount of data expands, the application will take more time to process data.

Similar binning of records is produced by rwstats(1), rwtotal(1), and rwuniq(1).

To generate a list of IP addresses without the volume information, use rwset(1).

rwaggbag

Build a binary Aggregate Bag from SiLK Flow records

SYNOPSIS

  rwaggbag --keys=KEY --counters=COUNTER  
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]  
        [--invocation-strip] [--print-filenames] [--copy-input=PATH]  
        [--compression-method=COMP_METHOD]  
        [--ipv6-policy={ignore,asv4,mix,force,only}]  
        [--output-path=PATH]  
        [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwaggbag --help

  rwaggbag --version

DESCRIPTION

rwaggbag reads SiLK Flow records and builds an Aggregate Bag file. To build an Aggregate Bag from textual input, use rwaggbagbuild(1).

An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.

For each SiLK flow record rwaggbag reads, it extracts the values of the fields listed in the --keys switch, combines those fields into a key, searches for an existing bin that has that key and creates a new bin for that key if none is found, and adds the values for each of the fields listed in the --counters switch to the bin’s counter. Both the --keys and --counters switches are required.
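The key and counter mechanics can be sketched with a Python dictionary. This is a conceptual illustration, not the Aggregate Bag file format or the rwaggbag implementation: flows are represented as plain dictionaries, build_aggbag is a hypothetical function, and the counter names follow those accepted by the --counters switch.

```python
from collections import defaultdict

# Which flow field each summing counter reads from.
COUNTER_SOURCE = {"sum-packets": "packets", "sum-bytes": "bytes",
                  "sum-duration": "duration"}


def build_aggbag(flows, keys, counters):
    """Map each tuple of key-field values to a list of counter totals."""
    bag = defaultdict(lambda: [0] * len(counters))
    for flow in flows:
        key = tuple(flow[name] for name in keys)  # combine fields into a key
        totals = bag[key]                         # find or create the bin
        for position, counter in enumerate(counters):
            if counter == "records":
                totals[position] += 1
            else:
                totals[position] += flow[COUNTER_SOURCE[counter]]
    return dict(bag)


sample_flows = [
    {"sIPv4": "10.0.0.1", "sPort": 80, "packets": 3, "bytes": 300, "duration": 1},
    {"sIPv4": "10.0.0.1", "sPort": 80, "packets": 4, "bytes": 400, "duration": 2},
    {"sIPv4": "10.0.0.2", "sPort": 53, "packets": 1, "bytes": 60, "duration": 0},
]
bag = build_aggbag(sample_flows, ["sIPv4", "sPort"],
                   ["records", "sum-packets", "sum-bytes"])
for key, totals in sorted(bag.items()):
    print(key, totals)
```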

rwaggbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaggbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

If rwaggbag runs out of memory, it will exit immediately. The output Aggregate Bag file remains behind with a size of 0 bytes.

To print the contents of an Aggregate Bag as text, use rwaggbagcat(1). The rwaggbagbuild(1) tool can create an Aggregate Bag from textual input. rwaggbagtool(1) allows you to manipulate binary Aggregate Bag files.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--keys=KEY

Create a key for binning flow records using the values of the comma-separated field(s) listed in KEY. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The available KEY fields are:

sIPv4

source IP address when IPv4

sIPv6

source IP address when IPv6

dIPv4

destination IP address when IPv4

dIPv6

destination IP address when IPv6

sPort

source port for TCP or UDP, or equivalent

dPort

destination port for TCP or UDP, or equivalent

protocol

IP protocol

packets

count of packets recorded for this flow record

bytes

count of bytes recorded for this flow record

flags

bit-wise OR of TCP flags over all packets in the flow

sTime

starting time of the flow, in seconds resolution

duration

duration of the flow, in seconds resolution

eTime

ending time of the flow, in seconds resolution

sensor

numeric ID of the sensor where the flow was collected

input

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

output

router SNMP output interface or postVlanId

nhIPv4

router next hop IP address when IPv4

nhIPv6

router next hop IP address when IPv6

initialFlags

TCP flags on first packet in the flow as reported by yaf(1)

sessionFlags

bit-wise OR of TCP flags over all packets in the flow except the first as reported by yaf

attributes

flow attributes set by the flow generator

application

the content of the flow as reported in the applabel field of yaf

--counters=COUNTER

Add to the bin determined by the fields in --keys the values of the comma-separated field(s) listed in COUNTER. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The available COUNTER fields are:

records

count of the number of flow records that match the key

sum-packets

the sum of the packet counts for flow records that match the key

sum-bytes

the sum of the byte counts for flow records that match the key

sum-duration

the sum of the durations (in seconds) for flow records that match the key

--note-strip

Do not copy the notes (annotations) from the input file(s) to the output file. When this switch is not specified, notes from the input file(s) are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1).

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaggbag’s output to a different location.

--output-path=PATH

Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbag exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbag to exit with an error.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the Aggregate Bag.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 prefix, this policy is equivalent to force; otherwise it is equivalent to asv4.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.

only

Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the Aggregate Bag.
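The address conversions that the asv4 and force policies describe can be illustrated with a short Python sketch. This is not SiLK code: the function names as_v4 and force_v6 are hypothetical, and the sketch works on single addresses rather than on flow records.

```python
import ipaddress


def force_v6(addr):
    """Model of 'force': map an IPv4 address into the ::ffff:0:0/96 prefix."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip
    # The upper 96 bits are 0:0:0:0:0:ffff; the low 32 bits hold the IPv4 address.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))


def as_v4(addr):
    """Model of 'asv4': return the IPv4 form of an address in ::ffff:0:0/96.

    Returns None for any other IPv6 address, which stands in for
    "ignore this record".
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip
    return ip.ipv4_mapped  # None when the address is outside ::ffff:0:0/96
```

Under these assumptions, force_v6("127.0.0.1") yields the address ::ffff:7f00:1, and as_v4("2001:db8::1") yields None.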

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbag searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To create an Aggregate Bag that sums the packet count for each destination IP address in the SiLK Flow file data.rw:

 $ rwaggbag --key=dipv6 --counter=sum-packets data.rw   \  
   | rwaggbagcat

To count the records and sum the packet count and byte count for each destination port, writing the result to the file dport.aggbag:

 $ rwaggbag --key=dport --counter=records,sum-packets,sum-bytes    \  
        --output-path=dport.aggbag data.rw

To count the number of records seen for each unique source port, destination port, and protocol:

 $ rwaggbag --key=sport,dport,proto --counter=records data.rw   \
   | rwaggbagcat

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbag may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbag may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
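As a rough illustration, the list of candidate locations can be modeled in Python. The sketch assumes that the order in which the paths are listed is the order in which they are checked and that entries depending on an unset environment variable are skipped; neither assumption is confirmed by this manual page, and the function name site_config_candidates is hypothetical.

```python
import posixpath


def site_config_candidates(environ):
    """Return candidate silk.conf paths in the order the FILES section lists them.

    `environ` is a mapping such as os.environ. Skipping entries whose
    variable is unset or empty is an assumption of this sketch.
    """
    candidates = []
    config_file = environ.get("SILK_CONFIG_FILE")
    if config_file:
        candidates.append(config_file)
    data_root = environ.get("SILK_DATA_ROOTDIR")
    if data_root:
        candidates.append(posixpath.join(data_root, "silk.conf"))
    candidates.append("/data/silk.conf")
    silk_path = environ.get("SILK_PATH")
    if silk_path:
        candidates.append(posixpath.join(silk_path, "share/silk/silk.conf"))
        candidates.append(posixpath.join(silk_path, "share/silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates
```

Passing os.environ to the function lists the paths for the current environment.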

NOTES

rwaggbag and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

SEE ALSO

rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), rwbag(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwset(1), rwuniq(1), sensor.conf(5), silk(7), zlib(3)

rwaggbagbuild

Create a binary aggregate bag from non-flow data

SYNOPSIS

  rwaggbagbuild [--fields=FIELDS]  
        [--constant-field=FIELD=VALUE [--constant-field=FIELD=VALUE...]]  
        [--column-separator=CHAR] [--no-titles]  
        [--bad-input-lines=FILE] [--verbose] [--stop-on-error]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--invocation-strip] [--compression-method=COMP_METHOD]  
        [--output-path=PATH] [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}

  rwaggbagbuild --help

  rwaggbagbuild --version

DESCRIPTION

rwaggbagbuild builds a binary Aggregate Bag file by reading one or more files containing textual input. To build an Aggregate Bag from SiLK Flow records, use rwaggbag(1).

An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.

rwaggbagbuild reads its input from the files named on the command line or from the standard input when no file names are specified, when --xargs is not present, and when the standard input is not a terminal. To read the standard input in addition to the named files, use - or stdin as a file name. When the --xargs switch is provided, rwaggbagbuild reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The new Aggregate Bag file is written to the location specified by the --output-path switch. If it is not provided, output is sent to the standard output when it is not connected to a terminal.

The Aggregate Bag file must have at least one field that it considers a key field and at least one field that it considers a counter field. See the description of the --fields switch.

In general (and as detailed below), each line of the text input files becomes one entry in the Aggregate Bag file. It is also possible to specify that each entry in the Aggregate Bag file contains additional fields, each with a specific value. These fields are specified by the --constant-field switch whose argument is a field name, an equals sign (’=’), and a textual representation of a value. The named field becomes one of the key or counter fields in the Aggregate Bag file, and that field is given the specified value for each entry that is read from an input file. See the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.

The remainder of this section details how rwaggbagbuild processes each text input file to create an Aggregate Bag file.

When the --fields switch is specified, its argument specifies the key and counter fields that the new Aggregate Bag file is to contain. If --fields is not specified, the first line of the first input file is expected to contain field names, and those names determine the Aggregate Bag’s key and counter. A field name of ignore causes rwaggbagbuild to ignore the values in that field when parsing the input.

The textual input is processed one line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. After removing the comments, any line that is blank or contains only whitespace is ignored.

All other lines must contain valid input, which is a set of fields separated by a delimiter. The default delimiter is the vertical bar (’|’) and may be changed with the --column-separator switch. Whitespace around a delimiter is allowed; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.

The first line of each input file may contain delimiter-separated field names denoting in which order the fields appear in this input file. As mentioned above, when the --fields switch is not given, the first line of the first file determines the Aggregate Bag’s key and counter. To tell rwaggbagbuild to treat the first line of each file as field values to be parsed, specify the --no-titles switch.

Every other line must contain delimiter-separated field values. A delimiter may follow the final field on a line. rwaggbagbuild ignores lines that contain either too few or too many fields.

See the description of the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.
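The line-handling rules described above (comment stripping, blank-line skipping, delimiter splitting, the optional trailing delimiter, and the discarding of lines with the wrong number of fields) can be sketched in Python. This is only a model of the documented behavior, not rwaggbagbuild's implementation; the function name parse_lines is hypothetical, title-line handling is left out, and the sketch assumes a non-whitespace separator.

```python
def parse_lines(lines, num_fields, sep="|"):
    """Yield a list of field values for each usable input line."""
    for line in lines:
        # A comment runs from '#' to the end of the line.
        line = line.split("#", 1)[0]
        # Blank and whitespace-only lines are ignored.
        if not line.strip():
            continue
        fields = line.rstrip("\n").split(sep)
        # A delimiter may follow the final field on a line.
        if len(fields) == num_fields + 1 and fields[-1].strip() == "":
            fields.pop()
        # Lines with too few or too many fields are ignored.
        if len(fields) != num_fields:
            continue
        # Whitespace around the delimiter is not significant.
        yield [field.strip() for field in fields]
```

For example, with num_fields set to 4, the line "10.245.15.175|   80|  127| 12862|" produces four values, while "1|2|" and "1|2|3|4|5|" are skipped.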

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Specify the fields in the input files. FIELDS is a comma-separated list of field names. Field names are case-insensitive, and a name may be abbreviated to the shortest unique prefix. Other than the ignore field, a field name may not be specified more than once. The Aggregate Bag file must have at least one key field and at least one counter field.

The names of the fields that are considered key fields, their descriptions, and the format of the input that each expects are:

ignore

field that rwaggbagbuild is to skip

sIPv4

source IP address, IPv4 only; either the canonical dotted-quad format or an integer from 0 to 4294967295 inclusive

dIPv4

destination IP address, IPv4 only; uses the same format as sIPv4

nhIPv4

next hop IP address, IPv4 only; uses the same format as sIPv4

any-IPv4

a generic IPv4 address; uses the same format as sIPv4

sIPv6

source IP address, IPv6 only; the canonical hex-encoded format for IPv6 addresses

dIPv6

destination IP address, IPv6 only; uses the same format as sIPv6

nhIPv6

next hop IP address, IPv6 only; uses the same format as sIPv6

any-IPv6

a generic IPv6 address; uses the same format as sIPv6

sPort

source port; an integer from 0 to 65535 inclusive

dPort

destination port; an integer from 0 to 65535 inclusive

any-port

a generic port; an integer from 0 to 65535 inclusive

protocol

IP protocol; an integer from 0 to 255 inclusive

packets

packet count; an integer from 1 to 4294967295 inclusive

bytes

byte count; an integer from 1 to 4294967295 inclusive

flags

bit-wise OR of TCP flags over all packets; a string containing F, S, R, P, A, U, E, C in upper- or lowercase

initialFlags

TCP flags on the first packet; uses the same form as flags

sessionFlags

bit-wise OR of TCP flags on the second through final packet; uses the same form as flags

sTime

starting time in seconds; uses the form YYYY/MM/DD[:hh[:mm[:ss[.sss]]]] (any milliseconds value is dropped). A T may be used in place of : to separate the day and hour fields. A floating point value between 536870912 and 2147483647 is also allowed and is treated as seconds since the UNIX epoch.

eTime

ending time in seconds; uses the same format as sTime

any-time

a generic time in seconds; uses the same format as sTime

duration

duration of flow; a floating point value from 0.0 to 4294967.295

sensor

sensor name or ID at the collection point; a string as given in silk.conf(5)

class

class at collection point; a string as given in silk.conf

type

type at collection point; a string as given in silk.conf

input

router SNMP ingress interface or vlanId; an integer from 0 to 65535

output

router SNMP egress interface or postVlanId; an integer from 0 to 65535

any-snmp

a generic SNMP value; an integer from 0 to 65535

attribute

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout or a byte-count threshold

C

flow generator created a record as a continuation of a previous record for a connection that exceeded a timeout or byte-count threshold

application

guess as to the content of the flow; as an integer from 0 to 65535

icmpType

ICMP type; an integer from 0 to 255 inclusive

icmpCode

ICMP code; an integer from 0 to 255 inclusive

custom-key

a generic key; an integer from 0 to 4294967295 inclusive

The names and descriptions of the fields that are considered counter fields are listed next. For each, the type of input is an unsigned 64-bit number; that is, an integer from 0 to 18446744073709551615.

records

count of records that match the key

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

custom-counter

a generic counter

--constant-field=FIELD=VALUE

For each entry read from the input file(s), insert a field named FIELD and set its value to VALUE. VALUE is a textual representation of the field’s value as described in the description of the --fields switch above. When FIELD is a counter field and the same key appears multiple times in the input, VALUE is added to the counter multiple times. If a field named FIELD appears in an input file, its value from that file is ignored. Specify the --constant-field switch multiple times to insert multiple fields.
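The accumulation behavior for a constant counter field can be modeled with a few lines of Python. The sketch below is illustrative only: count_keys is a hypothetical name, and the list of port numbers is sample data, not the output of any SiLK tool.

```python
from collections import Counter


def count_keys(keys, constant=1):
    """Add `constant` to a key's counter each time the key appears."""
    totals = Counter()
    for key in keys:
        totals[key] += constant
    # Sort by key for a stable, readable result.
    return dict(sorted(totals.items()))


# Sample destination ports; port 80 appears five times.
dports = [80, 29222, 80, 29362, 80, 28925, 80, 29246, 80, 29700]
print(count_keys(dports))
```

With a constant of 1, the counter for a key equals the number of input lines that contain that key, which is how a constant records field can be used to count occurrences.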

--column-separator=CHAR

When reading textual input, use the character CHAR as the delimiter between columns (fields) in the input. The default column separator is the vertical pipe (’|’). rwaggbagbuild normally ignores whitespace (space and tab) around the column separator; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.

--bad-input-lines=FILEPATH

When parsing textual input, copy any lines that cannot be parsed to FILEPATH. The strings stdout and stderr may be used for the standard output and standard error, respectively. Each bad line is prepended by the name of the source input file, a colon, the line number, and a colon. On exit, rwaggbagbuild removes FILEPATH if all input lines were successfully parsed.

--verbose

When a textual input line fails to parse, print a message to the standard error describing the problem. When this switch is not specified, parsing failures are not reported. rwaggbagbuild continues to process the input after printing the message. To stop processing when a parsing error occurs, use --stop-on-error.

--stop-on-error

When a textual input line fails to parse, print a message to the standard error describing the problem and exit the program. When this occurs, the output file contains any records successfully created prior to reading the bad input line. The default behavior of rwaggbagbuild is to silently ignore parsing errors. To report parsing errors and continue processing the input, use --verbose.

--no-titles

Parse the first line of the input as field values. Normally when the --fields switch is specified, rwaggbagbuild examines the first line to determine if the line contains the names (titles) of fields and skips the line if it does. rwaggbagbuild exits with an error when --no-titles is given but --fields is not.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Aggregate Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1).

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagbuild to exit with an error.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagbuild searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Assume the following textual data in the file rec.txt:

             dIP|dPort|   packets|     bytes|  
   10.245.15.175|   80|       127|     12862|  
 192.168.251.186|29222|       131|    351213|  
  10.247.186.130|   80|       596|     38941|  
 192.168.239.224|29362|       600|    404478|  
 192.168.215.219|   80|       400|     32375|  
   10.255.252.19|28925|       404|   1052274|  
 192.168.255.249|   80|       112|      7412|  
    10.208.7.238|29246|       109|    112977|  
 192.168.254.127|   80|       111|      9759|  
   10.218.34.108|29700|       114|    461845|

To create an Aggregate Bag file from this data, provide the --fields switch with the names used by the Aggregate Bag tools:

 $ rwaggbagbuild --fields=dipv4,dport,sum-packets,sum-bytes  \  
        --output-path=ab.aggbag rec.txt

Use the rwaggbagcat(1) tool to view it:

 $ rwaggbagcat ab.aggbag  
           dIPv4|dPort|    sum-packets|           sum-bytes|  
    10.208.7.238|29246|            109|              112977|  
   10.218.34.108|29700|            114|              461845|  
   10.245.15.175|   80|            127|               12862|  
  10.247.186.130|   80|            596|               38941|  
   10.255.252.19|28925|            404|             1052274|  
 192.168.215.219|   80|            400|               32375|  
 192.168.239.224|29362|            600|              404478|  
 192.168.251.186|29222|            131|              351213|  
 192.168.254.127|   80|            111|                9759|  
 192.168.255.249|   80|            112|                7412|

To create an Aggregate Bag that counts the number of times each destination port appears, ignore all fields except dPort and use --constant-field to add a records field whose value is 1 for every input line:

 $ rwaggbagbuild --fields=ignore,dport,ignore,ignore  \  
        --constant-field=record=1 rec.txt             \
   | rwaggbagcat  
 dPort|   records|  
    80|         5|  
 28925|         1|  
 29222|         1|  
 29246|         1|  
 29362|         1|  
 29700|         1|

Alternatively, use rwaggbagtool(1) to get the same information from the ab.aggbag file created above:

 $ rwaggbagtool --select-fields=dport        \  
        --insert-field=record=1 ab.aggbag    \  
   | rwaggbagcat  
 dPort|   records|  
    80|         5|  
 28925|         1|  
 29222|         1|  
 29246|         1|  
 29362|         1|  
 29700|         1|

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagbuild may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagbuild may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwaggbag(1), rwaggbagcat(1), rwaggbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7), ccfilter(3), zlib(3)

NOTES

rwaggbagbuild and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

rwaggbagcat

Output a binary Aggregate Bag file as text

SYNOPSIS

  rwaggbagcat [--timestamp-format=FORMAT] [--ip-format=FORMAT]  
        [--integer-sensors] [--integer-tcp-flags]  
        [--no-titles] [--no-columns] [--column-separator=C]  
        [--no-final-delimiter] [{--delimited | --delimited=C}]  
        [--output-path=PATH] [--pager=PAGER_PROG]  
        [--site-config-file=FILENAME]  
        [AGGBAGFILE [AGGBAGFILE...]]

  rwaggbagcat --help

  rwaggbagcat --version

DESCRIPTION

rwaggbagcat reads a binary Aggregate Bag as created by rwaggbag(1) or rwaggbagbuild(1), converts it to text, and outputs it to the standard output, the pager, or the specified file.

rwaggbagcat reads the AGGBAGFILEs specified on the command line; if no AGGBAGFILE arguments are given, rwaggbagcat attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as an AGGBAGFILE name. If any input does not contain an Aggregate Bag file, rwaggbagcat prints an error to the standard error and exits abnormally.

When multiple AGGBAGFILEs are specified on the command line, each is handled individually. To process the files as a single Aggregate Bag, use rwaggbagtool(1) to combine the Aggregate Bags and pipe the output of rwaggbagtool into rwaggbagcat.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--timestamp-format=FORMAT

Specify the format, timezone, and/or modifier to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and modifier. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a modifier. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
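The four format names can be related to concrete output with a small Python sketch. It is an approximation written from the descriptions above, not the formatter SiLK uses; the function name format_timestamp and its parameters are hypothetical, and printing the epoch form as whole seconds is an assumption.

```python
from datetime import datetime, timezone


def format_timestamp(epoch_seconds, fmt="default", utc=True):
    """Render a UNIX epoch time in one of the documented formats."""
    if fmt == "epoch":
        # Whole seconds since 00:00:00 UTC on 1970-01-01.
        return str(int(epoch_seconds))
    if utc:
        moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    else:
        # Local time, as determined by TZ or the system default.
        moment = datetime.fromtimestamp(epoch_seconds)
    layouts = {
        "default": "%Y/%m/%dT%H:%M:%S",
        "iso": "%Y-%m-%d %H:%M:%S",
        "m/d/y": "%m/%d/%Y %H:%M:%S",
    }
    if fmt not in layouts:
        raise ValueError("unknown timestamp format: " + fmt)
    milliseconds = moment.microsecond // 1000
    return "{}.{:03d}".format(moment.strftime(layouts[fmt]), milliseconds)
```

For the epoch value 1234396802 in UTC, the default, iso, and m/d/y forms are 2009/02/12T00:00:02.000, 2009-02-12 00:00:02.000, and 02/12/2009 00:00:02.000.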

--ip-format=FORMAT

Specify how IP addresses are printed. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a format. If it is empty or contains an invalid format, IPs are printed in the canonical format. The FORMAT is one of:

canonical

Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96 will be printed as a mixture of IPv6 and IPv4.

zero-padded

Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force, the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.

decimal

Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 2130706433 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 7f000001 and 20010db8000000000000000000000001, respectively.

force-ipv6

Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4 address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
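The decimal, hexadecimal, and zero-padded representations can be reproduced with Python's standard ipaddress module, which may help when checking rwaggbagcat output by hand. The sketch is not SiLK code, the function name format_ip is hypothetical, and the canonical and force-ipv6 formats are deliberately left out.

```python
import ipaddress


def format_ip(addr, fmt):
    """Render an IP address as decimal, hexadecimal, or zero-padded text."""
    ip = ipaddress.ip_address(addr)
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "zero-padded":
        if ip.version == 4:
            # Pad each octet to three digits.
            return ".".join("{:03d}".format(octet) for octet in ip.packed)
        # Expand every group of an IPv6 address to four hex digits.
        return ip.exploded
    raise ValueError("unsupported format: " + fmt)
```

Applied to 127.0.0.1 and 2001:db8::1, the function returns the same strings that the descriptions above give for each format.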

--integer-sensors

Print the integer ID of the sensor rather than its name.

--integer-tcp-flags

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
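The relationship between the columnar and delimited layouts can be sketched in Python. The sketch only models the formatting switches as described above; render_row is a hypothetical name, and the column widths are supplied by the caller because this manual page does not state how rwaggbagcat chooses them.

```python
def render_row(values, widths=None, sep="|", final_delimiter=True):
    """Format one output row.

    When `widths` is given, each value is right-justified in a fixed-width
    column; when it is None, values are printed without padding, which
    models --no-columns.
    """
    if widths is None:
        cells = [str(value) for value in values]
    else:
        cells = [str(value).rjust(width) for value, width in zip(values, widths)]
    line = sep.join(cells)
    return line + sep if final_delimiter else line


def render_delimited(values, sep="|"):
    """Model of --delimited: no columns and no final delimiter."""
    return render_row(values, widths=None, sep=sep, final_delimiter=False)
```

For instance, render_delimited([0, 769, 15052, 842912], sep=",") gives 0,769,15052,842912, while render_row with widths of 5, 5, 15, and 20 pads the same values into fixed-width columns and appends a final delimiter.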

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaggbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagcat searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The formatting switches on rwaggbagcat are similar to those on the other SiLK tools.

First, use rwaggbag(1) to create an Aggregate Bag file from the SiLK Flow file data.rw:

 $ rwaggbag --key=sport,dport --counter=sum-pack,sum-byte \  
        --output-path=ab.aggbag data.rw

To print the Aggregate Bag:

 $ rwaggbagcat ab.aggbag | head -4  
 sPort|dPort|    sum-packets|           sum-bytes|  
     0|    0|          73452|             6169968|  
     0|  769|          15052|              842912|  
     0|  771|          14176|              793856|

To produce comma-separated data:

 $ rwaggbagcat --delimited=, ab.aggbag | head -4
 sPort,dPort,sum-packets,sum-bytes  
 0,0,73452,6169968  
 0,769,15052,842912  
 0,771,14176,793856

To remove the title:

 $ rwaggbagcat --no-title ab.aggbag | head -4  
     0|    0|          73452|             6169968|  
     0|  769|          15052|              842912|  
     0|  771|          14176|              793856|  
     0| 2048|          14356|             1205904|

To change the format of IP addresses:

 $ rwaggbag --key=sipv4,dipv4 --counter=sum-pack,sum-byte data.rw   \  
   | rwaggbagcat --ip-format=decimal | head -4  
      sIPv4|     dIPv4|    sum-packets|           sum-bytes|  
  168047851|3232295339|            255|               18260|  
  168159227|3232293505|            331|              536169|  
  168381813|3232282689|            563|               55386|

To change the format of timestamps:

 $ rwaggbag --key=stime,etime --counter=sum-pack,sum-byte data.rw   \  
   | rwaggbagcat --timestamp-format=epoch | head -4  
      sTime|     eTime|    sum-packets|           sum-bytes|  
 1234396802|1234396802|              2|                 259|  
 1234396802|1234398594|            526|               38736|  
 1234396803|1234396803|              9|                 504|

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_PAGER

When set to a non-empty string, rwaggbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaggbagcat does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwaggbagcat automatically invokes this program to display its output a screen at a time.
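The precedence among the pager switch, SILK_PAGER, and PAGER can be modeled with a short Python sketch. It only illustrates the rules stated in this manual page and is not code from SiLK; the function name resolve_pager and its parameters are hypothetical.

```python
def resolve_pager(switch_value, env, output_path_given=False, stdout_is_tty=True):
    """Return the pager program to run, or None when output is not paged.

    switch_value is the argument of the pager switch (None when the switch
    is absent); env is a mapping of environment variables.
    """
    # Paging applies only to terminal output and never with --output-path.
    if output_path_given or not stdout_is_tty:
        return None
    if switch_value is not None:
        pager = switch_value            # the switch overrides SILK_PAGER
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]       # SILK_PAGER overrides PAGER
    else:
        pager = env.get("PAGER", "")
    # An empty string means "do not page".
    return pager or None
```

For instance, resolve_pager(None, {"SILK_PAGER": "", "PAGER": "more"}) returns None, because an empty SILK_PAGER disables paging even though PAGER is set.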

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwaggbagcat may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaggbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaggbagcat --version.)

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
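The list above can be read as a search path. The following Python sketch builds that list from the environment and returns the first file that exists. It assumes the locations are tried in the order shown and that an unset variable simply drops its entries; both points are assumptions made for illustration, and the function names are hypothetical.

```python
import os
import posixpath


def silk_conf_candidates(env):
    """Return the candidate silk.conf paths in the order listed in FILES."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share/silk/silk.conf"))
        candidates.append(posixpath.join(root, "share/silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


def find_silk_conf(env, exists=os.path.exists):
    """Return the first candidate that exists, or None when none is found."""
    for path in silk_conf_candidates(env):
        if exists(path):
            return path
    return None
```

Calling find_silk_conf(os.environ) on a host gives the file this model would select; the tool itself remains the authority on which file is actually used.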

NOTES

rwaggbagcat and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

SEE ALSO

rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), silk(7), tzset(3), environ(7)

rwaggbagtool

Manipulate binary Aggregate Bag files

SYNOPSIS

  rwaggbagtool { --add | --subtract }  
        [--insert-field=FIELD=VALUE [--insert-field=FIELD2=VALUE2...]]  
        [--remove-fields=FIELD_LIST] [--select-fields=FIELD_LIST]  
        [--to-ipset=FIELD [--ipset-record-version=VERSION]]  
        [--to-bag=BAG_KEY,BAG_COUNTER] [--output-path=PATH]  
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD]  
        [--site-config-file=FILENAME]  
        [AGGBAG_FILE [AGGBAG_FILE ...]]

  rwaggbagtool --help

  rwaggbagtool --version

DESCRIPTION

rwaggbagtool performs operations on one or more Aggregate Bag files and creates a new Aggregate Bag file.

rwaggbagtool processes the Aggregate Bag files listed on the command line. When no file names are specified, rwaggbagtool attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as a file name. If any input is not an Aggregate Bag file, rwaggbagtool prints an error to the standard error and exits with an error status.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
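The abbreviation rule can be illustrated with a small Python sketch. It is a model of the rule as stated above, not the parser SiLK uses; match_option is a hypothetical name, and the option list is copied from the SYNOPSIS.

```python
RWAGGBAGTOOL_OPTIONS = [
    "add", "subtract", "insert-field", "remove-fields", "select-fields",
    "to-ipset", "ipset-record-version", "to-bag", "output-path",
    "note-strip", "note-add", "note-file-add", "compression-method",
    "site-config-file", "help", "version",
]


def match_option(given, known):
    """Resolve a possibly abbreviated option name (without the leading --)."""
    # An exact match always wins, even when it is a prefix of another name.
    if given in known:
        return given
    candidates = [name for name in known if name.startswith(given)]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(f"unknown option: --{given}")
    raise ValueError(f"ambiguous option --{given}: {', '.join(sorted(candidates))}")
```

Under this model --remove resolves to --remove-fields and --insert to --insert-field, which is why the EXAMPLES section can use those short forms, while --note is rejected as ambiguous.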

--add

Sum each of the counters for each key for all the Aggregate Bag input files. All the Aggregate Bag files must have the same set of key fields and counter fields. (The values of the keys may differ, but the set of fields that comprise the key must match.) If no other operation is specified, the add operation is the default.

--subtract

Subtract from the first Aggregate Bag file all subsequent Aggregate Bag files. All the Aggregate Bag files must have the same set of key fields and counter fields. If a key does not appear in the first Aggregate Bag file, rwaggbagtool assumes it has a value of 0. If any counter subtraction results in a negative number, the key will not appear in the resulting Aggregate Bag file.
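The arithmetic of --add and --subtract can be sketched in Python by modeling an Aggregate Bag as a dictionary that maps a key tuple to a counter tuple. This is a simplified model of the behavior described above, not the tool's implementation; the function names are hypothetical, and the handling of keys whose counters all reach exactly 0 follows a literal reading of the text.

```python
def aggbag_add(*bags):
    """Sum the counters of every key across all input bags."""
    total = {}
    for bag in bags:
        for key, counters in bag.items():
            previous = total.get(key, (0,) * len(counters))
            total[key] = tuple(p + c for p, c in zip(previous, counters))
    return total


def aggbag_subtract(first, *others):
    """Subtract every later bag from the first one."""
    result = dict(first)
    for bag in others:
        for key, counters in bag.items():
            # A key missing from the first bag is treated as all zeros.
            previous = result.get(key, (0,) * len(counters))
            result[key] = tuple(p - c for p, c in zip(previous, counters))
    # A key is dropped when any of its counters went negative.
    return {k: v for k, v in result.items() if all(c >= 0 for c in v)}


# Keys are (sPort, dPort, protocol); the single counter is records.
in_bag = {(1024, 80, 6): (5,), (1025, 53, 17): (2,)}
inweb_bag = {(1024, 80, 6): (3,)}
total = aggbag_add(in_bag, inweb_bag)
```

In this model aggbag_subtract(total, inweb_bag) gives back in_bag, while aggbag_subtract(inweb_bag, total) is empty because every difference is negative.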

Field manipulation switches

The following switches allow modification of the fields in the Aggregate Bag file.

--insert-field=FIELD=VALUE

For each entry read from an Aggregate Bag input file, insert a field named FIELD and set its value to VALUE if one of the following is true: (1) the input file does not contain a field named FIELD or (2) the input file does have a field named FIELD but it was removed by either (2a) being listed in the --remove-fields list or (2b) not being listed in the --select-fields list. That is, this switch only inserts FIELD when FIELD is not present in the input Aggregate Bag, but specifying FIELD in --remove-fields removes it from the input. VALUE is a textual representation of the field’s value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch may be repeated in order to insert multiple fields.

--remove-fields=FIELD_LIST

Remove the fields specified in FIELD_LIST from each of the Aggregate Bag input files, where FIELD_LIST is a comma-separated list of field names. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. If a field name is included in this list and in a --insert-field switch, the field is given the value specified by the --insert-field switch, and the field is included in the output Aggregate Bag file. If removing a key field produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --select-fields, --to-ipset, or --to-bag.

--select-fields=FIELD_LIST

For each Aggregate Bag input file, only use the fields in FIELD_LIST, a comma-separated list of field names. Alternatively, consider this switch as removing all fields that are not included in FIELD_LIST. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. When a field name is included in this list and in a --insert-field switch, the field uses its value from the input Aggregate Bag file if present, and it uses the value specified in the --insert-field switch otherwise. If selecting only some key fields produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --remove-fields, --to-ipset, or --to-bag.
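How the three field manipulation switches interact can be sketched in Python. Each entry is modeled as a dictionary of field names to values. The names reshape and COUNTER_FIELDS are hypothetical, and summing inserted counter values when keys merge is an assumption; the text above only says that the counters of duplicate keys are merged.

```python
COUNTER_FIELDS = {"records", "sum-packets", "sum-bytes"}


def reshape(rows, remove=(), select=None, insert=None):
    """Model --remove-fields, --select-fields, and --insert-field."""
    if remove and select is not None:
        raise ValueError("--remove-fields and --select-fields may not be combined")
    insert = insert or {}
    merged = {}
    for row in rows:
        if select is not None:
            kept = {f: v for f, v in row.items() if f in select}
        else:
            kept = {f: v for f, v in row.items() if f not in remove}
        # Insert a field only when it is absent after removal or selection.
        for field, value in insert.items():
            kept.setdefault(field, value)
        key = tuple(sorted((f, v) for f, v in kept.items() if f not in COUNTER_FIELDS))
        counters = merged.setdefault(key, {})
        # Entries that now share a key have their counters summed.
        for field, value in kept.items():
            if field in COUNTER_FIELDS:
                counters[field] = counters.get(field, 0) + value
    return [{**dict(key), **counters} for key, counters in merged.items()]


rows = [
    {"sport": 1024, "dport": 80, "sum-bytes": 100, "sum-packets": 2},
    {"sport": 1024, "dport": 443, "sum-bytes": 50, "sum-packets": 1},
    {"sport": 2000, "dport": 80, "sum-bytes": 10, "sum-packets": 1},
]
by_sport = reshape(rows, remove={"dport", "sum-packets"})
```

Here by_sport holds two entries, because the two rows for source port 1024 collapse into one with a byte count of 150. Passing select={"sport", "sum-bytes"} instead produces the same result.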

Output switches

The following switches control the output.

--to-ipset=FIELD

After operating on the Aggregate Bag input files, create an IPset file from the resulting Aggregate Bag by treating the values in the field named FIELD as IP addresses, inserting the IP addresses into the IPset, and writing the IPset to the standard output or the destination specified by --output-path. When this switch is used, the only legal field name that may be used in the --insert-field switch is FIELD. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --to-ipset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0.

 0 

Use the default version for an IPv4 IPset and an IPv6 IPset, currently 2 and 3, respectively.

 2 

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

 3 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

 4 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

 5 

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--to-bag=BAG_KEY,BAG_COUNTER

After operating on the Aggregate Bag input files, create a (normal) Bag file from the resulting Aggregate Bag. Use the BAG_KEY field as the key of the Bag, and the BAG_COUNTER field as the counter of the Bag. Write the Bag to the standard output or the destination specified by --output-path. When this switch is used, the only legal field names that may be used in the --insert-field switch are BAG_KEY and BAG_COUNTER. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-ipset.

--output-path=PATH

Write the resulting Aggregate Bag, IPset (see --to-ipset), or Bag (see --to-bag) to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagtool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagtool to exit with an error.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
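The selection rules for the compression method can be summarized in a Python sketch. It models the description above and is not SiLK code. The function names and the compiled_default parameter are hypothetical, and two details are assumptions: none and best are treated as always valid values, and best falls back to no compression when no library is available.

```python
LIBRARY_METHODS = ("lzo1x", "snappy", "zlib")  # order tried by "best"


def requested_method(switch_value, env, available):
    """Pick the requested method: the switch wins, then the environment."""
    if switch_value is not None:
        return switch_value
    env_value = env.get("SILK_COMPRESSION_METHOD")
    # The environment value counts only when it names a usable method.
    if env_value in ("none", "best") or env_value in available:
        return env_value
    return None


def effective_method(requested, available, to_file, compiled_default="none"):
    """Return the method applied to one output stream."""
    if requested is None:
        # Nothing specified: only regular files get the compiled-in default.
        return compiled_default if to_file else "none"
    if requested == "best":
        if not to_file:
            return "none"
        for candidate in LIBRARY_METHODS:
            if candidate in available:
                return candidate
        return "none"
    if requested != "none" and requested not in available:
        raise ValueError(f"compression method not available: {requested}")
    # An explicit library method compresses regardless of the destination.
    return requested
```

With available = {"snappy", "zlib"}, effective_method("best", available, to_file=True) yields "snappy", and the same call with to_file=False yields "none".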

Miscellaneous switches

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagtool searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) indicates a wrapped line.

To create two Aggregate Bag files, in.aggbag and inweb.aggbag, and then add the counters to create total.aggbag:

 $ rwfilter --type=in --pass=-                              \  
   | rwaggbag --key=sport,dport,proto --counter=records     \  
        --output-path=in.aggbag  
 $ rwfilter --type=inweb --pass=-                           \  
   | rwaggbag --key=sport,dport,proto --counter=records     \  
        --output-path=inweb.aggbag  
 $ rwaggbagtool --add in.aggbag inweb.aggbag --output-path=total.aggbag  
 $ rwaggbagcat total.aggbag

To subtract inweb.aggbag from total.aggbag:

 $ rwaggbagtool --subtract total.aggbag inweb.aggbag    \  
   | rwaggbagcat

Create an Aggregate Bag file:

 $ rwaggbag --key=sport,dport                       \  
        --counter=sum-bytes,sum-packets data.rw     \  
        --output-path=my-ab.aggbag

To get just the source port and byte count from the file my-ab.aggbag, you may either remove the destination port and packet count:

 $ rwaggbagtool --remove=dport,sum-packets my-ab.aggbag  \  
        --output-path=source-bytes.aggbag

or you may select the source port and byte count:

 $ rwaggbagtool --select=sport,sum-bytes my-ab.aggbag    \  
        --output-path=source-bytes.aggbag

To replace the packet count in my-ab.aggbag with zeros, remove the field and insert it with the value you want:

 $ rwaggbagtool --remove=sum-packets --insert=sum-packets=0  \  
        my-ab.aggbag --output-path=zero-packets.aggbag

To create a regular Bag with the source port and byte count from my-ab.aggbag, use the --to-bag switch:

 $ rwaggbagtool --to-bag=sport,sum-bytes my-ab.aggbag  \  
        --output-path=sport-byte.bag

The --to-ipset switch works similarly:

 $ rwaggbag --key=sipv6,dipv6 --counter=records data-v6.rw  \  
        --output-path=ips.aggbag  
 $ rwaggbagtool --to-ipset=dipv6 ips.aggbag --output-path=dip.set

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version switch when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagtool may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagtool may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwfilter(1), rwfileinfo(1), silk(7), zlib(3)

rwappend

Append SiLK Flow file(s) to an existing SiLK Flow file

SYNOPSIS

  rwappend [--create=[TEMPLATE_FILE]] [--print-statistics]  
         [--site-config-file=FILENAME]  
         TARGET_FILE SOURCE_FILE [SOURCE_FILE...]

  rwappend --help

  rwappend --version

DESCRIPTION

rwappend reads SiLK Flow records from the specified SOURCE_FILEs and appends them to the TARGET_FILE. If stdin is used as the name of one of the SOURCE_FILEs, SiLK flow records will be read from the standard input.

When the TARGET_FILE does not exist and the --create switch is not provided, rwappend will exit with an error. When --create is specified and TARGET_FILE does not exist, rwappend will create the TARGET_FILE using the same format, version, and byte-order as the specified TEMPLATE_FILE. If no TEMPLATE_FILE is given, the TARGET_FILE is created in the default format and version (the same format that rwcat(1) would produce).

The TARGET_FILE must be an actual file; it cannot be a named pipe or the standard output. In addition, the header of TARGET_FILE must not be compressed; that is, you cannot append to a file whose entire contents have been compressed with gzip (those files normally end in the .gz extension).

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--create
--create=TEMPLATE_FILE

Create the TARGET_FILE if it does not exist. The file will have the same format, version, and byte-order as the TEMPLATE_FILE if it is provided; otherwise the defaults are used. The TEMPLATE_FILE will NOT be appended to TARGET_FILE unless it also appears as the name of a SOURCE_FILE.

--print-statistics

Print to the standard error the number of records read from each SOURCE_FILE and the total number of records appended to the TARGET_FILE.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwappend searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) indicates a wrapped line.

Standard usage where the file to append to, results.rw, exists:

 $ rwappend results.rw sample5.rw sample6.rw

To append files sample*.rw to results.rw, or to create results.rw using the same format as the first file argument (note that sample1.rw must be repeated):

 $ rwappend results.rw --create=sample1.rw          \  
        sample1.rw sample2.rw

If results.rw does not exist, the following two commands are equivalent:

 $ rwappend --create results.rw sample1.rw sample2.rw

 $ rwcat sample1.rw sample2.rw > results.rw

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwappend may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwappend may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwcat(1), silk(7)

BUGS

When a SOURCE_FILE contains IPv6 flow records and the TARGET_FILE only supports IPv4 records, rwappend converts IPv6 records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and writes them to the TARGET_FILE. rwappend silently ignores IPv6 records having addresses outside of that prefix.

rwappend makes some attempts to avoid appending a file to itself (which would eventually exhaust the disk space) by comparing the names of files it is given; it should be smarter about this.

rwbag

Build a binary Bag from SiLK Flow records

SYNOPSIS

  rwbag --bag-file=KEY,COUNTER,OUTPUTFILE  
        [--bag-file=KEY,COUNTER,OUTPUTFILE ...]  
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]  
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]  
        [--invocation-strip] [--print-filenames] [--copy-input=PATH]  
        [--compression-method=COMP_METHOD]  
        [--ipv6-policy={ignore,asv4,mix,force,only}]  
        [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwbag --help

  rwbag --legacy-help

  rwbag --version

LEGACY SYNOPSIS

  rwbag [--sip-flows=OUTPUTFILE] [--dip-flows=OUTPUTFILE]  
        [--sport-flows=OUTPUTFILE] [--dport-flows=OUTPUTFILE]  
        [--proto-flows=OUTPUTFILE] [--sensor-flows=OUTPUTFILE]  
        [--input-flows=OUTPUTFILE] [--output-flows=OUTPUTFILE]  
        [--nhip-flows=OUTPUTFILE]  
        [--sip-packets=OUTPUTFILE] [--dip-packets=OUTPUTFILE]  
        [--sport-packets=OUTPUTFILE] [--dport-packets=OUTPUTFILE]  
        [--proto-packets=OUTPUTFILE] [--sensor-packets=OUTPUTFILE]  
        [--input-packets=OUTPUTFILE] [--output-packets=OUTPUTFILE]  
        [--nhip-packets=OUTPUTFILE]  
        [--sip-bytes=OUTPUTFILE] [--dip-bytes=OUTPUTFILE]  
        [--sport-bytes=OUTPUTFILE] [--dport-bytes=OUTPUTFILE]  
        [--proto-bytes=OUTPUTFILE] [--sensor-bytes=OUTPUTFILE]  
        [--input-bytes=OUTPUTFILE] [--output-bytes=OUTPUTFILE]  
        [--nhip-bytes=OUTPUTFILE]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--print-filenames] [--copy-input=PATH]  
        [--compression-method=COMP_METHOD]  
        [--ipv6-policy={ignore,asv4,mix,force,only}]  
        [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

DESCRIPTION

rwbag reads SiLK Flow records and builds one or more Bag files. A Bag is similar to a set but each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key. A Bag file supports a single key field and a single counter field; use the Aggregate Bag tools (e.g., rwaggbag(1)) when the key or counter contains multiple fields.

The --bag-file switch is required and it specifies how to create a Bag file. The argument to the switch names the key field to use for the bag, the counter field, and the location where the bag file is to be written. The switch may be repeated to create multiple Bag files.

rwbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

If adding a value to a key would cause the value to overflow the maximum value that Bags support, the key’s value will be set to the maximum and processing will continue. In addition, if this is the first value to overflow in this Bag, a warning will be printed to the standard error.
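The overflow behavior amounts to a saturating addition with a one-time warning. The Python sketch below illustrates the idea; the class name is hypothetical, and the ceiling used here (the largest unsigned 64-bit value) is an assumption for illustration rather than the documented maximum of the Bag format.

```python
import sys

ASSUMED_COUNTER_MAX = 2**64 - 1  # assumption; the real ceiling may differ


class SaturatingBag:
    """Key-to-counter map whose counters stop at a maximum value."""

    def __init__(self, maximum=ASSUMED_COUNTER_MAX):
        self.maximum = maximum
        self.counters = {}
        self.overflow_reported = False

    def add(self, key, value):
        total = self.counters.get(key, 0) + value
        if total > self.maximum:
            # Clamp to the maximum and keep processing.
            total = self.maximum
            if not self.overflow_reported:
                # Warn only for the first overflow seen in this bag.
                print(f"warning: counter overflow for key {key!r}", file=sys.stderr)
                self.overflow_reported = True
        self.counters[key] = total
```

A bag created with SaturatingBag(maximum=10) that receives add("a", 7) twice ends with counters["a"] equal to 10 and prints a single warning.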

If rwbag runs out of memory, it will exit immediately. The output Bag files will remain behind, each with a size of 0 bytes.

Use rwbagcat(1) to see the contents of a bag. To create a bag from textual input or from an IPset, use rwbagbuild(1). rwbagtool(1) allows you to manipulate binary bag files.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--bag-file=KEY,COUNTER,OUTPUTFILE

Bin flow records by unique KEY, compute the COUNTER for each bin, and write the result to OUTPUTFILE. The lists of available KEY and COUNTER values are given immediately below. OUTPUTFILE is the name of a non-existent file, a named pipe, or the keyword stdout or - to write the binary Bag to the standard output. Repeat the --bag-file switch to create multiple Bag files in a single pass over the data. Only one OUTPUTFILE may use the standard output. See LEGACY BAG CREATION SWITCHES for deprecated methods to create Bag files. This switch or one of its legacy equivalents is required. Since SiLK 3.12.0.

rwbag supports the following names for KEY. The case of KEY is ignored.

sIPv4

source IP address, either IPv4 or IPv6

sIPv6

source IP address, either IPv4 or IPv6

dIPv4

destination IP address, either IPv4 or IPv6

dIPv6

destination IP address, either IPv4 or IPv6

sPort

source port for TCP or UDP, or equivalent

dPort

destination port for TCP or UDP, or equivalent

protocol

IP protocol

packets

count of packets recorded for this flow record

bytes

count of bytes recorded for this flow record

flags

bit-wise OR of TCP flags over all packets in the flow

sTime

starting time of the flow, in seconds resolution

duration

duration of the flow, in seconds resolution

eTime

ending time of the flow, in seconds resolution

sensor

numeric ID of the sensor where the flow was collected

input

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

output

router SNMP output interface or postVlanId

nhIPv4

router next hop IP address, either IPv4 or IPv6

nhIPv6

router next hop IP address, either IPv4 or IPv6

initialFlags

TCP flags on first packet in the flow

sessionFlags

bit-wise OR of TCP flags over all packets except the first in the flow

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow

sip-country

the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) The abbreviations are those used by the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other. Since SiLK 3.12.0.

scc

an alias for sip-country

dip-country

the country code of the destination IP address

dcc

an alias for dip-country

sip-pmap:MAPNAME

the value that the source IP address maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be IPv4-address or IPv6-address. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).

dip-pmap:MAPNAME

the value that the destination IP address maps to in the mapping file whose map-name is MAPNAME. See sip-pmap:MAPNAME.

sport-pmap:MAPNAME

the value that the protocol/source-port pair maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be proto-port. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).

dport-pmap:MAPNAME

the value that the protocol/destination-port pair maps to in the mapping file whose map-name is MAPNAME. See sport-pmap:MAPNAME.

rwbag supports the following names for COUNTER. The case of COUNTER is ignored.

records

count of the number of flow records that match the key

flows

an alias for records

sum-packets

the sum of the packet counts for flow records that match the key

packets

an alias for sum-packets

sum-bytes

the sum of the byte counts for flow records that match the key

bytes

an alias for sum-bytes
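The binning performed for one --bag-file argument can be pictured with a short Python sketch. Flow records are modeled as dictionaries with lower-case field names, which is a simplification for illustration; build_bag and the COUNTERS table are hypothetical names and only a few of the keys listed above are exercised.

```python
from collections import Counter

# Each counter name maps to the amount one flow record contributes.
COUNTERS = {
    "records": lambda flow: 1,
    "flows": lambda flow: 1,
    "sum-packets": lambda flow: flow["packets"],
    "packets": lambda flow: flow["packets"],
    "sum-bytes": lambda flow: flow["bytes"],
    "bytes": lambda flow: flow["bytes"],
}


def build_bag(flows, key, counter):
    """Bin flows by `key` and accumulate `counter` for each bin."""
    field = key.lower()                     # the case of KEY is ignored
    increment = COUNTERS[counter.lower()]   # the case of COUNTER is ignored
    bag = Counter()
    for flow in flows:
        bag[flow[field]] += increment(flow)
    return dict(bag)


flows = [
    {"sport": 1024, "dport": 80, "protocol": 6, "packets": 3, "bytes": 180},
    {"sport": 1024, "dport": 443, "protocol": 6, "packets": 2, "bytes": 120},
    {"sport": 53, "dport": 53, "protocol": 17, "packets": 1, "bytes": 60},
]
```

With these three records, build_bag(flows, "sPort", "records") counts two flows for port 1024 and one for port 53, and build_bag(flows, "protocol", "sum-packets") sums five packets for protocol 6 and one for protocol 17.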

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

Load the prefix map file from PATH for use when the key part of the argument to the --bag-file switch is one of sip-pmap, dip-pmap, sport-pmap, or dport-pmap. Specify PATH as - or stdin to read from the standard input. If MAPNAME is specified, it overrides the map-name contained in the prefix map file itself. If no map-name is available, rwbag exits with an error. The switch may be repeated to load multiple prefix map files; each file must have a unique map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-strip

Do not copy the notes (annotations) from the input files to the output file(s). When this switch is not specified, notes from the input files are copied to the output. Since SiLK 3.12.2.

--note-add=TEXT

Add the specified TEXT to the header of every output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of every output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as no Bag file is being written there.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the bag(s).

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 prefix, this policy is equivalent to force; otherwise it is equivalent to asv4.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.

only

Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the bag(s).

Regardless of the IPv6 policy, when all IPv6 addresses in the bag are in the ::ffff:0:0/96 prefix, rwbag treats them as IPv4 addresses and writes an IPv4 bag. When any other IPv6 addresses are present in the bag, the IPv4 addresses in the bag are mapped into the ::ffff:0:0/96 prefix and rwbag writes an IPv6 bag.
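The policy descriptions above can be modeled in a few lines of Python. This is only an illustrative sketch, not SiLK code: the names apply_policy and bag_family are hypothetical, each address stands in for a flow record of the same IP version, and mix is modeled as passing the address through unchanged because its choice depends on the whole input.

```python
import ipaddress

def apply_policy(addr, policy):
    """Return the address a policy would pass on, or None when it is ignored.

    Illustrative model only: an IPv6 address stands in for a flow record
    that is marked as IPv6.
    """
    ip = ipaddress.ip_address(addr)
    if policy == "ignore":
        return ip if ip.version == 4 else None
    if policy == "asv4":
        # ipv4_mapped is None for IPv6 addresses outside ::ffff:0:0/96
        return ip if ip.version == 4 else ip.ipv4_mapped
    if policy == "mix":
        return ip
    if policy == "force":
        if ip.version == 6:
            return ip
        return ipaddress.IPv6Address("::ffff:" + str(ip))
    if policy == "only":
        return ip if ip.version == 6 else None
    raise ValueError("unknown policy: " + policy)

def bag_family(addresses):
    """Return 4 when every key is IPv4 or IPv4-mapped, otherwise 6."""
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6 and ip.ipv4_mapped is None:
            return 6
    return 4
```

bag_family mirrors the final paragraph above: a single address outside the ::ffff:0:0/96 prefix is enough to make the output an IPv6 bag.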

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
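The selection rules above can be summarized as a small decision function. The sketch below is a model under stated assumptions, not SiLK's implementation: resolve_compression and its parameters are hypothetical names, and raising an error for an unavailable method named on the command line is an assumption the text does not confirm.

```python
def resolve_compression(switch, env_value, available, compiled_default,
                        writing_to_file):
    """Return the name of the method that would be applied, or "none".

    switch / env_value: value of --compression-method and of
    SILK_COMPRESSION_METHOD, or None when absent.
    available: set of library names found at compile time.
    """
    valid = set(available) | {"none", "best"}
    method = switch
    if method is not None and method not in valid:
        # Assumption: an unavailable method on the command line is an error.
        raise ValueError("unavailable compression method: " + method)
    if method is None and env_value in valid:
        method = env_value
    if method is None:
        # Nothing specified: only output to files is compressed.
        return compiled_default if writing_to_file else "none"
    if method == "best":
        if not writing_to_file:
            return "none"
        for candidate in ("lzo1x", "snappy", "zlib"):
            if candidate in available:
                return candidate
        return "none"
    # none, zlib, lzo1x, snappy: applied regardless of the destination
    return method
```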

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbag searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--legacy-help

Print help, including legacy switches. See the LEGACY BAG CREATION SWITCHES section below for these switches.

--version

Print the version number and information about how SiLK was configured, then exit the application.

LEGACY BAG CREATION SWITCHES

The following switches are deprecated as of SiLK 3.12.0. These switches may be used in conjunction with the --bag-file switch.

--sip-flows=OUTPUTFILE

Equivalent to --bag-file=sIPv4,records,OUTPUTFILE. Count number of flows by unique source IP.

--sip-packets=OUTPUTFILE

Equivalent to --bag-file=sIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique source IP.

--sip-bytes=OUTPUTFILE

Equivalent to --bag-file=sIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique source IP.

--dip-flows=OUTPUTFILE

Equivalent to --bag-file=dIPv4,records,OUTPUTFILE. Count number of flows by unique destination IP.

--dip-packets=OUTPUTFILE

Equivalent to --bag-file=dIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique destination IP.

--dip-bytes=OUTPUTFILE

Equivalent to --bag-file=dIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination IP.

--sport-flows=OUTPUTFILE

Equivalent to --bag-file=sPort,records,OUTPUTFILE. Count number of flows by unique source port.

--sport-packets=OUTPUTFILE

Equivalent to --bag-file=sPort,sum-packets,OUTPUTFILE. Count number of packets by unique source port.

--sport-bytes=OUTPUTFILE

Equivalent to --bag-file=sPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique source port.

--dport-flows=OUTPUTFILE

Equivalent to --bag-file=dPort,records,OUTPUTFILE. Count number of flows by unique destination port.

--dport-packets=OUTPUTFILE

Equivalent to --bag-file=dPort,sum-packets,OUTPUTFILE. Count number of packets by unique destination port.

--dport-bytes=OUTPUTFILE

Equivalent to --bag-file=dPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination port.

--proto-flows=OUTPUTFILE

Equivalent to --bag-file=protocol,records,OUTPUTFILE. Count number of flows by unique protocol.

--proto-packets=OUTPUTFILE

Equivalent to --bag-file=protocol,sum-packets,OUTPUTFILE. Count number of packets by unique protocol.

--proto-bytes=OUTPUTFILE

Equivalent to --bag-file=protocol,sum-bytes,OUTPUTFILE. Count number of bytes by unique protocol.

--sensor-flows=OUTPUTFILE

Equivalent to --bag-file=sensor,records,OUTPUTFILE. Count number of flows by unique sensor ID.

--sensor-packets=OUTPUTFILE

Equivalent to --bag-file=sensor,sum-packets,OUTPUTFILE. Count number of packets by unique sensor ID.

--sensor-bytes=OUTPUTFILE

Equivalent to --bag-file=sensor,sum-bytes,OUTPUTFILE. Count number of bytes by unique sensor ID.

--input-flows=OUTPUTFILE

Equivalent to --bag-file=input,records,OUTPUTFILE. Count number of flows by unique input interface index.

--input-packets=OUTPUTFILE

Equivalent to --bag-file=input,sum-packets,OUTPUTFILE. Count number of packets by unique input interface index.

--input-bytes=OUTPUTFILE

Equivalent to --bag-file=input,sum-bytes,OUTPUTFILE. Count number of bytes by unique input interface index.

--output-flows=OUTPUTFILE

Equivalent to --bag-file=output,records,OUTPUTFILE. Count number of flows by unique output interface index.

--output-packets=OUTPUTFILE

Equivalent to --bag-file=output,sum-packets,OUTPUTFILE. Count number of packets by unique output interface index.

--output-bytes=OUTPUTFILE

Equivalent to --bag-file=output,sum-bytes,OUTPUTFILE. Count number of bytes by unique output interface index.

--nhip-flows=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,records,OUTPUTFILE. Count number of flows by unique next hop IP.

--nhip-packets=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique next hop IP.

--nhip-bytes=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique next hop IP.
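The legacy switches follow a regular pattern: the part before the hyphen selects the key and the part after it selects the counter. As a convenience when updating old scripts, the mapping listed above can be written as a lookup. The function name legacy_to_bag_file is hypothetical; the key and counter names are taken from the equivalences above.

```python
KEY_FOR_PREFIX = {
    "sip": "sIPv4", "dip": "dIPv4", "sport": "sPort", "dport": "dPort",
    "proto": "protocol", "sensor": "sensor", "input": "input",
    "output": "output", "nhip": "nhIPv4",
}
COUNTER_FOR_SUFFIX = {
    "flows": "records", "packets": "sum-packets", "bytes": "sum-bytes",
}

def legacy_to_bag_file(switch):
    """Translate e.g. '--sip-flows=out.bag' into its --bag-file equivalent."""
    name, _, outfile = switch.lstrip("-").partition("=")
    prefix, _, suffix = name.partition("-")
    return "--bag-file={},{},{}".format(
        KEY_FOR_PREFIX[prefix], COUNTER_FOR_SUFFIX[suffix], outfile)

print(legacy_to_bag_file("--dport-packets=dport.bag"))
```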

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Bag of Protocol:Byte

Read the SiLK Flow file data.rw and create the Bag proto-byte.bag that contains the total byte-count seen for each protocol by using protocol as the key and sum-bytes as the counter:

 $ rwbag --bag-file=protocol,sum-bytes,proto-byte.bag data.rw

Use rwbagcat(1) to view the result:

 $ rwbagcat proto-byte.bag  
          1|            10695328|  
          6|        120536195111|  
         17|            24500079|

Specify the output path as - to pass the Bag file from rwbag directly into rwbagcat.

 $ rwbag --bag-file=protocol,sum-bytes,- data.rw    \  
   | rwbagcat  
          1|            10695328|  
          6|        120536195111|  
         17|            24500079|

Compare that to this rwuniq(1) command.

 $ rwuniq --field=protocol --value=bytes --sort-output data.rw  
 pro|               Bytes|  
   1|            10695328|  
   6|        120536195111|  
  17|            24500079|

One advantage of Bag files over rwuniq is that the data remains in binary form where it can be manipulated by rwbagtool(1).
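Conceptually, a Bag with key protocol and counter sum-bytes is a mapping from each protocol number to a running byte total. The following Python sketch shows that idea only; the flow records are made-up values reduced to (protocol, bytes) pairs, and protocol_byte_bag is a hypothetical name.

```python
from collections import Counter

def protocol_byte_bag(records):
    """Sum the byte count of each (protocol, bytes) record by protocol."""
    bag = Counter()
    for protocol, byte_count in records:
        bag[protocol] += byte_count
    return bag

# Hypothetical flow records, reduced to (protocol, bytes)
records = [(6, 1500), (17, 120), (6, 400), (1, 84)]
bag = protocol_byte_bag(records)

# Print in key order, using the column widths of the rwbagcat output above
for key in sorted(bag):
    print("{:>10}|{:>20}|".format(key, bag[key]))
```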

Two Bags in a Single Pass

Read records from rwfilter(1) and build Bag files sip-flow.bag and dip-flow.bag that count the number of flows seen for each source address and for each destination address, respectively.

 $ rwfilter ... --pass=stdout                       \  
   | rwbag --bag-file=sipv4,records,sip-flow.bag    \  
        --bag-file=dipv4,records,dip-flow.bag

Using a Network Prefix

To create sip16-byte.bag that contains the number of bytes seen for each /16 found in the source address field, use the rwnetmask(1) tool prior to feeding the input to rwbag:

 $ rwfilter ... --pass=stdout                       \  
   | rwnetmask --4sip-prefix-length=16              \  
   | rwbag --bag-file=sipv4,sum-bytes,sip16-byte.bag

 $ rwbagcat sip16-byte.bag | head -4  
        10.4.0.0|               18260|  
        10.5.0.0|              536169|  
        10.9.0.0|               55386|  
       10.11.0.0|             5110438|

To group the IP addresses of an existing Bag into /16 prefixes when printing it, use the --network-structure switch of rwbagcat(1).

 $ rwfilter ... --pass=stdout                   \  
   | rwbag --bag-file=sipv4,sum-bytes,-         \  
   | rwbagcat --network-structure=B             \  
   | head -4  
        10.4.0.0/16|               18260|  
        10.5.0.0/16|              536169|  
        10.9.0.0/16|               55386|  
       10.11.0.0/16|             5110438|
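The effect of masking before counting can be sketched in Python with the standard ipaddress module. This models the idea of the rwnetmask step only; the records are made-up (source address, bytes) pairs and sum_bytes_by_prefix is a hypothetical name.

```python
import ipaddress
from collections import Counter

def sum_bytes_by_prefix(records, prefix_length=16):
    """Mask each IPv4 source address to prefix_length bits and sum the bytes."""
    bag = Counter()
    for sip, byte_count in records:
        # strict=False lets ip_network() clear the host bits for us
        network = ipaddress.ip_network(
            "{}/{}".format(sip, prefix_length), strict=False)
        bag[str(network.network_address)] += byte_count
    return bag

# Hypothetical records, reduced to (source address, bytes)
records = [("10.4.1.2", 100), ("10.4.200.9", 50), ("10.5.0.1", 7)]
print(dict(sum_bytes_by_prefix(records)))
```

Both 10.4.x.x addresses collapse onto the key 10.4.0.0, which is why the masked Bag has one entry per /16.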

Bag of Country Codes

As of SiLK 3.12.0, a Bag file may contain a country code as its key. Create scc-pkt.bag that sums the packet count by country.

 $ rwbag --bag-file=sip-country,sum-packets,scc-pkt.bag data.rw  
 $ rwbagcat scc-pkt.bag  
 --|                 840|  
 a1|                 284|  
 a2|                   1|  
 ae|                   8|

Bag of Prefix Map Values

rwbag and rwbagbuild(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. For example, to look up each source address in the prefix map file ip-map.pmap that maps from address to “type of service”, use the --pmap-file switch to specify the prefix map file, and specify the Bag’s key as sip-pmap:MAPNAME, where MAPNAME is either the map-name stored in the prefix map file or a name that is provided as part of the --pmap-file argument. (A prefix map’s map-name is available via the rwfileinfo(1) command.)

 $ rwfileinfo --field=prefix-map ip-map.pmap  
 ip-map.pmap:  
   prefix-map          v1: service-host  
 $  
 $ rwbag --pmap-file=ip-map.pmap                            \  
        --bag-file=sip-pmap:service-host,bytes,srvhost.bag  \  
        data.rw

Multiple --pmap-file switches may be specified, which may be useful when generating multiple Bag files in a single invocation. On the command line, the --pmap-file switch that defines the map-name must precede the --bag-file switch where the map-name is used.

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat.

 $ rwbagcat srvhost.bag  
 rwbagcat: The --pmap-file switch is required for \  
         Bags containing sip-pmap keys  
 $ rwbagcat --pmap-file=ip-map.pmap srvhost.bag  
          external|         59950837766|  
          internal|         60602999159|  
               ntp|              588316|  
               dns|            14404581|  
              dhcp|             2560696|
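The idea behind a sip-pmap key can be sketched as a most-specific-prefix lookup followed by a sum. Everything in this Python sketch is hypothetical (the PMAP table, the records, and the function names), and it stores the label directly for brevity, whereas the Bag itself does not carry the prefix map, as noted above.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix map: (network, label) pairs
PMAP = [
    (ipaddress.ip_network("0.0.0.0/0"), "external"),
    (ipaddress.ip_network("10.0.0.0/8"), "internal"),
    (ipaddress.ip_network("10.1.1.53/32"), "dns"),
]

def lookup(pmap, address):
    """Return the label of the most specific network containing address."""
    ip = ipaddress.ip_address(address)
    matches = [(net.prefixlen, label) for net, label in pmap if ip in net]
    return max(matches)[1] if matches else None

def bytes_by_label(pmap, records):
    """Sum the bytes of (source address, bytes) records per prefix map label."""
    bag = Counter()
    for sip, byte_count in records:
        bag[lookup(pmap, sip)] += byte_count
    return bag

# Hypothetical records, reduced to (source address, bytes)
records = [("10.1.1.53", 200), ("10.2.3.4", 1000), ("192.0.2.7", 40)]
print(dict(bytes_by_label(PMAP, records)))
```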

rwbag also has support for prefix map files that map from a protocol-port pair to a label. The proto-port.pmap file does not have a map-name so a name must be provided on the rwbag command line.

 $ rwfileinfo --field=prefix-map proto-port.pmap  
 proto-port.pmap:  
 $  
 $ rwbag --pmap-file=srvport:proto-port.pmap                \  
        --bag-file=sip-pmap:srvport,flows,srvport.bag       \  
        data.rw  
 $ rwbagcat --pmap-file=proto-port.pmap srvport.bag | head -4  
      ICMP|               15622|  
       UDP|               62216|  
   UDP/DNS|               62216|  
  UDP/DHCP|               15614|

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbag uses when mapping an IP to a country for the sip-country and dip-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwbag may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwbag may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
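The list above can be expanded for a given environment with a short Python helper. This is a sketch with hypothetical function names; it assumes that the locations are tried in the order listed and that entries depending on an unset variable are skipped, neither of which is stated explicitly here. The /data and /usr/local paths are copied verbatim from the list.

```python
import os

def site_config_candidates(environ):
    """Return the silk.conf locations from FILES, skipping unset variables."""
    candidates = []
    config_file = environ.get("SILK_CONFIG_FILE")
    if config_file:
        candidates.append(config_file)
    data_root = environ.get("SILK_DATA_ROOTDIR")
    if data_root:
        candidates.append(data_root + "/silk.conf")
    candidates.append("/data/silk.conf")
    silk_path = environ.get("SILK_PATH")
    if silk_path:
        candidates.append(silk_path + "/share/silk/silk.conf")
        candidates.append(silk_path + "/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates

def find_site_config(environ=os.environ):
    """Return the first candidate that exists as a file, or None."""
    for path in site_config_candidates(environ):
        if os.path.isfile(path):
            return path
    return None
```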

$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country and dip-country keys.

SEE ALSO

rwbagbuild(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwpmapbuild(1), rwuniq(1), ccfilter(3), sensor.conf(5), silk(7), zlib(3)

rwbagbuild

Create a binary Bag from non-flow data

SYNOPSIS

  rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }  
        [--delimiter=C] [--proto-port-delimiter=C]  
        [--default-count=DEFAULTCOUNT]  
        [--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]  
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--invocation-strip] [--compression-method=COMP_METHOD]  
        [--output-path=PATH]

  rwbagbuild --help

  rwbagbuild --version

DESCRIPTION

rwbagbuild builds a binary Bag file from an IPset file or from textual input. A Bag is a set where each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key.

Either --set-input or --bag-input must be provided to specify the type and the location of the input file. To read from the standard input, specify stdin or - as the argument to the switch.

SET INPUT

When creating a Bag from an IPset, the value associated with each IP address is the value specified by the --default-count switch or 1 if the switch is not provided.

If the --key-type is sip-country, dip-country, or any-country, each IP address is mapped to its country code using the country code mapping file (see FILES) and that value is stored in the Bag file.

If the --key-type is sip-pmap, dip-pmap, or any-ip-pmap, each IP address is mapped to a value found in the prefix map file specified in --pmap-file and that value is stored in the Bag file.

BAG (TEXTUAL) INPUT

The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-counter pair; whitespace around the key and counter is ignored.

The key is typically a 32-bit integer, an IP address, a CIDR block, or a SiLK IPWildcard. When the --key-type is sport-pmap, dport-pmap, or any-port-pmap, the key is comprised of two numbers: a protocol (8-bit number) and a port (16-bit number). The delimiter separating the protocol and port may be set by --proto-port-delimiter. If not explicitly set, it is the same as the delimiter specified to --delimiter. The default delimiter is ’|’.

An IP address or integer key must be expressed in one of the formats named above: an integer, an IP address, a CIDR block, or a SiLK IPWildcard. rwbagbuild complains if the key field contains a mixture of IPv6 addresses and integer values.

A line may contain only a key or it may contain a key and counter. If the delimiter character is not present on a line, the line must contain only a key. If the delimiter is present, the line must contain a key before the delimiter and an integer counter after the delimiter. These lines may have a delimiter after the counter; this delimiter and any text following it are ignored.

When the --default-count switch is specified, its value is used as the count for each key, and any counter value present on the line is ignored. Otherwise, the parsed count is used, or 1 is used as the counter if no delimiter was present.

For each key-count pair, the key is inserted into the Bag with its count or, if the key is already present in the Bag, its total count is incremented by the count from this line. When using the --default-count switch, the count for a key that appears in the input N times is the product of N and DEFAULTCOUNT.

rwbagbuild prints an error and exits when a key or counter cannot be parsed or when a line contains a delimiter character after the key but has no count.
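The line-oriented rules in this section can be modeled in Python as follows. This is a simplified sketch, not the rwbagbuild parser: build_bag is a hypothetical name, keys are kept as text (no integer, CIDR, or IPWildcard expansion), whitespace delimiters are not modeled, and treating a missing count as an error even when a default count is given is an assumption.

```python
from collections import Counter

def build_bag(lines, delimiter="|", default_count=None):
    """Model of the --bag-input rules; keys are kept as plain text."""
    bag = Counter()
    for number, raw in enumerate(lines, start=1):
        # Strip the comment, then the surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if delimiter not in line:
            key, count = line, 1          # key only: count defaults to 1
        else:
            fields = line.split(delimiter)
            key = fields[0].strip()
            text = fields[1].strip()      # later fields are ignored
            if not text.isdigit():
                raise ValueError(
                    "line {}: missing or invalid count".format(number))
            count = int(text)
        if not key:
            raise ValueError("line {}: missing key".format(number))
        if default_count is not None:
            count = default_count         # overrides any parsed count
        bag[key] += count                 # repeated keys are summed
    return bag

lines = ["# comment", "", "192.168.0.1,5", "192.168.0.2,500 # note",
         "192.168.0.1,2,ignored", "192.168.0.3"]
print(dict(build_bag(lines, delimiter=",")))
```

With default_count=3, the key 192.168.0.1 appears twice and therefore receives 2 x 3 = 6, matching the product rule described above.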

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The following two switches control the type of input; one and only one must be provided:

--set-input=SETFILE

Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to read the IPset from the standard input. Each IP address receives a count of 1 when the --default-count switch is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--bag-input=TEXTFILE

Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin or - to read the text from the standard input. See the DESCRIPTION section for the syntax of the TEXTFILE.

--delimiter=C

Expect the character C between each key-counter pair in the TEXTFILE read by the --bag-input switch. The default delimiter is the vertical pipe (’|’). The delimiter is ignored if the --set-input switch is specified. When the delimiter is a whitespace character, any amount of whitespace may surround and separate the key and counter. Since ’#’ is used to denote comments and newline is used to denote records, neither is a valid delimiter character.

--proto-port-delimiter=C

Expect the character C between the protocol and port that comprise a key when the --key-type is sport-pmap, dport-pmap, or any-port-pmap. Unless this switch is specified, rwbagbuild expects the key-counter delimiter to appear between the protocol and port.
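The interaction between --delimiter and --proto-port-delimiter can be illustrated with a small Python sketch. The function name and the sample lines are hypothetical, and the parsing is a simplified model of the behavior described above rather than the actual rwbagbuild code.

```python
def parse_proto_port_line(line, delimiter="|", proto_port_delimiter=None):
    """Split 'proto<sep>port<delim>count' into ((proto, port), count)."""
    # Without --proto-port-delimiter, the key-counter delimiter is expected
    # between the protocol and the port.
    sep = delimiter if proto_port_delimiter is None else proto_port_delimiter
    proto_text, _, rest = line.partition(sep)
    port_text, found, count_text = rest.partition(delimiter)
    proto, port = int(proto_text), int(port_text)
    if not 0 <= proto <= 255 or not 0 <= port <= 65535:
        raise ValueError("protocol or port out of range")
    # Text after a second delimiter is ignored; a key alone gets a count of 1
    count = int(count_text.split(delimiter)[0]) if found else 1
    return (proto, port), count

print(parse_proto_port_line("17|53|10"))
print(parse_proto_port_line("17/53|10", proto_port_delimiter="/"))
```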

--default-count=DEFAULTCOUNT

Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer.

--key-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the key contains FIELD_TYPE values. When this switch is not specified, the key type of the Bag is set to custom. The FIELD_TYPE is case insensitive. The supported FIELD_TYPEs are:

sIPv4

source IP address, IPv4 only

dIPv4

destination IP address, IPv4 only

sPort

source port

dPort

destination port

protocol

IP protocol

packets

packets, see also sum-packets

bytes

bytes, see also sum-bytes

flags

bitwise OR of TCP flags

sTime

starting time of the flow record, seconds resolution

duration

duration of the flow record, seconds resolution

eTime

ending time of the flow record, seconds resolution

sensor

sensor ID

input

SNMP input

output

SNMP output

nhIPv4

next hop IP address, IPv4 only

initialFlags

TCP flags on first packet in the flow

sessionFlags

bitwise OR of TCP flags on all packets in the flow except the first

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow, as set by the flow generator

class

class of the sensor

type

type of the sensor

icmpTypeCode

an encoded version of the ICMP type and code, where the type is in the upper byte and the code is in the lower byte

sIPv6

source IP, IPv6

dIPv6

destination IP, IPv6

nhIPv6

next hop IP, IPv6

records

count of flows

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

any-IPv4

a generic IPv4 address

any-IPv6

a generic IPv6 address

any-port

a generic port

any-snmp

a generic SNMP value

any-time

a generic time value, in seconds resolution

sip-country

the country code of the source IP. Maps each IP address in the key column to a country code and stores the country code in the bag. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) The abbreviations are those used by the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- for N/A (e.g. private and experimental reserved addresses); a1 for an anonymous proxy; a2 for a satellite provider; o1 for other. Since SiLK 3.12.0.

dip-country

the country code of the destination IP. See sip-country. Since SiLK 3.12.0.

any-country

the country code of any IP address. See sip-country. Since SiLK 3.12.0.

sip-pmap

a prefix map value found from a source IP address. Maps each IP address in the key column to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be IPv4-address or IPv6-address. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dip-pmap

a prefix map value found from a destination IP address. See sip-pmap. Since SiLK 3.12.0.

any-ip-pmap

a prefix map value found from any IP address. See sip-pmap. Since SiLK 3.12.0.

sport-pmap

a prefix map value found from a protocol/source-port pair. Each key must contain two values, a protocol and a port. Maps each protocol/port pair to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be proto-port. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dport-pmap

a prefix map value found from a protocol/destination-port pair. See sport-pmap. Since SiLK 3.12.0.

any-port-pmap

a prefix map value found from a protocol/port pair. See sport-pmap. Since SiLK 3.12.0.

custom

a number
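The icmpTypeCode encoding listed above (type in the upper byte, code in the lower byte) can be computed with two bit operations. The helper names below are hypothetical and serve only as an illustration when preparing integer keys for a Bag of that key type.

```python
def encode_icmp_type_code(icmp_type, icmp_code):
    """Pack an ICMP type and code into one 16-bit value."""
    if not 0 <= icmp_type <= 255 or not 0 <= icmp_code <= 255:
        raise ValueError("type and code must each fit in one byte")
    return (icmp_type << 8) | icmp_code

def decode_icmp_type_code(value):
    """Return the (type, code) pair stored in an encoded value."""
    return value >> 8, value & 0xFF

# Destination unreachable (type 3), host unreachable (code 1)
print(encode_icmp_type_code(3, 1))
```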

--counter-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values. When this switch is not specified, the counter type of the Bag is set to custom. Although the supported FIELD_TYPEs are the same as those for the key, the value is always treated as a number that can be summed. rwbagbuild does not use the country code or prefix map when parsing the value field.

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

When the key-type is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap, use the prefix map file located at PATH to map the key to a string. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagbuild currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagbuild to exit with an error.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Create a bag with IP addresses as keys from a text file

Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as a delimiter, a count, and ends with a newline.

 192.168.0.1,5  
 192.168.0.2,500  
 192.168.0.3,3  
 192.168.0.4,14  
 192.168.0.5,5

To build a bag with it:

 $ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag

Use rwbagcat(1) to view its contents:

 $ rwbagcat mybag.bag  
     192.168.0.1|                   5|  
     192.168.0.2|                 500|  
     192.168.0.3|                   3|  
     192.168.0.4|                  14|  
     192.168.0.5|                   5|

Create a bag with protocols as keys from a text file

To create a Bag of protocol data from the text file myproto.txt:

   1|      4|  
   6|    138|  
  17|    131|

use

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt > myproto.bag  
 $ rwbagcat myproto.bag  
          1|                   4|  
          6|                 138|  
         17|                 131|

When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and rwfileinfo(1) shows the type of the key:

 $ rwfileinfo --fields=bag myproto.bag  
 myproto.bag:  
   bag            key: protocol @ 4 octets; counter: custom @ 8 octets

Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat  
         0.0.0.1|                   4|  
         0.0.0.6|                 138|  
        0.0.0.17|                 131|

Although the --integer-keys switch on rwbagcat forces it to print keys as integers, it is generally better to use the --key-type switch when creating the bag.

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat --integer-keys  
          1|                   4|  
          6|                 138|  
         17|                 131|

Create a bag and override the existing counter

To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the --default-count switch which overrides the existing value:

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt  \  
        --default-count=1 --output-path=myproto1.bag  
 $ rwbagcat myproto1.bag  
          1|                   1|  
          6|                   1|  
         17|                   1|

Create a bag from multiple text files

To create a bag from multiple text files (X.txt, Y.txt, and Z.txt), use the UNIX cat(1) utility to concatenate the files and have rwbagbuild read the combined input. To avoid creating a temporary file, feed the output of cat as the standard input to rwbagbuild.

 $ cat X.txt Y.txt Z.txt                                \  
   | rwbagbuild --bag-input=- --output-path=xyz.bag

For each key that appears in multiple input files, rwbagbuild sums the counters for the key.

Create a bag with IP addresses as keys from an IPset file

Given the IPset myset.set, create a bag where every entry in the bag has a count of 3:

 $ rwbagbuild --set-input=myset.set --default-count=3  \  
        --out=mybag2.bag

Create a bag from multiple IPset files

Suppose we have three IPset files, A.set, B.set, and C.set:

 $ rwsetcat A.set  
 10.0.0.1  
 10.0.0.2  
 $ rwsetcat B.set  
 10.0.0.2  
 10.0.0.3  
 $ rwsetcat C.set  
 10.0.0.1  
 10.0.0.2  
 10.0.0.4

We want to create a bag file from these IPset files where the count for each IP address is the number of files that IP appears in. rwbagbuild accepts a single file as an argument, so we cannot do the following:

 $ rwbagbuild --set-input=A.set --set-input=B.set ...   # WRONG!

(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had 300 files instead of only 3.)

Since IPset files are (mathematical) sets, joining them together first with rwsettool(1) and then running rwbagbuild causes each IP address to get a count of 1:

 $ rwsettool --union A.set B.set C.set   \  
   | rwbagbuild --set-input=-            \  
   | rwbagcat  
        10.0.0.1|                   1|  
        10.0.0.2|                   1|  
        10.0.0.3|                   1|  
        10.0.0.4|                   1|

When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as a single textual stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that rwbagbuild must process. This is probably the best approach to the problem:

 $ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag  
 $ rwbagcat total1.bag  
        10.0.0.1|                   2|  
        10.0.0.2|                   3|  
        10.0.0.3|                   1|  
        10.0.0.4|                   1|
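The difference between the two approaches above comes down to when duplicates are discarded. The Python sketch below uses the contents of A.set, B.set, and C.set shown earlier; it only models the counting and does not read SiLK files.

```python
from collections import Counter

ipsets = {
    "A.set": {"10.0.0.1", "10.0.0.2"},
    "B.set": {"10.0.0.2", "10.0.0.3"},
    "C.set": {"10.0.0.1", "10.0.0.2", "10.0.0.4"},
}

# Union first, then count: duplicates are already gone, so every
# address ends up with a count of 1.
union_bag = Counter(set().union(*ipsets.values()))

# Count the members of each file separately: the total for an address
# is the number of files in which it appears.
per_file_bag = Counter()
for members in ipsets.values():
    per_file_bag.update(members)

for address in sorted(per_file_bag):
    print("{:>16}|{:>20}|".format(address, per_file_bag[address]))
```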

A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags together:

 $ for i in *.set ; do  
        rwbagbuild --set-input=$i --output-path=/tmp/$i.bag ;  
   done  
 $ rwbagtool --add /tmp/*.set.bag > total2.bag  
 $ rm /tmp/*.set.bag

There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:

 $ rwbagbuild --bag-input=/dev/null --output-file=total3.bag  
 $ for i in *.set ; do  
        rwbagbuild --set-input=$i  \  
        | rwbagtool --output-file=tmp.bag --add total3.bag stdin ;  
        /bin/mv tmp.bag total3.bag ;  
   done  
 $ rwbagcat total3.bag  
        10.0.0.1|                   2|  
        10.0.0.2|                   3|  
        10.0.0.3|                   1|  
        10.0.0.4|                   1|

Create a bag where the key is the country code

As of SiLK 3.12.0, a Bag file may contain a country code as its key. In rwbagbuild, specify the --key-type as sip-country, dip-country, or any-country. That key-type works with either textual input or IPset input. The form of the textual input when mapping an IP address to a country code is identical to that when building an ordinary bag.

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \  
        --key-type=any-country --output-file=scc1.bag  
 $ rwbagcat scc1.bag  
 --|                 527|

 $ rwbagbuild --set-input=A.set --key-type=any-country  \  
        --output-file=scc2.bag  
 $ rwbagcat scc2.bag  
 --|                   2|

Create a bag using a prefix map value as the key

rwbagbuild and rwbag(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. Use the --pmap-file switch to specify the prefix map file, and specify the --key-type using one of the types that end in -pmap.

For a prefix map that maps by IP addresses, use a key-type of sip-pmap, dip-pmap, or any-ip-pmap. The input may be an IPset or text. The form of the textual input is the same as for a normal bag file.

 $ rwbagbuild --set-input=A.set --key-type=sip-pmap     \  
        --pmap-file=ip-map.pmap --output=test1.bag

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \  
        --key-type=sip-pmap --pmap-file=ip-map.pmap     \  
        --output-file=test2.bag

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat(1).

 $ rwbagcat --pmap-file=ip-map.pmap test2.bag  
          internal|                 527|

For a prefix map file that maps by protocol-port pairs, the textual input must contain either three columns (protocol, port, and counter) or two columns (protocol and port), in which case the counter is taken from the --default-count switch.

 $ cat proto-port-count.txt  
 6| 25|  800|  
 6| 80| 5642|  
 6| 22  
 $ rwbagbuild --key-type=sport-pmap                 \  
        --bag-input=proto-port-count.txt            \  
        --pmap-file=proto-port-map.pmap             \  
        --output-path=service.bag  
 $ rwbagcat --pmap-file=port-map.pmap service.bag  
   TCP/SSH|                   1|  
  TCP/SMTP|                 800|  
  TCP/HTTP|                5642|

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbagbuild uses when mapping an IP to a country for the sip-country, dip-country, or any-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_PATH

This environment variable gives the root of the install tree. When searching for the country code mapping file, rwbagbuild may use this environment variable. See the FILES section for details.

FILES

$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country, dip-country, and any-country key-types.

SEE ALSO

rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7), ccfilter(3), zlib(3)

BUGS

The --default-count switch is poorly named.

rwbagcat

Output a binary Bag file as text

SYNOPSIS

  rwbagcat [ --network-structure[=STRUCTURE] | --bin-ips[=SCALE]  
             | --sort-counters[=ORDER]]  
        [--print-statistics[=OUTFILE]]  
        [--minkey=VALUE] [--maxkey=VALUE] [--mask-set=PATH]  
        [--mincounter=VALUE] [--maxcounter=VALUE] [--zero-counts]  
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]  
        [--key-format=FORMAT] [--integer-keys] [--zero-pad-ips]  
        [--no-columns] [--column-separator=C]  
        [--no-final-delimiter] [{--delimited | --delimited=C}]  
        [--output-path=PATH] [--pager=PAGER_PROG]  
        [--site-config-file=FILENAME]  
        [BAGFILE [BAGFILE...]]

  rwbagcat --help

  rwbagcat --version

DESCRIPTION

rwbagcat reads a binary Bag as created by rwbag(1) or rwbagbuild(1), converts it to text, and writes it to the standard output, to the pager, or to the specified output file. It can also print various statistics and summary information about the Bag.

As of SiLK 3.12.0, rwbagcat uses information in the Bag file’s header to determine how to display the key column.

In addition, rwbagcat exits with an error when asked to use an IP format to display keys that are not IP addresses.

rwbagcat reads the BAGFILEs specified on the command line; if no BAGFILE arguments are given, rwbagcat attempts to read the Bag from the standard input. BAGFILE may be the keyword stdin or a hyphen (-) to allow rwbagcat to print data from both files and piped input. If any input does not contain a Bag, rwbagcat prints an error to the standard error and exits abnormally.

When multiple BAGFILEs are specified on the command line, each is handled individually. To process the files as a single Bag, use rwbagtool(1) to combine the bags and pipe the output of rwbagtool into rwbagcat.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--network-structure
--network-structure=STRUCTURE

For each numeric value in STRUCTURE, group the IPs in the Bag into a netblock of that size and print the number of hosts, the sum of the counters, and, optionally, print the number of smaller, occupied netblocks that each larger netblock contains. When STRUCTURE begins with v6:, the IPs in the Bag are treated as IPv6 addresses, and any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock. Otherwise, the IPs are treated as IPv4 addresses, and any IPv6 address outside the ::ffff:0:0/96 netblock is ignored. Aside from the initial v6: (or v4:, for consistency), STRUCTURE has one of the following forms:

  1. NETBLOCK_LIST/SUMMARY_LIST. Group IPs into the sizes specified in either NETBLOCK_LIST or SUMMARY_LIST. rwbagcat prints a row for each occupied netblock specified in NETBLOCK_LIST, where the row lists the base IP of the netblock, the sum of the counters for that netblock, the number of hosts, and the number of smaller, occupied netblocks having a size that appears in either NETBLOCK_LIST or SUMMARY_LIST. (The values in SUMMARY_LIST are only summarized; they are not printed.)

  2. NETBLOCK_LIST/. Similar to the first form, except all occupied netblocks are printed, and there are no netblocks that are only summarized.

  3. NETBLOCK_LISTS. When the character S appears anywhere in the NETBLOCK_LIST, rwbagcat provides a default value for the SUMMARY_LIST. That default is 8,16,24,27 for IPv4, and 48,64 for IPv6.

  4. NETBLOCK_LIST. When neither S nor / appears in STRUCTURE, the output does not include the number of smaller, occupied netblocks.

  5. Empty. When STRUCTURE is empty or only contains v6: or v4:, rwbagcat prints a single row for the total network (the /0 netblock) giving the number of hosts, the sum of the counters, and the number of smaller, occupied netblocks using the same default list specified in form 3.

NETBLOCK_LIST and SUMMARY_LIST contain a comma-separated list of numbers between 0 (the total network) and the size for an individual host (32 for IPv4 or 128 for IPv6). The characters T and H may be used as aliases for 0 and the host netblock, respectively. In addition, when parsing the lists as IPv4 netblocks, the characters A, B, C, and X are supported as aliases for 8, 16, 24, and 27, respectively. A comma is not required between adjacent letters. The --network-structure switch disables printing of the IPs in the Bag file; specify the H argument to the switch to print each individual IP address and its counter.

The --network-structure switch may not be combined with the --bin-ips or --sort-counters switches. As of SiLK 3.12.0, rwbagcat exits with an error if the --network-structure switch is used on a Bag file whose key-type is neither custom nor an IP address type.
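
To make the five forms of STRUCTURE concrete, the following Python sketch expands an IPv4 STRUCTURE string into a list of netblock sizes and a list of summary sizes. It is an illustrative model written from the description above, not rwbagcat's actual parser; the names parse_list and parse_structure are hypothetical, the v6: prefix is not handled, and a string that combines S with / is not treated specially.

```python
# Letter aliases and default summary list for IPv4, as described above.
IPV4_ALIASES = {"T": 0, "A": 8, "B": 16, "C": 24, "X": 27, "H": 32}
IPV4_DEFAULT_SUMMARY = [8, 16, 24, 27]

def parse_list(text):
    """Expand one IPv4 NETBLOCK_LIST or SUMMARY_LIST into prefix lengths."""
    sizes = []
    digits = ""
    for ch in text + ",":  # the trailing comma flushes a final number
        if ch.isdigit():
            digits += ch
            continue
        if digits:
            sizes.append(int(digits))
            digits = ""
        if ch in IPV4_ALIASES:
            sizes.append(IPV4_ALIASES[ch])
        elif ch != ",":
            raise ValueError(f"unexpected character {ch!r}")
    if any(not 0 <= size <= 32 for size in sizes):
        raise ValueError("IPv4 prefix lengths must be between 0 and 32")
    return sizes

def parse_structure(structure):
    """Return (netblocks, summaries); summaries is None when not requested."""
    if structure.startswith("v4:"):
        structure = structure[3:]
    if structure == "":                      # form 5
        return [0], IPV4_DEFAULT_SUMMARY
    if "/" in structure:                     # forms 1 and 2
        blocks, _, summary = structure.partition("/")
        return parse_list(blocks), parse_list(summary)
    if "S" in structure:                     # form 3
        return parse_list(structure.replace("S", "")), IPV4_DEFAULT_SUMMARY
    return parse_list(structure), None       # form 4

print(parse_structure("TABCHX"))
print(parse_structure("ACS"))
```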

--bin-ips
--bin-ips=SCALE

Invert the bag and count the total number of unique keys for a given value of the volume bin. For example, turn a Bag {sip:flow} into {flow:count(sip)}. SCALE is a string containing the value linear, binary, or decimal.

The --bin-ips switch may not be combined with the --network-structure or --sort-counters switches.

--sort-counters
--sort-counters=ORDER

Sort the output so the counters are presented in either decreasing or increasing order. Typically the output is sorted by the keys. If the ORDER argument is not given to the switch, the counters are printed in decreasing order. Valid values for ORDER are

decreasing

Print the maximum counter first. This is the default.

increasing

Print the minimum counter first.

When two counters have the same value, the smaller key is displayed first. The --sort-counters switch may not be combined with the --network-structure or --bin-ips switches. Since SiLK 3.12.2.

--print-statistics
--print-statistics=OUTFILE

Print a breakdown of the network hosts seen, and print general statistics about the keys and counters. When --print-statistics is specified, no other output is produced unless one of --sort-counters, --network-structure, or --bin-ips is also specified. When the OUTFILE argument is not given, the statistics are written to the standard output or to the pager if output is to a terminal. OUTFILE is a filename, named pipe, the keyword stderr to write to the standard error, or the keyword stdout or - to write to the standard output. If OUTFILE names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case OUTFILE is overwritten. As shown in the EXAMPLES section, the statistics produced by this switch include the number of keys, the sum of the counters, the minimum and maximum keys, the minimum and maximum counters, the mean, variance, standard deviation, skew, and kurtosis of the counters, the number of nodes allocated, and the counter density.

--minkey=VALUE

Output records whose key value is at least VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.

--maxkey=VALUE

Output records whose key value is not more than VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.

--mask-set=PATH

Output records whose key appears in the binary IPset read from the file PATH. (To build an IPset, use rwset(1) or rwsetbuild(1).) When used with --minkey and/or --maxkey, output records whose key is in the IPset and is also within the specified range. As of SiLK 3.12.0, rwbagcat exits with an error if the --mask-set switch is used on a Bag file whose key-type is neither custom nor an IP address type.

--mincounter=VALUE

Output records whose counter value is at least VALUE. VALUE is an integer in the range 1 to 18446744073709551615. The default is to print all records with a non-zero counter; use --zero-counts to show records whose counter is 0.

--maxcounter=VALUE

Output records whose counter value is not more than VALUE. VALUE is an integer in the range 1 to 18446744073709551615, with the default being the maximum counter value.

--zero-counts

Print keys whose counter is zero. Normally, keys with a counter of zero are suppressed since all keys have a default counter of zero. In order to use this flag, either --mask-set or both --minkey and --maxkey must be specified. When this switch is specified, any counter limit explicitly set by the --maxcounter switch is also applied.

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

Use the prefix map file located at PATH to map the key to a string when the type of the Bag’s key is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap. This switch is required for Bag files whose key was derived from a prefix map file. The type of the prefix map file must match the key’s type, but a different prefix map file may be used. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagcat currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--key-format=FORMAT

Specify the format to use when printing the keys. When this switch is not specified, the keys of a Bag whose keys are known not to be IP addresses are printed as decimal numbers, and the keys for all other Bags are printed as IP addresses in the canonical format. The FORMAT is one of:

canonical

Print keys as IP addresses in the canonical format: dotted quad for IPv4 (127.0.0.1) and hexadectet for IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96 will be printed as a mixture of IPv6 and IPv4. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.

zero-padded

Print keys as IP addresses in their canonical form, but add zeros to the output so it fully fills the width of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.

decimal

Print keys as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 2130706433 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print keys as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 7f000001 and 20010db8000000000000000000000001, respectively.

force-ipv6

Print all keys as IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any integer key or IPv4 address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are printed as ::ffff:7f00:1 and 2001:db8::1, respectively. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.

timestamp

Print keys as time in standard SiLK format: yyyy/mm/ddThh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

iso-time

Print keys as time in the ISO time format yyyy-mm-dd hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

m/d/y

Print keys as time in the format mm/dd/yyyy hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

utc

Print the keys as time in UTC. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

localtime

Print the keys as time and get the timezone from either the TZ environment variable or the local machine. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

epoch

Print keys as seconds since the UNIX epoch. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
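
The relationship between the key formats can be checked with a short script. The Python sketch below reproduces the sample values given for 127.0.0.1 and 2001:db8::1 and the three time layouts; it is an approximation built on Python's standard library rather than rwbagcat's own formatting code, the function names format_ip_key and format_time_key are hypothetical, and times are always rendered in UTC.

```python
import ipaddress
from datetime import datetime, timezone

def format_ip_key(addr, fmt):
    """Render an IP address key in one of the integer or IP formats."""
    ip = ipaddress.ip_address(addr)
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join(f"{int(octet):03d}" for octet in str(ip).split("."))
        return ip.exploded
    if fmt == "force-ipv6":
        if ip.version == 6:
            return ip.compressed
        # Map the IPv4 address into ::ffff:0:0/96 using pure IPv6 notation.
        value = int(ip)
        high, low = value >> 16, value & 0xFFFF
        return f"::ffff:{high:x}:{low:x}"
    raise ValueError(f"unsupported format {fmt!r}")

# strftime patterns corresponding to the three time layouts described above.
TIME_PATTERNS = {
    "timestamp": "%Y/%m/%dT%H:%M:%S",
    "iso-time": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}

def format_time_key(seconds, fmt):
    """Render a key holding seconds since the UNIX epoch, in UTC."""
    if fmt == "epoch":
        return str(seconds)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime(TIME_PATTERNS[fmt])

for fmt in ("decimal", "hexadecimal", "zero-padded", "force-ipv6"):
    print(fmt, format_ip_key("127.0.0.1", fmt), format_ip_key("2001:db8::1", fmt))
```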

--integer-keys

This switch is equivalent to --key-format=decimal. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--zero-pad-ips

This switch is equivalent to --key-format=zero-padded. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed. When the network summary is requested (--network-structure=S), the separator is always printed before the summary column and never after that column.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
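
Delimited output is convenient for downstream scripts because each row is a plain key and counter with no padding. As a hedged illustration, the Python sketch below parses text of the kind produced by rwbagcat --delimited=, into a dictionary; the sample text is copied from the EXAMPLES section, and the function name parse_delimited is hypothetical.

```python
import csv
import io

# Three rows as they appear in the --delim=, example in EXAMPLES.
SAMPLE = "172.23.1.1,5\n192.168.20.162,5\n192.168.20.163,5\n"

def parse_delimited(text, delimiter=","):
    """Return {key: counter} from delimited rwbagcat-style text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip blank lines; the first field is the key, the second the counter.
    return {row[0]: int(row[1]) for row in reader if row}

print(parse_delimited(SAMPLE))
```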

--output-path=PATH

Write the textual output of the --network-structure, --bin-ips, or --sort-counters switch to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbagcat searches for the site configuration file in the locations specified in the FILES section. Since SiLK 3.15.0.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Printing a bag

To print the contents of the bag file mybag.bag:

 $ rwbagcat mybag.bag  
      172.23.1.1|              5|  
      172.23.1.2|            231|  
      172.23.1.3|              9|  
      172.23.1.4|             19|  
   192.168.0.100|              1|  
   192.168.0.101|              1|  
   192.168.0.160|             15|  
  192.168.20.161|              1|  
  192.168.20.162|              5|  
  192.168.20.163|              5|

Displaying number of hosts by network

To print the bag with a full network breakdown:

 $ rwbagcat --network-structure=TABCHX mybag.bag  
           172.23.1.1      |              5|  
           172.23.1.2      |            231|  
           172.23.1.3      |              9|  
           172.23.1.4      |             19|  
         172.23.1.0/27     |            264|  
       172.23.1.0/24       |            264|  
     172.23.0.0/16         |            264|  
   172.0.0.0/8             |            264|  
           192.168.0.100   |              1|  
           192.168.0.101   |              1|  
         192.168.0.96/27   |              2|  
           192.168.0.160   |             15|  
         192.168.0.160/27  |             15|  
       192.168.0.0/24      |             17|  
           192.168.20.161  |              1|  
           192.168.20.162  |              5|  
           192.168.20.163  |              5|  
         192.168.20.160/27 |             11|  
       192.168.20.0/24     |             11|  
     192.168.0.0/16        |             28|  
   192.0.0.0/8             |             28|  
 TOTAL                     |            292|

In the above, lines that include a CIDR prefix display the sum of the counters for the preceding hosts in that netblock. For example, the counters for the four hosts in the 172.23.1.0/27 netblock sum to 264.
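
The per-netblock sums can be reproduced with Python's ipaddress module. This sketch is an illustrative model only (the names MYBAG and netblock_sums are hypothetical, and the data is copied from the "Printing a bag" example); it does not attempt to reproduce rwbagcat's indentation or row ordering.

```python
import ipaddress
from collections import defaultdict

# Keys and counters copied from the "Printing a bag" example.
MYBAG = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}

def netblock_sums(bag, prefix_len):
    """Sum the counters of all hosts that fall in each /prefix_len netblock."""
    sums = defaultdict(int)
    for addr, counter in bag.items():
        # strict=False masks off the host bits to get the enclosing netblock.
        block = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        sums[str(block)] += counter
    return dict(sums)

for prefix_len in (27, 24, 16, 8, 0):
    for block, total in netblock_sums(MYBAG, prefix_len).items():
        print(f"{block:>20}|{total:>15}|")
```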

To show an abbreviated network structure by class A and C only, including summary information:

 $ rwbagcat --network-structure=ACS mybag.bag  
     172.23.1.0/24     |            264| 4 hosts in 1 /27  
 172.0.0.0/8           |            264| 4 hosts in 1 /16, 1 /24, and 1 /27  
     192.168.0.0/24    |             17| 3 hosts in 2 /27s  
     192.168.20.0/24   |             11| 3 hosts in 1 /27  
 192.0.0.0/8           |             28| 6 hosts in 1 /16, 2 /24s, and 3 /27s
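
The summary column counts the hosts in each netblock and the number of smaller netblocks that contain at least one host. A minimal Python model of that counting, assuming the default IPv4 summary list of 8,16,24,27, is sketched below; the names MYBAG_KEYS and summarize are hypothetical and the keys are those of the example bag.

```python
import ipaddress

# Keys copied from the "Printing a bag" example.
MYBAG_KEYS = [
    "172.23.1.1", "172.23.1.2", "172.23.1.3", "172.23.1.4",
    "192.168.0.100", "192.168.0.101", "192.168.0.160",
    "192.168.20.161", "192.168.20.162", "192.168.20.163",
]

def summarize(keys, prefix_len, summary_lens=(8, 16, 24, 27)):
    """Map each occupied netblock to (hosts, {smaller prefix: occupied blocks})."""
    smaller = [n for n in summary_lens if n > prefix_len]
    seen = {}
    for addr in keys:
        block = str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))
        hosts, children = seen.setdefault(
            block, (set(), {n: set() for n in smaller}))
        hosts.add(addr)
        for n in smaller:
            # Record which smaller netblock this host occupies.
            children[n].add(ipaddress.ip_network(f"{addr}/{n}", strict=False))
    return {
        block: (len(hosts), {n: len(nets) for n, nets in children.items()})
        for block, (hosts, children) in seen.items()
    }

for block, (hosts, children) in summarize(MYBAG_KEYS, 8).items():
    print(block, hosts, children)
```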

Inverting a bag

To bin by number of unique IP addresses by volume:

 $ rwbagcat --bin-ips mybag.bag  
               1|              3|  
               5|              3|  
               9|              1|  
              15|              1|  
              19|              1|  
             231|              1|

This means there were 3 source hosts in the bag that had a single flow; 3 hosts that had 5 flows; and one host each that had 9, 15, 19, and 231 flows.

For a log2 breakdown of the counts:

 $ rwbagcat --bin-ips=binary mybag.bag  
    2^0 to 2^1-1|              3|  
    2^2 to 2^3-1|              3|  
    2^3 to 2^4-1|              2|  
    2^4 to 2^5-1|              1|  
    2^7 to 2^8-1|              1|
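
Both inversions amount to counting how many keys share a counter value or a power-of-two range. The Python sketch below models the linear and binary scales for the counters in mybag.bag; it is an illustration rather than rwbagcat's implementation, the names COUNTERS, bin_linear, and bin_binary are hypothetical, and the decimal scale is not modeled.

```python
from collections import Counter

# Counters copied from the "Printing a bag" example, in key order.
COUNTERS = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

def bin_linear(counters):
    """Count the keys that share each exact counter value."""
    return Counter(counters)

def bin_binary(counters):
    """Count the keys whose counter lies in [2^e, 2^(e+1) - 1], keyed by e."""
    # For a positive integer c, bit_length() - 1 is floor(log2(c)).
    return Counter(c.bit_length() - 1 for c in counters if c > 0)

for value, keys in sorted(bin_linear(COUNTERS).items()):
    print(f"{value:>16}|{keys:>15}|")
for exponent, keys in sorted(bin_binary(COUNTERS).items()):
    label = f"2^{exponent} to 2^{exponent + 1}-1"
    print(f"{label:>16}|{keys:>15}|")
```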

Sorting the bag by counter value

rwbagcat normally presents the data in order of increasing key value. To sort based on the counter value, specify the --sort-counter switch. The default sort order is from maximum counter to minimum counter.

 $ rwbagcat --sort-counter mybag.bag  
      172.23.1.2|                 231|  
      172.23.1.4|                  19|  
   192.168.0.160|                  15|  
      172.23.1.3|                   9|  
      172.23.1.1|                   5|  
  192.168.20.162|                   5|  
  192.168.20.163|                   5|  
   192.168.0.100|                   1|  
   192.168.0.101|                   1|  
  192.168.20.161|                   1|

To change the sort order, specify the increasing argument to the --sort-counter switch:

 $ rwbagcat --sort-counter=increasing mybag.bag  
   192.168.0.100|                   1|  
   192.168.0.101|                   1|  
  192.168.20.161|                   1|  
      172.23.1.1|                   5|  
  192.168.20.162|                   5|  
  192.168.20.163|                   5|  
      172.23.1.3|                   9|  
   192.168.0.160|                  15|  
      172.23.1.4|                  19|  
      172.23.1.2|                 231|

The order of the keys is consistent for keys that have the same counter value. The following output is limited to those keys whose counter is 5. The output is first shown without the --sort-counter switch, then with the data sorted by increasing and decreasing counter value.

 $ rwbagcat --delim=, mybag.bag | grep ,5  
 172.23.1.1,5  
 192.168.20.162,5  
 192.168.20.163,5

 $ rwbagcat --delim=, --sort-counter=increasing mybag.bag | grep ,5  
 172.23.1.1,5  
 192.168.20.162,5  
 192.168.20.163,5

 $ rwbagcat --delim=, --sort-counter=decreasing mybag.bag | grep ,5  
 172.23.1.1,5  
 192.168.20.162,5  
 192.168.20.163,5
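
The ordering shown in these examples is equivalent to sorting on the counter first and breaking ties with the numeric value of the key. A small Python model of that two-part sort key follows; it is a sketch for illustration, and the names MYBAG and sort_counters are hypothetical.

```python
import ipaddress

# Keys and counters copied from the "Printing a bag" example.
MYBAG = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}

def sort_counters(bag, order="decreasing"):
    """Sort by counter; among equal counters the smaller key comes first."""
    sign = -1 if order == "decreasing" else 1
    return sorted(
        bag.items(),
        key=lambda item: (sign * item[1], int(ipaddress.ip_address(item[0]))),
    )

for key, counter in sort_counters(MYBAG):
    print(f"{key:>16}|{counter:>20}|")
```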

Displaying bags that use prefix map values as the key

rwbag(1) and rwbagbuild(1) can use a prefix map file as the key in a bag file as of SiLK 3.12.0. When attempting to display these Bag files, you must specify the --pmap-file switch on the rwbagcat command line so that rwbagcat can map the prefix map values to the labels. If the --pmap-file switch is not given, rwbagcat displays an error.

 $ rwbagcat service.bag  
 rwbagcat: The --pmap-file switch is required for \  
         Bags containing sport-pmap keys

In addition, the type of the prefix map file must match the key-type in the bag file: a prefix map type of IPv4-address or IPv6-address when the key was mapped from an IP address, and a prefix map type of proto-port when the key was mapped from a protocol-port pair. The type of key in a bag may be determined by rwfileinfo(1).

 $ rwfileinfo --fields=bag service.bag  
 service.bag:  
   bag          key: sport-pmap @ 4 octets; counter: custom @ 8 octets

 $ rwbagcat --pmap-file=ip-map.pmap service.bag  
 rwbagcat: Cannot use IPv4-address prefix map for \  
        Bag containing sport-pmap keys

 $ rwbagcat --pmap-file=port-map.pmap service.bag  
   TCP/SSH|                   1|  
  TCP/SMTP|                 800|  
  TCP/HTTP|                5642|

The only check is whether the prefix map file is the correct type. A different prefix map file could be used. If a value in the bag file does not have an index in the prefix map file, the numeric index of the label is displayed.

 $ echo 'label 1 none'                                      \  
   | rwpmapbuild --mode=proto-port --input-file=-           \  
        --output-file=tmp.pmap  
 $ rwbagcat --pmap-file=tmp.pmap service.bag  
   7|                   1|  
   8|                 800|  
   9|                5642|

Displaying statistics

 $ rwbagcat --print-statistics mybag.bag

 Statistics  
     number of keys:  10  
    sum of counters:  292  
        minimum key:  172.23.1.1  
        maximum key:  192.168.20.163  
    minimum counter:  1  
    maximum counter:  231  
               mean:  29.2  
           variance:  5064  
 standard deviation:  71.16  
               skew:  2.246  
           kurtosis:  8.1  
    nodes allocated:  0 (0 bytes)  
    counter density:  inf%
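
The mean, variance, and standard deviation in this output can be checked against the counters in mybag.bag. The Python sketch below uses the standard statistics module; for this data the sample variance (the n-1 form) matches the value shown above. The skew and kurtosis formulas are not stated in this manual page, so they are not reproduced here. The name COUNTERS is hypothetical.

```python
import statistics

# Counters copied from the "Printing a bag" example, in key order.
COUNTERS = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

number_of_keys = len(COUNTERS)
sum_of_counters = sum(COUNTERS)
mean = statistics.mean(COUNTERS)
variance = statistics.variance(COUNTERS)   # sample variance, divides by n - 1
std_dev = statistics.stdev(COUNTERS)       # square root of the sample variance

print(f"    number of keys:  {number_of_keys}")
print(f"   sum of counters:  {sum_of_counters}")
print(f"   minimum counter:  {min(COUNTERS)}")
print(f"   maximum counter:  {max(COUNTERS)}")
print(f"              mean:  {mean:.4g}")
print(f"          variance:  {variance:.4g}")
print(f"standard deviation:  {std_dev:.4g}")
```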

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_PAGER

When set to a non-empty string, rwbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwbagcat does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwbagcat automatically invokes this program to display its output a screen at a time.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwbagcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwbagcat may use this environment variable. See the FILES section for details.

TZ

When the argument to the --key-format switch includes localtime or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwbagcat --version.)

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwfileinfo(1), rwset(1), rwsetbuild(1), silk(7)

rwbagtool

Perform high-level operations on binary Bag files

SYNOPSIS

  rwbagtool { --add | --subtract | --minimize | --maximize  
              | --divide | --scalar-multiply=VALUE  
              | --compare={lt | le | eq | ge | gt} }  
        [--intersect=SETFILE | --complement-intersect=SETFILE]  
        [--mincounter=VALUE] [--maxcounter=VALUE]  
        [--minkey=VALUE] [--maxkey=VALUE]  
        [--invert] [--coverset] [--ipset-record-version=VERSION]  
        [--output-path=PATH]  
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD]  
        [BAGFILE[ BAGFILE...]]

  rwbagtool --help

  rwbagtool --version

DESCRIPTION

rwbagtool performs various operations on Bags. It can add Bags together, subtract a subset of data from a Bag, perform key intersection of a Bag with an IP set, extract the key list of a Bag as an IP set, or filter Bag records based on their counter value.

BAGFILE is the name of a file or a named pipe, or the names stdin or - to have rwbagtool read from the standard input. If no Bag file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. If BAGFILE does not contain a Bag, rwbagtool prints an error to stderr and exits abnormally.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Operation switches

The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.

--add

Sum the counters for each key for all Bag files given on the command line. If a key does not exist, it has a counter of zero. If no other operation is specified, the add operation is the default.

--subtract

Subtract from the first Bag file all subsequent Bag files. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If any counter subtraction results in a negative number, the key will not appear in the resulting Bag file.

--minimize

Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags will not appear in the output.

--maximize

Cause the output to contain the maximum counter seen for each key. The output will contain each key that appears in any input Bag.

--divide

Divide the first Bag file by the second Bag file. It is an error if more than two Bag files are specified. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys will not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key will not appear in the output.

--scalar-multiply=VALUE

Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551615. This switch accepts a single Bag as input.

--compare=OPERATION

Compare the key/counter pairs in exactly two Bag files. It is an error if more than two Bag files are specified. The keys in the output Bag will only be those whose counter in the first Bag is OPERATION the counter in the second Bag. The counters for all keys in the output will be 1. Any key that does not appear in both input Bag files will not appear in the result. The possible OPERATION values are the strings:

lt

GetCounter(Bag1, key) < GetCounter(Bag2, key)

le

GetCounter(Bag1, key) <= GetCounter(Bag2, key)

eq

GetCounter(Bag1, key) == GetCounter(Bag2, key)

ge

GetCounter(Bag1, key) >= GetCounter(Bag2, key)

gt

GetCounter(Bag1, key) > GetCounter(Bag2, key)
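The operation switches above can be pictured with ordinary dictionaries. The following is only a rough model of the documented semantics, not the rwbagtool implementation: Bags are represented as Python dicts mapping key to counter, and every function name here is hypothetical. Dropping zero-valued results in the subtraction is an assumption inferred from the example output in the EXAMPLES section.

```python
# Illustrative model only: a Bag is a dict {key: counter}.
# None of these helper names exist in SiLK or PySiLK.
import operator


def bag_add(*bags):
    """--add: sum the counters per key; a missing key counts as zero."""
    out = {}
    for bag in bags:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) + counter
    return out


def bag_subtract(first, *rest):
    """--subtract: subtract later bags from the first, drop non-positive results."""
    out = dict(first)
    for bag in rest:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) - counter
    # Assumption: zero results are dropped along with negative ones.
    return {k: c for k, c in out.items() if c > 0}


def bag_minimize(*bags):
    """--minimize: only keys present in every bag survive."""
    common = set(bags[0]).intersection(*bags[1:])
    return {k: min(b[k] for b in bags) for k in common}


def bag_maximize(*bags):
    """--maximize: every key present in any bag survives."""
    keys = set().union(*bags)
    return {k: max(b.get(k, 0) for b in bags) for k in keys}


def bag_divide(first, second):
    """--divide: round to nearest integer (.5 rounds up), drop results below 0.5."""
    out = {}
    for key, counter in first.items():
        if key not in second:
            raise KeyError(f"key {key} not in divisor bag")
        # floor(a/b + 1/2) computed with integers only.
        quotient = (2 * counter + second[key]) // (2 * second[key])
        if quotient > 0:
            out[key] = quotient
    return out


COMPARE_OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
               "ge": operator.ge, "gt": operator.gt}


def bag_compare(op, first, second):
    """--compare=OP: counter 1 for each shared key where first OP second holds."""
    test = COMPARE_OPS[op]
    return {k: 1 for k in first if k in second and test(first[k], second[k])}
```

Using integer arithmetic for the division avoids floating-point rounding surprises with large counters.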

Masking/Limiting switches

The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:

--intersect=SETFILE

Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--complement-intersect=SETFILE

As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.

--mincounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value; the default is 1.

--maxcounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.

--minkey=VALUE

Cause the output to contain only those records whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.

--maxkey=VALUE

Cause the output to contain only those records whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.
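As a rough illustration of how the masking and limiting switches narrow the intermediate Bag, the sketch below filters a dict of key/counter pairs. It is a model of the behavior described above, not SiLK code: the function and parameter names are hypothetical, the IPset is represented as a Python set of integers, and the maximum counter value of 2**64 - 1 is an assumption based on the range given for --scalar-multiply.

```python
# Illustrative model only: a Bag is a dict {key: counter}, an IPset is a set.
MAX_COUNTER = 2**64 - 1   # assumption: largest counter a Bag can hold
MAX_KEY = 2**32 - 1       # documented default for --maxkey


def limit_bag(bag, intersect=None, complement_intersect=None,
              mincounter=1, maxcounter=MAX_COUNTER,
              minkey=0, maxkey=MAX_KEY):
    """Return the entries of `bag` that pass every masking/limiting test."""
    out = {}
    for key, counter in bag.items():
        if intersect is not None and key not in intersect:
            continue  # --intersect keeps only keys found in the set
        if complement_intersect is not None and key in complement_intersect:
            continue  # --complement-intersect keeps only keys absent from the set
        if not (mincounter <= counter <= maxcounter):
            continue  # both counter bounds are inclusive
        if not (minkey <= key <= maxkey):
            continue  # both key bounds are inclusive
        out[key] = counter
    return out
```

For instance, `limit_bag({3: 10, 4: 7, 6: 14}, minkey=4)` keeps only keys 4 and 6.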

Output switches

The following switches control the output.

--invert

Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key will be attributed to the maximum key; to prevent this, specify --maxcounter=4294967295.

--coverset

Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.11.0.

 0 

Use the default version for an IPv4 IPset and an IPv6 IPset, currently 2 and 3, respectively.

 2 

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

 3 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

 4 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

 5 

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--output-path=PATH

Write the resulting Bag to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The examples assume the following contents for the files:

 Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set  
  3|  10|     1|   1|     2|   8|     1|   1|          2  
  4|   7|     4|   2|     4|  10|     4|   3|          4  
  6|  14|     7|  32|     6|  14|     6|   4|          6  
  7|  23|     8|   2|     7|  12|     7|   4|          8  
  8|   2|                 9|   8|     8|   6|

Adding Bag files
 $ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag  
 $ rwbagcat --integer-keys Bag-sum.bag  
  1|   1|  
  3|  10|  
  4|   9|  
  6|  14|  
  7|  55|  
  8|   4|

 $ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag  
 $ rwbagcat --integer-keys Bag-sum2.bag  
  1|   1|  
  2|   8|  
  3|  10|  
  4|  19|  
  6|  28|  
  7|  67|  
  8|   4|  
  9|   8|

Subtracting Bag Files
 $ rwbagtool --sub Bag1.bag Bag2.bag > Bag-diff.bag  
 $ rwbagcat --integer-keys Bag-diff.bag  
  3|  10|  
  4|   5|  
  6|  14|

 $ rwbagtool --sub Bag2.bag Bag1.bag > Bag-diff2.bag  
 $ rwbagcat --integer-keys Bag-diff2.bag  
  1|   1|  
  7|   9|

Getting the Minimum Value
 $ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag  
 $ rwbagcat --integer-keys Bag-min.bag  
  4|   2|  
  7|  12|

Getting the Maximum Value
 $ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag  
 $ rwbagcat --integer-keys Bag-max.bag  
  1|   1|  
  2|   8|  
  3|  10|  
  4|  10|  
  6|  14|  
  7|  32|  
  8|   2|  
  9|   8|

Dividing Bag Files
 $ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag  
 $ rwbagcat --integer-keys Bag-div1.bag  
   1|   1|  
   4|   1|  
   7|   8|

However, when the order is reversed:

 $ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag  
 rwbagtool: Error dividing bags; key 6 not in divisor bag

To work around this issue, use the --coverset switch to create a copy of Bag4.bag that contains only the keys in Bag2.bag

 $ rwbagtool --coverset Bag2.bag > Bag2-keys.set  
 $ rwbagtool --intersect=Bag2-keys.set  Bag4.bag  > Bag4-small.bag  
 $ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag  
 $ rwbagcat --integer-keys Bag-div2.bag  
   1|   1|  
   4|   2|  
   8|   3|

Or, in a single piped command without writing the IPset to disk:

 $ rwbagtool --coverset Bag2.bag                \  
   | rwbagtool --intersect=-  Bag4.bag          \  
   | rwbagtool --divide -  Bag2.bag             \  
   | rwbagcat --integer-keys  
   1|   1|  
   4|   2|  
   8|   3|

Scalar Multiplication
 $ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag  
 $ rwbagcat --integer-keys Bag-multiply.bag  
  3|  70|  
  4|  49|  
  6|  98|  
  7| 161|  
  8|  14|

Comparing Bag Files
 $ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag  
 $ rwbagcat --integer-keys Bag-lt.bag  
  7|   1|

 $ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag  
 $ rwbagcat --integer-keys Bag-le.bag  
  7|   1|  
  8|   1|

 $ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag  
 $ rwbagcat --integer-keys Bag-eq.bag  
  8|   1|

 $ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag  
 $ rwbagcat --integer-keys Bag-ge.bag  
  4|   1|  
  8|   1|

 $ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag  
 $ rwbagcat --integer-keys Bag-gt.bag  
  4|   1|

Making a Cover Set
 $ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set  
 $ rwsetcat --integer-ips Cover.set  
  1  
  2  
  3  
  4  
  6  
  7  
  8  
  9

Inverting a Bag
 $ rwbagtool --invert Bag1.bag > Bag-inv1.bag  
 $ rwbagcat --integer-keys Bag-inv1.bag  
  2|   1|  
  7|   1|  
 10|   1|  
 14|   1|  
 23|   1|

 $ rwbagtool --invert Bag2.bag > Bag-inv2.bag  
 $ rwbagcat --integer-keys Bag-inv2.bag  
  1|   1|  
  2|   2|  
 32|   1|

 $ rwbagtool --invert Bag3.bag > Bag-inv3.bag  
 $ rwbagcat --integer-keys Bag-inv3.bag  
  8|   2|  
 10|   1|  
 12|   1|  
 14|   1|
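The inversion shown in these examples amounts to counting how often each counter value occurs. A minimal sketch of that idea follows, again modeling a Bag as a dict; the function name is hypothetical and this is not how rwbagtool is implemented. The clamp to 4294967295 mirrors the behavior described for --invert when a counter exceeds the maximum possible key.

```python
# Illustrative model only: a Bag is a dict {key: counter}.
from collections import Counter

MAX_KEY = 4294967295  # largest key named in the --invert description


def bag_invert(bag):
    """Map each counter value to the number of keys that carried it."""
    # Counters above MAX_KEY are attributed to MAX_KEY.
    return dict(Counter(min(counter, MAX_KEY) for counter in bag.values()))
```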

Masking Bag Files
 $ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag  
 $ rwbagcat --integer-keys Bag-mask.bag  
  4|   7|  
  6|  14|  
  8|   2|

 $ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag  
 $ rwbagcat --integer-keys Bag-mask2.bag  
  3|  10|  
  7|  23|

Restricting the Output
 $ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag  
 $ rwbagcat --integer-keys Bag-res1.bag  
  1|   1|  
  3|  10|  
  4|   9|

 $ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag  
 $ rwbagcat --integer-keys Bag-res2.bag  
  3|  10|  
  4|   7|  
  6|  14|

 $ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag  
 $ rwbagcat --integer-keys Bag-res3.bag  
  7|  55|

 $ rwbagtool --sub --maxcounter=9 Bag1.bag Bag2.bag > Bag-res4.bag  
 $ rwbagcat --integer-keys Bag-res4.bag  
  4|   5|

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version switch when that switch is not provided. Since SiLK 3.7.0.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), silk(7), zlib(3)

rwcat

Concatenate SiLK Flow files into single stream

SYNOPSIS

  rwcat [--output-path=PATH] [--note-add=TEXT] [--note-file-add=FILE]  
        [--print-filenames] [--byte-order={big | little | native}]  
        [--ipv4-output] [--compression-method=COMP_METHOD]  
        [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}

  rwcat --help

  rwcat --version

DESCRIPTION

rwcat reads SiLK Flow records and writes the records in the standard binary SiLK format to the specified output-path; rwcat writes the records to the standard output when stdout is not the terminal and --output-path is not provided.

rwcat reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcat reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

rwcat does not copy the invocation history and annotations (notes) from the header(s) of the source file(s) to the destination file. The --note-add or --note-file-add switch may be used to add a new annotation to the destination file.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. When PATH ends in .gz, the output is compressed using the library associated with gzip(1). If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcat to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--byte-order=ENDIAN

Set the byte order for the output SiLK Flow records. The argument is one of the following:

native

Use the byte order of the machine where rwcat is running. This is the default.

big

Use network byte order (big endian) for the output.

little

Write the output in little endian format.

--ipv4-output

Force the output to contain only IPv4 flow records. When this switch is specified, IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix are converted to IPv4 and written to the output, and all other IPv6 records are ignored. When SiLK has not been compiled with IPv6 support, rwcat acts as if this switch were always in effect.
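The address rule behind --ipv4-output can be illustrated with Python's standard ipaddress module. This sketch covers only the per-address test (whether an IPv6 address lies in ::ffff:0:0/96 and what IPv4 address it maps to); how rwcat applies the rule across the address fields of a whole record is not modeled, and the function name is hypothetical.

```python
import ipaddress


def to_ipv4_output(address):
    """Return the IPv4 form of `address`, or None if it has no IPv4 form."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return ip
    # ipv4_mapped is None unless the address is inside ::ffff:0:0/96.
    return ip.ipv4_mapped
```

An address that yields None here corresponds to a record that --ipv4-output would ignore.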

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--print-filenames

Print the names of input files and the number of records each file contains as the files are read.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcat searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcat opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To combine the results of several rwfilter(1) runs (stored in the files run1.rw, run2.rw, ... runN.rw) into the single file combined.rw, you can use:

 $ rwcat --output=combined.rw  *.rw

If the shell complains about too many arguments, you can use the UNIX find(1) utility and pipe its output to rwcat:

 $ find . -name '*.rw' -print                   \  
   | rwcat --xargs --output=combined.rw

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcat may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwfilter(1), rwfileinfo(1), silk(7), gzip(1), find(1), zlib(3)

BUGS

Although rwcat will read from the standard input, this feature should be used with caution. rwcat will treat the standard input as a single file, as it has no way to know when one file ends and the next begins. The following will not work:

 $ cat run1.rw run2.rw | rwcat --output=combined.rw     # WRONG!

The header of run2.rw will be treated as data of run1.rw, resulting in corrupt output.

rwcombine

Combine flows denoting a long-lived session into a single flow

SYNOPSIS

  rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS]  
        [--max-idle-time=NUM]  
        [{--print-statistics | --print-statistics=FILENAME}]  
        [--temp-directory=DIR_PATH] [--buffer-size=SIZE]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD] [--print-filenames]  
        [--output-path=PATH] [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcombine --help

  rwcombine --help-fields

  rwcombine --version

DESCRIPTION

rwcombine reads SiLK Flow records from one or more input sources, searches for flow records where the attributes field denotes records that were prematurely created or were continuations of prematurely created flows, and attempts to combine those records into a single record. All the unmodified SiLK records and the combined records are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.

Some flow exporters, such as yaf(1), provide fields that describe characteristics about the flow record, and these characteristics are stored in the attributes field of SiLK Flow records. The two flags that rwcombine considers are:

T

The flow generator prematurely created a record for a long-lived session due to the connection’s lifetime reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

The flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout. (yaf only sets this flag when it is invoked with the --silk switch.)

A very long-running session may be represented by multiple flow records, where the first record is marked with the T flag, the final record is marked with the C flag, and intermediate records are marked with both C (this record continues an earlier flow) and T (this record also met the active time-out). rwcombine attempts to combine these multiple flow records into a single record.

The input to rwcombine does not need to be sorted. As part of its processing, rwcombine may re-order the records before writing them.

rwcombine reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcombine reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

Algorithm

The algorithm rwcombine uses to combine records is

  1. rwcombine reads SiLK flow records, examines the attributes field on each record, and immediately writes to the destination stream all records where neither the time-out flag (T) nor the continuation flag (C) is set. Records where one or both of those flags are set are stored until all input records have been read.

  2. rwcombine groups the stored records into bins where the following fields for each record in each bin are identical: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.

  3. For each bin, the records are sorted by time (sTime and elapsed).

  4. Within a bin, rwcombine combines two records into a single record when the attributes field of the first record has the T (time-out) flag set and the second record has the C (continuation) flag set. When combining records, the bytes and packets fields are summed, the initialFlags value from the first record is used, the sessionFlags field becomes the bit-wise OR of both sessionFlags fields and the second record's initialFlags field, and the eTime is set to that of the second flow.

  5. If the second record’s T flag was set, rwcombine checks to see if the third record’s C flag is set. If it is, the third record becomes part of the new record.

  6. The previous step repeats for the records in the bin until the bin contains a single record, the most recently added record did not have the T flag set, or the next record in the bin does not have the C flag set.

  7. After examining a bin, rwcombine writes the record(s) the bin contains to the destination stream.

  8. Steps 3 through 7 are repeated for each bin.

The --ignore-fields switch allows the user to remove fields from the set that rwcombine uses when grouping records in Step 2.

When combining two records into one (Step 4), rwcombine completely disregards the difference between the first record’s end-time and the second record’s start-time (the idle time). To tell rwcombine not to combine those records when the difference is greater than a limit, specify that value as the argument to the --max-idle-time switch.
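The steps above, including the optional idle-time limit, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the rwcombine source: the Flow class and function names are hypothetical, the grouping fields are collapsed into a single `key` tuple, times are plain seconds, and records are held entirely in memory. The source does not say which attributes the combined record carries, so the sketch assumes it keeps the first record's C flag and the last record's T flag. It also assumes that when a merge is not possible, scanning resumes from the next record in the bin.

```python
# Simplified model of the combining steps; names and layout are illustrative.
from dataclasses import dataclass


@dataclass
class Flow:
    key: tuple          # stand-in for sIP, dIP, sPort, dPort, protocol, ...
    stime: float        # start time in seconds
    etime: float        # end time in seconds
    packets: int
    byte_count: int
    initial_flags: int
    session_flags: int
    attributes: str     # any combination of "T" and "C"


def merge(first, second):
    """Step 4: fold `second` into `first`."""
    # Assumption: keep the first record's C flag and the second record's T flag.
    attrs = ("T" if "T" in second.attributes else "") + \
            ("C" if "C" in first.attributes else "")
    return Flow(first.key, first.stime, second.etime,
                first.packets + second.packets,
                first.byte_count + second.byte_count,
                first.initial_flags,
                first.session_flags | second.session_flags | second.initial_flags,
                attrs)


def combine(flows, max_idle_time=None):
    output, bins = [], {}
    for flow in flows:                      # Step 1: pass unflagged records through
        if "T" in flow.attributes or "C" in flow.attributes:
            bins.setdefault(flow.key, []).append(flow)   # Step 2: bin by key
        else:
            output.append(flow)
    for records in bins.values():
        # Step 3: order by start time, then by elapsed time.
        records.sort(key=lambda f: (f.stime, f.etime - f.stime))
        current = records[0]
        for nxt in records[1:]:             # Steps 4 to 6
            idle = nxt.stime - current.etime
            if ("T" in current.attributes and "C" in nxt.attributes
                    and (max_idle_time is None or idle <= max_idle_time)):
                current = merge(current, nxt)
            else:
                output.append(current)
                current = nxt
        output.append(current)              # Step 7: emit what the bin holds
    return output
```

Carrying the T flag forward on the merged record is what lets the loop chain a third and later records, as described in Steps 5 and 6.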

To see information on the number of flows combined and the minimum and maximum idle times, specify the --print-statistics switch.

During its processing, rwcombine will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.

By default, the temporary files are stored in the /tmp directory. Because the sizes of the temporary files may be large, it is strongly recommended that /tmp not be used as the temporary directory, and rwcombine will print a warning when /tmp is used. To modify the temporary directory used by rwcombine, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
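The choice of temporary directory follows a simple precedence, which the description of the --temp-directory switch spells out: the switch, then SILK_TMPDIR, then TMPDIR, then /tmp. A small sketch of that lookup is shown here; the function name is hypothetical, and treating an empty value as unset is an assumption rather than documented behavior.

```python
import os


def resolve_temp_directory(switch_value=None, environ=None):
    """Pick the temporary directory using the documented precedence."""
    env = os.environ if environ is None else environ
    # Assumption: empty strings are treated the same as unset values.
    return (switch_value
            or env.get("SILK_TMPDIR")
            or env.get("TMPDIR")
            or "/tmp")
```

Passing `environ` explicitly makes the lookup easy to check without touching the real environment.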

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--actions=ACTIONS

Select the type of action(s) that rwcombine should take to combine the input records. The default action is all, and the following actions are supported:

all

Perform all the actions described below.

timeout

Combine into a single flow record those records where the timeout flags in the attributes field indicate that the flow exporter has divided a long-lived session into multiple flow records.

This switch is provided for future expansion of rwcombine, since at present rwcombine supports a single action. When writing a script that uses rwcombine, specify --actions=timeout for compatibility with future versions of rwcombine.

--ignore-fields=FIELDS

Ignore the fields listed in FIELDS when determining if two flow records should be grouped into the same bin; that is, treat FIELDS as being identical across all flows. By default, rwcombine puts records into a bin when the records have identical values for the following fields: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

 --ignore-fields=sensor,12-15

The supported fields are:

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

sensor,12

name or ID of sensor at the collection point

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

class,20,type,21

class and type of sensor at the collection point (represented internally by a single value)

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

--max-idle-time=NUM

Do not combine flow records when the second flow record starts more than NUM seconds after the end time of the first flow record. NUM may be fractional. If not specified, the maximum idle time may be considered infinite.

--print-statistics
--print-statistics=FILENAME

Print to the standard error or to the specified FILENAME the number of flow records read and written, the number of flows that did not require combining, the number of flows combined, the number that could not be combined, and the minimum and maximum idle time between combined flow records.

--temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.

--buffer-size=SIZE

Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwcombine will allocate, since additional buffers will be allocated for reading the input and writing the output.)
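To make the SIZE syntax concrete, here is a small sketch that converts such a string to a byte count using the multipliers given above. It is not the parser rwcombine uses: the function name is hypothetical, and accepting only uppercase suffixes and rejecting fractional values without a suffix are assumptions drawn from the wording of the description.

```python
import re

_MULTIPLIER = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_buffer_size(text):
    """Convert a SIZE string such as '1.5K' or '4096' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d*)?|\.\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    if not suffix:
        # Without a suffix the value must be an ordinary integer.
        return int(number)
    return int(float(number) * _MULTIPLIER[suffix])
```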

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcombine exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcombine to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
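The selection rules above may be easier to follow as a decision procedure. The Python sketch below is a model of the text, not SiLK code: the function and parameter names are hypothetical, raising an error for an unavailable method named on the command line is an assumption, and so is falling back to no compression when best finds no library.

```python
# Illustrative model of --compression-method resolution; not SiLK code.
def choose_compression(switch, env, available, compiled_default, to_file):
    """switch: value of --compression-method or None.
    env: mapping of environment variables.
    available: set of compression libraries found at compile time.
    compiled_default: default method chosen when SiLK was compiled.
    to_file: True when the destination is a regular file."""
    valid = set(available) | {"none", "best"}
    method = switch
    if method is not None and method not in valid:
        raise ValueError("unavailable method: " + method)  # assumption
    if method is None:
        env_value = env.get("SILK_COMPRESSION_METHOD")
        if env_value in valid:
            method = env_value  # used only when it names a usable method
    if method is None:
        # Nothing specified: compress files only, with the compiled default.
        return compiled_default if to_file else "none"
    if method == "best":
        if not to_file:
            return "none"
        for candidate in ("lzo1x", "snappy", "zlib"):
            if candidate in available:
                return candidate
        return "none"  # assumption: no library available
    # An explicit zlib, lzo1x, or snappy compresses every destination.
    return method
```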

--print-filenames

Print to the standard error the names of input files as they are opened.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcombine searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcombine opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the description and alias(es) of each field and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The output from rwcut(1) shows the flow exporter split this long-lived ssh session into multiple flow records:

 $ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \  
   | rwcut --fields=flags,attributes,stime,etime  
    flags|attribut|                  sTime|                  eTime|  
  S PA   |T       |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|  
    PA   |TC      |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|  
    PA   |TC      |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|  
    PA   |TC      |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|  
 F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|

Here is the other half of that conversation:

 $ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \  
   | rwcut --fields=flags,attributes,stime,etime  
    flags|attribut|                  sTime|                  eTime|  
  S PA   |T       |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|  
    PA   |TC      |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|  
    PA   |TC      |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|  
    PA   |TC      |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|  
 F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|

Use rwuniq(1) to compute the byte and packet counts for that ssh session:

 $ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \  
   | rwuniq --fields=sip,dip,sport,dport --values=records,byte,packets  
             sIP|            dIP|sPort|dPort|Records|  Bytes|Packets|  
   10.11.156.107|192.168.126.252|   22|28975|      5|4677240|   3881|  
 192.168.126.252|  10.11.156.107|28975|   22|      5| 281939|   3891|

Invoke rwcombine on these records and store the result in the file combined.rw:

 $ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \  
   | rwcombine --print-statistics --output-path=combined.rw  
 FLOW RECORD COUNTS:  
 Read:                                    10  
 Initially Complete:           -           0 *  
 Sorted & Examined:            =          10  
 Missing end:                  -           0 *  
 Missing start & end:          -           0 *  
 Missing start:                -           0 *  
 Prior to combining:           =          10  
 Eliminated:                   -           8  
 Made complete:                =           2 *  
 Written:                                  2 (sum of *)

 IDLE TIMES:  
 Minimum:        0:00:00:00.000  
 Penultimate:    0:00:00:00.000  
 Maximum:        0:00:00:00.003

View the resulting records:

 $ rwcut --fields=sip,dip,sport,dport,bytes,packets,flags combined.rw  
             sIP|            dIP|sPort|dPort|  bytes|packets|   flags|  
   10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |  
 192.168.126.252|  10.11.156.107|28975|   22| 281939|   3891|FS PA   |

 $ rwcut --fields=sip,attributes,stime,etime combined.rw  
             sIP|attribut|                  sTime|                  eTime|  
   10.11.156.107|        |2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|  
 192.168.126.252|        |2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|

ENVIRONMENT

SILK_TMPDIR

When set and --temp-directory is not specified, rwcombine writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwcombine writes the temporary files it creates to this directory.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcombine may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcombine may use this environment variable. See the FILES section for details.

SILK_TEMPFILE_DEBUG

When set to 1, rwcombine prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

${SILK_TMPDIR}/
${TMPDIR}/
/tmp/

Directory in which to create temporary files.

SEE ALSO

rwfilter(1), rwcut(1), rwuniq(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)

NOTES

The first release of rwcombine occurred in SiLK 3.9.0.

rwcompare

Compare the records in two SiLK Flow files

SYNOPSIS

  rwcompare [--quiet] [--site-config-file=FILENAME] FILE1 FILE2

  rwcompare --help

  rwcompare --version

DESCRIPTION

rwcompare opens the two files named on the command line and compares the SiLK Flow records they contain. If the records are identical, rwcompare exits with status 0. If any of the records differ, rwcompare prints a message and exits with status 1. If there is a problem reading either file, rwcompare prints an error and exits with status 2. Use the --quiet switch to suppress all output (error messages included). You may use - or stdin for one of the file names, in which case rwcompare reads from the standard input.
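The comparison can be modeled as a record-by-record walk over both inputs. The Python sketch below is an illustration, not the rwcompare implementation: compare_records is a hypothetical name, records are stood in for by ordinary Python values, the message wording and 1-based record numbers are inferred from the sample output in the EXAMPLES section, and exit status 2 (unreadable input) is not modeled.

```python
# Illustrative model of rwcompare's comparison; not SiLK code.
from itertools import zip_longest

_END = object()  # marks an input that ran out of records


def compare_records(name1, records1, name2, records2):
    """Return (exit_status, message) for two record sequences."""
    pairs = zip_longest(records1, records2, fillvalue=_END)
    for number, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is _END:
            # The first input ended before the second.
            return 1, f"{name1} {name2} differ: EOF {name1}"
        if rec2 is _END:
            return 1, f"{name1} {name2} differ: EOF {name2}"
        if rec1 != rec2:
            return 1, f"{name1} {name2} differ: record {number}"
    return 0, ""  # same records in the same order
```

Because the walk stops at the first mismatch, two inputs holding the same records in a different order are reported as different.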

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--quiet

Do not print a message if the files differ, and do not print an error message if a file cannot be opened or read.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcompare searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. Some input lines are split over multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines. The examples assume the existence of the file data.rw that contains SiLK Flow records. The exit status of the most recent command is available in the shell variable $?.

Compare a file with itself:

 $ rwcompare data.rw data.rw  
 $ echo $?  
 0

Compare a file with itself, where one instance of the file is read from the standard input:

 $ rwcat data.rw | rwcompare - data.rw  
 $ echo $?  
 0

Use rwsort(1) to modify one instance of the file and compare the results:

 $ rwsort --fields=proto data.rw | rwcompare - data.rw  
 - data.rw differ: record 1  
 $ echo $?  
 1

Run the command again and use the --quiet switch:

 $ rwsort --fields=proto data.rw | rwcompare --quiet - data.rw  
 $ echo $?  
 1

Compare the file with input containing two copies of the file:

 $ rwcat data.rw data.rw | rwcompare data.rw -  
 data.rw - differ: EOF data.rw  
 $ echo $?  
 1

Compare the file with /dev/null:

 $ rwcompare --quiet /dev/null data.rw  
 $ echo $?  
 2

rwcompare checks whether two files have the same records in the same order. To compare two arbitrary files, use rwsort(1) to reorder the records. Make certain to provide enough fields to the rwsort command so that the records are in the same order.

 $ rwsort --fields=1-10,12-15,20-29 data.rw > /tmp/sorted-data.rw  
 $ rwsort --fields=1-10,12-15,20-29 other-data.rw   \  
   | rwcompare /tmp/sorted-data.rw -  
 /tmp/sorted-data.rw - differ: record 103363

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcompare may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcompare may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwfileinfo(1), rwcat(1), rwsort(1), silk(7)

rwcount

Print traffic summary across time

SYNOPSIS

  rwcount [--bin-size=SIZE] [--load-scheme=LOADSCHEME]  
        [--start-time=START_TIME] [--end-time=END_TIME]  
        [--skip-zeroes] [--bin-slots] [--epoch-slots]  
        [--timestamp-format=FORMAT] [--no-titles]  
        [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcount --help

  rwcount --version

DESCRIPTION

rwcount summarizes SiLK flow records across time. It counts the records in the input stream, and groups their byte and packet totals into time bins. rwcount produces textual output with one row for each bin.

rwcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

rwcount splits each flow record into bins whose size is determined by the argument to the --bin-size switch. When that switch is not provided, rwcount uses 30-second bins by default.

By default, the first row of data rwcount prints is the bin containing the starting time of the earliest record that appears in the input. rwcount then prints a row for every bin until it reaches the bin containing the most recent ending time. Rows whose counts are zero are printed unless the --skip-zeroes switch is specified.

The --start-time and --end-time switches tell rwcount to use a specific time for the first row and the final row. The --start-time switch always sets the time stamp on the first bin to the specified time. With the --end-time switch, rwcount computes a maximum end-time by setting any unspecified hour, minute, second, and millisecond field to its maximum value, and the final bin is that which contains the maximum end-time.

When --start-time and --end-time are both specified, rwcount reserves the memory for the bins before it begins processing the records. If the memory cannot be allocated, rwcount exits. If this happens, try reducing the time span or increasing the bin-size.

Load Scheme

A router or other flow generator summarizes the traffic it sees into records. In addition to the five-tuple (source port and address, destination port and address, and protocol), the record has its start time, end time, total byte count, and total packet count. There is no way to know how the bytes and packets were distributed during the duration of the record: their distribution could be front-loaded, back-loaded, uniform, et cetera.

When the start and end times of an individual flow record put that record into a single bin, rwcount can simply add that record’s volume (byte and packet counts) to the bin.

When the duration of a flow record causes it to span multiple bins, rwcount must be told how to allocate the volume among the bins. The --load-scheme switch determines this, and it supports the following allocation schemes:

time-proportional

Divides the total volume of the flow by the duration of the flow, and multiplies the quotient by the time spent in the bin. Thus, the volume the flow contributes to a bin is proportional to the time the flow spent in the bin. This models a flow where the volume/second ratio is uniform.

bin-uniform

Divides the volume of the flow by the number of bins the flow spans, and adds the quotient to each of the bins. In this scheme, the volume/bin ratio is uniform.

start-spike

Adds the total volume for the flow into the bin containing the start time of the flow. This models a flow that is front-loaded to the point where the entire volume is a single spike occurring in the initial millisecond of the flow.

middle-spike

Determines the time at the midpoint of the flow, and adds the entire volume for the flow into the bin containing that time.

end-spike

Adds the total volume for the flow into the bin containing the end time of the flow. This models a flow that is back-loaded to the point where the entire volume is a single spike occurring in the final millisecond of the flow.

maximum-volume

Adds the entire volume for the flow into every bin that contains any part of the flow. In theory, the distribution of the bytes in the record could be a spike that occurs at any point during the flow’s duration. This scheme allows one to determine, in aggregate, the maximum possible volume that could have occurred during this bin. In this scheme, the Records column gives the number of records that were active during the bin.

minimum-volume

Acts as though the volume for the flow occurred in some other bin. It is possible that a record that spans multiple bins did not contribute any volume to the current bin. This scheme allows one to determine, in aggregate, the minimum possible volume that may have occurred during this bin. The Records column in this scheme, as in the maximum-volume scheme, gives the number of flow records that were active during the bin.

Be aware that the "spike" load-schemes allocate the entire flow to a single bin. This can create the impression that there is more traffic occurring during a particular time window than the physical network supports.

The maximum-volume and minimum-volume schemes are used to compute the maximum and minimum volumes that could have been transferred during any one bin. maximum-volume intentionally over-counts the flow volume and minimum-volume intentionally under-counts.

To see the effect of the various load-schemes, suppose rwcount is using 60-second bins and the input contains two records. The first record begins at 12:03:50, ends at 12:06:20, and contains 9,000 bytes (60 bytes/second for 150 seconds). This record may contribute to bins at 12:03, 12:04, 12:05, and 12:06. The second record begins at 12:04:05, lasts 15 seconds, and contains 200 bytes; this record always contributes its 200 bytes to the 12:04 bin. The --load-scheme option splits the byte-counts of the records as follows:

 BIN                 12:03:00    12:04:00    12:05:00    12:06:00

 time-proportional        600        3800        3600        1200  
 bin-uniform             2250        2450        2250        2250  
 start-spike             9000         200           0           0  
 middle-spike               0         200        9000           0  
 end-spike                  0         200           0        9000  
 maximum-volume          9000        9200        9000        9000  
 minimum-volume             0         200           0           0

For the record that spans multiple bins: the time-proportional scheme assumes 60 bytes/second, the bin-uniform scheme divides the volume evenly among the four bins, the start-spike and end-spike schemes put all the volume at 12:03:50 and 12:06:20 respectively, the middle-spike scheme assumes all the volume occurs at 12:05:05, the maximum-volume scheme adds the volume to every bin, and the minimum-volume scheme ignores the record.
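The byte counts in the table can be reproduced with a small model of the seven load schemes. The Python sketch below is an illustration written from the descriptions in this section, not the rwcount implementation: all names are hypothetical, times are plain seconds since midnight, only a single volume (bytes) is tracked, the Records column is not modeled, and the handling of a flow that ends exactly on a bin boundary is an assumption.

```python
# Illustrative model of the rwcount load schemes; not SiLK code.
def hms(hour, minute, second):
    """Seconds since midnight."""
    return hour * 3600 + minute * 60 + second


def allocate(start, end, volume, bin_size, scheme):
    """Return {bin_start: share_of_volume} for one flow record."""
    first = int(start // bin_size)
    last = int(end // bin_size)
    bins = range(first, last + 1)
    if first == last:
        # The flow fits in one bin, so every scheme adds the full volume.
        return {first * bin_size: volume}
    if scheme == "time-proportional":
        rate = volume / (end - start)  # volume per second
        return {b * bin_size:
                rate * (min(end, (b + 1) * bin_size) - max(start, b * bin_size))
                for b in bins}
    if scheme == "bin-uniform":
        return {b * bin_size: volume / len(bins) for b in bins}
    if scheme == "maximum-volume":
        return {b * bin_size: volume for b in bins}
    if scheme == "minimum-volume":
        return {b * bin_size: 0 for b in bins}
    spikes = {"start-spike": start,
              "middle-spike": (start + end) / 2,
              "end-spike": end}
    if scheme not in spikes:
        raise ValueError("unknown load scheme: " + scheme)
    target = int(spikes[scheme] // bin_size)
    return {b * bin_size: (volume if b == target else 0) for b in bins}


def bin_totals(flows, bin_size, scheme):
    """Sum the per-bin shares of every (start, end, volume) flow."""
    totals = {}
    for start, end, volume in flows:
        shares = allocate(start, end, volume, bin_size, scheme)
        for bin_start, share in shares.items():
            totals[bin_start] = totals.get(bin_start, 0) + share
    return totals


# The two records from the example above.
FLOWS = [(hms(12, 3, 50), hms(12, 6, 20), 9000),
         (hms(12, 4, 5), hms(12, 4, 20), 200)]
BINS = [hms(12, 3, 0), hms(12, 4, 0), hms(12, 5, 0), hms(12, 6, 0)]


def table_row(scheme):
    """Byte totals for the 12:03, 12:04, 12:05, and 12:06 bins."""
    totals = bin_totals(FLOWS, 60, scheme)
    return [totals.get(b, 0) for b in BINS]


for name in ("time-proportional", "bin-uniform", "start-spike",
             "middle-spike", "end-spike", "maximum-volume", "minimum-volume"):
    print(f"{name:<18}", *(f"{v:>8.0f}" for v in table_row(name)))
```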

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--bin-size=SIZE

Denote the size of each time bin, in seconds; defaults to 30 seconds. rwcount supports millisecond size bins; SIZE may be a floating point value equal to or greater than 0.001.

--load-scheme=LOADSCHEME

Specify how a flow record that spans multiple bins allocates its bytes and packets among the bins. The default scheme is time-proportional, which assumes the volume/second ratio of the flow record is constant. See the Load Scheme section for additional information on the load-scheme choices. The LOADSCHEME may be one of the following names or numbers; names may be abbreviated to the shortest prefix that is unique.

time-proportional,4

Allocate the volume in proportion to the amount of time the flow spent in the bin.

bin-uniform,0

Allocate the volume evenly across the bins that contain any part of the flow’s duration.

start-spike,1

Allocate the entire volume to the bin containing the start time of the flow.

middle-spike,3

Allocate the entire volume to the bin containing the time at the midpoint of the flow.

end-spike,2

Allocate the entire volume to the bin containing the end time of the flow.

maximum-volume,5

Allocate the entire volume to all of the bins containing any part of the flow.

minimum-volume,6

Allocate the flow’s volume to a bin only if the flow is completely contained within the bin; otherwise ignore the flow.

--start-time=START_TIME

Set the time of the first bin to START_TIME. When this switch is not given, the first bin is one that holds the starting time of the earliest record. The START_TIME may be specified in a format of yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] (or T may be used in place of : to separate the day and hour). The time must be specified to at least day precision, and unspecified hour, minute, second, and millisecond values are set to zero. Whether the date strings represent times in UTC or the local timezone depends on how SiLK was compiled, which can be determined from the Timezone support setting in the output from rwcount --version. Alternatively, the time may be specified as seconds since the UNIX epoch, and an unspecified milliseconds value is set to 0.

--end-time=END_TIME

Set the time of the final bin to END_TIME. When this switch is not given, the final bin is one that holds the ending time of the latest record. The format of END_TIME is the same as that for START_TIME. Unspecified hour, minute, second, and millisecond values are set to 23, 59, 59, and 999 respectively. When END_TIME is specified as seconds since the UNIX epoch, an unspecified milliseconds value is set to 999. When both --start-time and --end-time are used, the END_TIME is adjusted so that the final bin represents a complete interval.
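The way unspecified fields are filled in for START_TIME and END_TIME, and the resulting number of bins, can be sketched in Python. This is an illustration under stated assumptions, not SiLK code: the names parse_time and bin_count are hypothetical, only the date-string form is handled (not epoch seconds), times are treated as UTC, and the millisecond part is required to have exactly three digits.

```python
# Illustrative model of START_TIME/END_TIME parsing; not SiLK code.
import re
from datetime import datetime, timedelta, timezone

_TIME = re.compile(
    r"^(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{3}))?)?)?)?$")


def parse_time(text, is_end=False):
    """Parse yyyy/mm/dd[:HH[:MM[:SS[.sss]]]]; T may replace the first colon."""
    match = _TIME.match(text)
    if match is None:
        raise ValueError("unrecognized time: " + text)
    year, month, day = (int(g) for g in match.groups()[:3])
    # Unspecified fields become zero for a start time and their
    # maximum values (23, 59, 59, 999) for an end time.
    defaults = (23, 59, 59, 999) if is_end else (0, 0, 0, 0)
    hour, minute, second, msec = (
        int(g) if g is not None else d
        for g, d in zip(match.groups()[3:], defaults))
    return datetime(year, month, day, hour, minute, second,
                    msec * 1000, tzinfo=timezone.utc)


def bin_count(start, end_max, bin_size):
    """Number of bins from start through the bin containing end_max."""
    if end_max < start:
        raise ValueError("end time precedes start time")
    return (end_max - start) // timedelta(seconds=bin_size) + 1


start = parse_time("2009/02/12")
end_max = parse_time("2009/02/12", is_end=True)
print(bin_count(start, end_max, 3600))  # one day of hourly bins
```

Multiplying the bin count by the per-bin storage gives a rough idea of why a long time span combined with a small bin size can exhaust memory.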

--skip-zeroes

Disable printing of bins with no traffic. By default, all bins are printed.

--bin-slots

Use the internal bin index as the label for each bin in the output; the default is to label each bin with the time in a human-readable format.

--epoch-slots

Use the UNIX epoch time (number of seconds since midnight UTC on 1970-01-01) as the label for each bin in the output; the default is to label each bin with the time in a human-readable format. This switch is equivalent to --timestamp-format=epoch. This switch is deprecated as of SiLK 3.11.0, and it will be removed in the SiLK 4.0 release.

--timestamp-format=FORMAT

Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
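The four formats can be compared side by side with a short Python sketch. It is only an illustration of the patterns listed above: format_timestamp is a hypothetical name, and the sketch always works in UTC rather than modeling the utc and local timezone choices.

```python
# Illustrative rendering of the --timestamp-format choices; not SiLK code.
from datetime import datetime, timezone

PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(when, fmt="default"):
    """Render a timezone-aware datetime in one of the named formats."""
    if fmt == "epoch":
        # Whole seconds since 00:00:00 UTC on 1970-01-01.
        return str(int(when.timestamp()))
    return when.strftime(PATTERNS[fmt])


stamp = datetime(2009, 2, 12, 0, 30, 0, tzinfo=timezone.utc)
for name in ("default", "iso", "m/d/y", "epoch"):
    print(name, format_timestamp(stamp, name))
```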

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
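The interaction of --no-columns, --column-separator, --no-final-delimiter, and --delimited can be sketched as a row formatter. The Python below is an illustration, not SiLK code: the function names are hypothetical, and the column widths (19, 13, 15, 13) are an assumption inferred from the sample output in the EXAMPLES section.

```python
# Illustrative model of rwcount's row formatting switches; not SiLK code.
WIDTHS = (19, 13, 15, 13)  # assumed widths: Date, Records, Bytes, Packets


def format_row(values, widths=WIDTHS, no_columns=False,
               column_separator="|", no_final_delimiter=False):
    """Join one row of text values the way the switches describe."""
    if no_columns:
        cells = [str(v) for v in values]  # no padding
    else:
        cells = [str(v).rjust(w) for v, w in zip(values, widths)]
    line = column_separator.join(cells)
    if not no_final_delimiter:
        line += column_separator  # separator after the final column
    return line


def format_delimited(values, char="|"):
    """Equivalent of --no-columns --no-final-delimiter --column-sep=char."""
    return format_row(values, no_columns=True,
                      column_separator=char, no_final_delimiter=True)


row = ["2009/02/12T00:00:00", "1490.49", "578270918.16", "463951.55"]
print(format_row(row))
print(format_delimited(row, ","))
```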

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcount’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
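The precedence among --pager, SILK_PAGER, and PAGER can be summarized as a small decision function. The Python sketch below follows the rules stated above and in the ENVIRONMENT section; choose_pager and its parameters are hypothetical names, not part of SiLK.

```python
# Illustrative model of pager selection; not SiLK code.
def choose_pager(pager_switch, env, output_path_given, stdout_is_terminal):
    """Return the pager program to run, or None for no paging.

    pager_switch: value of --pager, or None when the switch is absent.
    env: mapping of environment variables."""
    if output_path_given or not stdout_is_terminal:
        return None  # paging applies only to terminal output
    if pager_switch is not None:
        pager = pager_switch        # --pager overrides SILK_PAGER
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]   # SILK_PAGER overrides PAGER
    else:
        pager = env.get("PAGER", "")
    return pager or None            # an empty string disables paging
```

Note that setting SILK_PAGER to the empty string disables paging even when PAGER is set.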

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcount searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps
--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--start-epoch=START_TIME

Alias for the --start-time switch. This switch is deprecated as of SiLK 3.8.0.

--end-epoch=END_TIME

Alias for the --end-time switch. This switch is deprecated as of SiLK 3.8.0.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To count all web traffic on Feb 12, 2009, into 1-hour bins:

 $ rwfilter --pass=stdout --start-date=2009/02/12:00        \  
        --end-date=2009/02/12:23 --proto=6 --aport=80       \  
   | rwcount --bin-size=3600  
                Date|      Records|          Bytes|      Packets|  
 2009/02/12T00:00:00|      1490.49|   578270918.16|    463951.55|  
 2009/02/12T01:00:00|      1459.33|   596455716.52|    457487.80|  
 2009/02/12T02:00:00|      1529.06|   562602842.44|    451456.41|  
 2009/02/12T03:00:00|      1503.89|   562683116.38|    455554.81|  
 2009/02/12T04:00:00|      1561.89|   590554569.78|    489273.81|  
 ...

To bin the records according to their start times, use the --load-scheme switch:

 $ rwfilter ... --pass=stdout       \  
   | rwcount --bin-size=3600 --load-scheme=1  
                Date|      Records|          Bytes|      Packets|  
 2009/02/12T00:00:00|      1494.00|   580350969.00|    464952.00|  
 2009/02/12T01:00:00|      1462.00|   596145212.00|    457871.00|  
 2009/02/12T02:00:00|      1526.00|   561629416.00|    451088.00|  
 2009/02/12T03:00:00|      1502.00|   563500618.00|    455262.00|  
 2009/02/12T04:00:00|      1562.00|   589265818.00|    489279.00|  
 ...

To bin the records by their end times:

 $ rwfilter ... --pass=stdout       \  
   | rwcount --bin-size=3600 --load-scheme=2  
                Date|      Records|          Bytes|      Packets|  
 2009/02/12T00:00:00|      1488.00|   577132372.00|    463393.00|  
 2009/02/12T01:00:00|      1458.00|   596956697.00|    457376.00|  
 2009/02/12T02:00:00|      1530.00|   562806395.00|    451551.00|  
 2009/02/12T03:00:00|      1506.00|   562101791.00|    455671.00|  
 2009/02/12T04:00:00|      1562.00|   591408602.00|    489371.00|  
 ...

To force the hourly bins to run from 30 minutes past the hour, use the --start-time switch:

 $ rwfilter ... --pass=stdout       \  
   | rwcount --bin-size=3600 --start-time=2009/02/12:00:30  
                Date|      Records|          Bytes|      Packets|  
 2009/02/12T00:30:00|      1483.26|   581251364.04|    456554.40|  
 2009/02/12T01:30:00|      1494.00|   575037453.00|    449280.00|  
 2009/02/12T02:30:00|      1486.36|   559700466.61|    447700.15|  
 2009/02/12T03:30:00|      1555.23|   588882400.58|    480724.48|  
 2009/02/12T04:30:00|      1537.79|   564756248.52|    472003.45|  
 ...

ENVIRONMENT

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcount automatically invokes this program to display its output a screen at a time.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcount may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcount may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcount --version.) The TZ environment variable is also used when rwcount parses the timestamp specified in the --start-time or --end-time switches if SiLK is built with local timezone support.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
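The search for the site configuration file can be sketched as a walk over the candidate paths listed above. This Python model is an illustration only: the function names are hypothetical, treating the listed order as the search order is an assumption, and so is skipping a candidate that does not exist.

```python
# Illustrative model of the silk.conf search; not SiLK code.
import os
import posixpath


def site_config_candidates(env):
    """Candidate paths, in the order listed in this FILES section."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(
            posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share", "silk", "silk.conf"))
        candidates.append(posixpath.join(root, "share", "silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


def find_site_config(switch_value, env, exists=os.path.exists):
    """Return the file named by --site-config-file, else the first
    candidate that exists, else None."""
    if switch_value:
        return switch_value
    for path in site_config_candidates(env):
        if exists(path):
            return path
    return None
```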

SEE ALSO

rwfilter(1), rwuniq(1), silk(7), tzset(3), environ(7)

BUGS

Unlike rwuniq(1), rwcount does not support counting the number of distinct IPs in a bin. However, using the --bin-time switch on rwuniq can provide time-based binning similar to what rwcount supports. Note that rwuniq always bins by each record’s start-time (similar to rwcount --load-scheme=1), and there is no support in rwuniq for dividing a SiLK record among multiple time bins.

rwcut

Print selected fields of binary SiLK Flow records

SYNOPSIS

  rwcut [{--fields=FIELDS | --all-fields}]  
        {[--start-rec-num=START_NUM] [--end-rec-num=END_NUM]  
         | [--tail-recs=TAIL_START_NUM]}  
        [--num-recs=REC_COUNT] [--dry-run] [--icmp-type-and-code]  
        [--timestamp-format=FORMAT] [--epoch-time]  
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]  
        [--integer-sensors] [--integer-tcp-flags]  
        [--no-titles] [--no-columns] [--column-separator=CHAR]  
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]  
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]  
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]  
        [--ipv6-policy={ignore,asv4,mix,force,only}]  
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]  
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]  
        [--python-file=PATH [--python-file=PATH ...]]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--pmap-column-width=NUM]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

  rwcut --version

DESCRIPTION

rwcut reads binary SiLK Flow records and prints the user-selected record attributes (or fields) to the terminal in a textual, bar-delimited (|) format. See the EXAMPLES section below for sample output.

rwcut reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcut reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The user may provide the --fields switch to select the record attributes to print. When --fields is not specified rwcut prints the source and destination IP address, source and destination port, protocol, packet count, byte count, TCP flags, start time, duration, end time, and the sensor name. The fields are printed in the order in which they occur in the --fields switch. Fields may be repeated.

A subset of the input records may be selected by using the --start-rec-num, --end-rec-num, --num-recs, and --tail-recs switches.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

FIELDS contains the list of flow attributes (a.k.a. fields or columns) to print. The columns will be displayed in the order the fields are specified. Fields may be repeated. FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

 --fields=stime,10,1-5

If the --fields switch is not given, FIELDS defaults to:

 sIP,dIP,sPort,dPort,protocol,packets,bytes,flags,sTime,dur,eTime,sensor
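
As an illustration of how a FIELDS value is read, the following Python sketch expands a list of names, integers, and ranges into the order in which the columns would print. The function name expand_fields is hypothetical and is not part of SiLK; the sketch does not check names against the field list, and it lower-cases names only so that sTime and stime compare equal.

```python
def expand_fields(spec):
    """Expand a FIELDS string into an ordered list of names and integers.

    Hypothetical helper for illustration only; rwcut does its own parsing.
    """
    columns = []
    for token in spec.split(","):
        token = token.strip()
        parts = token.split("-", 1)
        if token.isdigit():
            # A single field-integer such as "10".
            columns.append(int(token))
        elif len(parts) == 2 and all(part.isdigit() for part in parts):
            # An inclusive range of field-integers such as "1-5".
            columns.extend(range(int(parts[0]), int(parts[1]) + 1))
        else:
            # A field-name; names are case-insensitive, so normalize them.
            columns.append(token.lower())
    return columns


print(expand_fields("stime,10,1-5"))
```

For the example value stime,10,1-5, the sketch yields the start time, then field 10, then fields 1 through 5, which is the left-to-right column order.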

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow in millisecond resolution

duration,10

duration of flow in millisecond resolution

eTime,11

end time of flow in millisecond resolution

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

sTime+msec,22

starting time of flow including milliseconds (milliseconds are always displayed); this field is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release

eTime+msec,23

end time of flow including milliseconds (milliseconds are always displayed); this field is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release

dur+msec,24

duration of flow including milliseconds (milliseconds are always displayed); this field is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release

iType

the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was introduced in SiLK 3.8.1.

iCode

the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at iType.

icmpTypeCode,25

equivalent to iType,iCode. This field is deprecated as of SiLK 3.8.1.

Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field’s value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout because the flow generator creates a flow for a connection that still has activity.) The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T, indicating that it hit the timeout. The second through next-to-last records will be marked with TC, indicating that each of these flows both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
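
The marking pattern described for the ssh session can be written out as a short Python sketch. The function continuation_attributes is a hypothetical name used only for illustration; it models the T and C marks alone and assumes that a session which fits in a single record carries neither mark.

```python
def continuation_attributes(record_count):
    """Return the timeout/continuation marks for each record of one session."""
    if record_count < 1:
        raise ValueError("a session produces at least one flow record")
    if record_count == 1:
        # Assumption: no timeout was hit, so neither T nor C is set.
        return [""]
    marks = ["T"]                          # first record hit the timeout
    marks += ["TC"] * (record_count - 2)   # middle records: timed out and continued
    marks.append("C")                      # final record: continuation only
    return marks


print(continuation_attributes(4))
```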

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:

sType,16

for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).

dType,17

as sType for the destination IP address

scc,18

for the source IP address, a two-letter country code abbreviation denoting the country where that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The abbreviations are those used by the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other

dcc,19

as scc for the destination IP

src-MAPNAME

label determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the --pmap-file switch below and the pmapfilter(3) manual page.

dst-MAPNAME

as src-MAPNAME for the destination IP or protocol/destination-port.

sval
dval

These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.

Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.

--all-fields

Instruct rwcut to print all known fields. This switch may not be combined with the --fields switch. This switch suppresses error messages from the plug-ins.

--plugin=PLUGIN

Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwcut will attempt to find a file named PLUGIN in the directories listed in the FILES section. If rwcut finds the file, it uses that path. If PLUGIN contains a slash or if rwcut does not find the file, rwcut relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins.

--start-rec-num=START_NUM

Begin printing with the START_NUM’th record by skipping the first START_NUM-1 records. The default is 1; that is, to start printing at the first record; START_NUM must be a positive integer. If START_NUM is greater than the number of input records, rwcut only outputs the title. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--end-rec-num=END_NUM

Stop printing after the END_NUM’th record. When END_NUM is 0, the default, printing stops once all input records have been printed; that is, END_NUM is effectively infinity. If this value is non-zero, it must not be less than START_NUM. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--tail-recs=TAIL_START_NUM

Begin printing once rwcut is TAIL_START_NUM records from the end of the input stream, where TAIL_START_NUM is a positive integer. rwcut will print the remaining records in the input stream unless --num-recs is also specified and is less than TAIL_START_NUM. The --tail-recs switch is similar to the --start-rec-num switch except it counts from the end of the input stream. This switch may not be combined with the --start-rec-num and --end-rec-num switches. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--num-recs=REC_COUNT

Print no more than REC_COUNT records. Specifying a REC_COUNT of 0 will print all records, which is the default. This switch is ignored under the following conditions: When both --start-rec-num and --end-rec-num are specified; when only --end-rec-num is given and END_NUM is less than REC_COUNT; when --tail-recs is specified and TAIL_START_NUM is less than REC_COUNT. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--dry-run

Causes rwcut to print the column headers and exit. Useful for testing.

--icmp-type-and-code

Unlike TCP or UDP, ICMP messages do not use ports, but instead have types and codes. Specifying this switch will cause rwcut to print, for ICMP records, the message’s type and code in the sPort and dPort columns, respectively. Use of this switch has been discouraged since SiLK 0.9.10. As of SiLK 3.8.1, this switch is deprecated and it will be removed in SiLK 4.0; use the iType and iCode fields instead.

--timestamp-format=FORMAT

Specify the format, timezone, and/or modifier to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and modifier. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a modifier. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.

One modifier is available:

no-msec

Truncate the milliseconds value on the timestamps and on the duration field. When milliseconds are truncated, the sum of the printed start time and duration may not equal the printed end time.
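
The formats and the no-msec modifier can be compared side by side with a small Python sketch. It is not rwcut's code: format_timestamp is a hypothetical helper, it always renders in UTC, and appending the milliseconds to the epoch value is an assumption (the text above only states that epoch is a number of seconds).

```python
import calendar
from datetime import datetime, timedelta, timezone

# strftime patterns for the three calendar formats named above.
PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(moment, fmt="default", msec=True):
    """Render a timezone-aware datetime in UTC using one of the named formats."""
    moment = moment.astimezone(timezone.utc)
    if fmt == "epoch":
        text = str(calendar.timegm(moment.timetuple()))
    else:
        text = moment.strftime(PATTERNS[fmt])
    if msec:
        # Milliseconds are truncated from the microseconds, never rounded.
        text += ".%03d" % (moment.microsecond // 1000)
    return text


start = datetime(2003, 1, 1, 0, 0, 14, 625000, tzinfo=timezone.utc)
end = start + timedelta(milliseconds=3959)
for name in ("default", "iso", "m/d/y", "epoch"):
    print(name, format_timestamp(start, name))
print(format_timestamp(start, msec=False), format_timestamp(end, msec=False))
```

With a start time of 00:00:14.625 and a duration of 3.959 seconds, the end time is 00:00:18.584. Once milliseconds are truncated the printed values are 14, 3, and 18, so the printed start time plus the printed duration no longer equals the printed end time.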

--epoch-time

Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--ip-format=FORMAT

Specify how IP addresses are printed. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a format. If it is empty or contains an invalid format, IPs are printed in the canonical format. The FORMAT is one of:

canonical

Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96 will be printed as a mixture of IPv6 and IPv4.

zero-padded

Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force, the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.

decimal

Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 2130706433 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed as 7f000001 and 20010db8000000000000000000000001, respectively.

force-ipv6

Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4 address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
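
The sample values given for each format can be reproduced with Python's standard ipaddress module. This is a sketch for checking one's understanding of the formats, not rwcut's implementation; format_ip is a hypothetical helper, and the IPv4-mapped case is written out by hand so the result never uses IPv4 notation.

```python
import ipaddress


def format_ip(text, fmt="canonical"):
    """Render an IPv4 or IPv6 address in one of the --ip-format styles."""
    addr = ipaddress.ip_address(text)
    is_v4 = addr.version == 4
    if fmt == "canonical":
        return str(addr)
    if fmt == "zero-padded":
        if is_v4:
            # Pad every octet to three digits.
            return ".".join("%03d" % octet for octet in addr.packed)
        return addr.exploded
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return "%x" % int(addr)
    if fmt == "force-ipv6":
        # Map IPv4 into ::ffff:0:0/96 and print it as pure hexadecimal.
        mapped = addr if is_v4 else addr.ipv4_mapped
        if mapped is not None:
            value = int(mapped)
            return "::ffff:%x:%x" % (value >> 16, value & 0xFFFF)
        return str(addr)
    raise ValueError("unknown format: %r" % fmt)


for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
    print(name, format_ip("127.0.0.1", name), format_ip("2001:db8::1", name))
```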

--integer-ips

Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--zero-pad-ips

Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--integer-sensors

Print the integer ID of the sensor rather than its name.

--integer-tcp-flags

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.
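
To relate the two representations, the Python sketch below converts the letter form into the integer form using the standard TCP header bit positions (FIN is the lowest bit, CWR the highest). The helper name tcp_flags_to_int is hypothetical; spaces are skipped because the letter form shown in the EXAMPLES section contains them.

```python
# Standard TCP header flag bits, keyed by the letters rwcut prints.
TCP_FLAG_BITS = {
    "F": 0x01,  # FIN
    "S": 0x02,  # SYN
    "R": 0x04,  # RST
    "P": 0x08,  # PSH
    "A": 0x10,  # ACK
    "U": 0x20,  # URG
    "E": 0x40,  # ECE
    "C": 0x80,  # CWR
}


def tcp_flags_to_int(text):
    """Combine flag letters such as 'FS PA' into a single integer."""
    value = 0
    for char in text:
        if char == " ":
            continue  # padding between letters carries no information
        value |= TCP_FLAG_BITS[char.upper()]
    return value


print(tcp_flags_to_int("FS PA"))
```

For the flags value FS PA, the bits are 1 + 2 + 8 + 16, giving 27.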

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
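
The interaction of the column switches can be modeled in a few lines of Python. The sketch is only an approximation of the layout: format_row and format_delimited are hypothetical helpers, the column widths are taken from the sample output in the EXAMPLES section, and right-justification is assumed for every field (it matches the address, port, and protocol columns of that sample).

```python
def format_row(values, widths, columns=True, final_delimiter=True, sep="|"):
    """Lay out one row of text values the way the column switches describe."""
    if columns:
        # Fixed-width columnar output: pad each value to its column width.
        cells = [value.rjust(width) for value, width in zip(values, widths)]
    else:
        # --no-columns: values are printed without padding.
        cells = list(values)
    line = sep.join(cells)
    if final_delimiter:
        line += sep  # dropped when --no-final-delimiter is given
    return line


def format_delimited(values, sep="|"):
    """Model of --delimited: no columns, no final delimiter, chosen separator."""
    return format_row(values, [], columns=False, final_delimiter=False, sep=sep)


row = ["10.30.30.31", "10.70.70.71", "80", "36761", "6"]
print(format_row(row, [15, 15, 5, 5, 3]))
print(format_delimited(row))
print(format_delimited(row, ","))
```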

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcut’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcut exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
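
The precedence among the switch and the two environment variables can be summarized as a Python sketch. choose_pager is a hypothetical function, not part of SiLK; it returns None when no paging would be performed.

```python
def choose_pager(pager_switch=None, environ=None, output_path=None,
                 is_terminal=True):
    """Return the pager program to invoke, or None for no paging."""
    environ = environ or {}
    if output_path is not None or not is_terminal:
        # --output-path was given, or output is not going to a terminal.
        return None
    if pager_switch is not None:
        pager = pager_switch                # --pager overrides everything
    elif "SILK_PAGER" in environ:
        pager = environ["SILK_PAGER"]       # overrides PAGER, even when empty
    else:
        pager = environ.get("PAGER", "")
    return pager or None                    # an empty string disables paging


print(choose_pager(None, {"SILK_PAGER": "", "PAGER": "less"}))
```

Note that an empty SILK_PAGER disables paging even when PAGER is set, which is why the sketch tests for the presence of SILK_PAGER rather than its value.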

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only records marked as IPv4 will be printed.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.

only

Print only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
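
The five policies can be sketched in Python with the standard ipaddress module. This is a simplified model, not rwcut's code: apply_ipv6_policy is a hypothetical helper, a record is reduced to its source and destination addresses, and the record is treated as IPv6 when its source address is IPv6.

```python
import ipaddress

MAPPED_BASE = 0xFFFF << 32  # integer value of ::ffff:0:0


def apply_ipv6_policy(src, dst, policy="mix"):
    """Return the (src, dst) pair to print, or None when the record is ignored."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    is_v6 = src.version == 6  # assumption: both addresses share one family
    if policy == "mix":
        return src, dst
    if policy == "ignore":
        return None if is_v6 else (src, dst)
    if policy == "only":
        return (src, dst) if is_v6 else None
    if policy == "asv4":
        if not is_v6:
            return src, dst
        # Convert only when both addresses lie in ::ffff:0:0/96.
        new_src, new_dst = src.ipv4_mapped, dst.ipv4_mapped
        if new_src is None or new_dst is None:
            return None
        return new_src, new_dst
    if policy == "force":
        if is_v6:
            return src, dst
        return tuple(ipaddress.IPv6Address(MAPPED_BASE | int(addr))
                     for addr in (src, dst))
    raise ValueError("unknown policy: %r" % policy)


print(apply_ipv6_policy("10.0.0.1", "10.0.0.2", "force"))
```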

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcut searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps
--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y,no-msec. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcut opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit. Specifying switches that add new fields or additional switches before --help will allow the output to include descriptions of those fields or switches.

--help-fields

Print the description and alias(es) of each field and exit. Specifying switches that add new fields before --help-fields will allow the output to include descriptions of those fields.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--pmap-file=MAPNAME:PATH
--pmap-file=PATH

Instruct rwcut to load the mapping file located at PATH and create the src-MAPNAME and dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwcut will check the prefix map file to see if a map-name was specified when the file was created. If no map-name is available, rwcut creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a unique value for map-name. The --pmap-file switch(es) must precede the --fields switch. For more information, see pmapfilter(3).

--pmap-column-width=NUM

When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.

--python-file=PATH

When the SiLK Python plug-in is used, rwcut reads the Python code from the file PATH to define additional fields for possible output. This file should call register_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.

The standard output from rwcut resembles the following (with the text wrapped for readability):

            sIP|            dIP|sPort|dPort|pro|\  
    10.30.30.31|    10.70.70.71|   80|36761|  6|\

        packets|     bytes|    flags|\  
              7|      3227|FS PA    |\

                    sTime| duration|                  eTime|senso|  
  2003/01/01T00:00:14.625|    3.959|2003/01/01T00:00:18.584|EDGE1|

The first line of the output is the title line which shows the names of the selected fields; the --no-titles switch will disable the printing of the title line. The second line and onward will contain the printed representation of the records, with one line per record.

A common use of rwcut is to read the output of rwfilter(1). For example, to see representative TCP traffic:

 $ rwfilter --start-date=2002/01/19:00 --end-date=2002/01/19:01     \  
        --proto=6 --pass=stdout                                     \  
   | rwcut

To see only selected fields, use the --fields switch. For example, to print only the protocol for each record in the input file data.rw, use:

 $ rwcut --fields=proto  data.rw

The silkpython(3) manual page provides examples that use PySiLK to create and print arbitrary fields for rwcut.

The order of the FIELDS is significant, and fields can be repeated. For example, here is a case where, in addition to the default fields of 1-12, you also want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default fields of 1-12, you want to see dotted-decimal IP addresses. (The num2dot(1) tool converts the numeric fields in column positions three and four to dotted quad IPs.)

 $ rwfilter ... --pass=stdout \  
   | rwcut --fields=2,9,1-12 --ip-format=decimal --timestamp-format=epoch \  
   | num2dot --ip-field=3,4

Both of the following commands print the title line and the first record in the input stream:

 $ rwcut --num-recs=1  data.rw

 $ rwcut --end-rec-num=1  data.rw

The following prints all records except the first (plus the title):

 $ rwcut --start-rec-num=2  data.rw

These three commands print only the second record:

 $ rwcut --no-title --start-rec-num=2 --num-recs=1  data.rw

 $ rwcut --no-title --start-rec-num=2 --end-rec-num=2  data.rw

 $ rwcut --no-title --end-rec-num=2 --num-recs=1  data.rw

This command prints the title line and the final record in the input stream:

 $ rwcut --tail-recs=1  data.rw

This command prints the next to last record in the input stream:

 $ rwcut --no-title --tail-recs=2 --num-recs=1  data.rw
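
The record-selection examples above can be checked against a small Python model of the four switches. select_records is a hypothetical function written from the switch descriptions and these examples; it is not rwcut's code, and its handling of cases the text does not cover (such as a TAIL_START_NUM larger than the input) is an assumption.

```python
def select_records(records, start=None, end=None, num=0, tail=None):
    """Model of --start-rec-num, --end-rec-num, --num-recs and --tail-recs.

    Record numbers are 1-based; end of None or 0 and num of 0 mean "no limit".
    """
    total = len(records)
    end = end or None
    if tail is not None:
        if start is not None or end is not None:
            raise ValueError("--tail-recs may not be combined with start/end")
        first = max(total - tail, 0)       # 0-based index of the first record
        stop = total
        if 0 < num < tail:                 # --num-recs applies only if smaller
            stop = min(first + num, total)
        return records[first:stop]
    if start is not None and end is not None:
        if end < start:
            raise ValueError("END_NUM must not be less than START_NUM")
        return records[start - 1:end]      # --num-recs is ignored here
    if end is not None:
        # --num-recs counts back from END_NUM unless END_NUM is smaller.
        first = end - num + 1 if 0 < num < end else 1
        return records[first - 1:end]
    first = start or 1
    stop = first - 1 + num if num > 0 else total
    return records[first - 1:stop]


data = list(range(1, 11))  # stand-ins for ten flow records
print(select_records(data, end=2, num=1))
print(select_records(data, tail=2, num=1))
```

With ten input records numbered 1 through 10, the model selects record 2 for --end-rec-num=2 --num-recs=1 and record 9 for --tail-recs=2 --num-recs=1, matching the descriptions of those commands.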

ENVIRONMENT

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwcut automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcut does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcut automatically invokes this program to display its output a screen at a time.

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file is specified, rwcut must load the Python files that comprise the PySiLK package, such as silk/__init__.py. If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwcut uses when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type mapping file that rwcut uses when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcut may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwcut may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcut displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcut --version.)

SILK_PLUGIN_DEBUG

When set to 1, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins. In addition, when an attempt to register a field fails, rwcut prints a message specifying the additional function(s) that must be defined to register the field in rwcut. Be aware that the output can be rather verbose.

FILES

$SILK_ADDRESS_TYPES
$SILK_PATH/share/silk/address_types.pmap
$SILK_PATH/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap

Possible locations for the address types mapping file required by the sType and dType fields.

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the scc and dcc fields.

${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/

Directories that rwcut checks when attempting to load a plug-in.
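
The plug-in search described for the --plugin switch can be approximated in Python as follows. find_plugin is a hypothetical helper; checking the directories in the order listed above and skipping the SILK_PATH entries when that variable is unset are assumptions. The exists parameter is there only so the sketch can be exercised without touching the file system.

```python
import os

# Subdirectories checked under each root, in the order listed above.
PLUGIN_SUBDIRS = ("lib64/silk", "lib64", "lib/silk", "lib")


def find_plugin(name, silk_path=None, exists=os.path.exists):
    """Return the first matching plug-in path, or name itself for dlopen(3)."""
    if "/" in name:
        return name  # a name containing a slash is left to dlopen(3)
    roots = [silk_path] if silk_path else []
    roots.append("/usr/local")
    for root in roots:
        for subdir in PLUGIN_SUBDIRS:
            candidate = os.path.join(root, subdir, name)
            if exists(candidate):
                return candidate
    return name  # not found: rely on dlopen(3) to locate the file


installed = {"/usr/local/lib/silk/flowrate.so"}
print(find_plugin("flowrate.so", "/opt/silk", exists=installed.__contains__))
```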

NOTES

If you are interested in only a few fields, use the --fields option to reduce the volume of data to be produced. For example, if you are checking to see which internal host got hit with the Slammer worm (signature: UDP, destination port 1434, packet size 404), then the following rwfilter and rwcut combination will be much faster than simply using default values:

 $ rwfilter --proto=17 --dport=1434 --bytes-per-packet=404-404      \
         --pass=stdout                                               \
   | rwcut --fields=dip,stime

SEE ALSO

rwfilter(1), num2dot(1), addrtype(3), ccfilter(3), pmapfilter(3), silk-plugin(3), silkpython(3), pysilk(3), sensor.conf(5), silk(7), yaf(1), dlopen(3), tzset(3), environ(7)

rwdedupe

Eliminate duplicate SiLK Flow records

SYNOPSIS

  rwdedupe [--ignore-fields=FIELDS] [--packets-delta=NUM]  
        [--bytes-delta=NUM] [--stime-delta=NUM] [--duration-delta=NUM]  
        [--temp-directory=DIR_PATH] [--buffer-size=SIZE]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--compression-method=COMP_METHOD] [--print-filenames]  
        [--output-path=PATH] [--site-config-file=FILENAME]  
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwdedupe --help

  rwdedupe --help-fields

  rwdedupe --version

DESCRIPTION

rwdedupe reads SiLK Flow records from one or more input sources. Records that appear in the input file(s) multiple times will only appear in the output stream once; that is, duplicate records are not written to the output. The SiLK Flows are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.

Note: As part of its processing, rwdedupe re-orders the records before writing them.

rwdedupe reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwdedupe reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

By default, rwdedupe will consider one record to be a duplicate of another when all the fields in the records match exactly. From another point of view, any difference in two records results in both records appearing in the output. Note that "all" means every field that exists on a SiLK Flow record. The complete list of fields is specified in the description of --ignore-fields in the OPTIONS section below.

To have rwdedupe ignore fields in the comparison, specify those fields in the --ignore-fields switch. When --ignore-fields=FIELDS is specified, a record is considered a duplicate of another if all fields except those in FIELDS match exactly. rwdedupe will treat FIELDS as being identical across all records. Put another way, if the only difference between two records is in the FIELDS fields, only one of those records will be written to the output.

The --packets-delta, --bytes-delta, --stime-delta, and --duration-delta switches allow for "fuzziness" in the input. For example, if --stime-delta=NUM is specified and the only difference between two records is in the sTime fields, and the fields are within NUM milliseconds of each other, only one record will be written to the output.
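The comparison rules described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not rwdedupe's implementation: records are represented as plain dictionaries, and the names is_duplicate, ignore, and deltas are hypothetical.

```python
# Illustrative model of the duplicate test described above.
# Records are plain dicts; this is NOT how rwdedupe stores records.
DELTA_FIELDS = ("packets", "bytes", "stime", "duration")


def is_duplicate(a, b, ignore=(), deltas=None):
    """Return True when records a and b would be treated as duplicates.

    ignore -- field names to skip, in the spirit of --ignore-fields
    deltas -- allowed difference per field, in the spirit of
              --packets-delta, --bytes-delta, --stime-delta and
              --duration-delta (stime and duration in milliseconds)
    Both records are assumed to carry the same set of keys.
    """
    deltas = deltas or {}
    for field in a:
        if field in ignore:
            continue
        if field in DELTA_FIELDS:
            # Fuzzy fields match when they differ by NUM or less.
            if abs(a[field] - b[field]) > deltas.get(field, 0):
                return False
        elif a[field] != b[field]:
            return False
    return True


r1 = {"sip": "10.0.0.1", "dip": "10.0.0.2", "packets": 3,
      "bytes": 180, "stime": 1000, "duration": 50}
r2 = dict(r1, stime=1400)  # differs only in sTime, by 400 ms

print(is_duplicate(r1, r2))                         # exact match required
print(is_duplicate(r1, r2, deltas={"stime": 500}))  # within 500 ms
print(is_duplicate(r1, r2, ignore={"stime"}))       # sTime ignored
```

With the default delta of 0 the two sample records are distinct; either a sufficient --stime-delta or ignoring sTime makes them duplicates.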

During its processing, rwdedupe will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.

By default, the temporary files are stored in the /tmp directory. Because of the sizes of the temporary files, it is strongly recommended that /tmp not be used as the temporary directory, and rwdedupe will print a warning when /tmp is used. To modify the temporary directory used by rwdedupe, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
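The precedence among the --temp-directory switch and the two environment variables can be sketched as follows. The function name resolve_temp_dir is hypothetical, and treating an empty variable as unset is an assumption of this sketch.

```python
import os


def resolve_temp_dir(temp_directory=None, env=None):
    """Pick the temporary directory using the documented precedence:
    --temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp.

    Assumption: an empty value counts as "not set".
    """
    env = os.environ if env is None else env
    if temp_directory:
        return temp_directory
    return env.get("SILK_TMPDIR") or env.get("TMPDIR") or "/tmp"


# Example with an explicit, made-up environment.
example_env = {"SILK_TMPDIR": "/var/silk/tmp", "TMPDIR": "/scratch"}
print(resolve_temp_dir(env=example_env))
print(resolve_temp_dir("/big/disk", env=example_env))
print(resolve_temp_dir(env={}))
```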

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--ignore-fields=FIELDS

Ignore the fields listed in FIELDS when determining if two flow records are identical; that is, treat FIELDS as being identical across all flows. By default, all fields are treated as significant.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

 --ignore-fields=stime,12-15

The supported fields are:

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (milliseconds resolution)

duration,10

duration of flow (milliseconds resolution)

sensor,12

name or ID of sensor at the collection point

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

class,20,type,21

class and type of sensor at the collection point (represented internally by a single value)

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by flow generator

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
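As an illustration of the FIELDS syntax accepted by --ignore-fields, the following Python sketch expands a FIELDS string into a sorted list of field-integers using the names and numbers listed above. The function parse_fields is hypothetical, and rejecting integers that do not correspond to a listed field is an assumption of this sketch; the tool itself may behave differently.

```python
# Field names and integers taken from the list above (names lowercased).
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "duration": 10, "sensor": 12, "in": 13, "out": 14, "nhip": 15,
    "class": 20, "type": 21, "initialflags": 26, "sessionflags": 27,
    "attributes": 28, "application": 29,
}
VALID_IDS = set(FIELD_IDS.values())


def parse_fields(spec):
    """Expand e.g. 'stime,12-15' into a sorted list of field-integers."""
    ids = set()
    for item in spec.split(","):
        item = item.strip()
        parts = item.split("-", 1)
        if item.isdigit():
            low = high = int(item)
        elif len(parts) == 2 and all(p.isdigit() for p in parts):
            low, high = int(parts[0]), int(parts[1])
        else:
            # Field-names are case-insensitive; unknown names raise KeyError.
            low = high = FIELD_IDS[item.lower()]
        for number in range(low, high + 1):
            if number not in VALID_IDS:
                # Assumption: numbers without a listed field are errors.
                raise ValueError("unknown field-integer: %d" % number)
            ids.add(number)
    return sorted(ids)


print(parse_fields("stime,12-15"))  # prints [9, 12, 13, 14, 15]
```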

--packets-delta=NUM

Treat the packets field on two records as being the same if the values differ by NUM packets or less. If not specified, the default is 0.

--bytes-delta=NUM

Treat the bytes field on two records as being the same if the values differ by NUM bytes or less. If not specified, the default is 0.

--stime-delta=NUM

Treat the start-time field on two records as being the same if the values differ by NUM milliseconds or less. If not specified, the default is 0.

--duration-delta=NUM

Treat the duration field on two records as being the same if the values differ by NUM milliseconds or less. If not specified, the default is 0.

--temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.

--buffer-size=SIZE

Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwdedupe will allocate, since additional buffers will be allocated for reading the input and writing the output.)
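The SIZE arithmetic can be checked with a short Python sketch. The function parse_size is hypothetical; it accepts only the upper-case suffixes K, M, and G named above, and truncating a fractional byte count is an assumption.

```python
# Multipliers as stated in the description of --buffer-size.
SUFFIXES = {"K": 1024, "M": 1048576, "G": 1073741824}


def parse_size(text):
    """Convert a SIZE argument such as '1.5K' or '2048' to bytes."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        # A real number followed by a suffix.
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    # An ordinary integer with no suffix.
    return int(text)


print(parse_size("1.5K"))  # prints 1536
print(parse_size("512M"))  # prints 536870912
print(parse_size("2048"))  # prints 2048
```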

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwdedupe exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwdedupe to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
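The selection order described for the best value can be sketched in Python. The function resolve_best is hypothetical, and returning none when no library is available is an assumption that the text above does not state.

```python
# Preference order given in the description of "best".
PREFERENCE = ("lzo1x", "snappy", "zlib")


def resolve_best(available, writing_to_file):
    """Return the method that 'best' would select.

    available       -- set of method names compiled into the installation
    writing_to_file -- False for the standard output or a named pipe
    """
    if not writing_to_file:
        # "best" only compresses when writing to a file.
        return "none"
    for method in PREFERENCE:
        if method in available:
            return method
    return "none"  # assumption: nothing available means no compression


print(resolve_best({"zlib", "snappy"}, writing_to_file=True))   # prints snappy
print(resolve_best({"zlib", "snappy"}, writing_to_file=False))  # prints none
```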

--print-filenames

Print to the standard error the names of input files as they are opened.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwdedupe searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwdedupe opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the description and alias(es) of each field and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

LIMITATIONS

When the temporary files and the final output are stored on the same file volume, rwdedupe will require approximately twice as much free disk space as the size of input data.

When the temporary files and the final output are on different volumes, rwdedupe will require between 1 and 1.5 times as much free space on the temporary volume as the size of the input data.

EXAMPLE

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Suppose you have made several rwfilter(1) runs to find interesting traffic:

 $ rwfilter --start-date=2008/02/04 ... --pass=data1.rw  
 $ rwfilter --start-date=2008/02/04 ... --pass=data2.rw  
 $ rwfilter --start-date=2008/02/04 ... --pass=data3.rw  
 $ rwfilter --start-date=2008/02/04 ... --pass=data4.rw

You now want to merge that traffic into a single output file, but you want to ensure that any records appearing in multiple output files are only counted once. You can use rwdedupe to merge the output files to a single file, data.rw:

 $ rwdedupe data1.rw data2.rw data3.rw data4.rw --output=data.rw

ENVIRONMENT

SILK_TMPDIR

When set and --temp-directory is not specified, rwdedupe writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwdedupe writes the temporary files it creates to this directory.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwdedupe may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwdedupe may use this environment variable. See the FILES section for details.

SILK_TEMPFILE_DEBUG

When set to 1, rwdedupe prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
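The list of candidate locations can be generated with a small Python sketch. The function candidate_config_files is hypothetical, and the sketch assumes that the locations are checked in the order listed above and that unset variables are simply skipped.

```python
import os
import posixpath


def candidate_config_files(env=None):
    """Return the site configuration file locations listed above,
    in order, skipping entries whose environment variable is unset."""
    env = os.environ if env is None else env
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths


# Example with an explicit, made-up environment.
for path in candidate_config_files({"SILK_PATH": "/opt/silk"}):
    print(path)
```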

${SILK_TMPDIR}/
${TMPDIR}/
/tmp/

Directory in which to create temporary files.

SEE ALSO

rwfilter(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)

rwfglob

Print files that rwfilter’s File Selection switches will access

SYNOPSIS

  rwfglob { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]  
            | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }  
        [--sensors=SENSOR[,SENSOR ...]]  
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
        [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]  
        [--print-missing-files] [--no-block-check] [--no-file-names]  
        [--no-summary]

  rwfglob [--data-rootdir=ROOT_DIRECTORY]  
        [--site-config-file=FILENAME] --help

  rwfglob --version

DESCRIPTION

rwfglob accepts the normal File Selection options of rwfilter(1) and prints, to the standard output, the names of the files that would normally be accessed, one file name per line. At the end, a summary is printed, to the standard output, of the number of files that rwfglob found. To suppress the printing of the file names and/or the summary, specify the --no-file-names and/or --no-summary switches, respectively.

By default, rwfglob only prints the names of files that exist. When the --print-missing-files switch is provided, rwfglob prints, to the standard error, the names of files that it did not find, one file name per line, preceded by the text ’Missing ’.

For each file it finds, rwfglob will check the size of the file and the number of blocks allocated to the file. If the block count is zero but the file size is non-zero, rwfglob treats the file as existing but as residing on tape. The names of these files are printed to the standard output, but each name is preceded by the text ’  \t*** ON_TAPE ***’ where ’\t’ represents a tab character. The summary line will include the number of files that rwfglob believes are on tape. To suppress this check and to remove the count from the summary line, use the --no-block-check switch.
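The block-count heuristic described above can be expressed in Python as a sketch. The names looks_on_tape and classify are hypothetical, and st_blocks is only available on UNIX-like systems. Some file systems may also report zero blocks for very small files that reside on disk, so the result is a heuristic at best.

```python
import os


def looks_on_tape(size, blocks):
    """Apply the heuristic: zero allocated blocks but a non-zero size."""
    return blocks == 0 and size > 0


def classify(path):
    """Return 'ON_TAPE' or 'ON_DISK' for an existing file."""
    info = os.stat(path)
    if looks_on_tape(info.st_size, info.st_blocks):
        return "ON_TAPE"
    return "ON_DISK"


print(looks_on_tape(size=4096, blocks=0))  # prints True
print(looks_on_tape(size=4096, blocks=8))  # prints False
print(looks_on_tape(size=0, blocks=0))     # prints False
```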

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Selection Switches

These switches are the same as those used by rwfilter to select the files to process. At least one of these switches must be provided.

--class=CLASS

The --class switch is used to specify a group of files to print. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. Classes are defined in the silk.conf(5) site configuration file. If the --class option is not given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfglob --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.

--type={all | TYPE[,TYPE]}

The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes a comma-separated list of types or the keyword all which specifies all types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When the --type switch is not specified, a list of default types is used. The default-type list is determined by the value of CLASS, and the default types generally include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfglob or run rwsiteinfo with --fields=class,type,default-type.

--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]

The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfglob invocation to print data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types.

--sensors=SENSOR[,SENSOR ...]

The --sensors switch is used to select data from specific sensors. The parameter is a comma-separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes. When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es).

--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfglob --version.

When times are expressed in YYYY/MM/DD[:HH] format:

When at least one time is expressed as seconds since the UNIX epoch:

When neither --start-date nor --end-date is given, rwfglob prints all files for the current day.

It is an error to specify --end-date without specifying --start-date.

--data-rootdir=ROOT_DIRECTORY

Tell rwfglob to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfglob (/data).

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfglob searches for the site configuration file in the locations specified in the FILES section.

--print-missing-files

This option prints to the standard error the names of the files that rwfglob expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfglob considers these data files as missing even though their absence is expected. Use the output from this switch judiciously.

Application Switches

--no-block-check

This option instructs rwfglob not to check whether the file exists on tape by checking whether the number of blocks allocated to the file is zero. By default, rwfglob precedes a file name that has a block count of 0 with the text ’  \t*** ON_TAPE ***’.

--no-file-names

This option instructs rwfglob not to print the names of the files that it successfully finds. By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found. When both this switch and --print-missing-files are specified, rwfglob prints only the names of missing files (and the summary).

--no-summary

This option instructs rwfglob not to print the summary line (that is, the line that shows the number of files found). By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.

--help

Print the available options and exit. The available classes and types will be included in output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Looking at a day on a single sensor:

 $ rwfglob --start=2003/10/11 --sensor=2  
 /data/in/2003/10/11/in-GAMMA_20031011.23  
 /data/in/2003/10/11/in-GAMMA_20031011.22  
 /data/in/2003/10/11/in-GAMMA_20031011.21  
 /data/in/2003/10/11/in-GAMMA_20031011.20  
 /data/in/2003/10/11/in-GAMMA_20031011.19  
 /data/in/2003/10/11/in-GAMMA_20031011.18  
 /data/in/2003/10/11/in-GAMMA_20031011.17  
 /data/in/2003/10/11/in-GAMMA_20031011.16  
 /data/in/2003/10/11/in-GAMMA_20031011.15  
 /data/in/2003/10/11/in-GAMMA_20031011.14  
 /data/in/2003/10/11/in-GAMMA_20031011.13  
 /data/in/2003/10/11/in-GAMMA_20031011.12  
 /data/in/2003/10/11/in-GAMMA_20031011.11  
 /data/in/2003/10/11/in-GAMMA_20031011.10  
 /data/in/2003/10/11/in-GAMMA_20031011.09  
 /data/in/2003/10/11/in-GAMMA_20031011.08  
 /data/in/2003/10/11/in-GAMMA_20031011.07  
 /data/in/2003/10/11/in-GAMMA_20031011.06  
 /data/in/2003/10/11/in-GAMMA_20031011.05  
 /data/in/2003/10/11/in-GAMMA_20031011.04  
 /data/in/2003/10/11/in-GAMMA_20031011.03  
 /data/in/2003/10/11/in-GAMMA_20031011.02  
 /data/in/2003/10/11/in-GAMMA_20031011.01  
 /data/in/2003/10/11/in-GAMMA_20031011.00  
 globbed 24 files; 0 on tape

If you only want the summary, specify --no-file-names:

 $ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names  
 globbed 24 files; 0 on tape

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. This value overrides the compiled-in value, and rwfglob uses it unless the --data-rootdir switch is specified. In addition, rwfglob may use this value when searching for the SiLK site configuration file. See the FILES section for details.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwfglob may use this environment variable. See the FILES section for details.

TZ

When a SiLK installation is built to use the local timezone (to determine if this is the case, check the Timezone support value in the output from rwfglob --version), the value of the TZ environment variable determines the timezone in which rwfglob parses timestamps. (The dates in the filenames that rwfglob returns are always in UTC.) If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses utc. For system information on the TZ variable, see tzset(3) or environ(7).

FILES

${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory rwfglob is using as the root of the data repository.

${SILK_DATA_ROOTDIR}/
/data/

Locations for the root directory of the data repository when the --data-rootdir switch is not specified.

SEE ALSO

rwfilter(1), rwsiteinfo(1), silk.conf(5), silk(7), tzset(3), environ(7)

BUGS

The --print-missing-files option needs to be smarter about what files are really missing.

The output of --print-missing-files goes to the standard error, while all other output goes to the standard output. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:

 $ rwfglob --print-missing-files ... 2>&1

The block count check is of unknown portability across different tape-farm systems.

rwfileinfo

Print information about a SiLK file

SYNOPSIS

  rwfileinfo [--fields=FIELDS] [--summary] [--no-titles]  
        [--site-config-file=FILENAME]  
        {--xargs | --xargs=FILENAME | FILE [FILE...]}

  rwfileinfo --help

  rwfileinfo --help-fields

  rwfileinfo --version

DESCRIPTION

rwfileinfo prints information about a binary SiLK file that can be determined by reading the file’s header and by moving quickly over the data blocks in the file.

rwfileinfo requires one or more filename arguments to be given on the command line or the use of the --xargs switch. When the --xargs switch is provided, rwfileinfo reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line. rwfileinfo does not read a SiLK file’s content from the standard input by default, but it does when either - or stdin is given as a filename argument.

When the --summary switch is given, rwfileinfo first prints the information for each individual file and then prints the number of files processed, the sum of the individual file sizes, and the sum of the individual record counts.

Field Descriptions

By default, rwfileinfo prints the following information for each file argument. Use the --fields switch to modify which pieces of information are printed.

(rwfileinfo prints each field in the order in which support for that field was added to SiLK. The field descriptions are presented here in a more logical order.)

file-size

The size of the file on disk as reported by the operating system. rwfileinfo prints 0 for the file-size when reading from the standard input.

version

Every binary file written by SiLK has a version number field. Since SiLK 1.0.0, the version number field has been used to indicate the general structure (or layout) of the file. The file structure adopted in SiLK 1.0.0 uses a version number of 16 and has a header section and a data section. The header section begins with 16 bytes that specify well-defined values, and those bytes are followed by one or more variably-sized header entries. The specifics of the data section depend on the content of the file.

header-length

The header-length field shows the number of octets required by the header (i.e., the initial 16 bytes and the header entries). Since everything after the header is data, the header-length is the starting offset of the data section. The smallest header length is 24 bytes, but typically the header is padded to be an integer multiple of the record-length. The header-length that rwfileinfo prints for a file is determined dynamically by reading the file’s header.

silk-version

When a SiLK tool creates a binary file, the tool writes the current SiLK release number (such as 3.9.0) into the file’s header as a way to help diagnose issues should a bug with a particular release of SiLK be discovered in the future.

byte-order

Every SiLK file has a byte-order or endian field. SiLK uses the machine’s native representation of integers when writing data, and this field shows what representation the file contains. BigEndian is network byte order and littleEndian is used by Intel chips. The rwswapbytes(1) tool changes a file’s integer representation, and some tools have a --byte-order switch that allows the user to specify the integer representation of output files. The header-section of a file is always written in network byte order.
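To make the two integer representations concrete, the following Python sketch packs one 32-bit integer both ways with the standard struct module. It illustrates byte order in general and is not a description of the SiLK record layout; the value 1434 is an arbitrary example.

```python
import struct
import sys

value = 1434  # 0x0000059a

big = struct.pack(">I", value)     # BigEndian (network byte order)
little = struct.pack("<I", value)  # littleEndian

print(big.hex())      # prints 0000059a
print(little.hex())   # prints 9a050000
print(sys.byteorder)  # native order of the machine running this sketch
```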

compression

SiLK tools may use the zlib library (http://zlib.net/), the LZO library (http://www.oberhumer.com/opensource/lzo/), or the snappy library (http://google.github.io/snappy/) to compress the data section of a file. The compression field specifies which library (if any) was used to compress the data section. If a file is compressed with a library that was not included in an installation of SiLK, SiLK is unable to read the data section of the file. Many SiLK tools accept the --compression-method switch to choose a particular compression method. (The compression field does not indicate whether the entire file has been compressed with an external compression utility such as gzip(1).)

format

Every binary file written by SiLK has two fields in the header that specify exactly what the file contains: the format and the record-version. In general, the format indicates the content type of the file and the record-version indicates the evolution of that content.

The contents of a file whose format is FT_IPSET, FT_RWBAG, or FT_PREFIXMAP are fairly obvious (an IPset, a Bag, a prefix map).

There are many different file formats for writing SiLK Flow records, but the SiLK analysis tools largely use a single Flow file format. That format is FT_RWIPV6ROUTING if SiLK has been compiled with IPv6 support, or FT_RWGENERIC otherwise. A file that uses the FT_RWGENERIC format is only capable of holding IPv4 addresses.

The other SiLK Flow file formats are created by rwflowpack(8) as it writes flow records to the repository. These formats often omit fields and use reduced bit-sizes for fields to reduce the space required for an individual flow record.

The record-version field indicates changes within the general type specified by the format field. For example, SiLK incremented the record-version of the formats that hold flow records when the resolution of record timestamps was changed from seconds to milliseconds.

record-version

Together with the format field, the record-version specifies the contents of the file. See the discussion of format for details.

record-length

Files created by SiLK 1.0.0 and later have a record length field. This field contains the length of an individual record, and this value is dependent on the format and record-version fields described above. Some files (such as those containing IPsets or prefix maps) do not write individual records to the output, and the record length is 1 for these files.

count-records

The count-records field is generated dynamically by determining the length the data section would require if it were completely uncompressed and dividing it by the record-length. When the record-length is 1 (such as for IPset files), the count-records field does not provide much information beyond the length of the uncompressed data. For an uncompressed file, the file-size equals the header-length plus the product of count-records and record-length.
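The relationship among these fields for an uncompressed file can be verified with simple arithmetic. The Python sketch below uses hypothetical function names, and the numbers are made up for illustration rather than taken from a real file.

```python
def count_records(uncompressed_data_length, record_length):
    """Length of the uncompressed data section divided by record-length."""
    return uncompressed_data_length // record_length


def expected_file_size(header_length, n_records, record_length):
    """File size of an uncompressed file computed from its header values."""
    return header_length + n_records * record_length


# Hypothetical values: an 88-byte header and 1000 records of 88 bytes each.
header_length = 88
record_length = 88
data_length = 88000

n = count_records(data_length, record_length)
print(n)                                                    # prints 1000
print(expected_file_size(header_length, n, record_length))  # prints 88088
```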

The fields given above are either present in the well-defined header or are computed by reading the file.

The following fields are generated by reading the header entries and determining if one or more header entries of the specified type are present. The field is not printed in the output when the header entry is not present in the file.

command-lines

Many of the SiLK tools write a header entry to the output file that contains the command line invocation used to create that file, and some of the SiLK tools also copy the command line history from their input files to the output file. (The --invocation-strip switch on the tools can be used to prevent copying and recording of the invocation.) The command lines are stored in individual header entries and this field displays those entries with the most recent invocation at the end of the list.

The command line history has a couple of issues:

annotations

Most of the SiLK tools that create binary output files provide the --note-add and --note-file-add switches which allow an arbitrary annotation to be added to the header of a file. Some tools also copy the annotations from the source files to the destination files. The annotations are stored in individual header entries and this field displays those entries.

ipset

SiLK 3.0.0 and SiLK 3.7.0 introduced new output formats for IPset data structures, and these formats are denoted by record-versions 3 and 4, respectively. (To select these formats, use the --record-version switch on rwset(1), rwsetbuild(1), or rwsettool(1), or use the --ipset-record-version switch on rwbagtool(1).) When the record-version is 3, the file contains a version of the IPset data structure that can be read directly into memory, and the file contains a header entry that specifies the number of nodes, the number of branches from each node, the number of leaves, the size of the nodes and leaves, and which node is the root of the tree. When the record-version is 4, the header entry specifies whether the file contains IPv4 addresses or IPv6 addresses.

bag

Since SiLK 3.0.0, the tools that write binary Bag files (rwbag(1), rwbagbuild(1), and rwbagtool(1)) have written a header entry that specifies the type and size of the key and of the counter in the file.

prefix-map

When using rwpmapbuild(1) to create a prefix map file, a string that specifies a mapname may be provided. rwpmapbuild writes the mapname to a header entry in the prefix map file. The mapname is used to generate command line switches or field names when the --pmap-file switch is specified to several of the SiLK tools (see pmapfilter(3) for details). When displaying the mapname, rwfileinfo prefixes it with the string v1: which denotes a version number for the prefix-map header entry. (The version number is printed for completeness.)

packed-file-info

When rwflowpack(8) creates a SiLK Flow file for the repository, all the records in the file have the same starting hour, the same sensor, and the same flowtype (class/type pair). rwflowpack writes a header entry to the file that contains these values, and this field displays those values. (To print the names for the sensor and flowtype, the silk.conf(5) file must be accessible.)

probe-name

When flowcap(8) creates a SiLK flow file, it adds a header entry specifying the name of the probe from which the data was collected.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Specify what information to print for each file argument on the command line. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive and may be shortened to a unique prefix. When the --fields option is not given, all fields are printed if the file contains the necessary information. The fields are always printed in the order they appear here regardless of the order they are specified in FIELDS.

The possible field values are given next with a brief description of each. For a full description of each field, see Field Descriptions above.

format,1

The contents of the file as a name and the corresponding hexadecimal ID.

version,2

An integer describing the layout or structure of the file.

byte-order,3

Either BigEndian or littleEndian to indicate the representation used to store integers in the file (network or non-network byte order).

compression,4

The compression library (if any) used to compress the data-section of the file, specified as a name and its decimal ID.

header-length,5

The octet length of the file’s header; alternatively the offset where data begins.

record-length,6

The octet length of a single record or the value 1 if the file’s content is not record-based.

count-records,7

The number of records in the file, computed by dividing the uncompressed data length by the record-length.

file-size,8

The size of the file on disk as reported by the operating system.

command-lines,9

The command line invocation used to generate this file.

record-version,10

The version of the records contained in the file.

silk-version,11

The release of SiLK that wrote this file.

packed-file-info,12

For a repository Flow file generated by rwflowpack(8), this prints the timestamp of the starting hour, the flowtype, and the sensor of each flow record in the file.

probe,13

For a Flow file generated by flowcap(8), the name of the probe where the flow records were initially collected.

annotations,14

The notes (annotations) that users have added to the file’s header.

prefix-map,15

For a prefix map file, the mapname that was set when the file was created by rwpmapbuild(1).

ipset,16

For an IPset file whose record-version is 3, a description of the tree data structure. For an IPset file whose record-version is 4, the type of IP addresses (IPv4 or IPv6).

bag,17

For a bag file, the type and size of the key and of the counter.

aggregate-bag,18

For an aggregate bag file, the field types that comprise the key and the counter.
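The way a FIELDS list resolves to a set of fields can be sketched in Python. This is only an illustration of the rules stated for --fields, not rwfileinfo's implementation: the helper name resolve_fields is hypothetical, and letting an exact name win over a prefix match is an assumption.

```python
# Field names in the order listed above; list position + 1 is the field-integer.
FIELD_NAMES = [
    "format", "version", "byte-order", "compression", "header-length",
    "record-length", "count-records", "file-size", "command-lines",
    "record-version", "silk-version", "packed-file-info", "probe",
    "annotations", "prefix-map", "ipset", "bag", "aggregate-bag",
]


def resolve_fields(spec):
    """Return the selected field names in listing order (hypothetical helper)."""
    selected = set()
    for token in spec.lower().split(","):
        first, dash, last = token.partition("-")
        if first.isdigit():
            # A field-integer, or a range of field-integers such as 3-5.
            low = int(first)
            high = int(last) if dash else low
            if not 1 <= low <= high <= len(FIELD_NAMES):
                raise ValueError("bad field number or range: " + token)
            selected.update(range(low, high + 1))
            continue
        # A field-name; an exact match wins over a prefix match (assumption).
        if token in FIELD_NAMES:
            matches = [token]
        else:
            matches = [n for n in FIELD_NAMES if token and n.startswith(token)]
        if len(matches) != 1:
            raise ValueError("unknown or ambiguous field name: " + token)
        selected.add(FIELD_NAMES.index(matches[0]) + 1)
    # Output order follows the listing, not the order given in FIELDS.
    return [FIELD_NAMES[i - 1] for i in sorted(selected)]
```

Under these assumptions, resolve_fields("count-records,1,3-4") yields format, byte-order, compression, and count-records, in that order, while a prefix such as "record" is rejected because it matches both record-length and record-version.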

--summary

After the data for each individual file is printed, print a summary that shows the number of files processed, the sum of the individual file sizes, and the total number of records contained in those files.

--no-titles

Suppress printing of the file name and field names. The output contains only the values, where each value is printed left-justified on a single line.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfileinfo searches for the site configuration file in the locations specified in the FILES section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwfileinfo opens each named file in turn and prints its information as if the filenames had been listed on the command line. Since SiLK 3.15.0.

--help

Print the available options and exit.

--help-fields

Print a description of each field and its alias, then exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLE

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Get information about the file tcp-data.rw:

 $ rwfileinfo tcp-data.rw  
 tcp-data.rw:  
   format(id)          FT_RWGENERIC(0x16)  
   version             16  
   byte-order          littleEndian  
   compression(id)     none(0)  
   header-length       208  
   record-length       52  
   record-version      5  
   silk-version        1.0.1  
   count-records       7  
   file-size           572  
   command-lines  
                    1  rwfilter --proto=6 --pass=tcp-data.rw ...  
   annotations  
                    1  This is some interesting TCP data

Return a single value which is the number of records in the file tcp-data.rw:

 $ rwfileinfo --no-titles --field=count-records tcp-data.rw  
 7
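For the uncompressed file shown above, the count-records value agrees with the other header fields: the data section is the file size minus the header length, and dividing that by the record length gives the record count. The Python sketch below illustrates the arithmetic; the function name is hypothetical, and the shortcut of using the on-disk size applies only when the compression field is none, since a compressed file's uncompressed data length cannot be derived from its size on disk.

```python
def count_uncompressed_records(file_size, header_length, record_length):
    """Record count for an uncompressed, record-based file (hypothetical helper)."""
    data_length = file_size - header_length
    if data_length < 0 or data_length % record_length != 0:
        raise ValueError("sizes are not consistent with whole records")
    return data_length // record_length


# Values taken from the tcp-data.rw example: (572 - 208) / 52
print(count_uncompressed_records(572, 208, 52))
```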

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwfileinfo may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwfileinfo may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
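The list of locations can be pictured as a candidate list built from the environment. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the locations are tried in the order listed and that entries depending on an unset variable are skipped, neither of which is stated above.

```python
def site_config_candidates(environ):
    """Build the candidate paths for silk.conf (hypothetical helper)."""
    candidates = []
    if environ.get("SILK_CONFIG_FILE"):
        candidates.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        candidates.append(environ["SILK_DATA_ROOTDIR"] + "/silk.conf")
    candidates.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        candidates.append(environ["SILK_PATH"] + "/share/silk/silk.conf")
        candidates.append(environ["SILK_PATH"] + "/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates
```

A caller could pass os.environ and use the first candidate that names an existing file.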

SEE ALSO

rwfilter(1), rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsettool(1), rwswapbytes(1), silk.conf(5), pmapfilter(3), flowcap(8), rwflowpack(8), silk(7), gzip(1)

rwfilter

Choose which SiLK Flow records to process

SYNOPSIS

  rwfilter INPUT_ARGS OUTPUT_ARGS PARTITIONING_ARGS [MISC_ARGS]

Selection switches, input switches, or input files are required:

  rwfilter ...  
        {{ [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]  
           | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }  
         [--sensors=SENSOR[,SENSOR ...]]  
         [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
         [--data-rootdir=ROOT_DIRECTORY] [--print-missing-files] }  
        | [--input-pipe=INPUT_PATH]  
        | [--xargs] | [--xargs=INPUT_PATH]  
        | [INPUT_PATH [INPUT_PATH...]]

One or more output switches are required:

  rwfilter ...  
        [--all-destination=ALL_PATH [--all-destination=ALL_PATH ...]]  
        [--fail-destination=FAIL_PATH [--fail-destination=FAIL_PATH ...]]  
        [--pass-destination=PASS_PATH [--pass-destination=PASS_PATH ...]]  
        [{ --print-statistics[=STATS_PATH]  
           | --print-volume-statistics[=STATS_PATH] }]

One or more partitioning switches are required:

  rwfilter ...  
        [--ack-flag=SCALAR] [--active-time=TIME_WINDOW]  
        [{--any-address=IP_WILDCARD | --not-any-address=IP_WILDCARD}]  
        [--any-cc=COUNTRY_CODE_LIST]  
        [{--any-cidr=IP_OR_CIDR_LIST | --not-any-cidr=IP_OR_CIDR_LIST}]  
        [--any-index=INTEGER_LIST]  
        [{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]  
        [--aport=INTEGER_LIST] [--application=INTEGER_LIST]  
        [--attributes=ATTRIBUTES_LIST]  
        [--bytes=INTEGER_RANGE] [--bytes-per-packet=DECIMAL_RANGE]  
        [--cwr-flag=SCALAR]  
        [{--daddress=IP_WILDCARD | --not-daddress=IP_WILDCARD}]  
        [--dcc=COUNTRY_CODE_LIST]  
        [{--dcidr=IP_OR_CIDR_LIST | --not-dcidr=IP_OR_CIDR_LIST}]  
        [{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]  
        [--dport=INTEGER_LIST] [--dtype=SCALAR]  
        [--duration=DECIMAL_RANGE] [--ece-flag=SCALAR]  
        [--etime=TIME_WINDOW] [--fin-flag=SCALAR]  
        [--flags-all=HIGH_MASK_FLAGS_LIST]  
        [--flags-initial=HIGH_MASK_FLAGS_LIST]  
        [--flags-session=HIGH_MASK_FLAGS_LIST]  
        [--icmp-code=INTEGER_LIST] [--icmp-type=INTEGER_LIST]  
        [--input-index=INTEGER_LIST] [--ip-version=INTEGER_LIST]  
        [--ipa-src-expr=IPA_EXPR] [--ipa-dst-expr=IPA_EXPR]  
        [--ipa-any-expr=IPA_EXPR]  
        [{--next-hop-id=IP_WILDCARD | --not-next-hop-id=IP_WILDCARD}]  
        [{--nhcidr=IP_OR_CIDR_LIST | --not-nhcidr=IP_OR_CIDR_LIST}]  
        [{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]  
        [--output-index=INTEGER_LIST] [--packets=INTEGER_RANGE]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]  
         { [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]  
           [--pmap-any-MAPNAME=LABELS] } ]  
        [--protocol=INTEGER_LIST] [--psh-flag=SCALAR]  
        [--python-expr=PYTHON_EXPR]  
        [--python-file=FILENAME [--python-file=FILENAME ...]]  
        [--rst-flag=SCALAR]  
        [{--saddress=IP_WILDCARD | --not-saddress=IP_WILDCARD}]  
        [--scc=COUNTRY_CODE_LIST]  
        [{--scidr=IP_OR_CIDR_LIST | --not-scidr=IP_OR_CIDR_LIST}]  
        [{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]  
        [--sport=INTEGER_LIST] [--stime=TIME_WINDOW] [--stype=SCALAR]  
        [--syn-flag=SCALAR] [--tcp-flags=TCP_FLAGS]  
        [--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS]  
                                       [--tuple-direction=DIRECTION]  
                                       [--tuple-delimiter=CHAR] } ]  
        [--urg-flag=SCALAR]

Miscellaneous switches:

  rwfilter ...  
        [--compression-method=COMP_METHOD] [--dry-run]  
        [--max-fail-records=N] [--max-pass-records=N]  
        [--note-add=TEXT] [--note-file-add=FILE]  
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]  
        [--print-filenames] [--site-config-file=FILENAME]  
        [--threads=N]

Help switches:

  rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH]  
        [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]  
        --help

  rwfilter --version

DESCRIPTION

rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams.

The Selection Switches let one choose flow records from the SiLK data store by specifying where the flow was collected (its sensor), the date of collection, and/or the flow’s direction. The act of selecting records from the data store is sometimes called a “data pull”.

The Partitioning Switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). When a flow record matches all of the behaviors, it can be written to a pass stream (i.e., file). If a record fails to match any of these behavior predicates, it can be written to a fail stream. (You may also write every record rwfilter reads to an all stream.) These output streams from rwfilter are always binary SiLK Flow records. The output must be either written to a file or piped into another tool in the SiLK Suite, and rwfilter complains if it determines you are attempting to send the stream to a terminal. To view the records, pipe the records into rwcut(1).

In addition to the partitioning switches built in to rwfilter, additional partitioning predicates can be created as C or PySiLK plug-ins, and these can be loaded into rwfilter using the --plugin and/or --python-file switches as described below.

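Instead of using the selection switches to choose flow records from the data store, rwfilter can apply the partitioning switches to existing files of SiLK flow records, such as files generated by a previous invocation of rwfilter. To run rwfilter in this mode, you may name the input files on the command line, supply their names with the --xargs switch, or read records from the standard input or a named pipe, as described in Non-Selection Input Switches.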

When rwfilter is reading flow records from input files, some of the selection switches act as partitioning switches. The remaining selection switches may not be specified when using the alternate forms of input, and it is an error to specify multiple types of input.

Unlike many other tools in the SiLK tool suite, rwfilter requires that you specify one or more Output Switches that tell rwfilter what types of output to produce.

Finally, there are Miscellaneous Switches that control other aspects of rwfilter.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Selection Switches

To read files from the data store, use the following options to specify which files to process. When rwfilter gets its input from files listed on the command line or from the --xargs or --input-pipe switches, the first four switches (--class, --type, --flowtypes, and --sensors) act as partitioning switches, and specifying any other selection switch produces an error.

--class=CLASS

The --class switch is used to specify a group of data to process. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. Classes are defined in the silk.conf(5) site configuration file. If the --class option is not given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfilter --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.

--type={all | TYPE[,TYPE]}

The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes a comma-separated list of types or the keyword all, which specifies all types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When the --type switch is not specified, a list of default types is used. The default-type list is determined by the value of CLASS, and the default types generally include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfilter or run rwsiteinfo with --fields=class,type,default-type.

--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]

The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types.

--sensors=SENSOR[,SENSOR ...]

The --sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes. When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es).

--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfilter --version.

When times are expressed in YYYY/MM/DD[:HH] format:

When at least one time is expressed as seconds since the UNIX epoch:

When neither --start-date nor --end-date is given, rwfilter processes all files for the current day.

It is an error to specify --end-date without specifying --start-date.

It is an error to specify --start-date when rwfilter believes there is some other input specified (see Non-Selection Input Switches).

--data-rootdir=ROOT_DIRECTORY

Tell rwfilter to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfilter (/data). It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.
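The precedence described for the repository root (switch, then environment variable, then compiled-in default) can be written as a short chain. This Python sketch is illustrative; the function and parameter names are hypothetical, and treating an empty value as unset is an assumption.

```python
def data_rootdir(switch_value=None, environ=None, compiled_default="/data"):
    """Pick the repository root by precedence (hypothetical helper)."""
    environ = environ or {}
    if switch_value:
        # --data-rootdir overrides everything else.
        return switch_value
    if environ.get("SILK_DATA_ROOTDIR"):
        # The environment variable overrides the compiled-in location.
        return environ["SILK_DATA_ROOTDIR"]
    return compiled_default
```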

--print-missing-files

This option prints to the standard error the names of the files that rwfilter’s file selection switches expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfilter considers these data files as missing even though their absence is expected. Use the output from this switch judiciously. It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.

Non-Selection Input Switches

Instead of using the Selection Switches to read flow records from files in the data store, you can tell rwfilter to process files named on the command line or use one (and only one) of the following switches. To have rwfilter read flow records from the standard input, specify stdin or - as the name of an input file or use the (deprecated) --input-pipe switch.

--xargs
--xargs=INPUT_PATH

Read the names of the input files from INPUT_PATH or from the standard input if INPUT_PATH is not provided. The input is expected to have one filename per line. rwfilter opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--input-pipe=INPUT_PATH

Specify a source for SiLK Flow records, where INPUT_PATH is a named pipe or the string stdin or - to represent the standard input. You do not need to use this switch; you can simply specify the named pipe or the strings stdin or - on the command line. NOTE: This switch is deprecated, and it will be removed in the SiLK 4.0 release.

Output Switches

At least one of the following output switches must be provided:

--all-destination=ALL_PATH

Write every SiLK Flow record to ALL_PATH, where ALL_PATH refers to a file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write all input records to multiple locations.

--fail-destination=FAIL_PATH

Write SiLK Flow records that have failed ANY of the partitioning predicates to FAIL_PATH, where FAIL_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that fail any predicate to multiple locations.

--pass-destination=PASS_PATH

Write SiLK Flow records that have passed ALL of the partitioning predicates to PASS_PATH, where PASS_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that pass every predicate to multiple locations.

--print-statistics
--print-statistics=STATS_PATH

Print a one line summary specifying the number of files processed, the total number of records read, the number of records that passed all partitioning predicates, and the number of records that failed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-volume-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter.

--print-volume-statistics
--print-volume-statistics=STATS_PATH

Print a four line summary of rwfilter’s processing. For each of all records, records that pass all the partitioning predicates, and records that fail, print the number of flow records and the number of packets and bytes represented by those flow records. The output also includes the number of files processed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter.

Partitioning Switches

rwfilter supports the following partitioning switches, at least one of which must be specified (unless the only Output Switch is --all-destination). The switches are AND’ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass is written to the fail-destination(s), if specified.
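The AND semantics can be pictured with a small Python sketch. It models records as dictionaries and predicates as functions, which are assumptions made only for illustration; rwfilter itself operates on binary SiLK Flow records.

```python
def partition(records, predicates):
    """Split records into (passed, failed) lists (hypothetical helper)."""
    passed, failed = [], []
    for record in records:
        # A record passes only when every predicate accepts it.
        if all(test(record) for test in predicates):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


# Rough analogue of --protocol=6 --dport=80
tests = [lambda r: r["protocol"] == 6, lambda r: r["dport"] == 80]
flows = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 443},
    {"protocol": 17, "dport": 80},
]
passed, failed = partition(flows, tests)
```

Here only the first flow passes; the other two each fail one test and would go to the fail stream.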

Each partitioning switch defines a test. These tests can be grouped into several broad categories; within each category, the tests are applied in the order in which the switches appear on the command line. The categories of the partitioning tests are:

Partitioning Switches for IP Addresses

There are three families of switches that partition based on an IP address. Each family can partition by the source IP, the destination IP, the next hop IP, or either source or destination IP. Each family includes a --not-* variant to reverse the sense of the test.

The --*cidr-family takes as its argument an IP_OR_CIDR_LIST, which is a single IP address 10.1.2.3, a single CIDR block FF01::/16, or a comma separated list of IPs and/or CIDR blocks 10.0.1.0/24,10.0.2.3,10.0.4.0/24. The IP_OR_CIDR_LIST supports IPv4 and IPv6 addresses.
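Membership in an IP_OR_CIDR_LIST can be approximated with Python's standard ipaddress module. This is a sketch rather than SiLK's matching code: the helper names are hypothetical, and it treats IPv4 and IPv6 as separate address spaces, which may differ from how SiLK compares addresses of mixed versions.

```python
import ipaddress


def parse_ip_or_cidr_list(text):
    """Turn '10.0.1.0/24,10.0.2.3' into a list of network objects."""
    # A bare address becomes a single-host network (/32 or /128).
    return [ipaddress.ip_network(item.strip()) for item in text.split(",")]


def matches_ip_or_cidr_list(address, networks):
    """True when the address falls inside any listed block."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


blocks = parse_ip_or_cidr_list("10.0.1.0/24,10.0.2.3,10.0.4.0/24")
```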

The --*set-family requires that you store the IPs in a binary IPset file and pass the name of the file to the switch. IPset files are created from SiLK Flow records with rwset(1), or from textual input with rwsetbuild(1).

The --*address-family (which includes --next-hop-id) takes as its argument a single IP address, a single CIDR block, or a single SiLK IP Wildcard. A SiLK IP Wildcard may represent multiple, disjointed IPv4 or IPv6 addresses. An IP Wildcard contains an IP in its canonical form, except each part of the IP (where part is an octet for IPv4 or a hexadectet for IPv6) may be a single value, a range, a comma separated list of values and ranges, or the letter x to signify any value for that part of the IP (that is, 0-255 for IPv4). You may not specify a CIDR suffix when using the IP Wildcard notation. The following IP_WILDCARDs all represent the same value:

 ::ffff:0:0/112  
 ::ffff:0:x  
 ::ffff:0:aaab-ffff,aaaa,0-aaa9  
 ::ffff:0.0.0.0/112  
 ::ffff:0.0.128-254,0-126,255,127.x
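The per-part rules of the wildcard notation can be sketched in Python for the IPv4 case. The helper names are hypothetical, and the sketch deliberately leaves out IPv6, CIDR suffixes, and IPv4-mapped forms such as those in the list above.

```python
def expand_part(part, max_value=255):
    """Expand one wildcard part into the set of values it allows."""
    if part.lower() == "x":
        return set(range(max_value + 1))
    values = set()
    for item in part.split(","):
        low_text, dash, high_text = item.partition("-")
        low = int(low_text)
        high = int(high_text) if dash else low
        if not 0 <= low <= high <= max_value:
            raise ValueError("bad value or range: " + item)
        values.update(range(low, high + 1))
    return values


def wildcard_matches(wildcard, address):
    """True when a dotted IPv4 address is matched by an IPv4 wildcard."""
    parts = wildcard.split(".")
    octets = address.split(".")
    if len(parts) != 4 or len(octets) != 4:
        raise ValueError("expected four dot-separated parts")
    return all(int(o) in expand_part(p) for p, o in zip(parts, octets))
```

For instance, the part 128-254,0-126,255,127 from the last example expands to every value from 0 to 255, which is why it is equivalent to x.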

The next hop address often has a value of 0.0.0.0 since the default configuration of SiLK does not store the next hop address in the data repository.

With one restriction, any combination of IP partitioning switches is allowed in a single rwfilter invocation: A positive and negative version of the same switch (e.g., --sipset and --not-sipset) is not allowed. (--sipset and --not-scidr may be used together, as can --sipset and --not-dipset.)

The address-partitioning switches are:

--scidr=IP_OR_CIDR_LIST

Pass the record if its source IP address matches a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --saddress and --sipset.

--dcidr=IP_OR_CIDR_LIST

Pass the record if its destination IP address matches a value in IP_OR_CIDR_LIST. See also --daddress and --dipset.

--any-cidr=IP_OR_CIDR_LIST

Pass the record if either its source or its destination IP address matches a value in IP_OR_CIDR_LIST. This switch does not consider the next hop IP address. See also --any-address and --anyset.

--nhcidr=IP_OR_CIDR_LIST

Pass the record if its next hop IP address matches a value in IP_OR_CIDR_LIST. See also --next-hop-id and --nhipset.

--not-scidr=IP_OR_CIDR_LIST

Pass the record if its source IP address does not match a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --not-saddress and --not-sipset.

--not-dcidr=IP_OR_CIDR_LIST

Pass the record if its destination IP address does not match a value in IP_OR_CIDR_LIST. See also --not-daddress and --not-dipset.

--not-any-cidr=IP_OR_CIDR_LIST

Pass the record if neither its source nor its destination IP address matches a value in IP_OR_CIDR_LIST. See also --not-any-address and --not-anyset.

--not-nhcidr=IP_OR_CIDR_LIST

Pass the record if its next hop IP address does not match a value in IP_OR_CIDR_LIST. See also --not-next-hop-id and --not-nhipset.

--saddress=IP_WILDCARD

Pass the record if its source IP address is matched by the SiLK IP Wildcard IP_WILDCARD. To match on multiple IPs, use --scidr or create an IPset and use --sipset.

--daddress=IP_WILDCARD

Pass the record if its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. See also --dcidr and --dipset.

--any-address=IP_WILDCARD

Pass the record if either its source or its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. This switch does not consider the next hop IP address. See also --any-cidr and --anyset.

--next-hop-id=IP_WILDCARD

Pass the record if its next hop IP address is matched by this IP_WILDCARD, a SiLK IP Wildcard. To match on multiple IPs, use --nhcidr or create an IPset and use --nhipset.

--not-saddress=IP_WILDCARD

Pass the record if its source IP address is not matched by this IP_WILDCARD, a SiLK IP Wildcard. See also --not-scidr and --not-sipset.

--not-daddress=IP_WILDCARD

Pass the record if its destination IP address is not matched by this IP_WILDCARD. See also --not-dcidr and --not-dipset.

--not-any-address=IP_WILDCARD

Pass the record if neither its source nor its destination IP address is matched by this IP_WILDCARD. Does not consider the next hop address. See also --not-any-cidr and --not-anyset.

--not-next-hop-id=IP_WILDCARD

Pass the record if its next hop IP address is not matched by this IP_WILDCARD. See also --not-nhcidr and --not-nhipset.

--sipset=IP_SET_FILENAME

Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --scidr.

--dipset=IP_SET_FILENAME

As --sipset for the destination IP address. See also --dcidr.

--anyset=IP_SET_FILENAME

Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --any-cidr.

--nhipset=IP_SET_FILENAME

As --sipset for the next-hop IP address. See also --nhcidr.

--not-sipset=IP_SET_FILENAME

Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --not-scidr.

--not-dipset=IP_SET_FILENAME

As --not-sipset for the destination IP address. See also --not-dcidr.

--not-anyset=IP_SET_FILENAME

Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --not-any-cidr.

--not-nhipset=IP_SET_FILENAME

As --not-sipset for the next hop IP address. See also --not-nhcidr.

Partitioning Switches for Remainder of Five-Tuple

The following switches partition based on the protocol and source or destination port. The parameter to each of these switches is an INTEGER_LIST, which is a comma-separated list of individual non-negative integer values and ranges of those values. For example, 1,2,3,5-10,99-103. A range may be specified without an upper limit, such as 1-, in which case the upper limit is set to the maximum value.
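The INTEGER_LIST syntax can be sketched in Python as follows. The helper names are hypothetical; the maximum is passed in by the caller because it depends on the switch (65535 for ports, 255 for protocols).

```python
def parse_integer_list(text, maximum):
    """Parse '1,2,3,5-10,99-103' or '1-' into (low, high) pairs."""
    ranges = []
    for item in text.split(","):
        low_text, dash, high_text = item.strip().partition("-")
        low = int(low_text)
        if not dash:
            high = low              # single value
        elif high_text == "":
            high = maximum          # open-ended range such as 1-
        else:
            high = int(high_text)
        if not 0 <= low <= high <= maximum:
            raise ValueError("bad value or range: " + item)
        ranges.append((low, high))
    return ranges


def in_integer_list(value, ranges):
    """True when value falls inside any parsed range."""
    return any(low <= value <= high for low, high in ranges)
```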

--sport=INTEGER_LIST

Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.

--dport=INTEGER_LIST

Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.

--aport=INTEGER_LIST

Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.

--protocol=INTEGER_LIST

Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.

--icmp-type=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.

--icmp-code=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.

Partitioning Switches for Time

These switches partition based on whether the time stamps on the flow record occur within the specified time window. The argument is a range of two dates, start-window and end-window, each in the form YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]; for example, 2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last fifteen minutes of Jan 31, 2003. (A T may be used in place of : to separate the day and hour.) The start-window and end-window must be set to at least day precision. For the start-window, unspecified hour, minute, second, and millisecond values are set to 0; for the end-window, those values are set to 23, 59, 59, and 999 respectively. Thus 2003/01/31:23-2003/01/31:23 becomes 2003/01/31:23:00:00.000-2003/01/31:23:59:59.999. If an end-window is not given, it is set to the start-window, giving a window of a single millisecond. The date strings are considered to be in the timezone specified when SiLK was compiled, which you can determine from the output of rwfilter --version. You may also specify the times as seconds since the UNIX epoch; when the end-window is in epoch seconds, an unspecified milliseconds value is set to 999 and otherwise the value is unchanged.
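The way unspecified parts of a TIME_WINDOW are filled in can be sketched in Python. The helper names are hypothetical, the sketch handles only the date-string form (not epoch seconds), it ignores time zones, and truncating fractional seconds to three digits is an assumption.

```python
import re
from datetime import datetime

_DATE = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d+))?)?)?)?$"
)


def _parse_date(text, defaults):
    """Parse one date, filling missing parts from (hour, min, sec, ms)."""
    match = _DATE.match(text)
    if match is None:
        raise ValueError("bad date: " + text)
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    hour, minute, second = (
        int(match.group(i)) if match.group(i) is not None else default
        for i, default in zip((4, 5, 6), defaults[:3])
    )
    fraction = match.group(7)
    # Keep millisecond precision only (assumption: extra digits are dropped).
    millis = int((fraction + "000")[:3]) if fraction is not None else defaults[3]
    return datetime(year, month, day, hour, minute, second, millis * 1000)


def expand_time_window(window):
    """Return (start, end) datetimes for a 'START[-END]' window string."""
    start_text, dash, end_text = window.partition("-")
    start = _parse_date(start_text, (0, 0, 0, 0))
    # With no end-window, the window collapses to the start instant.
    end = _parse_date(end_text, (23, 59, 59, 999)) if dash else start
    return start, end
```

With this sketch, the window 2003/01/31:23-2003/01/31:23 expands to the full hour from 23:00:00.000 through 23:59:59.999, matching the example in the text.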

--active-time=TIME_WINDOW

Pass the record if the record was active at ANY time during this TIME_WINDOW. If a single time is specified, pass the record if it was active at that instant.

--stime=TIME_WINDOW

Pass the record if its starting time is in this TIME_WINDOW.

--etime=TIME_WINDOW

As --stime for the ending time.

--duration=DECIMAL_RANGE

Pass the record if its duration--that is, the record’s end time minus its start time, as measured in seconds--is in this DECIMAL_RANGE. Use floating point numbers to specify millisecond values. The range should be specified as MIN-MAX; for example, 5.0-10.031. If a single value is given, the duration must match that value exactly. The upper limit may be omitted; for example, a range of 1.5- passes records whose duration is at least 1.5 seconds.
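
The difference between --active-time and the --stime and --etime switches can be sketched as interval tests. The function names below are hypothetical, and treating both ends of the window as inclusive is an assumption made for illustration.

```python
from datetime import datetime

def active_time_match(stime, etime, win_start, win_end):
    """True if the flow was active at any instant inside the window."""
    return stime <= win_end and etime >= win_start

def stime_match(stime, win_start, win_end):
    """True if the flow started inside the window."""
    return win_start <= stime <= win_end

def etime_match(etime, win_start, win_end):
    """True if the flow ended inside the window."""
    return win_start <= etime <= win_end
```

A flow that starts before the window and ends inside it would match the active-time and end-time tests but not the start-time test.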

Partitioning Switches for Volume

The following switches partition based on the volume of the flow; that is, the number of bytes or packets. For additional volume-related switches, load the flowrate plug-in as described in the flowrate(3) manual page.

These switches accept a range of non-negative integers or decimal values. If the upper limit is omitted, the volume must be at least that size. If the argument is a single value, the volume must match that value exactly.

--bytes=INTEGER_RANGE

Pass the record if its byte count is in this INTEGER_RANGE.

--packets=INTEGER_RANGE

Pass the record if its packet count is in this INTEGER_RANGE.

--bytes-per-packet=DECIMAL_RANGE

Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.
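
The MIN-MAX argument form shared by the DECIMAL_RANGE and INTEGER_RANGE switches can be modeled as follows. This is a sketch with hypothetical helper names; it treats both limits as inclusive, which is an assumption, and it parses every value as a float for simplicity.

```python
def parse_range(text):
    """Return (low, high); high is None when the upper limit is omitted."""
    low_text, sep, high_text = text.partition("-")
    low = float(low_text)
    if not sep:
        return low, low  # a single value must match exactly
    high = float(high_text) if high_text else None
    if high is not None and high < low:
        raise ValueError("upper limit is below lower limit: %r" % text)
    return low, high

def in_range(value, bounds):
    """True if value lies within the (low, high) bounds."""
    low, high = bounds
    return value >= low and (high is None or value <= high)
```

With these helpers, the argument 1.5- accepts any duration of at least 1.5 seconds, while 3 accepts only the value 3.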

Partitioning Switches for TCP Flags

When a flow generator creates a flow record from TCP packets, it creates a field that is the bit-wise OR of the TCP flags from all packets that comprise that flow record. Some flow generators, such as yaf(1), can export two TCP flag fields: one contains the flags on the first packet in the flow, and the second contains the bit-wise OR of the remaining packets.

To partition records based on their TCP flags values, there is a recommended set of switches and legacy-supported switches. The switches accept the following letters to represent the named TCP flag: F=FIN; S=SYN; R=RST; P=PSH; A=ACK; U=URG; E=ECE; C=CWR.

The recommended set of switches takes a comma separated list of pairs of TCP flags, where the pair is separated by a slash (/). The value to the left of the slash is the HIGH_SET and it must be a subset of the value to the right of the slash, which is the MASK_SET. For a record to pass the filter, the flags in the HIGH_SET must be on and the remaining flags in MASK_SET must be off. Flags not in MASK_SET may have any value. If a list of pairs is given, the record passes if any pair in the list matches. For example, --flags-all=S/S,A/A passes flows that have either the SYN or the ACK flag set, --flags-all=S/SA passes flow records where SYN is high and ACK is low, and --flags-all=/F passes flows where FIN is off. This list of flag pairs is called a HIGH_MASK_FLAGS_LIST.

The recommended switches for TCP flag partitioning are:

--flags-all=HIGH_MASK_FLAGS_LIST

Pass the record if any of the HIGH_SET/MASK_SET pairs is true when looking at the bit-wise OR of the TCP flags across all packets in the flow.

--flags-initial=HIGH_MASK_FLAGS_LIST

As --flags-all, except this switch considers only the initial packet in the flow, for flow generators that can generate that field.

--flags-session=HIGH_MASK_FLAGS_LIST

As --flags-all, except this switch considers the bit-wise OR of the TCP flags across the second through the final packet in the flow; that is, ignoring the flags on the first packet.
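
The HIGH_SET/MASK_SET test used by these three switches amounts to a masked comparison. The sketch below illustrates that logic with hypothetical function names; the bit values are the standard TCP header flag bits, and the code is not taken from rwfilter.

```python
# Standard TCP header flag bits, keyed by the letters rwfilter accepts.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

def flags_to_bits(letters):
    """Convert a string such as 'SA' to its bit mask."""
    bits = 0
    for ch in letters.upper():
        bits |= FLAG_BITS[ch]
    return bits

def parse_high_mask_list(text):
    """Parse 'HIGH/MASK[,HIGH/MASK...]' into a list of (high, mask) pairs."""
    pairs = []
    for item in text.split(","):
        high_text, _, mask_text = item.partition("/")
        high, mask = flags_to_bits(high_text), flags_to_bits(mask_text)
        if high & ~mask:
            raise ValueError("HIGH_SET must be a subset of MASK_SET: %r" % item)
        pairs.append((high, mask))
    return pairs

def flags_match(record_flags, pairs):
    """Pass if, for any pair, the masked flags equal exactly the HIGH_SET."""
    return any((record_flags & mask) == high for high, mask in pairs)
```

For example, the pairs parsed from S/SA accept a record whose flags are SYN and FIN but reject one whose flags are SYN and ACK.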

The TCP-flag partitioning switches supported for legacy reasons are:

--tcp-flags=TCP_FLAGS

Pass the record if, for any one of its packets, any of the specified TCP_FLAGS was on, where TCP_FLAGS contains the letters F,S,R,P,A,U,E,C. For example, --tcp-flags=ASF passes records where ACK is set, or SYN is set, or FIN is set.

--ack-flag={0|1}

When set to 0, pass only records where the ACK Flag is low. When set to 1, pass only records where the ACK Flag is high.

--cwr-flag={0|1}

As --ack-flag for the CWR Flag

--ece-flag={0|1}

As --ack-flag for the ECE Flag

--fin-flag={0|1}

As --ack-flag for the FIN Flag

--psh-flag={0|1}

As --ack-flag for the PSH Flag

--rst-flag={0|1}

As --ack-flag for the RST Flag

--syn-flag={0|1}

As --ack-flag for the SYN Flag

--urg-flag={0|1}

As --ack-flag for the URG Flag

Partitioning Switches for Other Flow Characteristics

Other than the --ip-version switch, the fields queried by the following switches may always be zero. The default configuration of SiLK does not store the fields that contain the SNMP values. The other fields are not present in NetFlow v5, and require use of properly-configured enhanced collection software, such as yaf(1), http://tools.netsa.cert.org/yaf/.

--ip-version={4|6|4,6}

Pass the record if its IP Version is in the specified list. This switch determines how IPv4 and IPv6 flow records are handled when SiLK has been compiled with IPv6 support. When the argument to this switch is 4, rwfilter writes records marked as IPv6 to the fail-destination, regardless of the IP addresses they contain. When the argument to this switch is 6, rwfilter writes records marked as IPv4 to the fail-destination. When SiLK has not been compiled with IPv6 support, the only legal value for this switch is 4, and any IPv6 flows in the input are ignored (that is, they are written to neither the pass-destination nor the fail-destination).

--application=INTEGER_LIST

Some flow generation software can inspect the contents of the packets that comprise a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel (see the applabel(1) manual page in the yaf distribution). The application value is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP has a value of 21, even if that traffic is being routed through the standard HTTP/web port (80). The flow generator uses a value of 0 if the application cannot be determined. The --application switch passes the flow if the flow’s application value is in the specified INTEGER_LIST, which is a comma separated list of integers from 0 to 65535 inclusive and ranges of those integers. The list of valid appLabels is determined by your site’s yaf installation.

--attributes=ATTRIBUTES_LIST

The attributes field in SiLK Flow records describes characteristics about how the flow record was generated or about the packets that comprise the flow record. The ATTRIBUTES_LIST argument is similar to the HIGH_MASK_FLAGS_LIST argument to the --flags-all switch. ATTRIBUTES_LIST is a comma separated list of up to 8 HIGH_ATTRIBUTES/MASK_ATTRIBUTES pairs, where HIGH_ATTRIBUTES and MASK_ATTRIBUTES are strings of the characters S,T,C,F, and HIGH_ATTRIBUTES is a subset of MASK_ATTRIBUTES. rwfilter passes the record if, for any pair of attributes in the list, the attributes listed in HIGH_ATTRIBUTES are set and the remaining attributes in MASK_ATTRIBUTES are not-set. The valid attributes are:

S

All the packets in this flow record are exactly the same size.

T

The flow generator prematurely created a record for a long-lived session due to the connection’s lifetime reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

The flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout.

F

The flow generator saw additional packets in this flow following a packet with the FIN flag set (excluding ACK packets).

For a long-lived connection spanning several flow records, the first flow record is marked with a T indicating that it hit the active timeout. The second through next-to-last records are marked with CT indicating that the flow is a continuation of a connection that timed out and that this flow also timed out. The final flow is marked with a C, indicating that it was created as a continuation of an active flow.

--input-index=INTEGER_LIST

Pass the record if its in field is in this INTEGER_LIST, which is a comma separated list of integers from 0 to 65535, inclusive, and ranges of those integers. When present, the in field normally contains the incoming SNMP interface, but it may contain the vlanId if the packing tools were configured to capture it (see sensor.conf(5)).

--output-index=INTEGER_LIST

Pass the record if its out field is in this INTEGER_LIST. When present, the out field normally contains the outgoing SNMP interface, but it may contain the postVlanId if the packing tools were configured to capture it.

--any-index=INTEGER_LIST

Pass the record if its in field or if its out field is in this INTEGER_LIST.

Selection Switches Acting as Partitioning Switches

The following four switches are normally file selection switches; that is, they select which files rwfilter reads within the data repository. However, when rwfilter gets input without querying the data repository (that is, from files listed on the command line, from files specified by --xargs, or from the --input-pipe), these switches become partitioning switches and determine whether a record is written to the pass-destination or fail-destination.

--class=CLASS

Pass the record if its class is CLASS and its type is listed in the --type switch, or its type is in the default type list for CLASS when --type is not specified. Use rwfilter --help to see the list of available classes and types, and the defaults.

--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]

Pass the record if its class/type value is one of those listed. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types. This switch cannot be used when either --class or --type is used. Use rwfilter --help to see the list of available classes and types.

--sensors=SENSOR[,SENSOR ...]

Pass the record if its sensor is one of those listed. The parameter is a comma separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Use the rwsiteinfo(1) command to see the list of sensors.

--type={all | TYPE[,TYPE]}

Pass the record if its type is one of those listed and its class is specified by --class, or its class is the default class when the --class switch is not specified. Use rwfilter --help to see the list of available classes and types, and the defaults.

Partitioning Switches that use Additional Mapping Files

Additional partitioning switches are available that allow one to partition flow records depending on a label, where the label is computed from an IP address or port on the record and an additional mapping file.

--pmap-file=MAPNAME:PATH
--pmap-file=PATH

Instruct rwfilter to load the mapping file located at PATH and create new switches --pmap-src-MAPNAME, --pmap-dst-MAPNAME, and --pmap-any-MAPNAME. When MAPNAME is provided, it is used to refer to the switches specific to that prefix map. If MAPNAME is not provided, rwfilter checks the prefix map file to see if a map-name was specified when the file was created. If no map-name is available, rwfilter creates legacy switches as described below. Multiple --pmap-file switches are supported as long as each uses a unique map-name. The --pmap-file switch(es) must precede all other --pmap-* switches. For more information, see pmapfilter(3).

--pmap-src-MAPNAME=LABELS

If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IPv4 address that maps to a label contained in the list of labels in LABELS.

If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a protocol and source port combination that maps to a label contained in the list of labels in LABELS.

--pmap-dst-MAPNAME=LABELS

Similar to --pmap-src-MAPNAME, but uses the destination IP or the protocol and destination port.

--pmap-any-MAPNAME=LABELS

If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IP address or a destination IP address that maps to a label contained in the list of labels in LABELS.

If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a protocol and source port or destination port combination that maps to a label contained in the list of labels in LABELS.

--pmap-saddress=LABELS
--pmap-daddress=LABELS
--pmap-any-address=LABELS

These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME, --pmap-dst-MAPNAME, and --pmap-any-MAPNAME, respectively. These switches are available when an IP prefix map is used that is not associated with a MAPNAME.

--pmap-sport-proto=LABELS
--pmap-dport-proto=LABELS
--pmap-any-port-proto=LABELS

These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME, --pmap-dst-MAPNAME, and --pmap-any-MAPNAME, respectively. These switches are available when a proto-port prefix map is used that is not associated with a MAPNAME.

--scc=COUNTRY_CODE_LIST
--dcc=COUNTRY_CODE_LIST
--any-cc=COUNTRY_CODE_LIST

Pass the record if one of its IP addresses maps to a country code that is specified in COUNTRY_CODE_LIST. For --scc, the source IP must match. For --dcc, the destination IP must match. For --any-cc, either the source or the destination must match. COUNTRY_CODE_LIST is a comma separated list of lowercase two-letter country codes---based on the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm)---as well as the following special codes:

--

N/A (e.g. private and experimental reserved addresses)

a1

anonymous proxy

a2

satellite provider

o1

other

For example: cx,uk,kr,jp,--. To use this switch, the country code mapping file must be available in the default location, or in the location specified by the SILK_COUNTRY_CODES environment variable. See ccfilter(3) for details.

--stype={0|1|2|3}
--dtype={0|1|2|3}

Pass a flow record depending on whether the IP address is internal, external, or non-routable. These switches use the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3). When the parameter is 0, pass the record if its source (--stype) IP address or destination (--dtype) IP address is non-routable. When 1, pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal (non-routable or external).
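
The four parameter values can be summarized as a small decision function. This sketch assumes the address type has already been looked up in the mapping file and is encoded as 0 (non-routable), 1 (internal), or 2 (external); the function name is hypothetical.

```python
NON_ROUTABLE, INTERNAL, EXTERNAL = 0, 1, 2

def type_match(address_type, wanted):
    """Model of the --stype/--dtype test for one address.

    address_type is the mapped category of the address; wanted is the
    switch argument (0, 1, 2, or 3).
    """
    if wanted not in (0, 1, 2, 3):
        raise ValueError("argument must be 0, 1, 2, or 3")
    if wanted == 3:
        # 3 means "not internal": non-routable or external
        return address_type != INTERNAL
    return address_type == wanted
```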

Partitioning Switches across Multiple Fields

The --tuple-* family of switches allows the user to partition flow records based on multiple values of the five-tuple.

--tuple-file=TUPLE_FILENAME

This switch provides support for partitioning by arbitrary subsets of the basic five-tuple:

 {source-ip,destination-ip,source-port,destination-port,protocol}

A SiLK Flow record passes the test when the record’s fields match one of the tuples; if the SiLK record does not match any tuple, the record fails. The tuples are read from the text file TUPLE_FILENAME which must contain lines of delimited fields. The default delimiter is |, but may be specified with the --tuple-delimiter switch. Each field contains one member of the tuple; the fields may appear in any order. The fields may represent any subset of the five-tuple, but each line in the file must define the same subset. A field that is present but has no value generates an error. If you want the field to match any value, it is best that you not include that field in your input.

In addition to the tuple-lines, TUPLE_FILENAME may contain blank lines and comments (which begin with # and continue to the end of the line). The first line of TUPLE_FILENAME may contain a title labeling the fields in the file. This title line is ignored when the --tuple-fields switch is given.

The IP fields may contain an IPv4 address, an integer, or an IP address in CIDR block notation. Comma-separated lists (80,443) and ranges (0-1023,8080) are supported for the ports and protocol fields. NOTE: Currently the code is not clever in its support for CIDR notation and ranges in that each occurrence is fully expanded. When this occurs, the memory required to hold the search tree quickly grows.

--tuple-fields=FIELDS

FIELDS contains the list of fields (columns) to parse from the TUPLE_FILENAME in the order in which they appear in the file. When this switch is not provided, rwfilter treats the first line in TUPLE_FILENAME as a title line and attempts to determine the fields (a la rwtuc(1)); rwfilter exits if it cannot determine the fields.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated to their shortest unique prefix. The field names and their descriptions are:

sIP,sip,1

source IP address

dIP,dip,2

destination IP address

sPort,sport,3

source port

dPort,dport,4

destination port

protocol,5

IP protocol

--tuple-direction=DIRECTION

Allows you to change the comparison between the tuple and the SiLK Flow record. This switch allows one to look for traffic in the reverse direction (or both directions) without having to write all of the rules twice. The available directions are:

forward

The tuple’s fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol. This is the default.

reverse

The tuple’s fields are compared against the opposite fields on the flow; that is, sIP is compared with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.

both

Both of the above comparisons are performed.

--tuple-delimiter=CHAR

Specifies the character separating the input fields. When the switch is not provided, the default of | is used.
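
The interaction of the tuple file, its title line, and --tuple-direction can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the title line uses the exact names sIP, dIP, sPort, dPort, and protocol, and every field holds a single literal value (no CIDR blocks, lists, or ranges).

```python
# Field that each tuple field is compared against in the reverse direction.
SWAP = {"sIP": "dIP", "dIP": "sIP", "sPort": "dPort",
        "dPort": "sPort", "protocol": "protocol"}

def load_tuples(lines, delimiter="|"):
    """Read a title line and tuple lines, skipping blanks and # comments."""
    fields, tuples = None, set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        cells = [c.strip() for c in line.split(delimiter)]
        if fields is None:
            fields = cells  # first non-blank line is the title line
            continue
        if len(cells) != len(fields) or "" in cells:
            raise ValueError("bad tuple line: %r" % raw)
        tuples.add(tuple(cells))
    return fields, tuples

def tuple_match(record, fields, tuples, direction="forward"):
    """record is a dict keyed by sIP, dIP, sPort, dPort, and protocol."""
    forward = tuple(str(record[f]) for f in fields)
    reverse = tuple(str(record[SWAP[f]]) for f in fields)
    if direction == "forward":
        return forward in tuples
    if direction == "reverse":
        return reverse in tuples
    if direction == "both":
        return forward in tuples or reverse in tuples
    raise ValueError("unknown direction: %r" % direction)
```

With a tuple file that lists a client address and destination port 25, the reverse direction matches the server's replies without a second rule.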

Partitioning Switches that use the PySiLK Plug-in

The SiLK Python plug-in provides support for filtering by expressions or complex functions written in the Python programming language. See the silkpython(3) and pysilk(3) manual pages for information and examples for how to use Python to manipulate SiLK data structures. When multiple Partitioning Switches are given, the Python plug-in is the next-to-last to be invoked. Only the code specified by the --plugin switch is called after the Python code.

--python-file=FILENAME

Pass the record if the result of processing the flow with the function named rwfilter() in FILENAME is true. The function should take a single silk.RWRec object as an argument. See silkpython(3) for details.

--python-expr=PYTHON_EXPRESSION

Pass the record if the result of processing the flow with the specified PYTHON_EXPRESSION is true. The expression is evaluated as if it appeared in the following context:

 from silk import *  
 def rwfilter(rec):  
     return (PYTHON_EXPRESSION)
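
As an illustration, a file passed to --python-file might contain a function like the one below. The selection criteria are arbitrary examples, and the attribute names protocol, dport, and bytes are assumed to be those of silk.RWRec as described in pysilk(3). The demonstration at the end uses a stand-in object so the logic can be tried without PySiLK.

```python
from types import SimpleNamespace

def rwfilter(rec):
    """Pass TCP flows to ports 80 or 443 that carry more than 1000 bytes."""
    return rec.protocol == 6 and rec.dport in (80, 443) and rec.bytes > 1000

if __name__ == "__main__":
    # Stand-in for a silk.RWRec object, for illustration only.
    fake = SimpleNamespace(protocol=6, dport=443, bytes=2000)
    print(rwfilter(fake))
```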

Partitioning Switches that use the IP-Association Plug-In

The IPA plug-in, ipafilter.so, provides switches that can partition flows using data in an IP Association database. For this plug-in to be available, SiLK must be compiled with IPA support and IPA must be configured. See ipafilter(3) and http://tools.netsa.cert.org/ipa/ for additional information.

--ipa-src-expr=IPA_EXPR

Use IPA_EXPR to partition flows based on the source IP of the flow matching the IPA_EXPR expression.

--ipa-dst-expr=IPA_EXPR

Use IPA_EXPR to partition flows based on the destination IP of the flow matching the IPA_EXPR expression.

--ipa-any-expr=IPA_EXPR

Use IPA_EXPR to partition flows based on either the source or destination IP of the flow matching the IPA_EXPR expression.

Miscellaneous Switches

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
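
The way best is resolved can be sketched as a simple preference order. The function name and the available set are hypothetical; the order and the file-only rule follow the descriptions above.

```python
def resolve_compression(method, available, to_file):
    """Return the compression actually applied, or 'none'.

    method    -- the COMP_METHOD argument
    available -- set of libraries found when SiLK was compiled (assumed input)
    to_file   -- True when the output is a regular file
    """
    if method != "best":
        if method != "none" and method not in available:
            raise ValueError("compression method not available: %r" % method)
        return method  # named methods apply regardless of the destination
    if not to_file:
        return "none"  # best compresses only when writing to a file
    for candidate in ("lzo1x", "snappy", "zlib"):
        if candidate in available:
            return candidate
    return "none"
```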

--dry-run

Perform a sanity check on the input arguments to check that the arguments are acceptable. In addition, print to the standard output the names of the files that would be accessed (and the names of missing files if --print-missing is specified). rwfglob(1) can also be used to generate the lists of files that rwfilter would access.

--help

Print the available options and exit. Options that add fields (for example, options that load plug-ins, prefix maps, or PySiLK extensions) can be specified before the --help switch so that the new options appear in the output. The available classes and types are included in the output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.

--max-fail-records=N

Write N records to each --fail-destination. rwfilter stops reading input once it has written these N records unless --pass-destination or --all-destination switch(es) are also specified.

--max-pass-records=N

Write N records to each --pass-destination. rwfilter stops reading input once it has written these N records unless --fail-destination or --all-destination switch(es) are also specified.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--plugin=PLUGIN

Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When multiple partitioning switches are given, the code specified by the --plugin switch(es) is last to be invoked. When PLUGIN does not contain a slash (/), rwfilter attempts to find a file named PLUGIN in the directories listed in the FILES section. If rwfilter finds the file, it uses that path. If PLUGIN contains a slash or if rwfilter does not find the file, rwfilter relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwfilter prints status messages to the standard error as it attempts to find and open each of its plug-ins.

--print-filenames

Print the names of input files as they are read. This can be useful feedback for a long-running rwfilter process.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfilter searches for the site configuration file in the locations specified in the FILES section.

--threads=N

Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter runs with a single thread. Using multiple threads, performance of rwfilter is greatly improved for queries that look at many files but return few records. Preliminary testing has found that performance peaks around four threads per CPU, but performance varies depending on the type of query and the number of records returned.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The most basic filtering involves looking at specific traffic over a specific time. For example:

 $ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23     \  
        --proto=6 --pass-destination=tcp-in.rw

creates a file, tcp-in.rw, containing all incoming TCP traffic on February 19, 2003. The --start-date and --end-date switches select which files to examine. The --proto switch partitions the flow records into a pass stream (records whose protocol is 6---that is, TCP) and a fail stream (all other records). The --pass-destination switch (often shortened to --pass) tells rwfilter to write the records that pass the --proto test to the file tcp-in.rw.

The tcp-in.rw file contains SiLK Flow data in a binary format. To examine the contents, use the command rwcut(1). This query only selects incoming traffic because the silk.conf(5) configuration file at most sites tells rwfilter to look at incoming traffic unless an explicit --type switch is given.

The following query gets all TCP traffic (for the default class) for February 19, 2003.

 $ rwfilter --type=all --start-date=2003/02/19  \  
        --proto=6 --pass-destination=alltcp.rw

Note the addition of --type=all. This query also relies on the default behavior of --start-date to consider a full day’s worth of data when no hour is specified.

The above query gets all traffic for the default class. If your silk.conf file has a single class, that query captures all of it. For silk.conf files that specify multiple classes, the following gets all TCP traffic for February 19, 2003:

 $ rwfilter --flowtypes=all/all --start-date=2003/02/19     \  
        --proto=6 --pass-destination=alltcp.rw

To get all non-TCP traffic, there are two approaches. rwfilter does not supply a way to choose a negated set of protocols, but you can choose all protocols other than TCP:

 $ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23     \  
        --proto=0-5,7-255 --pass-destination=non-tcp.rw
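
When more than one protocol is to be excluded, the complementary list for --proto can be generated rather than written by hand. The helper below is a hypothetical convenience and is not part of SiLK.

```python
def complement_ranges(excluded, low=0, high=255):
    """Build a list such as '0-5,7-255' covering low..high minus excluded."""
    parts = []
    start = None
    # Iterate one past 'high' so the final open range is always closed.
    for value in range(low, high + 2):
        keep = value <= high and value not in excluded
        if keep and start is None:
            start = value
        elif not keep and start is not None:
            end = value - 1
            parts.append(str(start) if start == end else "%d-%d" % (start, end))
            start = None
    return ",".join(parts)

print(complement_ranges({6}))      # the argument used in the command above
print(complement_ranges({6, 17}))  # everything except TCP and UDP
```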

The other approach is to use the --fail-destination switch (often shortened to --fail), which receives the records that failed one or more of the partitioning tests:

 $ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23     \  
        --proto=6 --fail-destination=non-tcp.rw

To print information about the number of flow records that pass a filter, use --print-volume-statistics. This can be combined with other output switches.

 $ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23     \  
        --proto=6 --print-volume-stat --pass-destination=tcp-in.rw  
      |        Recs|     Packets|         Bytes|  Files|  
 Total|      515359|     2722887|    1343819719|    180|  
  Pass|      512071|     2706571|    1342851708|       |  
  Fail|        3288|       16316|        968011|       |
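
If the statistics are needed by a script, the table can be parsed from captured text. The sketch below assumes the output has exactly the layout shown above; the function name is hypothetical, and how the text is captured is left to the caller.

```python
def parse_volume_stats(text):
    """Parse the volume-statistics table into {row: {column: int}}."""
    header, rows = None, {}
    for line in text.splitlines():
        line = line.strip()
        if "|" not in line:
            continue
        # Drop the trailing delimiter, then split into trimmed cells.
        cells = [c.strip() for c in line.rstrip("|").split("|")]
        if header is None:
            header = cells[1:]  # the first cell of the title row is empty
            continue
        rows[cells[0]] = {
            name: int(value)
            for name, value in zip(header, cells[1:])
            if value  # skip blank cells such as Files on Pass and Fail
        }
    return rows

SAMPLE = """\
      |        Recs|     Packets|         Bytes|  Files|
 Total|      515359|     2722887|    1343819719|    180|
  Pass|      512071|     2706571|    1342851708|       |
  Fail|        3288|       16316|        968011|       |
"""
stats = parse_volume_stats(SAMPLE)
print(stats["Pass"]["Recs"] / stats["Total"]["Recs"])  # fraction of records passing
```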

If you want to see the number of records in a file produced by rwfilter, or to remind yourself how a file was created, use rwfileinfo(1):

 $ rwfileinfo tcp-in.rw  
 tcp-in.rw:  
   format(id)          FT_RWGENERIC(0x16)  
   version             16  
   byte-order          littleEndian  
   compression(id)     lzo1x(2)  
   header-length       208  
   record-length       52  
   record-version      5  
   silk-version        2.4.0  
   count-records       512071  
   file-size           8576160  
   command-lines  
       1  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \  
            --proto=6 --print-volume-stat --pass-destination=tcp-in.rw

Once a file is written, rwfilter can process the file again. Traffic on port 25 is most likely email (SMTP) traffic. To split the email traffic from the other traffic, use:

 $ rwfilter --aport=25 --pass=mail.rw --fail=not-mail.rw tcp-in.rw

This command puts traffic where the source or destination port was 25 into the file mail.rw, and all other traffic into the file not-mail.rw. The --fail-destination switch is an effective way to reverse the sense of a test. For example, to remove traffic on port 80 from the not-mail.rw file, run the command:

 $ rwfilter --aport=80 --fail=not-mail-web.rw not-mail.rw

To verify that the not-mail-web.rw file does not contain any traffic on ports 25 or 80, you can use the --print-statistics switch and see that 0 records pass:

 $ rwfilter --aport=25,80 --print-stat not-mail-web.rw  
 Files     1.  Read    54641.  Pass        0. Fail     54641.

The file maintains a history of the commands that created it:

 $ rwfileinfo not-mail-web.rw  
 not-mail-web.rw:  
   format(id)          FT_RWGENERIC(0x16)  
   version             16  
   byte-order          littleEndian  
   compression(id)     lzo1x(2)  
   header-length       364  
   record-length       52  
   record-version      5  
   silk-version        2.4.0  
   count-records       54641  
   file-size           762875  
   command-lines  
       1  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \  
            --proto=6 --print-volume-stat --pass-destination=tcp-in.rw  
       2  rwfilter --aport=25 --pass=mail.rw --fail=not-mail.rw        \  
            tcp-in.rw  
       3  rwfilter --aport=80 --fail=not-mail-web.rw not-mail.rw

The following finds all outgoing traffic from February 19, 2003, going to an external email server. Traffic going to a server contacts that server on its well-known port, and the flow record’s destination port should hold that well-known port:

 $ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat  \  
        --dport=25 --proto=6

To limit the result to completed connections, that is, flow records that contain at least three packets, use the --packets switch with an open-ended range:

 $ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat  \  
        --dport=25 --proto=6 --packets=3-

To limit the search to a particular internal CIDR block, 10.1.2.0/24, there are three different IP-partitioning switches you can use. The final approach uses rwsetbuild(1) to create an IPset file from textual input.

 $ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat  \  
        --dport=25 --proto=6 --packets=3- --scidr=10.1.2.0/24

 $ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat  \  
        --dport=25 --proto=6 --packets=3- --saddress=10.1.2.x

 $ echo "10.1.2.0/24" | rwsetbuild > my-set.set  
 $ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat  \  
        --dport=25 --proto=6 --packets=3- --sipset=my-set.set
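
That the wildcard form 10.1.2.x and the CIDR block 10.1.2.0/24 describe the same addresses can be checked with Python's standard ipaddress module. The conversion helper is hypothetical and handles only patterns whose x octets are all trailing, which is a small subset of what SiLK's wildcard notation allows.

```python
import ipaddress

def wildcard_to_network(pattern):
    """Convert a pattern such as '10.1.2.x' to the equivalent IPv4 network."""
    octets = pattern.lower().split(".")
    if len(octets) != 4:
        raise ValueError("expected four octets: %r" % pattern)
    fixed = []
    for octet in octets:
        if octet == "x":
            break
        fixed.append(octet)
    if any(o != "x" for o in octets[len(fixed):]):
        raise ValueError("only trailing x octets are handled: %r" % pattern)
    address = ".".join(fixed + ["0"] * (4 - len(fixed)))
    return ipaddress.ip_network("%s/%d" % (address, 8 * len(fixed)))

net = wildcard_to_network("10.1.2.x")
print(net == ipaddress.ip_network("10.1.2.0/24"))
print(ipaddress.ip_address("10.1.2.77") in net)
```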

rwfilter does not have to output its records to a file; instead, the output from rwfilter can be piped into another SiLK tool. You must still use the --pass-destination switch (or --fail-destination or --all-destination switch), but by providing the argument of stdout or - to the switch you tell rwfilter to write its output to the standard output.

For example, to get the IPs of the external email servers that the monitored network contacted, pipe the rwfilter output into rwset(1), and tell rwset to store the destination addresses:

 $ rwfilter --type=out --start-date=2003/02/19 --dport=25           \  
        --proto=6 --packets=3- --scidr=10.1.2.0/24 --pass=stdout    \  
   | rwset --dip-file=external-mail-servers.set

rwfilter can also pipe its output as input to another rwfilter command, which allows them to be chained together. rwfilter does not read from the standard input by default; you must explicitly give stdin or - as the stream to read:

 $ rwfilter --type=out,outweb --start-date=2003/02/19               \  
        --scidr=10.1.2.0/24 --pass=stdout                           \  
   | rwfilter --proto=17 --pass=udp.rw --fail=stdout stdin          \  
   | rwfilter --proto=6 --pass=stdout --fail=non-tcp-udp.rw stdin   \  
   | rwfilter --aport=25 --pass=mail.rw --fail=stdout stdin         \  
   | rwfilter --aport=80,443 --pass=web.rw                          \  
        --fail=tcp-non-web-mail.rw stdin

This chain of commands looks at outgoing traffic on February 19, 2003, originating from the internal net-block 10.1.2.0/24, and creates the following files:

udp.rw

Outgoing UDP traffic

non-tcp-udp.rw

Outgoing traffic that is neither TCP nor UDP

mail.rw

Outgoing TCP traffic on port 25, most of which is probably email (SMTP). Since the query looks at outgoing traffic and the --aport switch was used, this file represents email going from the internal 10.1.2.0/24 to external mail servers, and the responses from any internal mail servers that exist in the 10.1.2.0/24 net-block to external clients.

web.rw

Outgoing TCP traffic on ports 80 and 443, most of which is probably web traffic (HTTP, HTTPS). As with the mail.rw file, this file represents queries to external web servers and responses from internal web servers.

tcp-non-web-mail.rw

Outgoing TCP traffic other than that on ports 25, 80, and 443
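
The way the four partitioning stages of the chain route a record can be summarized in a few lines of Python. This is a sketch of the logic only, not of rwfilter itself: the function name destination_file is hypothetical, and it assumes that --aport matches when either the source or the destination port is in the list, as the description of mail.rw implies.

```python
def destination_file(proto, sport, dport):
    """Return the file in which the chain above would place a record."""
    if proto == 17:
        # Second rwfilter: UDP passes to udp.rw, everything else fails onward.
        return "udp.rw"
    if proto != 6:
        # Third rwfilter: anything that is not TCP fails to non-tcp-udp.rw.
        return "non-tcp-udp.rw"
    if 25 in (sport, dport):
        # Fourth rwfilter: --aport=25 checks both ports.
        return "mail.rw"
    if sport in (80, 443) or dport in (80, 443):
        # Fifth rwfilter: --aport=80,443 checks both ports.
        return "web.rw"
    return "tcp-non-web-mail.rw"


for record in [(17, 53000, 53), (1, 0, 2048), (6, 40000, 25),
               (6, 443, 51000), (6, 40000, 22)]:
    print(record, destination_file(*record))
```

Because each stage reads only what the previous stage did not keep, every record selected by the first rwfilter ends up in exactly one of the five files.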

Expert users can create even more complicated chains of rwfilter commands using named pipes.

ENVIRONMENT

SILK_RWFILTER_THREADS

The number of threads to use while reading input files or files selected from the data store.

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file or --python-expr is specified, rwfilter must load the Python files that comprise the PySiLK module, such as silk/__init__.py. If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that the --scc and --dcc switches use. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type mapping file that the --stype and --dtype switches use. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. This value overrides the compiled-in value, and rwfilter uses it unless the --data-rootdir switch is specified. In addition, rwfilter may use this value when searching for the SiLK site configuration files. See the FILES section for details.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwfilter may use this environment variable. See the FILES section for details.

TZ

When a SiLK installation is built to use the local timezone (to determine if this is the case, check the Timezone support value in the output from rwfilter --version), the value of the TZ environment variable determines the timezone in which rwfilter parses timestamps. If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses utc. For system information on the TZ variable, see tzset(3) or environ(7).

SILK_PLUGIN_DEBUG

When set to 1, rwfilter prints status messages to the standard error as it attempts to find and open each of its plug-ins.

SILK_LOGSTATS

When set to a non-empty value, rwfilter treats the value as the path to an external program to execute with information about this rwfilter invocation. If the value in SILK_LOGSTATS does not contain a slash or if it references a file that does not exist, is not a regular file, or is not executable, the SILK_LOGSTATS value is silently ignored. The arguments to the external program are:

SILK_LOGSTATS_RWFILTER

If set, this environment variable overrides the value specified in SILK_LOGSTATS.

SILK_LOGSTATS_DEBUG

If the environment variable is set to a non-empty value, rwfilter prints messages to the standard error about the SILK_LOGSTATS value being used and either the reason why the value cannot be used or the arguments to the external program being executed.
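
The conditions under which a SILK_LOGSTATS value is used or silently ignored can be written as a short checklist. The Python sketch below restates the rules given above and is not SiLK's implementation; the function name logstats_program is hypothetical, and treating an empty SILK_LOGSTATS_RWFILTER as "set" is an assumption.

```python
import os


def logstats_program(environ):
    """Return the program path that would be run, or None if it is ignored."""
    # Assumption: SILK_LOGSTATS_RWFILTER overrides whenever it is present.
    if "SILK_LOGSTATS_RWFILTER" in environ:
        value = environ["SILK_LOGSTATS_RWFILTER"]
    else:
        value = environ.get("SILK_LOGSTATS", "")
    if not value:
        return None          # unset or empty
    if "/" not in value:
        return None          # must contain a slash
    if not os.path.isfile(value):
        return None          # missing, or not a regular file
    if not os.access(value, os.X_OK):
        return None          # not executable
    return value


print(logstats_program(os.environ))
```

Setting SILK_LOGSTATS_DEBUG, as described above, is the way to learn from rwfilter itself which of these conditions caused a value to be rejected.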

FILES

${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap

Possible locations for the address types mapping file required by the --stype and --dtype switches.

${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory rwfilter is using as the root of the data repository.

${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the --scc and --dcc switches.
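
The list of locations can be generated from the environment. The Python sketch below builds the list for the country code file; the function name is hypothetical, the use of the listing order as the search order is an assumption, /usr/local is simply the prefix shown above, and a relative SILK_COUNTRY_CODES value is not resolved against SILK_PATH here.

```python
import os


def country_code_candidates(environ):
    """List the documented locations of the country code mapping file."""
    candidates = []
    if environ.get("SILK_COUNTRY_CODES"):
        candidates.append(environ["SILK_COUNTRY_CODES"])
    silk_path = environ.get("SILK_PATH")
    if silk_path:
        candidates.append(
            os.path.join(silk_path, "share", "silk", "country_codes.pmap"))
        candidates.append(
            os.path.join(silk_path, "share", "country_codes.pmap"))
    # Prefix taken from the listing above; an installation may use another.
    candidates.append("/usr/local/share/silk/country_codes.pmap")
    candidates.append("/usr/local/share/country_codes.pmap")
    return candidates


for path in country_code_candidates(os.environ):
    print(path, "(exists)" if os.path.isfile(path) else "(missing)")
```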

${SILK_DATA_ROOTDIR}/
/data/

Locations for the root directory of the data repository when the --data-rootdir switch is not specified.

${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/

Directories that rwfilter checks when attempting to load a plug-in.

NOTES

rwfilter is the most commonly used application in the suite. It provides access to the data files and performs all the basic queries.

rwfilter supports a variety of I/O options - in addition to reading from the data store, rwfilter results can be chained together with named pipes to output results to multiple files simultaneously. An introduction to named pipes is outside the scope of this document, however.

Two often underused options are --dry-run and --print-statistics. --dry-run performs a sanity check on the arguments and can be used, especially for complicated arguments, to check that the arguments are acceptable. --print-statistics used without --pass-destination or --fail-destination simply prints aggregate statistics to the standard error on a single line, and it can be used to do a quick pass through the data to get aggregate counts before going deeper into the phenomenon being investigated.

--print-filename can be used as a progress meter; during long jobs, it shows which file is currently being read by rwfilter. --print-filename does not provide meaningful feedback with piped input.

Filters are applied in the order given on the command line. It is best to apply the biggest filters first.

The rwfilter command line is written into the header of the output file(s). You may use the rwfileinfo(1) command to see this information.

SEE ALSO

rwcut(1), rwfglob(1), rwfileinfo(1), rwset(1), rwtuc(1), rwsetbuild(1), rwsiteinfo(1), addrtype(3), ccfilter(3), flowrate(3), ipafilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3), silk.conf(5), sensor.conf(5), silk(7), rwflowpack(8), yaf(1), applabel(1), zlib(3), dlopen(3), tzset(3), environ(7), Analysts’ Handbook: Using SiLK for Network Traffic Analysis

rwgeoip2ccmap

Create a country code prefix map from a GeoIP Legacy file

SYNOPSIS

  rwgeoip2ccmap [--mode={auto|ipv4|ipv6|binary}]  
        [--input-file=FILENAME] [--output-file=FILENAME] [--dry-run]  
        [--note-add=TEXT] [--note-file-add=FILENAME]  
        [--invocation-strip]

  rwgeoip2ccmap --help

  rwgeoip2ccmap --version

Legacy Synopsis

  rwgeoip2ccmap {--csv-input | --v6-csv-input | --encoded-input}  
        [--input-file=FILENAME] [--output-file=FILENAME] [--dry-run]  
        [--note-add=TEXT] [--note-file-add=FILENAME]  
        [--invocation-strip]

DESCRIPTION

Prefix maps provide a way to map field values to string labels based on a user-defined map file. The country code prefix map, typically named country_codes.pmap, is a special prefix map that maps an IP address to a two-letter country code. It uses the country codes defined by the Internet Assigned Numbers Authority (http://www.iana.org/root-whois/index.html).

The country code prefix map is based on the GeoIP Legacy Country(R) or free GeoLite Legacy database created by MaxMind(R) and available from http://www.maxmind.com/. (Note: You must use the MaxMind legacy database format. rwgeoip2ccmap does not support the GeoIP2 and GeoLite2 databases.)

The database is available in several formats, and rwgeoip2ccmap supports the following formats:

GeoIPCountryCSV.zip

a compressed (zip(1)) textual file containing an IPv4 range, country name, and country code in a comma separated value (CSV) format.

GeoIPv6.csv.gz

a compressed (gzip(1)) textual file containing an IPv6 range, country name, and country code in a CSV format. This file only contains IPv6 data. If you use this file to create your country code prefix map, any IPv4 addresses will have the unknown value --. See EXAMPLES for a way to merge the IPv6 and IPv4 files.

GeoIP.dat.gz

a compressed (gzip(1)) binary file containing specially encoded data for IPv4 address ranges.

GeoIPv6.dat.gz

a compressed (gzip(1)) binary file containing specially encoded data for both IPv4 and IPv6 address ranges.

The country code prefix map file is used by ccfilter(3) to map IP addresses to country codes in various SiLK tools. The ccfilter feature allows you to

The rwpmaplookup(1) command can use the country code mapping file to display the country code for textual IP addresses.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--mode={auto|ipv4|ipv6|binary}

Specify the type of the input, which determines the type of prefix map rwgeoip2ccmap creates. When not specified, rwgeoip2ccmap determines the type of prefix map to create based on the first line of input. The modes are:

auto

Determine the type of prefix map to create based on the IP addresses that appear on the first line of input. This is the default mode.

ipv4

Read textual input containing IPv4 addresses in a comma separated value format and create an IPv4 prefix map. Any IPv6 addresses in the ::ffff:0:0/96 netblock are mapped to an IPv4 address and all other IPv6 addresses are ignored.

ipv6

Read textual input containing IPv6 addresses in a comma separated value format and create an IPv6 prefix map. Any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock.

binary

Read specially-encoded binary input containing either IPv4 or IPv6 addresses and create the appropriate type of prefix map. Since SiLK 3.12.2.
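
The address conversions described for the ipv4 and ipv6 modes use the IPv4-mapped netblock ::ffff:0:0/96. The Python sketch below shows the arithmetic with the standard ipaddress module; the function names to_ipv6 and to_ipv4 are hypothetical, and the sketch illustrates the mapping only, not how rwgeoip2ccmap parses its input.

```python
import ipaddress

V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def to_ipv6(addr):
    """ipv6 mode: place an IPv4 address into ::ffff:0:0/96."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address(int(V4_MAPPED.network_address) | int(ip))
    return ip


def to_ipv4(addr):
    """ipv4 mode: unmap ::ffff:0:0/96 addresses; None means 'ignored'."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip
    if ip in V4_MAPPED:
        # The low 32 bits of a mapped address are the IPv4 address.
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None


print(to_ipv6("10.1.2.3") in V4_MAPPED)
print(to_ipv4("::ffff:10.1.2.3"))
print(to_ipv4("2001:db8::1"))
```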

--input-file=FILENAME

Read the CSV or binary forms of the GeoIP Legacy country code database from FILENAME. You may use stdin or - to represent the standard input. When this switch is not provided, the input is read from the standard input unless the standard input is a terminal. rwgeoip2ccmap will read textual input from the terminal if the standard input is explicitly specified as the input. Since SiLK 3.12.0.

--output-file=FILENAME

Write the binary country code prefix map to FILENAME. You may use stdout or - to represent the standard output. When this switch is not provided, the prefix map is written to the standard output unless the standard output is connected to a terminal. Since SiLK 3.12.0.

--dry-run

Check the syntax of the input file and do not write the output file. Since SiLK 3.12.0.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool. Since SiLK 3.12.0.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation. Since SiLK 3.12.0.

--invocation-strip

Do not record the command used to create the prefix map in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--csv-input

Assume the input is the CSV GeoIP Legacy country code data for IPv4. This switch is deprecated, and it should be replaced with --mode=ipv4.

--v6-csv-input

Assume the input is the CSV GeoIP Legacy country code data for IPv6. This switch is deprecated, and it should be replaced with --mode=ipv6.

--encoded-input

Assume the input is the specially-encoded binary form of the GeoIP Legacy country code data for either IPv4 or IPv6. This switch is deprecated, and it should be replaced with --mode=binary.

--help

Print the available options and exit.

--version

Print the version number and exit the application.

EXAMPLES

The following examples show how to create the country code prefix map file, country_codes.pmap, from various forms of input. Once you have created the country_codes.pmap file, you should copy it to /usr/local/share/silk/country_codes.pmap so that the ccfilter(3) plug-in can find it. Alternatively, you can set the SILK_COUNTRY_CODES environment variable to the location of the country_codes.pmap file.

In these examples, the dollar sign ($) represents the shell prompt. Some input lines are split over multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines.

IPv4 Comma Separated Values File

Download the CSV version of the MaxMind GeoIP Legacy Country database for IPv4, GeoIPCountryCSV.zip. (Use the Legacy form of the GeoIP or GeoLite database since the GeoIP2 and GeoLite2 databases are not supported.) Running unzip -l on the zip file should show a single file, GeoIPCountryWhois.csv. To expand this file, use the unzip(1) utility; by using the -p option to unzip, you can pass the output of unzip directly to rwgeoip2ccmap:

 $ unzip -p GeoIPCountryCSV.zip | \  
       rwgeoip2ccmap --mode=ipv4 > country_codes.pmap

IPv6 Comma Separated Values File

If you download the IPv6 version of the MaxMind GeoIP Legacy Country database, use the following command to create the country_codes.pmap file:

 $ gzip -d -c GeoIPv6.csv.gz | \  
       rwgeoip2ccmap --mode=ipv6 > country_codes.pmap

Since the GeoIPv6.csv.gz file only contains IPv6 addresses, the resulting country_codes.pmap file will display the unknown value (--) for any IPv4 address. See the next example for a solution.

IPv6 and IPv4 Comma Separated Values Files

To create a country_codes.pmap mapping file that supports both IPv4 and IPv6 addresses, download both of the Legacy CSV files (GeoIPv6.csv.gz and GeoIPCountryCSV.zip) from MaxMind.

You need to uncompress both files and feed the result as a single stream to the standard input of rwgeoip2ccmap. This can be done in a few commands:

 $ gzip -d GeoIPv6.csv.gz  
 $ unzip GeoIPCountryCSV.zip  
 $ cat GeoIPv6.csv GeoIPCountryWhois.csv | \  
       rwgeoip2ccmap --mode=ipv6 > country_codes.pmap

Alternatively, if your shell supports it, you may be able to use a subshell to avoid having to store the uncompressed data:

 $ ( gzip -d -c GeoIPv6.csv.gz ; unzip -p GeoIPCountryCSV.zip ) | \  
       rwgeoip2ccmap --mode=ipv6 > country_codes.pmap

SEE ALSO

ccfilter(3), rwpmaplookup(1), rwfilter(1), rwcut(1), rwsort(1), rwstats(1), rwuniq(1), rwgroup(1), rwpmapbuild(1), silk(7), gzip(1), zip(1), unzip(1), http://dev.maxmind.com/geoip/legacy/geolite/

NOTES

Support for the binary form of the GeoIP Legacy format was removed in SiLK 3.12.0 and restored in SiLK 3.12.2.

rwgroup

Tag similar SiLK records with a common next hop IP value

SYNOPSIS

  rwgroup  
        {--id-fields=KEY | --delta-field=FIELD --delta-value=DELTA}  
        [--objective] [--summarize] [--rec-threshold=THRESHOLD]  
        [--group-offset=IP]  
        [--note-add=TEXT] [--note-file-add=FILE] [--output-path=PATH]  
        [--copy-input=PATH] [--compression-method=COMP_METHOD]  
        [--site-config-file=FILENAME]  
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]  
        [--python-file=PATH [--python-file=PATH ...]]  
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [FILE]

  rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]  
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

  rwgroup --version

DESCRIPTION

rwgroup reads sorted SiLK Flow records (see rwsort(1)) from the standard input or from a single file name listed on the command line, marks records that form a group with an identifier in the Next Hop IP field, and prints the binary SiLK Flow records to the standard output. In some ways rwgroup is similar to rwuniq(1), but rwgroup writes SiLK flow records instead of textual output.

Two SiLK records are defined as being in the same group when the fields specified in the --id-fields switch match exactly and when the field listed in the --delta-field switch matches within the value given by the --delta-value switch. Either --id-fields or --delta-field is required; both may be specified. A --delta-value must be given when --delta-field is present.

The first group of records gets the identifier 0, and rwgroup writes that value into each record’s Next Hop IP field. The ID for each subsequent group is incremented by 1. The --group-offset switch may be used to set the identifier of the initial group.

The --rec-threshold switch may be used to only write groups that contain a certain number of records. The --summarize switch attempts to merge records in the same group to a single output record.

rwgroup requires that the records are sorted on the fields listed in the --id-fields and --delta-field switches. For example, a call using

  rwgroup --id-field=2 --delta-field=9 --delta-value=3

should read the output of

  rwsort --field=2,9

otherwise the results are unpredictable.
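
One way to see why sorting matters is to model grouping as a single pass that compares neighboring records. The Python sketch below does this with itertools.groupby on made-up (dIP, sTime) pairs; it is an analogy under that single-pass assumption, not rwgroup's implementation, and the helper name count_groups is hypothetical.

```python
from itertools import groupby


def count_groups(records, key):
    """Count runs of consecutive records that share the same key."""
    return sum(1 for _ in groupby(records, key=key))


def by_dip(record):
    return record[0]


# Made-up (dIP, sTime) pairs; two of them share a destination address.
records = [("10.0.0.1", 5), ("10.0.0.2", 7), ("10.0.0.1", 9)]

print(count_groups(records, by_dip))          # input left unsorted
print(count_groups(sorted(records), by_dip))  # input sorted first
```

On the unsorted input the two records for 10.0.0.1 are not adjacent, so they are counted as separate groups; after sorting they fall into one.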

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

At least one value for --id-fields or --delta-field must be provided; rwgroup terminates with an error if no fields are specified.

--id-fields=KEY

KEY contains the list of flow attributes (a.k.a. fields or columns) that must match exactly for flows to be considered part of the same group. Each field may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:

 --id-fields=stime,10,1-5

There is no default value for the --id-fields switch.
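
The KEY syntax can be illustrated with a small parser. The Python sketch below expands names, integers, and integer ranges into a list of field numbers and rejects repeated fields; it is a hypothetical helper, covers only fields 1 through 12 from the list that follows, and does not handle plug-in or prefix map field names.

```python
# Partial table of field names (lowercased) and their numbers.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "duration": 10, "etime": 11, "sensor": 12,
}


def parse_key(key):
    """Expand a KEY such as 'stime,10,1-5' into a list of field numbers."""
    fields = []
    for item in key.split(","):
        name = item.strip().lower()      # field names are case insensitive
        if name in FIELD_IDS:
            ids = [FIELD_IDS[name]]
        elif "-" in name:
            start, end = (int(part) for part in name.split("-", 1))
            ids = list(range(start, end + 1))
        else:
            ids = [int(name)]
        for field_id in ids:
            if field_id in fields:
                raise ValueError("field %d specified more than once" % field_id)
            fields.append(field_id)
    return fields


print(parse_key("stime,10,1-5"))
```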

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (seconds resolution)

duration,10

duration of flow (seconds resolution)

eTime,11

end time of flow (seconds resolution)

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

iType

the ICMP type value for ICMP or ICMPv6 flows and zero for non-ICMP flows. Internally, SiLK stores the ICMP type and code in the dPort field, so there is no need to have both dPort and iType or iCode in the sort key. This field was introduced in SiLK 3.8.1.

iCode

the ICMP code value for ICMP or ICMPv6 flows and zero for non-ICMP flows. See note at iType.

icmpTypeCode,25

equivalent to iType,iCode in --id-fields. This field may not be mixed with iType or iCode, and this field is deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, icmpTypeCode may no longer be used as the argument to --delta-field; the dPort field will provide an equivalent result as long as the input is limited to ICMP flow records.

Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field’s value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity). The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
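
The marking pattern in the ssh example can be stated compactly. The Python sketch below returns the T and C letters for each record of one session that the flow generator split into a given number of records; the function name is hypothetical, and the S and F attributes are not modeled.

```python
def continuation_marks(record_count):
    """Return the T/C letters for each record of one long-running session."""
    if record_count < 1:
        raise ValueError("a session produces at least one record")
    if record_count == 1:
        return [""]                      # never hit the active timeout
    # First record timed out; middle records timed out and continue an
    # earlier record; the last record only continues an earlier record.
    return ["T"] + ["TC"] * (record_count - 2) + ["C"]


print(continuation_marks(4))
```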

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:

sType,16

categorize the source IP address as non-routable, internal, or external and group based on the category. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).

dType,17

as sType for the destination IP address

scc,18

the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3).

dcc,19

as scc for the destination IP

src-MAPNAME

value determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the --pmap-file switch below and the pmapfilter(3) manual page.

dst-MAPNAME

as src-MAPNAME for the destination IP or protocol/destination-port.

sval
dval

These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.

Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.

--delta-field=FIELD

Specify a single field that can differ by a specified delta-value among the SiLK records that make up a group. The FIELD identifiers include most of those specified for --id-fields. The exceptions are that plug-in fields are not supported, nor are fields that do not have numeric values (e.g., class, type, flags). The most common value for this switch is stime, which allows records that are identical in the id-fields but temporally far apart to be in different groups. The switch takes a single argument; multiple delta fields cannot be specified. When this switch is specified, the --delta-value switch is required.

--delta-value=DELTA_VALUE

Specify the acceptable difference between the values of the --delta-field. The --delta-value switch is required when the --delta-field switch is provided. For fields other than those holding IPs, when the values of two consecutive records differ by DELTA_VALUE or less, the records are considered members of the same group. When the delta-field refers to an IP field, DELTA_VALUE is the number of least significant bits of the IPs to remove before comparing them. For example, when --delta-field=sIP --delta-value=8 is specified, two records are in the same group if their source IPv4 addresses belong to the same /24 or if their source IPv6 addresses belong to the same /120. The --objective switch affects the meaning of this switch.
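
For IP fields, removing the least significant bits amounts to a right shift of the address value. The Python sketch below shows the comparison with the standard ipaddress module; the function name same_ip_group is hypothetical, and refusing to compare an IPv4 address with an IPv6 address is a simplification made here.

```python
import ipaddress


def same_ip_group(ip_a, ip_b, delta_value):
    """Compare two IPs after dropping delta_value low-order bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        # Simplification: mixed IPv4/IPv6 comparison is not modeled.
        raise ValueError("addresses must be of the same IP version")
    return int(a) >> delta_value == int(b) >> delta_value


# With 8 bits removed, IPv4 addresses are compared as /24 blocks
# and IPv6 addresses as /120 blocks.
print(same_ip_group("10.1.2.3", "10.1.2.200", 8))
print(same_ip_group("10.1.2.3", "10.1.3.3", 8))
print(same_ip_group("2001:db8::1", "2001:db8::ff", 8))
```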

--objective

Change the behavior of the --delta-value switch so that a record is considered part of a group if the value of its --delta-field is within the DELTA_VALUE of the first record in the group. (When this switch is not specified, consecutive records are compared.)
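
The difference between the default comparison and --objective shows up when values drift gradually. The Python sketch below assigns group numbers to a sorted list of numeric delta-field values, assuming the id-fields of all records are identical; the function name assign_groups is hypothetical and the sketch is not rwgroup's implementation.

```python
def assign_groups(values, delta, objective=False):
    """Return a group number for each value of an already sorted list."""
    groups = []
    group = 0
    reference = None
    for value in values:
        if reference is None:
            reference = value            # first record starts group 0
        elif abs(value - reference) > delta:
            group += 1                   # too far away: start a new group
            reference = value
        elif not objective:
            reference = value            # default: compare with previous record
        # With objective=True the reference stays the group's first record.
        groups.append(group)
    return groups


start_times = [0, 3, 6, 9]
print(assign_groups(start_times, 3))
print(assign_groups(start_times, 3, objective=True))
```

In the default mode each record is within 3 of its predecessor, so all four records chain into one group; in objective mode the third record is 6 away from the first record of its group, so a second group begins there.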

--summarize

Cause rwgroup to print (typically) a single record for each group. By default, all records in each group having at least --rec-threshold members are printed. When --summarize is active, the record that is written for the group is the first record in the group with the following modifications:

Note that multiple records for a group may be printed if the bytes, packets, or elapsed time values are too large to be stored in a SiLK flow record.

--plugin=PLUGIN

Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwgroup will attempt to find a file named PLUGIN in the directories listed in the FILES section. If rwgroup finds the file, it uses that path. If PLUGIN contains a slash or if rwgroup does not find the file, rwgroup relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwgroup prints status messages to the standard error as it attempts to find and open each of its plug-ins.

--rec-threshold=THRESHOLD

Specify the minimum number of SiLK records a group must contain before the records in the group are written to the output stream. The default is 1; i.e., write all records. The maximum threshold is 65535.

--group-offset=IP

Specify the value to write into the Next Hop IP for the records that comprise the first group. The value IP may be an integer, or an IPv4 or IPv6 address in the canonical presentation form. If not specified, counting begins at 0. The value for each subsequent group is incremented by 1.
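
The sequence of group identifiers produced by a given offset can be computed with address arithmetic. The Python sketch below is an illustration only; the function name group_ids is hypothetical, and presenting an integer offset as an IPv4 address is an assumption made for display.

```python
import ipaddress


def group_ids(offset, count):
    """Return the Next Hop IP values for the first `count` groups."""
    if isinstance(offset, int):
        # Assumption: show an integer offset as an IPv4 address.
        start = ipaddress.IPv4Address(offset)
    else:
        start = ipaddress.ip_address(offset)
    # Each subsequent group's identifier is one larger than the last.
    return [start + n for n in range(count)]


print([str(ip) for ip in group_ids(0, 3)])
print([str(ip) for ip in group_ids("10.0.0.254", 3)])
```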

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwgroup’s output to a different location.

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwgroup exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwgroup to exit with an error.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwgroup searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit. Specifying switches that add new fields or additional switches before --help will allow the output to include descriptions of those fields or switches.

--help-fields

Print the description and alias(es) of each field and exit. Specifying switches that add new fields before --help-fields will allow the output to include descriptions of those fields.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--pmap-file=MAPNAME:PATH
--pmap-file=PATH

Instruct rwgroup to load the mapping file located at PATH and create the src-MAPNAME and dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwgroup will check the prefix map file to see if a map-name was specified when the file was created. If no map-name is available, rwgroup creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a unique value for map-name. The --pmap-file switch(es) must precede the --id-fields switch. For more information, see pmapfilter(3).

--python-file=PATH

When the SiLK Python plug-in is used, rwgroup reads the Python code from the file PATH to define additional fields that can be used as part of the group key. This file should call register_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

LIMITATIONS

rwgroup requires sorted data. The application works by comparing records in the order in which they are received (similar to the UNIX uniq(1) command), so input that is not sorted on the grouping fields will produce unexpected groupings.
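The comparison to uniq(1) can be seen with plain text, without any SiLK tools. The following is only an illustration of adjacent-record comparison; the addresses are made up, and nothing here is rwgroup's own code.

```shell
# uniq(1) only merges lines that are adjacent, so the same key seen
# twice with another key in between is reported as separate groups.
unsorted_groups=$(printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' | uniq | wc -l | tr -d ' ')

# Sorting first makes equal keys adjacent, so each key forms one group.
sorted_groups=$(printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' | sort | uniq | wc -l | tr -d ' ')

echo "unsorted: $unsorted_groups groups, sorted: $sorted_groups groups"
# prints "unsorted: 3 groups, sorted: 2 groups"
```

The same reasoning is why the input to rwgroup is normally passed through rwsort(1) first.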

EXAMPLES

In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

As a rule of thumb, the --id-fields and --delta-field arguments should match the fields passed to rwsort(1), with the --delta-field argument being the last sort field. To group all web traffic by queries from the same addresses (field 2) that occur within 10 seconds (field 9) of the first query from that address, use:

 $ rwfilter --proto=6 --dport=80 --pass=stdout                  \
   | rwsort --field=2,9                                         \
   | rwgroup --id-field=2 --delta-field=9 --delta-value=10      \
        --objective
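The grouping rule in this example (same key, and no more than 10 seconds after the first record of the group) can be modelled on plain text with awk(1). This is a simplified sketch, not rwgroup's implementation: the function name group_objective, the "key time" input layout, the group numbering starting at 1, and the treatment of a difference exactly equal to the limit as still inside the group are all assumptions made for illustration.

```shell
# Hypothetical helper, not part of SiLK: read "key time" lines that are
# already sorted by key and then by time, and prefix each with a group number.
# $1 is the largest allowed distance (in seconds) from the group's first record.
group_objective() {
    awk -v delta="$1" '
        # Start a new group when the key changes or when the record is
        # more than delta seconds after the first record of the group.
        NR == 1 || $1 != key || $2 - first > delta {
            group++; key = $1; first = $2
        }
        { print group, $1, $2 }
    '
}

grouped=$(printf '%s\n' 'a 0' 'a 4' 'a 9' 'a 15' 'b 16' | group_objective 10)
printf '%s\n' "$grouped"
```

With this input the first three "a" records fall into one group, the record at time 15 starts a second group because it is more than 10 seconds after time 0, and the "b" record starts a third.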

ENVIRONMENT

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file is specified, rwgroup must load the Python files that comprise the PySiLK package, such as silk/__init__.py. If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwgroup uses when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type mapping file that rwgroup uses when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwgroup may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwgroup may use this environment variable. See the FILES section for details.

SILK_PLUGIN_DEBUG

When set to 1, rwgroup prints status messages to the standard error as it attempts to find and open each of its plug-ins. In addition, when an attempt to register a field fails, rwgroup prints a message specifying the additional function(s) that must be defined to register the field in rwgroup. Be aware that the output can be rather verbose.
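The SILK_CLOBBER rule described above amounts to a check of the following shape. This is a minimal sketch under stated assumptions: may_write_output is a hypothetical name, only regular files are considered, and the message text is invented. It is not how the SiLK tools are implemented.

```shell
# Hypothetical helper, not part of SiLK: succeed when writing to the
# path in $1 would be allowed under the SILK_CLOBBER rule.
may_write_output() {
    # An existing regular file blocks the write unless SILK_CLOBBER
    # is set to a non-empty value.
    if [ -f "$1" ] && [ -z "${SILK_CLOBBER:-}" ]; then
        echo "refusing to overwrite existing file: $1" >&2
        return 1
    fi
    return 0
}
```

A caller would test the result before opening the output, for example: may_write_output out.rw || exit 1.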

FILES

${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap

Possible locations for the address types mapping file required by the sType and dType fields.

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file, which are checked when the --site-config-file switch is not provided.

${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the scc and dcc fields.

${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/

Directories that rwgroup checks when attempting to load a plug-in.
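To see which site configuration file a given environment would select, the silk.conf locations listed above can be probed from the shell. The sketch below assumes the locations are tried in the order listed and that a missing file simply falls through to the next candidate; find_silk_conf is a hypothetical name, and the actual lookup inside the SiLK tools may differ.

```shell
# Hypothetical helper, not part of SiLK: print the first readable
# silk.conf candidate, trying the documented locations in listed order.
find_silk_conf() {
    for candidate in \
        "${SILK_CONFIG_FILE:-}" \
        "${SILK_DATA_ROOTDIR:+$SILK_DATA_ROOTDIR/silk.conf}" \
        /data/silk.conf \
        "${SILK_PATH:+$SILK_PATH/share/silk/silk.conf}" \
        "${SILK_PATH:+$SILK_PATH/share/silk.conf}" \
        /usr/local/share/silk/silk.conf \
        /usr/local/share/silk.conf
    do
        # Entries built from unset or empty variables expand to "" and are skipped.
        if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Running find_silk_conf with no arguments prints the selected path, or returns a non-zero status when none of the candidates exists.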

SEE ALSO

rwfilter(1), rwfileinfo(1), rwsort(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3), sensor.conf(5), uniq(1), silk(7), yaf(1), dlopen(3), zlib(3)

rwidsquery

Invoke rwfilter to find flows matching Snort signatures

SYNOPSIS

 rwidsquery --intype=INPUT_TYPE  
        [--output-file=OUTPUT_FILE]  
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]  
        [--year=YEAR] [--tolerance=SECONDS]  
        [--config-file=CONFIG_FILE]  
        [--mask=PREDICATE_LIST]  
        [--verbose] [--dry-run]  
        [INPUT_FILE | -]  
        [-- EXTRA_RWFILTER_ARGS...]

  rwidsquery --help

  rwidsquery --version

DESCRIPTION

rwidsquery facilitates selection of SiLK flow records that correspond to Snort IDS alerts and signatures. rwidsquery takes as input either a snort(8) alert log or rule file, analyzes the alert or rule contents, and invokes rwfilter(1) with the appropriate arguments to retrieve flow records that match attributes of the input file. rwidsquery will process the Snort rules or alerts from a single file named on the command line; if no file name is given, rwidsquery will attempt to read the Snort rules or alerts from the standard input, unless the standard input is connected to a terminal. An input file name of - or stdin will force rwidsquery to read from the standard input, even when the standard input is a terminal.

OPTIONS

In addition to the options listed below, you can pass extra options through to rwfilter(1) on the rwidsquery command line. The syntax for doing so is to place a double-hyphen (--) sequence after all valid rwidsquery options, and before all of the options you wish to pass through to rwfilter.

--intype=INPUT_TYPE

Specify the type of input contained in the input file. This switch is required. Two alert formats and one rule format are currently supported. Valid values for this option are:

fast

Input is a Snort "fast" log file entry. Alerts are written in this format when Snort is configured with the snort_fast output module enabled. snort_fast alerts resemble the following:

    Jan  1 01:23:45 hostname snort[1976]: [1:1416:11] ...

full

Input is a Snort "full" log file entry. Alerts are written in this format when Snort is configured with the snort_full output module enabled. snort_full alerts look like the following example:

    [**] [116:151:1] (snort decoder) Bad Traffic  ...

rule

Input is a Snort rule (signature). For example:

    alert tcp $EXTERNAL_NET any -> $HOME_NET any ...

--output-file=OUTPUT_FILE

Specify the file to which flows are written. If this switch is not given, flows are written to the standard output. The argument to this option becomes the argument to rwfilter's --pass-destination switch.

--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]

Used in conjunction with rule file input only. These switches give the start and end of the time range to search. See the rwfilter(1) manual page for details of the date format.

--year=YEAR

Used in conjunction with alert file input only. Timestamps in Snort alert files do not contain year information. By default, the current calendar year is used, but this option can be used to override this default behavior.

--tolerance=SECONDS

Used in conjunction with alert file input only. This option is provided to compensate for timing differences between the timestamps in Snort alerts and the start/end time of the corresponding flows. The default --tolerance value is 3600 seconds, which means that flow records +/- one hour from the alert timestamp will be searched.

--config-file=CONFIG_FILE

Used in conjunction with rule file input only. Snort requires a configuration file which, among other things, contains variables that can be used in Snort rule definitions. This option allows you to specify the location of this configuration file so that IP addresses, port numbers, and other information from the Snort configuration file can be used to find matching flows.

--mask=PREDICATE_LIST

Exclude the rwfilter predicates named in PREDICATE_LIST from the selection criteria. This option is provided to widen the scope of queries by making them more general than the Snort rule or alert provided. For instance, --mask=dport will return flows with any destination port, not just those which match the input Snort alert or rule.

--verbose

Print the resulting rwfilter(1) command to the standard error prior to executing it.

--dry-run

Print the resulting rwfilter(1) command to the standard error but do not execute it.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.
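The effect of --year and --tolerance on the search window can be worked through by hand. The sketch below takes the alert timestamp from the EXAMPLES section (Nov 15 00:00:58, with --year 2007 and --tolerance 300) and computes the hour-granularity bounds on either side of it. It assumes GNU date(1) for the -d option and treats the timestamp as UTC; the variable names are illustrative, and this is one reading of the documented behavior rather than rwidsquery's own code.

```shell
# Assumes GNU date(1); the values mirror the alert in the EXAMPLES section.
alert_time='2007-11-15 00:00:58'   # the year is supplied separately, as with --year
tolerance=300                      # seconds, as with --tolerance

# Convert the alert time to seconds since the epoch (interpreted as UTC).
alert_epoch=$(date -u -d "$alert_time" +%s)

# Step back and forward by the tolerance, then format to the hour.
window_start=$(date -u -d "@$((alert_epoch - tolerance))" '+%Y/%m/%d:%H')
window_end=$(date -u -d "@$((alert_epoch + tolerance))" '+%Y/%m/%d:%H')

echo "--start-date=$window_start --end-date=$window_end"
# prints "--start-date=2007/11/14:23 --end-date=2007/11/15:00"
```

Because the alert falls 58 seconds after midnight, a five-minute tolerance already reaches back into the previous day's final hour, which is why the search spans two hourly partitions.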

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To find SiLK flows matching a Snort alert in snort_fast format:

 $ rwidsquery --intype fast --year 2007 --tolerance 300 alert.fast.txt

For the following Snort alert:

 Nov  15 00:00:58 hostname snort[5214]: [1:1416:11]  
 SNMP broadcast trap [Classification: Attempted Information Leak]  
 [Priority: 2]: {TCP}  
 192.168.0.1:4161 -> 127.0.0.1:139

The resulting rwfilter(1) command would look similar to:

 $ rwfilter --start-date=2007/11/14:23 --end-date=2007/11/15:00&