The SiLK Reference Guide
(SiLK-3.23.1)

CERT Software Automation Product Development
©2002–2024 Carnegie Mellon University
License available in Appendix 8.0
 
The canonical location for this handbook is
https://tools.netsa.cert.org/silk/silk-reference-guide.pdf

September 26, 2024

SiLK 3.23

Copyright 2024 Carnegie Mellon University.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

Licensed under a GNU GPL 2.0-style license, please see LICENSE.txt or contact permission@sei.cmu.edu for full terms.

[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

This Software includes and/or makes use of Third-Party Software each subject to its own license.

DM24-1064

Contents

Introduction
1 SiLK Analysis Tools and Utilities
mapsid
num2dot
rwaddrcount
rwaggbag
rwaggbagbuild
rwaggbagcat
rwaggbagtool
rwappend
rwbag
rwbagbuild
rwbagcat
rwbagtool
rwcat
rwcombine
rwcompare
rwcount
rwcut
rwdedupe
rwfglob
rwfileinfo
rwfilter
rwgeoip2ccmap
rwgroup
rwidsquery
rwip2cc
rwipaexport
rwipaimport
rwipfix2silk
rwmatch
rwnetmask
rwp2yaf2silk
rwpcut
rwpdedupe
rwpdu2silk
rwpmapbuild
rwpmapcat
rwpmaplookup
rwpmatch
rwptoflow
rwrandomizeip
rwrecgenerator
rwresolve
rwscan
rwscanquery
rwset
rwsetbuild
rwsetcat
rwsetmember
rwsettool
rwsilk2ipfix
rwsiteinfo
rwsort
rwsplit
rwstats
rwswapbytes
rwtotal
rwtuc
rwuniq
silk_config
3 SiLK Libraries and Plug-Ins
addrtype
app-mismatch
ccfilter
conficker-c
cutmatch
flowkey
flowrate
int-ext-fields
ipafilter
packlogic-generic.so
packlogic-twoway.so
pmapfilter
PySiLK
silk-plugin
silkpython
5 SiLK File Formats
sensor.conf
silk.conf
7 SiLK Miscellaneous Information
SiLK
8 SiLK Administrator’s Tools
flowcap
rwflowappend
rwflowpack
rwguess
rwpackchecker
rwpollexec
rwreceiver
rwsender
A License

Introduction

The SiLK Reference Guide contains the manual page for each analysis tool, utility, plug-in, file format, and collection facility in the SiLK Collection and Analysis Suite.

This document is meant for reference only. The SiLK Analysis Handbook provides both a tutorial for learning about the tools and examples of how they can be used in analyzing flow data. See the SiLK Installation Handbook for instructions on installing SiLK at your site.

This reference guide is broken into sections like the traditional UNIX manual: end-user analysis tools and utilities are described in Section 1; the libraries and plug-ins that augment the behavior of some tools are presented in Section 3; Section 5 contains information about file formats; miscellaneous information is in Section 7; and commands for the installer and administrator of SiLK appear in Section 8.

1 SiLK Analysis Tools and Utilities

This section provides the manual page for each analysis tool and utility that the users of SiLK may employ in their day-to-day work.

mapsid

Map between sensor names and sensor numbers

SYNOPSIS

  mapsid [--print-classes] [--print-descriptions]
        [--site-config-file=FILENAME]
        [{ <sensor-name> | <sensor-number> } ...]

  mapsid --help

  mapsid --version

DESCRIPTION

As of SiLK 3.0, mapsid is deprecated, and it will be removed in the SiLK 4.0 release. Use rwsiteinfo(1) instead; the EXAMPLES section shows how to use rwsiteinfo to get output similar to that produced by mapsid.

mapsid is a utility that maps sensor names to sensor numbers or vice versa depending on the input arguments. Sensors are defined in the silk.conf(5) file.

When no sensor arguments are given to mapsid, the mapping of all sensor numbers to names is printed. When a numeric argument is given, the number to name mapping is printed for the specified argument. When a name is given, its numeric id is printed. For convenience when typing in sensor names, case is ignored.
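The lookup rules above can be sketched in Python. This is only an illustration and not how mapsid is implemented: the SENSORS table is hypothetical sample data that mirrors the EXAMPLES section, and map_sensor is a made-up helper name.

```python
# Hypothetical sensor table (number -> name); real values come from silk.conf.
SENSORS = {0: "ALPHA", 1: "BETA", 2: "GAMMA", 3: "DELTA", 4: "EPSLN", 5: "ZETA"}


def map_sensor(arg, sensors=SENSORS):
    """Return a (key, value) pair for one mapsid-style argument.

    A numeric argument maps number -> name.  Any other argument is treated
    as a sensor name and matched without regard to case.
    """
    if arg.isdigit():
        number = int(arg)
        return number, sensors[number]
    wanted = arg.upper()
    for number, name in sensors.items():
        if name.upper() == wanted:
            return name, number
    raise KeyError("unknown sensor: %s" % arg)


for arg in ("beta", "3"):
    print("%s -> %s" % map_sensor(arg))
```

The sketch ignores the column alignment that mapsid applies to its output.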

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
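As an illustration of the abbreviation rule, the following Python sketch resolves a possibly shortened option name against a list of known names. It is an assumption about how such matching can be done, not the parser SiLK uses; resolve_option and MAPSID_OPTIONS are hypothetical names, with the option list copied from the SYNOPSIS.

```python
def resolve_option(given, known):
    """Resolve a possibly abbreviated option name against known names.

    An exact match always wins; otherwise the abbreviation must be a
    prefix of exactly one known option.
    """
    if given in known:
        return given
    matches = [name for name in known if name.startswith(given)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown option: %s" % given)
    raise ValueError("ambiguous option: %s" % given)


# Option names taken from the mapsid SYNOPSIS.
MAPSID_OPTIONS = ["--print-classes", "--print-descriptions",
                  "--site-config-file", "--help", "--version"]

print(resolve_option("--print-desc", MAPSID_OPTIONS))
```

Under this rule --print-desc is accepted because only one option starts with it, while --print is rejected because it is a prefix of two options.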

--print-classes

For each sensor, print the classes for which the sensor collects data. The classes are enclosed in square brackets, [].

--print-descriptions

For each sensor, print the description of the sensor as defined in the silk.conf file (if any).

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, mapsid searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

The following examples demonstrate the use of mapsid. In addition, each example shows how to get similar output using rwsiteinfo(1).

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Name to number mapping
 $ mapsid beta
 BETA ->     1

 $ rwsiteinfo --fields=sensor,id-sensor --sensors=BETA
 Sensor|Sensor-ID|
   BETA|        1|

Unlike mapsid, matching of the sensor name is case-sensitive in rwsiteinfo.

Number to name mapping
 $ mapsid 3
     3 -> DELTA

 $ rwsiteinfo --fields=id-sensor,sensor --sensors=3 --delimited=,
 Sensor-ID,Sensor
 3,DELTA

Print all mappings
 $ mapsid
     0 -> ALPHA
     1 -> BETA
     2 -> GAMMA
     3 -> DELTA
     4 -> EPSLN
     5 -> ZETA
      ....

 $ rwsiteinfo --fields=id-sensor,sensor --no-titles
   0| ALPHA|
   1|  BETA|
   2| GAMMA|
   3| DELTA|
   4| EPSLN|
   5|  ZETA|
   ...

Print the class
 $ mapsid --print-classes 3 ZETA
     3 -> DELTA  [all]
 ZETA  ->     5  [all]

 $ rwsiteinfo --fields=id-sensor,sensor,class:list --sensors=3,ZETA
 Sensor-ID|Sensor|Class:list|
         3| DELTA|       all|
         5|  ZETA|       all|

Print the class and description
 $ mapsid --print-classes --print-description 0 1
     0 -> ALPHA  [all]  "Primary gateway"
     1 -> BETA   [all]  "Secondary gateway"

rwsiteinfo supports using an integer range when specifying sensors.

 $ rwsiteinfo --fields=id-sensor,sensor,class:list,describe-sensor \
       --sensors=0-1
 Sensor-ID|Sensor|Class:list|Sensor-Description|
         0| ALPHA|       all|   Primary gateway|
         1|  BETA|       all| Secondary gateway|

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, mapsid may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, mapsid may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
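A minimal Python sketch of this search follows. It assumes the locations are tried in the order listed above and that entries depending on an unset environment variable are skipped; neither assumption is stated by this manual page. The helper names candidate_config_paths and find_config are hypothetical.

```python
import posixpath


def candidate_config_paths(env):
    """List the silk.conf locations from the FILES section, in listed order.

    env is a mapping such as os.environ.  Entries that depend on an unset
    environment variable are skipped (an assumption of this sketch).
    """
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths


def find_config(env, exists=posixpath.exists):
    """Return the first candidate path that exists, or None."""
    for path in candidate_config_paths(env):
        if exists(path):
            return path
    return None


# Example with a hypothetical install tree rooted at /opt/silk.
for path in candidate_config_paths({"SILK_PATH": "/opt/silk"}):
    print(path)
```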

SEE ALSO

rwsiteinfo(1), silk.conf(5), silk(7)

NOTES

As of SiLK 3.0, mapsid is deprecated; use rwsiteinfo(1) instead.

num2dot

Convert an integer IP to dotted-decimal notation

SYNOPSIS

  num2dot [--ip-fields=FIELDS] [--delimiter=C]

  num2dot --help

  num2dot --version

DESCRIPTION

num2dot is a filter that speeds up sorting of IP numbers while still producing both a natural order (i.e., 29.23.1.1 will appear before 192.168.1.1) and readable output (i.e., dotted decimal rather than an integer representation of the IP number).

It is designed specifically to deal with the output of rwcut(1). Its job is to read stdin and convert specified fields (default field 1) separated by a delimiter (default '|') from an integer number into a dotted decimal IP address. Up to three IP fields can be specified via the --ip-fields=FIELDS option. The --delimiter option can be used to specify an alternate delimiter.

num2dot does not support IPv6 addresses. The EXAMPLES section below includes an example PySiLK script to handle IPv6.
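The IPv4 conversion described above can be sketched in Python with the standard ipaddress module. This is an illustration only and not the num2dot source; num2dot_line is a hypothetical helper, and the sketch does not preserve column widths.

```python
import ipaddress


def num2dot_line(line, ip_fields=(1,), delimiter="|"):
    """Convert the given 1-based columns from integers to dotted decimal."""
    columns = line.split(delimiter)
    for field in ip_fields:
        # int() tolerates the padding spaces found in fixed-width columns.
        number = int(columns[field - 1])
        columns[field - 1] = str(ipaddress.IPv4Address(number))
    return delimiter.join(columns)


# Columns 3 and 4 hold the integer forms of 29.23.1.1 and 192.168.1.1.
print(num2dot_line("x|y|488046849|3232235777|", ip_fields=(3, 4)))
```

Sorting the integer columns before the conversion yields the natural order, since 488046849 sorts before 3232235777 numerically.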

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--ip-fields=FIELDS

Column numbers of the input that should be treated as IP numbers. Column numbers start from 1. If not specified, the default is 1.

--delimiter=C

The character that separates the columns of the input. Default is '|'.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Suppose in addition to the default fields of 1-12 produced by rwcut(1), you want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default rwcut output fields of 1-12, you want to see dotted-decimal IP addresses. You could use the following command:

 $ rwfilter ... --pass=stdout                               \
   | rwcut --fields=dip,stime,1-12 --ip-format=decimal      \
        --timestamp-format=epoch                            \
   | num2dot --ip-field=3,4

In the rwcut invocation, you prepend the fields of interest (dip and stime) before the standard fields. The first six columns produced by rwcut will be dIP, sTime, sIP, dIP, sPort, dPort. The --ip-format switch causes the first, third, and fourth columns to be printed as integers, but you only want the first column to have an integer representation. The pipe through num2dot will convert the third and fourth columns to dotted-decimal IP numbers.

num2dot does not support converting integers to IPv6 addresses. The following PySiLK script (see pysilk(3)) could be used as a starting-point to create a version of num2dot that supports IPv6 addresses:

 #! /usr/bin/env python
 from __future__ import print_function
 import sys
 import silk
 # The IPv6 fields to process; the ID of the first field is 0
 ip_fields = (0, 1)
 # The delimiter between fields
 delim = '|'
 # The width of the IPv6 fields
 width = 39
 # The file to process; this script processes standard input
 f = sys.stdin
 try:
     for line in f:
         fields = line.rstrip("\r\n").split(delim)
         for i in ip_fields:
             fields[i] = "%*s" % (width, silk.IPv6Addr(int(fields[i])))
         print(delim.join(fields))
 finally:
     f.close()

SEE ALSO

rwcut(1), pysilk(3), silk(7)

BUGS

num2dot has no support for IPv6 addresses.

rwaddrcount

Count activity by IPv4 address

SYNOPSIS

  rwaddrcount {--print-recs | --print-ips | --print-stat}
        [--use-dest] [--min-bytes=BYTEMIN] [--max-bytes=BYTEMAX]
        [--min-records=RECMIN] [--max-records=RECMAX]
        [--min-packets=PACKMIN] [--max-packets=PACKMAX]
        [--set-file=PATHNAME] [--sort-ips] [--timestamp-format=FORMAT]
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
        [--no-titles] [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]
        [{--legacy-timestamps | --legacy-timestamps=NUM}]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwaddrcount --help

  rwaddrcount --version

DESCRIPTION

rwaddrcount reads SiLK Flow records, sums the byte-, packet-, and record-counts on those records by individual source or destination IP address, and maintains the time window during which that IP address was active. At the end of the count operation, the results per IP address are displayed when the --print-recs switch is given. rwaddrcount includes facilities for displaying only those IP addresses whose byte-, packet-, or flow-counts are between specified minima and maxima.
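The binning described above can be sketched in Python as follows. The Flow tuple is a hypothetical, simplified stand-in for a SiLK Flow record, count_by_address is a made-up helper name, and timestamps are plain integers; the sketch shows the idea and is not rwaddrcount's implementation.

```python
from collections import namedtuple

# Hypothetical, simplified flow record; real SiLK records have more fields.
Flow = namedtuple("Flow", "sip dip bytes packets stime etime")


def count_by_address(flows, use_dest=False):
    """Return {ip: [bytes, packets, records, earliest stime, latest etime]}."""
    bins = {}
    for flow in flows:
        key = flow.dip if use_dest else flow.sip
        if key not in bins:
            bins[key] = [0, 0, 0, flow.stime, flow.etime]
        totals = bins[key]
        totals[0] += flow.bytes
        totals[1] += flow.packets
        totals[2] += 1
        totals[3] = min(totals[3], flow.stime)
        totals[4] = max(totals[4], flow.etime)
    return bins


flows = [
    Flow("10.0.0.1", "10.0.0.9", 100, 2, 50, 60),
    Flow("10.0.0.1", "10.0.0.8", 40, 1, 10, 20),
    Flow("10.0.0.2", "10.0.0.9", 500, 5, 30, 90),
]
for ip, totals in sorted(count_by_address(flows).items()):
    print(ip, totals)
```

Passing use_dest=True keys the bins on the destination address instead, mirroring the --use-dest switch.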

rwaddrcount does not support IPv6 addresses. To generate output for IPv6 records, use the rwuniq(1) tool:

 rwuniq --fields=sip --values=bytes,packets,records,stime,etime

rwaddrcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaddrcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

For the application to operate, one of the three --print options must be chosen.

--print-recs

Print one row for each bin that meets the minima/maxima criteria. Each bin contains the IP address, number of bytes, number of packets, number of flow records, earliest start time, and latest end time.

--print-ips

Print a single column containing the IP addresses for each bin that meets the minima/maxima criteria.

--print-stat

Print a one- or two-line summary (plus a title line) that summarizes the bins. The first line is a summary across all bins, and it contains the number of unique IP addresses and the sums of the bytes, packets, and flow records. The second line is printed only when one or more minima or maxima are specified. This second line contains the same columns as the first, and its values are the sums across those bins that meet the criteria.

--use-dest

Count by destination IP address in the filter record rather than source IP.

--min-bytes=BYTEMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes exceeds BYTEMIN.

--min-packets=PACKMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets exceeds PACKMIN.

--min-records=RECMIN

Filtering criterion; for the final output (stats or printing), only include count records where the total number of filter records contributing to that count record exceeds RECMIN.

--max-bytes=BYTEMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes is less than BYTEMAX.

--max-packets=PACKMAX

Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets is less than PACKMAX.

--max-records=RECMAX

Filtering criterion; for the final output (stats or printing), only include count records to which at most RECMAX filter records contributed.

--set-file=PATHNAME

Write the IPs into the rwset(1)-style binary IP-set file named PATHNAME. Use rwsetcat(1) to see the contents of this file.

--timestamp-format=FORMAT

Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
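To make the four timestamp formats concrete, the following Python sketch renders one epoch value in each of them. It handles only UTC, and the strftime patterns are this sketch's reading of the format descriptions above; format_timestamp and PATTERNS are hypothetical names.

```python
from datetime import datetime, timezone

# strftime patterns for the textual formats; "epoch" is handled separately.
PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render a UTC timestamp in one of the four documented formats."""
    if fmt == "epoch":
        return str(int(epoch_seconds))
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(PATTERNS[fmt])


for name in ("default", "iso", "m/d/y", "epoch"):
    print(name, format_timestamp(1178733701, name))
```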

--ip-format=FORMAT

For the --print-recs and --print-ips output formats, specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical. Since SiLK 3.7.0.

canonical

Print IP addresses in the canonical format: dot-separated decimal for IPv4 (192.0.2.1).

no-mixed

Print IP addresses in the canonical format (192.0.2.1). Prevent use of the mixed IPv4-IPv6 representation when map-v4 is also included in FORMAT. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.

decimal

Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and ::ffff:192.0.2.1 as 3221225985 and 281473902969345, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and ::ffff:192.0.2.1 as c0000201 and ffffc0000201, respectively.

zero-padded

Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 as 192.000.002.001. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.

The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.

map-v4

Change addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.

unmap-v6

Do nothing (rwaddrcount does not support IPv6 addresses as the key). Since SiLK 3.17.0.

The following argument is also available:

force-ipv6

Set FORMAT to map-v4,no-mixed.
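The decimal, hexadecimal, and zero-padded representations can be checked with a short Python sketch built on the standard ipaddress module. It covers only these three formats plus canonical IPv4 and ignores the map-v4, unmap-v6, and no-mixed modifiers; format_ip is a hypothetical helper, not part of SiLK.

```python
import ipaddress


def format_ip(text, fmt="canonical"):
    """Format an IPv4 or IPv6 address string in one of four styles."""
    addr = ipaddress.ip_address(text)
    value = int(addr)
    if fmt == "decimal":
        return str(value)
    if fmt == "hexadecimal":
        return "%x" % value
    if fmt == "zero-padded":
        if addr.version == 4:
            # Three decimal digits per octet.
            return ".".join("%03d" % octet for octet in addr.packed)
        # Eight groups of four hexadecimal digits, most significant first.
        groups = [(value >> shift) & 0xFFFF for shift in range(112, -1, -16)]
        return ":".join("%04x" % group for group in groups)
    return str(addr)


for fmt in ("canonical", "decimal", "hexadecimal", "zero-padded"):
    print(fmt, format_ip("192.0.2.1", fmt))
```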

--integer-ips

Print IP addresses as integers. This switch is equivalent to --ip-format=decimal. It is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--zero-pad-ips

Print IP addresses as fully-expanded, zero-padded values in the canonical format. This switch is equivalent to --ip-format=zero-padded. It is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--sort-ips

For the --print-recs and --print-ips output formats, the results are presented sorted by IP address.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited

--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaddrcount’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaddrcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaddrcount searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps

--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaddrcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

Deprecated Switches

The following switches are deprecated. They will be removed in SiLK 4.0.

--byte-min=BYTEMIN

Deprecated alias for --min-bytes.

--packet-min=PACKMIN

Deprecated alias for --min-packets.

--rec-min=RECMIN

Deprecated alias for --min-records.

--byte-max=BYTEMAX

Deprecated alias for --max-bytes.

--packet-max=PACKMAX

Deprecated alias for --max-packets.

--rec-max=RECMAX

Deprecated alias for --max-records.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To print a list of source IP addresses that appeared in exactly one TCP record during the first 12 hours of 2003-Sep-01, use:

 $ rwfilter --start-date=2003/09/01:00 --end-date=2003/09/01:11     \
        --proto=6 --pass=stdout                                     \
   | rwaddrcount --max-records=1 --print-ips

In general, to print out record information, use rwaddrcount with --print-recs:

 $ rwfilter --start-date=2003/01/17:00 --end-date=2003/01/17:23     \
        --proto=6 --pass=stdout                                     \
   | rwaddrcount --print-rec --no-title | head -3

  10.10.10.1|  65792| 147|  21| 2003/01/17T00:19:01| 2003/01/17T02:00:13|
  10.10.10.2| 110744|  89|   7| 2003/01/17T01:21:42| 2003/01/17T01:39:21|
  10.10.10.3|    864|  18|   6| 2003/01/17T00:20:33| 2003/01/17T01:25:38|

Replacements for rwaddrcount

We note some overlapping features between rwaddrcount and rwuniq(1). There is often more than one way to perform the same task in the SiLK tool set.

Here’s a guide to replacing each of the outputs of rwaddrcount:

The --print-recs switch prints five pieces of information for each source or destination address:

 $ rwaddrcount --print-recs data.rw
           sIP|Bytes|Packets|Records|         Start_Time|           End_Time|
    10.0.0.144| 1646|      4|      1|2007/05/09T18:01:41|2007/05/09T18:01:41|
 10.14.203.121|   40|      1|      1|2007/05/09T18:31:54|2007/05/09T18:31:54|
 10.14.203.122|   40|      1|      1|2007/05/09T18:32:43|2007/05/09T18:32:43|
    10.15.6.14|  539|      3|      3|2007/05/09T18:03:05|2007/05/09T18:08:07|
   12.0.101.22| 4365|     23|      2|2007/05/09T18:26:43|2007/05/09T18:43:46|

To do the same in rwuniq, specify sip in --fields and the --values shown here:

 $ rwuniq --fields=sip --values=bytes,packets,flows,stime,etime data.rw
           sIP|Bytes|Packets|Records|          min_sTime|          max_eTime|
    10.0.0.144| 1646|      4|      1|2007/05/09T18:01:41|2007/05/09T18:01:41|
 10.14.203.121|   40|      1|      1|2007/05/09T18:31:54|2007/05/09T18:31:54|
 10.14.203.122|   40|      1|      1|2007/05/09T18:32:43|2007/05/09T18:32:43|
    10.15.6.14|  539|      3|      3|2007/05/09T18:03:05|2007/05/09T18:08:07|
   12.0.101.22| 4365|     23|      2|2007/05/09T18:26:43|2007/05/09T18:43:46|

When rwaddrcount includes --use-dest, change the --fields switch of rwuniq to dip. Replace the --sort-ips switch of rwaddrcount with --sort-output in rwuniq.

The --print-stat switch in rwaddrcount prints a one-line summary of the data:

 $ rwaddrcount --print-stat data.rw
           |  sIP_Uniq|         Bytes|    Packets|   Records|
      Total|     57727|     948620676|    2026581|    382578|

This is difficult to produce with rwuniq. If there is a field that you know is either empty or constant across all records (such as nhip or in), you can use that as the key field in rwuniq.

 $ rwuniq --fields=nhIP --values=distinct:sip,bytes,packets,flows data.rw
            nhIP|sIP-Distinct|        Bytes|   Packets|   Records|
         0.0.0.0|       57727|    948620676|   2026581|    382578|

Note that class generally does not work since each type within a class produces its own row:

 $ rwuniq --fields=class --values=distinct:sip,bytes,packets,flows data.rw
 class|sIP-Distinct|        Bytes|   Packets|   Records|
   all|        8674|    260143344|    964621|    151447|
   all|       55540|    688477332|   1061960|    231131|

One trick is to use stime as the key with a very large --bin-time:

 $ rwuniq --fields=stime --bin-time=2147483647             \
        --values=distinct:sip,bytes,packets,flows data.rw
               sTime|sIP-Distinct|        Bytes|   Packets|   Records|
 1970/01/01T00:00:00|       57727|    948620676|   2026581|    382578|
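The reason every record lands in a single 1970/01/01T00:00:00 bin can be seen with a small Python sketch. It assumes that rwuniq bins sTime by rounding the epoch value down to a multiple of --bin-time, which this page does not spell out; bin_start is a hypothetical helper.

```python
def bin_start(stime, bin_time):
    """Round an epoch timestamp down to a multiple of bin_time seconds."""
    return stime - (stime % bin_time)


# 1178733701 is 2007/05/09T18:01:41 UTC, a start time from the sample data.
print(bin_start(1178733701, 2147483647))  # bin width larger than the timestamp
print(bin_start(1178733701, 3600))        # hourly bins, for comparison
```

Under that assumption, any start time smaller than the bin width rounds down to 0, which is the epoch.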

Finally, you can use separate invocations of rwfilter(1), rwset(1), and rwsetcat(1):

 $ rwfilter --print-volume --all=stdout data.rw  \
   | rwset --sip=stdout                          \
   | rwsetcat --count-ips
      |      Recs|   Packets|        Bytes|     Files|
 Total|    382578|   2026581|    948620676|         1|
  Pass|    382578|   2026581|    948620676|          |
  Fail|         0|         0|            0|          |
 57727

rwaddrcount’s --print-ips switch prints the IP addresses as text:

 $ rwaddrcount --print-ips data.rw
             sIP
      10.0.0.144
   10.14.203.121
   10.14.203.122
      10.15.6.14
     12.0.101.22

A combination of rwset and rwsetcat is the best way to handle this:

 $ rwset --sip-file=stdout data.rw | rwsetcat --print-ips
 10.0.0.144
 10.14.203.121
 10.14.203.122
 10.15.6.14
 12.0.101.22

Alternatively, use rwuniq and the UNIX tool cut(1) to only print the first column:

 $ rwuniq --fields=sIP data.rw  \
   | cut -d '|' -f 1
             sIP
      10.0.0.144
   10.14.203.121
   10.14.203.122
      10.15.6.14
     12.0.101.22

rwaddrcount allows you to restrict the output to bins that have a certain minimum or maximum count of bytes, packets, or flows via --min-bytes, --max-bytes, --min-packets, --max-packets, --min-records, and --max-records:

 $ rwaddrcount --print-recs --min-byte=1024 --max-byte=2048 \
        --max-records=1 data.rw
           sIP|Bytes|Packets|Records|         Start_Time|           End_Time|
    10.0.0.144| 1646|      4|      1|2007/05/09T18:01:41|2007/05/09T18:01:41|

rwuniq supports the same operations using the --bytes, --packets, and --flows switches, each of which allows you to define a desired minimum and maximum value.

 $ rwuniq --fields=sip --values=bytes,packets,records,stime,etime \
        --bytes=1024-2048 --flows=1-1 data.rw
           sIP|Bytes|Packets|Records|          min_sTime|          max_eTime|
    10.0.0.144| 1646|      4|      1|2007/05/09T18:01:41|2007/05/09T18:01:41|

ENVIRONMENT

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwaddrcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaddrcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwaddrcount automatically invokes this program to display its output a screen at a time.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaddrcount may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaddrcount may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaddrcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaddrcount --version.)

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwset(1), rwsetcat(1), rwstats(1), rwtotal(1), rwuniq(1), silk(7), tzset(3), environ(7)

NOTES

rwaddrcount only supports IPv4 addresses, and it will not be modified to support IPv6 addresses. To produce output similar to rwaddrcount for IPv6 addresses, use rwuniq(1):

 rwuniq --fields=sip --values=bytes,packets,records,stime,etime

When used in an IPv6 environment, rwaddrcount converts IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and processes them. IPv6 records having addresses outside of that prefix are ignored.
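The address handling described in the previous paragraph can be sketched with Python's standard ipaddress module. The helper name to_ipv4_or_none is hypothetical, and the sketch works on address strings rather than SiLK Flow records.

```python
import ipaddress


def to_ipv4_or_none(text):
    """Return the IPv4 form of an address, or None if it would be ignored."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return addr
    # ipv4_mapped is None unless the address lies inside ::ffff:0:0/96.
    return addr.ipv4_mapped


for text in ("10.0.0.1", "::ffff:192.0.2.1", "2001:db8::1"):
    print(text, "->", to_ipv4_or_none(text))
```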

rwaddrcount uses a fairly large hashtable to store data, but it is likely that as the amount of data expands, the application will take more time to process data.

Similar binning of records is produced by rwstats(1), rwtotal(1), and rwuniq(1).

To generate a list of IP addresses without the volume information, use rwset(1).

rwaggbag

Build a binary Aggregate Bag from SiLK Flow records

SYNOPSIS

  rwaggbag --keys=KEY --counters=COUNTER
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--print-filenames] [--copy-input=PATH]
        [--compression-method=COMP_METHOD]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [--output-path=PATH]
        [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwaggbag --help

  rwaggbag --help-fields

  rwaggbag --version

DESCRIPTION

rwaggbag reads SiLK Flow records and builds an Aggregate Bag file. To build an Aggregate Bag from textual input, use rwaggbagbuild(1).

An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.

For each SiLK flow record rwaggbag reads, it extracts the values of the fields listed in the --keys switch, combines those fields into a key, searches for an existing bin that has that key and creates a new bin for that key if none is found, and adds the values for each of the fields listed in the --counters switch to the bin’s counter. Both the --keys and --counters switches are required.
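The binning step described above can be modeled in a few lines of Python. This is a minimal sketch and not rwaggbag's implementation: flow records are represented as plain dictionaries, and the function name `aggregate`, the `contribution` table, and the sample record values are assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(records, keys, counters):
    """Model of the binning step: one bin per unique combination of key values."""
    # Hypothetical table: what a single record contributes to each counter.
    contribution = {
        "records": lambda rec: 1,
        "sum-packets": lambda rec: rec["packets"],
        "sum-bytes": lambda rec: rec["bytes"],
        "sum-duration": lambda rec: rec["duration"],
    }
    bins = defaultdict(lambda: [0] * len(counters))
    for rec in records:
        key = tuple(rec[k] for k in keys)   # combine the key fields into one key
        counter = bins[key]                 # find the bin, creating it if absent
        for i, name in enumerate(counters):
            counter[i] += contribution[name](rec)
    return dict(bins)

# Illustrative records only; real input is binary SiLK Flow data.
flows = [
    {"sport": 1025, "dport": 80, "packets": 3, "bytes": 180, "duration": 1},
    {"sport": 1025, "dport": 80, "packets": 5, "bytes": 400, "duration": 2},
    {"sport": 2000, "dport": 53, "packets": 1, "bytes": 70, "duration": 0},
]
result = aggregate(flows, ["sport", "dport"], ["records", "sum-packets", "sum-bytes"])
for key, counter in sorted(result.items()):
    print(key, counter)
```

The first two sample records share the key (1025, 80), so they land in one bin whose counters are the sums of both records; the third record creates a bin of its own.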

rwaggbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaggbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

If rwaggbag runs out of memory, it will exit immediately. The output Aggregate Bag file remains behind with a size of 0 bytes.

To print the contents of an Aggregate Bag as text, use rwaggbagcat(1). The rwaggbagbuild(1) tool can create an Aggregate Bag from textual input. rwaggbagtool(1) allows you to manipulate binary Aggregate Bag files.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--keys=KEY

Create a key for binning flow records using the values of the comma-separated field(s) listed in KEY. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The available KEY fields are:

sIPv4

source IP address when IPv4

sIPv6

source IP address when IPv6

dIPv4

destination IP address when IPv4

dIPv6

destination IP address when IPv6

sPort

source port for TCP or UDP, or equivalent

dPort

destination port for TCP or UDP, or equivalent

protocol

IP protocol

packets

count of packets recorded for this flow record

bytes

count of bytes recorded for this flow record

flags

bit-wise OR of TCP flags over all packets in the flow

sTime

starting time of the flow, in seconds resolution

duration

duration of the flow, in seconds resolution

eTime

ending time of the flow, in seconds resolution

sensor

numeric ID of the sensor where the flow was collected

input

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

output

router SNMP output interface or postVlanId

nhIPv4

router next hop IP address when IPv4

nhIPv6

router next hop IP address when IPv6

initialFlags

TCP flags on first packet in the flow as reported by yaf(1)

sessionFlags

bit-wise OR of TCP flags over all packets in the flow except the first as reported by yaf

attributes

flow attributes set by the flow generator

application

content of the flow as reported in the applabel field of yaf

class

class of the sensor at the collection point

type

type of the sensor at the collection point

icmpType

ICMP type value for ICMP and ICMPv6 flows, 0 otherwise

icmpCode

ICMP code value for ICMP and ICMPv6 flows, 0 otherwise

scc

the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.19.0.

dcc

the country code of the destination IP address. See scc. Since SiLK 3.19.0.

--counters=COUNTER

Add to the bin determined by the fields in --keys the values of the comma-separated field(s) listed in COUNTER. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The available COUNTER fields are:

records

count of the number of flow records that match the key

sum-packets

the sum of the packet counts for flow records that match the key

sum-bytes

the sum of the byte counts for flow records that match the key

sum-duration

the sum of the durations (in seconds) for flow records that match the key

--note-strip

Do not copy the notes (annotations) from the input file(s) to the output file. When this switch is not specified, notes from the input file(s) are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1).

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaggbag’s output to a different location.

--output-path=PATH

Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbag exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbag to exit with an error.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the Aggregate Bag.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 netblock, this policy is equivalent to force; otherwise it is equivalent to asv4.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.

only

Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the Aggregate Bag.
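The address handling of the ignore, asv4, force, and only policies can be sketched with Python's standard ipaddress module. This is a simplified model under stated assumptions: it looks at a single address rather than a whole flow record, the function name `apply_policy` is hypothetical, and the mix policy is omitted because it depends on the contents of the entire input.

```python
import ipaddress

def apply_policy(addr_text, policy):
    """Return the address as the policy would keep it, or None if it is ignored."""
    addr = ipaddress.ip_address(addr_text)
    is_v6 = addr.version == 6
    if policy == "ignore":
        return None if is_v6 else addr      # drop everything marked IPv6
    if policy == "only":
        return addr if is_v6 else None      # drop everything marked IPv4
    if policy == "asv4":
        if not is_v6:
            return addr
        # ipv4_mapped is None for addresses outside ::ffff:0:0/96
        return addr.ipv4_mapped
    if policy == "force":
        if is_v6:
            return addr
        # map the IPv4 address into the ::ffff:0:0/96 netblock
        return ipaddress.IPv6Address("::ffff:" + str(addr))
    raise ValueError("unsupported policy in this sketch: " + policy)

for policy in ("ignore", "asv4", "force", "only"):
    for text in ("10.1.2.3", "::ffff:10.1.2.3", "2001:db8::1"):
        print(policy, text, apply_policy(text, policy))
```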

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbag searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the keys and counters that may be used in the --keys and --counters switches and exit. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To create an Aggregate Bag that sums the packet count for each destination IPv6 address in the SiLK Flow file data.rw:

 $ rwaggbag --key=dipv6 --counter=sum-packets data.rw   \
   | rwaggbagcat

To sum the number of records, packet count, and byte count for each destination port across all flow records:

 $ rwaggbag --key=dport --counter=records,sum-packets,sum-bytes    \
        --output-path=dport.aggbag data.rw

To count the number of records seen for each unique source port, destination port, and protocol:

 $ rwaggbag --key=sport,dport,proto --counter=records data.rw   \
   | rwaggbagcat

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwaggbag uses when mapping an IP to a country for the scc and dcc keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbag may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbag may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
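The list of candidate locations can be expressed as a short Python sketch. It assumes the locations are tried in the order listed and that the first existing file is used, which this section does not state explicitly; the function names are hypothetical, and the /data and /usr/local prefixes are the ones shown above, which may differ on a given installation.

```python
import os

def site_config_candidates(env=None):
    """Build the candidate silk.conf paths in the order listed in FILES."""
    env = os.environ if env is None else env
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(os.path.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        candidates.append(os.path.join(env["SILK_PATH"], "share", "silk", "silk.conf"))
        candidates.append(os.path.join(env["SILK_PATH"], "share", "silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates

def find_site_config(env=None):
    """Return the first candidate that exists (assumed rule), or None."""
    for path in site_config_candidates(env):
        if os.path.isfile(path):
            return path
    return None

# Example with an explicit, made-up environment instead of os.environ.
example_env = {"SILK_DATA_ROOTDIR": "/srv/silk", "SILK_PATH": "/opt/silk"}
for path in site_config_candidates(example_env):
    print(path)
```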

$SILK_COUNTRY_CODES

$SILK_PATH/share/silk/country_codes.pmap

$SILK_PATH/share/country_codes.pmap

/usr/local/share/silk/country_codes.pmap

/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the scc and dcc keys.

NOTES

rwaggbag and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

SEE ALSO

rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), rwbag(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwset(1), rwuniq(1), ccfilter(3), sensor.conf(5), silk(7), yaf(1), zlib(3)

rwaggbagbuild

Create a binary aggregate bag from non-flow data

SYNOPSIS

  rwaggbagbuild [--fields=FIELDS]
        [--constant-field=FIELD=VALUE [--constant-field=FIELD=VALUE...]]
        [--column-separator=CHAR] [--no-titles]
        [--bad-input-lines=FILE] [--verbose] [--stop-on-error]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--compression-method=COMP_METHOD]
        [--output-path=PATH] [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}

  rwaggbagbuild --help

  rwaggbagbuild --help-fields

  rwaggbagbuild --version

DESCRIPTION

rwaggbagbuild builds a binary Aggregate Bag file by reading one or more files containing textual input. To build an Aggregate Bag from SiLK Flow records, use rwaggbag(1).

An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.

rwaggbagbuild reads its input from the files named on the command line or from the standard input when no file names are specified, when --xargs is not present, and when the standard input is not a terminal. To read the standard input in addition to the named files, use - or stdin as a file name. When the --xargs switch is provided, rwaggbagbuild reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The new Aggregate Bag file is written to the location specified by the --output-path switch. If it is not provided, output is sent to the standard output when it is not connected to a terminal.

The Aggregate Bag file must have at least one field that it considers a key field and at least one field that it considers a counter field. See the description of the --fields switch.

In general (and as detailed below), each line of the text input files becomes one entry in the Aggregate Bag file. It is also possible to specify that each entry in the Aggregate Bag file contains additional fields, each with a specific value. These fields are specified by the --constant-field switch whose argument is a field name, an equals sign (’=’), and a textual representation of a value. The named field becomes one of the key or counter fields in the Aggregate Bag file, and that field is given the specified value for each entry that is read from an input file. See the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.

The remainder of this section details how rwaggbagbuild processes each text input file to create an Aggregate Bag file.

When the --fields switch is specified, its argument specifies the key and counter fields that the new Aggregate Bag file is to contain. If --fields is not specified, the first line of the first input file is expected to contain field names, and those names determine the Aggregate Bag’s key and counter. A field name of ignore causes rwaggbagbuild to ignore the values in that field when parsing the input.

The textual input is processed one line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. After removing the comments, any line that is blank or contains only whitespace is ignored.

All other lines must contain valid input, which is a set of fields separated by a delimiter. The default delimiter is the vertical bar (’|’) and may be changed with the --column-separator switch. Whitespace around a delimiter is allowed; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.

The first line of each input file may contain delimiter-separated field names denoting in which order the fields appear in this input file. As mentioned above, when the --fields switch is not given, the first line of the first file determines the Aggregate Bag’s key and counter. To tell rwaggbagbuild to treat the first line of each file as field values to be parsed, specify the --no-titles switch.

Every other line must contain delimiter-separated field values. A delimiter may follow the final field on a line. rwaggbagbuild ignores lines that contain either too few or too many fields.
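The per-line rules above (comment stripping, blank-line skipping, delimiter splitting, an optional trailing delimiter, and rejection of lines with the wrong number of fields) can be modeled as follows. This is a sketch of the documented behavior, not rwaggbagbuild's parser: the function name `parse_line` is hypothetical, title-line detection is not modeled, and a skipped line and a rejected line both return None for brevity.

```python
def parse_line(line, num_fields, sep="|"):
    """Return the list of field strings, or None when the line is skipped or rejected."""
    line = line.split("#", 1)[0]            # a comment runs from '#' to end of line
    if not line.strip():
        return None                         # blank or whitespace-only line
    fields = line.rstrip("\n").split(sep)
    if fields and fields[-1].strip() == "":
        fields.pop()                        # a delimiter may follow the final field
    if sep not in (" ", "\t"):
        # whitespace around the delimiter is ignored unless the
        # delimiter itself is a space or a tab
        fields = [f.strip() for f in fields]
    if len(fields) != num_fields:
        return None                         # too few or too many fields
    return fields

print(parse_line("  10.245.15.175|   80|       127|     12862|\n", 4))
print(parse_line("# a comment-only line\n", 4))
print(parse_line("1|2|3\n", 4))
```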

See the description of the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Specify the fields in the input files. FIELDS is a comma-separated list of field names. Field names are case-insensitive, and a name may be abbreviated to the shortest unique prefix. Other than the ignore field, a field name may not be specified more than once. The Aggregate Bag file must have at least one key field and at least one counter field.

The names of the fields that are considered key fields, their descriptions, and the format of the input that each expects are:

ignore

field that rwaggbagbuild is to skip

sIPv4

source IP address, IPv4 only; either the canonical dotted-quad format or an integer from 0 to 4294967295 inclusive

dIPv4

destination IP address, IPv4 only; uses the same format as sIPv4

nhIPv4

next hop IP address, IPv4 only; uses the same format as sIPv4

any-IPv4

a generic IPv4 address; uses the same format as sIPv4

sIPv6

source IP address, IPv6 only; the canonical hex-encoded format for IPv6 addresses

dIPv6

destination IP address, IPv6 only; uses the same format as sIPv6

nhIPv6

next hop IP address, IPv6 only; uses the same format as sIPv6

any-IPv6

a generic IPv6 address; uses the same format as sIPv6

sPort

source port; an integer from 0 to 65535 inclusive

dPort

destination port; an integer from 0 to 65535 inclusive

any-port

a generic port; an integer from 0 to 65535 inclusive

protocol

IP protocol; an integer from 0 to 255 inclusive

packets

packet count; an integer from 1 to 4294967295 inclusive

bytes

byte count; an integer from 1 to 4294967295 inclusive

flags

bit-wise OR of TCP flags over all packets; a string containing F, S, R, P, A, U, E, C in upper- or lowercase

initialFlags

TCP flags on the first packet; uses the same form as flags

sessionFlags

bit-wise OR of TCP flags on the second through final packet; uses the same form as flags

sTime

starting time in seconds; uses the form YYYY/MM/DD[:hh[:mm[:ss[.sss]]]] (any fractional seconds value is dropped). A T may be used in place of : to separate the day and hour fields. A floating point value between 536870912 and 2147483647 is also allowed and is treated as seconds since the UNIX epoch.

eTime

ending time in seconds; uses the same format as sTime

any-time

a generic time in seconds; uses the same format as sTime

duration

duration of flow; a floating point value from 0.0 to 4294967.295

sensor

sensor name or ID at the collection point; a string as given in silk.conf(5)

class

class at collection point; a string as given in silk.conf

type

type at collection point; a string as given in silk.conf

input

router SNMP ingress interface or vlanId; an integer from 0 to 65535

output

router SNMP egress interface or postVlanId; an integer from 0 to 65535

any-snmp

a generic SNMP value; an integer from 0 to 65535

attribute

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout or a byte-count threshold

C

flow generator created a record as a continuation of a previous record for a connection that exceeded a timeout or byte-count threshold

application

guess as to the content of the flow; as an integer from 0 to 65535

icmpType

ICMP type; an integer from 0 to 255 inclusive

icmpCode

ICMP code; an integer from 0 to 255 inclusive

scc

the country code of the source; accepts a two character string to use as the country of the source IP. The code is not checked for validity against the country_codes.pmap file. The code must be ASCII and it may contain two letters, a letter followed by a number, or the string --. Since SiLK 3.19.0.

dcc

the country code of the destination. See scc. Since SiLK 3.19.0.

any-cc

a generic country code. See scc. Since SiLK 3.19.0.

custom-key

a generic key; an integer from 0 to 4294967295 inclusive

The names and descriptions of the fields that are considered counter fields are listed next. For each, the type of input is an unsigned 64-bit number; that is, an integer from 0 to 18446744073709551615.

records

count of records that match the key

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

custom-counter

a generic counter
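As an illustration of the input forms accepted by the time fields (sTime, eTime, any-time), the following Python sketch parses both the YYYY/MM/DD[:hh[:mm[:ss[.sss]]]] form and the epoch form. It is a model, not the SiLK parser: the function name `parse_stime` is hypothetical, times are assumed to be UTC (a SiLK build may use the local timezone instead), and truncating the fractional part of an epoch value is an assumption.

```python
import re
from datetime import datetime, timezone

# Date, then optional hour, minute, second, and fractional seconds.
# Either ':' or 'T' may separate the day from the hour.
_TIME_RE = re.compile(
    r"^(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.\d+)?)?)?)?$"
)

def parse_stime(text):
    """Return whole seconds since the UNIX epoch for a time-field string."""
    text = text.strip()
    m = _TIME_RE.match(text)
    if m:
        year, month, day = (int(g) for g in m.group(1, 2, 3))
        hour, minute, second = (int(g) if g else 0 for g in m.group(4, 5, 6))
        # Fractional seconds are matched by the pattern but dropped here.
        dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
        return int(dt.timestamp())
    value = float(text)                     # raises ValueError for other text
    if not 536870912 <= value <= 2147483647:
        raise ValueError("epoch value out of range: %r" % text)
    return int(value)

for sample in ("2009/02/13", "2009/02/13:23:31:30", "2009/02/13T23:31:30.500", "1234567890.75"):
    print(sample, parse_stime(sample))
```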

--constant-field=FIELD=VALUE

For each entry (row) read from the input file(s), insert or replace a field named FIELD and set its value to VALUE. VALUE is a textual representation of the field’s value as described in the description of the --fields switch above. When FIELD is a counter field and the same key appears multiple times in the input, VALUE is added to the counter multiple times. If a field named FIELD appears in an input file, its value from that file is ignored. Specify the --constant-field switch multiple times to insert multiple fields.
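The effect of a constant counter field on repeated keys can be shown with a small Python model. This is a sketch under stated assumptions, not the tool's code: rows are plain dictionaries, the function name `count_with_constant` is hypothetical, and the sample ports are made up.

```python
from collections import defaultdict

def count_with_constant(rows, key_field, constant_value):
    """Add constant_value to the key's counter once for every input row."""
    bins = defaultdict(int)
    for row in rows:
        bins[row[key_field]] += constant_value
    return dict(bins)

# Port 80 appears three times, so a constant of 1 accumulates to 3.
rows = [{"dport": 80}, {"dport": 29222}, {"dport": 80}, {"dport": 80}]
counts = count_with_constant(rows, "dport", 1)
for port in sorted(counts):
    print(port, counts[port])
```

With a constant value of 1 on the records counter, the result is a count of how many input rows carried each key.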

--column-separator=CHAR

When reading textual input, use the character CHAR as the delimiter between columns (fields) in the input. The default column separator is the vertical pipe (’|’). rwaggbagbuild normally ignores whitespace (space and tab) around the column separator; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.

--bad-input-lines=FILEPATH

When parsing textual input, copy any lines that cannot be parsed to FILEPATH. The strings stdout and stderr may be used for the standard output and standard error, respectively. Each bad line is prepended by the name of the source input file, a colon, the line number, and a colon. On exit, rwaggbagbuild removes FILEPATH if all input lines were successfully parsed.

--verbose

When a textual input line fails to parse, print a message to the standard error describing the problem. When this switch is not specified, parsing failures are not reported. rwaggbagbuild continues to process the input after printing the message. To stop processing when a parsing error occurs, use --stop-on-error.

--stop-on-error

When a textual input line fails to parse, print a message to the standard error describing the problem and exit the program. When this occurs, the output file contains any records successfully created prior to reading the bad input line. The default behavior of rwaggbagbuild is to silently ignore parsing errors. To report parsing errors and continue processing the input, use --verbose.

--no-titles

Parse the first line of the input as field values. Normally when the --fields switch is specified, rwaggbagbuild examines the first line to determine if the line contains the names (titles) of fields and skips the line if it does. rwaggbagbuild exits with an error when --no-titles is given but --fields is not.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Aggregate Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1).

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagbuild to exit with an error.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagbuild searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbagbuild opens each named file in turn and reads text from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the keys and counters that may be used in the --fields and --constant-field switches and exit. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Assume the following textual data in the file rec.txt:

             dIP|dPort|   packets|     bytes|
   10.245.15.175|   80|       127|     12862|
 192.168.251.186|29222|       131|    351213|
  10.247.186.130|   80|       596|     38941|
 192.168.239.224|29362|       600|    404478|
 192.168.215.219|   80|       400|     32375|
   10.255.252.19|28925|       404|   1052274|
 192.168.255.249|   80|       112|      7412|
    10.208.7.238|29246|       109|    112977|
 192.168.254.127|   80|       111|      9759|
   10.218.34.108|29700|       114|    461845|

To create an Aggregate Bag file from this data, provide the --fields switch with the names used by the Aggregate Bag tools:

 $ rwaggbagbuild --fields=dipv4,dport,sum-packets,sum-bytes  \
        --output-path=ab.aggbag rec.txt

Use the rwaggbagcat(1) tool to view it:

 $ rwaggbagcat ab.aggbag
           dIPv4|dPort|    sum-packets|           sum-bytes|
    10.208.7.238|29246|            109|              112977|
   10.218.34.108|29700|            114|              461845|
   10.245.15.175|   80|            127|               12862|
  10.247.186.130|   80|            596|               38941|
   10.255.252.19|28925|            404|             1052274|
 192.168.215.219|   80|            400|               32375|
 192.168.239.224|29362|            600|              404478|
 192.168.251.186|29222|            131|              351213|
 192.168.254.127|   80|            111|                9759|
 192.168.255.249|   80|            112|                7412|

To create an Aggregate Bag from the destination port field and count the number of times each port appears, ignore all fields except the dPort field and use --constant-field to add a new field:

 $ rwaggbagbuild --fields=ignore,dport,ignore,ignore  \
        --constant-field=record=1 rec.txt             \
   | rwaggbagcat
 dPort|   records|
    80|         5|
 28925|         1|
 29222|         1|
 29246|         1|
 29362|         1|
 29700|         1|

Alternatively, use rwaggbagtool(1) to get the same information from the ab.aggbag file created above:

 $ rwaggbagtool --select-fields=dport        \
        --insert-field=record=1 ab.aggbag    \
   | rwaggbagcat
 dPort|   records|
    80|         5|
 28925|         1|
 29222|         1|
 29246|         1|
 29362|         1|
 29700|         1|
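
The counting done by the two pipelines above can be modeled outside SiLK. The following Python sketch is illustrative only and is not part of SiLK: the name count_dports is hypothetical, and the code assumes pipe-delimited text with a title line, as in rec.txt.

```python
from collections import Counter

# The same textual data as rec.txt above.
REC_TXT = """\
             dIP|dPort|   packets|     bytes|
   10.245.15.175|   80|       127|     12862|
 192.168.251.186|29222|       131|    351213|
  10.247.186.130|   80|       596|     38941|
 192.168.239.224|29362|       600|    404478|
 192.168.215.219|   80|       400|     32375|
   10.255.252.19|28925|       404|   1052274|
 192.168.255.249|   80|       112|      7412|
    10.208.7.238|29246|       109|    112977|
 192.168.254.127|   80|       111|      9759|
   10.218.34.108|29700|       114|    461845|
"""


def count_dports(text):
    """Return {dPort: records}: keep only the second column and add a
    constant counter of 1 per input line (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 2 or not cols[1].isdigit():
            continue  # skip the title line and any blank line
        counts[int(cols[1])] += 1
    # Sort by port so the result resembles the listing above.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for port, records in count_dports(REC_TXT).items():
        print(f"{port:>5}|{records:>10}|")
```

Because every input line contributes the constant 1 and lines with the same key are merged, the counter ends up holding the number of lines per port.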

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagbuild may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagbuild may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwaggbag(1), rwaggbagcat(1), rwaggbagtool(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk.conf(5), silk(7), zlib(3)

NOTES

rwaggbagbuild and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

rwaggbagcat

Output a binary Aggregate Bag file as text

SYNOPSIS

  rwaggbagcat [--fields=FIELDS
        [--missing-field=FIELD=STRING [--missing-field=FIELD=STRING...]]]
        [--timestamp-format=FORMAT] [--ip-format=FORMAT]
        [--integer-sensors] [--integer-tcp-flags]
        [--no-titles] [--no-columns] [--column-separator=C]
        [--no-final-delimiter] [{--delimited | --delimited=C}]
        [--output-path=PATH] [--pager=PAGER_PROG]
        [--site-config-file=FILENAME]
        [AGGBAGFILE [AGGBAGFILE...]]

  rwaggbagcat --help

  rwaggbagcat --help-fields

  rwaggbagcat --version

DESCRIPTION

rwaggbagcat reads a binary Aggregate Bag as created by rwaggbag(1) or rwaggbagbuild(1), converts it to text, and outputs it to the standard output, the pager, or the specified file.

As of SiLK 3.22.0, rwaggbagcat accepts a --fields switch to control the order in which the fields are printed.

rwaggbagcat reads the AGGBAGFILEs specified on the command line; if no AGGBAGFILE arguments are given, rwaggbagcat attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as an AGGBAGFILE name. If any input does not contain an Aggregate Bag file, rwaggbagcat prints an error to the standard error and exits abnormally.

When multiple AGGBAGFILEs are specified on the command line, each is handled individually. To process the files as a single Aggregate Bag, use rwaggbagtool(1) to combine the Aggregate Bags and pipe the output of rwaggbagtool into rwaggbagcat. When the files are printed individually, using --fields gives consistent output across the files and causes the titles to appear only once. If --fields names a key or counter that is not present in one of the files, no value is printed for that field unless --missing-field supplies a string.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Print only the key and/or counter fields given in this comma-separated list. Fields are printed in the order given in FIELDS, and keys and counters may appear in any order or not at all. Specifying --fields only changes the order in which the columns are printed; it does not re-order the entries (rows) in the Aggregate Bag file. If FIELDS includes fields not present in an input Aggregate Bag file, rwaggbagcat prints the string specified for that field by --missing-field or an empty value. The title line is printed only one time even if multiple Aggregate Bag files are read.

The names of the fields that may appear in FIELDS are:

sIPv4

source IP address, IPv4 only

dIPv4

destination IP address, IPv4 only

nhIPv4

next hop IP address, IPv4 only

any-IPv4

a generic IPv4 address

sIPv6

source IP address, IPv6 only

dIPv6

destination IP address, IPv6 only

nhIPv6

next hop IP address, IPv6 only

any-IPv6

a generic IPv6 address

sPort

source port

dPort

destination port

any-port

a generic port

protocol

IP protocol

packets

packet count

bytes

byte count

flags

bit-wise OR of TCP flags over all packets

initialFlags

TCP flags on the first packet

sessionFlags

bit-wise OR of TCP flags on the second through final packet

sTime

starting time in seconds

eTime

ending time in seconds

any-time

a generic time in seconds

duration

duration of flow

sensor

sensor name or ID at the collection point

class

class at collection point

type

type at collection point

input

router SNMP ingress interface or vlanId

output

router SNMP egress interface or postVlanId

any-snmp

a generic SNMP value

attribute

flow attributes set by the flow generator

application

guess as to the content of the flow

icmpType

ICMP type

icmpCode

ICMP code

scc

the country code of the source

dcc

the country code of the destination

any-cc

a generic country code

custom-key

a generic key

records

counter: count of records that match the key

sum-packets

counter: sum of packet counts

sum-bytes

counter: sum of byte counts

sum-duration

counter: sum of duration values

custom-counter

counter: a generic counter

Since SiLK 3.22.0.

--missing-field=FIELD=STRING

When --fields is active, print STRING as the value for FIELD when FIELD is not present in the input Aggregate Bag file. The default value is the empty string. The switch may be repeated to set the missing value string for multiple fields. rwaggbagcat exits with an error if FIELD is not present in --fields or if this switch is specified but --fields is not. STRING may be any string. Since SiLK 3.22.0.
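
The interaction of --fields and --missing-field can be pictured as a column projection that leaves the rows in their original order. The Python sketch below is a model only, not SiLK code: the name select_columns is hypothetical, and it requires exact field names (the abbreviation and case handling of the real switches are not modeled).

```python
def select_columns(titles, rows, fields, missing=None):
    """Project each row onto `fields`, in that order.

    titles  -- field names present in the input Aggregate Bag
    rows    -- list of rows, each a list aligned with `titles`
    fields  -- requested output fields (models --fields)
    missing -- {field: string} (models repeated --missing-field)
    """
    missing = dict(missing or {})
    for name in missing:
        if name not in fields:
            # The text above says this case is an error.
            raise ValueError(f"{name} is not listed in fields")
    index = {name: pos for pos, name in enumerate(titles)}
    projected = []
    for row in rows:  # row order is never changed
        out = []
        for name in fields:
            if name in index:
                out.append(row[index[name]])
            else:
                # Absent field: use the missing string, default empty.
                out.append(missing.get(name, ""))
        projected.append(out)
    return projected


if __name__ == "__main__":
    titles = ["sPort", "dPort", "sum-packets", "sum-bytes"]
    rows = [[0, 0, 73452, 6169968], [0, 769, 15052, 842912]]
    fields = ["sIPv4", "protocol", "dPort", "sum-bytes"]
    for out in select_columns(titles, rows, fields, {"sIPv4": "n/a"}):
        print(out)
```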

--timestamp-format=FORMAT

Specify the format, timezone, and/or modifier to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and modifier. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a modifier. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
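
As a rough illustration of the four format names, the following Python sketch renders an epoch time in UTC using the patterns listed above. It is not SiLK code: format_timestamp is a hypothetical name, the millisecond argument exists only because the patterns above show a fractional part, and the local timezone and modifiers are not modeled.

```python
from datetime import datetime, timezone

# strftime patterns corresponding to the format names listed above.
_PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default", msec=0):
    """Render a UTC timestamp in one of the named formats (model only)."""
    if fmt == "epoch":
        return str(epoch_seconds)
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return when.strftime(_PATTERNS[fmt]) + f".{msec:03d}"


if __name__ == "__main__":
    for name in ("default", "iso", "m/d/y", "epoch"):
        print(name, format_timestamp(1234396802, name))
```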

--ip-format=FORMAT

Specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical.

canonical

Print IP addresses in the canonical format. If the column is IPv4, use dot-separated decimal (192.0.2.1). If the column is IPv6, use colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).

no-mixed

Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.

decimal

Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.

zero-padded

Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.

The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.

map-v4

Change an IPv4 column to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.

unmap-v6

For an IPv6 column, change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.

The following argument is also available:

force-ipv6

Set FORMAT to map-v4,no-mixed.
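
The numeric and padded forms described above can be reproduced with Python's standard ipaddress module. This sketch is an illustration rather than SiLK's implementation: the names format_ip and map_v4 are hypothetical, and only the decimal, hexadecimal, zero-padded, and map-v4 arguments are modeled.

```python
import ipaddress


def map_v4(addr):
    """Model of map-v4: place an IPv4 address in the ::ffff:0:0/96 netblock."""
    if addr.version == 6:
        return addr
    return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))


def format_ip(text, fmt):
    """Render an address as decimal, hexadecimal, or zero-padded text."""
    addr = ipaddress.ip_address(text)
    value = int(addr)
    if fmt == "decimal":
        return str(value)
    if fmt == "hexadecimal":
        return format(value, "x")
    if fmt == "zero-padded":
        if addr.version == 4:
            octets = value.to_bytes(4, "big")
            return ".".join(f"{octet:03d}" for octet in octets)
        # Eight 16-bit groups, each printed as four hex digits.
        groups = [(value >> shift) & 0xFFFF for shift in range(112, -1, -16)]
        return ":".join(f"{group:04x}" for group in groups)
    raise ValueError(f"format not modeled: {fmt}")


if __name__ == "__main__":
    for fmt in ("decimal", "hexadecimal", "zero-padded"):
        print(fmt, format_ip("192.0.2.1", fmt), format_ip("2001:db8::1", fmt))
```

The IPv6 groups are built by hand instead of relying on a library string conversion, so the padded form never falls back to a mixed IPv4-IPv6 notation.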

--integer-sensors

Print the integer ID of the sensor rather than its name.

--integer-tcp-flags

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.
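
To show how the letter form and the integer form relate, the Python sketch below converts between them using the bit positions of the flags in the TCP header. The helper names are hypothetical and the code is a model, not the SiLK implementation.

```python
# Bit values of the TCP flags as laid out in the TCP header.
TCP_FLAG_BITS = {
    "F": 0x01,  # FIN
    "S": 0x02,  # SYN
    "R": 0x04,  # RST
    "P": 0x08,  # PSH
    "A": 0x10,  # ACK
    "U": 0x20,  # URG
    "E": 0x40,  # ECE
    "C": 0x80,  # CWR
}


def flags_to_int(letters):
    """Combine flag letters such as 'SA' into one integer."""
    value = 0
    for letter in letters.upper():
        if letter.isspace():
            continue  # tolerate padding between letters
        value |= TCP_FLAG_BITS[letter]
    return value


def int_to_flags(value):
    """Expand an integer into flag letters in F,S,R,P,A,U,E,C order."""
    return "".join(ch for ch, bit in TCP_FLAG_BITS.items() if value & bit)


if __name__ == "__main__":
    print(flags_to_int("SA"), int_to_flags(27))
```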

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited

--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
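
The combined effect of --no-columns, --column-separator, and --no-final-delimiter can be sketched as plain string joining. The Python below is a model with a hypothetical function name (render_rows); it handles only unpadded output and does not reproduce the column widths of the fixed-width mode.

```python
def render_rows(titles, rows, sep="|", final_delimiter=True):
    """Join values without padding, as --no-columns would.

    Passing final_delimiter=False and a separator models --delimited=C.
    """
    tail = sep if final_delimiter else ""
    lines = [sep.join(titles) + tail]
    for row in rows:
        lines.append(sep.join(str(value) for value in row) + tail)
    return "\n".join(lines)


if __name__ == "__main__":
    titles = ["sPort", "dPort", "sum-packets", "sum-bytes"]
    rows = [[0, 0, 73452, 6169968], [0, 769, 15052, 842912]]
    print(render_rows(titles, rows, sep=",", final_delimiter=False))
```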

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaggbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
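
The precedence among --pager, SILK_PAGER, and PAGER can be summarized in a few lines. This Python sketch is a reading of the text above and of the ENVIRONMENT section, not SiLK code; choose_pager and its parameters are hypothetical names.

```python
def choose_pager(env, pager_switch=None, output_path=None, stdout_is_tty=True):
    """Return the pager program to run, or None for no paging.

    env           -- mapping of environment variables
    pager_switch  -- argument of --pager, or None when not given
    output_path   -- argument of --output-path, or None when not given
    stdout_is_tty -- whether the output would go to a terminal
    """
    if output_path is not None or not stdout_is_tty:
        return None  # paging applies only to terminal output
    if pager_switch is not None:
        candidate = pager_switch          # the switch wins
    elif "SILK_PAGER" in env:
        candidate = env["SILK_PAGER"]     # even when it is empty
    else:
        candidate = env.get("PAGER", "")
    return candidate or None              # empty string disables paging


if __name__ == "__main__":
    print(choose_pager({"PAGER": "less", "SILK_PAGER": "more"}))
```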

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagcat searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the keys and counters that may be used in the --fields and --missing-field switches and exit. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The formatting switches on rwaggbagcat are similar to those on the other SiLK tools.

Creating and printing an Aggregate Bag file

First, use rwaggbag(1) to create an Aggregate Bag file from the SiLK Flow file data.rw:

 $ rwaggbag --key=sport,dport --counter=sum-pack,sum-byte \
        --output-path=ab.aggbag data.rw

To print the contents of the Aggregate Bag file:

 $ rwaggbagcat ab.aggbag | head -4
 sPort|dPort|    sum-packets|           sum-bytes|
     0|    0|          73452|             6169968|
     0|  769|          15052|              842912|
     0|  771|          14176|              793856|

Reordering the columns

Use the --fields switch (added in SiLK 3.22.0) to control the order of the columns in the output or to select only some columns:

 $ rwaggbagcat --fields=dPort,sPort,sum-bytes ab.aggbag | head -4
 dPort|sPort|           sum-bytes|
     0|    0|             6169968|
   769|    0|              842912|
   771|    0|              793856|

The --fields switch only changes the positions of the columns. The sPort field is still the primary key in the output shown above.

The --fields switch may also include fields that are not in the input. By default, rwaggbagcat prints an empty value for those fields, but the --missing-field switch may be used to display any string instead. The argument to --missing-field is FIELD=STRING where FIELD is one of the fields in --fields.

 $ rwaggbagcat --fields=sipv4,proto,dport,sum-bytes \
        --missing=sipv4=n/a ab.aggbag | head -4
          sIPv4|pro|dPort|           sum-bytes|
            n/a|   |    0|             6169968|
            n/a|   |  769|              842912|
            n/a|   |  771|              793856|

Using --fields with IP addresses

When creating an Aggregate Bag file with the source IP address and protocol as keys, rwaggbagcat prints the columns in a different order depending on whether the address is treated as IPv4 or IPv6.

When the key is the source IPv4 address and the protocol, the Aggregate Bag is built with the source address as the primary key:

 $ rwaggbag --key=sipv4,proto --counter=records data.rw         \
   | rwaggbagcat
          sIPv4|pro|   records|
    10.4.52.235|  6|         1|
   10.5.231.251|  6|         1|
    10.9.77.117|  6|         1|

Reading the same file but treating the data as IPv6 results in the protocol being the primary key:

 $ rwaggbag --key=sipv6,proto --counter=records data.rw         \
   | rwaggbagcat
 pro|                                  sIPv6|   records|
   1|                   ::ffff:10.40.151.242|         1|
   1|                   ::ffff:10.44.140.138|         1|
   1|                    ::ffff:10.53.204.62|         1|

In the latter case, the --fields switch may be used to display the source IPv6 address first, but the switch only changes the positions of the columns; it does not reorder the entries (rows):

 $ rwaggbag --key=sipv6,proto --counter=records data.rw         \
   | rwaggbagcat --fields=sipv6,proto,records
                                  sIPv6|pro|   records|
                   ::ffff:10.40.151.242|  1|         1|
                   ::ffff:10.44.140.138|  1|         1|
                    ::ffff:10.53.204.62|  1|         1|

Removing the columns or the title from the output

To produce comma-separated data:

 $ rwaggbagcat --delimited=, /tmp/ab.aggbag | head -4
 sPort,dPort,sum-packets,sum-bytes
 0,0,73452,6169968
 0,769,15052,842912
 0,771,14176,793856

To remove the title:

 $ rwaggbagcat --no-title ab.aggbag | head -4
     0|    0|          73452|             6169968|
     0|  769|          15052|              842912|
     0|  771|          14176|              793856|
     0| 2048|          14356|             1205904|

Customizing the IP and timestamp format

To change the format of IP addresses:

 $ rwaggbag --key=sipv4,dipv4 --counter=sum-pack,sum-byte data.rw   \
   | rwaggbagcat --ip-format=decimal | head -4
      sIPv4|     dIPv4|    sum-packets|           sum-bytes|
  168047851|3232295339|            255|               18260|
  168159227|3232293505|            331|              536169|
  168381813|3232282689|            563|               55386|

To change the format of timestamps:

 $ rwaggbag --key=stime,etime --counter=sum-pack,sum-byte data.rw   \
   | rwaggbagcat --timestamp-format=epoch | head -4
      sTime|     eTime|    sum-packets|           sum-bytes|
 1234396802|1234396802|              2|                 259|
 1234396802|1234398594|            526|               38736|
 1234396803|1234396803|              9|                 504|

ENVIRONMENT

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_PAGER

When set to a non-empty string, rwaggbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaggbagcat does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwaggbagcat automatically invokes this program to display its output a screen at a time.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwaggbagcat may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaggbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaggbagcat --version.)

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
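
One way to picture the search is as a walk over the list above that stops at the first existing file. The Python sketch below rests on assumptions the text does not confirm: that the locations are tried in the order listed, that entries depending on an unset variable are skipped, and that a missing ${SILK_CONFIG_FILE} falls through to the next location. The function names are hypothetical.

```python
import os
import posixpath


def site_config_candidates(env):
    """List candidate paths in the order the FILES section names them
    (the order of checking is an assumption of this sketch)."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share/silk/silk.conf"))
        candidates.append(posixpath.join(root, "share/silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


def find_site_config(env, exists=os.path.exists):
    """Return the first candidate that exists, or None."""
    for path in site_config_candidates(env):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_site_config(dict(os.environ)))
```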

NOTES

The --fields, --missing-field, and --help-fields switches were added in SiLK 3.22.0.

rwaggbagcat and the other Aggregate Bag tools were introduced in SiLK 3.15.0.

SEE ALSO

rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), silk(7), tzset(3), environ(7)

rwaggbagtool

Manipulate binary Aggregate Bag files

SYNOPSIS

  rwaggbagtool
        [{ --remove-fields=REMOVE_LIST | --select-fields=SELECT_LIST
           | --to-bag=BAG_KEY,BAG_COUNTER
           | --to-ipset=FIELD [--ipset-record-version=VERSION] }]
        [--insert-field=FIELD=VALUE [--insert-field=FIELD2=VALUE2...]]
        [{ --add | --subtract | --divide }]
        [--zero-divisor-result={error | remove | nochange | maximum | VALUE}]
        [--scalar-multiply={VALUE | FIELD=VALUE}
          [--scalar-multiply={VALUE | FIELD=VALUE}...]]
        [--min-field=FIELD=VALUE [--min-field=FIELD=VALUE...]]
        [--max-field=FIELD=VALUE [--max-field=FIELD=VALUE...]]
        [--set-intersect=FIELD=FILE [--set-intersect=FIELD=FILE...]]
        [--set-complement=FIELD=FILE [--set-complement=FIELD=FILE...]]
        [--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [--site-config-file=FILENAME]
        [AGGBAG_FILE [AGGBAG_FILE ...]]

  rwaggbagtool --help

  rwaggbagtool --help-fields

  rwaggbagtool --version

DESCRIPTION

rwaggbagtool performs operations on one or more Aggregate Bag files and creates a new Aggregate Bag file, a new Bag file, or a new IPset file. An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. rwaggbag(1) and rwaggbagbuild(1) are the primary tools used to create an Aggregate Bag file. rwaggbagcat(1) prints a binary Aggregate Bag file as text.

The operations that rwaggbagtool supports are field manipulation (inserting or removing keys or counters), adding, subtracting, and dividing counters (all files must have the same keys and counters) across multiple Aggregate Bag files, multiplying all counters or only selected counters by a value, intersecting with an IPset, selecting rows based on minimum and maximum values of keys and counters, and creating a new IPset or Bag file.

rwaggbagtool processes the Aggregate Bag files listed on the command line. When no file names are specified, rwaggbagtool attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as a file name. If any input is not an Aggregate Bag file, rwaggbagtool prints an error to the standard error and exits with an error status.

By default, rwaggbagtool’s output is written to the standard output. Use --output-path to specify a different location. As of SiLK 3.21.0, rwaggbagtool supports the --modify-inplace switch which correctly handles the case when an input file is also used as the output file. That switch causes rwaggbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname where the original output file is copied.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The options are presented here in the order in which rwaggbagtool performs them: Field manipulation switches are applied to each file when it is read; multi-file operation switches combine the Aggregate Bags together; single-file operation switches are applied; filtering switches remove rows from the Aggregate Bag; the result is output as an Aggregate Bag, a standard Bag, or as an IPset.

Field manipulation switches

The following switches allow modification of the fields in the Aggregate Bag file. The --remove-fields and --select-fields switches are mutually exclusive, and they reduce the number of fields in the Aggregate Bag input files. Those switches also conflict with --to-ipset and --to-bag which resemble field selectors. The --insert-field switch is applied after --remove-fields or --select-fields, and it adds a field unless that field is already present.

--remove-fields=REMOVE_LIST

Remove the fields specified in REMOVE_LIST from each of the Aggregate Bag input files, where REMOVE_LIST is a comma-separated list of field names. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. If a field name is included in this list and in a --insert-field switch, the field is given the value specified by the --insert-field switch, and the field is included in the output Aggregate Bag file. If removing a key field produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --select-fields, --to-ipset, or --to-bag.

--select-fields=SELECT_LIST

For each Aggregate Bag input file, only use the fields in SELECT_LIST, a comma-separated list of field names. Alternatively, consider this switch as removing all fields that are not included in SELECT_LIST. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. When a field name is included in this list and in a --insert-field switch, the field uses its value from the input Aggregate Bag file if present, and it uses the value specified in the --insert-field switch otherwise. If selecting only some key fields produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --remove-fields, --to-ipset, or --to-bag.

--insert-field=FIELD=VALUE

For each entry read from an Aggregate Bag input file, insert a field named FIELD and set its value to VALUE if one of the following is true: (1) the input file does not contain a field named FIELD or (2) the input file does have a field named FIELD but it was removed by either (2a) being listed in the --remove-fields list or (2b) not being listed in the --select-fields list. That is, this switch only inserts FIELD when FIELD is not present in the input Aggregate Bag, but specifying FIELD in --remove-fields removes it from the input. VALUE is a textual representation of the field’s value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch may be repeated in order to insert multiple fields. If --to-ipset or --to-bag is specified, --insert-field may only name a field that is an argument to that switch.
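
The ordering of the three field manipulation switches can be modeled on a single entry. The Python sketch below is illustrative only: apply_field_switches is a hypothetical name, an entry is represented as a plain dict, and the merging of counters when keys collapse is not modeled.

```python
def apply_field_switches(entry, remove=None, select=None, insert=None):
    """Apply remove/select first, then insert fields that are absent.

    entry  -- {field name: value} for one Aggregate Bag entry
    remove -- iterable of names (models --remove-fields)
    select -- iterable of names (models --select-fields)
    insert -- {field name: value} (models repeated --insert-field)
    """
    if remove is not None and select is not None:
        raise ValueError("remove and select are mutually exclusive")
    if select is not None:
        wanted = set(select)
        kept = {f: v for f, v in entry.items() if f in wanted}
    else:
        dropped = set(remove or ())
        kept = {f: v for f, v in entry.items() if f not in dropped}
    for field, value in (insert or {}).items():
        # Insert only when the field is not (or no longer) present.
        kept.setdefault(field, value)
    return kept


if __name__ == "__main__":
    entry = {"dPort": 80, "records": 7}
    print(apply_field_switches(entry, remove=["records"], insert={"records": 1}))
    print(apply_field_switches(entry, select=["dPort", "records"], insert={"records": 1}))
```

Removing a field and inserting it again replaces its value, while selecting a field that exists keeps the value from the input.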

Operations on multiple Aggregate Bag files

The following operations act on multiple Aggregate Bag files. These operations require all of the Aggregate Bag files to have the same set of key fields and counter fields. (Use the field manipulation switches to ensure this.) The values of the keys may differ, but the set of fields that comprise the key must match. It is an error if multiple operations are specified.

--add

Sum each of the counters for each key for all the Aggregate Bag input files. The keys in the result are the union of the keys that appear in the input files. Addition operations that overflow an unsigned 64-bit value are set to the maximum (18446744073709551615). If no other operation is specified, the add operation is the default.

--subtract

Subtract from the counters in the first Aggregate Bag file the counters in the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, its counters in the first file are unchanged. Subtraction operations that result in a negative value are set to zero. If all counters for a key are zero, the key does not appear in the output.
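
The rules for --add and --subtract amount to saturating arithmetic on 64-bit unsigned counters. The following Python sketch models two inputs as dicts that map a key tuple to a list of counters; the function names are hypothetical and the code is not taken from SiLK.

```python
U64_MAX = 2**64 - 1  # 18446744073709551615


def aggbag_add(first, second):
    """Union of keys; sums are capped at the unsigned 64-bit maximum."""
    result = {key: list(counters) for key, counters in first.items()}
    for key, counters in second.items():
        if key in result:
            result[key] = [min(a + b, U64_MAX)
                           for a, b in zip(result[key], counters)]
        else:
            result[key] = list(counters)
    return result


def aggbag_subtract(first, second):
    """Keys come only from `first`; differences are floored at zero and
    keys whose counters are all zero are dropped."""
    result = {}
    for key, counters in first.items():
        other = second.get(key, [0] * len(counters))
        diff = [max(a - b, 0) for a, b in zip(counters, other)]
        if any(diff):
            result[key] = diff
    return result


if __name__ == "__main__":
    a = {(80,): [5, 500], (443,): [2, 200]}
    b = {(80,): [1, 900], (22,): [9, 9]}
    print(aggbag_add(a, b))
    print(aggbag_subtract(a, b))
```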

--divide

Divide the counters in the first Aggregate Bag file by the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, its counters are treated as zero and the outcome is determined by the action specified by --zero-divisor-result. That option also determines the result when the two Aggregate Bag files have matching keys but a counter in the second bag is zero. If --zero-divisor-result is not given, rwaggbagtool exits with an error if division by zero is detected. Since Aggregate Bag files do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). Since SiLK 3.22.0.

While not an operation, the next switch is related to --divide and is described here.

--zero-divisor-result={ error | remove | nochange | maximum | VALUE }

Specify how to handle division by zero in the --divide operation, which can occur either because the first Aggregate Bag file (the dividend) contains a key that does not exist in the second file (the divisor) or because an individual counter in the divisor is zero. The supported arguments are:

error

Causes rwaggbagtool to exit with an error. This is the default when --zero-divisor-result is not given.

remove

Tells rwaggbagtool to remove this key from the output.

nochange

Tells rwaggbagtool to leave the individual counter in the first Aggregate Bag unchanged.

maximum

Sets the individual counter to the maximum value supported, which is the maximum unsigned 64-bit value (18446744073709551615).

VALUE

Sets the individual counter to VALUE, which can be any unsigned 64-bit value (0 to 18446744073709551615 inclusive).

This switch has no effect when --divide is not used. Since SiLK 3.22.0.
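
The division rules, including the handling of a zero divisor, can be sketched as follows. This Python model uses hypothetical names and represents each Aggregate Bag as a dict from a key tuple to a list of counters; it is not the SiLK implementation.

```python
U64_MAX = 2**64 - 1  # 18446744073709551615


def aggbag_divide(first, second, zero_divisor_result="error"):
    """Divide counters of `first` by those of `second`, rounding half up.

    zero_divisor_result is one of "error", "remove", "nochange",
    "maximum", or an integer VALUE.
    """
    result = {}
    for key, counters in first.items():
        # A key missing from the divisor has all-zero counters.
        divisors = second.get(key, [0] * len(counters))
        quotients = []
        keep = True
        for dividend, divisor in zip(counters, divisors):
            if divisor != 0:
                # Integer form of rounding to nearest, with .5 rounded up.
                quotients.append((dividend + divisor // 2) // divisor)
            elif zero_divisor_result == "error":
                raise ZeroDivisionError(f"zero divisor for key {key}")
            elif zero_divisor_result == "remove":
                keep = False
                break
            elif zero_divisor_result == "nochange":
                quotients.append(dividend)
            elif zero_divisor_result == "maximum":
                quotients.append(U64_MAX)
            else:
                quotients.append(int(zero_divisor_result))
        if keep:
            result[key] = quotients
    return result


if __name__ == "__main__":
    print(aggbag_divide({(80,): [5, 7]}, {(80,): [2, 3]}))
```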

Counter operations

The following switch modifies the counters in an Aggregate Bag file. The operation may be combined with any of those from the previous section. This operation occurs after the above and before any filtering operation.

--scalar-multiply=VALUE

--scalar-multiply=FIELD=VALUE

Multiply all counter fields or one counter field by a value. If the argument is a positive integer value (1 or greater), multiply all counters by that value. If the argument contains an equals sign, treat the part to the left as a counter’s field name and the part to the right as the multiplier for that field: a non-negative integer value (0 or greater). The maximum VALUE is 18446744073709551615. This switch may be repeated; when a counter name is repeated or the all-counters form is repeated, the final multiplier is the product of all the values. Since SiLK 3.22.0.
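
The effect of repeating --scalar-multiply can be illustrated by computing the final multiplier for each counter. This Python sketch makes assumptions the text does not state: that the all-counters form and the per-field form compose by multiplication, and that naming an unknown counter is an error. The function name is hypothetical, and overflow handling is not modeled.

```python
U64_MAX = 2**64 - 1  # largest VALUE accepted, per the text above


def scalar_multipliers(arguments, counter_names):
    """Return {counter: final multiplier} for repeated switch arguments.

    arguments     -- strings such as "2" or "sum-bytes=8", in order
    counter_names -- counter fields present in the Aggregate Bag
    """
    multipliers = {name: 1 for name in counter_names}
    for argument in arguments:
        if "=" in argument:
            field, text = argument.split("=", 1)
            value = int(text)
            if not 0 <= value <= U64_MAX:
                raise ValueError("per-field multiplier must be 0 or greater")
            if field not in multipliers:
                # Assumption: an unknown counter name is rejected.
                raise ValueError(f"unknown counter {field}")
            multipliers[field] *= value
        else:
            value = int(argument)
            if not 1 <= value <= U64_MAX:
                raise ValueError("all-counters multiplier must be 1 or greater")
            for name in multipliers:
                multipliers[name] *= value
    return multipliers


if __name__ == "__main__":
    print(scalar_multipliers(["2", "sum-bytes=8", "2"],
                             ["sum-packets", "sum-bytes"]))
```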

Filtering switches

The following switches remove entries from the Aggregate Bag file based on a field’s value. These switches are applied immediately before the output is generated.

--min-field=FIELD=VALUE

Remove from the Aggregate Bag file all entries where the value of the field FIELD is less than VALUE, where VALUE is a textual representation of the field’s value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.

--max-field=FIELD=VALUE

Remove from the Aggregate Bag file all entries where the value of the field FIELD is greater than VALUE, where VALUE is a textual representation of the field’s value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.

--set-intersect=FIELD=SET_FILE

Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is not present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.

--set-complement=FIELD=SET_FILE

Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
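
The four filtering switches can be modeled as a predicate applied to each entry. In the Python sketch below, the function name keep_entry is hypothetical, an entry is a dict of field values, and an IPset is approximated by a Python set of address objects; real IPset files and the textual parsing of VALUE are not modeled.

```python
import ipaddress


def keep_entry(entry, min_field=None, max_field=None,
               set_intersect=None, set_complement=None):
    """Return True when the entry survives all filters.

    Each argument maps a field name to a bound or to a set of addresses.
    A filter naming a field that the entry lacks is ignored.
    """
    for field, bound in (min_field or {}).items():
        if field in entry and entry[field] < bound:
            return False
    for field, bound in (max_field or {}).items():
        if field in entry and entry[field] > bound:
            return False
    for field, ipset in (set_intersect or {}).items():
        if field in entry and entry[field] not in ipset:
            return False
    for field, ipset in (set_complement or {}).items():
        if field in entry and entry[field] in ipset:
            return False
    return True


if __name__ == "__main__":
    ip = ipaddress.ip_address
    entries = [
        {"sIPv4": ip("10.0.0.1"), "dPort": 80, "records": 3},
        {"sIPv4": ip("10.0.0.2"), "dPort": 443, "records": 9},
    ]
    wanted = {ip("10.0.0.2")}
    print([e for e in entries
           if keep_entry(e, min_field={"records": 5},
                         set_intersect={"sIPv4": wanted})])
```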

Output switches

The following switches control the output.

--to-bag=BAG_KEY,BAG_COUNTER

After operating on the Aggregate Bag input files, create a (normal) Bag file from the resulting Aggregate Bag. Use the BAG_KEY field as the key of the Bag, and the BAG_COUNTER field as the counter of the Bag. Write the Bag to the standard output or the destination specified by --output-path. When this switch is used, the only legal field names that may be used in the --insert-field switch are BAG_KEY and BAG_COUNTER. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-ipset.

--to-ipset=FIELD

After operating on the Aggregate Bag input files, create an IPset file from the resulting Aggregate Bag by treating the values in the field named FIELD as IP addresses, inserting the IP addresses into the IPset, and writing the IPset to the standard output or the destination specified by --output-path. When this switch is used, the only legal field name that may be used in the --insert-field switch is FIELD. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --to-ipset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0.

 0 

Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.

 2 

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

 3 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

 4 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

 5 

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--output-path=PATH

Write the resulting Aggregate Bag, IPset (see --to-ipset), or Bag (see --to-bag) to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagtool to exit with an error.

--modify-inplace

Allow rwaggbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwaggbagtool writes the output to a temporary location first, then overwrites PATH. rwaggbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwaggbagtool exits with an error when this switch is given and PATH is not a regular file. If rwaggbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.

--backup-path=BACKUP

Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwaggbagtool exits with a message and an error. rwaggbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.
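The write-to-temporary-then-replace sequence described for --modify-inplace and --backup-path can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not the code rwaggbagtool uses. The function name replace_in_place is hypothetical, PATH is assumed to exist as a regular file, and copying the owner and group is omitted.

```python
import os
import shutil
import tempfile


def replace_in_place(path, data, backup=None):
    """Write data to a temporary file, then move it over path.

    If backup is given, the original file is first moved there.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        shutil.copymode(path, tmp)  # copy permission bits from the original
    except BaseException:
        os.unlink(tmp)  # remove the temporary file on error or interrupt
        raise
    if backup is not None:
        if os.path.isdir(backup):
            # A directory as BACKUP means "move the file into it".
            backup = os.path.join(backup, os.path.basename(path))
        if os.path.realpath(backup) != os.path.realpath(path):
            # Between this call and the next, path briefly does not exist.
            os.replace(path, backup)
    os.replace(tmp, path)
```

Creating the temporary file in the same directory as PATH keeps the final rename on one file system, which is why the sketch passes dir=directory to mkstemp.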

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
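The selection rules above can be summarized in a small Python sketch. It is a model only: the function name resolve_compression and its parameters are hypothetical, the environment is passed in as a plain dictionary, and the sketch assumes that none and best are always acceptable values. How the tool reacts to a switch value naming an unavailable library is not described here and is not modeled. The fallback to none when best finds no library is likewise an assumption.

```python
BEST_ORDER = ("lzo1x", "snappy", "zlib")  # preference order stated for "best"


def resolve_compression(switch_value, available, compiled_default, to_file, environ):
    """Return the compression method used for one output, or "none".

    switch_value: argument of --compression-method, or None if not given.
    available: set of library-backed methods found when SiLK was compiled.
    compiled_default: default method chosen when SiLK was compiled.
    to_file: True when writing to a regular file (not stdout or a pipe).
    environ: mapping that may contain SILK_COMPRESSION_METHOD.
    """
    method = switch_value
    if method is None:
        env_value = environ.get("SILK_COMPRESSION_METHOD")
        # The variable is honored only when it names a usable method.
        if env_value in available or env_value in ("none", "best"):
            method = env_value
    if method is None:
        # Nothing specified: only output to files is compressed.
        return compiled_default if to_file else "none"
    if method == "best":
        if not to_file:
            return "none"
        return next((m for m in BEST_ORDER if m in available), "none")
    return method
```

With only snappy and zlib available, best resolves to snappy for a file and to none for the standard output, while an explicit zlib compresses regardless of the destination.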

Miscellaneous switches

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagtool searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the fields that may be used in the command line options that require a field name. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Add two files

Read today’s incoming flow records by type and use rwaggbag(1) to create an Aggregate Bag file for each, in.aggbag and inweb.aggbag, that count records using the protocol and both ports as the key. Add the counters in the two files to create total.aggbag. Use rwaggbagcat(1) to display the result.

 $ rwfilter --type=in --all=-                               \
   | rwaggbag --key=sport,dport,proto --counter=records     \
        --output-path=in.aggbag
 $ rwfilter --type=inweb --all=-                            \
   | rwaggbag --key=sport,dport,proto --counter=records     \
        --output-path=inweb.aggbag
 $ rwaggbagtool --add in.aggbag inweb.aggbag --output-path=total.aggbag
 $ rwaggbagcat total.aggbag

Subtract a file

Subtract inweb.aggbag from total.aggbag.

 $ rwaggbagtool --subtract total.aggbag inweb.aggbag    \
   | rwaggbagcat

Percent of traffic

Compute the percent of all incoming traffic per protocol and ports that was stored in the inweb type by multiplying the counters in inweb.aggbag by 100 and dividing by total.aggbag.

 $ rwaggbagtool --scalar-multiply=100 inweb.aggbag  \
   | rwaggbagtool --divide stdin total.aggbag       \
   | rwaggbagcat
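The arithmetic behind this pipeline can be checked with a short Python sketch. It assumes the counters are whole numbers, and it reports exact ratios because the rounding applied by --divide is not described in this section. The function name percent_of_total is hypothetical. Skipping keys whose total is missing or zero is a choice made for this sketch and may differ from the tool's behavior.

```python
def percent_of_total(part, total):
    """Map each key to 100 * part[key] / total[key].

    part and total map a (sport, dport, proto) key to a record count.
    Keys with a missing or zero total are skipped (an assumption of this sketch).
    """
    return {
        key: 100 * count / total[key]
        for key, count in part.items()
        if total.get(key)
    }
```

Multiplying by 100 before dividing matters for whole-number counters: 30 records out of 40 is 75 percent, whereas dividing first would leave a ratio smaller than 1.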

Create a file

Create an Aggregate Bag file from data.rw where the ports are the key and that sums the bytes and packets.

 $ rwaggbag --key=sport,dport                       \
        --counter=sum-bytes,sum-packets data.rw     \
        --output-path=my-ab.aggbag

Choose selected fields

Using the previous file, get just the source port and byte count from the file my-ab.aggbag. One approach is to remove the destination port and packet count.

 $ rwaggbagtool --remove=dport,sum-packets my-ab.aggbag  \
        --output-path=source-bytes.aggbag

The other approach selects the source port and byte count.

 $ rwaggbagtool --select=sport,sum-bytes my-ab.aggbag    \
        --output-path=source-bytes.aggbag

To replace the packet count in my-ab.aggbag with zeros, remove the field and insert it with the value you want.

 $ rwaggbagtool --remove=sum-packets --insert=sum-packets=0  \
        my-ab.aggbag --output-path=zero-packets.aggbag

Convert to different formats

To create a regular Bag with the source port and byte count from my-ab.aggbag, use the --to-bag switch:

 $ rwaggbagtool --to-bag=sport,sum-bytes my-ab.aggbag  \
        --output-path=sport-byte.bag

The --to-ipset switch works similarly:

 $ rwaggbag --key=sipv6,dipv6 --counter=records data-v6.rw  \
        --output-path=ips.aggbag
 $ rwaggbagtool --to-ipset=dipv6 ips.aggbag --output-path=dip.set

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version switch when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagtool may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagtool may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
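As a rough sketch, the candidate locations can be generated in Python. This assumes the locations are tried in the order listed and that entries depending on an unset environment variable are skipped. Neither assumption is stated explicitly above. The function name candidate_config_paths is hypothetical.

```python
def candidate_config_paths(environ):
    """List the silk.conf locations in the order shown above.

    environ is a mapping such as os.environ; entries that depend on an
    unset variable are skipped (an assumption of this sketch).
    """
    paths = []
    if environ.get("SILK_CONFIG_FILE"):
        paths.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        paths.append(environ["SILK_DATA_ROOTDIR"] + "/silk.conf")
    paths.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"]
        paths += [root + "/share/silk/silk.conf", root + "/share/silk.conf"]
    paths += ["/usr/local/share/silk/silk.conf", "/usr/local/share/silk.conf"]
    return paths
```

A caller could pass os.environ and pick the first path for which os.path.exists returns true.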

NOTES

The Aggregate Bag tools were added in SiLK 3.15.0.

SiLK 3.17.0 added the --min-field, --max-field, --set-intersect, and --set-complement switches.

Support for country codes was added in SiLK 3.19.0.

The --modify-inplace switch was added in SiLK 3.21. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.

SEE ALSO

rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwfilter(1), rwfileinfo(1), silk(7), zlib(3)

rwappend

Append SiLK Flow file(s) to an existing SiLK Flow file

SYNOPSIS

  rwappend [--create=[TEMPLATE_FILE]] [--print-statistics]
         [--site-config-file=FILENAME]
         TARGET_FILE SOURCE_FILE [SOURCE_FILE...]

  rwappend --help

  rwappend --version

DESCRIPTION

rwappend reads SiLK Flow records from the specified SOURCE_FILEs and appends them to the TARGET_FILE. If stdin is used as the name of one of the SOURCE_FILEs, SiLK flow records will be read from the standard input.

When the TARGET_FILE does not exist and the --create switch is not provided, rwappend will exit with an error. When --create is specified and TARGET_FILE does not exist, rwappend will create the TARGET_FILE using the same format, version, and byte-order as the specified TEMPLATE_FILE. If no TEMPLATE_FILE is given, the TARGET_FILE is created in the default format and version (the same format that rwcat(1) would produce).

The TARGET_FILE must be an actual file; it cannot be a named pipe or the standard output. In addition, the header of TARGET_FILE must not be compressed; that is, you cannot append to a file whose entire contents have been compressed with gzip (those files normally end in the .gz extension).

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--create

--create=TEMPLATE_FILE

Create the TARGET_FILE if it does not exist. The file will have the same format, version, and byte-order as the TEMPLATE_FILE if it is provided; otherwise the defaults are used. The TEMPLATE_FILE will NOT be appended to TARGET_FILE unless it also appears as the name of a SOURCE_FILE.

--print-statistics

Print to the standard error the number of records read from each SOURCE_FILE and the total number of records appended to the TARGET_FILE.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwappend searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Standard usage where the file to append to, results.rw, exists:

 $ rwappend results.rw sample5.rw sample6.rw

To append files sample*.rw to results.rw, or to create results.rw using the same format as the first file argument (note that sample1.rw must be repeated):

 $ rwappend results.rw --create=sample1.rw          \
        sample1.rw sample2.rw

If results.rw does not exist, the following two commands are equivalent:

 $ rwappend --create results.rw sample1.rw sample2.rw

 $ rwcat sample1.rw sample2.rw > results.rw

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwappend may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwappend may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwcat(1), silk(7)

BUGS

When a SOURCE_FILE contains IPv6 flow records and the TARGET_FILE only supports IPv4 records, rwappend converts IPv6 records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and writes them to the TARGET_FILE. rwappend silently ignores IPv6 records having addresses outside of that prefix.

rwappend makes some attempts to avoid appending a file to itself (which would eventually exhaust the disk space) by comparing the names of files it is given; it should be smarter about this.

rwbag

Build a binary Bag from SiLK Flow records

SYNOPSIS

  rwbag --bag-file=KEY,COUNTER,OUTPUTFILE
        [--bag-file=KEY,COUNTER,OUTPUTFILE ...]
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--print-filenames] [--copy-input=PATH]
        [--compression-method=COMP_METHOD]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwbag --help

  rwbag --legacy-help

  rwbag --version

LEGACY SYNOPSIS

  rwbag [--sip-flows=OUTPUTFILE] [--dip-flows=OUTPUTFILE]
        [--sport-flows=OUTPUTFILE] [--dport-flows=OUTPUTFILE]
        [--proto-flows=OUTPUTFILE] [--sensor-flows=OUTPUTFILE]
        [--input-flows=OUTPUTFILE] [--output-flows=OUTPUTFILE]
        [--nhip-flows=OUTPUTFILE]
        [--sip-packets=OUTPUTFILE] [--dip-packets=OUTPUTFILE]
        [--sport-packets=OUTPUTFILE] [--dport-packets=OUTPUTFILE]
        [--proto-packets=OUTPUTFILE] [--sensor-packets=OUTPUTFILE]
        [--input-packets=OUTPUTFILE] [--output-packets=OUTPUTFILE]
        [--nhip-packets=OUTPUTFILE]
        [--sip-bytes=OUTPUTFILE] [--dip-bytes=OUTPUTFILE]
        [--sport-bytes=OUTPUTFILE] [--dport-bytes=OUTPUTFILE]
        [--proto-bytes=OUTPUTFILE] [--sensor-bytes=OUTPUTFILE]
        [--input-bytes=OUTPUTFILE] [--output-bytes=OUTPUTFILE]
        [--nhip-bytes=OUTPUTFILE]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--print-filenames] [--copy-input=PATH]
        [--compression-method=COMP_METHOD]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

DESCRIPTION

rwbag reads SiLK Flow records and builds one or more Bag files. A Bag is similar to a set but each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key. A Bag file supports a single key field and a single counter field; use the Aggregate Bag tools (e.g., rwaggbag(1)) when the key or counter contains multiple fields.

The --bag-file switch is required and it specifies how to create a Bag file. The argument to the switch names the key field to use for the bag, the counter field, and the location where the bag file is to be written. The switch may be repeated to create multiple Bag files.

rwbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

If adding a value to a key would cause the value to overflow the maximum value that Bags support, the key’s value will be set to the maximum and processing will continue. In addition, if this is the first value to overflow in this Bag, a warning will be printed to the standard error.
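The overflow handling described above amounts to a saturating add with a one-time warning, which can be sketched in Python. The maximum counter value that Bags support is not stated in this section, so the sketch takes it as a parameter. The function name add_saturating, the state dictionary, and the warning text are all hypothetical.

```python
import sys


def add_saturating(bag, key, amount, max_counter, state):
    """Add amount to bag[key], clamping the result at max_counter.

    state is a dict that remembers whether the overflow warning
    has already been printed for this bag.
    """
    total = bag.get(key, 0) + amount
    if total > max_counter:
        total = max_counter  # clamp instead of wrapping around
        if not state.get("warned"):
            # Only the first overflow in the bag produces a warning.
            print("warning: counter overflow; value set to maximum",
                  file=sys.stderr)
            state["warned"] = True
    bag[key] = total
```

Processing continues after an overflow, so later keys are still counted normally.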

If rwbag runs out of memory, it will exit immediately. The output Bag files will remain behind, each with a size of 0 bytes.

Use rwbagcat(1) to see the contents of a bag. To create a bag from textual input or from an IPset, use rwbagbuild(1). rwbagtool(1) allows you to manipulate binary bag files.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--bag-file=KEY,COUNTER,OUTPUTFILE

Bin flow records by unique KEY, compute the COUNTER for each bin, and write the result to OUTPUTFILE. The lists of available KEY and COUNTER values are given immediately below. OUTPUTFILE is the name of a non-existent file, a named pipe, or the keyword stdout or - to write the binary Bag to the standard output. Repeat the --bag-file switch to create multiple Bag files in a single pass over the data. Only one OUTPUTFILE may use the standard output. See LEGACY BAG CREATION SWITCHES for deprecated methods to create Bag files. This switch or one of its legacy equivalents is required. Since SiLK 3.12.0.

rwbag supports the following names for KEY. The case of KEY is ignored.

sIPv4

source IP address, either IPv4 or IPv6

sIPv6

source IP address, either IPv4 or IPv6

dIPv4

destination IP address, either IPv4 or IPv6

dIPv6

destination IP address, either IPv4 or IPv6

sPort

source port for TCP or UDP, or equivalent

dPort

destination port for TCP or UDP, or equivalent

protocol

IP protocol

packets

count of packets recorded for this flow record

bytes

count of bytes recorded for this flow record

flags

bit-wise OR of TCP flags over all packets in the flow

sTime

starting time of the flow, in seconds resolution

duration

duration of the flow, in seconds resolution

eTime

ending time of the flow, in seconds resolution

sensor

numeric ID of the sensor where the flow was collected

input

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

output

router SNMP output interface or postVlanId

nhIPv4

router next hop IP address, either IPv4 or IPv6

nhIPv6

router next hop IP address, either IPv4 or IPv6

initialFlags

TCP flags on first packet in the flow

sessionFlags

bit-wise OR of TCP flags over all packets except the first in the flow

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow

sip-country

the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.12.0.

scc

an alias for sip-country

dip-country

the country code of the destination IP address

dcc

an alias for dip-country

sip-pmap:MAPNAME

the value that the source IP address maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be IPv4-address or IPv6-address. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).

dip-pmap:MAPNAME

the value that the destination IP address maps to in the mapping file whose map-name is MAPNAME. See sip-pmap:MAPNAME.

sport-pmap:MAPNAME

the value that the protocol/source-port pair maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be proto-port. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).

dport-pmap:MAPNAME

the value that the protocol/destination-port pair maps to in the mapping file whose map-name is MAPNAME. See sport-pmap:MAPNAME.

rwbag supports the following names for COUNTER. The case of COUNTER is ignored.

records

count of the number of flow records that match the key

flows

an alias for records

sum-packets

the sum of the packet counts for flow records that match the key

packets

an alias for sum-packets

sum-bytes

the sum of the byte counts for flow records that match the key

bytes

an alias for sum-bytes
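To make the KEY,COUNTER,OUTPUTFILE argument concrete, the following Python sketch parses such an argument and bins a few records the way the switch description outlines. It models the semantics only. The function names parse_bag_file and build_bag are hypothetical, records are represented as dictionaries with lowercase field names, and splitting on the first two commas is an assumption about how the argument is divided.

```python
COUNTER_ALIASES = {"flows": "records", "packets": "sum-packets", "bytes": "sum-bytes"}


def parse_bag_file(argument):
    """Split KEY,COUNTER,OUTPUTFILE; the case of KEY and COUNTER is ignored."""
    key, counter, output = argument.split(",", 2)
    counter = counter.lower()
    return key.lower(), COUNTER_ALIASES.get(counter, counter), output


def build_bag(records, key, counter):
    """Bin records (dicts) by key and accumulate the counter for each bin."""
    # "records" counts one per flow; the sum-* counters add a record field.
    source = {"sum-packets": "packets", "sum-bytes": "bytes"}.get(counter)
    bag = {}
    for record in records:
        amount = 1 if source is None else record[source]
        bag[record[key]] = bag.get(record[key], 0) + amount
    return bag
```

Given two TCP records of 100 and 25 bytes and one UDP record of 50 bytes, the key protocol with the counter sum-bytes yields 125 for protocol 6 and 50 for protocol 17.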

--pmap-file=PATH

--pmap-file=MAPNAME:PATH

Load the prefix map file from PATH for use when the key part of the argument to the --bag-file switch is one of sip-pmap, dip-pmap, sport-pmap, or dport-pmap. Specify PATH as - or stdin to read from the standard input. If MAPNAME is specified, it overrides the map-name contained in the prefix map file itself. If no map-name is available, rwbag exits with an error. The switch may be repeated to load multiple prefix map files; each file must have a unique map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-strip

Do not copy the notes (annotations) from the input files to the output file(s). When this switch is not specified, notes from the input files are copied to the output. Since SiLK 3.12.2.

--note-add=TEXT

Add the specified TEXT to the header of every output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of every output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as no Bag file is being written there.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the bag(s).

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 netblock, this policy is equivalent to force; otherwise it is equivalent to asv4.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.

only

Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the bag(s).

Regardless of the IPv6 policy, when all IPv6 addresses in the bag are in the ::ffff:0:0/96 netblock, rwbag treats them as IPv4 addresses and writes an IPv4 bag. When any other IPv6 addresses are present in the bag, the IPv4 addresses in the bag are mapped into the ::ffff:0:0/96 netblock and rwbag writes an IPv6 bag.
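The five policies can be compared with a simplified Python model built on the standard ipaddress module. The sketch assumes each flow record carries a single address and that a record is "marked as IPv6" exactly when that address is an IPv6 address. Real flow records hold several addresses, so this is an illustration rather than a description of rwbag's internals. The function name apply_policy is hypothetical.

```python
import ipaddress


def apply_policy(addresses, policy):
    """Return the addresses that would be added to the bag under policy."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    if policy == "mix":
        # mix behaves like force when any address lies outside ::ffff:0:0/96.
        outside = any(a.version == 6 and a.ipv4_mapped is None for a in addrs)
        policy = "force" if outside else "asv4"
    result = []
    for addr in addrs:
        if policy == "ignore":
            kept = addr if addr.version == 4 else None
        elif policy == "only":
            kept = addr if addr.version == 6 else None
        elif policy == "asv4":
            # ipv4_mapped is None for IPv6 addresses outside ::ffff:0:0/96.
            kept = addr if addr.version == 4 else addr.ipv4_mapped
        elif policy == "force":
            kept = addr if addr.version == 6 else ipaddress.IPv6Address(f"::ffff:{addr}")
        else:
            raise ValueError(f"unknown policy: {policy}")
        if kept is not None:
            result.append(kept)
    return result
```

For the input 10.0.0.1, ::ffff:10.0.0.2, and 2001:db8::1, the asv4 policy keeps the first two as IPv4 addresses and drops the third, while force and mix keep all three as IPv6 addresses.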

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbag searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--legacy-help

Print help, including legacy switches. See the LEGACY BAG CREATION SWITCHES section below for these switches.

--version

Print the version number and information about how SiLK was configured, then exit the application.

LEGACY BAG CREATION SWITCHES

The following switches are deprecated as of SiLK 3.12.0. These switches may be used in conjunction with the --bag-file switch.

--sip-flows=OUTPUTFILE

Equivalent to --bag-file=sIPv4,records,OUTPUTFILE. Count number of flows by unique source IP.

--sip-packets=OUTPUTFILE

Equivalent to --bag-file=sIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique source IP.

--sip-bytes=OUTPUTFILE

Equivalent to --bag-file=sIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique source IP.

--dip-flows=OUTPUTFILE

Equivalent to --bag-file=dIPv4,records,OUTPUTFILE. Count number of flows by unique destination IP.

--dip-packets=OUTPUTFILE

Equivalent to --bag-file=dIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique destination IP.

--dip-bytes=OUTPUTFILE

Equivalent to --bag-file=dIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination IP.

--sport-flows=OUTPUTFILE

Equivalent to --bag-file=sPort,records,OUTPUTFILE. Count number of flows by unique source port.

--sport-packets=OUTPUTFILE

Equivalent to --bag-file=sPort,sum-packets,OUTPUTFILE. Count number of packets by unique source port.

--sport-bytes=OUTPUTFILE

Equivalent to --bag-file=sPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique source port.

--dport-flows=OUTPUTFILE

Equivalent to --bag-file=dPort,records,OUTPUTFILE. Count number of flows by unique destination port.

--dport-packets=OUTPUTFILE

Equivalent to --bag-file=dPort,sum-packets,OUTPUTFILE. Count number of packets by unique destination port.

--dport-bytes=OUTPUTFILE

Equivalent to --bag-file=dPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination port.

--proto-flows=OUTPUTFILE

Equivalent to --bag-file=protocol,records,OUTPUTFILE. Count number of flows by unique protocol.

--proto-packets=OUTPUTFILE

Equivalent to --bag-file=protocol,sum-packets,OUTPUTFILE. Count number of packets by unique protocol.

--proto-bytes=OUTPUTFILE

Equivalent to --bag-file=protocol,sum-bytes,OUTPUTFILE. Count number of bytes by unique protocol.

--sensor-flows=OUTPUTFILE

Equivalent to --bag-file=sensor,records,OUTPUTFILE. Count number of flows by unique sensor ID.

--sensor-packets=OUTPUTFILE

Equivalent to --bag-file=sensor,sum-packets,OUTPUTFILE. Count number of packets by unique sensor ID.

--sensor-bytes=OUTPUTFILE

Equivalent to --bag-file=sensor,sum-bytes,OUTPUTFILE. Count number of bytes by unique sensor ID.

--input-flows=OUTPUTFILE

Equivalent to --bag-file=input,records,OUTPUTFILE. Count number of flows by unique input interface index.

--input-packets=OUTPUTFILE

Equivalent to --bag-file=input,sum-packets,OUTPUTFILE. Count number of packets by unique input interface index.

--input-bytes=OUTPUTFILE

Equivalent to --bag-file=input,sum-bytes,OUTPUTFILE. Count number of bytes by unique input interface index.

--output-flows=OUTPUTFILE

Equivalent to --bag-file=output,records,OUTPUTFILE. Count number of flows by unique output interface index.

--output-packets=OUTPUTFILE

Equivalent to --bag-file=output,sum-packets,OUTPUTFILE. Count number of packets by unique output interface index.

--output-bytes=OUTPUTFILE

Equivalent to --bag-file=output,sum-bytes,OUTPUTFILE. Count number of bytes by unique output interface index.

--nhip-flows=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,records,OUTPUTFILE. Count number of flows by unique next hop IP.

--nhip-packets=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique next hop IP.

--nhip-bytes=OUTPUTFILE

Equivalent to --bag-file=nhIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique next hop IP.
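The legacy switches above follow a regular pattern: the text before the final hyphen selects the key, and the text after it selects the counter. As a rough sketch of that mapping (legacy_to_bag_file is a hypothetical helper written for illustration and is not part of SiLK), a script that still uses the deprecated names could be migrated like this:

```shell
#!/bin/sh
# legacy_to_bag_file LEGACY_NAME OUTPUTFILE
# Hypothetical helper (not part of SiLK): print the --bag-file argument
# that the list above gives as the equivalent of a legacy switch name
# (written without the leading dashes).
legacy_to_bag_file() {
    legacy=$1               # for example: sip-flows
    outfile=$2              # for example: sip-flow.bag
    field=${legacy%-*}      # text before the last '-': sip, dport, nhip, ...
    volume=${legacy##*-}    # text after the last '-': flows, packets, bytes

    case $field in
        sip)    key=sIPv4 ;;
        dip)    key=dIPv4 ;;
        sport)  key=sPort ;;
        dport)  key=dPort ;;
        proto)  key=protocol ;;
        sensor) key=sensor ;;
        input)  key=input ;;
        output) key=output ;;
        nhip)   key=nhIPv4 ;;
        *) echo "unknown legacy key: $field" >&2; return 1 ;;
    esac
    case $volume in
        flows)   counter=records ;;
        packets) counter=sum-packets ;;
        bytes)   counter=sum-bytes ;;
        *) echo "unknown legacy counter: $volume" >&2; return 1 ;;
    esac
    printf '%s\n' "--bag-file=$key,$counter,$outfile"
}

legacy_to_bag_file sip-flows sip-flow.bag
legacy_to_bag_file nhip-bytes nh.bag
```

The two calls print --bag-file=sIPv4,records,sip-flow.bag and --bag-file=nhIPv4,sum-bytes,nh.bag, matching the entries for --sip-flows and --nhip-bytes.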

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Bag of Protocol:Byte

Read the SiLK Flow file data.rw and create the Bag proto-byte.bag that contains the total byte-count seen for each protocol by using protocol as the key and sum-bytes as the counter:

 $ rwbag --bag-file=protocol,sum-bytes,proto-byte.bag data.rw

Use rwbagcat(1) to view the result:

 $ rwbagcat proto-byte.bag
          1|            10695328|
          6|        120536195111|
         17|            24500079|

Specify the output path as - to pass the Bag file from rwbag directly into rwbagcat.

 $ rwbag --bag-file=protocol,sum-bytes,- data.rw    \
   | rwbagcat
          1|            10695328|
          6|        120536195111|
         17|            24500079|

Compare that to this rwuniq(1) command.

 $ rwuniq --field=protocol --value=bytes --sort-output data.rw
 pro|               Bytes|
   1|            10695328|
   6|        120536195111|
  17|            24500079|

One advantage of Bag files over rwuniq is that the data remains in binary form where it can be manipulated by rwbagtool(1).

Two Bags in a Single Pass

Read records from rwfilter(1) and build Bag files sip-flow.bag and dip-flow.bag that count the number of flows seen for each source address and for each destination address, respectively.

 $ rwfilter ... --pass=stdout                       \
   | rwbag --bag-file=sipv4,records,sip-flow.bag    \
        --bag-file=dipv4,records,dip-flow.bag

Using a Network Prefix

To create sip16-byte.bag that contains the number of bytes seen for each /16 found in the source address field, use the rwnetmask(1) tool prior to feeding the input to rwbag:

 $ rwfilter ... --pass=stdout                       \
   | rwnetmask --4sip-prefix-length=16              \
   | rwbag --bag-file=sipv4,sum-bytes,sip16-byte.bag

 $ rwbagcat sip16-byte.bag | head -4
        10.4.0.0|               18260|
        10.5.0.0|              536169|
        10.9.0.0|               55386|
       10.11.0.0|             5110438|

To group the IP addresses of an existing Bag into /16 prefixes when printing it, use the --network-structure switch of rwbagcat(1).

 $ rwfilter ... --pass=stdout                   \
   | rwbag --bag-file=sipv4,sum-bytes,-         \
   | rwbagcat --network-structure=B             \
   | head -4
        10.4.0.0/16|               18260|
        10.5.0.0/16|              536169|
        10.9.0.0/16|               55386|
       10.11.0.0/16|             5110438|

Bag of Country Codes

As of SiLK 3.12.0, a Bag file may contain a country code as its key. Create scc-pkt.bag that sums the packet count by country.

 $ rwbag --bag-file=sip-country,sum-packets,scc-pkt.bag data.rw
 $ rwbagcat scc-pkt.bag
 --|                 840|
 a1|                 284|
 a2|                   1|
 ae|                   8|

Bag of Prefix Map Values

rwbag and rwbagbuild(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. For example, to look up each source address in the prefix map file ip-map.pmap that maps from address to "type of service", use the --pmap-file switch to specify the prefix map file, and specify the Bag’s key as sip-pmap:map-name, where map-name is either the map-name stored in the prefix map file or a name that is provided as part of the --pmap-file argument. (A prefix map’s map-name is available via the rwfileinfo(1) command.)

 $ rwfileinfo --field=prefix-map ip-map.pmap
 ip-map.pmap:
   prefix-map          v1: service-host
 $
 $ rwbag --pmap-file=ip-map.pmap                            \
        --bag-file=sip-pmap:service-host,bytes,srvhost.bag  \
        data.rw

Multiple --pmap-file switches may be specified, which may be useful when generating multiple Bag files in a single invocation. On the command line, the --pmap-file switch that defines the map-name must precede the --bag-file switch where the map-name is used.

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat.

 $ rwbagcat srvhost.bag
 rwbagcat: The --pmap-file switch is required for \
         Bags containing sip-pmap keys
 $ rwbagcat --pmap-file=ip-map.pmap srvhost.bag
          external|         59950837766|
          internal|         60602999159|
               ntp|              588316|
               dns|            14404581|
              dhcp|             2560696|

rwbag also supports prefix map files that map from a protocol-port pair to a label. The proto-port.pmap file does not have a map-name, so a name must be provided on the rwbag command line.

 $ rwfileinfo --field=prefix-map proto-port.pmap
 proto-port.pmap:
 $
 $ rwbag --pmap-file=srvport:proto-port.pmap                \
        --bag-file=sport-pmap:srvport,flows,srvport.bag     \
        data.rw
 $ rwbagcat --pmap-file=proto-port.pmap srvport.bag | head -4
      ICMP|               15622|
       UDP|               62216|
   UDP/DNS|               62216|
  UDP/DHCP|               15614|

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbag uses when mapping an IP to a country for the sip-country and dip-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwbag may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwbag may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
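The search for the site configuration file can be approximated in the shell. The following is only a sketch: it assumes the locations are tried in the order listed above and that the first existing file wins, and find_silk_conf is a hypothetical helper, not a SiLK command.

```shell
#!/bin/sh
# find_silk_conf (hypothetical helper, not part of SiLK): print the first
# existing file among the locations listed above. Assumption: the
# locations are tried in the order in which they are listed.
find_silk_conf() {
    for candidate in \
        "${SILK_CONFIG_FILE:-}" \
        "${SILK_DATA_ROOTDIR:+$SILK_DATA_ROOTDIR/silk.conf}" \
        /data/silk.conf \
        "${SILK_PATH:+$SILK_PATH/share/silk/silk.conf}" \
        "${SILK_PATH:+$SILK_PATH/share/silk.conf}" \
        /usr/local/share/silk/silk.conf \
        /usr/local/share/silk.conf
    do
        # An entry is empty when its environment variable is unset or empty
        [ -n "$candidate" ] || continue
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

find_silk_conf || echo "no silk.conf found in the listed locations" >&2
```

Running the helper before rwbag can help explain why a particular silk.conf is (or is not) being picked up; the --site-config-file switch bypasses the search entirely.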

$SILK_COUNTRY_CODES

$SILK_PATH/share/silk/country_codes.pmap

$SILK_PATH/share/country_codes.pmap

/usr/local/share/silk/country_codes.pmap

/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country and dip-country keys.

SEE ALSO

rwbagbuild(1), rwbagcat(1), rwbagtool(1), rwaggbag(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwpmapbuild(1), rwuniq(1), ccfilter(3), sensor.conf(5), silk(7), zlib(3)

rwbagbuild

Create a binary Bag from non-flow data

SYNOPSIS

  rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }
        [--delimiter=C] [--proto-port-delimiter=C]
        [--default-count=DEFAULTCOUNT]
        [--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--compression-method=COMP_METHOD]
        [--output-path=PATH]

  rwbagbuild --help

  rwbagbuild --version

DESCRIPTION

rwbagbuild builds a binary Bag file from an IPset file or from textual input. A Bag is a set of keys where each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key.

Either --set-input or --bag-input must be provided to specify the type and the location of the input file. To read from the standard input, specify stdin or - as the argument to the switch.

Each occurrence of a unique key adds a counter value to the Bag file for that key, where the counter is the value specified by --default-count, a value specified on a line in the textual input, or a fallback value of 1. If the addition causes an overflow of the maximum counter value (18446744073709551614), the counter is set to the maximum. A message is printed to the standard error the first time an overflow condition is detected.

SET INPUT

When creating a Bag from an IPset, the count associated with each IP address is the value specified by the --default-count switch or 1 if the switch is not provided.

If the --key-type is sip-country, dip-country, or any-country, each IP address is mapped to its country code using the country code mapping file (see FILES) and that key is added to the Bag file with the --default-count value.

If the --key-type is sip-pmap, dip-pmap, or any-ip-pmap, each IP address is mapped to a value found in the prefix map file specified in --pmap-file and that value is added to the Bag file with the --default-count value.

BAG (TEXTUAL) INPUT

The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments begin with a '#' character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-counter pair; whitespace around the key and counter is ignored. The key and counter are separated by a one-character delimiter. The default delimiter is vertical bar (|); use --delimiter to specify a different delimiter.

Each line that is not ignored must begin with a key. The accepted formats of the key are described below.

When the --default-count switch is given, rwbagbuild only parses the key and ignores everything on a line to the right of the first delimiter. To reiterate, the --default-count switch overrides any counter present on the line.

If the delimiter is not present on a line, rwbagbuild parses the key and adds the --default-count value (or the fallback value of 1) to the Bag for that key.

When --default-count is not given, any text between the first delimiter and optional second delimiter on a line is treated as the counter. If the counter contains only whitespace, the counter for the key is incremented by 1; otherwise, the counter must be a (decimal) number from 0 to 18446744073709551614 inclusive. If a second delimiter is present, it and any text that follows it is ignored.

rwbagbuild prints an error and exits when a key or counter cannot be parsed.
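The line-handling rules above can be modelled with a short awk script. This is a sketch for checking an input file by eye before handing it to rwbagbuild, not a description of how rwbagbuild is implemented: parse_bag_text is a hypothetical helper, it does not validate keys, and awk arithmetic is not exact across the full 64-bit counter range.

```shell
#!/bin/sh
# parse_bag_text DELIM [DEFAULTCOUNT]
# Hypothetical helper (not part of SiLK): read key/counter lines on the
# standard input, apply the rules described above, and print "key|sum".
parse_bag_text() {
    awk -v delim="$1" -v defcount="${2:-}" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        c = index($0, "#")                   # strip a comment, if any
        if (c > 0) $0 = substr($0, 1, c - 1)
        if ($0 ~ /^[ \t]*$/) next            # ignore blank lines

        pos = index($0, delim)
        if (pos == 0) {                      # no delimiter: key only
            key = trim($0)
            text = ""
        } else {
            key = trim(substr($0, 1, pos - 1))
            text = substr($0, pos + 1)
            pos2 = index(text, delim)        # optional second delimiter
            if (pos2 > 0) text = substr(text, 1, pos2 - 1)
            text = trim(text)
        }

        if (defcount != "")         count = defcount   # overrides the line
        else if (text == "")        count = 1          # fallback value
        else if (text ~ /^[0-9]+$/) count = text + 0
        else {
            print "line " NR ": cannot parse counter" > "/dev/stderr"
            failed = 1
            exit 1
        }
        if (!(key in sums)) order[++n] = key
        sums[key] += count                   # repeated keys are summed
    }
    END {
        if (failed) exit 1
        for (i = 1; i <= n; i++)
            if (sums[order[i]] > 0)          # zero counters are not written
                print order[i] "|" sums[order[i]]
    }'
}

printf '%s\n' '192.168.0.1,5' '192.168.0.1,7  # repeated key' '192.168.0.2' \
    | parse_bag_text ,
```

In the usage line, the repeated key is summed to 12 and the key without a counter receives the fallback value of 1. Passing a second argument models --default-count, in which case the text after the first delimiter is never examined.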

Format of the counter

The counter is any non-negative (decimal) integer value from 0 to 18446744073709551614 inclusive (the maximum is one less than the maximum unsigned 64-bit value). When writing the Bag file, keys whose counter is zero are not written to the file.

Format of the Key

The key is a 32-bit integer, an IP address, a CIDR block, a SiLK IPWildcard, or a pair of numbers when the key-type is a protocol-port prefix map file.

For key-types that use fewer than 32 bits, rwbagbuild does not verify the validity of the key. For example, it is possible to have 257 as a key in a Bag whose key-type is protocol.

rwbagbuild parses specific key-types as follows:

sIPv4, dIPv4, nhIPv4, any-IPv4

key is an IPv4 address or a 32-bit value; the key-type is set to the corresponding IPv6 type when an IPv6 address is present. A CIDR block or SiLK IPWildcard representing multiple addresses adds multiple entries to the Bag.

sIPv6, dIPv6, nhIPv6, any-IPv6

key is an IPv6 address. An IPv4 address is mapped into the ::ffff:0:0/96 netblock. All keys must be IP addresses (integers are not allowed).

flags, initialFlags, sessionFlags

key is the numeric value of the flags, 17 = FIN|ACK

sTime, eTime, any-time

key is seconds since the UNIX epoch

duration

key represents seconds

sensor

key is the numeric sensor ID

sip-country, dip-country, any-country

key is an IP address; the country_codes.pmap prefix map file is used to map the IP to a country code that is stored in the Bag

sip-pmap, dip-pmap, any-ip-pmap

key is an IP address; the prefix map file specified by --pmap-file is used to map the IP to a value that is stored in the Bag

sport-pmap, dport-pmap, any-port-pmap

key consists of two numbers separated by a delimiter: a protocol (8-bit number) and a port (16-bit number). Those values are looked up in the prefix map file specified by --pmap-file and the result is stored in the Bag. The delimiter separating the protocol and port may be set by --proto-port-delimiter. If not explicitly set, it is the same as the delimiter specified to --delimiter. The default delimiter is '|'.

attributes

these bits of the key are relevant, though any 32-bit value is accepted: 0x08=F, 0x10=S, 0x20=T, 0x40=C

class, type

key is treated as a number

An IP address or integer key must be expressed in one of the formats named at the start of this section: an IP address, a 32-bit integer, a CIDR block, or a SiLK IPWildcard. rwbagbuild complains if the key field contains a mixture of IPv6 addresses and integer values.
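For the flags, initialFlags, and sessionFlags key-types, the numeric key is the bitwise OR of the individual TCP flag bits. The sketch below uses the standard TCP header bit values (FIN=1, SYN=2, RST=4, PSH=8, ACK=16, URG=32, ECE=64, CWR=128), which agree with the 17 = FIN|ACK example above; tcp_flags_value is a hypothetical helper, not a SiLK tool.

```shell
#!/bin/sh
# tcp_flags_value FLAG...
# Hypothetical helper (not part of SiLK): print the numeric value of a
# combination of TCP flags, using the standard TCP header bit values.
tcp_flags_value() {
    value=0
    for name in "$@"; do
        case $name in
            FIN) bit=1 ;;
            SYN) bit=2 ;;
            RST) bit=4 ;;
            PSH) bit=8 ;;
            ACK) bit=16 ;;
            URG) bit=32 ;;
            ECE) bit=64 ;;
            CWR) bit=128 ;;
            *) echo "unknown flag: $name" >&2; return 1 ;;
        esac
        value=$(( value | bit ))
    done
    echo "$value"
}

tcp_flags_value FIN ACK     # prints 17
tcp_flags_value SYN ACK     # prints 18
```

The printed number is what would appear in the key column of the textual input when the --key-type is flags.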

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The first two switches control the type of input; exactly one must be provided:

--set-input=SETFILE

Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to read the IPset from the standard input. Counts have a volume of 1 when the --default-count switch is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--bag-input=TEXTFILE

Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin or - to read the text from the standard input. See the DESCRIPTION section for the syntax of the TEXTFILE.

--delimiter=C

Expect the character C between each key-counter pair in the TEXTFILE read by the --bag-input switch. The default delimiter is the vertical bar ('|'). The delimiter is ignored if the --set-input switch is specified. When the delimiter is a whitespace character, any amount of whitespace may surround and separate the key and counter. Since '#' is used to denote comments and newline is used to denote records, neither is a valid delimiter character.

--proto-port-delimiter=C

Expect the character C between the protocol and port that comprise a key when the --key-type is sport-pmap, dport-pmap, or any-port-pmap. Unless this switch is specified, rwbagbuild expects the key-counter delimiter to appear between the protocol and port.

--default-count=DEFAULTCOUNT

Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer from 1 to 18446744073709551614 inclusive.

--key-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the key contains FIELD_TYPE values. When this switch is not specified, the key type of the Bag is set to custom. The FIELD_TYPE is case insensitive. The supported FIELD_TYPEs are:

sIPv4

source IP address, IPv4 only

dIPv4

destination IP address, IPv4 only

sPort

source port

dPort

destination port

protocol

IP protocol

packets

packets, see also sum-packets

bytes

bytes, see also sum-bytes

flags

an unsigned bitwise OR of TCP flags

sTime

starting time of the flow record, seconds resolution

duration

duration of the flow record, seconds resolution

eTime

ending time of the flow record, seconds resolution

sensor

sensor ID

input

SNMP input

output

SNMP output

nhIPv4

next hop IP address, IPv4 only

initialFlags

TCP flags on first packet in the flow

sessionFlags

bitwise OR of TCP flags on all packets in the flow except the first

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow, as set by the flow generator

class

class of the sensor

type

type of the sensor

icmpTypeCode

an encoded version of the ICMP type and code, where the type is in the upper byte and the code is in the lower byte

sIPv6

source IP, IPv6

dIPv6

destination IP, IPv6

nhIPv6

next hop IP, IPv6

records

count of flows

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

any-IPv4

a generic IPv4 address

any-IPv6

a generic IPv6 address

any-port

a generic port

any-snmp

a generic SNMP value

any-time

a generic time value, in seconds resolution

sip-country

the country code of the source IP address. For textual input, the key column must contain an IP address or an integer. rwbagbuild maps the IP address to a country code and stores the country code in the bag. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.12.0.

dip-country

the country code of the destination IP. See sip-country. Since SiLK 3.12.0.

any-country

the country code of any IP address. See sip-country. Since SiLK 3.12.0.

sip-pmap

a prefix map value found from a source IP address. Maps each IP address in the key column to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be IPv4-address or IPv6-address. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dip-pmap

a prefix map value found from a destination IP address. See sip-pmap. Since SiLK 3.12.0.

any-ip-pmap

a prefix map value found from any IP address. See sip-pmap. Since SiLK 3.12.0.

sport-pmap

a prefix map value found from a protocol/source-port pair. Each key must contain two values, a protocol and a port. Maps each protocol/port pair to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be proto-port. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dport-pmap

a prefix map value found from a protocol/destination-port pair. See sport-pmap. Since SiLK 3.12.0.

any-port-pmap

a prefix map value found from a protocol/port pair. See sport-pmap. Since SiLK 3.12.0.

custom

a number

--counter-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values. When this switch is not specified, the counter type of the Bag is set to custom. Although the supported FIELD_TYPEs are the same as those for the key, the value is always treated as a number that can be summed. rwbagbuild does not use the country code or prefix map when parsing the value field.

--pmap-file=PATH

--pmap-file=MAPNAME:PATH

When the key-type is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap, use the prefix map file located at PATH to map the key to a string. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagbuild currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagbuild to exit with an error.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.
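The icmpTypeCode key-type listed under --key-type stores the ICMP type in the upper byte and the code in the lower byte. As a small sketch of that encoding (icmp_type_code and icmp_decode are hypothetical helpers, not SiLK tools), the numeric key for textual input could be computed and reversed as follows:

```shell
#!/bin/sh
# icmp_type_code TYPE CODE
# Hypothetical helper (not part of SiLK): encode an ICMP type and code
# with the type in the upper byte and the code in the lower byte.
icmp_type_code() {
    echo $(( ($1 << 8) | $2 ))
}

# icmp_decode VALUE
# Hypothetical helper: split an encoded value back into "type code".
icmp_decode() {
    echo "$(( $1 >> 8 )) $(( $1 & 255 ))"
}

icmp_type_code 3 1      # destination unreachable, host unreachable: 769
icmp_type_code 8 0      # echo request: 2048
icmp_decode 769         # prints "3 1"
```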

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Create a bag with IP addresses as keys from a text file

Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as a delimiter, a count, and ends with a newline.

 192.168.0.1,5
 192.168.0.2,500
 192.168.0.3,3
 192.168.0.4,14
 192.168.0.5,5

To build a bag with it:

 $ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag

Use rwbagcat(1) to view its contents:

 $ rwbagcat mybag.bag
     192.168.0.1|                   5|
     192.168.0.2|                 500|
     192.168.0.3|                   3|
     192.168.0.4|                  14|
     192.168.0.5|                   5|

Create a bag with protocols as keys from a text file

To create a Bag of protocol data from the text file myproto.txt:

   1|      4|
   6|    138|
  17|    131|

use

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt > myproto.bag
 $ rwbagcat myproto.bag
          1|                   4|
          6|                 138|
         17|                 131|

When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and rwfileinfo(1) shows the type of the key:

 $ rwfileinfo --fields=bag myproto.bag
 myproto.bag:
   bag            key: protocol @ 4 octets; counter: custom @ 8 octets

Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat
         0.0.0.1|                   4|
         0.0.0.6|                 138|
        0.0.0.17|                 131|

Although the --key-format switch on rwbagcat may be used to choose how the keys are displayed, it is generally better to use the --key-type switch when creating the bag.

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat --key-format=decimal
          1|                   4|
          6|                 138|
         17|                 131|

Create a bag and override the existing counter

To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the --default-count switch which overrides the existing value:

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt  \
        --default-count=1 --output-path=myproto1.bag
 $ rwbagcat myproto1.bag
          1|                   1|
          6|                   1|
         17|                   1|

Create a bag from multiple text files

To create a bag from multiple text files (X.txt, Y.txt, and Z.txt), use the UNIX cat(1) utility to concatenate the files and have rwbagbuild read the combined input. To avoid creating a temporary file, feed the output of cat as the standard input to rwbagbuild.

 $ cat X.txt Y.txt Z.txt                                \
   | rwbagbuild --bag-input=- --output-path=xyz.bag

For each key that appears in multiple input files, rwbagbuild sums the counters for the key.

Create a bag with IP addresses as keys from an IPset file

Given the IP set myset.set, create a bag where every entry in the bag has a count of 3:

 $ rwbagbuild --set-input=myset.set --default-count=3  \
        --out=mybag2.bag

Create a bag from multiple IPset files

Suppose we have three IPset files, A.set, B.set, and C.set:

 $ rwsetcat A.set
 10.0.0.1
 10.0.0.2
 $ rwsetcat B.set
 10.0.0.2
 10.0.0.3
 $ rwsetcat C.set
 10.0.0.1
 10.0.0.2
 10.0.0.4

We want to create a bag file from these IPset files where the count for each IP address is the number of files that IP appears in. rwbagbuild accepts a single file as an argument, so we cannot do the following:

 $ rwbagbuild --set-input=A.set --set-input=B.set ...   # WRONG!

(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had 300 files instead of only 3.)

Since IPset files are (mathematical) sets, joining them together first with rwsettool(1) and then running rwbagbuild causes each IP address to get a count of 1:

 $ rwsettool --union A.set B.set C.set   \
   | rwbagbuild --set-input=-            \
   | rwbagcat
        10.0.0.1|                   1|
        10.0.0.2|                   1|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as a single textual stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that rwbagbuild must process. This is probably the best approach to the problem:

 $ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag
 $ rwbagcat total1.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags together:

 $ for i in *.set ; do
        rwbagbuild --set-input=$i --output-path=/tmp/$i.bag ;
   done
 $ rwbagtool --add /tmp/*.set.bag > total2.bag
 $ rm /tmp/*.set.bag

There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:

 $ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
 $ for i in *.set ; do
        rwbagbuild --set-input=$i  \
        | rwbagtool --output-path=tmp.bag --add total3.bag stdin ;
        /bin/mv tmp.bag total3.bag ;
   done
 $ rwbagcat total3.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

Create a bag where the key is the country code

As of SiLK 3.12.0, a Bag file may contain a country code as its key. In rwbagbuild, specify the --key-type as sip-country, dip-country, or any-country. That key-type works with either textual input or IPset input. The form of the textual input when mapping an IP address to a country code is identical to that when building an ordinary bag.

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=any-country --output-path=scc1.bag
 $ rwbagcat scc1.bag
 --|                 527|

 $ rwbagbuild --set-input=A.set --key-type=any-country  \
        --output-path=scc2.bag
 $ rwbagcat scc2.bag
 --|                   2|

Create a bag using a prefix map value as the key

rwbagbuild and rwbag(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. Use the --pmap-file switch to specify the prefix map file, and specify the --key-type using one of the types that end in -pmap.

For a prefix map that maps by IP addresses, use a key-type of sip-pmap, dip-pmap, or any-ip-pmap. The input may be an IPset or text. The form of the textual input is the same as for a normal bag file.

 $ rwbagbuild --set-input=A.set --key-type=sip-pmap     \
        --pmap-file=ip-map.pmap --output=test1.bag

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=sip-pmap --pmap-file=ip-map.pmap     \
        --output-path=test2.bag

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat(1).

 $ rwbagcat --pmap-file=ip-map.pmap test2.bag
          internal|                 527|

For a prefix map file that maps by protocol-port pairs, the textual input must contain either three columns (protocol, port, counter) or two columns (protocol and port), in which case the counter is the --default-count value or the fallback value of 1.

 $ cat proto-port-count.txt
 6| 25|  800|
 6| 80| 5642|
 6| 22
 $ rwbagbuild --key-type=sport-pmap                 \
        --bag-input=proto-port-count.txt            \
        --pmap-file=proto-port-map.pmap             \
        --output-path=service.bag
 $ rwbagcat --pmap-file=port-map.pmap service.bag
   TCP/SSH|                   1|
  TCP/SMTP|                 800|
  TCP/HTTP|                5642|

Delimiter examples

A single value followed by an optional delimiter is treated as a key. The counter for those keys is set to 1. A delimiter may follow the count, and any text after that delimiter is ignored. When the counter is 0, the key is not inserted into the Bag.

 $ cat sport.txt
 0
 1|
 2|3
 4|5|
 6|7|8|
 9|10|||||
 11|0
 $ rwbagbuild --bag-input=sport.txt --key-type=sport \
   | rwbagcat
          0|                   1|
          1|                   1|
          2|                   3|
          4|                   5|
          6|                   7|
          9|                  10|

The --default-count switch overrides the count.

 $ rwbagbuild --bag-input=sport.txt --key-type=sport --default-count=1 \
   | rwbagcat
          0|                   1|
          1|                   1|
          2|                   1|
          4|                   1|
          6|                   1|
          9|                   1|
         11|                   1|

In fact, the --default-count switch causes rwbagbuild to ignore all text after the delimiter that follows the key.

 $ echo '12|13 14' | rwbagbuild --bag-input=- --output=/dev/null
 rwbagbuild: Error parsing line 1: Extra text after count
 rwbagbuild: Error creating bag from text bag

 $ echo '12|13 14' | rwbagbuild --bag-input=- --default-count=1 \
   | rwbagcat --key-format=decimal
         12|                   1|
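The parsing rules in this section can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the rwbagbuild implementation: the function name build_bag is hypothetical, keys are assumed to be plain integers, and the handling of repeated keys (summing) is an assumption that the examples above do not exercise.

```python
def build_bag(lines, delimiter="|", default_count=None):
    """Model the key/count parsing described above (not SiLK's own code)."""
    bag = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.strip().split(delimiter)
        if not fields[0].strip():
            continue  # assumption: blank lines contribute nothing
        key = int(fields[0])
        if default_count is not None:
            # The default count wins; text after the key is never examined.
            count = default_count
        elif len(fields) == 1 or not fields[1].strip():
            # A lone key, or a key followed only by a delimiter.
            count = 1
        else:
            try:
                count = int(fields[1])
            except ValueError:
                raise ValueError(
                    f"line {lineno}: extra text after count") from None
        if count == 0:
            continue  # keys with a zero counter are not inserted
        # Assumption: a key that appears more than once accumulates.
        bag[key] = bag.get(key, 0) + count
    return bag


sport_txt = ["0", "1|", "2|3", "4|5|", "6|7|8|", "9|10|||||", "11|0"]
for key, count in sorted(build_bag(sport_txt).items()):
    print(f"{key:>10}|{count:>20}|")
```

Passing default_count=1 to the same call admits key 11 as well, because the zero in its count column is never read.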

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbagbuild uses when mapping an IP to a country for the sip-country, dip-country, or any-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_PATH

This environment variable gives the root of the install tree. When searching for the country code mapping file, rwbagbuild may use this environment variable. See the FILES section for details.

FILES

$SILK_COUNTRY_CODES

$SILK_PATH/share/silk/country_codes.pmap

$SILK_PATH/share/country_codes.pmap

/usr/local/share/silk/country_codes.pmap

/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country, dip-country, and any-country key-types.

SEE ALSO

rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk(7), zlib(3), cat(1)

BUGS

rwbagbuild should verify the key’s value is within the allowed range for the specified --key-type.

rwbagbuild should accept non-numeric values for some fields, such as times and TCP flags.

The --default-count switch is poorly named.

rwbagcat

Output a binary Bag file as text

SYNOPSIS

  rwbagcat [ --network-structure[=STRUCTURE] | --bin-ips[=SCALE]
             | --sort-counters[=ORDER]]
        [--print-statistics[=OUTFILE]]
        [--minkey=VALUE] [--maxkey=VALUE] [--mask-set=PATH]
        [--mincounter=VALUE] [--maxcounter=VALUE] [--zero-counts]
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]
        [--key-format=FORMAT] [--integer-keys] [--zero-pad-ips]
        [--no-columns] [--column-separator=C]
        [--no-final-delimiter] [{--delimited | --delimited=C}]
        [--output-path=PATH] [--pager=PAGER_PROG]
        [--site-config-file=FILENAME]
        [BAGFILE [BAGFILE...]]

  rwbagcat --help

  rwbagcat --version

DESCRIPTION

rwbagcat reads a binary Bag as created by rwbag(1) or rwbagbuild(1), converts it to text, and writes it to the standard output, to the pager, or to the specified output file. It can also print various statistics and summary information about the Bag.

As of SiLK 3.12.0, rwbagcat uses information in the Bag file’s header to determine how to display the key column.

In addition, rwbagcat exits with an error when asked to use an IP format to display keys that are not IP addresses.

rwbagcat reads the BAGFILEs specified on the command line; if no BAGFILE arguments are given, rwbagcat attempts to read the Bag from the standard input. BAGFILE may be the keyword stdin or a hyphen (-) to allow rwbagcat to print data from both files and piped input. If any input does not contain a Bag, rwbagcat prints an error to the standard error and exits abnormally.

When multiple BAGFILEs are specified on the command line, each is handled individually. To process the files as a single Bag, use rwbagtool(1) to combine the bags and pipe the output of rwbagtool into rwbagcat.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--network-structure

--network-structure=STRUCTURE

For each numeric value in STRUCTURE, group the IPs in the Bag into a netblock of that size and print the number of hosts, the sum of the counters, and, optionally, the number of smaller, occupied netblocks that each larger netblock contains. When STRUCTURE begins with v6:, the IPs in the Bag are treated as IPv6 addresses, and any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock. Otherwise, the IPs are treated as IPv4 addresses, and any IPv6 address outside the ::ffff:0:0/96 netblock is ignored. Aside from the initial v6: (or v4:, for consistency), STRUCTURE has one of the following forms:

  1. NETBLOCK_LIST/SUMMARY_LIST. Group IPs into the sizes specified in either NETBLOCK_LIST or SUMMARY_LIST. rwbagcat prints a row for each occupied netblock specified in NETBLOCK_LIST, where the row lists the base IP of the netblock, the sum of the counters for that netblock, the number of hosts, and the number of smaller, occupied netblocks having a size that appears in either NETBLOCK_LIST or SUMMARY_LIST. (The values in SUMMARY_LIST are only summarized; they are not printed.)

  2. NETBLOCK_LIST/. Similar to the first form, except all occupied netblocks are printed, and there are no netblocks that are only summarized.

  3. NETBLOCK_LIST with S. When the character S appears anywhere in the NETBLOCK_LIST (for example, ACS), rwbagcat provides a default value for the SUMMARY_LIST. That default is 8,16,24,27 for IPv4, and 48,64 for IPv6.

  4. NETBLOCK_LIST. When neither S nor / appears in STRUCTURE, the output does not include the number of smaller, occupied netblocks.

  5. Empty. When STRUCTURE is empty or only contains v6: or v4:, the NETBLOCK_LIST prints a single row for the total network (the /0 netblock) giving the number of hosts, the sum of the counters, and the number of smaller, occupied netblocks using the same default list specified in form 3.

NETBLOCK_LIST and SUMMARY_LIST contain a comma separated list of numbers between 0 (the total network) and the size for an individual host (32 for IPv4 or 128 for IPv6). The characters T and H may be used as aliases for 0 and the host netblock, respectively. In addition, when parsing the lists as IPv4 netblocks, the characters A, B, C, and X are supported as aliases for 8, 16, 24, and 27, respectively. A comma is not required between adjacent letters. The --network-structure switch disables printing of the IPs in the Bag file; specify the H argument to the switch to print each individual IP address and its counter.

The --network-structure switch may not be combined with the --bin-ips or --sort-counters switches. As of SiLK 3.12.0, rwbagcat exits with an error if the --network-structure switch is used on a Bag file whose key-type is neither custom nor an IP address type.
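As an illustration of the letter aliases, the following Python sketch expands an IPv4 NETBLOCK_LIST into prefix lengths. It is a model of the description above rather than rwbagcat's parser: the function name is hypothetical, the v4:/v6: prefix and the / forms are not handled, and rejecting a digit inside a run of letters is an assumption.

```python
# Aliases for IPv4 lists, as given in the text above.
IPV4_ALIASES = {"T": 0, "A": 8, "B": 16, "C": 24, "X": 27, "H": 32}
DEFAULT_V4_SUMMARY = [8, 16, 24, 27]  # used when S is present


def parse_v4_netblock_list(text):
    """Return (sorted prefix lengths, whether S asked for the default summary)."""
    prefixes = set()
    wants_summary = False
    for token in text.split(","):
        token = token.strip()
        if token.isdigit():
            value = int(token)
            if not 0 <= value <= 32:
                raise ValueError(f"prefix out of range: {value}")
            prefixes.add(value)
            continue
        # Letters may be adjacent without commas, e.g. "TABCHX".
        for ch in token:
            if ch == "S":
                wants_summary = True
            elif ch in IPV4_ALIASES:
                prefixes.add(IPV4_ALIASES[ch])
            else:
                raise ValueError(f"unrecognized character: {ch!r}")
    return sorted(prefixes), wants_summary


print(parse_v4_netblock_list("TABCHX"))
print(parse_v4_netblock_list("ACS"))
```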

--bin-ips

--bin-ips=SCALE

Invert the bag and count the total number of unique keys for a given value of the volume bin. For example, turn a Bag {sip:flow} into {flow:count(sip)}. SCALE is a string containing the value linear, binary, or decimal.

The --bin-ips switch may not be combined with the --network-structure or --sort-counters switches. See also the --invert switch on rwbagtool(1) which inverts a bag using a linear scale and creates a new binary bag file.

--sort-counters

--sort-counters=ORDER

Sort the output so the counters are presented in either decreasing or increasing order. Typically the output is sorted by the keys. If the ORDER argument is not given to the switch, the counters are printed in decreasing order. Valid values for ORDER are

decreasing

Print the maximum counter first. This is the default.

increasing

Print the minimum counter first.

When two counters have the same value, the smaller key is displayed first. The --sort-counters switch may not be combined with the --network-structure or --bin-ips switches. Since SiLK 3.12.2.

--print-statistics

--print-statistics=OUTFILE

Print a breakdown of the network hosts seen, and print general statistics about the keys and counters. When --print-statistics is specified, no other output is produced unless one of --sort-counters, --network-structure, or --bin-ips is also specified. When the OUTFILE argument is not given, the statistics are written to the standard output or to the pager if output is to a terminal. OUTFILE is a filename, named pipe, the keyword stderr to write to the standard error, or the keyword stdout or - to write to the standard output. If OUTFILE names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case OUTFILE is overwritten. The statistics produced by this switch are the number of keys; the sum of the counters; the minimum and maximum keys; the minimum and maximum counters; the mean, variance, standard deviation, skew, and kurtosis of the counters; the number of nodes allocated; and the counter density.

--minkey=VALUE

Output records whose key value is at least VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.

--maxkey=VALUE

Output records whose key value is not more than VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.

--mask-set=PATH

Output records whose key appears in the binary IPset read from the file PATH. (To build an IPset, use rwset(1) or rwsetbuild(1).) When used with --minkey and/or --maxkey, output records whose key is in the IPset and is also within the specified range. As of SiLK 3.12.0, rwbagcat exits with an error if the --mask-set switch is used on a Bag file whose key-type is neither custom nor an IP address type.

--mincounter=VALUE

Output records whose counter value is at least VALUE. VALUE is an integer in the range 1 to 18446744073709551615. The default is to print all records with a non-zero counter; use --zero-counts to show records whose counter is 0.

--maxcounter=VALUE

Output records whose counter value is not more than VALUE. VALUE is an integer in the range 1 to 18446744073709551615, with the default being the maximum counter value.
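The key and counter limits combine as a logical AND. A rough Python model, assuming integer keys and using a hypothetical function name, is shown below. It does not model --zero-counts, and the default lower bound of 1 reflects the statement that only non-zero counters are printed by default.

```python
MAX_COUNTER = 18446744073709551615  # upper limit quoted for --maxcounter


def select_entries(bag, minkey=None, maxkey=None, mask_set=None,
                   mincounter=1, maxcounter=MAX_COUNTER):
    """Return the entries that pass every limit that was supplied."""
    selected = {}
    for key, counter in sorted(bag.items()):
        if minkey is not None and key < minkey:
            continue
        if maxkey is not None and key > maxkey:
            continue
        if mask_set is not None and key not in mask_set:
            continue
        if not mincounter <= counter <= maxcounter:
            continue
        selected[key] = counter
    return selected


bag = {22: 1, 25: 800, 80: 5642, 443: 0}
print(select_entries(bag, minkey=25, maxcounter=1000))
```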

--zero-counts

Print keys whose counter is zero. Normally, keys with a counter of zero are suppressed since all keys have a default counter of zero. In order to use this flag, either --mask-set or both --minkey and --maxkey must be specified. When this switch is specified, any counter limit explicitly set by the --maxcounter switch is also applied.

--pmap-file=PATH

--pmap-file=MAPNAME:PATH

Use the prefix map file located at PATH to map the key to a string when the type of the Bag’s key is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap. This switch is required for Bag files whose key was derived from a prefix map file. The type of the prefix map file must match the key’s type, but a different prefix map file may be used. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagcat currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--key-format=FORMAT

Specify the format to use when printing a key, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, rwbagcat uses the key’s type to determine how to format the key, and a key whose type is unknown or custom is assumed to be an IP address. rwbagcat exits with an error if the specified format is incompatible with the key’s type (for example, attempting to format a timestamp as an IP address).

decimal

Print keys as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp.

hexadecimal

Print keys as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp. Note: This setting does not apply to CIDR prefix values which are printed as decimal.

canonical

Print keys as IP addresses in the canonical format. If the key is an IPv4 address, use dotted decimal (192.0.2.1). If the key is an IPv6 address, use colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1). May be combined with zero-padded and either map-v4 or unmap-v6. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.

no-mixed

Print keys as IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.

map-v4

When the Bag’s key is an IPv4 address, change all IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. May be combined with one of the above settings. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.

unmap-v6

When the Bag’s key is an IPv6 address, change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. May be combined with any one of the above settings except map-v4. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.

zero-padded

Make all formatted key strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal. As of SiLK 3.18.0, the values of CIDR prefix are also zero-padded. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp.

force-ipv6

Print keys using the format map-v4,no-mixed. May be combined with zero-padded. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.

timestamp

Print keys as time in standard SiLK format: yyyy/mm/ddThh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

iso-time

Print keys as time in the ISO time format yyyy-mm-dd hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

m/d/y

Print keys as time in the format mm/dd/yyyy hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

utc

Print the keys as time in UTC. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

localtime

Print the keys as time in the local timezone, where the timezone is taken from either the TZ environment variable or the local machine. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.

epoch

Print keys as seconds since UNIX epoch. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
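The integer and zero-padded IP formats can be checked independently of SiLK with Python's standard ipaddress module. The sketch below is not rwbagcat's formatting code, and the helper names are hypothetical. It covers decimal, hexadecimal, zero-padded, and the effect of map-v4.

```python
import ipaddress


def key_value(text, map_v4=False):
    """Return (integer value, IP version used for printing)."""
    ip = ipaddress.ip_address(text)
    value = int(ip)
    if ip.version == 4 and map_v4:
        # Move the IPv4 address into the ::ffff:0:0/96 netblock.
        return value | (0xFFFF << 32), 6
    return value, ip.version


def fmt_decimal(text):
    return str(key_value(text)[0])


def fmt_hexadecimal(text):
    return format(key_value(text)[0], "x")


def fmt_zero_padded(text, map_v4=False):
    value, version = key_value(text, map_v4)
    if version == 4:
        # Four octets, each padded to three decimal digits.
        return ".".join(f"{(value >> s) & 0xFF:03d}" for s in (24, 16, 8, 0))
    # Eight groups, each padded to four hexadecimal digits (no mixed form).
    return ":".join(f"{(value >> s) & 0xFFFF:04x}" for s in range(112, -1, -16))


for addr in ("192.0.2.1", "2001:db8::1"):
    print(addr, fmt_decimal(addr), fmt_hexadecimal(addr), fmt_zero_padded(addr))
print(fmt_zero_padded("192.0.2.1", map_v4=True))
```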

--integer-keys

This switch is equivalent to --key-format=decimal. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--zero-pad-ips

This switch is equivalent to --key-format=zero-padded. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use the specified character C between columns and after the final column. When this switch is not specified, the default of '|' is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed. When the network summary is requested (--network-structure=S), the separator is always printed before the summary column and never after that column.

--delimited

--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
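When post-processing rwbagcat output in a script, the columnar and delimited layouts can be read by the same small routine, since the only differences are padding, the separator character, and the trailing delimiter. A minimal Python sketch follows. The function name is hypothetical, and it assumes that the key text does not itself contain the separator.

```python
def parse_bag_line(line, separator="|"):
    """Split one line of key/counter output into (key text, counter)."""
    fields = [field.strip() for field in line.split(separator)]
    return fields[0], int(fields[1])


# Default columnar output, with padding and a final delimiter.
print(parse_bag_line("      172.23.1.1|              5|"))
# Output as produced with --delimited=, (no padding, no final delimiter).
print(parse_bag_line("172.23.1.1,5", separator=","))
```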

--output-path=PATH

Write the textual output of the --network-structure, --bin-ips, or --sort-counters switch to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbagcat searches for the site configuration file in the locations specified in the FILES section. Since SiLK 3.15.0.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Printing a bag

To print the contents of the bag file mybag.bag:

 $ rwbagcat mybag.bag
      172.23.1.1|              5|
      172.23.1.2|            231|
      172.23.1.3|              9|
      172.23.1.4|             19|
   192.168.0.100|              1|
   192.168.0.101|              1|
   192.168.0.160|             15|
  192.168.20.161|              1|
  192.168.20.162|              5|
  192.168.20.163|              5|

Displaying number of hosts by network

To print the bag with a full network breakdown:

 $ rwbagcat --network-structure=TABCHX mybag.bag
           172.23.1.1      |              5|
           172.23.1.2      |            231|
           172.23.1.3      |              9|
           172.23.1.4      |             19|
         172.23.1.0/27     |            264|
       172.23.1.0/24       |            264|
     172.23.0.0/16         |            264|
   172.0.0.0/8             |            264|
           192.168.0.100   |              1|
           192.168.0.101   |              1|
         192.168.0.96/27   |              2|
           192.168.0.160   |             15|
         192.168.0.160/27  |             15|
       192.168.0.0/24      |             17|
           192.168.20.161  |              1|
           192.168.20.162  |              5|
           192.168.20.163  |              5|
         192.168.20.160/27 |             11|
       192.168.20.0/24     |             11|
     192.168.0.0/16        |             28|
   192.0.0.0/8             |             28|
 TOTAL                     |            292|

In the above, each line that includes a CIDR prefix displays the sum of the counters for the hosts in that netblock. For example, the counters for the four hosts in the 172.23.1.0/27 netblock sum to 264.
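The netblock rows can be reproduced from the host rows alone. The following Python sketch uses the standard ipaddress module and the contents of mybag.bag as printed earlier. It computes the number of hosts and the counter sum for each occupied netblock of a given size. It is an illustration of the arithmetic, not of how rwbagcat walks the Bag, and the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict

MYBAG = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}


def rollup(bag, prefix_len):
    """Map each occupied netblock to (number of hosts, sum of counters)."""
    hosts = defaultdict(int)
    totals = defaultdict(int)
    for ip, counter in bag.items():
        # strict=False masks off the host bits to find the enclosing block.
        net = str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        hosts[net] += 1
        totals[net] += counter
    return {net: (hosts[net], totals[net]) for net in totals}


for net, (nhosts, total) in rollup(MYBAG, 27).items():
    print(f"{net:>20} | hosts={nhosts} | counters={total}")
```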

To show an abbreviated network structure by class A and C only, including summary information:

 $ rwbagcat --network-structure=ACS mybag.bag
     172.23.1.0/24     |            264| 4 hosts in 1 /27
 172.0.0.0/8           |            264| 4 hosts in 1 /16, 1 /24, and 1 /27
     192.168.0.0/24    |             17| 3 hosts in 2 /27s
     192.168.20.0/24   |             11| 3 hosts in 1 /27
 192.0.0.0/8           |             28| 6 hosts in 1 /16, 2 /24s, and 3 /27s

Overriding the key type

Suppose a key-type of a bag file is duration:

 $ rwfileinfo --field=bag Bag2.bag
 Bag2.bag:
   bag          key: duration @ 4 octets; counter: custom @ 8 octets

rwbagcat complains when the --key-format switch lists a format that it thinks is "nonsensical" for that type of key.

 $ rwbagcat --key-format=utc Bag2.bag
 rwbagcat: Invalid key-format 'utc':
        Nonsensical for Bag containing duration keys

 $ rwbagcat --key-format=canonical Bag2.bag
 rwbagcat: Invalid key-format 'canonical':
        Nonsensical for Bag containing duration keys

To use the --key-format one time and leave the key-type in the Bag file unchanged, you may merge the bag with an empty bag file: Use rwbagbuild(1) to create an empty bag that uses the custom key type, add the empty bag to Bag2.bag using rwbagtool(1), then display the result:

 $ rwbagbuild --bag-input=/dev/null   \
   | rwbagtool --add Bag2.bag stdin   \
   | rwbagcat --key-format=utc
 1970/01/01T00:00:01|                   1|
 1970/01/01T00:00:04|                   2|
 1970/01/01T00:00:07|                  32|
 1970/01/01T00:00:08|                   2|

 $ rwbagbuild --bag-input=/dev/null   \
   | rwbagtool --add Bag2.bag -       \
   | rwbagcat --key-format=canonical
         0.0.0.1|                   1|
         0.0.0.4|                   2|
         0.0.0.7|                  32|
         0.0.0.8|                   2|
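The two displays above are different renderings of the same stored integers (1, 4, 7, and 8). A short Python sketch using only the standard library shows the correspondence. The function names are hypothetical, and the code illustrates the interpretation of the key, not rwbagcat itself.

```python
import ipaddress
from datetime import datetime, timezone


def key_as_utc(key):
    """Render an integer key as seconds since the UNIX epoch, in UTC."""
    moment = datetime.fromtimestamp(key, tz=timezone.utc)
    return moment.strftime("%Y/%m/%dT%H:%M:%S")


def key_as_ipv4(key):
    """Render the same integer as a dotted-decimal IPv4 address."""
    return str(ipaddress.IPv4Address(key))


for key in (1, 4, 7, 8):
    print(f"{key_as_utc(key)} | {key_as_ipv4(key)}")
```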

To rewrite the bag file with a different key type, print the bag file as text and use rwbagbuild to build a new bag file:

 $ rwbagcat Bag2.bag    \
   | rwbagbuild --bag-input=- --key-type=sipv4

Inverting a bag

Inverting a bag means counting the number of times each counter appears in the bag.

To bin the number of IP addresses that had each flow count:

 $ rwbagcat --bin-ips mybag.bag
               1|              3|
               5|              3|
               9|              1|
              15|              1|
              19|              1|
             231|              1|

The output shows that the bag contains 3 source hosts that had a single flow, 3 hosts that had 5 flows, and four hosts that each had a unique flow count (9, 15, 19, and 231).

For a log2 breakdown of the counts:

 $ rwbagcat --bin-ips=binary mybag.bag
    2^0 to 2^1-1|              3|
    2^2 to 2^3-1|              3|
    2^3 to 2^4-1|              2|
    2^4 to 2^5-1|              1|
    2^7 to 2^8-1|              1|
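Both inversions can be reproduced from the ten counters in mybag.bag. The Python sketch below models the linear and binary scales only. The decimal scale is omitted because its bin boundaries are not described here, and the function names are hypothetical.

```python
from collections import Counter

# Counters from mybag.bag, in key order.
COUNTERS = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]


def bin_linear(counters):
    """Count how many keys share each exact counter value."""
    return dict(sorted(Counter(counters).items()))


def bin_binary(counters):
    """Count keys per power-of-two bin; bin n covers 2^n to 2^(n+1)-1."""
    bins = Counter(c.bit_length() - 1 for c in counters if c > 0)
    return dict(sorted(bins.items()))


print(bin_linear(COUNTERS))
for n, hosts in bin_binary(COUNTERS).items():
    print(f"2^{n} to 2^{n + 1}-1 | {hosts}")
```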

Sorting the bag by counter value

rwbagcat normally presents the data in order of increasing key value. To sort based on the counter value, specify the --sort-counter switch. When sorting by the counter value, the default order is from maximum counter to minimum counter.

 $ rwbagcat --sort-counter mybag.bag
      172.23.1.2|                 231|
      172.23.1.4|                  19|
   192.168.0.160|                  15|
      172.23.1.3|                   9|
      172.23.1.1|                   5|
  192.168.20.162|                   5|
  192.168.20.163|                   5|
   192.168.0.100|                   1|
   192.168.0.101|                   1|
  192.168.20.161|                   1|

To change the sort order, specify the increasing argument to the --sort-counter switch:

 $ rwbagcat --sort-counter=increasing mybag.bag
   192.168.0.100|                   1|
   192.168.0.101|                   1|
  192.168.20.161|                   1|
      172.23.1.1|                   5|
  192.168.20.162|                   5|
  192.168.20.163|                   5|
      172.23.1.3|                   9|
   192.168.0.160|                  15|
      172.23.1.4|                  19|
      172.23.1.2|                 231|

For keys that have the same counter value, the order of the keys is consistent (always from low to high) regardless of how the counters are sorted. The following output is limited to those keys whose counter is 5. The output is first shown without the --sort-counter switch, then with the data sorted by increasing and decreasing counter value.

 $ rwbagcat --delim=, mybag.bag | grep ,5
 172.23.1.1,5
 192.168.20.162,5
 192.168.20.163,5

 $ rwbagcat --delim=, --sort-counter=increasing mybag.bag | grep ,5
 172.23.1.1,5
 192.168.20.162,5
 192.168.20.163,5

 $ rwbagcat --delim=, --sort-counter=decreasing mybag.bag | grep ,5
 172.23.1.1,5
 192.168.20.162,5
 192.168.20.163,5
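The ordering rule amounts to a two-part sort key: the counter (negated for decreasing order) followed by the numeric value of the key. A Python sketch under that reading, with a hypothetical function name and the contents of mybag.bag:

```python
import ipaddress

MYBAG = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}


def sort_by_counter(bag, order="decreasing"):
    """Sort by counter; break ties by key, always from low to high."""
    if order not in ("decreasing", "increasing"):
        raise ValueError(f"invalid order: {order!r}")
    sign = -1 if order == "decreasing" else 1
    return sorted(
        bag.items(),
        # Compare addresses by integer value, not as strings.
        key=lambda kv: (sign * kv[1], int(ipaddress.ip_address(kv[0]))),
    )


for key, counter in sort_by_counter(MYBAG):
    print(f"{key:>16}|{counter:>20}|")
```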

Displaying bags that use prefix map values as the key

rwbag(1) and rwbagbuild(1) can use a prefix map file as the key in a bag file as of SiLK 3.12.0. When attempting to display these Bag files, you must specify the --pmap-file switch on the rwbagcat command line for it to map each prefix map value to its label. If the --pmap-file is not given, rwbagcat displays an error.

 $ rwbagcat service.bag
 rwbagcat: The --pmap-file switch is required for \
         Bags containing sport-pmap keys

In addition, the type of the prefix map file must match the key-type in the bag file: a prefix map type of IPv4-address or IPv6-address when the key was mapped from an IP address, and a prefix map type of proto-port when the key was mapped from a protocol-port pair. The type of key in a bag may be determined by rwfileinfo(1).

 $ rwfileinfo --fields=bag service.bag
 service.bag:
   bag          key: sport-pmap @ 4 octets; counter: custom @ 8 octets

 $ rwbagcat --pmap-file=ip-map.pmap service.bag
 rwbagcat: Cannot use IPv4-address prefix map for \
        Bag containing sport-pmap keys

 $ rwbagcat --pmap-file=port-map.pmap service.bag
   TCP/SSH|                   1|
  TCP/SMTP|                 800|
  TCP/HTTP|                5642|

The only check rwbagcat makes is whether the prefix map file is the correct type. A different prefix map file may be used. If a value in the bag file does not have an index in the prefix map file, the numeric index of the label is displayed as shown in the following example which creates a prefix map with a single label.

 $ echo 'label 1 none'                                      \
   | rwpmapbuild --mode=proto-port --input-path=-           \
        --output-path=tmp.pmap
 $ rwbagcat --pmap-file=tmp.pmap service.bag
   7|                   1|
   8|                 800|
   9|                5642|
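The fallback can be pictured as a dictionary lookup with a default. In the Python sketch below, the index values 7, 8, and 9 are taken from the output above, while the function name and the use of a plain dictionary to stand in for a prefix map are assumptions made for illustration.

```python
def label_for(index, labels):
    """Return the label for a stored index, or the index itself as text."""
    return labels.get(index, str(index))


# The Bag stores prefix map index values, not label strings.
service_bag = {7: 1, 8: 800, 9: 5642}

original_labels = {7: "TCP/SSH", 8: "TCP/SMTP", 9: "TCP/HTTP"}
single_label = {1: "none"}  # like the one-label map built above

for labels in (original_labels, single_label):
    for index, counter in sorted(service_bag.items()):
        print(f"{label_for(index, labels):>10}|{counter:>20}|")
```

Rows come out in index order in both cases, which is why the labeled listing above is not alphabetical.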

Displaying statistics

 $ rwbagcat --print-statistics mybag.bag

 Statistics
     number of keys:  10
    sum of counters:  292
        minimum key:  172.23.1.1
        maximum key:  192.168.20.163
    minimum counter:  1
    maximum counter:  231
               mean:  29.2
           variance:  5064
 standard deviation:  71.16
               skew:  2.246
           kurtosis:  8.1
    nodes allocated:  0 (0 bytes)
    counter density:  inf%
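Several of these figures can be cross-checked with Python's standard statistics module. In the sketch below the counters are those of mybag.bag. The variance shown in the example output (5064) agrees with the sample form that divides by n - 1, which is what statistics.variance computes. Skew and kurtosis are left out because the exact formulas used are not stated here.

```python
import statistics

# Counters from mybag.bag, in key order.
COUNTERS = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]


def summarize(counters):
    """Compute a subset of the statistics shown in the example output."""
    return {
        "number of keys": len(counters),
        "sum of counters": sum(counters),
        "minimum counter": min(counters),
        "maximum counter": max(counters),
        "mean": statistics.mean(counters),
        "variance": statistics.variance(counters),  # divides by n - 1
        "standard deviation": statistics.stdev(counters),
    }


for name, value in summarize(COUNTERS).items():
    print(f"{name:>20}:  {value:.4g}")
```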

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_PAGER

When set to a non-empty string, rwbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwbagcat does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwbagcat automatically invokes this program to display its output a screen at a time.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwbagcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwbagcat may use this environment variable. See the FILES section for details.

TZ

When the argument to the --key-format switch includes localtime or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwbagcat --version.)

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwfileinfo(1), rwset(1), rwsetbuild(1), silk(7), tzset(3), environ(7)

rwbagtool

Perform high-level operations on binary Bag files

SYNOPSIS

  rwbagtool { --add | --subtract | --minimize | --maximize
              | --divide | --scalar-multiply=VALUE
              | --compare={lt | le | eq | ge | gt} }
        [--intersect=SETFILE | --complement-intersect=SETFILE]
        [--mincounter=VALUE] [--maxcounter=VALUE]
        [--minkey=VALUE] [--maxkey=VALUE]
        [--invert] [--coverset] [--ipset-record-version=VERSION]
        [--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [BAGFILE[ BAGFILE...]]

  rwbagtool --help

  rwbagtool --version

DESCRIPTION

rwbagtool performs various operations on binary Bag files (key-counter associations) and creates a new Bag file or an IPset file. rwbagtool can add Bags together, subtract a subset of data from a Bag, divide a Bag by another, compare the counters of two Bag files, perform key intersection of a Bag with an IPset, extract the keys of a Bag as an IPset, or filter Bag entries based on their key or counter values.

rwbagtool reads Bags from the files and named pipes specified on the command line. If no file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. The names stdin or - may be used to force rwbagtool to read from the standard input. The resulting Bag or IPset is written to the location specified by the --output-path switch or to the standard output if that switch is not provided. If a BAGFILE does not contain a Bag or an attempt is made to read binary input or write binary output to the terminal, rwbagtool prints an error to the standard error and exits abnormally.

In SiLK 3.21.0, rwbagtool added the --modify-inplace switch which correctly handles the case when an input file is also used as the output file. That switch causes rwbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname where the original output file is copied.
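
The write-then-replace sequence described above can be illustrated outside SiLK. The following Python sketch is not rwbagtool's implementation; the function name replace_in_place is hypothetical, and the sketch copies only the permission bits (not the owner or group) and does not handle a BACKUP that names a directory.

```python
import os
import shutil
import tempfile

def replace_in_place(path, data, backup=None):
    """Write data to a temporary file, then move that file over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
    except BaseException:
        os.remove(tmp)  # error before the temporary file was closed
        raise
    if os.path.exists(path):
        shutil.copymode(path, tmp)  # keep the original permission bits
        if backup is not None:
            # Between this move and the next one, path does not exist.
            os.replace(path, backup)
    os.replace(tmp, path)
```

Because the temporary file is created in the same directory as the output, the final os.replace call stays on one file system, which is an assumption of this sketch rather than a documented property of rwbagtool.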

A Bag is a set where each key is associated with a counter. rwbag(1) and rwbagbuild(1) are the primary tools used to create a Bag file. rwbagcat(1) prints a binary Bag file as text.

SiLK 3.15.0 introduced Aggregate Bags that are capable of storing multiple keys and counters. See rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), and rwaggbagtool(1) for more information.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Operation switches

The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.

--add

Sum the counters for each key for all Bag files given on the command line. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key is not present in an input file, a counter of zero is used. The result contains the union of the keys from the input Bag files. When no operation switch is specified on the command line, the add operation is the default. If addition causes a counter to exceed the maximum value, rwbagtool exits with an error.

--subtract

Subtract from the first Bag file all subsequent Bag files. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If subtracting a key’s counters results in a non-positive number, the key does not appear in the resulting Bag file. The result contains a subset of the keys in the first Bag file.

--minimize

Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags do not appear in the output. At least one Bag file must be specified, and any number of additional Bag files may be given.

--maximize

Cause the output to contain the maximum counter seen for each key. The output contains each key that appears in any input Bag. At least one Bag file must be specified, and any number of additional Bag files may be given.

--divide

Divide the first Bag file by the second Bag file. It is an error if only one Bag file or more than two Bag files are given. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys do not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key does not appear in the output.

--scalar-multiply=VALUE

Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551614. This switch requires a single Bag as input. On overflow, the lower 64 bits of the result are used as the counter’s value.

--compare=OPERATION

Compare the key/counter pairs in exactly two Bag files. It is an error if only one Bag file or more than two Bag files are specified. The keys in the output Bag are only those for which the comparison denoted by OPERATION is true when comparing the key’s counter in the first Bag with the key’s counter in the second Bag. The counters for all keys in the output have the value 1. Any key that does not appear in both input Bag files does not appear in the result. The possible OPERATION values are the strings:

lt

GetCounter(Bag1, key) < GetCounter(Bag2, key)

le

GetCounter(Bag1, key) <= GetCounter(Bag2, key)

eq

GetCounter(Bag1, key) == GetCounter(Bag2, key)

ge

GetCounter(Bag1, key) >= GetCounter(Bag2, key)

gt

GetCounter(Bag1, key) > GetCounter(Bag2, key)

Masking/Limiting switches

The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:

--intersect=SETFILE

Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--complement-intersect=SETFILE

As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.

--mincounter=VALUE

Cause the output to contain only those entries whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value (18446744073709551614); the default is 1.

--maxcounter=VALUE

Cause the output to contain only those entries whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.

--minkey=VALUE

Cause the output to contain only those entries whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.

--maxkey=VALUE

Cause the output to contain only those entries whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.

Output switches

The following switches control the output.

--invert

Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key is attributed to the counter for the maximum key; to prevent this, specify --maxcounter=4294967295 which removes all key-counter pairs whose counters do not fit into a key. (The --bin-ips switch on rwbagcat(1) allows one to invert a Bag file as it is being printed.) If inverting the Bag causes a counter to exceed the maximum value, rwbagtool exits with an error.

--coverset

Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.11.0.

 0 

Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.

 2 

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

 3 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

 4 

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

 5 

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--output-path=PATH

Write the resulting Bag or IPset to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.

--modify-inplace

Allow rwbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwbagtool writes the output to a temporary location first, then overwrites PATH. rwbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwbagtool exits with an error when this switch is given and PATH is not a regular file. If rwbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.

--backup-path=BACKUP

Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwbagtool exits with a message and an error. rwbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.

The examples assume the following contents for the files:

 Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set
  3|  10|     1|   1|     2|   8|     1|   1|          2
  4|   7|     4|   2|     4|  10|     4|   3|          4
  6|  14|     7|  32|     6|  14|     6|   4|          6
  7|  23|     8|   2|     7|  12|     7|   4|          8
  8|   2|                 9|   8|     8|   6|

The examples use rwbagcat(1) to print the contents of the Bag files.

Adding Bag Files

Adding Bag files produces a Bag whose keys are the set union of the keys in the input Bags. The counter for each key is the sum of the key’s counters in each input Bag.

 $ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
 $ rwbagcat --key-format=decimal Bag-sum.bag
  1|   1|
  3|  10|
  4|   9|
  6|  14|
  7|  55|
  8|   4|

 $ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
 $ rwbagcat --key-format=decimal Bag-sum2.bag
  1|   1|
  2|   8|
  3|  10|
  4|  19|
  6|  28|
  7|  67|
  8|   4|
  9|   8|
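
The addition semantics can be modeled with plain Python dictionaries. This is a minimal sketch for illustration, not rwbagtool's implementation; the name add_bags is hypothetical, and MAX_COUNTER is the maximum counter value quoted in the --mincounter description.

```python
# Illustrative model only; it does not read or write SiLK Bag files.
MAX_COUNTER = 18446744073709551614

def add_bags(*bags):
    """Sum the counters per key; a key missing from a bag counts as 0."""
    result = {}
    for bag in bags:
        for key, counter in bag.items():
            total = result.get(key, 0) + counter
            if total > MAX_COUNTER:
                # rwbagtool exits with an error here; the model raises.
                raise OverflowError("counter overflow for key %d" % key)
            result[key] = total
    return result

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}

for key, counter in sorted(add_bags(bag1, bag2, bag3).items()):
    print("%3d|%4d|" % (key, counter))
```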

Subtracting Bag Files

The --subtract switch subtracts from the key/counter pairs in the first Bag file the key/counter pairs in all other Bag file arguments. Keys that are not present in the first argument are ignored. If subtraction results in a counter value of zero or less, the key is removed from the result.

 $ rwbagtool --subtract Bag1.bag Bag2.bag > Bag-diff.bag
 $ rwbagcat --key-format=decimal Bag-diff.bag
  3|  10|
  4|   5|
  6|  14|

 $ rwbagtool --subtract Bag2.bag Bag1.bag > Bag-diff2.bag
 $ rwbagcat --key-format=decimal Bag-diff2.bag
  1|   1|
  7|   9|
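
A dictionary-based model of the subtraction rules may make the two results easier to compare. This is a sketch under the assumption that keys and counters are plain integers; subtract_bags is a hypothetical name, not part of SiLK or PySiLK.

```python
def subtract_bags(first, *others):
    """Subtract every later bag from the first; drop counters <= 0."""
    result = dict(first)
    for bag in others:
        for key, counter in bag.items():
            # Keys absent from the first bag are ignored.
            if key in result:
                result[key] -= counter
    return {key: c for key, c in result.items() if c > 0}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

print(sorted(subtract_bags(bag1, bag2).items()))
print(sorted(subtract_bags(bag2, bag1).items()))
```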

Getting the Minimum Value

The output produced by the --minimize switch contains only the keys that appear in all of the input Bags. For each key, the counter is the minimum value for that key in any input Bag.

 $ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
 $ rwbagcat --key-format=decimal Bag-min.bag
  4|   2|
  7|  12|

Getting the Maximum Value

The keys of the Bag file produced by --maximize are the same as the keys produced by --add; that is, the union of all keys in the input files. For each key, its counter is the maximum value seen for that key in any single input Bag file.

 $ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
 $ rwbagcat --key-format=decimal Bag-max.bag
  1|   1|
  2|   8|
  3|  10|
  4|  10|
  6|  14|
  7|  32|
  8|   2|
  9|   8|
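
The difference between the key sets produced by --minimize (intersection) and --maximize (union) can be sketched with Python dictionaries. The function names below are hypothetical and the code only models the behavior described in this section.

```python
def minimize_bags(*bags):
    """Keep keys present in every bag; counter is the smallest one seen."""
    common = set(bags[0]).intersection(*bags[1:])
    return {key: min(bag[key] for bag in bags) for key in common}

def maximize_bags(*bags):
    """Keep keys present in any bag; counter is the largest one seen."""
    keys = set().union(*bags)
    return {key: max(bag.get(key, 0) for bag in bags) for key in keys}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}

print(sorted(minimize_bags(bag1, bag2, bag3).items()))
print(sorted(maximize_bags(bag1, bag2, bag3).items()))
```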

Dividing Bag Files

The --divide switch requires exactly two Bag files as input. The keys in the first Bag argument must be either the same as or a subset of those in the second argument. The counter for each key in the first Bag file is divided by that key’s counter in the second file. If the result of the division is less than 0.5, the key is not included in the output.

 $ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
 $ rwbagcat --key-format=decimal Bag-div1.bag
   1|   1|
   4|   1|
   7|   8|

When the order of the Bag file arguments is reversed, an error is reported.

 $ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
 rwbagtool: Error dividing bags; key 6 not in divisor bag

To work around this issue, use the --coverset switch to create a copy of Bag4.bag that contains only the keys in Bag2.bag.

 $ rwbagtool --coverset Bag2.bag > Bag2-keys.set
 $ rwbagtool --intersect=Bag2-keys.set  Bag4.bag > Bag4-small.bag
 $ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
 $ rwbagcat --key-format=decimal Bag-div2.bag
   1|   1|
   4|   2|
   8|   3|

The following command is the same as the above except the IPset and Bag files are piped between the tools instead of being written to disk:

 $ rwbagtool --coverset Bag2.bag                \
   | rwbagtool --intersect=-  Bag4.bag          \
   | rwbagtool --divide -  Bag2.bag             \
   | rwbagcat --key-format=decimal
   1|   1|
   4|   2|
   8|   3|
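
The rounding and key rules for division can be modeled in Python with integer arithmetic. This is a sketch, not the tool's code; divide_bags is a hypothetical name, and the expression (2 * n + d) // (2 * d) is one way to round n / d to the nearest integer with halves rounded up.

```python
def divide_bags(dividend, divisor):
    """Divide counters key by key, rounding to nearest (halves round up)."""
    result = {}
    for key, n in dividend.items():
        if key not in divisor:
            raise KeyError("key %d not in divisor bag" % key)
        d = divisor[key]
        quotient = (2 * n + d) // (2 * d)
        if quotient > 0:  # quotients below 0.5 round to 0 and are dropped
            result[key] = quotient
    return result

bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}

print(sorted(divide_bags(bag2, bag4).items()))

# Same workaround as above: keep only the keys of bag4 that bag2 has.
bag4_small = {key: c for key, c in bag4.items() if key in bag2}
print(sorted(divide_bags(bag4_small, bag2).items()))
```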

Scalar Multiplication

The --scalar-multiply switch multiplies each counter in the input Bag by the specified value. Exactly one Bag file argument is required.

 $ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
 $ rwbagcat --key-format=decimal Bag-multiply.bag
  3|  70|
  4|  49|
  6|  98|
  7| 161|
  8|  14|

Use two rwbagtool commands if multiple operations are desired.

 $ rwbagtool --add Bag1.bag Bag2.bag   \
   | rwbagtool --scalar-multiply=3 --output-path=Bag12-multi.bag
 $ rwbagcat --key-format=decimal Bag12-multi.bag
  1|   3|
  3|  30|
  4|  27|
  6|  42|
  7| 165|
  8|  12|
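
The overflow rule given for --scalar-multiply (the lower 64 bits of the product are kept) can be modeled with a bit mask. This Python sketch is illustrative only; scalar_multiply is a hypothetical name, and the sketch does not model what rwbagtool does when the masked value is 0 or exceeds the maximum counter.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keeps the lower 64 bits of a product

def scalar_multiply(bag, value):
    """Multiply each counter by value, wrapping at 64 bits."""
    if not 1 <= value <= 18446744073709551614:
        raise ValueError("VALUE out of range")
    return {key: (c * value) & MASK_64 for key, c in bag.items()}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
print(sorted(scalar_multiply(bag1, 7).items()))
```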

Comparing Bag Files

The --compare switch takes an argument that specifies how to compare the counters in two Bag files, and it requires exactly two Bag files as input. For each key that appears in both Bag files, the counter value in the first file is compared to the counter value in the second file. If the comparison is true, the key appears in the resulting Bag file with a counter of 1. If the comparison is false, the key is not present in the output file. Keys that appear in only one of the input files are ignored.

The following comparisons operate on Bag1.bag and Bag2.bag which have as common keys 4, 7, and 8.

Find counters in Bag1.bag that are less than those in Bag2.bag:

 $ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
 $ rwbagcat --key-format=decimal Bag-lt.bag
  7|   1|

Find counters in Bag1.bag that are less than or equal to those in Bag2.bag:

 $ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
 $ rwbagcat --key-format=decimal Bag-le.bag
  7|   1|
  8|   1|

Find counters in Bag1.bag that are equal to those in Bag2.bag:

 $ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
 $ rwbagcat --key-format=decimal Bag-eq.bag
  8|   1|

Find counters in Bag1.bag that are greater than or equal to those in Bag2.bag:

 $ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
 $ rwbagcat --key-format=decimal Bag-ge.bag
  4|   1|
  8|   1|

Find counters in Bag1.bag that are greater than those in Bag2.bag:

 $ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
 $ rwbagcat --key-format=decimal Bag-gt.bag
  4|   1|
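
The five comparisons can be summarized in one small Python model built on the standard operator module. The name compare_bags and the OPERATIONS table are hypothetical; only the operation strings come from the --compare description.

```python
import operator

OPERATIONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ge": operator.ge,
    "gt": operator.gt,
}

def compare_bags(operation, first, second):
    """Return {key: 1} for shared keys where the comparison holds."""
    test = OPERATIONS[operation]
    return {key: 1 for key in first
            if key in second and test(first[key], second[key])}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

for name in ("lt", "le", "eq", "ge", "gt"):
    print(name, sorted(compare_bags(name, bag1, bag2)))
```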

Making a Cover Set

A cover set is an IPset file that contains the keys that are present in any of the input Bag files. In other words, it is the union of the keys converted to an IPset. Since an operation switch is not provided in this command, an implicit --add operation is performed on the Bag files prior to creating the cover set. (rwsetcat(1) prints the contents of an IPset file as text.)

 $ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
 $ rwsetcat --ip-format=decimal Cover.set
  1
  2
  3
  4
  6
  7
  8
  9

One use of a cover set is to limit the contents of a Bag file to keys that are present in a second Bag file:

 $ rwbagtool --coverset --output-path=Cover.set Bag1.bag
 $ rwbagtool --intersect=Cover.set Bag2.bag > Bag1-mask-Bag2.bag
 $ rwbagcat --key-format=decimal Bag1-mask-Bag2.bag
  4|   2|
  7|  32|
  8|   2|

To mask the contents of Bag2.bag by the keys that are not present in Bag1.bag:

 $ rwbagtool --complement-intersect=Cover.set Bag2.bag \
        > Bag1-notmask-Bag2.bag
 $ rwbagcat --key-format=decimal Bag1-notmask-Bag2.bag
  1|   1|

Inverting a Bag

The output of the --invert switch is a Bag file that counts the number of times each counter is present in the input Bag file.

 $ rwbagtool --invert Bag1.bag > Bag-inv1.bag
 $ rwbagcat --key-format=decimal Bag-inv1.bag
  2|   1|
  7|   1|
 10|   1|
 14|   1|
 23|   1|

 $ rwbagtool --invert Bag2.bag > Bag-inv2.bag
 $ rwbagcat --key-format=decimal Bag-inv2.bag
  1|   1|
  2|   2|
 32|   1|

 $ rwbagtool --invert Bag3.bag > Bag-inv3.bag
 $ rwbagcat --key-format=decimal Bag-inv3.bag
  8|   2|
 10|   1|
 12|   1|
 14|   1|

When multiple Bag files are specified on the command line, the files are added prior to creating the inverted Bag. Even though the counter 2 appears three times in the files Bag1.bag and Bag2.bag, the key 2 is not present in the following since the add operation is performed first.

 $ rwbagtool --invert Bag1.bag Bag2.bag   \
   | rwbagcat --key-format=decimal
  1|   1|
  4|   1|
  9|   1|
 10|   1|
 14|   1|
 55|   1|
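
Inversion is a count of counters, which Python's collections.Counter expresses directly. The sketch below is a model under stated assumptions: invert_bag is a hypothetical name, and counters larger than the maximum key (4294967295) are attributed to that maximum key as the --invert description states.

```python
from collections import Counter

MAX_KEY = 4294967295

def invert_bag(bag):
    """Map each counter value to the number of keys that carry it."""
    return dict(Counter(min(c, MAX_KEY) for c in bag.values()))

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

print(sorted(invert_bag(bag2).items()))

# With several inputs the bags are added first, then inverted.
summed = dict(Counter(bag1) + Counter(bag2))
print(sorted(invert_bag(summed).items()))
```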

Masking Bag Files

The --intersect switch takes an IPset file as an argument and limits the keys of the Bag produced by rwbagtool to only those keys that appear in the IPset file.

 $ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
 $ rwbagcat --key-format=decimal Bag-mask.bag
  4|   7|
  6|  14|
  8|   2|

The --complement-intersect switch limits the output to only those keys that do not appear in the IPset file.

 $ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
 $ rwbagcat --key-format=decimal Bag-mask2.bag
  3|  10|
  7|  23|
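
Both masking switches amount to a membership test on the key. The Python sketch below models an IPset as a set of integers; the function names are hypothetical and the code does not read SiLK files.

```python
def intersect(bag, ipset):
    """Keep entries whose key is in the set (models --intersect)."""
    return {key: c for key, c in bag.items() if key in ipset}

def complement_intersect(bag, ipset):
    """Keep entries whose key is not in the set."""
    return {key: c for key, c in bag.items() if key not in ipset}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
mask = {2, 4, 6, 8}

print(sorted(intersect(bag1, mask).items()))
print(sorted(complement_intersect(bag1, mask).items()))
```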

See also the next section.

Restricting the Output

In addition to limiting the result of rwbagtool to keys that appear or do not appear in an IPset file (cf. previous section), numeric limits may be used to restrict the keys or counters that appear in the resulting Bag file with use of the --minkey, --maxkey, --mincounter, and --maxcounter switches.

 $ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
 $ rwbagcat --key-format=decimal Bag-res1.bag
  1|   1|
  3|  10|
  4|   9|

 $ rwbagtool --minkey=3 --maxkey=6 Bag1.bag Bag2.bag > Bag-res2.bag
 $ rwbagcat --key-format=decimal Bag-res2.bag
  3|  10|
  4|   9|
  6|  14|

 $ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
 $ rwbagcat --key-format=decimal Bag-res3.bag
  7|  55|

 $ rwbagtool --subtract --maxcounter=9 Bag1.bag Bag2.bag  \
        > Bag-res4.bag
 $ rwbagcat --key-format=decimal Bag-res4.bag
  4|   5|
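
The four limiting switches act as inclusive range tests on the intermediate Bag. The following Python sketch models that filter; limit_bag is a hypothetical name, the defaults follow the values given in the switch descriptions, and the sample dictionaries are the sum and difference of Bag1.bag and Bag2.bag shown in the earlier examples.

```python
MAX_KEY = 4294967295
MAX_COUNTER = 18446744073709551614

def limit_bag(bag, minkey=0, maxkey=MAX_KEY,
              mincounter=1, maxcounter=MAX_COUNTER):
    """Keep entries whose key and counter fall inside the given ranges."""
    return {key: c for key, c in bag.items()
            if minkey <= key <= maxkey and mincounter <= c <= maxcounter}

bag_sum = {1: 1, 3: 10, 4: 9, 6: 14, 7: 55, 8: 4}   # Bag1.bag + Bag2.bag
bag_diff = {3: 10, 4: 5, 6: 14}                     # Bag1.bag - Bag2.bag

print(sorted(limit_bag(bag_sum, maxkey=5).items()))
print(sorted(limit_bag(bag_sum, mincounter=20).items()))
print(sorted(limit_bag(bag_diff, maxcounter=9).items()))
```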

Changing a File’s Format

To share a Bag file with a user who has a version of SiLK that includes different compression libraries, it may be necessary to change the compression-method of the Bag.

It is not possible to change the compression-method directly. A new file must be created first, and you may then replace the old file with the new file.

To create a copy of the Bag file A.bag that uses a different compression-method, use rwbagtool with the --add switch and specify the desired compression method:

 $ rwbagtool --add --compression=none --output-path=A1.bag A.bag

Changing the Key Type or Counter Type

Unfortunately, the Bag tools do not allow changing the key type or counter type of a Bag file. To change the types, use rwbagcat(1) to write the Bag as text and rwbagbuild(1) to convert the text back to a Bag file.

 $ rwbagcat Bag1.bag    \
   | rwbagbuild --bag-input=- --output-path=Bag1-typed.bag  \
        --key-type=sport --counter-type=sum-bytes

Use rwfileinfo(1) to see the type of the key and counter.

 $ rwfileinfo --field=bag Bag1-typed.bag
 Bag1-typed.bag:
   bag          key: sPort @ 4 octets; counter: sum-bytes @ 8 octets

Alternatively, one may use PySiLK (see pysilk(3)) to modify the key type and counter type.

 $ cat bag-type.py
 import sys
 from silk import *

 key_type = sys.argv[1]
 counter_type = sys.argv[2]
 old_file = sys.argv[3]
 new_file = sys.argv[4]

 old = Bag.load(old_file, key_type=IPv4Addr)
 new = Bag(old, key_type=key_type, counter_type=counter_type)
 new.save(new_file)
 $
 $ python bag-type.py sipv4 sum-packets Bag1.bag Bag1-type2.bag
 $ rwfileinfo --field=bag Bag1-type2.bag
 Bag1-type2.bag:
   bag          key: sIPv4 @ 4 octets; counter: sum-packets @ 8 octets

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version when that switch is not provided. Since SiLK 3.7.0.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

NOTES

The --modify-inplace switch was added in SiLK 3.21. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.

rwbagtool should handle counter overflow more consistently and gracefully.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), silk(7), zlib(3)

rwcat

Concatenate SiLK Flow files into a single stream

SYNOPSIS

  rwcat [--output-path=PATH] [--note-add=TEXT] [--note-file-add=FILE]
        [--print-filenames] [--byte-order={big | little | native}]
        [--ipv4-output] [--milliseconds]
        [--compression-method=COMP_METHOD]
        [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}

  rwcat --help

  rwcat --version

DESCRIPTION

rwcat reads SiLK Flow records and writes the records in the standard binary SiLK format to the specified output-path; rwcat writes the records to the standard output when stdout is not the terminal and --output-path is not provided.

rwcat reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcat reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

rwcat does not copy the invocation history and annotations (notes) from the header(s) of the source file(s) to the destination file. The --note-add or --note-file-add switch may be used to add a new annotation to the destination file.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. When PATH ends in .gz, the output is compressed using the library associated with gzip(1). If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcat to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--byte-order=ENDIAN

Set the byte order for the output SiLK Flow records. The argument is one of the following:

native

Use the byte order of the machine where rwcat is running. This is the default.

big

Use network byte order (big endian) for the output.

little

Write the output in little endian format.

--ipv4-output

Force the output to contain only IPv4 flow records. When this switch is specified, IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix are converted to IPv4 and written to the output, and all other IPv6 records are ignored. When SiLK has not been compiled with IPv6 support, rwcat acts as if this switch were always in effect.

--milliseconds

Force the output to use record formats and versions that use millisecond timestamps. This makes the output compatible with releases of SiLK prior to SiLK 3.23.0. To read the output, SiLK 3.10.0 or later is required, and if the byte-count, packet-count, or SNMP values (in and out) exceed the maximum supported by that version of SiLK, the value is set to its maximum. Since SiLK 3.23.0.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--print-filenames

Print the names of input files and the number of records each file contains as the files are read.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcat searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcat opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.

To combine the results of several rwfilter(1) runs (stored in the files run1.rw, run2.rw, ... runN.rw) into the file combined.rw, you can use:

 $ rwcat --output=combined.rw  *.rw

If the shell complains about too many arguments, you can use the UNIX find(1) command and pipe its output to rwcat:

 $ find . -name '*.rw' -print                   \
   | rwcat --xargs --output=combined.rw

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcat may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcat may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
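The lookup described above can be sketched in Python. This is only an illustration, not the SiLK implementation: it assumes the locations are tried in the order listed, that the first existing file wins, and that an unset or empty environment variable simply drops the entries that depend on it. The function names and the exists callback are hypothetical.

```python
def site_config_candidates(environ, switch_value=None):
    """List the paths that would be tried, in the order given in FILES.

    Assumption: an explicit --site-config-file value bypasses the search.
    """
    if switch_value:
        return [switch_value]
    candidates = []
    if environ.get("SILK_CONFIG_FILE"):
        candidates.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        root = environ["SILK_DATA_ROOTDIR"].rstrip("/")
        candidates.append(root + "/silk.conf")
    candidates.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"].rstrip("/")
        candidates.append(root + "/share/silk/silk.conf")
        candidates.append(root + "/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


def find_site_config(environ, exists, switch_value=None):
    """Return the first candidate for which exists(path) is true, else None."""
    for path in site_config_candidates(environ, switch_value):
        if exists(path):
            return path
    return None
```

In practice one might call find_site_config(os.environ, os.path.isfile); passing the existence check as a parameter keeps the sketch easy to exercise without touching the file system.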

SEE ALSO

rwfilter(1), rwfileinfo(1), silk(7), gzip(1), find(1), zlib(3)

BUGS

Although rwcat will read from the standard input, this feature should be used with caution. rwcat will treat the standard input as a single file, as it has no way to know when one file ends and the next begins. The following will not work:

 $ cat run1.rw run2.rw | rwcat --output=combined.rw     # WRONG!

The header of run2.rw will be treated as data of run1.rw, resulting in corrupt output.

rwcombine

Combine flows denoting a long-lived session into a single flow

SYNOPSIS

  rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS]
        [--max-idle-time=NUM]
        [{--print-statistics | --print-statistics=FILENAME}]
        [--temp-directory=DIR_PATH] [--buffer-size=SIZE]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD] [--print-filenames]
        [--output-path=PATH] [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcombine --help

  rwcombine --help-fields

  rwcombine --version

DESCRIPTION

rwcombine reads SiLK Flow records from one or more input sources, searches for flow records where the attributes field denotes records that were prematurely created or were continuations of prematurely created flows, and attempts to combine those records into a single record. All the unmodified SiLK records and the combined records are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.

Some flow exporters, such as yaf(1), provide fields that describe characteristics about the flow record, and these characteristics are stored in the attributes field of SiLK Flow records. The two flags that rwcombine considers are:

T

The flow generator prematurely created a record for a long-lived session due to the connection’s lifetime reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

The flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout. (yaf only sets this flag when it is invoked with the --silk switch.)

A very long-running session may be represented by multiple flow records, where the first record is marked with the T flag, the final record is marked with the C flag, and intermediate records are marked with both C (this record continues an earlier flow) and T (this record also met the active time-out). rwcombine attempts to combine these multiple flow records into a single record.

The input to rwcombine does not need to be sorted. As part of its processing, rwcombine may re-order the records before writing them.

rwcombine reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcombine reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

Algorithm

The algorithm rwcombine uses to combine records is

  1. rwcombine reads SiLK flow records, examines the attributes field on each record, and immediately writes to the destination stream all records where neither the time-out flag (T) nor the continuation flag (C) is set. Records where one or both of those flags are set are stored until all input records have been read.

  2. rwcombine groups the stored records into bins where the following fields for each record in each bin are identical: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.

  3. For each bin, the records are sorted by time (sTime and eTime).

  4. Within a bin, rwcombine combines two records into a single record when the attributes field of the first record has the T (time-out) flag set and the second record has the C (continuation) flag set. When combining records, the bytes field and packets fields are summed, the initialFlags from the first record is used, the sessionFlags field becomes the bit-wise OR of both sessionFlags fields and the second record’s initialFlags field, and the eTime is set to that of the second flow.

  5. If the second record’s T flag was set, rwcombine checks to see if the third record’s C flag is set. If it is, the third record becomes part of the new record.

  6. The previous step repeats for the records in the bin until the bin contains a single record, the most recently added record did not have the T flag set, or the next record in the bin does not have the C flag set.

  7. After examining a bin, rwcombine writes the record(s) the bin contains to the destination stream.

  8. Steps 3 through 7 are repeated for each bin.

The --ignore-fields switch allows the user to remove fields from the set that rwcombine uses when grouping records in Step 2.

When combining two records into one (Step 4), rwcombine completely disregards the difference between the first record’s end-time and the second record’s start-time (the idle time). To tell rwcombine not to combine those records when the difference is greater than a limit, specify that value as the argument to the --max-idle-time switch.
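The steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the rwcombine source: the Flow class is a hypothetical stand-in for a SiLK Flow record, the key tuple stands in for the grouping fields of Step 2, everything is held in memory, and the attributes kept on a merged record (C from the first record, T from the second) are an assumption that matches the fully combined case.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified stand-in for a SiLK Flow record."""
    key: Tuple            # grouping fields of Step 2 (sIP, dIP, sPort, ...)
    stime: float          # start time in seconds
    etime: float          # end time in seconds
    nbytes: int
    packets: int
    initial_flags: int
    session_flags: int
    attributes: str       # "", "T", "C", or "TC"


def merge(first: Flow, second: Flow) -> Flow:
    """Step 4: fold a continuation record into the record that precedes it."""
    # Assumption: the result keeps T only if the second record timed out
    # and C only if the first record was itself a continuation.
    attrs = ("T" if "T" in second.attributes else "") + \
            ("C" if "C" in first.attributes else "")
    return replace(
        first,
        etime=second.etime,
        nbytes=first.nbytes + second.nbytes,
        packets=first.packets + second.packets,
        session_flags=(first.session_flags | second.session_flags
                       | second.initial_flags),
        attributes=attrs,
    )


def combine(flows: Iterable[Flow], max_idle: Optional[float] = None) -> List[Flow]:
    out: List[Flow] = []
    bins = defaultdict(list)
    for flow in flows:                       # Steps 1 and 2
        if flow.attributes:
            bins[flow.key].append(flow)
        else:
            out.append(flow)                 # no T or C flag: pass through
    for records in bins.values():            # Steps 3 through 8
        records.sort(key=lambda f: (f.stime, f.etime))
        current = records[0]
        for nxt in records[1:]:
            idle = nxt.stime - current.etime
            joinable = ("T" in current.attributes and "C" in nxt.attributes
                        and (max_idle is None or idle <= max_idle))
            if joinable:
                current = merge(current, nxt)
            else:
                out.append(current)
                current = nxt
        out.append(current)
    return out
```

Removing a field from the key tuple corresponds to naming it in --ignore-fields, and the max_idle argument plays the role of --max-idle-time.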

To see information on the number of flows combined and the minimum and maximum idle times, specify the --print-statistics switch.

During its processing, rwcombine will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.

By default, the temporary files are stored in the /tmp directory. Because the sizes of the temporary files may be large, it is strongly recommended that /tmp not be used as the temporary directory, and rwcombine will print a warning when /tmp is used. To modify the temporary directory used by rwcombine, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
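The precedence among the switch and the two environment variables can be illustrated with a short Python sketch. The function name is hypothetical, and treating an empty value the same as an unset one is an assumption rather than documented behavior.

```python
import os


def resolve_temp_dir(switch_value=None, environ=None):
    """Pick the temporary directory: switch, SILK_TMPDIR, TMPDIR, then /tmp."""
    env = os.environ if environ is None else environ
    for candidate in (switch_value, env.get("SILK_TMPDIR"), env.get("TMPDIR")):
        if candidate:          # assumption: empty strings count as unset
            return candidate
    return "/tmp"
```

For example, resolve_temp_dir(None, {"TMPDIR": "/var/tmp"}) yields /var/tmp, while supplying a switch value always takes priority over both variables.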

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--actions=ACTIONS

Select the type of action(s) that rwcombine should take to combine the input records. The default action is all, and the following actions are supported:

all

Perform all the actions described below.

timeout

Combine into a single flow record those records where the timeout flags in the attributes field indicate that the flow exporter has divided a long-lived session into multiple flow records.

This switch is provided for future expansion of rwcombine, since at present rwcombine supports a single action. When writing a script that uses rwcombine, specify --actions=timeout for compatibility with future versions of rwcombine.

--ignore-fields=FIELDS

Ignore the fields listed in FIELDS when determining if two flow records should be grouped into the same bin; that is, treat FIELDS as being identical across all flows. By default, rwcombine puts records into a bin when the records have identical values for the following fields: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.

FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

      --ignore-fields=sensor,12-15

The supported fields are:

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

sensor,12

name or ID of sensor at the collection point

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

class,20,type,21

class and type of sensor at the collection point (represented internally by a single value)

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
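The FIELDS syntax of --ignore-fields (names, integers, and hyphenated integer ranges) can be illustrated with a small Python parser. This is a sketch only: the function name is hypothetical, the name-to-number table is copied from the list above, and rejecting numbers outside that list is an assumption about how unsupported values are handled.

```python
from typing import List

# Field names and numbers as listed for --ignore-fields.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "sensor": 12, "in": 13, "out": 14, "nhip": 15,
    "class": 20, "type": 21, "application": 29,
}


def parse_fields(spec: str) -> List[int]:
    """Expand a FIELDS string such as 'sensor,12-15' into field numbers."""
    supported = set(FIELD_IDS.values())
    result: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        name = token.lower()               # field-names are case-insensitive
        if name in FIELD_IDS:
            ids = [FIELD_IDS[name]]
        elif "-" in token:                 # a range of field-integers
            low, high = token.split("-", 1)
            ids = list(range(int(low), int(high) + 1))
        else:
            ids = [int(token)]
        for field_id in ids:
            if field_id not in supported:  # assumption: reject unknown fields
                raise ValueError("unsupported field: %s" % token)
            if field_id not in result:     # keep first occurrence only
                result.append(field_id)
    return result
```

With this sketch, the example value sensor,12-15 expands to fields 12, 13, 14, and 15 (sensor, in, out, and nhIP).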

--max-idle-time=NUM

Do not combine flow records when the second flow record starts more than NUM seconds after the end time of the first flow record. NUM may be a floating point value. If not specified, the maximum idle time may be considered infinite.

--print-statistics

--print-statistics=FILENAME

Print to the standard error or to the specified FILENAME the number of flow records read and written, the number of flows that did not require combining, the number of flows combined, the number that could not be combined, and the minimum and maximum idle time between combined flow records.

--temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.

--buffer-size=SIZE

Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwcombine will allocate, since additional buffers will be allocated for reading the input and writing the output.)
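The SIZE arithmetic described above can be checked with a few lines of Python. The function name is hypothetical, and the sketch accepts only the upper-case suffixes named in the text; whether the tool accepts other spellings is not stated here.

```python
def parse_size(text: str) -> int:
    """Convert a SIZE such as '4096', '1.5K', or '2G' to a byte count."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    suffix = text[-1:]
    if suffix in multipliers:
        # A real number followed by a suffix: scale, then drop any fraction.
        return int(float(text[:-1]) * multipliers[suffix])
    return int(text)           # an ordinary integer number of bytes
```

Here parse_size("1.5K") evaluates to 1536, matching the one and one-half kilobytes example in the text.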

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcombine exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcombine to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--print-filenames

Print to the standard error the names of input files as they are opened.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcombine searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcombine opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the description and alias(es) of each field and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Use rwfilter(1) to find ssh flow records that involve the host 192.168.126.252. The output from rwcut(1) shows the flow exporter split this long-lived ssh session into multiple flow records:

 $ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \
   | rwcut --fields=flags,attributes,stime,etime
    flags|attribut|                  sTime|                  eTime|
  S PA   |T       |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|
    PA   |TC      |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|
    PA   |TC      |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|
    PA   |TC      |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|
 F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|

Here is the other half of that conversation:

 $ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \
   | rwcut --fields=flags,attributes,stime,etime
    flags|attribut|                  sTime|                  eTime|
  S PA   |T       |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|
    PA   |TC      |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|
    PA   |TC      |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|
    PA   |TC      |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|
 F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|

Use rwuniq(1) to compute the byte and packet counts for that ssh session:

 $ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
   | rwuniq --fields=sip,dip,sport,dport --values=records,byte,packets
             sIP|            dIP|sPort|dPort|Records|  Bytes|Packets|
   10.11.156.107|192.168.126.252|   22|28975|      5|4677240|   3881|
 192.168.126.252|  10.11.156.107|28975|   22|      5| 281939|   3891|

Invoke rwcombine on these records and store the result in the file combined.rw:

 $ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
   | rwcombine --print-statistics --output-path=combined.rw
 FLOW RECORD COUNTS:
 Read:                                    10
 Initially Complete:           -           0 *
 Sorted & Examined:            =          10
 Missing end:                  -           0 *
 Missing start & end:          -           0 *
 Missing start:                -           0 *
 Prior to combining:           =          10
 Eliminated:                   -           8
 Made complete:                =           2 *
 Written:                                  2 (sum of *)

 IDLE TIMES:
 Minimum:        0:00:00:00.000
 Penultimate:    0:00:00:00.000
 Maximum:        0:00:00:00.003

View the resulting records:

 $ rwcut --fields=sip,dip,sport,dport,bytes,packets,flags combined.rw
             sIP|            dIP|sPort|dPort|  bytes|packets|   flags|
   10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |
 192.168.126.252|  10.11.156.107|28975|   22| 281939|   3891|FS PA   |

 $ rwcut --fields=sip,attributes,stime,etime combined.rw
             sIP|attribut|                  sTime|                  eTime|
   10.11.156.107|        |2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|
 192.168.126.252|        |2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|

ENVIRONMENT

SILK_TMPDIR

When set and --temp-directory is not specified, rwcombine writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwcombine writes the temporary files it creates to this directory.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcombine may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcombine may use this environment variable. See the FILES section for details.

SILK_TEMPFILE_DEBUG

When set to 1, rwcombine prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

${SILK_TMPDIR}/

${TMPDIR}/

/tmp/

Directory in which to create temporary files.

SEE ALSO

rwfilter(1), rwcut(1), rwuniq(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)

NOTES

The first release of rwcombine occurred in SiLK 3.9.0.

rwcompare

Compare the records in two SiLK Flow files

SYNOPSIS

  rwcompare [--quiet] [--site-config-file=FILENAME] FILE1 FILE2

  rwcompare --help

  rwcompare --version

DESCRIPTION

rwcompare opens the two files named on the command line and compares the SiLK Flow records they contain. If the records are identical, rwcompare exits with status 0. If any of the records differ, rwcompare prints a message and exits with status 1. If there is an issue reading either file, an error is printed and the exit status is 2. Use the --quiet switch to suppress all output (error messages included). You may use - or stdin for one of the file names, in which case rwcompare reads from the standard input.
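The record-by-record comparison can be modeled with a short Python sketch. It is an illustration under assumptions, not the rwcompare source: records are represented by any comparable Python values, the function name is hypothetical, the message wording is modeled on the output shown in the EXAMPLES section, and the read-error case (exit status 2) is not modeled.

```python
from itertools import zip_longest

_MISSING = object()   # marks the side that ran out of records first


def compare_records(name1, records1, name2, records2):
    """Return (exit_status, message) for two ordered record sequences."""
    pairs = zip_longest(records1, records2, fillvalue=_MISSING)
    for index, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is _MISSING:
            # The first input ended while the second still had records.
            return 1, "%s %s differ: EOF %s" % (name1, name2, name1)
        if rec2 is _MISSING:
            return 1, "%s %s differ: EOF %s" % (name1, name2, name2)
        if rec1 != rec2:
            return 1, "%s %s differ: record %d" % (name1, name2, index)
    return 0, ""
```

Because the comparison walks both inputs in step, two inputs holding the same records in a different order are reported as different at the first position where they disagree.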

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--quiet

Do not print a message if the files differ, and do not print an error message if a file cannot be opened or read.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcompare searches for the site configuration file in the locations specified in the FILES section.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. Some input lines are split over multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines. The examples assume the existence of the file data.rw that contains SiLK Flow records. The exit status of the most recent command is available in the shell variable $?.

Compare a file with itself:

 $ rwcompare data.rw data.rw
 $ echo $?
 0

Compare a file with itself, where one instance of the file is read from the standard input:

 $ rwcat data.rw | rwcompare - data.rw
 $ echo $?
 0

Use rwsort(1) to modify one instance of the file and compare the results:

 $ rwsort --fields=proto data.rw | rwcompare - data.rw
 - data.rw differ: record 1
 $ echo $?
 1

Run the command again and use the --quiet switch:

 $ rwsort --fields=proto data.rw | rwcompare --quiet - data.rw
 $ echo $?
 1

Compare the file with input containing two copies of the file:

 $ rwcat data.rw data.rw | rwcompare data.rw -
 data.rw - differ: EOF data.rw
 $ echo $?
 1

Compare the file with /dev/null:

 $ rwcompare --quiet /dev/null data.rw
 $ echo $?
 2

rwcompare checks whether two files have the same records in the same order. To compare two arbitrary files, use rwsort(1) to reorder the records. Make certain to provide enough fields to the rwsort command so that the records are in the same order.

 $ rwsort --fields=1-10,12-15,20-29 data.rw > /tmp/sorted-data.rw
 $ rwsort --fields=1-10,12-15,20-29 other-data.rw   \
   | rwcompare /tmp/sorted-data.rw -
 /tmp/sorted-data.rw - differ: record 103363

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcompare may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcompare may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwfileinfo(1), rwcat(1), rwsort(1), silk(7)

rwcount

Print traffic summary across time

SYNOPSIS

  rwcount [--bin-size=SIZE] [--load-scheme=LOADSCHEME]
        [--start-time=START_TIME] [--end-time=END_TIME]
        [--skip-zeroes] [--bin-slots] [--epoch-slots]
        [--timestamp-format=FORMAT] [--no-titles]
        [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcount --help

  rwcount --version

DESCRIPTION

rwcount summarizes SiLK flow records across time. It counts the records in the input stream, and groups their byte and packet totals into time bins. rwcount produces textual output with one row for each bin.

rwcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

rwcount splits each flow record into bins whose size is determined by the argument to the --bin-size switch. When that switch is not provided, rwcount uses 30-second bins by default.

By default, the first row of data rwcount prints is the bin containing the starting time of the earliest record that appears in the input. rwcount then prints a row for every bin until it reaches the bin containing the most recent ending time. Rows whose counts are zero are printed unless the --skip-zeroes switch is specified.

The --start-time and --end-time switches tell rwcount to use a specific time for the first row and the final row. The --start-time switch always sets the time stamp on the first bin to the specified time. With the --end-time switch, rwcount computes a maximum end-time by setting any unspecified hour, minute, second, and millisecond field to its maximum value, and the final bin is that which contains the maximum end-time.

When --start-time and --end-time are both specified, rwcount reserves the memory for the bins before it begins processing the records. If the memory cannot be allocated, rwcount exits. If this happens, try reducing the time span or increasing the bin-size.

Load Scheme

A router or other flow generator summarizes the traffic it sees into records. In addition to the five-tuple (source port and address, destination port and address, and protocol), the record has its start time, end time, total byte count, and total packet count. There is no way to know how the bytes and packets were distributed during the duration of the record: their distribution could be front-loaded, back-loaded, uniform, et cetera.

When the start and end times of an individual flow record put that record into a single bin, rwcount can simply add that record’s volume (byte and packet counts) to the bin.

When the duration of a flow record causes it to span multiple bins, rwcount must be told how to allocate the volume among the bins. The --load-scheme switch determines this, and it supports the following allocation schemes:

time-proportional

Each bin a flow spans is allocated a percentage of the flow’s volume proportional to the amount of the flow’s active time that spans the bin. Specifically, rwcount divides the total volume of the flow by the duration of the flow, and multiplies the quotient by the time spent in the bin. This models a flow where the volume/second ratio is uniform throughout the flow.

bin-uniform

Each bin a flow spans is allocated an equal portion of the flow’s volume. rwcount divides the volume of the flow by the number of bins the flow spans, and adds the quotient to each of the bins. In this scheme, the volume/bin ratio is uniform.

start-spike

The bin that contains the flow’s start time is allocated all of the flow’s volume regardless of the flow’s duration. rwcount adds the total volume for the flow into the bin containing the start time of the flow. This models a flow that is front-loaded to the point where the entire volume is a single spike occurring in the initial millisecond of the flow.

middle-spike

The bin that contains the midpoint between the flow’s start time and end time is allocated all of the flow’s volume regardless of the flow’s duration.

end-spike

The bin that contains the flow’s end time is allocated all of the flow’s volume regardless of the flow’s duration. This models a flow that is back-loaded to the point where the entire volume is a single spike occurring in the final millisecond of the flow.

maximum-volume

Each bin the flow spans is allocated all of the flow’s volume. rwcount adds the entire volume for the flow into every bin that contains any part of the flow. In theory, the distribution of the bytes in the record could be a spike that occurs at any point during the flow’s duration. This scheme allows one to determine, in aggregate, the maximum possible volume that could have occurred during this bin. In this scheme, the Records column gives the number of records that were active during the bin.

minimum-volume

For a record that spans multiple bins, each bin is allocated none of the flow’s volume. That is, rwcount acts as though the volume for the flow occurred in some other bin. Since it is possible that a record that spans multiple bins did not contribute any volume to the current bin, this scheme allows one to determine, in aggregate, the minimum possible volume that may have occurred during this bin. The Records column in this scheme, as in the maximum-volume scheme, gives the number of flow records that were active during the bin.

Be aware that the "spike" load-schemes allocate the entire flow to a single bin. This can create the impression that there is more traffic occurring during a particular time window than the physical network supports.

The maximum-volume and minimum-volume schemes are used to compute the maximum and minimum volumes that could have been transferred during any one bin. maximum-volume intentionally over-counts the flow volume and minimum-volume intentionally under-counts.

To see the effect of the various load-schemes, suppose rwcount is using 60-second bins and the input contains two records. The first record begins at 12:03:50, ends at 12:06:20, and contains 9,000 bytes (60 bytes/second for 150 seconds). This record may contribute to bins at 12:03, 12:04, 12:05, and 12:06. The second record begins at 12:04:05, lasts 15 seconds, and contains 200 bytes; because it falls entirely within one bin, it always contributes its 200 bytes to the 12:04 bin. The --load-scheme option splits the byte-counts of the records as follows:

 BIN                 12:03:00    12:04:00    12:05:00    12:06:00

 time-proportional        600        3800        3600        1200
 bin-uniform             2250        2450        2250        2250
 start-spike             9000         200           0           0
 middle-spike               0         200        9000           0
 end-spike                  0         200           0        9000
 maximum-volume          9000        9200        9000        9000
 minimum-volume             0         200           0           0

For the record that spans multiple bins: the time-proportional scheme assumes 60 bytes/second, the bin-uniform scheme divides the volume evenly by the four bins, the middle-spike scheme assumes all the volume occurs at 12:05:05, the maximum-volume scheme adds the volume to every bin, and the minimum-volume scheme ignores the record.
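
The table above can be reproduced with a short calculation. The following Python sketch is not rwcount's implementation; it only restates the seven allocation rules as described in this section, and the function names (hms, allocate, totals) are hypothetical. It assumes times are whole seconds since midnight and does not model a flow that ends exactly on a bin boundary.

```python
BIN_SIZE = 60  # seconds, as in the example above

def hms(h, m, s):
    """Seconds since midnight."""
    return h * 3600 + m * 60 + s

def allocate(start, end, volume, scheme):
    """Split one record's volume among bins; returns {bin_start: volume}."""
    first, last = start // BIN_SIZE, end // BIN_SIZE
    if first == last:
        # A record inside a single bin gives that bin its whole volume
        # under every scheme.
        return {first * BIN_SIZE: float(volume)}
    bins = [b * BIN_SIZE for b in range(first, last + 1)]
    share = dict.fromkeys(bins, 0.0)
    if scheme == "time-proportional":
        for b in bins:
            overlap = min(end, b + BIN_SIZE) - max(start, b)
            share[b] = volume * overlap / (end - start)
    elif scheme == "bin-uniform":
        for b in bins:
            share[b] = volume / len(bins)
    elif scheme == "start-spike":
        share[bins[0]] = float(volume)
    elif scheme == "middle-spike":
        midpoint = start + (end - start) / 2
        share[int(midpoint // BIN_SIZE) * BIN_SIZE] = float(volume)
    elif scheme == "end-spike":
        share[bins[-1]] = float(volume)
    elif scheme == "maximum-volume":
        for b in bins:
            share[b] = float(volume)
    elif scheme == "minimum-volume":
        pass  # a record spanning several bins contributes nothing
    else:
        raise ValueError("unknown load scheme: " + scheme)
    return share

def totals(records, scheme, bin_starts):
    """Sum the per-record allocations for the requested bins."""
    out = dict.fromkeys(bin_starts, 0.0)
    for start, end, volume in records:
        for b, v in allocate(start, end, volume, scheme).items():
            out[b] += v
    return [out[b] for b in bin_starts]

# The two records from the example above.
RECORDS = [
    (hms(12, 3, 50), hms(12, 6, 20), 9000),
    (hms(12, 4, 5), hms(12, 4, 20), 200),
]
BINS = [hms(12, 3, 0), hms(12, 4, 0), hms(12, 5, 0), hms(12, 6, 0)]

for name in ("time-proportional", "bin-uniform", "start-spike",
             "middle-spike", "end-spike", "maximum-volume",
             "minimum-volume"):
    print(name.ljust(20), totals(RECORDS, name, BINS))
```

Each printed row corresponds to one row of the table.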

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--bin-size=SIZE

Denote the size of each time bin, in seconds; defaults to 30 seconds. rwcount supports millisecond size bins; SIZE may be a floating point value equal to or greater than 0.001.

--load-scheme=LOADSCHEME

Specify how a flow record that spans multiple bins allocates its bytes and packets among the bins. The default scheme is time-proportional, which assumes the volume/second ratio of the flow record is constant. See the Load Scheme section for additional information on the load-scheme choices. The LOADSCHEME may be one of the following names or numbers; names may be abbreviated to the shortest prefix that is unique.

time-proportional,4

Allocate the volume in proportion to the amount of time the flow spent in the bin.

bin-uniform,0

Allocate the volume evenly across the bins that contain any part of the flow’s duration.

start-spike,1

Allocate the entire volume to the bin containing the start time of the flow.

middle-spike,3

Allocate the entire volume to the bin containing the time at the midpoint of the flow.

end-spike,2

Allocate the entire volume to the bin containing the end time of the flow.

maximum-volume,5

Allocate the entire volume to all of the bins containing any part of the flow.

minimum-volume,6

Allocate the flow’s volume to a bin only if the flow is completely contained within the bin; otherwise ignore the flow.
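
As an illustration of how a LOADSCHEME argument given as a number, a full name, or a unique prefix maps to a scheme, consider the following Python sketch. The name-to-number table is taken from the list above; the function resolve_load_scheme is hypothetical, and rwcount's own handling of invalid or ambiguous input may differ.

```python
# Names and numbers as listed for --load-scheme above.
LOAD_SCHEMES = {
    "bin-uniform": 0,
    "start-spike": 1,
    "end-spike": 2,
    "middle-spike": 3,
    "time-proportional": 4,
    "maximum-volume": 5,
    "minimum-volume": 6,
}

def resolve_load_scheme(arg):
    """Return the full scheme name for a number, name, or unique prefix."""
    if arg.isdigit():
        for name, number in LOAD_SCHEMES.items():
            if number == int(arg):
                return name
        raise ValueError("unknown load-scheme number: " + arg)
    if arg in LOAD_SCHEMES:
        return arg
    matches = [name for name in LOAD_SCHEMES if name.startswith(arg)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous load-scheme: " + arg)
    return matches[0]

print(resolve_load_scheme("1"))    # start-spike
print(resolve_load_scheme("t"))    # time-proportional
print(resolve_load_scheme("min"))  # minimum-volume
```

A prefix such as mi is rejected in this sketch because it matches both middle-spike and minimum-volume.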

--start-time=START_TIME

Set the time of the first bin to START_TIME. When this switch is not given, the first bin is one that holds the starting time of the earliest record. The START_TIME may be specified in a format of yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] (or T may be used in place of : to separate the day and hour). The time must be specified to at least day precision, and unspecified hour, minute, second, and millisecond values are set to zero. Whether the date strings represent times in UTC or the local timezone depends on how SiLK was compiled, which can be determined from the Timezone support setting in the output from rwcount --version. Alternatively, the time may be specified as seconds since the UNIX epoch, and an unspecified milliseconds value is set to 0.

--end-time=END_TIME

Set the time of the final bin to END_TIME. When this switch is not given, the final bin is one that holds the ending time of the latest record. The format of END_TIME is the same as that for START_TIME. Unspecified hour, minute, second, and millisecond values are set to 23, 59, 59, and 999 respectively. When END_TIME is specified as seconds since the UNIX epoch, an unspecified milliseconds value is set to 999. When both --start-time and --end-time are used, the END_TIME is adjusted so that the final bin represents a complete interval.
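
The different defaults that --start-time and --end-time apply to a partially specified date can be sketched in Python as follows. This is an illustration only: the function parse_bin_time is hypothetical, it returns a timezone-naive value, it does not handle the seconds-since-epoch form, and it does not perform the adjustment of END_TIME to a complete interval.

```python
import re
from datetime import datetime

# yyyy/mm/dd[:HH[:MM[:SS[.sss]]]], with T allowed between day and hour.
_TIME_RE = re.compile(
    r"^(\d{4})/(\d{2})/(\d{2})"
    r"(?:[:T](\d{2})(?::(\d{2})(?::(\d{2})(?:\.(\d{1,3}))?)?)?)?$"
)

def parse_bin_time(text, is_end=False):
    """Parse a START_TIME (default) or END_TIME (is_end=True) string."""
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError("time must have at least day precision: " + text)
    year, month, day = (int(g) for g in match.group(1, 2, 3))
    # Unspecified parts are 0 for a start time and 23:59:59.999 for an end.
    defaults = (23, 59, 59, 999) if is_end else (0, 0, 0, 0)
    hour, minute, second = (
        int(g) if g is not None else d
        for g, d in zip(match.group(4, 5, 6), defaults)
    )
    frac = match.group(7)
    msec = int(frac.ljust(3, "0")) if frac is not None else defaults[3]
    return datetime(year, month, day, hour, minute, second, msec * 1000)

print(parse_bin_time("2009/02/12"))               # 2009-02-12 00:00:00
print(parse_bin_time("2009/02/12", is_end=True))  # 2009-02-12 23:59:59.999000
print(parse_bin_time("2009/02/12T00:30"))         # 2009-02-12 00:30:00
```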

--skip-zeroes

Disable printing of bins with no traffic. By default, all bins are printed.

--bin-slots

Use the internal bin index as the label for each bin in the output; the default is to label each bin with the time in a human-readable format.

--epoch-slots

Use the UNIX epoch time (number of seconds since midnight UTC on 1970-01-01) as the label for each bin in the output; the default is to label each bin with the time in a human-readable format. This switch is equivalent to --timestamp-format=epoch. This switch is deprecated as of SiLK 3.11.0, and it will be removed in the SiLK 4.0 release.

--timestamp-format=FORMAT

Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
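
To make the four format names concrete, the following Python sketch renders one instant in each of them. It mirrors only the patterns listed above; the function format_timestamp is hypothetical, and the sketch always prints in UTC rather than modeling the utc and local timezone arguments.

```python
from datetime import datetime, timezone

def format_timestamp(ts, fmt="default"):
    """Render an aware datetime using one of the listed format names."""
    ts = ts.astimezone(timezone.utc)
    if fmt == "default":
        return ts.strftime("%Y/%m/%dT%H:%M:%S")
    if fmt == "iso":
        return ts.strftime("%Y-%m-%d %H:%M:%S")
    if fmt == "m/d/y":
        return ts.strftime("%m/%d/%Y %H:%M:%S")
    if fmt == "epoch":
        return str(int(ts.timestamp()))
    raise ValueError("unknown timestamp format: " + fmt)

when = datetime(2009, 2, 12, 0, 0, 0, tzinfo=timezone.utc)
for name in ("default", "iso", "m/d/y", "epoch"):
    print(name.ljust(8), format_timestamp(when, name))
```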

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited

--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
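
The combined effect of --no-columns, --column-separator, --no-final-delimiter, and --delimited on one output row can be pictured with the Python sketch below. The column widths (19, 13, 15, 13) are an assumption read off the sample output in the EXAMPLES section, and render_row and render_delimited are hypothetical helper names.

```python
def render_row(values, widths, columns=True, separator="|",
               final_delimiter=True):
    """Join one row of text cells the way the switches above describe."""
    if columns:
        # Fixed-width output: right-justify each cell in its column.
        cells = [value.rjust(width) for value, width in zip(values, widths)]
    else:
        cells = list(values)
    line = separator.join(cells)
    return line + separator if final_delimiter else line

def render_delimited(values, widths, separator="|"):
    """Equivalent of --no-columns --no-final-delimiter --column-sep=C."""
    return render_row(values, widths, columns=False, separator=separator,
                      final_delimiter=False)

WIDTHS = (19, 13, 15, 13)  # assumed widths for Date, Records, Bytes, Packets
ROW = ("2009/02/12T00:00:00", "1490.49", "578270918.16", "463951.55")

print(render_row(ROW, WIDTHS))
print(render_delimited(ROW, WIDTHS, ","))
```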

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcount’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
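
The precedence among --pager, SILK_PAGER, and PAGER can be summarized in a few lines of Python. This is a sketch of the rules as stated in this manual page, not of rwcount's code; choose_pager and its parameters are hypothetical names.

```python
def choose_pager(option=None, env=None, output_path=None, is_terminal=True):
    """Return the pager program to run, or None when output is not paged."""
    env = {} if env is None else env
    if output_path is not None or not is_terminal:
        return None  # paging applies only to terminal output
    if option is not None:
        pager = option                 # --pager overrides everything
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]      # an empty value disables paging
    else:
        pager = env.get("PAGER", "")
    return pager or None

print(choose_pager(env={"SILK_PAGER": "less", "PAGER": "more"}))  # less
print(choose_pager(env={"SILK_PAGER": "", "PAGER": "more"}))      # None
print(choose_pager(option="most", env={"SILK_PAGER": "less"}))    # most
```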

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcount searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps

--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--start-epoch=START_TIME

Alias for the --start-time switch. This switch is deprecated as of SiLK 3.8.0.

--end-epoch=END_TIME

Alias for the --end-time switch. This switch is deprecated as of SiLK 3.8.0.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

To count all web traffic on Feb 12, 2009, into 1 hour bins:

 $ rwfilter --pass=stdout --start-date=2009/02/12:00        \
        --end-date=2009/02/12:23 --proto=6 --aport=80       \
   | rwcount --bin-size=3600
                Date|      Records|          Bytes|      Packets|
 2009/02/12T00:00:00|      1490.49|   578270918.16|    463951.55|
 2009/02/12T01:00:00|      1459.33|   596455716.52|    457487.80|
 2009/02/12T02:00:00|      1529.06|   562602842.44|    451456.41|
 2009/02/12T03:00:00|      1503.89|   562683116.38|    455554.81|
 2009/02/12T04:00:00|      1561.89|   590554569.78|    489273.81|
 ....

To bin the records according to their start times, use the --load-scheme switch:

 $ rwfilter ... --pass=stdout       \
   | rwcount --bin-size=3600 --load-scheme=1
                Date|      Records|          Bytes|      Packets|
 2009/02/12T00:00:00|      1494.00|   580350969.00|    464952.00|
 2009/02/12T01:00:00|      1462.00|   596145212.00|    457871.00|
 2009/02/12T02:00:00|      1526.00|   561629416.00|    451088.00|
 2009/02/12T03:00:00|      1502.00|   563500618.00|    455262.00|
 2009/02/12T04:00:00|      1562.00|   589265818.00|    489279.00|
 ...

To bin the records by their end times:

 $ rwfilter ... --pass=stdout       \
   | rwcount --bin-size=3600 --load-scheme=2
                Date|      Records|          Bytes|      Packets|
 2009/02/12T00:00:00|      1488.00|   577132372.00|    463393.00|
 2009/02/12T01:00:00|      1458.00|   596956697.00|    457376.00|
 2009/02/12T02:00:00|      1530.00|   562806395.00|    451551.00|
 2009/02/12T03:00:00|      1506.00|   562101791.00|    455671.00|
 2009/02/12T04:00:00|      1562.00|   591408602.00|    489371.00|
 ...

To force the hourly bins to run from 30 minutes past the hour, use the --start-time switch:

 $ rwfilter ... --pass=stdout       \
   | rwcount --bin-size=3600 --start-time=2009/02/12:00:30
                Date|      Records|          Bytes|      Packets|
 2009/02/12T00:30:00|      1483.26|   581251364.04|    456554.40|
 2009/02/12T01:30:00|      1494.00|   575037453.00|    449280.00|
 2009/02/12T02:30:00|      1486.36|   559700466.61|    447700.15|
 2009/02/12T03:30:00|      1555.23|   588882400.58|    480724.48|
 2009/02/12T04:30:00|      1537.79|   564756248.52|    472003.45|
 ...

ENVIRONMENT

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcount does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcount automatically invokes this program to display its output a screen at a time.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcount may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwcount may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcount --version.) The TZ environment variable is also used when rwcount parses the timestamp specified in the --start-time or --end-time switches if SiLK is built with local timezone support.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
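
The list of locations can be expressed as a function of the environment. The Python sketch below builds the candidate paths in the order they are listed above; treating that order as the search order, and skipping entries whose environment variable is unset, are assumptions. The function site_config_candidates is a hypothetical name.

```python
import posixpath

def site_config_candidates(env):
    """List possible silk.conf locations in the order shown above."""
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share", "silk", "silk.conf"))
        paths.append(posixpath.join(root, "share", "silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths

for path in site_config_candidates({"SILK_DATA_ROOTDIR": "/srv/flows"}):
    print(path)
```

The directory /srv/flows in the usage line is an arbitrary example value.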

SEE ALSO

rwfilter(1), rwuniq(1), silk(7), tzset(3), environ(7)

BUGS

Unlike rwuniq(1), rwcount does not support counting the number of distinct IPs in a bin. However, using the --bin-time switch on rwuniq can provide time-based binning similar to what rwcount supports. Note that rwuniq always bins by each record’s start-time (similar to rwcount --load-scheme=1), and there is no support in rwuniq for dividing a SiLK record among multiple time bins.

rwcut

Print selected fields of binary SiLK Flow records

SYNOPSIS

  rwcut [{--fields=FIELDS | --all-fields}]
        {[--start-rec-num=START_NUM] [--end-rec-num=END_NUM]
         | [--tail-recs=TAIL_START_NUM]}
        [--num-recs=REC_COUNT] [--dry-run] [--icmp-type-and-code]
        [--timestamp-format=FORMAT] [--epoch-time]
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
        [--integer-sensors] [--integer-tcp-flags]
        [--no-titles] [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
        [--pager=PAGER_PROG] [--site-config-file=FILENAME]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]
        [--python-file=PATH [--python-file=PATH ...]]
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--pmap-column-width=NUM]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

  rwcut --version

DESCRIPTION

rwcut reads binary SiLK Flow records and prints the user-selected record attributes (or fields) to the terminal in a textual, bar-delimited (|) format. See the EXAMPLES section below for sample output.

rwcut reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcut reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The user may provide the --fields switch to select the record attributes to print. When --fields is not specified, rwcut prints the source and destination IP address, source and destination port, protocol, packet count, byte count, TCP flags, start time, duration, end time, and the sensor name. The fields are printed in the order in which they occur in the --fields switch. Fields may be repeated.

A subset of the input records may be selected by using the --start-rec-num, --end-rec-num, --num-recs, and --tail-recs switches.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

FIELDS contains the list of flow attributes (a.k.a. fields or columns) to print. The columns will be displayed in the order the fields are specified. Fields may be repeated. FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

      --fields=stime,10,1-5

If the --fields switch is not given, FIELDS defaults to:

      sIP,dIP,sPort,dPort,protocol,packets,bytes,flags,sTime,dur,eTime,sensor
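
To show how a FIELDS value that mixes names, integers, and ranges expands into an ordered column list, here is a Python sketch. The number-to-name table uses the field-integers documented in this section; the function parse_fields is hypothetical, it requires full field names, and it does not handle prefix-map, plug-in, or PySiLK fields.

```python
# Field-integers and names as documented for the built-in fields.
FIELD_NUMBERS = {
    1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
    6: "packets", 7: "bytes", 8: "flags", 9: "sTime", 10: "duration",
    11: "eTime", 12: "sensor", 13: "in", 14: "out", 15: "nhIP",
    16: "sType", 17: "dType", 18: "scc", 19: "dcc", 20: "class",
    21: "type", 25: "icmpTypeCode", 26: "initialFlags",
    27: "sessionFlags", 28: "attributes", 29: "application",
}
# Field-names are case-insensitive; pkts is an alternate name for packets.
FIELD_NAMES = {name.lower(): name for name in FIELD_NUMBERS.values()}
FIELD_NAMES.update({"pkts": "packets", "itype": "iType", "icode": "iCode"})

def parse_fields(spec):
    """Expand a --fields argument into an ordered list of field names."""
    columns = []
    for item in spec.split(","):
        if item.isdigit():
            columns.append(FIELD_NUMBERS[int(item)])
        elif "-" in item and all(p.isdigit() for p in item.split("-", 1)):
            low, high = (int(p) for p in item.split("-", 1))
            columns.extend(FIELD_NUMBERS[n] for n in range(low, high + 1))
        else:
            columns.append(FIELD_NAMES[item.lower()])
    return columns

print(parse_fields("stime,10,1-5"))
# ['sTime', 'duration', 'sIP', 'dIP', 'sPort', 'dPort', 'protocol']
```

An unknown name or unassigned integer raises KeyError in this sketch.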

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow in microsecond resolution

duration,10

duration of flow in microsecond resolution

eTime,11

end time of flow in microsecond resolution

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

iType

the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was introduced in SiLK 3.8.1.

iCode

the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at iType.

icmpTypeCode,25

equivalent to iType,iCode. This field is deprecated as of SiLK 3.8.1.

Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

Enhanced flow metering software (such as yaf(1)) may provide flow information elements in addition to those found in NetFlow. SiLK stores some of these elements in the fields named below. For flows without this additional information, the field’s value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags on the second through final packets in the flow

attributes,28

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity). The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
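
The marking pattern in the ssh example can be written down for a session that is split into any number of records. The Python sketch below only restates that pattern; timeout_attributes is a hypothetical name, and a flow generator may set other attribute letters as well.

```python
def timeout_attributes(n_records):
    """T/C marks for the records of one session split by the active timeout."""
    if n_records < 1:
        raise ValueError("a session produces at least one record")
    if n_records == 1:
        return [""]  # the session never reached the timeout
    # First record timed out; middle records timed out and continue an
    # earlier record; the final record only continues an earlier record.
    return ["T"] + ["TC"] * (n_records - 2) + ["C"]

print(timeout_attributes(4))  # ['T', 'TC', 'TC', 'C']
```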

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:

sType,16

for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).

dType,17

as sType for the destination IP address

scc,18

for the source IP address, a two-letter country code abbreviation denoting the country where that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The abbreviations are those defined by ISO 3166-1 (see for example https://www.iso.org/iso-3166-country-codes.html or https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) or the following special codes:

--

N/A (e.g. private and experimental reserved addresses)

a1

anonymous proxy

a2

satellite provider

o1

other

dcc,19

as scc for the destination IP

src-map-name

label contained in the prefix map file associated with map-name. If the prefix map is for IP addresses, the label is that associated with the source IP address. If the prefix map is for protocol/port pairs, the label is that associated with the protocol and source port. See also the description of the --pmap-file switch below and the pmapfilter(3) manual page.

dst-map-name

as src-map-name for the destination IP address or the protocol and destination port.

sval

as src-map-name when no map-name is associated with the prefix map file

dval

as dst-map-name when no map-name is associated with the prefix map file

Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.

--all-fields

Instruct rwcut to print all known fields. This switch may not be combined with the --fields switch. This switch suppresses error messages from the plug-ins.

--plugin=PLUGIN

Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwcut will attempt to find a file named PLUGIN in the directories listed in the FILES section. If rwcut finds the file, it uses that path. If PLUGIN contains a slash or if rwcut does not find the file, rwcut relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins.

--start-rec-num=START_NUM

Begin printing with the START_NUM’th record by skipping the first START_NUM-1 records. The default is 1; that is, to start printing at the first record; START_NUM must be a positive integer. If START_NUM is greater than the number of input records, rwcut only outputs the title. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--end-rec-num=END_NUM

Stop printing after the END_NUM’th record. When END_NUM is 0, the default, printing stops once all input records have been printed; that is, END_NUM is effectively infinity. If this value is non-zero, it must not be less than START_NUM. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--tail-recs=TAIL_START_NUM

Begin printing once rwcut is TAIL_START_NUM records from the end of the input stream, where TAIL_START_NUM is a positive integer. rwcut will print the remaining records in the input stream unless --num-recs is also specified and is less than TAIL_START_NUM. The --tail-recs switch is similar to the --start-rec-num switch except it counts from the end of the input stream. This switch may not be combined with the --start-rec-num and --end-rec-num switches. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.

--num-recs=REC_COUNT

Print no more than REC_COUNT records. Specifying a REC_COUNT of 0 will print all records, which is the default. This switch is ignored under the following conditions: When both --start-rec-num and --end-rec-num are specified; when only --end-rec-num is given and END_NUM is less than REC_COUNT; when --tail-recs is specified and TAIL_START_NUM is less than REC_COUNT. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.
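
The interaction of the four record-selection switches can be modeled on an in-memory list. The Python sketch below follows the rules stated for --start-rec-num, --end-rec-num, --tail-recs, and --num-recs; select_records is a hypothetical name. The combination of only --end-rec-num with a smaller REC_COUNT is not described in this section, so the sketch declines to guess and raises an error for it.

```python
def select_records(records, start=None, end=0, tail=0, num=0):
    """Return the records that would be printed; 0 or None means not given."""
    records = list(records)
    if tail:
        if start is not None or end:
            raise ValueError("--tail-recs excludes --start-rec-num "
                             "and --end-rec-num")
        chosen = records[-tail:]
        # --num-recs matters only when it is less than TAIL_START_NUM.
        return chosen[:num] if num else chosen
    if start is not None and end:
        if end < start:
            raise ValueError("END_NUM must not be less than START_NUM")
        return records[start - 1:end]  # --num-recs is ignored
    if end:
        if num and num < end:
            raise NotImplementedError("combination not modeled here")
        return records[:end]           # --num-recs is ignored
    first = (start or 1) - 1
    return records[first:first + num] if num else records[first:]

recs = list(range(1, 11))                      # record numbers 1 through 10
print(select_records(recs, start=3, num=2))    # [3, 4]
print(select_records(recs, start=3, end=5, num=1))  # [3, 4, 5]
print(select_records(recs, tail=4, num=2))     # [7, 8]
```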

--dry-run

Causes rwcut to print the column headers and exit. Useful for testing.

--icmp-type-and-code

Unlike TCP or UDP, ICMP messages do not use ports, but instead have types and codes. Specifying this switch will cause rwcut to print, for ICMP records, the message’s type and code in the sPort and dPort columns, respectively. Use of this switch has been discouraged since SiLK 0.9.10. As of SiLK 3.8.1, this switch is deprecated and it will be removed in SiLK 4.0; use the iType and iCode fields instead.

--timestamp-format=FORMAT

Specify the format, timezone, and/or precision (representation of fractional seconds) to use when printing timestamps and the duration. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and precision. If it is empty or contains invalid values, timestamps are printed in the default format with microseconds, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a precision in any order. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

The --timestamp-format switch may change the representation of fractional seconds, or precision, of the timestamp and duration fields from their default of microseconds. Note: When using a precision less than that used by SiLK internally, the printed start time and duration may not equal the printed end time. The available precisions are:

no-frac

Truncate the fractional seconds value on the timestamps and on the duration field. Previously this was called no-msec. Since SiLK 3.23.0.

milli

Print the fractional seconds to 3 decimal places. Since SiLK 3.23.0.

micro

Print the fractional seconds to 6 decimal places. Since SiLK 3.23.0.

nano

Print the fractional seconds to 9 decimal places. Since SiLK 3.23.0.

no-msec

Truncate the fractional seconds value on the timestamps and on the duration field. This is an alias for no-frac and is deprecated as of SiLK 3.23.0.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
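
The note that a reduced precision can make the printed start time plus the printed duration differ from the printed end time is easiest to see with numbers. The Python sketch below works in integer nanoseconds and assumes that every precision truncates rather than rounds; the text above states truncation only for no-frac, so that is an assumption. The function format_seconds is a hypothetical name.

```python
PRECISION_DIGITS = {"no-frac": 0, "no-msec": 0, "milli": 3,
                    "micro": 6, "nano": 9}

def format_seconds(nanos, precision="micro"):
    """Print a nanosecond count as seconds, truncated to the precision."""
    digits = PRECISION_DIGITS[precision]
    whole, frac = divmod(nanos, 10**9)
    if digits == 0:
        return str(whole)
    return f"{whole}.{frac // 10**(9 - digits):0{digits}d}"

start = 10_000_600_000   # 10.0006 seconds
duration = 600_000       # 0.0006 seconds
end = start + duration   # 10.0012 seconds

for precision in ("micro", "milli"):
    print(precision,
          format_seconds(start, precision),
          format_seconds(duration, precision),
          format_seconds(end, precision))
# micro 10.000600 0.000600 10.001200
# milli 10.000 0.000 10.001
```

At milli precision the printed values 10.000 and 0.000 no longer sum to the printed end time of 10.001.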

--epoch-time

Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--ip-format=FORMAT

Specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical according to whether the individual flow record is marked as IPv4 or IPv6. Since SiLK 3.7.0.

canonical

Print IP addresses in the canonical format. For an IPv4 record, use dot-separated decimal (192.0.2.1). For an IPv6 record, use either colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).

no-mixed

Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.

decimal

Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.

zero-padded

Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.

The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.

map-v4

Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.

unmap-v6

Change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.

The following argument is also available:

force-ipv6

Set FORMAT to map-v4,no-mixed.
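
The decimal, hexadecimal, and zero-padded representations, and the effect of map-v4, can be checked with Python's standard ipaddress module. This sketch is not how rwcut formats addresses internally; format_ip is a hypothetical name, and the canonical and no-mixed forms are not modeled.

```python
import ipaddress

def format_ip(text, fmt="decimal", map_v4=False):
    """Render an address as decimal, hexadecimal, or zero-padded text."""
    ip = ipaddress.ip_address(text)
    if map_v4 and ip.version == 4:
        # Move the address into the ::ffff:0:0/96 netblock.
        ip = ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join(f"{octet:03d}" for octet in ip.packed)
        digits = f"{int(ip):032x}"
        return ":".join(digits[i:i + 4] for i in range(0, 32, 4))
    raise ValueError("format not modeled: " + fmt)

for address in ("192.0.2.1", "2001:db8::1"):
    for name in ("decimal", "hexadecimal", "zero-padded"):
        print(address, name, format_ip(address, name))
print(format_ip("192.0.2.1", "zero-padded", map_v4=True))
```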

--integer-ips

Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--zero-pad-ips

Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--integer-sensors

Print the integer ID of the sensor rather than its name.

--integer-tcp-flags

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited

--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
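The interaction of --no-columns, --column-separator, --no-final-delimiter, and --delimited can be pictured with a small model. This Python sketch is illustrative only: render_row() and render_delimited() are hypothetical helpers, the column widths are made up, and right-justification is an assumption based on the numeric and IP columns in the sample output.

```python
def render_row(values, widths, columns=True, final_delimiter=True, sep="|"):
    """Join one row of text values following the switches described above."""
    if columns:
        # Fixed-width output: pad each value to its column width.
        values = [v.rjust(w) for v, w in zip(values, widths)]
    line = sep.join(values)
    if final_delimiter:
        # By default the separator also follows the final column.
        line += sep
    return line

def render_delimited(values, widths, sep="|"):
    """Model --delimited=C: no columns, no final delimiter, C as separator."""
    return render_row(values, widths, columns=False,
                      final_delimiter=False, sep=sep)

row = ["10.30.30.31", "80", "6"]
widths = [15, 5, 3]   # hypothetical widths for sIP, sPort, pro
print(render_row(row, widths))
print(render_delimited(row, widths, sep=","))
```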

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcut’s textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcut exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
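The precedence among --pager, SILK_PAGER, and PAGER can be summarized as a short decision function. This is a Python sketch of the rules as documented here, not of rwcut's source; choose_pager() and its parameters are hypothetical names.

```python
def choose_pager(cli_pager, environ, output_path_given, stdout_is_tty):
    """Return the pager program to run, or None when output is not paged."""
    # Paging applies only to terminal output without --output-path.
    if output_path_given or not stdout_is_tty:
        return None
    if cli_pager is not None:
        candidate = cli_pager                 # --pager wins
    elif "SILK_PAGER" in environ:
        candidate = environ["SILK_PAGER"]     # then SILK_PAGER, even if empty
    else:
        candidate = environ.get("PAGER", "")  # then PAGER
    # An empty string disables paging.
    return candidate or None

print(choose_pager(None, {"SILK_PAGER": "", "PAGER": "less"}, False, True))
```

Note that an empty SILK_PAGER suppresses paging even when PAGER is set, since PAGER is consulted only when SILK_PAGER is not set.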

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only records marked as IPv4 will be printed.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.

only

Print only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
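The five policies can be modeled as a function that either returns the (possibly converted) addresses of a record or drops the record. This Python sketch is a simplification: it treats a record as a list of address strings that all share one IP version, and apply_policy() is a hypothetical helper rather than anything SiLK provides.

```python
import ipaddress

V4_MAPPED = 0xFFFF << 32  # base of the ::ffff:0:0/96 netblock

def apply_policy(addresses, policy="mix"):
    """Return the record's addresses after POLICY, or None if it is ignored."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    is_v6 = addrs[0].version == 6
    if policy == "mix":
        return addrs
    if policy == "ignore":
        return None if is_v6 else addrs
    if policy == "only":
        return addrs if is_v6 else None
    if policy == "asv4":
        if not is_v6:
            return addrs
        # Keep the record only if every address is IPv4-mapped.
        mapped = [a.ipv4_mapped for a in addrs]
        return None if None in mapped else mapped
    if policy == "force":
        if is_v6:
            return addrs
        return [ipaddress.IPv6Address(V4_MAPPED | int(a)) for a in addrs]
    raise ValueError("unknown policy: %s" % policy)

print(apply_policy(["::ffff:192.0.2.1", "::ffff:198.51.100.7"], "asv4"))
```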

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcut searches for the site configuration file in the locations specified in the FILES section.

--legacy-timestamps

--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y,no-msec. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcut opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit. Specifying switches that add new fields or additional switches before --help will allow the output to include descriptions of those fields or switches.

--help-fields

Print the description and alias(es) of each field and exit. Specifying switches that add new fields before --help-fields will allow the output to include descriptions of those fields.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--pmap-file=PATH

--pmap-file=MAPNAME:PATH

Load the prefix map file located at PATH and create fields named src-map-name and dst-map-name where map-name is either the MAPNAME part of the argument or the map-name specified when the file was created (see rwpmapbuild(1)). If no map-name is available, rwcut names the fields sval and dval. Specify PATH as - or stdin to read from the standard input. The switch may be repeated to load multiple prefix map files, but each prefix map must use a unique map-name. The --pmap-file switch(es) must precede the --fields switch. See also pmapfilter(3).

--pmap-column-width=NUM

When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.

--python-file=PATH

When the SiLK Python plug-in is used, rwcut reads the Python code from the file PATH to define additional fields for possible output. This file should call register_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The standard output from rwcut resembles the following (with the text wrapped for readability):

            sIP|            dIP|sPort|dPort|pro|\
    10.30.30.31|    10.70.70.71|   80|36761|  6|\

        packets|     bytes|    flags|\
              7|      3227|FS PA    |\

                    sTime| duration|                  eTime|senso|
  2003/01/01T00:00:14.625|    3.959|2003/01/01T00:00:18.584|EDGE1|

The first line of the output is the title line which shows the names of the selected fields; the --no-titles switch will disable the printing of the title line. The second line and onward will contain the printed representation of the records, with one line per record.

A common use of rwcut is to read the output of rwfilter(1). For example, to see representative TCP traffic:

 $ rwfilter --start-date=2002/01/19:00 --end-date=2002/01/19:01     \
        --proto=6 --pass=stdout                                     \
   | rwcut

To see only selected fields, use the --fields switch. For example, to print only the protocol for each record in the input file data.rw, use:

 $ rwcut --fields=proto  data.rw

The silkpython(3) manual page provides examples that use PySiLK to create and print arbitrary fields for rwcut.

The order of the FIELDS is significant, and fields can be repeated. For example, suppose that in addition to the default fields of 1-12, you also want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default fields of 1-12, you want to see dotted-decimal IP addresses. (The num2dot(1) tool converts the numeric fields in column positions three and four to dotted quad IPs.)

 $ rwfilter ... --pass=stdout \
   | rwcut --fields=2,9,1-12 --ip-format=decimal --timestamp-format=epoch \
   | num2dot --ip-field=3,4

Both of the following commands print the title line and the first record in the input stream:

 $ rwcut --num-recs=1  data.rw

 $ rwcut --end-rec-num=1  data.rw

The following prints all records except the first (plus the title):

 $ rwcut --start-rec-num=2  data.rw

These three commands print only the second record:

 $ rwcut --no-title --start-rec-num=2 --num-recs=1  data.rw

 $ rwcut --no-title --start-rec-num=2 --end-rec-num=2  data.rw

 $ rwcut --no-title --end-rec-num=2 --num-recs=1  data.rw

This command prints the title line and the final record in the input stream:

 $ rwcut --tail-recs=1  data.rw

This command prints the next to last record in the input stream:

 $ rwcut --no-title --tail-recs=2 --num-recs=1  data.rw
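The record-selection examples above can be cross-checked against a small model. This Python sketch reproduces only the switch combinations shown in those examples; select_records() is a hypothetical helper, and its behavior for combinations not shown above is an assumption.

```python
def select_records(records, start=None, end=None, num=None, tail=None):
    """Pick records by 1-based position, mirroring the examples above."""
    if tail is not None:
        # --tail-recs keeps the last TAIL records; --num-recs then limits them.
        chosen = records[-tail:] if tail > 0 else []
        return chosen[:num] if num is not None else chosen
    total = len(records)
    if start is not None and end is not None:
        first, last = start, end
    elif start is not None:
        first = start
        last = start + num - 1 if num is not None else total
    elif end is not None:
        first = end - num + 1 if num is not None else 1
        last = end
    else:
        first, last = 1, (num if num is not None else total)
    first = max(first, 1)
    return records[first - 1:last]

recs = ["r1", "r2", "r3", "r4", "r5"]
print(select_records(recs, end=2, num=1))    # the second record
print(select_records(recs, tail=2, num=1))   # the next to last record
```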

Using the sIP and dIP fields can be confusing when the file you are examining contains both incoming and outgoing flow records. To make the output more clear, consider using the int-ext-fields(3) plug-in. The plug-in defines four additional fields representing the external IP address, the external port, the internal IP address, and the internal port. The plug-in requires the user to specify which class/type pairs are incoming and which are outgoing. See its manual page for additional information.

 $ rwcut --fields=sip,sport,dip,dport,proto,type \
        --num-rec=8 data.rw
             sIP|sPort|            dIP|dPort|pro|   type|
 192.168.111.201|29617|   172.24.2.123|   53| 17|    out|
    172.24.2.123|   53|192.168.111.201|29617| 17|     in|
 192.168.111.201|29618|  10.252.217.50|   22|  6|    out|
   10.252.217.50|   22|192.168.111.201|29618|  6|     in|
 192.168.204.193|   68|    172.30.2.67|   67| 17|    out|
     172.30.2.67|   67|192.168.204.193|   68| 17|     in|
   10.239.85.193|29897|192.168.228.153|   25|  6|     in|
 192.168.228.153|   25|  10.239.85.193|29897|  6|    out|

 $ export INCOMING_FLOWTYPES=all/in,all/inweb
 $ export OUTGOING_FLOWTYPES=all/out,all/outweb
 $ rwcut --plugin=int-ext-fields.so                         \
        --fields=int-ip,int-port,ext-ip,ext-port,proto,type \
        --num-rec=8 data.rw
          int-ip|int-p|         ext-ip|ext-p|pro|   type|
 192.168.111.201|29617|   172.24.2.123|   53| 17|    out|
 192.168.111.201|29617|   172.24.2.123|   53| 17|     in|
 192.168.111.201|29618|  10.252.217.50|   22|  6|    out|
 192.168.111.201|29618|  10.252.217.50|   22|  6|     in|
 192.168.204.193|   68|    172.30.2.67|   67| 17|    out|
 192.168.204.193|   68|    172.30.2.67|   67| 17|     in|
 192.168.228.153|   25|  10.239.85.193|29897|  6|     in|
 192.168.228.153|   25|  10.239.85.193|29897|  6|    out|
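The relationship between the two listings above can be expressed in a few lines. The following Python sketch only illustrates the mapping visible in the sample output; it is not the plug-in's code, int_ext() is a hypothetical helper, and returning None for a flowtype that is in neither list is a placeholder rather than documented behavior.

```python
INCOMING = set("all/in,all/inweb".split(","))     # as in INCOMING_FLOWTYPES
OUTGOING = set("all/out,all/outweb".split(","))   # as in OUTGOING_FLOWTYPES

def int_ext(sip, sport, dip, dport, flowtype):
    """Return (int-ip, int-port, ext-ip, ext-port) for one record."""
    if flowtype in OUTGOING:
        # Outgoing: the source is the internal host.
        return (sip, sport, dip, dport)
    if flowtype in INCOMING:
        # Incoming: the destination is the internal host.
        return (dip, dport, sip, sport)
    return None  # placeholder for flowtypes in neither list

print(int_ext("172.24.2.123", 53, "192.168.111.201", 29617, "all/in"))
```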

ENVIRONMENT

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwcut automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcut does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwcut automatically invokes this program to display its output a screen at a time.

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file is specified, rwcut must load the Python files that comprise the PySiLK package, such as silk/__init__.py. If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins will output traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwcut uses when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type mapping file that rwcut uses when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcut may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwcut may use this environment variable. See the FILES section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcut displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcut --version.)

SILK_PLUGIN_DEBUG

When set to 1, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins. In addition, when an attempt to register a field fails, rwcut prints a message specifying the additional function(s) that must be defined to register the field in rwcut. Be aware that the output can be rather verbose.

FILES

$SILK_ADDRESS_TYPES

$SILK_PATH/share/silk/address_types.pmap

$SILK_PATH/share/address_types.pmap

/usr/local/share/silk/address_types.pmap

/usr/local/share/address_types.pmap

Possible locations for the address types mapping file required by the sType and dType fields.

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

$SILK_COUNTRY_CODES

$SILK_PATH/share/silk/country_codes.pmap

$SILK_PATH/share/country_codes.pmap

/usr/local/share/silk/country_codes.pmap

/usr/local/share/country_codes.pmap

Possible locations for the country code mapping file required by the scc and dcc fields.

${SILK_PATH}/lib64/silk/

${SILK_PATH}/lib64/

${SILK_PATH}/lib/silk/

${SILK_PATH}/lib/

/usr/local/lib64/silk/

/usr/local/lib64/

/usr/local/lib/silk/

/usr/local/lib/

Directories that rwcut checks when attempting to load a plug-in.

NOTES

Fields sTime+msec, eTime+msec, dur+msec, and their aliases (22, 23, 24) were removed in SiLK 3.23.0. Use fields sTime, eTime, and duration instead.

If you are interested in only a few fields, use the --fields option to reduce the volume of data to be produced. For example, if you are checking to see which internal host got hit with the slammer worm (signature: UDP, destPort 1434, pkt size 404), then the following rwfilter, rwcut combination will be much faster than simply using default values:

 $ rwfilter --proto=17 --dport=1434 --bytes-per-packet=404-404      \
        --pass=stdout                                               \
   | rwcut --fields=dip,stime

SEE ALSO

rwfilter(1), num2dot(1), rwpmapbuild(1), addrtype(3), ccfilter(3), int-ext-fields(3), pmapfilter(3), silk-plugin(3), silkpython(3), pysilk(3), sensor.conf(5), silk(7), yaf(1), dlopen(3), tzset(3), environ(7)

rwdedupe

Eliminate duplicate SiLK Flow records

SYNOPSIS

  rwdedupe [--ignore-fields=FIELDS] [--packets-delta=NUM]
        [--bytes-delta=NUM] [--stime-delta=FLOAT]
        [--duration-delta=FLOAT]
        [--temp-directory=DIR_PATH] [--buffer-size=SIZE]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD] [--print-filenames]
        [--output-path=PATH] [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwdedupe --help

  rwdedupe --help-fields

  rwdedupe --version

DESCRIPTION

rwdedupe reads SiLK Flow records from one or more input sources. Records that appear in the input file(s) multiple times will only appear in the output stream once; that is, duplicate records are not written to the output. The SiLK Flows are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.

Note: As part of its processing, rwdedupe re-orders the records before writing them.

rwdedupe reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwdedupe reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

By default, rwdedupe will consider one record to be a duplicate of another when all the fields in the records match exactly. From another point of view, any difference in two records results in both records appearing in the output. Note that all means every field that exists on a SiLK Flow record. The complete list of fields is specified in the description of --ignore-fields in the OPTIONS section below.

To have rwdedupe ignore fields in the comparison, specify those fields in the --ignore-fields switch. When --ignore-fields=FIELDS is specified, a record is considered a duplicate of another if all fields except those in FIELDS match exactly. rwdedupe will treat FIELDS as being identical across all records. Put another way, if the only difference between two records is in the FIELDS fields, only one of those records will be written to the output.

The --packets-delta, --bytes-delta, --stime-delta and --duration-delta switches allow for "fuzziness" in the input. For example, if --stime-delta=NUM is specified and the only difference between two records is in the sTime fields, and the fields are within NUM milliseconds of each other, only one record will be written to the output.

As of SiLK 3.23, the --stime-delta and --duration-delta switches accept a floating-point number to allow for sub-millisecond differences, reflecting the nanosecond resolution added in that release. The argument is still specified in terms of milliseconds: use --stime-delta=5000 for 5 seconds, --stime-delta=5 for 5 milliseconds, and --stime-delta=0.005 for 5 microseconds.
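The duplicate test described in the preceding paragraphs (exact match, ignored fields, and per-field deltas) can be written as a pairwise predicate. This Python sketch models records as dictionaries with times in milliseconds; is_duplicate() is a hypothetical helper, and the sketch says nothing about how rwdedupe orders or groups records internally.

```python
def is_duplicate(a, b, ignore=(), deltas=None):
    """Return True when records a and b would count as duplicates."""
    deltas = deltas or {}
    for field in a:
        if field in ignore:
            continue                      # --ignore-fields: never compared
        if field in deltas:
            # --*-delta: equal when the values differ by DELTA or less.
            if abs(a[field] - b[field]) > deltas[field]:
                return False
        elif a[field] != b[field]:
            return False                  # every other field must match exactly
    return True

r1 = {"sIP": "10.0.0.1", "packets": 7, "sTime": 1000}
r2 = {"sIP": "10.0.0.1", "packets": 7, "sTime": 1004}
print(is_duplicate(r1, r2))                        # exact comparison
print(is_duplicate(r1, r2, deltas={"sTime": 5}))   # like --stime-delta=5
```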

During its processing, rwdedupe will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.

By default, the temporary files are stored in the /tmp directory. Because of the sizes of the temporary files, it is strongly recommended that /tmp not be used as the temporary directory, and rwdedupe will print a warning when /tmp is used. To modify the temporary directory used by rwdedupe, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--ignore-fields=FIELDS

Ignore the fields listed in FIELDS when determining if two flow records are identical; that is, treat FIELDS as being identical across all flows. By default, all fields are treated as significant.

FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:

      --ignore-fields=stime,12-15

The supported fields are:

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (microseconds resolution)

duration,10

duration of flow (microseconds resolution)

sensor,12

name or ID of sensor at the collection point

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

class,20,type,21

class and type of sensor at the collection point (represented internally by a single value)

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by flow generator

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

--packets-delta=NUM

Treat the packets field on two records as being the same if the values differ by NUM packets or less. If not specified, the default is 0.

--bytes-delta=NUM

Treat the bytes field on two records as being the same if the values differ by NUM bytes or less. If not specified, the default is 0.

--stime-delta=FLOAT

Treat the start-time field on two records as being the same if the values differ by FLOAT milliseconds or less. As of SiLK 3.23, the argument may be a floating-point number to support sub-millisecond differences. If not specified, the default is 0.

--duration-delta=FLOAT

Treat the duration field on two records as being the same if the values differ by FLOAT milliseconds or less. As of SiLK 3.23, the argument may be a floating-point number to support sub-millisecond differences. If not specified, the default is 0.

--temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.

--buffer-size=SIZE

Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwdedupe will allocate, since additional buffers will be allocated for reading the input and writing the output.)
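The SIZE syntax can be checked with a few lines of arithmetic. This Python sketch follows the description above (an ordinary integer, or a real number followed by K, M, or G); parse_size() is a hypothetical helper, and accepting only uppercase suffixes is an assumption made for simplicity.

```python
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text):
    """Convert a SIZE argument such as 1.5K or 2G to a number of bytes."""
    text = text.strip()
    suffix = text[-1:]
    if suffix in MULTIPLIERS:
        # Real number times the suffix multiplier, truncated to whole bytes.
        return int(float(text[:-1]) * MULTIPLIERS[suffix])
    return int(text)

print(parse_size("1.5K"))
```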

--output-path=PATH

Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwdedupe exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwdedupe to exit with an error.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
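The choice of compression method depends on the requested method, the libraries available, and the destination. This Python sketch restates those rules as documented above; effective_method() and its parameters are hypothetical, and falling back to none when best finds no library is an assumption.

```python
def effective_method(method, available, to_file, compiled_default="none"):
    """Return the compression method that would apply to the output."""
    if method is None:
        # No method requested: only regular files get the compiled-in default.
        return compiled_default if to_file else "none"
    if method == "best":
        if not to_file:
            return "none"            # best compresses only when writing a file
        for candidate in ("lzo1x", "snappy", "zlib"):
            if candidate in available:
                return candidate
        return "none"                # assumption: nothing available
    # none, zlib, lzo1x, snappy apply regardless of the destination.
    return method

print(effective_method("best", {"snappy", "zlib"}, to_file=True))
```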

--print-filenames

Print to the standard error the names of input files as they are opened.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwdedupe searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwdedupe opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit.

--help-fields

Print the description and alias(es) of each field and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

LIMITATIONS

When the temporary files and the final output are stored on the same file volume, rwdedupe will require approximately twice as much free disk space as the size of input data.

When the temporary files and the final output are on different volumes, rwdedupe will require between 1 and 1.5 times as much free space on the temporary volume as the size of the input data.

EXAMPLE

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Suppose you have made several rwfilter(1) runs to find interesting traffic:

 $ rwfilter --start-date=2008/02/04 ... --pass=data1.rw
 $ rwfilter --start-date=2008/02/04 ... --pass=data2.rw
 $ rwfilter --start-date=2008/02/04 ... --pass=data3.rw
 $ rwfilter --start-date=2008/02/04 ... --pass=data4.rw

You now want to merge that traffic into a single output file, but you want to ensure that any records appearing in multiple output files are only counted once. You can use rwdedupe to merge the output files to a single file, data.rw:

 $ rwdedupe data1.rw data2.rw data3.rw data4.rw --output=data.rw

ENVIRONMENT

SILK_TMPDIR

When set and --temp-directory is not specified, rwdedupe writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwdedupe writes the temporary files it creates to this directory.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwdedupe may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwdedupe may use this environment variable. See the FILES section for details.

SILK_TEMPFILE_DEBUG

When set to 1, rwdedupe prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

${SILK_TMPDIR}/

${TMPDIR}/

/tmp/

Directory in which to create temporary files.

SEE ALSO

rwfilter(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)

rwfglob

Print files that rwfilter’s File Selection switches will access

SYNOPSIS

  rwfglob { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
            | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }
        [--sensors=SENSOR[,SENSOR ...]]
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
        [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
        [--print-missing-files] [--no-block-check] [--no-file-names]
        [--no-summary]

  rwfglob [--data-rootdir=ROOT_DIRECTORY]
        [--site-config-file=FILENAME] --help

  rwfglob --version

DESCRIPTION

rwfglob accepts the same File Selection Switches as rwfilter(1) and prints, to the standard output, the pathnames of the files that rwfilter would process, one file name per line. At the end, rwfglob prints to the standard output a summary of the number of files that it found. To suppress the printing of the file names and/or the summary, specify the --no-file-names and/or --no-summary switches, respectively.

By default, rwfglob only prints the names of files that exist. When the --print-missing-files switch is provided, rwfglob prints, to the standard error, the names of files that it did not find, one file name per line, preceded by the text ’Missing ’. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:

 $ rwfglob --print-missing-files ... 2>&1

Read Selection Argument Values from a File

As of SiLK 3.20, the Selection Switches --class, --type, --flowtypes, and --sensors accept a value in the form ”@PATH”, where @ is the ”at” character (ASCII 0x40) and PATH names a file or a path to a file. For example, the following reads the name of types from the file t.txt and uses the sensors S3, S7, and the names and/or IDs read from /tmp/sensor.txt:

 rwfglob --type=@t.txt --sensors=S3,@/tmp/sensor.txt,S7

Multiple @PATH values are allowed within a single argument. If the name of the file is -, the names are read from the standard input.

The file must be a text file. Blank lines are ignored as are comments, which begin with the # character and continue to the end of the line. Whitespace at the beginning and end of a line is ignored as is whitespace that surrounds commas; all other whitespace within a line is significant.

A file may contain a value on each line and/or multiple values on a line separated by commas and optional whitespace. For example:

 # Sensor 4
       S4
 # The first sensors
 S0, S1,S2
 S3     # Sensor 3

An attempt to use an @PATH directive in a file is an error.
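The rules above for the contents of an @PATH file can be sketched in Python. This is only an illustration of the documented text format, not the parser SiLK uses; the function name parse_value_file is hypothetical, and silently skipping an empty item between two commas is an assumption.

```python
def parse_value_file(text):
    """Return the values found in the text of an @PATH file."""
    values = []
    for line in text.splitlines():
        # A comment begins with '#' and continues to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        for item in line.split(","):
            item = item.strip()  # whitespace surrounding commas is ignored
            if not item:
                continue  # assumption: empty items are skipped
            if item.startswith("@"):
                # An @PATH directive inside a file is an error.
                raise ValueError("@PATH directive inside a file: %r" % item)
            values.append(item)
    return values


sample = """\
# Sensor 4
       S4
# The first sensors
S0, S1,S2
S3     # Sensor 3
"""
print(parse_value_file(sample))  # ['S4', 'S0', 'S1', 'S2', 'S3']
```

The sample text is the example file shown above; the values are returned in the order in which they appear.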

When rwfglob is parsing the name of a file, it converts the sequences @, and @@ to , and @, respectively. For example, --class=@cl@@ss.txt@,v reads the class from the file cl@ss.txt,v. It is an error if any other character follows an embedded @ (--flowtypes=@f@il contains @i) or if a single @ occurs at the end of the name (--sensor=@errat@).
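The escape handling described above can be sketched as follows. The function decode_at_name is a hypothetical helper that receives the text following the leading @ of a single @PATH value; it illustrates the documented conversions and is not taken from the SiLK source.

```python
def decode_at_name(raw):
    """Decode '@,' to ',' and '@@' to '@' in the name that follows '@'."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "@":
            out.append(ch)
            i += 1
            continue
        # An embedded '@' must be followed by ',' or '@'.
        if i + 1 >= len(raw):
            raise ValueError("single '@' at end of name")
        nxt = raw[i + 1]
        if nxt not in ",@":
            raise ValueError("invalid sequence '@%s' in name" % nxt)
        out.append(nxt)
        i += 2
    return "".join(out)


print(decode_at_name("cl@@ss.txt@,v"))  # cl@ss.txt,v
```

With the inputs from the paragraph above, decode_at_name("f@il") and decode_at_name("errat@") both raise ValueError.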

Offline Storage Support

For each file it finds, rwfglob will check the size of the file and the number of blocks allocated to the file. If the block count is zero but the file size is non-zero, rwfglob treats the file as existing but as residing on tape. The names of these files are printed to the standard output, but each name is preceded by the text ’  \t*** ON_TAPE ***’ where ’\t’ represents a tab character. The summary line will include the number of files that rwfglob believes are on tape. To suppress this check and to remove the count from the summary line, use the --no-block-check switch.
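A minimal Python sketch of the block-count test described above, assuming a platform where os.stat() reports st_blocks (it is absent on some systems, in which case the sketch reports the file as not on tape). The names looks_on_tape and check_file are hypothetical.

```python
import os


def looks_on_tape(size, blocks):
    """Apply the documented rule: zero blocks allocated but a non-zero size."""
    return blocks == 0 and size > 0


def check_file(path):
    """Return True when the file at 'path' appears to reside on tape."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not available on every platform
    if blocks is None:
        return False
    return looks_on_tape(st.st_size, blocks)


print(looks_on_tape(4096, 0))  # True
print(looks_on_tape(4096, 8))  # False
print(looks_on_tape(0, 0))     # False
```

An empty file has a block count of zero and a size of zero, so it is not treated as residing on tape.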

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Selection Switches

This set of switches is the same as that used by rwfilter to select the files to process. At least one of these switches must be provided.

--class={CLASS | @PATH}

The --class switch is used to specify a group of files to print. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. The argument may be ”@PATH” which causes rwfglob to open the file PATH and read the class name from it; see Read Selection Argument Values from a File for details. Classes are defined in the silk.conf(5) site configuration file. If neither the --class nor --flowtypes option is given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfglob --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.

--type={all | TYPE[,TYPE,@PATH ...]}

The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes either the keyword all to select all types for CLASS or a comma-separated list of type names and ”@PATH” directives, where @PATH tells rwfglob to read type names from the file PATH; see Read Selection Argument Values from a File for details. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When neither the --type nor --flowtypes switch is given, a list of default types is used. The default-type list is determined by the value of CLASS, and the default types often include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfglob or run rwsiteinfo with --fields=class,type,default-type.

--flowtypes=CLASS/TYPE[,CLASS/TYPE,@PATH ...]

The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfglob invocation to print filenames from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types. As of SiLK 3.20.0, the arguments may also include ”@PATH” which causes rwfglob to open the file PATH and read the class/type pairs from it; see Read Selection Argument Values from a File.

--sensors=SENSOR[,SENSOR,SENSOR-GROUP,@PATH ...]

The --sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), ranges of sensor IDs, sensor group names, and/or ”@PATH” directives. As described in Read Selection Argument Values from a File, @PATH tells rwfglob to read the names of the sensors from the file PATH. Sensors and sensor groups are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes (--fields=sensor,id-sensor,class:list). When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es). Support for using sensor group names was added in SiLK 3.21.0.

--start-date=YYYY/MM/DD[:HH]

--end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfglob --version.

When times are expressed in YYYY/MM/DD[:HH] format:

When at least one time is expressed as seconds since the UNIX epoch:

When neither --start-date nor --end-date is given, rwfglob prints all files for the current day.

It is an error to specify --end-date without specifying --start-date.
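The accepted date syntax can be sketched in Python as below. The function parse_date_arg is hypothetical and only classifies an argument as epoch seconds, day precision, or hour precision; it returns a naive datetime because whether the string denotes UTC or local time depends on how SiLK was compiled. Accepting one-digit month, day, and hour values is an assumption.

```python
import re
from datetime import datetime

# YYYY/MM/DD with an optional hour introduced by ':' or 'T'
DATE_RE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})(?:[:T](\d{1,2}))?$")


def parse_date_arg(arg):
    """Return ('epoch', seconds), ('day', datetime), or ('hour', datetime)."""
    if arg.isascii() and arg.isdigit():
        return ("epoch", int(arg))
    m = DATE_RE.match(arg)
    if m is None:
        raise ValueError("unrecognized date: %r" % arg)
    year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
    if m.group(4) is None:
        return ("day", datetime(year, month, day))
    # datetime() raises ValueError for an hour outside 0-23
    return ("hour", datetime(year, month, day, int(m.group(4))))


print(parse_date_arg("2003/10/11"))     # day precision
print(parse_date_arg("2003/10/11:14"))  # hour precision
print(parse_date_arg("2003/10/11T14"))  # same hour, 'T' separator
print(parse_date_arg("1065830400"))     # seconds since the UNIX epoch
```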

--data-rootdir=ROOT_DIRECTORY

Tell rwfglob to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfglob (/data).

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfglob searches for the site configuration file in the locations specified in the FILES section.

--print-missing-files

This option prints to the standard error the names of the files that rwfglob expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfglob considers these data files as missing even though their absence is expected. Use the output from this switch judiciously.

Application Switches

--no-block-check

This option instructs rwfglob not to check whether the file exists on tape by checking whether the number of blocks allocated to the file is zero. By default, rwfglob precedes a file name that has a block count of 0 with the text ’  \t*** ON_TAPE ***’.

--no-file-names

This option instructs rwfglob not to print the names of the files that it successfully finds. By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found. When both this switch and --print-missing-files are specified, rwfglob prints only the names of missing files (and the summary).

--no-summary

This option instructs rwfglob not to print the summary line (that is, the line that shows the number of files found). By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.

--help

Print the available options and exit. The available classes and types will be included in output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Looking at a day on a single sensor:

 $ rwfglob --start=2003/10/11 --sensor=2
 /data/in/2003/10/11/in-GAMMA_20031011.23
 /data/in/2003/10/11/in-GAMMA_20031011.22
 /data/in/2003/10/11/in-GAMMA_20031011.21
 /data/in/2003/10/11/in-GAMMA_20031011.20
 /data/in/2003/10/11/in-GAMMA_20031011.19
 /data/in/2003/10/11/in-GAMMA_20031011.18
 /data/in/2003/10/11/in-GAMMA_20031011.17
 /data/in/2003/10/11/in-GAMMA_20031011.16
 /data/in/2003/10/11/in-GAMMA_20031011.15
 /data/in/2003/10/11/in-GAMMA_20031011.14
 /data/in/2003/10/11/in-GAMMA_20031011.13
 /data/in/2003/10/11/in-GAMMA_20031011.12
 /data/in/2003/10/11/in-GAMMA_20031011.11
 /data/in/2003/10/11/in-GAMMA_20031011.10
 /data/in/2003/10/11/in-GAMMA_20031011.09
 /data/in/2003/10/11/in-GAMMA_20031011.08
 /data/in/2003/10/11/in-GAMMA_20031011.07
 /data/in/2003/10/11/in-GAMMA_20031011.06
 /data/in/2003/10/11/in-GAMMA_20031011.05
 /data/in/2003/10/11/in-GAMMA_20031011.04
 /data/in/2003/10/11/in-GAMMA_20031011.03
 /data/in/2003/10/11/in-GAMMA_20031011.02
 /data/in/2003/10/11/in-GAMMA_20031011.01
 /data/in/2003/10/11/in-GAMMA_20031011.00
 globbed 24 files; 0 on tape

If you only want the summary, specify --no-file-names:

 $ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names
 globbed 24 files; 0 on tape
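When the output of rwfglob is consumed by a script, the three kinds of lines described in this manual page (file names, names marked as on tape, and the summary) can be separated as in the following Python sketch. The function parse_rwfglob_output is hypothetical. The line marked ON_TAPE in the sample and its summary are invented for illustration, and the form of the summary line when --no-block-check is given (no tape count) is an assumption.

```python
import re

ON_TAPE = "*** ON_TAPE ***"
SUMMARY = re.compile(r"^globbed (\d+) files(?:; (\d+) on tape)?")


def parse_rwfglob_output(text):
    """Split rwfglob output into (files, on_tape_files, summary)."""
    files, on_tape, summary = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = SUMMARY.match(line)
        if m:
            tape = None if m.group(2) is None else int(m.group(2))
            summary = (int(m.group(1)), tape)
        elif line.startswith(ON_TAPE):
            # The marker precedes the file name.
            on_tape.append(line[len(ON_TAPE):].strip())
        else:
            files.append(line)
    return files, on_tape, summary


sample = (
    "/data/in/2003/10/11/in-GAMMA_20031011.01\n"
    "/data/in/2003/10/11/in-GAMMA_20031011.00\n"
    "  \t*** ON_TAPE ***/data/in/2003/10/10/in-GAMMA_20031010.23\n"  # hypothetical
    "globbed 3 files; 1 on tape\n"                                   # hypothetical
)
files, on_tape, summary = parse_rwfglob_output(sample)
print(len(files), len(on_tape), summary)
```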

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. This value overrides the compiled-in value, and rwfglob uses it unless the --data-rootdir switch is specified. In addition, rwfglob may use this value when searching for the SiLK site configuration file. See the FILES section for details.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwfglob may use this environment variable. See the FILES section for details.

TZ

When a SiLK installation is built to use the local timezone (to determine if this is the case, check the Timezone support value in the output from rwfglob --version), the value of the TZ environment variable determines the timezone in which rwfglob parses timestamps. (The dates in the file names that rwfglob returns are always in UTC.) If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses UTC. For system information on the TZ variable, see tzset(3) or environ(7).

FILES

${SILK_CONFIG_FILE}

ROOT_DIRECTORY/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory rwfglob is using as the root of the data repository.

${SILK_DATA_ROOTDIR}/

/data/

Locations for the root directory of the data repository when the --data-rootdir switch is not specified.
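The lookups listed in this section can be sketched in Python as follows. Both function names are hypothetical, and treating the order of the list above as the order in which the locations are tried is an assumption; the precedence for the data root (switch, then SILK_DATA_ROOTDIR, then the compiled-in /data) follows the description of --data-rootdir.

```python
import os
import posixpath


def data_rootdir(switch_value=None, environ=None, compiled_in="/data"):
    """Pick the repository root: switch, then environment, then built-in."""
    env = os.environ if environ is None else environ
    return switch_value or env.get("SILK_DATA_ROOTDIR") or compiled_in


def silk_conf_candidates(root_directory, environ=None):
    """List the silk.conf locations in the order given in FILES."""
    env = os.environ if environ is None else environ
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    candidates.append(posixpath.join(root_directory, "silk.conf"))
    silk_path = env.get("SILK_PATH")
    if silk_path:
        candidates.append(posixpath.join(silk_path, "share/silk/silk.conf"))
        candidates.append(posixpath.join(silk_path, "share/silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


env = {"SILK_PATH": "/opt/silk"}  # hypothetical environment
root = data_rootdir(environ=env)
for path in silk_conf_candidates(root, environ=env):
    print(path)
```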

SEE ALSO

rwfilter(1), rwsiteinfo(1), silk.conf(5), silk(7), tzset(3), environ(7)

NOTES

The ability to use @PATH in --class, --type, --flowtypes, and --sensors was added in SiLK 3.20.0.

As of SiLK 3.20.0, --types is an alias for --type.

The --sensors switch also accepts the names of groups defined in the silk.conf(5) file as of SiLK 3.21.0.

The output of --print-missing-files goes to the standard error, while all other output goes to the standard output. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:

 $ rwfglob --print-missing-files ... 2>&1

The --print-missing-files option needs to be smarter about what files are really missing.

The block count check is of unknown portability across different tape-farm systems.

rwfileinfo

Print information about a SiLK file

SYNOPSIS

  rwfileinfo [--fields=FIELDS] [--summary] [--no-titles]
        [--site-config-file=FILENAME]
        {--xargs | --xargs=FILENAME | FILE [FILE...]}

  rwfileinfo --help

  rwfileinfo --help-fields

  rwfileinfo --version

DESCRIPTION

rwfileinfo prints information about a binary SiLK file that can be determined by reading the file’s header and by moving quickly over the data blocks in the file.

rwfileinfo requires one or more filename arguments to be given on the command line or the use of the --xargs switch. When the --xargs switch is provided, rwfileinfo reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line. rwfileinfo does not read a SiLK file’s content from the standard input by default, but it does when either - or stdin is given as a filename argument.

When the --summary switch is given, rwfileinfo first prints the information for each individual file and then prints the number of files processed, the sum of the individual file sizes, and the sum of the individual record counts.

Field Descriptions

By default, rwfileinfo prints the following information for each file argument. Use the --fields switch to modify which pieces of information are printed.

(rwfileinfo prints each field in the order in which support for that field was added to SiLK. The field descriptions are presented here in a more logical order.)

file-size

The size of the file on disk as reported by the operating system. rwfileinfo prints 0 for the file-size when reading from the standard input.

version

Every binary file written by SiLK has a version number field. Since SiLK 1.0.0, the version number field has been used to indicate the general structure (or layout) of the file. The file structure adopted in SiLK 1.0.0 uses a version number of 16 and has a header section and a data section. The header section begins with 16 bytes that specify well-defined values, and those bytes are followed by one or more variably-sized header entries. The specifics of the data section depend on the content of the file.

header-length

The header-length field shows the number of octets required by the header (i.e., the initial 16 bytes and the header entries). Since everything after the header is data, the header-length is the starting offset of the data section. The smallest header length is 24 bytes, but typically the header is padded to be an integer multiple of the record-length. The header-length that rwfileinfo prints for a file is determined dynamically by reading the file’s header.

silk-version

When a SiLK tool creates a binary file, the tool writes the current SiLK release number (such as 3.9.0) into the file’s header as a way to help diagnose issues should a bug with a particular release of SiLK be discovered in the future.

byte-order

Every SiLK file has a byte-order or endian field. SiLK uses the machine’s native representation of integers when writing data, and this field shows what representation the file contains. BigEndian is network byte order and littleEndian is used by Intel chips. The rwswapbytes(1) tool changes a file’s integer representation, and some tools have a --byte-order switch that allows the user to specify the integer representation of output files. The header-section of a file is always written in network byte order.
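As a general illustration of the two integer representations (this is not the SiLK file layout), the following Python sketch packs the same 32-bit value in network byte order and in little-endian order using the standard struct module.

```python
import struct
import sys

value = 0x01020304

big = struct.pack(">I", value)     # network byte order (BigEndian)
little = struct.pack("<I", value)  # littleEndian

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# Reading big-endian bytes as little-endian yields a different integer,
# which is why a reader must know the representation used in the file.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x4030201

# The native representation of the machine running this code.
print(sys.byteorder)
```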

compression

SiLK tools may use the zlib library (http://zlib.net/), the LZO library (http://www.oberhumer.com/opensource/lzo/), or the snappy library (http://google.github.io/snappy/) to compress the data section of a file. The compression field specifies which library (if any) was used to compress the data section. If a file is compressed with a library that was not included in an installation of SiLK, SiLK is unable to read the data section of the file. Many SiLK tools accept the --compression-method switch to choose a particular compression method. (The compression field does not indicate whether the entire file has been compressed with an external compression utility such as gzip(1).)

format

Every binary file written by SiLK has two fields in the header that specify exactly what the file contains: the format and the record-version. In general, the format indicates the content type of the file and the record-version indicates the evolution of that content.

The contents of a file whose format is FT_IPSET, FT_RWBAG, or FT_PREFIXMAP are fairly obvious (an IPset, a Bag, a prefix map).

There are many different file formats for writing SiLK Flow records, but the SiLK analysis tools largely use a single Flow file format. That format is FT_RWIPV6ROUTING if SiLK has been compiled with IPv6 support, or FT_RWGENERIC otherwise. A file that uses the FT_RWGENERIC format is only capable of holding IPv4 addresses.

The other SiLK Flow file formats are created by rwflowpack(8) as it writes flow records to the repository. These formats often omit fields and use reduced bit-sizes for fields to reduce the space required for an individual flow record.

The record-version field indicates changes within the general type specified by the format field. For example, SiLK incremented the record-version of the formats that hold flow records when the resolution of record timestamps changed from seconds to milliseconds and again from milliseconds to nanoseconds.

record-version

Together with the format field, the record-version specifies the contents of the file. See the discussion of format for details.

record-length

Files created by SiLK 1.0.0 and later have a record length field. This field contains the length of an individual record, and this value is dependent on the format and record-version fields described above. Some files (such as those containing IPsets or prefix maps) do not write individual records to the output, and the record length is 1 for these files.

count-records

The count-records field is generated dynamically by determining the length the data section would require if it were completely uncompressed and dividing it by the record-length. When the record-length is 1 (such as for IPset files), the count-records field does not provide much information beyond the length of the uncompressed data. For an uncompressed file, the file-size equals the header-length plus the product of count-records and record-length.
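The relationship can be checked with the values printed for the uncompressed file tcp-data.rw in the EXAMPLE section of this manual page. The short Python check below is only a worked instance of the arithmetic; the helper name expected_count is hypothetical.

```python
# Values from the rwfileinfo output for tcp-data.rw (compression: none)
header_length = 208
record_length = 52
count_records = 7
file_size = 572


def expected_count(file_size, header_length, record_length):
    """Number of records in an uncompressed file."""
    return (file_size - header_length) // record_length


# 208 + 7 * 52 = 572
print(header_length + count_records * record_length == file_size)  # True
print(expected_count(file_size, header_length, record_length))     # 7

# The header is padded to a multiple of the record length: 208 = 4 * 52
print(header_length % record_length)  # 0
```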

The fields given above are either present in the well-defined header or are computed by reading the file.

The following fields are generated by reading the header entries and determining if one or more header entries of the specified type are present. The field is not printed in the output when the header entry is not present in the file.

command-lines

Many of the SiLK tools write a header entry to the output file that contains the command line invocation used to create that file, and some of the SiLK tools also copy the command line history from their input files to the output file. (The --invocation-strip switch on the tools can be used to prevent copying and recording of the invocation.) The command lines are stored in individual header entries and this field displays those entries with the most recent invocation at the end of the list.

The command line history has a couple of issues:

annotations

Most of the SiLK tools that create binary output files provide the --note-add and --note-file-add switches, which allow an arbitrary annotation to be added to the header of a file. Some tools also copy the annotations from the source files to the destination files. The annotations are stored in individual header entries and this field displays those entries.

ipset

The IPset writing tools (rwset(1), rwsetbuild(1), rwsettool(1), rwaggbagtool(1), and rwbagtool(1)) support the following output formats for IPset data structures:

 2 

May hold only IPv4 addresses and does not have an ipset header entry.

 3 

May hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later. It contains a header entry that describes the IPset data structure, and the entry specifies the number of nodes, the number of branches from each node, the number of leaves, the size of the nodes and leaves, and which node is the root of the tree.

 4 

May hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. The file’s header entry specifies whether the file contains IPv4 addresses or IPv6 addresses.

 5 

May hold only IPv6 addresses and is readable by SiLK 3.14 and later. The header entry specifies that the file contains IPv6 data.

bag

Since SiLK 3.0.0, the tools that write binary Bag files (rwbag(1), rwbagbuild(1), and rwbagtool(1)) have written a header entry that specifies the type and size of the key and of the counter in the file.

aggregate-bag

The tools rwaggbag(1), rwaggbagbuild(1), and rwaggbagtool(1) write a header entry that contains the field types that comprise the key and the counter.

prefix-map

When using rwpmapbuild(1) to create a prefix map file, a string that specifies a mapname may be provided. rwpmapbuild writes the mapname to a header entry in the prefix map file. The mapname is used to generate command line switches or field names when the --pmap-file switch is specified to several of the SiLK tools (see pmapfilter(3) for details). When displaying the mapname, rwfileinfo prefixes it with the string v1: which denotes a version number for the prefix-map header entry. (The version number is printed for completeness.)

packed-file-info

When rwflowpack(8) creates a SiLK Flow file for the repository, all the records in the file have the same starting hour, the same sensor, and the same flowtype (class/type pair). rwflowpack writes a header entry to the file that contains these values, and this field displays those values. (To print the names for the sensor and flowtype, the silk.conf(5) file must be accessible.)

probe-name

When flowcap(8) creates a SiLK flow file, it adds a header entry specifying the name of the probe from which the data was collected.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Specify what information to print for each file argument on the command line. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive and may be shortened to a unique prefix. When the --fields option is not given, all fields are printed if the file contains the necessary information. The fields are always printed in the order they appear here regardless of the order they are specified in FIELDS.

The possible field values are given next with a brief description of each. For a full description of each field, see Field Descriptions above.

format,1

The contents of the file as a name and the corresponding hexadecimal ID.

version,2

An integer describing the layout or structure of the file.

byte-order,3

Either BigEndian or littleEndian to indicate the representation used to store integers in the file (network or non-network byte order).

compression,4

The compression library (if any) used to compress the data-section of the file, specified as a name and its decimal ID.

header-length,5

The octet length of the file’s header; alternatively the offset where data begins.

record-length,6

The octet length of a single record or the value 1 if the file’s content is not record-based.

count-records,7

The number of records in the file, computed by dividing the uncompressed data length by the record-length.

file-size,8

The size of the file on disk as reported by the operating system.

command-lines,9

The command line invocation used to generate this file.

record-version,10

The version of the records contained in the file.

silk-version,11

The release of SiLK that wrote this file.

packed-file-info,12

For a repository Flow file generated by rwflowpack(8), this prints the timestamp of the starting hour, the flowtype, and the sensor of each flow record in the file.

probe,13

For a Flow file generated by flowcap(8), the name of the probe where the flow records were initially collected.

annotations,14

The notes (annotations) that users have added to the file’s header.

prefix-map,15

For a prefix map file, the mapname that was set when the file was created by rwpmapbuild(1).

ipset,16

For an IPset file whose record-version is 3, a description of the tree data structure. For an IPset file whose record-version is 4, the type of IP addresses (IPv4 or IPv6).

bag,17

For a bag file, the type and size of the key and of the counter.

aggregate-bag,18

For an aggregate bag file, the field types that comprise the key and the counter.
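The FIELDS syntax described above (names, integers, and hyphenated integer ranges) can be sketched in Python. The table FIELD_IDS copies the names and integers from the list above; parse_fields and its helpers are hypothetical, and letting an exact name take precedence over a prefix match is an assumption. The sketch returns the selected integers in numeric order and makes no claim about the order in which rwfileinfo prints them.

```python
import re

FIELD_IDS = {
    "format": 1, "version": 2, "byte-order": 3, "compression": 4,
    "header-length": 5, "record-length": 6, "count-records": 7,
    "file-size": 8, "command-lines": 9, "record-version": 10,
    "silk-version": 11, "packed-file-info": 12, "probe": 13,
    "annotations": 14, "prefix-map": 15, "ipset": 16, "bag": 17,
    "aggregate-bag": 18,
}


def _check_id(number):
    if not 1 <= number <= len(FIELD_IDS):
        raise ValueError("field integer out of range: %d" % number)
    return number


def _resolve_name(token):
    name = token.lower()  # field names are case-insensitive
    if name in FIELD_IDS:
        return FIELD_IDS[name]
    matches = [n for n in FIELD_IDS if n.startswith(name)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous field name: %r" % token)
    return FIELD_IDS[matches[0]]


def parse_fields(spec):
    """Return the sorted field integers selected by a FIELDS string."""
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        rng = re.fullmatch(r"(\d+)-(\d+)", token)
        if rng:
            start, end = int(rng.group(1)), int(rng.group(2))
            if start > end:
                raise ValueError("bad range: %r" % token)
            ids.update(range(_check_id(start), _check_id(end) + 1))
        elif token.isdigit():
            ids.add(_check_id(int(token)))
        else:
            ids.add(_resolve_name(token))
    return sorted(ids)


print(parse_fields("count-records"))  # [7]
print(parse_fields("1-3,file-size"))  # [1, 2, 3, 8]
print(parse_fields("Comp,ver"))       # [2, 4]
```

A prefix such as record is rejected because it matches both record-length and record-version.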

--summary

After the data for each individual file is printed, print a summary that shows the number of files processed, the sum of the individual file sizes, and the total number of records contained in those files.

--no-titles

Suppress printing of the file name and field names. The output contains only the values, where each value is printed left-justified on a single line.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfileinfo searches for the site configuration file in the locations specified in the FILES section.

--xargs

--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwfileinfo opens each named file in turn and prints its information as if the filenames had been listed on the command line. Since SiLK 3.15.0.

--help

Print the available options and exit.

--help-fields

Print a description of each field and its alias, then exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLE

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Get information about the file tcp-data.rw:

 $ rwfileinfo tcp-data.rw
 tcp-data.rw:
   format(id)          FT_RWGENERIC(0x16)
   version             16
   byte-order          littleEndian
   compression(id)     none(0)
   header-length       208
   record-length       52
   record-version      5
   silk-version        1.0.1
   count-records       7
   file-size           572
   command-lines
                    1  rwfilter --proto=6 --pass=tcp-data.rw ...
   annotations
                    1  This is some interesting TCP data

Return a single value which is the number of records in the file tcp-data.rw:

 $ rwfileinfo --no-titles --field=count-records tcp-data.rw
 7
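A script that needs several values from the titled output can split it into a dictionary, as in the Python sketch below. The function parse_rwfileinfo is hypothetical, and the layout it expects (a file name ending in a colon, one field per line, and numbered entries under command-lines and annotations) is inferred only from the example above; it is not a documented output format.

```python
import re

ENTRY = re.compile(r"^\s*(\d+)\s{2,}(.*)$")                       # "1  text"
FIELD = re.compile(r"^\s*([a-z-]+)(?:\(id\))?(?:\s{2,}(.*))?$")   # "name  value"


def parse_rwfileinfo(text):
    """Return {file name: {field name: value or list of entries}}."""
    files = {}
    current = None    # fields of the file being parsed
    last_list = None  # list that receives numbered entries
    for line in text.splitlines():
        line = line.rstrip()
        if not line.strip():
            continue
        entry = ENTRY.match(line)
        if entry and last_list is not None:
            last_list.append(entry.group(2))
            continue
        field = FIELD.match(line)
        if field and current is not None:
            name, value = field.group(1), field.group(2)
            if value:
                current[name] = value
                last_list = None
            else:
                # A field with no value introduces numbered entries.
                last_list = current.setdefault(name, [])
            continue
        if line.endswith(":"):
            current = files.setdefault(line.strip()[:-1], {})
            last_list = None
    return files


sample = """\
tcp-data.rw:
  format(id)          FT_RWGENERIC(0x16)
  record-length       52
  count-records       7
  command-lines
                   1  rwfilter --proto=6 --pass=tcp-data.rw ...
  annotations
                   1  This is some interesting TCP data
"""
info = parse_rwfileinfo(sample)["tcp-data.rw"]
print(info["count-records"])  # 7
print(info["annotations"])    # ['This is some interesting TCP data']
```

For a single value, the --no-titles form shown above is simpler because the output is the value alone.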

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the FILES section, rwfileinfo may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwfileinfo may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}

${SILK_DATA_ROOTDIR}/silk.conf

/data/silk.conf

${SILK_PATH}/share/silk/silk.conf

${SILK_PATH}/share/silk.conf

/usr/local/share/silk/silk.conf

/usr/local/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.

SEE ALSO

rwfilter(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsettool(1), rwswapbytes(1), silk.conf(5), pmapfilter(3), flowcap(8), rwflowpack(8), silk(7), gzip(1)

rwfilter

Choose which SiLK Flow records to process

SYNOPSIS

  rwfilter INPUT_ARGS OUTPUT_ARGS PARTITIONING_ARGS [MISC_ARGS]

Selection switches, input switches, or input files are required:

  rwfilter ...
        {{ [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
           | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }
         [--sensors=SENSOR[,SENSOR ...]]
         [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
         [--data-rootdir=ROOT_DIRECTORY] [--print-missing-files] }
        | [--input-pipe=INPUT_PATH]
        | [--xargs] | [--xargs=INPUT_PATH]
        | [INPUT_PATH [INPUT_PATH...]]

One or more output switches are required:

  rwfilter ...
        [--all-destination=ALL_PATH [--all-destination=ALL_PATH ...]]
        [--fail-destination=FAIL_PATH [--fail-destination=FAIL_PATH ...]]
        [--pass-destination=PASS_PATH [--pass-destination=PASS_PATH ...]]
        [{ --print-statistics[=STATS_PATH]
           | --print-volume-statistics[=STATS_PATH] }]

One or more partitioning switches are often used:

  rwfilter ...
        [--ack-flag=SCALAR] [--active-time=TIME_WINDOW]
        [{--any-address=IP_WILDCARD | --not-any-address=IP_WILDCARD}]
        [--any-cc=COUNTRY_CODE_LIST]
        [{--any-cidr=IP_OR_CIDR_LIST | --not-any-cidr=IP_OR_CIDR_LIST}]
        [--any-index=INTEGER_LIST]
        [{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]
        [--aport=INTEGER_LIST] [--application=INTEGER_LIST]
        [--attributes=ATTRIBUTES_LIST]
        [--bytes=INTEGER_RANGE] [--bytes-per-packet=DECIMAL_RANGE]
        [--cwr-flag=SCALAR]
        [{--daddress=IP_WILDCARD | --not-daddress=IP_WILDCARD}]
        [--dcc=COUNTRY_CODE_LIST]
        [{--dcidr=IP_OR_CIDR_LIST | --not-dcidr=IP_OR_CIDR_LIST}]
        [{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]
        [--dport=INTEGER_LIST] [--dtype=SCALAR]
        [--duration=DECIMAL_RANGE] [--ece-flag=SCALAR]
        [--etime=TIME_WINDOW] [--fin-flag=SCALAR]
        [--flags-all=HIGH_MASK_FLAGS_LIST]
        [--flags-initial=HIGH_MASK_FLAGS_LIST]
        [--flags-session=HIGH_MASK_FLAGS_LIST]
        [--icmp-code=INTEGER_LIST] [--icmp-type=INTEGER_LIST]
        [--input-index=INTEGER_LIST] [--ip-version=INTEGER_LIST]
        [--ipa-src-expr=IPA_EXPR] [--ipa-dst-expr=IPA_EXPR]
        [--ipa-any-expr=IPA_EXPR]
        [{--next-hop-id=IP_WILDCARD | --not-next-hop-id=IP_WILDCARD}]
        [{--nhcidr=IP_OR_CIDR_LIST | --not-nhcidr=IP_OR_CIDR_LIST}]
        [{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]
        [--output-index=INTEGER_LIST] [--packets=INTEGER_RANGE]
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]
         { [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]
           [--pmap-any-MAPNAME=LABELS] } ]
        [--protocol=INTEGER_LIST] [--psh-flag=SCALAR]
        [--python-expr=PYTHON_EXPR]
        [--python-file=FILENAME [--python-file=FILENAME ...]]
        [--rst-flag=SCALAR]
        [{--saddress=IP_WILDCARD | --not-saddress=IP_WILDCARD}]
        [--scc=COUNTRY_CODE_LIST]
        [{--scidr=IP_OR_CIDR_LIST | --not-scidr=IP_OR_CIDR_LIST}]
        [{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]
        [--sport=INTEGER_LIST] [--stime=TIME_WINDOW] [--stype=SCALAR]
        [--syn-flag=SCALAR] [--tcp-flags=TCP_FLAGS]
        [--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS]
                                       [--tuple-direction=DIRECTION]
                                       [--tuple-delimiter=CHAR] } ]
        [--urg-flag=SCALAR]

Miscellaneous switches:

  rwfilter ...
        [--compression-method=COMP_METHOD] [--dry-run]
        [--max-fail-records=N] [--max-pass-records=N]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]
        [--print-filenames] [--site-config-file=FILENAME]
        [--threads=N]

Help switches:

  rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH]
        [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
        --help

  rwfilter --version

DESCRIPTION

rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams. Most invocations of rwfilter will both select and partition records but both actions are not required.

The Selection Switches let one choose flow records from the SiLK data store by specifying where the flow was collected (its sensor), the date of collection, and/or the flow’s direction. The act of selecting records from the data store is sometimes called a "data pull". If the --all-destination switch is given, all these selected records are written to the named stream (a file or the standard output), and partitioning is optional.

The Partitioning Switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). When a flow record matches all of the behaviors, it is written to the streams specified by the --pass-destination switches. If a record fails to match one or more of these behavior predicates, it is written to the streams specified by --fail-destination.

The all, pass, and fail output streams from rwfilter are always binary SiLK Flow records. The output must be either written to a file or piped into another tool in the SiLK Suite, and rwfilter complains if it determines you are attempting to send the stream to a terminal. To view the records, pipe the records into rwcut(1).

In addition to the partitioning switches built in to rwfilter, additional partitioning predicates can be created as C or PySiLK plug-ins, and these can be loaded into rwfilter using the --plugin and/or --python-file switches as described below.

Instead of using the selection switches to choose flow records from the data store, rwfilter can apply the partitioning switches to existing files of SiLK flow records---such as files generated by a previous invocation of rwfilter. To run rwfilter in this mode, you may

When rwfilter is reading flow records from input files, some of the selection switches act as partitioning switches. The remaining selection switches may not be specified when using the alternate forms of input, and it is an error to specify multiple types of input.

Unlike many other tools in the SiLK tool suite, rwfilter requires that you specify one or more Output Switches that tell rwfilter what types of output to produce.

Finally, there are Miscellaneous Switches that control other aspects of rwfilter.

Read Selection Argument Values from a File

As of SiLK 3.20, the Selection Switches --class, --type, --flowtypes, and --sensors accept a value in the form "@PATH", where @ is the "at" character (ASCII 0x40) and PATH names a file or a path to a file. For example, the following reads the names of the types from the file t.txt and uses the sensors S3, S7, and the names and/or IDs read from /tmp/sensor.txt:

 rwfilter --type=@t.txt --sensors=S3,@/tmp/sensor.txt,S7 ...

Multiple @PATH values are allowed within a single argument. If the name of the file is -, the names are read from the standard input.

The file must be a text file. Blank lines are ignored as are comments, which begin with the # character and continue to the end of the line. Whitespace at the beginning and end of a line is ignored as is whitespace that surrounds commas; all other whitespace within a line is significant.

A file may contain a value on each line and/or multiple values on a line separated by commas and optional whitespace. For example:

 # Sensor 4
       S4
 # The first sensors
 S0, S1,S2
 S3     # Sensor 3

An attempt to use an @PATH directive in a file is an error.
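
The file format described above can be modeled with a short Python sketch. It is an illustration of the stated rules only, not rwfilter's actual parser; the function name read_values is hypothetical, and silently skipping an empty entry between two commas is an assumption, since the text does not say how rwfilter treats that case.

```python
def read_values(text):
    """Return the values found in the text of a file named by @PATH."""
    values = []
    for line in text.splitlines():
        # A comment begins at '#' and continues to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line, or a line holding only a comment
        for value in line.split(","):
            value = value.strip()  # whitespace around commas is ignored
            if value.startswith("@"):
                # An @PATH directive inside a file is an error.
                raise ValueError("@PATH directive is not allowed in a file")
            if value:  # assumption: skip empty entries
                values.append(value)
    return values


sample = """\
# Sensor 4
      S4
# The first sensors
S0, S1,S2
S3     # Sensor 3
"""
print(read_values(sample))
```

Applied to the sample file shown above, the sketch yields the five sensor names S4, S0, S1, S2, and S3, in that order.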

When rwfilter is parsing the name of a file, it converts the two-character sequence @, to a comma (,) and the sequence @@ to a single @. For example, --class=@cl@@ss.txt@,v reads the class from the file cl@ss.txt,v. It is an error if any other character follows an embedded @ (--flowtypes=@f@il contains @i) or if a single @ occurs at the end of the name (--sensor=@errat@).
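
The escaping rules for file names can be sketched in Python as follows. The function decode_path_name is a hypothetical helper that receives the text following the leading @ of a directive; it mirrors the rules in the preceding paragraph and is not taken from the SiLK sources.

```python
def decode_path_name(encoded):
    """Decode the file name that follows the leading '@' of an @PATH value."""
    decoded = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "@":
            decoded.append(ch)
            i += 1
            continue
        # An embedded '@' must be followed by ',' or by another '@'.
        if i + 1 >= len(encoded):
            raise ValueError("single '@' at the end of the name")
        follower = encoded[i + 1]
        if follower not in ",@":
            raise ValueError("invalid sequence '@%s' in the name" % follower)
        decoded.append(follower)
        i += 2
    return "".join(decoded)


print(decode_path_name("cl@@ss.txt@,v"))
```

With the name from the example, cl@@ss.txt@,v, the sketch returns cl@ss.txt,v; the names f@il and errat@ raise an error.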

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Selection Switches

To read files from the data store, use the following options to specify which files to process. When rwfilter gets its input from files listed on the command line or from the --xargs or --input-pipe switches, the first four switches (--class, --type, --flowtypes, and --sensors) act as partitioning switches, and specifying any other selection switch produces an error.

--class={CLASS | @PATH}

The --class switch is used to specify a group of data files to process. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. The argument may be ”@PATH” which causes rwfilter to open the file PATH and read the class name from it; see Read Selection Argument Values from a File for details. Classes are defined in the silk.conf(5) site configuration file. If neither the --class nor --flowtypes option is given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfilter --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.

--type={all | TYPE[,TYPE,@PATH ...]}

The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes either the keyword all to select all types for CLASS or a comma-separated list of type names and "@PATH" directives, where @PATH tells rwfilter to read type names from the file PATH; see Read Selection Argument Values from a File for details. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When neither the --type nor --flowtypes switch is given, a list of default types is used: The default-type list is determined by the value of CLASS, and the default types often include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfilter or run rwsiteinfo with --fields=class,type,default-type.

--flowtypes=CLASS/TYPE[,CLASS/TYPE,@PATH ...]

The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types. As of SiLK 3.20.0, the arguments may also include "@PATH" which causes rwfilter to open the file PATH and read the class/type pairs from it; see Read Selection Argument Values from a File.

--sensors=SENSOR[,SENSOR,SENSOR-GROUP,@PATH ...]

The --sensors switch is used to select data from specific sensors. The parameter is a comma-separated list of sensor names, sensor IDs (integers), ranges of sensor IDs, sensor group names, and/or "@PATH" directives. As described in Read Selection Argument Values from a File, @PATH tells rwfilter to read the names of the sensors from the file PATH. Sensors and sensor groups are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes (--fields=sensor,id-sensor,class:list). When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es).

--start-date=YYYY/MM/DD[:HH]

--end-date=YYYY/MM/DD[:HH]

The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfilter --version.

When times are expressed in YYYY/MM/DD[:HH] format:

When at least one time is expressed as seconds since the UNIX epoch:

When neither --start-date nor --end-date is given, rwfilter processes all files for the current day.

It is an error to specify --end-date without specifying --start-date.

It is an error to specify --start-date when rwfilter believes there is some other input specified (see Non-Selection Input Switches).

--data-rootdir=ROOT_DIRECTORY

Tell rwfilter to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfilter (/data). It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.

--print-missing-files

This option prints to the standard error the names of the files that rwfilter’s file selection switches expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfilter considers these data files as missing even though their absence is expected. Use the output from this switch judiciously. It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.

Non-Selection Input Switches

Instead of using the Selection Switches to read flow records from files in the data store, you can tell rwfilter to process files named on the command line or use one (and only one) of the following switches. To have rwfilter read flow records from the standard input, specify stdin or - as the name of an input file or use the (deprecated) --input-pipe switch.

--xargs

--xargs=INPUT_PATH

Read the names of the input files from INPUT_PATH or from the standard input if INPUT_PATH is not provided. The input is expected to have one filename per line. rwfilter opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--input-pipe=INPUT_PATH

Specify a source for SiLK Flow records, where INPUT_PATH is a named pipe or the string stdin or - to represent the standard input. You do not need to use this switch; you can simply specify the named pipe or the strings stdin or - on the command line. NOTE: This switch is deprecated, and it will be removed in the SiLK 4.0 release.

Output Switches

At least one of the following output switches must be provided:

--all-destination=ALL_PATH

Write every SiLK Flow record to ALL_PATH, where ALL_PATH refers to a file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write all input records to multiple locations. It is not necessary to specify Partitioning Switches when --all-destination is given and --fail-destination and --pass-destination are not.

--fail-destination=FAIL_PATH

Write SiLK Flow records that have failed ANY of the partitioning predicates to FAIL_PATH, where FAIL_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that fail any predicate to multiple locations. When using --fail-destination, partitioning switches are required.

--pass-destination=PASS_PATH

Write SiLK Flow records that have passed ALL of the partitioning predicates to PASS_PATH, where PASS_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that pass every predicate to multiple locations. When using --pass-destination, partitioning switches are required.

--print-statistics

--print-statistics=STATS_PATH

Print a one line summary specifying the number of files processed, the total number of records read, the number of records that passed all partitioning predicates, and the number of records that failed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-volume-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter. When using this switch, either partitioning switches or --all-destination is required.

--print-volume-statistics

--print-volume-statistics=STATS_PATH

Print a four line summary of rwfilter’s processing. For each of all records, records that pass all the partitioning predicates, and records that fail, print the number of flow records and the number of packets and bytes represented by those flow records. The output also includes the number of files processed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter. When using this switch, either partitioning switches or --all-destination is required.

Partitioning Switches

rwfilter supports the following partitioning switches, at least one of which must be specified (unless the only Output Switch is --all-destination). The switches are AND’ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass is written to the fail-destination(s), if specified.
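
The AND semantics described above can be illustrated with a small Python model. The record dictionaries, the predicate functions, and the name partition are all hypothetical; the sketch shows only the pass/fail logic and says nothing about how rwfilter evaluates its tests internally.

```python
def partition(records, predicates):
    """Split records into (passed, failed) using AND'ed predicates."""
    passed, failed = [], []
    for record in records:
        # A record passes only when every predicate accepts it.
        if all(predicate(record) for predicate in predicates):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


# Hypothetical flow records reduced to two fields.
records = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]

# Model of the switches --protocol=6 --dport=80.
predicates = [
    lambda r: r["protocol"] == 6,
    lambda r: r["dport"] == 80,
]

passed, failed = partition(records, predicates)
print(len(passed), "passed;", len(failed), "failed")
```

Only the first record satisfies both tests. The other two each fail one test, so in this model they would go to the fail stream.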

Each partitioning switch defines a test. These tests can be grouped into several broad categories; within each category, the tests are applied in the order in which the switches appear on the command line. The categories of the partitioning tests are:

Partitioning Switches for IP Addresses

There are three families of switches that partition based on an IP address. Each family can partition by the source IP, the destination IP, the next hop IP, or either source or destination IP. Each family includes a --not-* variant to reverse the sense of the test.

The --*cidr-family takes as its argument an IP_OR_CIDR_LIST, which is one or more of the following separated by commas: an IPv4 address (10.1.2.3), an IPv6 address (2001:db8::10.1.2.3), an unsigned 32-bit integer representing an IPv4 address (167838211), or any of those with a CIDR block designation (192.168.0.0/16, 2001:db8::/32, 167772160/8).
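
As an illustration of the forms an IP_OR_CIDR_LIST may take, the following Python sketch parses such a list with the standard ipaddress module. The function names are hypothetical. The sketch makes two simplifying assumptions that may differ from rwfilter: it rejects a CIDR block whose host bits are set, and it never matches an IPv4 address against an IPv6 block or the reverse.

```python
import ipaddress


def parse_ip_or_cidr_list(text):
    """Parse a comma-separated list of IPs and CIDR blocks into networks."""
    networks = []
    for item in text.split(","):
        address, slash, prefix = item.strip().partition("/")
        if address.isdigit():
            # An unsigned 32-bit integer stands for an IPv4 address.
            address = str(ipaddress.IPv4Address(int(address)))
        # A bare address becomes a single-host network (/32 or /128).
        networks.append(ipaddress.ip_network(address + slash + prefix))
    return networks


def matches(ip, networks):
    """Return True if ip falls inside any of the networks."""
    ip = ipaddress.ip_address(ip)
    return any(ip.version == net.version and ip in net for net in networks)


networks = parse_ip_or_cidr_list("167838211,192.168.0.0/16,2001:db8::/32")
print([str(net) for net in networks])
print(matches("10.1.2.3", networks), matches("172.16.0.1", networks))
```

The integer 167838211 and the dotted form 10.1.2.3 name the same address, and 167772160/8 is the block 10.0.0.0/8.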

The --*set-family requires that you store the IPs in a binary IPset file and pass the name of the file to the switch. IPset files are created from SiLK Flow records with rwset(1), or from textual input with rwsetbuild(1).

The --*address-family (which includes --next-hop-id) takes as its argument a single IP address, a single CIDR block, or a single SiLK IP Wildcard. A SiLK IP Wildcard may represent multiple, disjointed IPv4 or IPv6 addresses. An IP Wildcard contains an IP in its canonical form, except each part of the IP (where part is an octet for IPv4 or a hexadectet for IPv6) may be a single value, a range, a comma separated list of values and ranges, or the letter x to signify any value for that part of the IP (that is, 0-255 for IPv4). You may not specify a CIDR suffix when using the IP Wildcard notation. The following IP_WILDCARDs all represent the same value:

 ::ffff:0:0/112
 ::ffff:0:x
 ::ffff:0:aaab-ffff,aaaa,0-aaa9
 ::ffff:0.0.0.0/112
 ::ffff:0.0.128-254,0-126,255,127.x
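
The IP Wildcard notation can be modeled for the IPv4 case with the Python sketch below. It handles only dotted-quad wildcards without a CIDR suffix, and the function names are hypothetical; it is meant to show how a part expands to a set of octet values, not to reproduce SiLK's parser.

```python
def expand_part(part):
    """Return the set of octet values named by one part of an IPv4 wildcard."""
    if part == "x":
        return set(range(256))  # x signifies any value, 0-255
    values = set()
    for piece in part.split(","):
        low, dash, high = piece.partition("-")
        if dash:
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(low))
    return values


def wildcard_matches(wildcard, address):
    """Return True if the dotted-quad address is matched by the wildcard."""
    parts = wildcard.split(".")
    octets = [int(octet) for octet in address.split(".")]
    if len(parts) != 4 or len(octets) != 4:
        raise ValueError("expected four dot-separated parts")
    return all(o in expand_part(p) for p, o in zip(parts, octets))


# The list used in the last example above covers every octet value,
# which is why it is equivalent to x.
print(expand_part("128-254,0-126,255,127") == expand_part("x"))
print(wildcard_matches("10.1-3,7.x.0-127", "10.7.200.5"))
```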

The next hop address often has a value of 0.0.0.0 since the default configuration of SiLK does not store the next hop address in the data repository.

With one restriction, any combination of IP partitioning switches is allowed in a single rwfilter invocation: A positive and negative version of the same switch (e.g., --sipset and --not-sipset) is not allowed. (--sipset and --not-scidr may be used together, as can --sipset and --not-dipset.)

The address-partitioning switches are:

--scidr=IP_OR_CIDR_LIST

Pass the record if its source IP address matches a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --saddress and --sipset.

--dcidr=IP_OR_CIDR_LIST

Pass the record if its destination IP address matches a value in IP_OR_CIDR_LIST. See also --daddress and --dipset.

--any-cidr=IP_OR_CIDR_LIST

Pass the record if either its source or its destination IP address matches a value in IP_OR_CIDR_LIST. This switch does not consider the next hop IP address. See also --any-address and --anyset.

--nhcidr=IP_OR_CIDR_LIST

Pass the record if its next hop IP address matches a value in IP_OR_CIDR_LIST. See also --next-hop-id and --nhipset.

--not-scidr=IP_OR_CIDR_LIST

Pass the record if its source IP address does not match a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --not-saddress and --not-sipset.

--not-dcidr=IP_OR_CIDR_LIST

Pass the record if its destination IP address does not match a value in IP_OR_CIDR_LIST. See also --not-daddress and --not-dipset.

--not-any-cidr=IP_OR_CIDR_LIST

Pass the record if neither its source nor its destination IP address matches a value in IP_OR_CIDR_LIST. See also --not-any-address and --not-anyset.

--not-nhcidr=IP_OR_CIDR_LIST

Pass the record if its next hop IP address does not match a value in IP_OR_CIDR_LIST. See also --not-next-hop-id and --not-nhipset.

--saddress=IP_WILDCARD

Pass the record if its source IP address is matched by the SiLK IP Wildcard IP_WILDCARD. To match on multiple IPs, use --scidr or create an IPset and use --sipset.

--daddress=IP_WILDCARD

Pass the record if its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. See also --dcidr and --dipset.

--any-address=IP_WILDCARD

Pass the record if either its source or its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. This switch does not consider the next hop IP address. See also --any-cidr and --anyset.

--next-hop-id=IP_WILDCARD

Pass the record if its next hop IP address is matched by this IP_WILDCARD, a SiLK IP Wildcard. To match on multiple IPs, use --nhcidr or create an IPset and use --nhipset.

--not-saddress=IP_WILDCARD

Pass the record if its source IP address is not matched by this IP_WILDCARD, a SiLK IP Wildcard. See also --not-scidr and --not-sipset.

--not-daddress=IP_WILDCARD

Pass the record if its destination IP address is not matched by this IP_WILDCARD. See also --not-dcidr and --not-dipset.

--not-any-address=IP_WILDCARD

Pass the record if neither its source nor its destination IP address is matched by this IP_WILDCARD. Does not consider the next hop address. See also --not-any-cidr and --not-anyset.

--not-next-hop-id=IP_WILDCARD

Pass the record if its next hop IP address is not matched by this IP_WILDCARD. See also --not-nhcidr and --not-nhipset.

--sipset=IP_SET_FILENAME

Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --scidr.

--dipset=IP_SET_FILENAME

As --sipset for the destination IP address. See also --dcidr.

--anyset=IP_SET_FILENAME

Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --any-cidr.

--nhipset=IP_SET_FILENAME

As --sipset for the next-hop IP address. See also --nhcidr.

--not-sipset=IP_SET_FILENAME

Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --not-scidr.

--not-dipset=IP_SET_FILENAME

As --not-sipset for the destination IP address. See also --not-dcidr.

--not-anyset=IP_SET_FILENAME

Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --not-any-cidr.

--not-nhipset=IP_SET_FILENAME

As --not-sipset for the next hop IP address. See also --not-nhcidr.

Partitioning Switches for Remainder of Five-Tuple

The following switches partition based on the protocol and source or destination port. The parameter to each of these switches is an INTEGER_LIST, which is a comma-separated list of individual non-negative integer values and ranges of those values. For example, 1,2,3,5-10,99-103. A range may be specified without an upper limit, such as 1-, in which case the upper limit is set to the maximum value.
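
A minimal Python sketch of the INTEGER_LIST syntax follows. The function name parse_integer_list and its maximum parameter are hypothetical; the caller supplies the maximum that applies to the switch in question (65535 for ports, 255 for protocols).

```python
def parse_integer_list(text, maximum):
    """Expand an INTEGER_LIST such as '1,2,3,5-10,99-103' into a set."""
    values = set()
    for piece in text.split(","):
        low, dash, high = piece.strip().partition("-")
        if not dash:
            values.add(int(low))  # a single value
            continue
        # A range with no upper limit, such as '1-', extends to the maximum.
        upper = int(high) if high else maximum
        values.update(range(int(low), upper + 1))
    return values


print(sorted(parse_integer_list("1,2,3,5-10,99-103", 65535)))
print(len(parse_integer_list("1-", 255)))
```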

--sport=INTEGER_LIST

Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.

--dport=INTEGER_LIST

Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.

--aport=INTEGER_LIST

Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.

--protocol=INTEGER_LIST

Pass the record if its IP Suite Protocol is in this INTEGER_LIST, possible values are 0-255.

--icmp-type=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.

--icmp-code=INTEGER_LIST

Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.

Partitioning Switches for Time

These switches partition based on whether the time stamps on the flow record occur within the specified time window. The form of the argument is a range of two dates, start-window and end-window, each in the form YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]; for example, 2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last fifteen minutes of Jan 31, 2003. (A T may be used in place of : to separate the day and hour.) The start-window and end-window must be set to at least day precision. For the start-window, unspecified hour, minute, second, and nanosecond values are set to 0; for the end-window, those values are set to 23, 59, 59, and 999999999 respectively. Thus 2003/01/31:23-2003/01/31:23 becomes 2003/01/31:23:00:00.000-2003/01/31:23:59:59.999999999. If an end-window is not given, it is set to the start-window, giving a window of a single nanosecond. The date strings are considered to be in the timezone specified when SiLK was compiled, which you can determine from the output of rwfilter --version. You may also specify the times as seconds since the UNIX epoch; when the end-time is in epoch seconds, an unspecified nanoseconds value is set to 999999999 and otherwise the value is unchanged.
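
The rules for filling in unspecified fields of a TIME_WINDOW can be sketched in Python. The names below are hypothetical, each time is returned as a plain tuple (year, month, day, hour, minute, second, nanosecond), and the sketch covers only the YYYY/MM/DD form. It does not handle epoch seconds or timezones.

```python
START_FILL = (0, 0, 0, 0)              # hour, minute, second, nanosecond
END_FILL = (23, 59, 59, 999_999_999)


def parse_time(text, fill):
    """Parse YYYY/MM/DD[:HH[:MM[:SS[.frac]]]], filling missing fields."""
    # A 'T' may be used in place of ':' between the day and the hour.
    date, _, clock = text.replace("T", ":", 1).partition(":")
    year, month, day = (int(part) for part in date.split("/"))
    given = []
    if clock:
        pieces = clock.split(":")
        given = [int(piece) for piece in pieces[:2]]  # hour and minute
        if len(pieces) > 2:
            whole, dot, fraction = pieces[2].partition(".")
            given.append(int(whole))
            if dot:
                # Scale the fractional digits to nanoseconds.
                given.append(int(fraction.ljust(9, "0")[:9]))
    return (year, month, day, *given, *fill[len(given):])


def expand_window(window):
    """Return (start, end) tuples for a TIME_WINDOW string."""
    start_text, dash, end_text = window.partition("-")
    start = parse_time(start_text, START_FILL)
    # With no end-window, the end is set to the start-window.
    end = parse_time(end_text, END_FILL) if dash else start
    return start, end


print(expand_window("2003/01/31:23-2003/01/31:23"))
```

For the window 2003/01/31:23-2003/01/31:23, the start is filled with zeros and the end with 59, 59, and 999999999, which matches the expansion given in the text.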

--active-time=TIME_WINDOW

Pass the record if the record was active at ANY time during this TIME_WINDOW. If a single time is specified, pass the record if it was active at that instant.

--stime=TIME_WINDOW

Pass the record if its starting time is in this TIME_WINDOW.

--etime=TIME_WINDOW

As --stime for the ending time.

--duration=DECIMAL_RANGE

Pass the record if its duration--that is, the record’s end time minus its start time, as measured in seconds--is in this DECIMAL_RANGE. Use floating point numbers to specify fractional second values. The range should be specified as MIN-MAX; for example, 5.0-10.031. If a single value is given, the duration must match that value exactly. The upper limit may be omitted; for example, a range of 1.5- passes records whose duration is at least 1.5 seconds.
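
The DECIMAL_RANGE test can be modeled with the short Python sketch below. The function name is hypothetical, and treating both limits as inclusive is an assumption that the text above does not state outright.

```python
def in_decimal_range(spec, value):
    """Return True if value satisfies a DECIMAL_RANGE such as '5.0-10.031'."""
    low, dash, high = spec.partition("-")
    minimum = float(low)
    if not dash:
        # A single value must be matched exactly.
        return value == minimum
    # An omitted upper limit leaves the range open-ended.
    maximum = float(high) if high else float("inf")
    return minimum <= value <= maximum  # assumption: inclusive limits


print(in_decimal_range("5.0-10.031", 7.25))
print(in_decimal_range("1.5-", 3600.0))
print(in_decimal_range("2", 2.5))
```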

Partitioning Switches for Volume

The following switches partition based on the volume of the flow; that is, the number of bytes or packets. For additional volume-related switches, load the flowrate plug-in as described in the