SiLK 3.23
Copyright 2024 Carnegie Mellon University.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
Licensed under a GNU GPL 2.0-style license, please see LICENSE.txt or contact permission@sei.cmu.edu for full terms.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.
This Software includes and/or makes use of Third-Party Software each subject to its own license.
DM24-1064
The SiLK Reference Guide contains the manual page for each analysis tool, utility, plug-in, file format, and collection facility in the SiLK Collection and Analysis Suite.
This document is meant for reference only. The SiLK Analysis Handbook provides both a tutorial for learning about the tools and examples of how they can be used in analyzing flow data. See the SiLK Installation Handbook for instructions on installing SiLK at your site.
This reference guide is broken into sections like the traditional UNIX manual: end-user analysis tools and utilities are described in Section 1; the libraries and plug-ins that augment the behavior of some tools are presented in Section 3; Section 5 contains information about file formats; miscellaneous information is in Section 7; and commands for the installer and administrator of SiLK appear in Section 8.
This section provides the manual page for each analysis tool and utility that the users of SiLK may employ in their day-to-day work.
Map between sensor names and sensor numbers
mapsid [--print-classes] [--print-descriptions] [--site-config-file=FILENAME] [{ <sensor-name> | <sensor-number> } ...]
mapsid --help
mapsid --version
As of SiLK 3.0, mapsid is deprecated, and it will be removed in the SiLK 4.0 release. Use rwsiteinfo(1) instead; the EXAMPLES section shows how to use rwsiteinfo to get output similar to that produced by mapsid.
mapsid is a utility that maps sensor names to sensor numbers or vice versa depending on the input arguments. Sensors are defined in the silk.conf(5) file.
When no sensor arguments are given to mapsid, the mapping of all sensor numbers to names is printed. When a numeric argument is given, the number to name mapping is printed for the specified argument. When a name is given, its numeric id is printed. For convenience when typing in sensor names, case is ignored.
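The lookup rules above can be illustrated with a short Python sketch. The sensor table is hypothetical (copied from the sensor names used in the EXAMPLES section); mapsid reads the real table from silk.conf(5), and this sketch simply raises KeyError for an unknown sensor rather than reproducing mapsid's error handling.

```python
# Hypothetical sensor table taken from the EXAMPLES section; mapsid itself
# reads the real table from silk.conf.
SENSORS = {0: "ALPHA", 1: "BETA", 2: "GAMMA", 3: "DELTA", 4: "EPSLN", 5: "ZETA"}


def map_sensors(args, sensors=SENSORS):
    """Return mapsid-style output lines for the given sensor arguments."""
    if not args:
        # No arguments: list every number-to-name mapping.
        return ["%d -> %s" % (num, name) for num, name in sorted(sensors.items())]
    # Index the names in upper case so that name lookups ignore case.
    by_name = {name.upper(): num for num, name in sensors.items()}
    lines = []
    for arg in args:
        if arg.isdigit():
            # Numeric argument: print the number-to-name mapping.
            lines.append("%s -> %s" % (arg, sensors[int(arg)]))
        else:
            # Name argument: print the sensor's numeric ID.
            num = by_name[arg.upper()]
            lines.append("%s -> %d" % (sensors[num], num))
    return lines


print("\n".join(map_sensors(["beta", "3"])))
```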
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
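As an illustration of the abbreviation rule only (not SiLK's own option-parsing code), the following Python sketch resolves a possibly abbreviated long option against a list of names. The helper names split_option and resolve_option are hypothetical, and the sketch handles only the --arg=param form, not --arg param.

```python
def split_option(arg):
    """Split '--name=param' into ('name', 'param'); return ('name', None) otherwise."""
    name, sep, value = arg[2:].partition("=")
    return name, (value if sep else None)


def resolve_option(name, options):
    """Return the full option that NAME abbreviates, or raise ValueError."""
    # An exact match wins even when it is also a prefix of a longer option.
    if name in options:
        return name
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown option --%s" % name)
    raise ValueError("ambiguous option --%s: %s" % (name, ", ".join(matches)))


# The long options from the mapsid synopsis.
MAPSID_OPTIONS = ["print-classes", "print-descriptions", "site-config-file",
                  "help", "version"]

name, value = split_option("--site=/tmp/silk.conf")
print(resolve_option(name, MAPSID_OPTIONS), value)
```

Here --site is accepted because no other mapsid option begins with "site", while --print would be rejected because it matches both --print-classes and --print-descriptions.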
--print-classes
For each sensor, print the classes for which the sensor collects data. The classes are enclosed in square brackets, [].
--print-descriptions
For each sensor, print the description of the sensor as defined in the silk.conf file (if any).
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, mapsid searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
The following examples demonstrate the use of mapsid. In addition, each example shows how to get similar output using rwsiteinfo(1).
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
$ mapsid beta
BETA -> 1
$ rwsiteinfo --fields=sensor,id-sensor --sensors=BETA
Sensor|Sensor-ID|
BETA| 1|
Unlike mapsid, matching of the sensor name is case-sensitive in rwsiteinfo.
$ mapsid 3
3 -> DELTA
$ rwsiteinfo --fields=id-sensor,sensor --sensors=3 --delimited=,
Sensor-ID,Sensor
3,DELTA
$ mapsid
0 -> ALPHA
1 -> BETA
2 -> GAMMA
3 -> DELTA
4 -> EPSLN
5 -> ZETA
...
$ rwsiteinfo --fields=id-sensor,sensor --no-titles
0| ALPHA|
1| BETA|
2| GAMMA|
3| DELTA|
4| EPSLN|
5| ZETA|
...
$ mapsid --print-classes 3 ZETA
3 -> DELTA [all]
ZETA -> 5 [all]
$ rwsiteinfo --fields=id-sensor,sensor,class:list --sensors=3,ZETA
Sensor-ID|Sensor|Class:list|
3| DELTA| all|
5| ZETA| all|
$ mapsid --print-classes --print-description 0 1
0 -> ALPHA [all] "Primary gateway"
1 -> BETA [all] "Secondary gateway"
rwsiteinfo supports using an integer range when specifying sensors.
$ rwsiteinfo --fields=id-sensor,sensor,class:list,describe-sensor \
      --sensors=0-1
Sensor-ID|Sensor|Class:list|Sensor-Description|
0| ALPHA| all| Primary gateway|
1| BETA| all| Secondary gateway|
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES section, mapsid may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files, mapsid may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwsiteinfo(1), silk.conf(5), silk(7)
As of SiLK 3.0, mapsid is deprecated; use rwsiteinfo(1) instead.
Convert an integer IP to dotted-decimal notation
num2dot [--ip-fields=FIELDS] [--delimiter=C]
num2dot --help
num2dot --version
num2dot is a filter that speeds up sorting of IP numbers while still producing both a natural order (e.g., 29.23.1.1 appears before 192.168.1.1) and readable output (dotted-decimal rather than an integer representation of the IP number).
It is designed specifically to deal with the output of rwcut(1). It reads the standard input and converts the specified fields (by default, field 1), which are separated by a delimiter (by default '|'), from integers into dotted-decimal IP addresses. Up to three IP fields can be specified with the --ip-fields=FIELDS option. The --delimiter option specifies an alternate delimiter.
num2dot does not support IPv6 addresses. The EXAMPLES section below includes an example PySiLK script to handle IPv6.
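The integer-to-dotted-decimal conversion that num2dot performs can be sketched in Python with the standard ipaddress module. This is not num2dot's implementation: the function name num2dot_line is hypothetical, and leaving non-numeric cells (such as a title line) untouched is an assumption of the sketch.

```python
import ipaddress


def num2dot_line(line, ip_fields=(1,), delim="|"):
    """Convert the 1-based columns in IP_FIELDS from integers to dotted decimal."""
    fields = line.rstrip("\r\n").split(delim)
    for col in ip_fields:
        text = fields[col - 1].strip()
        # Leave non-numeric cells, such as a title line, unchanged.
        if text.isdigit():
            fields[col - 1] = str(ipaddress.IPv4Address(int(text)))
    return delim.join(fields)


print(num2dot_line("3221225985|  80|\n"))
```

To use it as a filter, one could loop over sys.stdin and print num2dot_line(line, (3, 4)) for each line, mirroring --ip-fields=3,4.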
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
--ip-fields=FIELDS
Comma-separated column numbers of the input that should be considered IP numbers (at most three). Column numbers start from 1. If not specified, the default is 1.
--delimiter=C
The character that separates the columns of the input. The default is '|'.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Suppose in addition to the default fields of 1-12 produced by rwcut(1), you want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default rwcut output fields of 1-12, you want to see dotted-decimal IP addresses. You could use the following command:
$ rwfilter ... --pass=stdout \
      | rwcut --fields=dip,stime,1-12 --ip-format=decimal \
            --timestamp-format=epoch \
      | num2dot --ip-field=3,4
In the rwcut invocation, you prepend the fields of interest (dip and stime) before the standard fields. The first six columns produced by rwcut are dIP, sTime, sIP, dIP, sPort, and dPort. The --ip-format switch causes the first, third, and fourth columns to be printed as integers, but you want only the first column to have an integer representation. Piping the output through num2dot converts the third and fourth columns to dotted-decimal IP addresses.
num2dot does not support converting integers to IPv6 addresses. The following PySiLK script (see pysilk(3)) could be used as a starting-point to create a version of num2dot that supports IPv6 addresses:
#! /usr/bin/env python
from __future__ import print_function
import sys
import silk
# The IPv6 fields to process; the ID of the first field is 0
ip_fields = (0, 1)
# The delimiter between fields
delim = '|'
# The width of the IPv6 fields
width = 39
# The file to process; this script processes standard input
f = sys.stdin
try:
    for line in f:
        # Remove the line ending before splitting the line into fields
        fields = line.rstrip('\r\n').split(delim)
        for i in ip_fields:
            # Convert the integer to an IPv6 address, right-justified
            fields[i] = "%*s" % (width, silk.IPv6Addr(int(fields[i])))
        print(delim.join(fields))
finally:
    f.close()
rwcut(1), pysilk(3), silk(7)
num2dot has no support for IPv6 addresses.
Count activity by IPv4 address
rwaddrcount {--print-recs | --print-ips | --print-stat} [--use-dest] [--min-bytes=BYTEMIN] [--max-bytes=BYTEMAX] [--min-records=RECMIN] [--max-records=RECMAX] [--min-packets=PACKMIN] [--max-packets=PACKMAX] [--set-file=PATHNAME] [--sort-ips] [--timestamp-format=FORMAT] [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips] [--no-titles] [--no-columns] [--column-separator=CHAR] [--no-final-delimiter] [{--delimited | --delimited=CHAR}] [--print-filenames] [--copy-input=PATH] [--output-path=PATH] [--pager=PAGER_PROG] [--site-config-file=FILENAME] [{--legacy-timestamps | --legacy-timestamps=NUM}] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwaddrcount --help
rwaddrcount --version
rwaddrcount reads SiLK Flow records, sums the byte, packet, and record counts of those records by individual source or destination IP address, and maintains the time window during which each IP address was active. At the end of the count operation, the results per IP address are displayed when the --print-recs switch is given. rwaddrcount includes facilities for displaying only those IP addresses whose byte, packet, or flow counts are between specified minima and maxima.
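The counting described above amounts to a per-address hash table. The Python sketch below assumes a hypothetical Flow tuple holding only the fields rwaddrcount uses (times as epoch seconds); it is not rwaddrcount's actual data structure.

```python
from collections import namedtuple

# Hypothetical stand-in for a SiLK Flow record; times are epoch seconds.
Flow = namedtuple("Flow", "sip dip bytes packets stime etime")


def addrcount(flows, use_dest=False):
    """Return {ip: [bytes, packets, records, earliest stime, latest etime]}."""
    bins = {}
    for flow in flows:
        ip = flow.dip if use_dest else flow.sip
        b = bins.get(ip)
        if b is None:
            # First record for this address: start a new bin.
            bins[ip] = [flow.bytes, flow.packets, 1, flow.stime, flow.etime]
        else:
            b[0] += flow.bytes
            b[1] += flow.packets
            b[2] += 1
            b[3] = min(b[3], flow.stime)  # widen the activity window
            b[4] = max(b[4], flow.etime)
    return bins


flows = [Flow("10.0.0.1", "10.0.0.9", 100, 2, 10, 20),
         Flow("10.0.0.1", "10.0.0.9", 50, 1, 5, 12),
         Flow("10.0.0.2", "10.0.0.9", 40, 1, 30, 31)]
print(addrcount(flows))
```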
rwaddrcount does not support IPv6 addresses. To generate output for IPv6 records, use the rwuniq(1) tool:
rwuniq --fields=sip --values=bytes,packets,records,stime,etime
rwaddrcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaddrcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
For the application to operate, one of the three --print options must be chosen.
--print-recs
Print one row for each bin that meets the minima/maxima criteria. Each row contains the IP address, number of bytes, number of packets, number of flow records, earliest start time, and latest end time.
--print-ips
Print a single column containing the IP address of each bin that meets the minima/maxima criteria.
--print-stat
Print a one- or two-line summary (plus a title line) of the bins. The first line is a summary across all bins, and it contains the number of unique IP addresses and the sums of the bytes, packets, and flow records. The second line is printed only when one or more minima or maxima are specified. This second line contains the same columns as the first, and its values are the sums across those bins that meet the criteria.
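A sketch of how the --print-stat lines could be derived from those bins, assuming a hypothetical criteria predicate in place of the --min-* and --max-* switches. The label Qualifying for the second line is a placeholder, not rwaddrcount's actual title.

```python
def print_stat(bins, criteria=None):
    """Return the --print-stat rows as (label, unique IPs, bytes, packets, records).

    BINS maps an IP address to (bytes, packets, records). CRITERIA is an optional
    predicate standing in for the --min-*/--max-* switches.
    """
    def summarize(label, values):
        return (label, len(values), sum(v[0] for v in values),
                sum(v[1] for v in values), sum(v[2] for v in values))

    rows = [summarize("Total", list(bins.values()))]
    if criteria is not None:
        # Second line: sums over only the bins that meet the criteria.
        rows.append(summarize("Qualifying",
                              [v for v in bins.values() if criteria(v)]))
    return rows


bins = {"10.0.0.1": (150, 3, 2), "10.0.0.2": (40, 1, 1)}
print(print_stat(bins, lambda v: v[2] <= 1))
```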
--use-dest
Count by destination IP address in the flow record rather than by source IP address.
--min-bytes=BYTEMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes exceeds BYTEMIN.
--min-packets=PACKMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets exceeds PACKMIN.
--min-records=RECMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total number of flow records contributing to that count record exceeds RECMIN.
--max-bytes=BYTEMAX
Filtering criterion; for the final output (stats or printing), only include count records where the total number of bytes is less than BYTEMAX.
--max-packets=PACKMAX
Filtering criterion; for the final output (stats or printing), only include count records where the total number of packets is less than PACKMAX.
--max-records=RECMAX
Filtering criterion; for the final output (stats or printing), only include count records to which at most RECMAX flow records contributed.
--set-file=PATHNAME
Write the IPs into the rwset(1)-style binary IP-set file named PATHNAME. Use rwsetcat(1) to see the contents of this file.
--timestamp-format=FORMAT
Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
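The four timestamp formats and two timezones map naturally onto Python's strftime. This sketch assumes the format names default, iso, m/d/y, and epoch and the timezones utc and local; it prints whole seconds, as in the rwaddrcount examples, and format_timestamp is a hypothetical name.

```python
from datetime import datetime, timezone

# strftime patterns for the three textual formats.
PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(seconds, fmt="default", tz="utc"):
    """Format epoch SECONDS using one of the --timestamp-format names."""
    if fmt == "epoch":
        return str(int(seconds))
    if tz == "utc":
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    else:
        # "local": let the C library apply the TZ environment variable.
        when = datetime.fromtimestamp(seconds)
    return when.strftime(PATTERNS[fmt])


print(format_timestamp(1062374400))  # 2003/09/01T00:00:00
```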
--ip-format=FORMAT
For the --print-recs and --print-ips output formats, specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical. Since SiLK 3.7.0.
canonical
Print IP addresses in the canonical format: dot-separated decimal for IPv4 (192.0.2.1).
no-mixed
Print IP addresses in the canonical format (192.0.2.1). Prevent use of the mixed IPv4-IPv6 representation when map-v4 is also included in FORMAT. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.
decimal
Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and ::ffff:192.0.2.1 as 3221225985 and 281473902969345, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and ::ffff:192.0.2.1 as c0000201 and ffffc0000201, respectively.
zero-padded
Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 as 192.000.002.001. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.
The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.
map-v4
Change addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.
unmap-v6
Do nothing (rwaddrcount does not support IPv6 addresses as the key). Since SiLK 3.17.0.
The following argument is also available:
force-ipv6
Set FORMAT to map-v4,no-mixed.
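The following Python sketch reproduces the worked values for the decimal, hexadecimal, and zero-padded formats, with and without map-v4. It accepts only IPv4 input (rwaddrcount counts only IPv4 keys), and it deliberately does not cover no-mixed or canonical output for mapped addresses, since Python's ipaddress module is not guaranteed to print those the same way SiLK does. format_ip is a hypothetical name.

```python
import ipaddress


def format_ip(text, fmt="canonical", map_v4=False):
    """Render an IPv4 address with one of the FORMAT names above."""
    addr = ipaddress.IPv4Address(text)
    if map_v4:
        # map-v4: place the address in ::ffff:0:0/96.
        addr = ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return format(int(addr), "x")
    if fmt == "zero-padded":
        if addr.version == 4:
            return ".".join("%03d" % octet for octet in addr.packed)
        # exploded always uses eight 4-digit hex groups (never the mixed form).
        return addr.exploded
    if fmt == "canonical" and addr.version == 4:
        return str(addr)
    raise ValueError("format not covered by this sketch: " + fmt)


print(format_ip("192.0.2.1", "hexadecimal"))  # c0000201
```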
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in the canonical format. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--sort-ips
For the --print-recs and --print-ips output formats, present the results sorted by IP address.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=CHAR
Use the specified character CHAR between columns and after the final column. When this switch is not specified, the default of '|' is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited, --delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaddrcount's textual output to a different location.
--output-path=PATH
Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaddrcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
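The precedence among --pager, SILK_PAGER, and PAGER can be summarized as a small decision function. This Python sketch is an interpretation of the rules stated here and in the environment-variable descriptions; treating an unset PAGER as "no paging" is an assumption, and choose_pager is a hypothetical name.

```python
import os
import sys


def choose_pager(pager_switch=None, output_path_given=False, env=None, is_tty=None):
    """Return the pager command to run, or None when output is not paged."""
    env = os.environ if env is None else env
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    # Paging applies only to terminal output without --output-path.
    if output_path_given or not is_tty:
        return None
    if pager_switch is not None:
        pager = pager_switch  # --pager overrides both variables
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]  # when set, even if empty, it overrides PAGER
    else:
        pager = env.get("PAGER", "")
    # An empty pager string disables paging.
    return pager or None


print(choose_pager(env={"PAGER": "less"}, is_tty=True))  # less
```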
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaddrcount searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps, --legacy-timestamps=NUM
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.
--xargs, --xargs=FILENAME
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaddrcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
The following switches are deprecated. They will be removed in SiLK 4.0.
Deprecated alias for --min-bytes.
Deprecated alias for --min-packets.
Deprecated alias for --min-records.
Deprecated alias for --max-bytes.
Deprecated alias for --max-packets.
Deprecated alias for --max-records.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
To print a list of source IP addresses that appeared in exactly one TCP record during the first 12 hours of 2003-Sep-01, use:
$ rwfilter --start-date=2003/09/01:00 --end-date=2003/09/01:11 \
      --proto=6 --pass=stdout \
  | rwaddrcount --max-records=1 --print-ips
In general, to print out record information, use rwaddrcount with --print-recs:
$ rwfilter --start-date=2003/01/17:00 --end-date=2003/01/17:23 \
      --proto=6 --pass=stdout \
  | rwaddrcount --print-rec --no-title | head -3
10.10.10.1| 65792| 147| 21| 2003/01/17T00:19:01| 2003/01/17T02:00:13|
10.10.10.2| 110744| 89| 7| 2003/01/17T01:21:42| 2003/01/17T01:39:21|
10.10.10.3| 864| 18| 6| 2003/01/17T00:20:33| 2003/01/17T01:25:38|
We note some overlapping features between rwaddrcount and rwuniq(1). There is often more than one way to perform the same task in the SiLK tool set.
Here’s a guide to replacing each of the outputs of rwaddrcount:
The --print-recs switch prints five pieces of information for each source or destination address:
$ rwaddrcount --print-recs data.rw
sIP|Bytes|Packets|Records| Start_Time| End_Time|
10.0.0.144| 1646| 4| 1|2007/05/09T18:01:41|2007/05/09T18:01:41|
10.14.203.121| 40| 1| 1|2007/05/09T18:31:54|2007/05/09T18:31:54|
10.14.203.122| 40| 1| 1|2007/05/09T18:32:43|2007/05/09T18:32:43|
10.15.6.14| 539| 3| 3|2007/05/09T18:03:05|2007/05/09T18:08:07|
12.0.101.22| 4365| 23| 2|2007/05/09T18:26:43|2007/05/09T18:43:46|
To do the same in rwuniq, specify sip in --fields and the --values shown here:
$ rwuniq --fields=sip --values=bytes,packets,flows,stime,etime data.rw
sIP|Bytes|Packets|Records| min_sTime| max_eTime|
10.0.0.144| 1646| 4| 1|2007/05/09T18:01:41|2007/05/09T18:01:41|
10.14.203.121| 40| 1| 1|2007/05/09T18:31:54|2007/05/09T18:31:54|
10.14.203.122| 40| 1| 1|2007/05/09T18:32:43|2007/05/09T18:32:43|
10.15.6.14| 539| 3| 3|2007/05/09T18:03:05|2007/05/09T18:08:07|
12.0.101.22| 4365| 23| 2|2007/05/09T18:26:43|2007/05/09T18:43:46|
When rwaddrcount includes --use-dest, change the --fields switch of rwuniq to dip. Replace the --sort-ips switch of rwaddrcount with --sort-output in rwuniq.
The --print-stat switch in rwaddrcount prints a one-line summary of the data:
$ rwaddrcount --print-stat data.rw
     | sIP_Uniq| Bytes| Packets| Records|
Total| 57727| 948620676| 2026581| 382578|
This is difficult to produce with rwuniq. If there is a field that you know is either empty or constant across all records (such as nhip or in), you can use that as the key field in rwuniq.
$ rwuniq --fields=nhIP --values=distinct:sip,bytes,packets,flows data.rw
nhIP|sIP-Distinct| Bytes| Packets| Records|
0.0.0.0| 57727| 948620676| 2026581| 382578|
Note that class generally does not work since each type within a class produces its own row:
$ rwuniq --fields=class --values=distinct:sip,bytes,packets,flows data.rw
class|sIP-Distinct| Bytes| Packets| Records|
all| 8674| 260143344| 964621| 151447|
all| 55540| 688477332| 1061960| 231131|
One trick is to use stime as the key with a very large --bin-time:
$ rwuniq --fields=stime --bin-time=2147483647 \
      --values=distinct:sip,bytes,packets,flows data.rw
sTime|sIP-Distinct| Bytes| Packets| Records|
1970/01/01T00:00:00| 57727| 948620676| 2026581| 382578|
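The trick works because every start time is rounded down to a multiple of --bin-time. Assuming rwuniq aligns bins to the UNIX epoch (an assumption of this sketch, not something stated in this guide), the arithmetic for the start time 2007/05/09T18:01:41 looks like this in Python; bin_start is a hypothetical name.

```python
from datetime import datetime, timezone

BIN_SECONDS = 2147483647  # the --bin-time value above, 2**31 - 1


def bin_start(stime, bin_seconds=BIN_SECONDS):
    """Round epoch-seconds STIME down to the start of its (assumed epoch-aligned) bin."""
    return stime - stime % bin_seconds


# Any start time before 2038-01-19T03:14:07 UTC falls into the bin at 0.
start = bin_start(1178733701)  # 2007/05/09T18:01:41 UTC
print(datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y/%m/%dT%H:%M:%S"))
```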
Finally, you can use separate invocations of rwfilter(1), rwset(1), and rwsetcat(1):
$ rwfilter --print-volume --all=stdout data.rw \
  | rwset --sip=stdout \
  | rwsetcat --count-ips
     | Recs| Packets| Bytes| Files|
Total| 382578| 2026581| 948620676| 1|
Pass| 382578| 2026581| 948620676| |
Fail| 0| 0| 0| |
57727
rwaddrcount’s --print-ips switch prints the IP addresses as text:
$ rwaddrcount --print-ips data.rw
sIP
10.0.0.144
10.14.203.121
10.14.203.122
10.15.6.14
12.0.101.22
A combination of rwset and rwsetcat is the best way to handle this:
$ rwset --sip-file=stdout data.rw | rwsetcat --print-ips
10.0.0.144
10.14.203.121
10.14.203.122
10.15.6.14
12.0.101.22
Alternatively, use rwuniq and the UNIX tool cut(1) to only print the first column:
$ rwuniq --fields=sIP data.rw \
      | cut -d '|' -f 1
sIP
10.0.0.144
10.14.203.121
10.14.203.122
10.15.6.14
12.0.101.22
rwaddrcount allows you to restrict the output to bins that have a certain minimum or maximum count of bytes, packets, or flows via --min-bytes, --max-bytes, --min-packets, --max-packets, --min-records, and --max-records:
$ rwaddrcount --print-recs --min-byte=1024 --max-byte=2048 \
      --max-records=1 data.rw
sIP|Bytes|Packets|Records| Start_Time| End_Time|
10.0.0.144| 1646| 4| 1|2007/05/09T18:01:41|2007/05/09T18:01:41|
10.14.203.121| 40| 1| 1|2007/05/09T18:31:54|2007/05/09T18:31:54|
10.14.203.122| 40| 1| 1|2007/05/09T18:32:43|2007/05/09T18:32:43|
rwuniq supports the same operations using the --bytes, --packets, and --flows switches, each of which allows you to define a desired minimum and maximum value.
$ rwuniq --fields=sip --values=bytes,packets,records,stime,etime \
      --bytes=1024-2048 --flows=1-1 data.rw
sIP|Bytes|Packets|Records| min_sTime| max_eTime|
10.0.0.144| 1646| 4| 1|2007/05/09T18:01:41|2007/05/09T18:01:41|
10.14.203.121| 40| 1| 1|2007/05/09T18:31:54|2007/05/09T18:31:54|
10.14.203.122| 40| 1| 1|2007/05/09T18:32:43|2007/05/09T18:32:43|
SILK_IP_FORMAT
This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.
SILK_TIMESTAMP_FORMAT
This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.
SILK_PAGER
When set to a non-empty string, rwaddrcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaddrcount does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwaddrcount automatically invokes this program to display its output a screen at a time.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaddrcount may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files, rwaddrcount may use this environment variable. See the FILES section for details.
TZ
When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaddrcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine's default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaddrcount --version.)
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwset(1), rwsetcat(1), rwstats(1), rwtotal(1), rwuniq(1), silk(7), tzset(3), environ(7)
rwaddrcount only supports IPv4 addresses, and it will not be modified to support IPv6 addresses. To produce output similar to rwaddrcount for IPv6 addresses, use rwuniq(1):
rwuniq --fields=sip --values=bytes,packets,records,stime,etime
When used in an IPv6 environment, rwaddrcount converts IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and processes them. IPv6 records having addresses outside of that prefix are ignored.
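The IPv4-mapped rule can be expressed with Python's ipaddress module, whose ipv4_mapped attribute is None for any IPv6 address outside ::ffff:0:0/96. This is an illustration of the rule, not rwaddrcount's code; as_ipv4 is a hypothetical name.

```python
import ipaddress


def as_ipv4(text):
    """Return the IPv4 address that would be counted, or None if the address is ignored."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return addr
    # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
    return addr.ipv4_mapped


addresses = ["::ffff:192.0.2.1", "2001:db8::1", "10.0.0.1"]
print([str(a) for a in map(as_ipv4, addresses) if a is not None])
```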
rwaddrcount uses a fairly large hashtable to store data, but it is likely that as the amount of data expands, the application will take more time to process data.
Similar binning of records is produced by rwstats(1), rwtotal(1), and rwuniq(1).
To generate a list of IP addresses without the volume information, use rwset(1).
Build a binary Aggregate Bag from SiLK Flow records
rwaggbag --keys=KEY --counters=COUNTER [--note-strip] [--note-add=TEXT] [--note-file-add=FILE] [--invocation-strip] [--print-filenames] [--copy-input=PATH] [--compression-method=COMP_METHOD] [--ipv6-policy={ignore,asv4,mix,force,only}] [--output-path=PATH] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwaggbag --help
rwaggbag --help-fields
rwaggbag --version
rwaggbag reads SiLK Flow records and builds an Aggregate Bag file. To build an Aggregate Bag from textual input, use rwaggbagbuild(1).
An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.
For each SiLK flow record rwaggbag reads, it extracts the values of the fields listed in the --keys switch, combines those fields into a key, searches for an existing bin that has that key and creates a new bin for that key if none is found, and adds the values for each of the fields listed in the --counters switch to the bin’s counter. Both the --keys and --counters switches are required.
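The binning steps above can be sketched in Python with a dictionary keyed by a tuple of key-field values. The records here are hypothetical dictionaries whose keys borrow rwaggbag field names (sIPv4, sPort, packets, bytes, duration); the counter names records, sum-packets, sum-bytes, and sum-duration come from the --counters list. The sketch keeps everything in memory and writes no Aggregate Bag file, and build_aggbag is a hypothetical name.

```python
from collections import defaultdict

# Which record field each counter sums; "records" counts flows instead.
COUNTER_FIELDS = {"sum-packets": "packets", "sum-bytes": "bytes",
                  "sum-duration": "duration"}


def build_aggbag(records, keys, counters):
    """Return {key tuple: [counter values]} for the given KEY and COUNTER lists."""
    bag = defaultdict(lambda: [0] * len(counters))
    for rec in records:
        # Combine the --keys fields into a single composite key.
        key = tuple(rec[name] for name in keys)
        bin_counters = bag[key]  # creates a zeroed bin the first time KEY is seen
        for i, counter in enumerate(counters):
            if counter == "records":
                bin_counters[i] += 1
            else:
                bin_counters[i] += rec[COUNTER_FIELDS[counter]]
    return dict(bag)


flows = [
    {"sIPv4": "10.0.0.1", "sPort": 80, "packets": 3, "bytes": 120, "duration": 1},
    {"sIPv4": "10.0.0.1", "sPort": 80, "packets": 2, "bytes": 80, "duration": 0},
    {"sIPv4": "10.0.0.2", "sPort": 53, "packets": 1, "bytes": 60, "duration": 0},
]
print(build_aggbag(flows, ["sIPv4", "sPort"], ["sum-packets", "sum-bytes", "records"]))
```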
rwaggbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwaggbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
If rwaggbag runs out of memory, it will exit immediately. The output Aggregate Bag file remains behind with a size of 0 bytes.
To print the contents of an Aggregate Bag as text, use rwaggbagcat(1). The rwaggbagbuild(1) tool can create an Aggregate Bag from textual input. rwaggbagtool(1) allows you to manipulate binary Aggregate Bag files.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
--keys=KEY
Create a key for binning flow records using the values of the comma-separated field(s) listed in KEY. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The available KEY fields are:
sIPv4
source IP address when IPv4
sIPv6
source IP address when IPv6
dIPv4
destination IP address when IPv4
dIPv6
destination IP address when IPv6
sPort
source port for TCP or UDP, or equivalent
dPort
destination port for TCP or UDP, or equivalent
protocol
IP protocol
packets
count of packets recorded for this flow record
bytes
count of bytes recorded for this flow record
flags
bit-wise OR of TCP flags over all packets in the flow
sTime
starting time of the flow, in seconds resolution
duration
duration of the flow, in seconds resolution
eTime
ending time of the flow, in seconds resolution
sensor
numeric ID of the sensor where the flow was collected
input
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
output
router SNMP output interface or postVlanId
nhIPv4
router next hop IP address when IPv4
nhIPv6
router next hop IP address when IPv6
initialFlags
TCP flags on first packet in the flow as reported by yaf(1)
sessionFlags
bit-wise OR of TCP flags over all packets in the flow except the first as reported by yaf
attributes
flow attributes set by the flow generator
application
content of the flow as reported in the applabel field of yaf
class
class of the sensor at the collection point
type
type of the sensor at the collection point
icmpType
ICMP type value for ICMP and ICMPv6 flows, 0 otherwise
icmpCode
ICMP code value for ICMP and ICMPv6 flows, 0 otherwise
scc
the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.19.0.
dcc
the country code of the destination IP address. See scc. Since SiLK 3.19.0.
Add to the bin determined by the fields in --key the values of the comma-separated field(s) listed in COUNTER. The field names are case-insensitive, a name may be abbreviated to its shortest unique prefix, and a name may only be used one time. The list of available COUNTER fields are
count of the number of flow records that match the key
the sum of the packet counts for flow records that match the key
the sum of the byte counts for flow records that match the key
the sum of the durations (in seconds) for flow records that match the key
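The binning that --key and --counter describe can be modeled in a few lines of Python. This is a minimal sketch, not rwaggbag's implementation. Flow records are assumed to be plain dicts, and their keys (dPort, packets, bytes, duration) are hypothetical stand-ins for SiLK Flow fields. The function name aggregate is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Counter name -> flow field that it sums; "records" is handled separately.
# The flow-field names on the right are hypothetical dict keys for this sketch.
SUM_SOURCES = {"sum-packets": "packets", "sum-bytes": "bytes",
               "sum-duration": "duration"}


def aggregate(flows: Iterable[Mapping[str, int]], key: List[str],
              counters: List[str]) -> Dict[Tuple[int, ...], Dict[str, int]]:
    """Bin each flow by its key-field values and add its counters to that bin."""
    bins: Dict[Tuple[int, ...], Dict[str, int]] = defaultdict(
        lambda: {name: 0 for name in counters})
    for flow in flows:
        bin_key = tuple(flow[field] for field in key)
        entry = bins[bin_key]
        for name in counters:
            if name == "records":
                entry[name] += 1                    # one per matching record
            else:
                entry[name] += flow[SUM_SOURCES[name]]
    return dict(bins)


if __name__ == "__main__":
    sample = [
        {"dPort": 80, "packets": 3, "bytes": 180, "duration": 1},
        {"dPort": 80, "packets": 2, "bytes": 120, "duration": 4},
        {"dPort": 53, "packets": 1, "bytes": 60, "duration": 0},
    ]
    for k, v in sorted(aggregate(sample, ["dPort"], ["records", "sum-packets"]).items()):
        print(k, v)
```

Every distinct key tuple becomes one entry. This matches the idea that a bin is "determined by the fields in --key".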
Do not copy the notes (annotations) from the input file(s) to the output file. When this switch is not specified, notes from the input file(s) are copied to the output.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1).
Print to the standard error the names of input files as they are opened.
Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwaggbag’s output to a different location.
Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbag exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbag to exit with an error.
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the Aggregate Bag.
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.
Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 netblock, this policy is equivalent to force; otherwise it is equivalent to asv4.
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.
Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the Aggregate Bag.
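The per-address effect of the ignore, asv4, force, and only policies can be sketched with Python's ipaddress module. This is a simplification under stated assumptions. The sketch judges each address on its own, whereas SiLK applies the policy to whole flow records that are marked IPv4 or IPv6. mix is omitted because its behavior depends on the whole input. apply_policy is a hypothetical name.

```python
import ipaddress
from typing import Optional, Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def apply_policy(addr: IPAddr, policy: str) -> Optional[IPAddr]:
    """Return the address as it would be binned, or None if it is skipped.

    Simplification: each address is judged alone, not as part of a record.
    """
    if policy == "ignore":                      # keep IPv4 only
        return addr if addr.version == 4 else None
    if policy == "asv4":                        # IPv4 plus IPv4-mapped IPv6
        if addr.version == 4:
            return addr
        return addr.ipv4_mapped                 # None outside ::ffff:0:0/96
    if policy == "force":                       # everything becomes IPv6
        if addr.version == 6:
            return addr
        return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    if policy == "only":                        # keep IPv6 only
        return addr if addr.version == 6 else None
    raise ValueError(f"policy not covered by this sketch: {policy}")
```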
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
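The fallback order described above for the best method can be expressed as a small selection function. This is a hedged sketch with several assumptions. The function name is illustrative, and the set of libraries found at compile time is passed in as an argument. Returning none when best finds no library is assumed. The case where no method is specified at all, which uses the compiled-in default, is not modeled.

```python
from typing import AbstractSet

FALLBACK_ORDER = ("lzo1x", "snappy", "zlib")   # order described for "best"


def choose_method(requested: str, available: AbstractSet[str],
                  writing_to_file: bool) -> str:
    """Pick the compression method actually used for one output stream."""
    if requested == "best":
        if not writing_to_file:
            return "none"                 # best only compresses regular files
        for method in FALLBACK_ORDER:
            if method in available:
                return method
        return "none"                     # assumption: no library was found
    if requested != "none" and requested not in available:
        raise ValueError(f"compression method not available: {requested}")
    return requested                      # explicit methods apply everywhere
```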
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbag searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the names and descriptions of the keys and counters that may be used in the --key and --counter switches and exit. Since SiLK 3.22.0.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.
To create an Aggregate Bag that sums the packet count for each destination IP address in the SiLK Flow file data.rw:
$ rwaggbag --key=dipv6 --counter=sum-packets data.rw \
  | rwaggbagcat
To sum the number of records, the packet count, and the byte count of the flow records for each destination port and write the result to the file dport.aggbag:
$ rwaggbag --key=dport --counter=records,sum-packets,sum-bytes \
      --output-path=dport.aggbag data.rw
To count the number of records seen for each unique source port, destination port, and protocol:
$ rwaggbag --key=sport,dport,proto --counter=records data.rw \
  | rwaggbagcat
This environment variable allows the user to specify the country code mapping file that rwaggbag uses when mapping an IP to a country for the scc and dcc keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.
This environment variable is used as the value for --ipv6-policy when that switch is not provided.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbag may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwaggbag may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
Possible locations for the country code mapping file required by the scc and dcc keys.
rwaggbag and the other Aggregate Bag tools were introduced in SiLK 3.15.0.
rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), rwbag(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwset(1), rwuniq(1), ccfilter(3), sensor.conf(5), silk(7), yaf(1), zlib(3)
Create a binary aggregate bag from non-flow data
rwaggbagbuild [--fields=FIELDS] [--constant-field=FIELD=VALUE [--constant-field=FIELD=VALUE...]] [--column-separator=CHAR] [--no-titles] [--bad-input-lines=FILE] [--verbose] [--stop-on-error] [--note-add=TEXT] [--note-file-add=FILE] [--invocation-strip] [--compression-method=COMP_METHOD] [--output-path=PATH] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}
rwaggbagbuild --help
rwaggbagbuild --help-fields
rwaggbagbuild --version
rwaggbagbuild builds a binary Aggregate Bag file by reading one or more files containing textual input. To build an Aggregate Bag from SiLK Flow records, use rwaggbag(1).
An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.
rwaggbagbuild reads its input from the files named on the command line or from the standard input when no file names are specified, when --xargs is not present, and when the standard input is not a terminal. To read the standard input in addition to the named files, use - or stdin as a file name. When the --xargs switch is provided, rwaggbagbuild reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
The new Aggregate Bag file is written to the location specified by the --output-path switch. If it is not provided, output is sent to the standard output when it is not connected to a terminal.
The Aggregate Bag file must have at least one field that it considers a key field and at least one field that it considers a counter field. See the description of the --fields switch.
In general (and as detailed below), each line of the text input files becomes one entry in the Aggregate Bag file. It is also possible to specify that each entry in the Aggregate Bag file contains additional fields, each with a specific value. These fields are specified by the --constant-field switch whose argument is a field name, an equals sign (’=’), and a textual representation of a value. The named field becomes one of the key or counter fields in the Aggregate Bag file, and that field is given the specified value for each entry that is read from an input file. See the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.
The remainder of this section details how rwaggbagbuild processes each text input file to create an Aggregate Bag file.
When the --fields switch is specified, its argument specifies the key and counter fields that the new Aggregate Bag file is to contain. If --fields is not specified, the first line of the first input file is expected to contain field names, and those names determine the Aggregate Bag’s key and counter. A field name of ignore causes rwaggbagbuild to ignore the values in that field when parsing the input.
The textual input is processed one line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. After removing the comments, any line that is blank or contains only whitespace is ignored.
All other lines must contain valid input, which is a set of fields separated by a delimiter. The default delimiter is the vertical bar (’|’) and may be changed with the --column-separator switch. Whitespace around a delimiter is allowed; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.
The first line of each input file may contain delimiter-separated field names denoting in which order the fields appear in this input file. As mentioned above, when the --fields switch is not given, the first line of the first file determines the Aggregate Bag’s key and counter. To tell rwaggbagbuild to treat the first line of each file as field values to be parsed, specify the --no-titles switch.
Every other line must contain delimiter-separated field values. A delimiter may follow the final field on a line. rwaggbagbuild ignores lines that contain either too few or too many fields.
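The line-level rules above can be sketched in Python: stripping comments, skipping blank lines, splitting on the delimiter, allowing an optional trailing delimiter, and discarding lines with the wrong number of fields. This is an illustrative model, not rwaggbagbuild's parser. The function name parse_lines is hypothetical. Title-line detection and value validation are not modeled, and rejected lines are simply skipped.

```python
from typing import Iterator, List, Tuple


def parse_lines(text: str, nfields: int,
                sep: str = "|") -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, fields) for each line that has exactly nfields."""
    if sep in ("\n", "#"):
        raise ValueError("newline and '#' cannot be used as the separator")
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]            # drop the comment, if any
        if not line.strip():
            continue                           # blank or whitespace-only
        fields = line.split(sep)
        if sep not in (" ", "\t"):
            # whitespace around a non-blank separator is allowed
            fields = [f.strip() for f in fields]
        if len(fields) == nfields + 1 and fields[-1] == "":
            fields.pop()                       # optional trailing separator
        if len(fields) != nfields:
            continue                           # too few or too many fields
        yield lineno, fields
```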
See the description of the --fields switch in the OPTIONS section for the names of the fields and the acceptable forms of the textual input for each field.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Specify the fields in the input files. FIELDS is a comma separated list of field names. Field names are case-insensitive, and a name may be abbreviated to the shortest unique prefix. Other than the ignore field, a field name may not be specified more than once. The Aggregate Bag file must have at least one key field and at least one counter field.
The names of the fields that are considered key fields, their descriptions, and the format of the input that each expects are:
field that rwaggbagbuild is to skip
source IP address, IPv4 only; either the canonical dotted-quad format or an integer from 0 to 4294967295 inclusive
destination IP address, IPv4 only; uses the same format as sIPv4
next hop IP address, IPv4 only; uses the same format as sIPv4
a generic IPv4 address; uses the same format as sIPv4
source IP address, IPv6 only; the canonical hex-encoded format for IPv6 addresses
destination IP address, IPv6 only; uses the same format as sIPv6
next hop IP address, IPv6 only; uses the same format as sIPv6
a generic IPv6 address; uses the same format as sIPv6
source port; an integer from 0 to 65535 inclusive
destination port; an integer from 0 to 65535 inclusive
a generic port; an integer from 0 to 65535 inclusive
IP protocol; an integer from 0 to 255 inclusive
packet count; an integer from 1 to 4294967295 inclusive
byte count; an integer from 1 to 4294967295 inclusive
bit-wise OR of TCP flags over all packets; a string containing F, S, R, P, A, U, E, C in upper- or lowercase
TCP flags on the first packet; uses the same form as flags
bit-wise OR of TCP flags on the second through final packet; uses the same form as flags
starting time in seconds; uses the form YYYY/MM/DD[:hh[:mm[:ss[.sss]]]] (any fractional seconds value is dropped). A T may be used in place of : to separate the day and hour fields. A floating point value between 536870912 and 2147483647 is also allowed and is treated as seconds since the UNIX epoch.
ending time in seconds; uses the same format as sTime
a generic time in seconds; uses the same format as sTime
duration of flow; a floating point value from 0.0 to 4294967.295
sensor name or ID at the collection point; a string as given in silk.conf(5)
class at collection point; a string as given in silk.conf
type at collection point; a string as given in silk.conf
router SNMP ingress interface or vlanId; an integer from 0 to 65535
router SNMP egress interface or postVlanId; an integer from 0 to 65535
a generic SNMP value; an integer from 0 to 65535
flow attributes set by the flow generator:
all the packets in this flow record are exactly the same size
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
flow generator prematurely created a record for a long-running connection due to a timeout or a byte-count threshold
flow generator created a record as a continuation of a previous record for a connection that exceeded a timeout or byte-count threshold
guess as to the content of the flow; an integer from 0 to 65535 inclusive
ICMP type; an integer from 0 to 255 inclusive
ICMP code; an integer from 0 to 255 inclusive
the country code of the source; accepts a two character string to use as the country of the source IP. The code is not checked for validity against the country_codes.pmap file. The code must be ASCII and it may contain two letters, a letter followed by a number, or the string --. Since SiLK 3.19.0.
the country code of the destination. See scc. Since SiLK 3.19.0.
a generic country code. See scc. Since SiLK 3.19.0.
a generic key; an integer from 0 to 4294967295 inclusive
The names and descriptions of the fields that are considered counter fields are listed next. For each, the type of input is an unsigned 64-bit number; that is, an integer from 0 to 18446744073709551615.
count of records that match the key
sum of packet counts
sum of byte counts
sum of duration values
a generic counter
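Two of the textual forms accepted by --fields can be checked with a short Python sketch: the sTime-style timestamp and the scc-style country code. The sketch assumes that timestamps are interpreted as UTC and that the 536870912 to 2147483647 range is inclusive. It does not validate the day of the month. The function names are hypothetical.

```python
import calendar
import re

# YYYY/MM/DD[:hh[:mm[:ss[.sss]]]], with T allowed between day and hour
_TIME_RE = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.\d*)?)?)?)?",
    re.ASCII)


def parse_time(text: str) -> int:
    """Return whole seconds since the UNIX epoch (dates read as UTC)."""
    match = _TIME_RE.fullmatch(text.strip())
    if match:
        year, month, day, hour, minute, second = (
            int(g) if g else 0 for g in match.groups())
        # any fractional seconds were matched above but are dropped here
        return calendar.timegm((year, month, day, hour, minute, second))
    value = float(text)              # ValueError for text that is not a number
    if not 536870912 <= value <= 2147483647:
        raise ValueError(f"epoch seconds out of range: {text}")
    return int(value)


def valid_country_code(text: str) -> bool:
    """Two ASCII characters: two letters, a letter then a digit, or '--'."""
    if text == "--":
        return True
    return (len(text) == 2 and text.isascii()
            and text[0].isalpha() and text[1].isalnum())
```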
For each entry (row) read from the input file(s), insert or replace a field named FIELD and set its value to VALUE. VALUE is a textual representation of the field’s value as described in the description of the --fields switch above. When FIELD is a counter field and the same key appears multiple times in the input, VALUE is added to the counter multiple times. If a field named FIELD appears in an input file, its value from that file is ignored. Specify the --constant-field switch multiple times to insert multiple fields.
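The effect of --constant-field on each input row can be modeled as follows. The constant replaces any value read for that field, and counter fields are summed whenever the same key recurs. This is a hedged sketch. The rows are assumed to be already-split field lists, and counter names are limited to records, sum-packets, sum-bytes, and sum-duration. build_aggbag is a hypothetical name.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# Counter names handled by this sketch; every other named field is a key.
COUNTER_FIELDS = {"records", "sum-packets", "sum-bytes", "sum-duration"}

Key = Tuple[Tuple[str, str], ...]


def build_aggbag(rows: Iterable[List[str]], fields: List[str],
                 constants: Mapping[str, str]) -> Dict[Key, Dict[str, int]]:
    """Build {key: {counter: total}} from parsed rows plus constant fields."""
    bag: Dict[Key, Dict[str, int]] = {}
    for row in rows:
        values = {name: value for name, value in zip(fields, row)
                  if name != "ignore"}
        values.update(constants)        # a constant overrides any input value
        key = tuple(sorted((n, v) for n, v in values.items()
                           if n not in COUNTER_FIELDS))
        entry = bag.setdefault(key, {})
        for name, value in values.items():
            if name in COUNTER_FIELDS:
                entry[name] = entry.get(name, 0) + int(value)
    return bag
```

With fields ignore,dport,ignore,ignore and the constant records=1, every row for the same destination port adds 1 to that port's records counter. This is how the records column in the rec.txt example is produced.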
When reading textual input, use the character CHAR as the delimiter between columns (fields) in the input. The default column separator is the vertical pipe (’|’). rwaggbagbuild normally ignores whitespace (space and tab) around the column separator; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and ’#’ is not a valid delimiter since it begins a comment.
When parsing textual input, copy any lines that cannot be parsed to FILEPATH. The strings stdout and stderr may be used for the standard output and standard error, respectively. Each bad line is prepended by the name of the source input file, a colon, the line number, and a colon. On exit, rwaggbagbuild removes FILEPATH if all input lines were successfully parsed.
When a textual input line fails to parse, print a message to the standard error describing the problem. When this switch is not specified, parsing failures are not reported. rwaggbagbuild continues to process the input after printing the message. To stop processing when a parsing error occurs, use --stop-on-error.
When a textual input line fails to parse, print a message to the standard error describing the problem and exit the program. When this occurs, the output file contains any records successfully created prior to reading the bad input line. The default behavior of rwaggbagbuild is to silently ignore parsing errors. To report parsing errors and continue processing the input, use --verbose.
Parse the first line of the input as field values. Normally when the --fields switch is specified, rwaggbagbuild examines the first line to determine if the line contains the names (titles) of fields and skips the line if it does. rwaggbagbuild exits with an error when --no-titles is given but --fields is not.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Do not record the command used to create the Aggregate Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1).
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagbuild to exit with an error.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagbuild searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbagbuild opens each named file in turn and reads text from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the names and descriptions of the keys and counters that may be used in the --fields and --constant-field switches and exit. Since SiLK 3.22.0.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.
Assume the following textual data in the file rec.txt:
            dIP|dPort| packets|   bytes|
  10.245.15.175|   80|     127|   12862|
192.168.251.186|29222|     131|  351213|
 10.247.186.130|   80|     596|   38941|
192.168.239.224|29362|     600|  404478|
192.168.215.219|   80|     400|   32375|
  10.255.252.19|28925|     404| 1052274|
192.168.255.249|   80|     112|    7412|
   10.208.7.238|29246|     109|  112977|
192.168.254.127|   80|     111|    9759|
  10.218.34.108|29700|     114|  461845|
To create an Aggregate Bag file from this data, provide the --fields switch with the names used by the Aggregate Bag tools:
$ rwaggbagbuild --fields=dipv4,dport,sum-packets,sum-bytes \
      --output-path=ab.aggbag rec.txt
Use the rwaggbagcat(1) tool to view it:
$ rwaggbagcat ab.aggbag
          dIPv4|dPort| sum-packets| sum-bytes|
   10.208.7.238|29246|         109|    112977|
  10.218.34.108|29700|         114|    461845|
  10.245.15.175|   80|         127|     12862|
 10.247.186.130|   80|         596|     38941|
  10.255.252.19|28925|         404|   1052274|
192.168.215.219|   80|         400|     32375|
192.168.239.224|29362|         600|    404478|
192.168.251.186|29222|         131|    351213|
192.168.254.127|   80|         111|      9759|
192.168.255.249|   80|         112|      7412|
To count the number of times each destination port appears in rec.txt, ignore every field except dPort and use --constant-field to add a records counter whose value is 1 for each input line:
$ rwaggbagbuild --fields=ignore,dport,ignore,ignore \
      --constant-field=record=1 rec.txt \
  | rwaggbagcat
dPort| records|
   80|       5|
28925|       1|
29222|       1|
29246|       1|
29362|       1|
29700|       1|
Alternatively, use rwaggbagtool(1) to get the same information from the ab.aggbag file created above:
$ rwaggbagtool --select-fields=dport \
      --insert-field=record=1 ab.aggbag \
  | rwaggbagcat
dPort| records|
   80|       5|
28925|       1|
29222|       1|
29246|       1|
29362|       1|
29700|       1|
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagbuild may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagbuild may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwaggbag(1), rwaggbagcat(1), rwaggbagtool(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk.conf(5), silk(7), zlib(3)
rwaggbagbuild and the other Aggregate Bag tools were introduced in SiLK 3.15.0.
Output a binary Aggregate Bag file as text
rwaggbagcat [--fields=FIELDS [--missing-field=FIELD=STRING [--missing-field=FIELD=STRING...]]] [--timestamp-format=FORMAT] [--ip-format=FORMAT] [--integer-sensors] [--integer-tcp-flags] [--no-titles] [--no-columns] [--column-separator=C] [--no-final-delimiter] [{--delimited | --delimited=C}] [--output-path=PATH] [--pager=PAGER_PROG] [--site-config-file=FILENAME] [AGGBAGFILE [AGGBAGFILE...]]
rwaggbagcat --help
rwaggbagcat --help-fields
rwaggbagcat --version
rwaggbagcat reads a binary Aggregate Bag as created by rwaggbag(1) or rwaggbagbuild(1), converts it to text, and outputs it to the standard output, the pager, or the specified file.
As of SiLK 3.22.0, rwaggbagcat accepts a --fields switch to control the order in which the fields are printed.
rwaggbagcat reads the AGGBAGFILEs specified on the command line; if no AGGBAGFILE arguments are given, rwaggbagcat attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as an AGGBAGFILE name. If any input does not contain an Aggregate Bag file, rwaggbagcat prints an error to the standard error and exits abnormally.
When multiple AGGBAGFILEs are specified on the command line, each is handled individually. To process the files as a single Aggregate Bag, use rwaggbagtool(1) to combine the Aggregate Bags and pipe the output of rwaggbagtool into rwaggbagcat. Using --fields in this situation produces consistent output across the multiple files and causes the titles to appear only once. If --fields names a key or counter that is not present in one of the files, an empty value (or the string given by --missing-field) is printed for that field.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Print only the key and/or counter fields given in this comma-separated list. Fields are printed in the order given in FIELDS, and keys and counters may appear in any order or not at all. Specifying --fields changes only which columns are printed and their order; it does not re-order the entries (rows) in the Aggregate Bag file. If FIELDS includes a field that is not present in an input Aggregate Bag file, rwaggbagcat prints the string specified for that field by --missing-field, or an empty value when no such string was given. The title line is printed only one time even if multiple Aggregate Bag files are read.
The names of the fields that may appear in FIELDS are:
source IP address, IPv4 only
destination IP address, IPv4 only
next hop IP address, IPv4 only
a generic IPv4 address
source IP address, IPv6 only
destination IP address, IPv6 only
next hop IP address, IPv6 only
a generic IPv6 address
source port
destination port
a generic port
IP protocol
packet count
byte count
bit-wise OR of TCP flags over all packets
TCP flags on the first packet
bit-wise OR of TCP flags on the second through final packet
starting time in seconds
ending time in seconds
a generic time in seconds
duration of flow
sensor name or ID at the collection point
class at collection point
type at collection point
router SNMP ingress interface or vlanId
router SNMP egress interface or postVlanId
a generic SNMP value
flow attributes set by the flow generator
guess as to the content of the flow
ICMP type
ICMP code
the country code of the source
the country code of the destination
a generic country code
a generic key
counter: count of records that match the key
counter: sum of packet counts
counter: sum of byte counts
counter: sum of duration values
counter: a generic counter
Since SiLK 3.22.0.
When --fields is active, print STRING as the value for FIELD when FIELD is not present in the input Aggregate Bag file. The default value is the empty string. The switch may be repeated to set the missing value string for multiple fields. rwaggbagcat exits with an error if FIELD is not present in --fields or if this switch is specified but --fields is not. STRING may be any string. Since SiLK 3.22.0.
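The interplay of --fields and --missing-field can be sketched in Python. The sketch makes several assumptions. Entries are dicts of already-formatted strings drawn from one or more Aggregate Bag files. Columns are not padded. The title uses the names exactly as given, whereas rwaggbagcat prints its own field names. render is a hypothetical name.

```python
from typing import Iterable, List, Mapping, Optional


def render(entries: Iterable[Mapping[str, str]], fields: List[str],
           missing: Optional[Mapping[str, str]] = None,
           sep: str = "|") -> List[str]:
    """Print the title once, then one line per entry in the order of fields."""
    missing = dict(missing or {})
    unknown = set(missing) - set(fields)
    if unknown:
        # mirrors the documented error when FIELD is not listed in --fields
        raise ValueError(f"missing-field names not in fields: {sorted(unknown)}")
    lines = [sep.join(fields) + sep]            # single title line
    for entry in entries:
        cells = [entry.get(f, missing.get(f, "")) for f in fields]
        lines.append(sep.join(cells) + sep)
    return lines
```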
Specify the format, timezone, and/or modifier to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and modifier. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a modifier. The format is one of:
Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.
Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.
Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:
Use Coordinated Universal Time to print timestamps.
Use the TZ environment variable or the local timezone.
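The three calendar layouts and the epoch form can be reproduced with Python's datetime module. The style labels used here (slash-T, dash-space, month-first, epoch) are hypothetical and are not the keywords that rwaggbagcat accepts. Only UTC is handled. The epoch style prints whole seconds, which is an assumption about how the fractional part is shown.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical labels -> strftime layouts matching the formats listed above.
PATTERNS = {
    "slash-T": "%Y/%m/%dT%H:%M:%S",     # YYYY/MM/DDThh:mm:ss.sss
    "dash-space": "%Y-%m-%d %H:%M:%S",  # YYYY-MM-DD hh:mm:ss.sss
    "month-first": "%m/%d/%Y %H:%M:%S", # MM/DD/YYYY hh:mm:ss.sss
}


def format_timestamp(epoch_ms: int, style: str) -> str:
    """Format milliseconds since the UNIX epoch as UTC text."""
    if style == "epoch":
        return str(epoch_ms // 1000)    # whole seconds since 1970-01-01
    when = EPOCH + timedelta(milliseconds=epoch_ms)
    return when.strftime(PATTERNS[style]) + f".{when.microsecond // 1000:03d}"
```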
Specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical.
Print IP addresses in the canonical format. If the column is IPv4, use dot-separated decimal (192.0.2.1). If the column is IPv6, use colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).
Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.
Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.
Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.
Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, this argument may be combined with any of the above, including decimal and hexadecimal.
The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.
Change an IPv4 column to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.
For an IPv6 column, change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.
The following argument is also available:
Set FORMAT to map-v4,no-mixed.
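The integer, fixed-width, and IPv4-mapping conversions described above can be reproduced with Python's standard ipaddress module. The sketch below is a model, not SiLK's implementation, and all function names are hypothetical. The canonical and no-mixed forms are not modeled because Python's own string form for IPv4-mapped addresses differs between Python versions.

```python
import ipaddress


def as_decimal(addr):
    """The address as an unsigned integer in base 10."""
    return str(int(ipaddress.ip_address(addr)))


def as_hexadecimal(addr):
    """The address as an unsigned integer in base 16, without leading zeros."""
    return format(int(ipaddress.ip_address(addr)), "x")


def pad_with_zeros(addr):
    """Fixed-width form: 3-digit octets for IPv4, eight 4-digit groups for IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ".".join(f"{octet:03d}" for octet in ip.packed)
    # Build the groups from the integer so the mixed IPv4-IPv6 form never appears.
    hex32 = f"{int(ip):032x}"
    return ":".join(hex32[i:i + 4] for i in range(0, 32, 4))


def map_into_ipv6(addr):
    """Place an IPv4 address in the ::ffff:0:0/96 netblock."""
    return ipaddress.IPv6Address(0xFFFF_0000_0000 | int(ipaddress.IPv4Address(addr)))


def unmap_ipv4_mapped(addr):
    """Return the IPv4 address for ::ffff:0:0/96 input; other IPv6 is unchanged."""
    ip = ipaddress.IPv6Address(addr)
    mapped = ip.ipv4_mapped
    return mapped if mapped is not None else ip


print(as_decimal("192.0.2.1"))            # 3221225985
print(as_hexadecimal("2001:db8::1"))      # 20010db8000000000000000000000001
print(pad_with_zeros("::ffff:192.0.2.1")) # 0000:0000:0000:0000:0000:ffff:c000:0201
```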
Print the integer ID of the sensor rather than its name.
Print the TCP flag fields (flags, initialFlags, sessionFlags) as integer values instead of as the characters F,S,R,P,A,U,E,C that normally represent the TCP flags.
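The letters F,S,R,P,A,U,E,C are listed in the bit order of the TCP header flags: FIN=1, SYN=2, RST=4, PSH=8, ACK=16, URG=32, ECE=64, CWR=128. This Python sketch converts between the two notations. It assumes that the integer printed by this switch is the standard TCP flag byte, and the helper names are hypothetical.

```python
# Standard TCP header flag bits, in the F,S,R,P,A,U,E,C order used above.
# Assumption: the integer output is this standard flag byte.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def flags_to_int(flags):
    """Convert a flag string such as 'SA' to an integer; spaces are ignored."""
    value = 0
    for ch in flags.replace(" ", "").upper():
        value |= FLAG_BITS[ch]
    return value


def int_to_flags(value):
    """Convert an integer (0-255) to flag letters in F,S,R,P,A,U,E,C order."""
    return "".join(ch for ch, bit in FLAG_BITS.items() if value & bit)


print(flags_to_int("SA"))  # 18  (SYN 2 + ACK 16)
print(int_to_flags(27))    # FSPA  (1 + 2 + 8 + 16)
```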
Turn off column titles. By default, titles are printed.
Disable fixed-width columnar output.
Use the specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.
Do not print the column separator after the final column. Normally a delimiter is printed.
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwaggbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.
When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagcat searches for the site configuration file in the locations specified in the FILES section.
Print the available options and exit.
Print the names and descriptions of the keys and counters that may be used in the --fields and --missing-field switches and exit. Since SiLK 3.22.0.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
The formatting switches on rwaggbagcat are similar to those on the other SiLK tools.
First, use rwaggbag(1) to create an Aggregate Bag file from the SiLK Flow file data.rw:
$ rwaggbag --key=sport,dport --counter=sum-pack,sum-byte \
      --output-path=ab.aggbag data.rw
To print the contents of the Aggregate Bag file:
$ rwaggbagcat ab.aggbag | head -4
sPort|dPort| sum-packets| sum-bytes|
    0|    0|       73452|   6169968|
    0|  769|       15052|    842912|
    0|  771|       14176|    793856|
Use the --fields switch (added in SiLK 3.22.0) to control the order of the columns in the output or to select only some columns:
$ rwaggbagcat --fields=dPort,sPort,sum-bytes ab.aggbag | head -4
dPort|sPort| sum-bytes|
    0|    0|   6169968|
  769|    0|    842912|
  771|    0|    793856|
The --fields switch only changes the positions of the columns. The sPort field is still the primary key in the output shown above.
The --fields switch may also include fields that are not in the input. By default, rwaggbagcat prints an empty value for those fields, but the --missing-field switch may be used to display any string instead. The argument to --missing-field is FIELD=STRING where FIELD is one of the fields in --fields.
$ rwaggbagcat --fields=sipv4,proto,dport,sum-bytes \
      --missing=sipv4=n/a ab.aggbag | head -4
sIPv4|pro|dPort| sum-bytes|
  n/a|   |    0|   6169968|
  n/a|   |  769|    842912|
  n/a|   |  771|    793856|
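The column handling of --fields and --missing-field can be modeled as a per-row projection. The following Python sketch applies that model to the rows printed in the examples above. It is an illustration rather than rwaggbagcat's code, and the function name is hypothetical. It also simplifies field names: it matches them exactly, whereas SiLK accepts forms such as sipv4 and dport.

```python
def project(rows, input_fields, output_fields, missing=None):
    """Hypothetical model of --fields and --missing-field.

    rows:          tuples in the Aggregate Bag's own column order
    input_fields:  the name of each column in those tuples
    output_fields: the names given to --fields, in output order
    missing:       {FIELD: STRING} from --missing-field; default is ""
    """
    missing = missing or {}
    index = {name: i for i, name in enumerate(input_fields)}
    # Rows keep their original order; only the columns within each row move.
    return [
        tuple(str(row[index[f]]) if f in index else missing.get(f, "")
              for f in output_fields)
        for row in rows
    ]


rows = [(0, 0, 73452, 6169968), (0, 769, 15052, 842912), (0, 771, 14176, 793856)]
fields = ["sPort", "dPort", "sum-packets", "sum-bytes"]
print(project(rows, fields, ["dPort", "sPort", "sum-bytes"]))
print(project(rows, fields, ["sIPv4", "proto", "dPort", "sum-bytes"], {"sIPv4": "n/a"}))
```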
When creating an Aggregate Bag file with the source IP address and protocol as keys, rwaggbagcat prints the columns in a different order depending on whether the address is treated as IPv4 or IPv6.
When the key is the source IPv4 address and the protocol, the Aggregate Bag is built with the source address as the primary key:
$ rwaggbag --key=sipv4,proto --counter=records data.rw \
      | rwaggbagcat
       sIPv4|pro| records|
 10.4.52.235|  6|       1|
10.5.231.251|  6|       1|
 10.9.77.117|  6|       1|
Reading the same file but treating the data as IPv6 results in the protocol being the primary key:
$ rwaggbag --key=sipv6,proto --counter=records data.rw \
      | rwaggbagcat
pro|               sIPv6| records|
  1|::ffff:10.40.151.242|       1|
  1|::ffff:10.44.140.138|       1|
  1| ::ffff:10.53.204.62|       1|
In the latter case, the --fields switch may be used to display the source IPv6 address first, but the switch only changes the positions of the columns; it does not reorder the entries (rows):
$ rwaggbag --key=sipv6,proto --counter=records data.rw \
      | rwaggbagcat --fields=sipv6,proto,records
               sIPv6|pro| records|
::ffff:10.40.151.242|  1|       1|
::ffff:10.44.140.138|  1|       1|
 ::ffff:10.53.204.62|  1|       1|
To produce comma-separated data:
$ rwaggbagcat --delimited=, ab.aggbag | head -4
sPort,dPort,sum-packets,sum-bytes
0,0,73452,6169968
0,769,15052,842912
0,771,14176,793856
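Because --delimited removes the column padding and the final delimiter, its output can be loaded with an ordinary CSV reader. The Python sketch below parses the comma-separated sample shown above with the standard csv module. The function name is hypothetical, and the sketch assumes the title line is present. With --no-title, pass the column names to csv.DictReader through its fieldnames argument instead. The same function could read a pipe, for example by calling it on sys.stdin.read() in a script that receives rwaggbagcat output.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse `rwaggbagcat --delimited=...` output into a list of dicts.

    Purely numeric columns (ports, counters) become ints; others stay strings.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{name: int(v) if v.isdigit() else v for name, v in row.items()}
            for row in reader]


sample = """sPort,dPort,sum-packets,sum-bytes
0,0,73452,6169968
0,769,15052,842912
0,771,14176,793856
"""
rows = read_delimited(sample)
print(sum(r["sum-bytes"] for r in rows))  # 7806736
```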
To remove the title:
$ rwaggbagcat --no-title ab.aggbag | head -4
    0|    0|       73452|   6169968|
    0|  769|       15052|    842912|
    0|  771|       14176|    793856|
    0| 2048|       14356|   1205904|
To change the format of IP addresses:
$ rwaggbag --key=sipv4,dipv4 --counter=sum-pack,sum-byte data.rw \
      | rwaggbagcat --ip-format=decimal | head -4
    sIPv4|     dIPv4| sum-packets| sum-bytes|
168047851|3232295339|         255|     18260|
168159227|3232293505|         331|    536169|
168381813|3232282689|         563|     55386|
To change the format of timestamps:
$ rwaggbag --key=stime,etime --counter=sum-pack,sum-byte data.rw \
      | rwaggbagcat --timestamp-format=epoch | head -4
     sTime|     eTime| sum-packets| sum-bytes|
1234396802|1234396802|           2|       259|
1234396802|1234398594|         526|     38736|
1234396803|1234396803|           9|       504|
This environment variable is used as the value for --ip-format when that switch is not provided.
This environment variable is used as the value for --timestamp-format when that switch is not provided.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
When set to a non-empty string, rwaggbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwaggbagcat does not automatically page its output.
When set and SILK_PAGER is not set, rwaggbagcat automatically invokes this program to display its output a screen at a time.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagcat may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwaggbagcat may use this environment variable. See the FILES section for details.
When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwaggbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwaggbagcat --version.)
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
The --fields, --missing-field, and --help-fields switches were added in SiLK 3.22.0.
rwaggbagcat and the other Aggregate Bag tools were introduced in SiLK 3.15.0.
rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), silk(7), tzset(3), environ(7)
Manipulate binary Aggregate Bag files
rwaggbagtool [{ --remove-fields=REMOVE_LIST | --select-fields=SELECT_LIST | --to-bag=BAG_KEY,BAG_COUNTER | --to-ipset=FIELD [--ipset-record-version=VERSION] }] [--insert-field=FIELD=VALUE [--insert-field=FIELD2=VALUE2...]] [{ --add | --subtract | --divide }] [--zero-divisor-result={error | remove | maximum | VALUE}] [--scalar-multiply={VALUE | FIELD=VALUE} [--scalar-multiply={VALUE | FIELD=VALUE}...]] [--min-field=FIELD=VALUE [--min-field=FIELD=VALUE...]] [--max-field=FIELD=VALUE [--max-field=FIELD=VALUE...]] [--set-intersect=FIELD=FILE [--set-intersect=FIELD=FILE...]] [--set-complement=FIELD=FILE [--set-complement=FIELD=FILE...]] [--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]] [--note-strip] [--note-add=TEXT] [--note-file-add=FILE] [--compression-method=COMP_METHOD] [--site-config-file=FILENAME] [AGGBAG_FILE [AGGBAG_FILE ...]]
rwaggbagtool --help
rwaggbagtool --help-fields
rwaggbagtool --version
rwaggbagtool performs operations on one or more Aggregate Bag files and creates a new Aggregate Bag file, a new Bag file, or a new IPset file. An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. rwaggbag(1) and rwaggbagbuild(1) are the primary tools used to create an Aggregate Bag file. rwaggbagcat(1) prints a binary Aggregate Bag file as text.
The operations that rwaggbagtool supports are field manipulation (inserting or removing keys or counters), adding, subtracting, and dividing the counters of multiple Aggregate Bag files (all files must have the same key and counter fields), multiplying all counters or only selected counters by a value, intersecting with an IPset, selecting rows based on minimum and maximum values of keys and counters, and creating a new IPset or Bag file.
rwaggbagtool processes the Aggregate Bag files listed on the command line. When no file names are specified, rwaggbagtool attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as a file name. If any input is not an Aggregate Bag file, rwaggbagtool prints an error to the standard error and exits with an error status.
By default, rwaggbagtool’s output is written to the standard output. Use --output-path to specify a different location. As of SiLK 3.21.0, rwaggbagtool supports the --modify-inplace switch, which correctly handles the case when an input file is also used as the output file. That switch causes rwaggbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname to which the original output file is moved.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The options are presented here in the order in which rwaggbagtool performs them: Field manipulation switches are applied to each file when it is read; multi-file operation switches combine the Aggregate Bags together; single-file operation switches are applied; filtering switches remove rows from the Aggregate Bag; the result is output as an Aggregate Bag, a standard Bag, or as an IPset.
The following switches allow modification of the fields in the Aggregate Bag file. The --remove-fields and --select-fields switches are mutually exclusive, and they reduce the number of fields in the Aggregate Bag input files. Those switches also conflict with --to-ipset and --to-bag which resemble field selectors. The --insert-field switch is applied after --remove-fields or --select-fields, and it adds a field unless that field is already present.
Remove the fields specified in REMOVE_LIST from each of the Aggregate Bag input files, where REMOVE_LIST is a comma-separated list of field names. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. If a field name is included in this list and in a --insert-field switch, the field is given the value specified by the --insert-field switch, and the field is included in the output Aggregate Bag file. If removing a key field produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --select-fields, --to-ipset, or --to-bag.
For each Aggregate Bag input file, only use the fields in SELECT_LIST, a comma-separated list of field names. Alternatively, consider this switch as removing all fields that are not included in SELECT_LIST. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. When a field name is included in this list and in a --insert-field switch, the field uses its value from the input Aggregate Bag file if present, and it uses the value specified in the --insert-field switch otherwise. If selecting only some key fields produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --remove-fields, --to-ipset, or --to-bag.
For each entry read from an Aggregate Bag input file, insert a field named FIELD and set its value to VALUE if one of the following is true: (1) the input file does not contain a field named FIELD, or (2) the input file does have a field named FIELD but it was removed by either (2a) being listed in the --remove-fields list or (2b) not being listed in the --select-fields list. That is, this switch only inserts FIELD when FIELD is not present in the input Aggregate Bag, but specifying FIELD in --remove-fields removes it from the input. VALUE is a textual representation of the field’s value as described for the --fields switch of rwaggbagbuild(1). This switch may be repeated in order to insert multiple fields. If --to-ipset or --to-bag is specified, --insert-field may only name a field that is an argument to that switch.
The following operations act on multiple Aggregate Bag files. These operations require all of the Aggregate Bag files to have the same set of key fields and counter fields. (Use the field manipulation switches to ensure this.) The values of the keys may differ, but the set of fields that comprise the key must match. It is an error if multiple operations are specified.
Sum each of the counters for each key across all the Aggregate Bag input files. The keys in the result are the union of the keys that appear in the input files. Sums that overflow an unsigned 64-bit value are set to the maximum (18446744073709551615). If no other operation is specified, the add operation is the default.
Subtract from the counters in the first Aggregate Bag file the counters in the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, its counters in the first file are unchanged. Subtraction operations that result in a negative value are set to zero. If all counters for a key are zero, the key does not appear in the output.
Divide the counters in the first Aggregate Bag file by the counters in the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, its counters are treated as zero and the outcome is determined by the action specified by --zero-divisor-result. That option also determines the result when the two Aggregate Bag files have matching keys but a counter in the second file is zero. If --zero-divisor-result is not given, rwaggbagtool exits with an error if division by zero is detected. Since Aggregate Bag files do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). Since SiLK 3.22.0.
While not an operation, the next switch is related to --divide and is described here.
Specify how to handle division by zero in the --divide operation, which can occur either because the first Aggregate Bag file (the dividend) contains a key that does not exist in the second file (the divisor) or because an individual counter in the divisor is zero. The supported arguments are:
Causes rwaggbagtool to exit with an error. This is the default when --zero-divisor-result is not given.
Tells rwaggbagtool to remove this key from the output.
Tells rwaggbagtool to leave the individual counter in the first Aggregate Bag unchanged.
Sets the individual counter to the maximum value supported, which is the maximum unsigned 64-bit value (18446744073709551615).
Sets the individual counter to VALUE, which can be any unsigned 64-bit value (0 to 18446744073709551615 inclusive).
This switch has no effect when --divide is not used. Since SiLK 3.22.0.
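The following Python sketch models the --add, --subtract, and --divide rules described above. It is not rwaggbagtool's implementation, and the function names are hypothetical. It assumes that an Aggregate Bag can be represented as a dict that maps a key tuple to a tuple of counters, and that all inputs share the same fields. Only the error, remove, maximum, and VALUE choices of --zero-divisor-result are modeled. The choice that leaves the dividend's counter unchanged is omitted because its keyword is not given in this text. The last lines repeat the percentage example from EXAMPLES with made-up counts: multiply by 100, then divide, with 37.5 rounding up to 38.

```python
# Hypothetical model: bag = {key_tuple: counter_tuple}
MAX_U64 = 2**64 - 1  # 18446744073709551615


def add(bags):
    """--add: union of all keys; counters summed and capped at MAX_U64."""
    result = {}
    for bag in bags:
        for key, counters in bag.items():
            prev = result.get(key, (0,) * len(counters))
            result[key] = tuple(min(a + b, MAX_U64) for a, b in zip(prev, counters))
    return result


def subtract(first, *others):
    """--subtract: keys come only from the first bag; negatives become 0."""
    result = dict(first)
    for bag in others:
        for key in result:
            if key in bag:  # keys missing from this bag are left unchanged
                result[key] = tuple(max(a - b, 0) for a, b in zip(result[key], bag[key]))
    # A key whose counters are all zero does not appear in the output.
    return {key: ctrs for key, ctrs in result.items() if any(ctrs)}


def round_div(a, b):
    """Non-negative integer a / b rounded to nearest; .5 rounds up."""
    return (2 * a + b) // (2 * b)


def divide(first, *others, zero_divisor="error"):
    """--divide; zero_divisor models 'error', 'remove', 'maximum', or an int VALUE."""
    result = dict(first)
    for bag in others:
        for key in list(result):
            dividend = result[key]
            # A key absent from the divisor bag is treated as all-zero counters.
            divisor = bag.get(key, (0,) * len(dividend))
            quotient = []
            for a, b in zip(dividend, divisor):
                if b != 0:
                    quotient.append(round_div(a, b))
                elif zero_divisor == "error":
                    raise ZeroDivisionError(f"zero divisor for key {key!r}")
                elif zero_divisor == "remove":
                    quotient = None  # drop the whole key
                    break
                elif zero_divisor == "maximum":
                    quotient.append(MAX_U64)
                else:
                    quotient.append(int(zero_divisor))
            if quotient is None:
                del result[key]
            else:
                result[key] = tuple(quotient)
    return result


# Percentage example with made-up counts (key = sport, dport, proto).
inweb = {(80, 1234, 6): (3,)}
total = {(80, 1234, 6): (8,)}
scaled = {k: tuple(100 * c for c in v) for k, v in inweb.items()}
print(divide(scaled, total))  # {(80, 1234, 6): (38,)}
```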
The following switch modifies the counters in an Aggregate Bag file. The operation may be combined with any of those from the previous section. This operation occurs after the above and before any filtering operation.
Multiply all counter fields or one counter field by a value. If the argument is a positive integer value (1 or greater), multiply all counters by that value. If the argument contains an equals sign, treat the part to the left as a counter’s field name and the part to the right as the multiplier for that field: a non-negative integer value (0 or greater). The maximum VALUE is 18446744073709551615. This switch may be repeated; when a counter name is repeated or the all-counters form is repeated, the final multiplier is the product of all the values. Since SiLK 3.22.0.
The following switches remove entries from the Aggregate Bag file based on a field’s value. These switches are applied immediately before the output is generated.
Remove from the Aggregate Bag file all entries where the value of the field FIELD is less than VALUE, where VALUE is a textual representation of the field’s value as described for the --fields switch of rwaggbagbuild(1). This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
Remove from the Aggregate Bag file all entries where the value of the field FIELD is greater than VALUE, where VALUE is a textual representation of the field’s value as described for the --fields switch of rwaggbagbuild(1). This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is not present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
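The four filtering switches above can be modeled as row predicates. The Python sketch below is a hypothetical model, not rwaggbagtool's code. It assumes that each row is a dict of already-parsed values, that an IPset is represented as a list of ipaddress networks rather than a binary IPset file, and that each repeatable switch becomes a list of (FIELD, VALUE) pairs. The addresses come from the rwaggbagcat examples; the record counts are made up.

```python
import ipaddress


def in_ipset(value, networks):
    """True when the address `value` falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(value)
    return any(addr in net for net in networks)


def filter_rows(rows, min_field=(), max_field=(), set_intersect=(), set_complement=()):
    """Hypothetical model; each filter is a list of (FIELD, VALUE) pairs.

    A filter naming a field that the rows lack is ignored.
    """
    kept = []
    for row in rows:
        if any(f in row and row[f] < v for f, v in min_field):
            continue  # --min-field: drop values below VALUE
        if any(f in row and row[f] > v for f, v in max_field):
            continue  # --max-field: drop values above VALUE
        if any(f in row and not in_ipset(row[f], nets) for f, nets in set_intersect):
            continue  # --set-intersect: drop addresses outside the IPset
        if any(f in row and in_ipset(row[f], nets) for f, nets in set_complement):
            continue  # --set-complement: drop addresses inside the IPset
        kept.append(row)
    return kept


rows = [
    {"sIPv4": "10.4.52.235", "records": 1},
    {"sIPv4": "192.0.2.7", "records": 5},
    {"sIPv4": "10.9.77.117", "records": 3},
]
ten_slash_8 = [ipaddress.ip_network("10.0.0.0/8")]
print(filter_rows(rows, min_field=[("records", 2)]))
print(filter_rows(rows, set_intersect=[("sIPv4", ten_slash_8)]))
```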
The following switches control the output.
After operating on the Aggregate Bag input files, create a (normal) Bag file from the resulting Aggregate Bag. Use the BAG_KEY field as the key of the Bag, and the BAG_COUNTER field as the counter of the Bag. Write the Bag to the standard output or the destination specified by --output-path. When this switch is used, the only legal field names that may be used in the --insert-field switch are BAG_KEY and BAG_COUNTER. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-ipset.
After operating on the Aggregate Bag input files, create an IPset file from the resulting Aggregate Bag by treating the values in the field named FIELD as IP addresses, inserting the IP addresses into the IPset, and writing the IPset to the standard output or the destination specified by --output-path. When this switch is used, the only legal field name that may be used in the --insert-field switch is FIELD. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-bag.
Specify the format of the IPset records that are written to the output when the --to-ipset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0.
Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.
Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.
Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.
Write the resulting Aggregate Bag, IPset (see --to-ipset), or Bag (see --to-bag) to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagtool to exit with an error.
Allow rwaggbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwaggbagtool writes the output to a temporary location first, then overwrites PATH. rwaggbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwaggbagtool exits with an error when this switch is given and PATH is not a regular file. If rwaggbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.
Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwaggbagtool exits with a message and an error. rwaggbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.
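The --modify-inplace and --backup-path behavior follows the familiar pattern of writing to a temporary file and then renaming it. This Python sketch illustrates that pattern under stated assumptions. It is not rwaggbagtool's code, and the function name is hypothetical. It assumes PATH already exists, because --modify-inplace is ignored otherwise. Unlike rwaggbagtool, it copies only the permission bits, not the owner and group. Because the new data goes to a temporary file first, PATH may safely be one of the inputs that produced the data.

```python
import os
import shutil
import tempfile


def replace_in_place(path, data, backup=None):
    """Hypothetical sketch: write `data` over existing `path` via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        shutil.copymode(path, tmp)  # permission bits only; owner/group not copied
    except BaseException:
        os.remove(tmp)  # failure before the temporary file was complete
        raise
    if backup is not None:
        if os.path.isdir(backup):
            backup = os.path.join(backup, os.path.basename(path))
        same = os.path.exists(backup) and os.path.samefile(path, backup)
        if not same:
            # If this move fails, the new data stays in `tmp` for recovery.
            shutil.move(path, backup)
    # Between the move above and this rename, `path` briefly does not exist.
    os.replace(tmp, path)
```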
Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real-time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagtool searches for the site configuration file in the locations specified in the FILES section.
Print the available options and exit.
Print the names and descriptions of the fields that may be used in the command line options that require a field name. Since SiLK 3.22.0.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Read today’s incoming flow records by type and use rwaggbag(1) to create an Aggregate Bag file for each, in.aggbag and inweb.aggbag, that count records using the protocol and both ports as the key. Add the counters in the two files to create total.aggbag. Use rwaggbagcat(1) to display the result.
$ rwfilter --type=in --all=- \
      | rwaggbag --key=sport,dport,proto --counter=records \
            --output-path=in.aggbag
$ rwfilter --type=inweb --all=- \
      | rwaggbag --key=sport,dport,proto --counter=records \
            --output-path=inweb.aggbag
$ rwaggbagtool --add in.aggbag inweb.aggbag --output-path=total.aggbag
$ rwaggbagcat total.aggbag
Subtract inweb.aggbag from total.aggbag.
$ rwaggbagtool --subtract total.aggbag inweb.aggbag \
      | rwaggbagcat
Compute the percent of all incoming traffic per protocol and ports that was stored in the inweb type by multiplying the counters in inweb.aggbag by 100 and dividing by total.aggbag.
$ rwaggbagtool --scalar-multiply=100 inweb.aggbag \
      | rwaggbagtool --divide stdin total.aggbag \
      | rwaggbagcat
Create an Aggregate Bag file from data.rw that uses the ports as the key and sums the bytes and packets.
$ rwaggbag --key=sport,dport \
      --counter=sum-bytes,sum-packets data.rw \
      --output-path=my-ab.aggbag
Using the previous file, get just the source port and byte count from the file my-ab.aggbag. One approach is to remove the destination port and packet count.
$ rwaggbagtool --remove=dport,sum-packets my-ab.aggbag \
      --output-path=source-bytes.aggbag
The other approach selects the source port and byte count.
$ rwaggbagtool --select=sport,sum-bytes my-ab.aggbag \
      --output-path=source-bytes.aggbag
To replace the packet count in my-ab.aggbag with zeros, remove the field and insert it with the value you want.
$ rwaggbagtool --remove=sum-packets --insert=sum-packets=0 \
      my-ab.aggbag --output-path=zero-packets.aggbag
To create a regular Bag with the source port and byte count from my-ab.aggbag, use the --to-bag switch:
$ rwaggbagtool --to-bag=sport,sum-bytes my-ab.aggbag \
      --output-path=sport-byte.bag
The --to-ipset switch works similarly:
$ rwaggbag --key=sipv6,dipv6 --counter=records data-v6.rw \
      --output-path=ips.aggbag
$ rwaggbagtool --to-ipset=dipv6 ips.aggbag --output-path=dip.set
This environment variable is used as the value for the --ipset-record-version when that switch is not provided.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwaggbagtool may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagtool may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
The Aggregate Bag tools were added in SiLK 3.15.0.
SiLK 3.17.0 added the --min-field, --max-field, --set-intersect, and --set-complement switches.
Support for country codes was added in SiLK 3.19.0.
The --modify-inplace switch was added in SiLK 3.21.0. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.
rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwfilter(1), rwfileinfo(1), silk(7), zlib(3)
Append SiLK Flow file(s) to an existing SiLK Flow file
rwappend [--create=[TEMPLATE_FILE]] [--print-statistics] [--site-config-file=FILENAME] TARGET_FILE SOURCE_FILE [SOURCE_FILE...]
rwappend --help
rwappend --version
rwappend reads SiLK Flow records from the specified SOURCE_FILEs and appends them to the TARGET_FILE. If stdin is used as the name of one of the SOURCE_FILEs, SiLK flow records will be read from the standard input.
When the TARGET_FILE does not exist and the --create switch is not provided, rwappend will exit with an error. When --create is specified and TARGET_FILE does not exist, rwappend will create the TARGET_FILE using the same format, version, and byte-order as the specified TEMPLATE_FILE. If no TEMPLATE_FILE is given, the TARGET_FILE is created in the default format and version (the same format that rwcat(1) would produce).
The TARGET_FILE must be an actual file; it cannot be a named pipe or the standard output. In addition, the header of TARGET_FILE must not be compressed; that is, you cannot append to a file whose entire contents have been compressed with gzip (those files normally end in the .gz extension).
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Create the TARGET_FILE if it does not exist. The file will have the same format, version, and byte-order as the TEMPLATE_FILE if it is provided; otherwise the defaults are used. The TEMPLATE_FILE will NOT be appended to TARGET_FILE unless it also appears as the name of a SOURCE_FILE.
Print to the standard error the number of records read from each SOURCE_FILE and the total number of records appended to the TARGET_FILE.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwappend searches for the site configuration file in the locations specified in the FILES section.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Standard usage where the file to append to, results.rw, exists:
$ rwappend results.rw sample5.rw sample6.rw
To append files sample*.rw to results.rw, or to create results.rw using the same format as the first file argument (note that sample1.rw must be repeated):
$ rwappend results.rw --create=sample1.rw \
      sample1.rw sample2.rw
If results.rw does not exist, the following two commands are equivalent:
$ rwappend --create results.rw sample1.rw sample2.rw
$ rwcat sample1.rw sample2.rw > results.rw
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwappend may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwappend may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwcat(1), silk(7)
When a SOURCE_FILE contains IPv6 flow records and the TARGET_FILE only supports IPv4 records, rwappend converts IPv6 records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and writes them to the TARGET_FILE. rwappend silently ignores IPv6 records having addresses outside of that prefix.
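The conversion rule above can be modeled in a few lines of Python. This is a sketch, not rwappend's implementation: a record is reduced to a hypothetical list of its IPv6 address strings, and the helper name ipv6_record_to_ipv4 is invented for illustration. It relies on the standard ipaddress module, whose ipv4_mapped property extracts the IPv4 address from an address in ::ffff:0:0/96.

```python
import ipaddress
from typing import List, Optional

# The IPv4-mapped IPv6 netblock named in the text.
V4_MAPPED = ipaddress.IPv6Network("::ffff:0:0/96")

def ipv6_record_to_ipv4(addrs: List[str]) -> Optional[List[str]]:
    """Convert every address of a (hypothetical) IPv6 record to IPv4.

    Returns None when any address lies outside ::ffff:0:0/96, which models
    rwappend silently skipping that record.
    """
    result = []
    for text in addrs:
        ip6 = ipaddress.IPv6Address(text)
        if ip6 not in V4_MAPPED:
            return None  # cannot be represented in an IPv4-only file
        result.append(str(ip6.ipv4_mapped))
    return result
```

For example, a record with addresses ::ffff:10.1.2.4 and ::ffff:192.0.2.1 becomes 10.1.2.4 and 192.0.2.1, while any record touching 2001:db8::1 is dropped.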
rwappend makes some attempts to avoid appending a file to itself (which would eventually exhaust the disk space) by comparing the names of files it is given; it should be smarter about this.
Build a binary Bag from SiLK Flow records
rwbag --bag-file=KEY,COUNTER,OUTPUTFILE [--bag-file=KEY,COUNTER,OUTPUTFILE ...] [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }] [--note-strip] [--note-add=TEXT] [--note-file-add=FILE] [--invocation-strip] [--print-filenames] [--copy-input=PATH] [--compression-method=COMP_METHOD] [--ipv6-policy={ignore,asv4,mix,force,only}] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwbag --help
rwbag --legacy-help
rwbag --version
LEGACY SYNOPSIS
rwbag [--sip-flows=OUTPUTFILE] [--dip-flows=OUTPUTFILE] [--sport-flows=OUTPUTFILE] [--dport-flows=OUTPUTFILE] [--proto-flows=OUTPUTFILE] [--sensor-flows=OUTPUTFILE] [--input-flows=OUTPUTFILE] [--output-flows=OUTPUTFILE] [--nhip-flows=OUTPUTFILE] [--sip-packets=OUTPUTFILE] [--dip-packets=OUTPUTFILE] [--sport-packets=OUTPUTFILE] [--dport-packets=OUTPUTFILE] [--proto-packets=OUTPUTFILE] [--sensor-packets=OUTPUTFILE] [--input-packets=OUTPUTFILE] [--output-packets=OUTPUTFILE] [--nhip-packets=OUTPUTFILE] [--sip-bytes=OUTPUTFILE] [--dip-bytes=OUTPUTFILE] [--sport-bytes=OUTPUTFILE] [--dport-bytes=OUTPUTFILE] [--proto-bytes=OUTPUTFILE] [--sensor-bytes=OUTPUTFILE] [--input-bytes=OUTPUTFILE] [--output-bytes=OUTPUTFILE] [--nhip-bytes=OUTPUTFILE] [--note-add=TEXT] [--note-file-add=FILE] [--print-filenames] [--copy-input=PATH] [--compression-method=COMP_METHOD] [--ipv6-policy={ignore,asv4,mix,force,only}] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwbag reads SiLK Flow records and builds one or more Bag files. A Bag is similar to a set but each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key. A Bag file supports a single key field and a single counter field; use the Aggregate Bag tools (e.g., rwaggbag(1)) when the key or counter contains multiple fields.
The --bag-file switch is required and it specifies how to create a Bag file. The argument to the switch names the key field to use for the bag, the counter field, and the location where the bag file is to be written. The switch may be repeated to create multiple Bag files.
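Conceptually, each --bag-file=KEY,COUNTER,OUTPUTFILE argument describes a mapping from key to counter that is filled in one pass over the records. The Python sketch below models only that binning step; the Flow class and build_bag function are hypothetical stand-ins, and overflow handling and file output are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

@dataclass
class Flow:
    # Hypothetical, minimal stand-in for a SiLK Flow record.
    sip: str
    dip: str
    protocol: int
    packet_count: int
    byte_count: int

# COUNTER names from the rwbag documentation, mapped to the amount
# each record contributes to its key's bin.
COUNTERS: Dict[str, Callable[[Flow], int]] = {
    "records": lambda f: 1,
    "sum-packets": lambda f: f.packet_count,
    "sum-bytes": lambda f: f.byte_count,
}

def build_bag(flows: Iterable[Flow], key: Callable[[Flow], object],
              counter: str) -> Dict[object, int]:
    """Bin flows by key and accumulate the chosen counter for each bin."""
    add = COUNTERS[counter]
    bag: Dict[object, int] = defaultdict(int)
    for f in flows:
        bag[key(f)] += add(f)
    return dict(bag)
```

Calling build_bag(flows, lambda f: f.protocol, "sum-bytes") corresponds roughly to --bag-file=protocol,sum-bytes,OUTPUTFILE.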
rwbag reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwbag reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
If adding a value to a key would cause the value to overflow the maximum value that Bags support, the key’s value will be set to the maximum and processing will continue. In addition, if this is the first value to overflow in this Bag, a warning will be printed to the standard error.
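The overflow rule amounts to saturating addition with a one-time warning. A minimal Python sketch, assuming the maximum counter value of 18446744073709551614 stated in the rwbagbuild description applies here as well; the SaturatingBag class is hypothetical and not part of SiLK.

```python
import sys

# Maximum counter value given in the rwbagbuild description.
BAG_COUNTER_MAX = 18446744073709551614

class SaturatingBag:
    """Counters that clamp at BAG_COUNTER_MAX (hypothetical helper)."""

    def __init__(self) -> None:
        self.counts = {}
        self.overflow_warned = False

    def add(self, key, value: int) -> None:
        total = self.counts.get(key, 0) + value
        if total > BAG_COUNTER_MAX:
            total = BAG_COUNTER_MAX  # clamp and keep processing
            if not self.overflow_warned:
                # Warn only for the first overflow seen in this bag.
                print(f"warning: counter overflow for key {key!r}", file=sys.stderr)
                self.overflow_warned = True
        self.counts[key] = total
```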
If rwbag runs out of memory, it will exit immediately. The output Bag files will remain behind, each with a size of 0 bytes.
Use rwbagcat(1) to see the contents of a bag. To create a bag from textual input or from an IPset, use rwbagbuild(1). rwbagtool(1) allows you to manipulate binary bag files.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Bin flow records by unique KEY, compute the COUNTER for each bin, and write the result to OUTPUTFILE. The available KEY and COUNTER values are listed immediately below. OUTPUTFILE is the name of a non-existent file, a named pipe, or the keyword stdout or - to write the binary Bag to the standard output. Repeat the --bag-file switch to create multiple Bag files in a single pass over the data. Only one OUTPUTFILE may use the standard output. See LEGACY BAG CREATION SWITCHES for deprecated methods to create Bag files. This switch or one of its legacy equivalents is required. Since SiLK 3.12.0.
rwbag supports the following names for KEY. The case of KEY is ignored.
source IP address, either IPv4 or IPv6
source IP address, either IPv4 or IPv6
destination IP address, either IPv4 or IPv6
destination IP address, either IPv4 or IPv6
source port for TCP or UDP, or equivalent
destination port for TCP or UDP, or equivalent
IP protocol
count of packets recorded for this flow record
count of bytes recorded for this flow record
bit-wise OR of TCP flags over all packets in the flow
starting time of the flow, in seconds resolution
duration of the flow, in seconds resolution
ending time of the flow, in seconds resolution
numeric ID of the sensor where the flow was collected
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
router SNMP output interface or postVlanId
router next hop IP address, either IPv4 or IPv6
router next hop IP address, either IPv4 or IPv6
TCP flags on first packet in the flow
bit-wise OR of TCP flags over all packets except the first in the flow
flow attributes set by the flow generator
guess as to the content of the flow
the country code of the source IP address. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.12.0.
an alias for sip-country
the country code of the destination IP address
an alias for dip-country
the value that the source IP address maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be IPv4-address or IPv6-address. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).
the value that the destination IP address maps to in the mapping file whose map-name is MAPNAME. See sip-pmap:MAPNAME.
the value that the protocol/source-port pair maps to in the mapping file whose map-name is MAPNAME. The type of that prefix map must be proto-port. Use --pmap-file to load the mapping file and optionally set its map-name. Since the MAPNAME must be known when the --bag-file switch is parsed, the --pmap-file switch(es) should precede the --bag-file switch(es).
the value that the protocol/destination-port pair maps to in the mapping file whose map-name is MAPNAME. See sport-pmap:MAPNAME.
rwbag supports the following names for COUNTER. The case of COUNTER is ignored.
count of the number of flow records that match the key
an alias for records
the sum of the packet counts for flow records that match the key
an alias for sum-packets
the sum of the byte counts for flow records that match the key
an alias for sum-bytes
Load the prefix map file from PATH for use when the key part of the argument to the --bag-file switch is one of sip-pmap, dip-pmap, sport-pmap, or dport-pmap. Specify PATH as - or stdin to read from the standard input. If MAPNAME is specified, it overrides the map-name contained in the prefix map file itself. If no map-name is available, rwbag exits with an error. The switch may be repeated to load multiple prefix map files; each file must have a unique map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.
Do not copy the notes (annotations) from the input files to the output file(s). When this switch is not specified, notes from the input files are copied to the output. Since SiLK 3.12.2.
Add the specified TEXT to the header of every output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILE and add the contents of that file to the header of every output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILE contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Do not record any command line history: do not copy the invocation history from the input files to the output file(s), and do not record the current command line invocation in the output. The invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.
Print to the standard error the names of input files as they are opened.
Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as no Bag file is being written there.
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP addresses contained in IPv4 flow records will be added to the bag(s).
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.
Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 netblock, this policy is equivalent to force; otherwise it is equivalent to asv4.
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.
Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow records will be added to the bag(s).
Regardless of the IPv6 policy, when all IPv6 addresses in the bag are in the ::ffff:0:0/96 netblock, rwbag treats them as IPv4 addresses and writes an IPv4 bag. When any other IPv6 addresses are present in the bag, the IPv4 addresses in the bag are mapped into the ::ffff:0:0/96 netblock and rwbag writes an IPv6 bag.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbag searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwbag opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print help, including legacy switches. See the LEGACY BAG CREATION SWITCHES section below for these switches.
Print the version number and information about how SiLK was configured, then exit the application.
The following switches are deprecated as of SiLK 3.12.0. These switches may be used in conjunction with the --bag-file switch.
Equivalent to --bag-file=sIPv4,records,OUTPUTFILE. Count number of flows by unique source IP.
Equivalent to --bag-file=sIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique source IP.
Equivalent to --bag-file=sIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique source IP.
Equivalent to --bag-file=dIPv4,records,OUTPUTFILE. Count number of flows by unique destination IP.
Equivalent to --bag-file=dIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique destination IP.
Equivalent to --bag-file=dIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination IP.
Equivalent to --bag-file=sPort,records,OUTPUTFILE. Count number of flows by unique source port.
Equivalent to --bag-file=sPort,sum-packets,OUTPUTFILE. Count number of packets by unique source port.
Equivalent to --bag-file=sPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique source port.
Equivalent to --bag-file=dPort,records,OUTPUTFILE. Count number of flows by unique destination port.
Equivalent to --bag-file=dPort,sum-packets,OUTPUTFILE. Count number of packets by unique destination port.
Equivalent to --bag-file=dPort,sum-bytes,OUTPUTFILE. Count number of bytes by unique destination port.
Equivalent to --bag-file=protocol,records,OUTPUTFILE. Count number of flows by unique protocol.
Equivalent to --bag-file=protocol,sum-packets,OUTPUTFILE. Count number of packets by unique protocol.
Equivalent to --bag-file=protocol,sum-bytes,OUTPUTFILE. Count number of bytes by unique protocol.
Equivalent to --bag-file=sensor,records,OUTPUTFILE. Count number of flows by unique sensor ID.
Equivalent to --bag-file=sensor,sum-packets,OUTPUTFILE. Count number of packets by unique sensor ID.
Equivalent to --bag-file=sensor,sum-bytes,OUTPUTFILE. Count number of bytes by unique sensor ID.
Equivalent to --bag-file=input,records,OUTPUTFILE. Count number of flows by unique input interface index.
Equivalent to --bag-file=input,sum-packets,OUTPUTFILE. Count number of packets by unique input interface index.
Equivalent to --bag-file=input,sum-bytes,OUTPUTFILE. Count number of bytes by unique input interface index.
Equivalent to --bag-file=output,records,OUTPUTFILE. Count number of flows by unique output interface index.
Equivalent to --bag-file=output,sum-packets,OUTPUTFILE. Count number of packets by unique output interface index.
Equivalent to --bag-file=output,sum-bytes,OUTPUTFILE. Count number of bytes by unique output interface index.
Equivalent to --bag-file=nhIPv4,records,OUTPUTFILE. Count number of flows by unique next hop IP.
Equivalent to --bag-file=nhIPv4,sum-packets,OUTPUTFILE. Count number of packets by unique next hop IP.
Equivalent to --bag-file=nhIPv4,sum-bytes,OUTPUTFILE. Count number of bytes by unique next hop IP.
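The equivalences above follow a regular pattern: the switch prefix selects the key and the suffix selects the counter. The hypothetical Python helper below (not part of SiLK) rewrites a legacy switch into its --bag-file form, using only the key and counter names listed in this section.

```python
import re

# Legacy switch prefixes and the --bag-file keys they correspond to.
LEGACY_KEYS = {
    "sip": "sIPv4", "dip": "dIPv4", "sport": "sPort", "dport": "dPort",
    "proto": "protocol", "sensor": "sensor", "input": "input",
    "output": "output", "nhip": "nhIPv4",
}
# Legacy switch suffixes and the --bag-file counters they correspond to.
LEGACY_COUNTERS = {"flows": "records", "packets": "sum-packets", "bytes": "sum-bytes"}

def translate_legacy(switch: str) -> str:
    """Rewrite e.g. '--sip-flows=out.bag' as '--bag-file=sIPv4,records,out.bag'."""
    m = re.fullmatch(r"--([a-z]+)-(flows|packets|bytes)=(.+)", switch)
    if m is None or m.group(1) not in LEGACY_KEYS:
        raise ValueError(f"not a legacy bag switch: {switch}")
    key = LEGACY_KEYS[m.group(1)]
    counter = LEGACY_COUNTERS[m.group(2)]
    return f"--bag-file={key},{counter},{m.group(3)}"
```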
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Read the SiLK Flow file data.rw and create the Bag proto-byte.bag that contains the total byte-count seen for each protocol by using protocol as the key and sum-bytes as the counter:
$ rwbag --bag-file=protocol,sum-bytes,proto-byte.bag data.rw
Use rwbagcat(1) to view the result:
$ rwbagcat proto-byte.bag
          1|            10695328|
          6|        120536195111|
         17|            24500079|
Specify the output path as - to pass the Bag file from rwbag directly into rwbagcat.
$ rwbag --bag-file=protocol,sum-bytes,- data.rw \
      | rwbagcat
          1|            10695328|
          6|        120536195111|
         17|            24500079|
Compare that to this rwuniq(1) command.
$ rwuniq --field=protocol --value=bytes --sort-output data.rw
pro|               Bytes|
  1|            10695328|
  6|        120536195111|
 17|            24500079|
One advantage of Bag files over rwuniq is that the data remains in binary form where it can be manipulated by rwbagtool(1).
Read records from rwfilter(1) and build Bag files sip-flow.bag and dip-flow.bag that count the number of flows seen for each source address and for each destination address, respectively.
$ rwfilter ... --pass=stdout \
      | rwbag --bag-file=sipv4,records,sip-flow.bag \
            --bag-file=dipv4,records,dip-flow.bag
To create sip16-byte.bag that contains the number of bytes seen for each /16 found in the source address field, use the rwnetmask(1) tool prior to feeding the input to rwbag:
$ rwfilter ... --pass=stdout \
      | rwnetmask --4sip-prefix-length=16 \
      | rwbag --bag-file=sipv4,sum-bytes,sip16-byte.bag
$ rwbagcat sip16-byte.bag | head -4
       10.4.0.0|               18260|
       10.5.0.0|              536169|
       10.9.0.0|               55386|
      10.11.0.0|             5110438|
To group the IP addresses of an existing Bag into /16 prefixes when printing it, use the --network-structure switch of rwbagcat(1).
$ rwfilter ... --pass=stdout \
      | rwbag --bag-file=sipv4,sum-bytes,- \
      | rwbagcat --network-structure=B \
      | head -4
    10.4.0.0/16|               18260|
    10.5.0.0/16|              536169|
    10.9.0.0/16|               55386|
   10.11.0.0/16|             5110438|
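Both pipelines above produce the same totals because summing per address and then grouping by /16 gives the same result as masking each address to /16 before summing. A small Python sketch of that grouping, using the standard ipaddress module; the function name and the sample address/count pairs are hypothetical and are not the data shown above.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def sum_by_prefix(pairs: Iterable[Tuple[str, int]], prefix_len: int = 16) -> Dict[str, int]:
    """Sum counters after masking each IPv4 key to prefix_len bits."""
    totals: Dict[str, int] = defaultdict(int)
    for addr, count in pairs:
        # strict=False zeroes the host bits, like a netmask applied to the key.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        totals[str(net)] += count
    return dict(totals)
```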
As of SiLK 3.12.0, a Bag file may contain a country code as its key. Create scc-pkt.bag that sums the packet count by country.
$ rwbag --bag-file=sip-country,sum-packets,scc-pkt.bag
$ rwbagcat scc-pkt.bag
--|                 840|
a1|                 284|
a2|                   1|
ae|                   8|
As of SiLK 3.12.0, rwbag and rwbagbuild(1) can use the values found in a prefix map file as the key in a Bag file. For example, to look up each source address in the prefix map file ip-map.pmap that maps from address to "type of service", use the --pmap-file switch to specify the prefix map file, and specify the Bag’s key as sip-pmap:map-name, where map-name is either the map-name stored in the prefix map file or a name that is provided as part of the --pmap-file argument. (A prefix map’s map-name is available via the rwfileinfo(1) command.)
$ rwfileinfo --field=prefix-map ip-map.pmap
ip-map.pmap:
  prefix-map          v1: service-host
$
$ rwbag --pmap-file=ip-map.pmap \
      --bag-file=sip-pmap:service-host,bytes,srvhost.bag \
      data.rw
Multiple --pmap-file switches may be specified, which is useful when generating multiple Bag files in a single invocation. On the command line, the --pmap-file switch that defines the map-name must precede the --bag-file switch where the map-name is used.
The prefix map file is not stored as part of the Bag, so you must provide the prefix map file (via the --pmap-file switch) when running rwbagcat.
$ rwbagcat srvhost.bag
rwbagcat: The --pmap-file switch is required for \
      Bags containing sip-pmap keys
$ rwbagcat --pmap-file=ip-map.pmap srvhost.bag
external|         59950837766|
internal|         60602999159|
     ntp|              588316|
     dns|            14404581|
    dhcp|             2560696|
rwbag also has support for prefix map files that map from a protocol-port pair to a label. The proto-port.pmap file does not have a map-name so a name must be provided on the rwbag command line.
$ rwfileinfo --field=prefix-map proto-port.pmap
proto-port.pmap:
$
$ rwbag --pmap-file=srvport:proto-port.pmap \
      --bag-file=sport-pmap:srvport,flows,srvport.bag \
      data.rw
$ rwbagcat --pmap-file=proto-port.pmap srvport.bag | head -4
     ICMP|               15622|
      UDP|               62216|
  UDP/DNS|               62216|
 UDP/DHCP|               15614|
This environment variable allows the user to specify the country code mapping file that rwbag uses when mapping an IP to a country for the sip-country and dip-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.
This environment variable is used as the value for --ipv6-policy when that switch is not provided.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwbag may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwbag may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
Possible locations for the country code mapping file required by the sip-country and dip-country keys.
rwbagbuild(1), rwbagcat(1), rwbagtool(1), rwaggbag(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), rwpmapbuild(1), rwuniq(1), ccfilter(3), sensor.conf(5), silk(7), zlib(3)
Create a binary Bag from non-flow data
rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE } [--delimiter=C] [--proto-port-delimiter=C] [--default-count=DEFAULTCOUNT] [--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE] [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }] [--note-add=TEXT] [--note-file-add=FILE] [--invocation-strip] [--compression-method=COMP_METHOD] [--output-path=PATH]
rwbagbuild --help
rwbagbuild --version
rwbagbuild builds a binary Bag file from an IPset file or from textual input. A Bag is a set of keys where each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key.
Either --set-input or --bag-input must be provided to specify the type and the location of the input file. To read from the standard input, specify stdin or - as the argument to the switch.
Each occurrence of a unique key adds a counter value to the Bag file for that key, where the counter is the value specified by --default-count, a value specified on a line in the textual input, or a fallback value of 1. If the addition causes an overflow of the maximum counter value (18446744073709551614), the counter is set to the maximum. A message is printed to the standard error the first time an overflow condition is detected.
When creating a Bag from an IPset, the count associated with each IP address is the value specified by the --default-count switch or 1 if the switch is not provided.
If the --key-type is sip-country, dip-country, or any-country, each IP address is mapped to its country code using the country code mapping file (see FILES) and that key is added to the Bag file with the --default-count value.
If the --key-type is sip-pmap, dip-pmap, or any-ip-pmap, each IP address is mapped to a value found in the prefix map file specified in --pmap-file and that value is added to the Bag file with the --default-count value.
The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments begin with a ’#’-character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-counter pair; whitespace around the key and counter is ignored. The key and counter are separated by a one-character delimiter. The default delimiter is vertical bar (’|’); use --delimiter to specify a different delimiter.
Each line that is not ignored must begin with a key. The accepted formats of the key are described below.
When the --default-count switch is given, rwbagbuild only parses the key and ignores everything on a line to the right of the first delimiter. To reiterate, the --default-count switch overrides any counter present on the line.
If the delimiter is not present on a line, rwbagbuild parses the key and adds the --default-count value (or the fallback value of 1) to the Bag for that key.
When --default-count is not given, any text between the first delimiter and optional second delimiter on a line is treated as the counter. If the counter contains only whitespace, the counter for the key is incremented by 1; otherwise, the counter must be a (decimal) number from 0 to 18446744073709551614 inclusive. If a second delimiter is present, it and any text that follows it is ignored.
rwbagbuild prints an error and exits when a key or counter cannot be parsed.
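The line-handling rules above can be summarized as a small parser. This Python sketch is an approximation, not rwbagbuild's code: it keeps the key as a stripped string instead of parsing it according to --key-type, assumes a non-whitespace delimiter, and uses hypothetical function names. It also applies two rules stated elsewhere on this page: counters saturate at 18446744073709551614, and keys whose counter is zero are not written.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

COUNTER_MAX = 18446744073709551614  # maximum counter value from the text

def parse_bag_line(line: str, delimiter: str = "|",
                   default_count: Optional[int] = None) -> Optional[Tuple[str, int]]:
    """Return (key, counter) for one input line, or None if the line is ignored."""
    line = line.split("#", 1)[0]        # strip the comment, if any
    if not line.strip():
        return None                     # blank or whitespace-only line
    fields = line.split(delimiter, 2)   # key, counter, ignored remainder
    key = fields[0].strip()
    if not key:
        raise ValueError("missing key")
    if default_count is not None:
        return key, default_count       # --default-count overrides any counter
    if len(fields) == 1 or not fields[1].strip():
        return key, 1                   # no delimiter, or an empty counter
    text = fields[1].strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"invalid counter {text!r}")
    value = int(text)
    if value > COUNTER_MAX:
        raise ValueError(f"counter {value} out of range")
    return key, value

def build_from_text(lines: Iterable[str], delimiter: str = "|",
                    default_count: Optional[int] = None) -> Dict[str, int]:
    bag: Dict[str, int] = {}
    for line in lines:
        parsed = parse_bag_line(line, delimiter, default_count)
        if parsed is None:
            continue
        key, value = parsed
        bag[key] = min(bag.get(key, 0) + value, COUNTER_MAX)  # saturate
    # Keys whose counter is zero are not written to the Bag file.
    return {k: v for k, v in bag.items() if v != 0}
```

For instance, the lines "80|10|", "443 | 5 | ignored", and "22" yield counters of 10, 5, and 1.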
Format of the counter
The counter is any non-negative (decimal) integer value from 0 to 18446744073709551614 inclusive (the maximum is one less than the maximum unsigned 64-bit value). When writing the Bag file, keys whose counter is zero are not written to the file.
Format of the Key
The key is a 32-bit integer, an IP address, a CIDR block, a SiLK IPWildcard, or a pair of numbers when the key-type is a protocol-port prefix map file.
For key-types that use fewer than 32 bits, rwbagbuild does not verify the validity of the key. For example, it is possible to have 257 as a key in a Bag whose key-type is protocol.
rwbagbuild parses specific key-types as follows:
key is an IPv4 address or a 32-bit value; key-type set to corresponding IPv6 type when an IPv6 address is present. A CIDR block or SiLK IPWildcard representing multiple addresses adds multiple entries to the Bag
key is an IPv6 address. An IPv4 address is mapped into the ::ffff:0:0/96 netblock. All keys must be IP addresses (integers are not allowed).
key is the numeric value of the flags, 17 = FIN|ACK
key is seconds since the UNIX epoch
key represents seconds
key is the numeric sensor ID
key is an IP address; the country_codes.pmap prefix map file is used to map the IP to a country code that is stored in the Bag
key is an IP address; the prefix map file specified by --pmap-file is used to map the IP to a value that is stored in the Bag
key consists of two numbers separated by a delimiter: a protocol (8-bit number) and a port (16-bit number). Those values are looked up in the prefix map file specified by --pmap-file and the result is stored in the Bag. The delimiter separating the protocol and port may be set by --proto-port-delimiter. If not explicitly set, it is the same as the delimiter specified to --delimiter. The default delimiter is ’|’.
these bits of the key are relevant, though any 32-bit value is accepted: 0x08=F, 0x10=S, 0x20=T, 0x40=C
key is treated as a number
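For the flags-style key-types, the key is just an integer whose bits name flags. The Python sketch below decodes such a value into letters. The TCP bit values are the standard header bits, printed in SiLK's usual FSRPAUEC letter order; the attribute bits are the four listed above. The decode_bits helper is hypothetical.

```python
from typing import List, Tuple

# Standard TCP header flag bits with their single-letter names.
TCP_FLAGS: List[Tuple[int, str]] = [
    (0x01, "F"), (0x02, "S"), (0x04, "R"), (0x08, "P"),
    (0x10, "A"), (0x20, "U"), (0x40, "E"), (0x80, "C"),
]
# Attribute bits listed for the attributes key-type.
ATTRIBUTE_BITS: List[Tuple[int, str]] = [
    (0x08, "F"), (0x10, "S"), (0x20, "T"), (0x40, "C"),
]

def decode_bits(value: int, table: List[Tuple[int, str]]) -> str:
    """Return the letters for every bit in table that is set in value."""
    return "".join(name for bit, name in table if value & bit)
```

With this table, the key 17 decodes to "FA", matching the 17 = FIN|ACK example above.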
An IP address or integer key must be expressed in one of the following formats. rwbagbuild complains if the key field contains a mixture of IPv6 addresses and integer values.
Dotted decimal---all 4 octets are required:
10.1.2.4
An unsigned 32-bit integer:
167838212
An IPv6 address in canonical format (when SiLK has been compiled with IPv6 support):
2001:db8:a:1::2:4
::ffff:10.1.2.4
Any of the above with a CIDR designation---for dotted decimal all four octets are still required:
10.1.2.4/31
167838212/31
2001:db8:a:1::2:4/127
::ffff:10.1.2.4/31
SiLK IP wildcard notation. A SiLK IP Wildcard can represent multiple IPv4 or IPv6 addresses. An IP Wildcard contains an IP in its canonical format, except each part of the IP (where part is an octet for IPv4 or a hexadectet for IPv6) may be a single value, a range, a comma separated list of values and ranges, or the letter x to signify all values for that part of the IP (that is, 0-255 for IPv4). You may not specify a CIDR suffix when using the IP Wildcard notation.
10.x.1-2.4,5
2001:db8:a:x::1-2:4,5
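To make the IPv4 key formats concrete, the Python sketch below expands a key into the addresses it represents: an unsigned integer, dotted decimal, either of those with a CIDR suffix, or an IPv4 wildcard. It handles IPv4 only, accepts CIDR blocks with host bits set (an assumption), and is a hypothetical helper built on the standard ipaddress module rather than SiLK's own parser.

```python
import ipaddress
import itertools
from typing import List

def _expand_octet(part: str) -> List[int]:
    """Expand one wildcard part: 'x', a value, a range, or a comma list."""
    if part == "x":
        return list(range(256))
    values: List[int] = []
    for item in part.split(","):
        lo, _, hi = item.partition("-")
        start, end = int(lo), int(hi or lo)
        if not (0 <= start <= end <= 255):
            raise ValueError(f"bad wildcard part {part!r}")
        values.extend(range(start, end + 1))
    return values

def ipv4_key_addresses(key: str) -> List[ipaddress.IPv4Address]:
    """Return every IPv4 address represented by a key."""
    if "/" in key:
        addr, prefix = key.split("/", 1)
        if addr.isdigit():  # integer form with a CIDR suffix
            addr = str(ipaddress.IPv4Address(int(addr)))
        net = ipaddress.IPv4Network(f"{addr}/{prefix}", strict=False)
        return list(net)    # iterating a network yields every address in it
    if key.isdigit():
        return [ipaddress.IPv4Address(int(key))]
    parts = key.split(".")
    if len(parts) != 4:
        raise ValueError("dotted decimal requires all four octets")
    combos = itertools.product(*(_expand_octet(p) for p in parts))
    return [ipaddress.IPv4Address(".".join(map(str, c))) for c in combos]
```

Under this model, 167838212 and 10.1.2.4 name the same address, and 10.x.1-2.4,5 expands to 1 x 256 x 2 x 2 = 1024 addresses.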
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The first two switches control the type of input; exactly one must be provided:
Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to read the IPset from the standard input. Counts have a volume of 1 when the --default-count switch is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)
Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin or - to read the text from the standard input. See the DESCRIPTION section for the syntax of the TEXTFILE.
Expect the character C between each key-counter pair in the TEXTFILE read by the --bag-input switch. The default delimiter is the vertical pipe (’|’). The delimiter is ignored if the --set-input switch is specified. When the delimiter is a whitespace character, any amount of whitespace may surround and separate the key and counter. Since ’#’ is used to denote comments and newline is used to denote records, neither is a valid delimiter character.
Expect the character C between the protocol and port that comprise a key when the --key-type is sport-pmap, dport-pmap, or any-port-pmap. Unless this switch is specified, rwbagbuild expects the key-counter delimiter to appear between the protocol and port.
Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer from 1 to 18446744073709551614 inclusive.
Write an entry into the header of the Bag file that specifies the key contains FIELD_TYPE values. When this switch is not specified, the key type of the Bag is set to custom. FIELD_TYPE is case-insensitive. The supported FIELD_TYPEs are:
source IP address, IPv4 only
destination IP address, IPv4 only
source port
destination port
IP protocol
packets, see also sum-packets
bytes, see also sum-bytes
an unsigned bitwise OR of TCP flags
starting time of the flow record, seconds resolution
duration of the flow record, seconds resolution
ending time of the flow record, seconds resolution
sensor ID
SNMP input
SNMP output
next hop IP address, IPv4 only
TCP flags on first packet in the flow
bitwise OR of TCP flags on all packets in the flow except the first
flow attributes set by the flow generator
guess as to the content of the flow, as set by the flow generator
class of the sensor
type of the sensor
an encoded version of the ICMP type and code, where the type is in the upper byte and the code is in the lower byte
source IP, IPv6
destination IP, IPv6
next hop IP, IPv6
count of flows
sum of packet counts
sum of byte counts
sum of duration values
a generic IPv4 address
a generic IPv6 address
a generic port
a generic SNMP value
a generic time value, in seconds resolution
the country code of the source IP address. For textual input, the key column must contain an IP address or an integer. rwbagbuild maps the IP address to a country code and stores the country code in the bag. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in FILES. (See also ccfilter(3).) Since SiLK 3.12.0.
the country code of the destination IP. See sip-country. Since SiLK 3.12.0.
the country code of any IP address. See sip-country. Since SiLK 3.12.0.
a prefix map value found from a source IP address. Maps each IP address in the key column to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be IPv4-address or IPv6-address. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.
a prefix map value found from a destination IP address. See sip-pmap. Since SiLK 3.12.0.
a prefix map value found from any IP address. See sip-pmap. Since SiLK 3.12.0.
a prefix map value found from a protocol/source-port pair. Each key must contain two values, a protocol and a port. Maps each protocol/port pair to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be proto-port. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.
a prefix map value found from a protocol/destination-port pair. See sport-pmap. Since SiLK 3.12.0.
a prefix map value found from a protocol/port pair. See sport-pmap. Since SiLK 3.12.0.
a number
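The encoded ICMP type and code described in the list above can be illustrated with a short Python sketch. The helper names are hypothetical; the sketch shows only the byte layout (type in the upper byte, code in the lower byte) and says nothing about how rwbagbuild parses its input.

```python
def encode_icmp_type_code(icmp_type: int, icmp_code: int) -> int:
    """Pack the ICMP type into the upper byte and the code into the lower byte."""
    if not (0 <= icmp_type <= 255 and 0 <= icmp_code <= 255):
        raise ValueError("ICMP type and code must each fit in one byte")
    return (icmp_type << 8) | icmp_code


def decode_icmp_type_code(key: int) -> tuple[int, int]:
    """Split a 16-bit key back into (type, code)."""
    return key >> 8, key & 0xFF
```

For example, ICMP type 3 (destination unreachable) with code 3 (port unreachable) becomes the key 771.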
Write an entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values. When this switch is not specified, the counter type of the Bag is set to custom. Although the supported FIELD_TYPEs are the same as those for the key, the counter is always treated as a number that can be summed; rwbagbuild does not use the country code or prefix map when parsing the counter field.
When the key-type is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap, use the prefix map file located at PATH to map the key to a string. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagbuild currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Do not record the command used to create the Bag file in the output. When this switch is not given, the invocation is written to the file’s header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real-time compression library for compression, and always compress the output regardless of the destination. This method provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This method provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Write the binary Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagbuild to exit with an error.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as a delimiter, a count, and ends with a newline.
192.168.0.1,5
192.168.0.2,500
192.168.0.3,3
192.168.0.4,14
192.168.0.5,5
To build a bag with it:
$ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag
Use rwbagcat(1) to view its contents:
$ rwbagcat mybag.bag
192.168.0.1| 5|
192.168.0.2| 500|
192.168.0.3| 3|
192.168.0.4| 14|
192.168.0.5| 5|
To create a Bag of protocol data from the text file myproto.txt:
1| 4|
6| 138|
17| 131|
use
$ rwbagbuild --key-type=proto --bag-input=myproto.txt > myproto.bag
$ rwbagcat myproto.bag
1| 4|
6| 138|
17| 131|
When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and rwfileinfo(1) shows the type of the key:
$ rwfileinfo --fields=bag myproto.bag
myproto.bag:
  bag key: protocol @ 4 octets; counter: custom @ 8 octets
Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:
$ rwbagbuild --bag-input=myproto.txt | rwbagcat
0.0.0.1| 4|
0.0.0.6| 138|
0.0.0.17| 131|
Although the --key-format switch on rwbagcat may be used to choose how the keys are displayed, it is generally better to use the --key-type switch when creating the bag.
$ rwbagbuild --bag-input=myproto.txt | rwbagcat --key-format=decimal
1| 4|
6| 138|
17| 131|
To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the --default-count switch which overrides the existing value:
$ rwbagbuild --key-type=protocol --bag-input=myproto.txt \
    --default-count=1 --output-path=myproto1.bag
$ rwbagcat myproto1.bag
1| 1|
6| 1|
17| 1|
To create a bag from multiple text files (X.txt, Y.txt, and Z.txt), use the UNIX cat(1) utility to concatenate the files and have rwbagbuild read the combined input. To avoid creating a temporary file, feed the output of cat as the standard input to rwbagbuild.
$ cat X.txt Y.txt Z.txt \
    | rwbagbuild --bag-input=- --output-path=xyz.bag
For each key that appears in multiple input files, rwbagbuild sums the counters for the key.
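The merging described above can be modeled in Python with a collections.Counter. This is a simplified sketch, assuming every non-comment line holds a key, the delimiter, and an integer counter; keys are compared as plain strings rather than parsed as IP addresses, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def sum_text_counters(lines: Iterable[str], delimiter: str = "|") -> dict[str, int]:
    """Sum the counter of every key that appears in the concatenated input."""
    totals: Counter = Counter()
    for line in lines:
        text = line.split("#", 1)[0].strip()     # drop comments and blank lines
        if not text:
            continue
        key, counter = text.split(delimiter)[:2]  # assumes "key<delim>counter"
        totals[key.strip()] += int(counter)      # repeated keys accumulate
    return dict(totals)
```

Passing the lines of X.txt, Y.txt, and Z.txt in sequence is the analogue of piping the output of cat into rwbagbuild.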
Given the IP set myset.set, create a bag where every entry in the bag has a count of 3:
$ rwbagbuild --set-input=myset.set --default-count=3 \
    --out=mybag2.bag
Suppose we have three IPset files, A.set, B.set, and C.set:
$ rwsetcat A.set
10.0.0.1
10.0.0.2
$ rwsetcat B.set
10.0.0.2
10.0.0.3
$ rwsetcat C.set
10.0.0.1
10.0.0.2
10.0.0.4
We want to create a bag file from these IPset files where the count for each IP address is the number of files in which that IP appears. The --set-input switch accepts only a single file, so we cannot do the following:
$ rwbagbuild --set-input=A.set --set-input=B.set ... # WRONG!
(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had 300 files instead of only 3.)
Since IPset files are (mathematical) sets, joining them together first with rwsettool(1) and then running rwbagbuild causes each IP address to get a count of 1:
$ rwsettool --union A.set B.set C.set \
    | rwbagbuild --set-input=- \
    | rwbagcat
10.0.0.1| 1|
10.0.0.2| 1|
10.0.0.3| 1|
10.0.0.4| 1|
When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as a single textual stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that rwbagbuild must process. This is probably the best approach to the problem:
$ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag
$ rwbagcat total1.bag
10.0.0.1| 2|
10.0.0.2| 3|
10.0.0.3| 1|
10.0.0.4| 1|
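The difference between the union approach and the textual approach can be sketched in Python, with ordinary sets standing in for IPset files. The function names are hypothetical; the sketch models only the counting, not rwsetcat, rwsettool, or rwbagbuild themselves.

```python
from collections import Counter


def membership_counts(ip_sets: list[set[str]]) -> dict[str, int]:
    """Count the sets that contain each address (one text line per member)."""
    counts: Counter = Counter()
    for ip_set in ip_sets:
        counts.update(ip_set)          # every member of this set adds 1 to its address
    return dict(sorted(counts.items()))


def union_counts(ip_sets: list[set[str]]) -> dict[str, int]:
    """Union first, then count: every address ends up with a count of 1."""
    union: set[str] = set().union(*ip_sets)
    return {ip: 1 for ip in sorted(union)}
```

With the A, B, and C sets from this example, membership_counts gives the same counts as total1.bag, while union_counts gives the all-ones result of the rwsettool pipeline.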
A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags together:
$ for i in *.set ; do rwbagbuild --set-input=$i --output-path=/tmp/$i.bag ; done
$ rwbagtool --add /tmp/*.set.bag > total2.bag
$ rm /tmp/*.set.bag
There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:
$ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
$ for i in *.set ; do rwbagbuild --set-input=$i \
    | rwbagtool --output-path=tmp.bag --add total3.bag stdin ; /bin/mv tmp.bag total3.bag ; done
$ rwbagcat total3.bag
10.0.0.1| 2|
10.0.0.2| 3|
10.0.0.3| 1|
10.0.0.4| 1|
As of SiLK 3.12.0, a Bag file may contain a country code as its key. In rwbagbuild, specify the --key-type as sip-country, dip-country, or any-country. These key-types work with either textual input or IPset input. The form of the textual input when mapping an IP address to a country code is identical to that when building an ordinary bag.
$ rwbagbuild --bag-input=mybag.txt --delimiter=, \
    --key-type=any-country --output-path=scc1.bag
$ rwbagcat scc1.bag
--| 527|
$ rwbagbuild --set-input=A.set --key-type=any-country \
    --output-path=scc2.bag
$ rwbagcat scc2.bag
--| 2|
rwbagbuild and rwbag(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. Use the --pmap-file switch to specify the prefix map file, and specify the --key-type using one of the types that end in -pmap.
For a prefix map that maps by IP addresses, use a key-type of sip-pmap, dip-pmap, or any-ip-pmap. The input may be an IPset or text. The form of the textual input is the same as for a normal bag file.
$ rwbagbuild --set-input=A.set --key-type=sip-pmap \
    --pmap-file=ip-map.pmap --output=test1.bag
$ rwbagbuild --bag-input=mybag.txt --delimiter=, \
    --key-type=sip-pmap --pmap-file=ip-map.pmap \
    --output-path=test2.bag
The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat(1).
$ rwbagcat --pmap-file=ip-map.pmap test2.bag
internal| 527|
For a prefix map file that maps by protocol-port pairs, the textual input must contain either three columns (protocol, port, counter) or two columns (protocol and port), in which case the counter is 1 unless the --default-count switch is given.
$ cat proto-port-count.txt
6| 25| 800|
6| 80| 5642|
6| 22
$ rwbagbuild --key-type=sport-pmap \
    --bag-input=proto-port-count.txt \
    --pmap-file=proto-port-map.pmap \
    --output-path=service.bag
$ rwbagcat --pmap-file=port-map.pmap service.bag
TCP/SSH| 1|
TCP/SMTP| 800|
TCP/HTTP| 5642|
A line that contains a single value, optionally followed by a delimiter, is treated as a key whose counter is 1. A delimiter may follow the counter, and any text after that delimiter is ignored. When the counter is 0, the key is not inserted into the Bag.
$ cat sport.txt
0
1|
2|3
4|5|
6|7|8|
9|10|||||
11|0
$ rwbagbuild --bag-input=sport.txt --key-type=sport \
    | rwbagcat
0| 1|
1| 1|
2| 3|
4| 5|
6| 7|
9| 10|
The --default-count switch overrides the counter.
$ rwbagbuild --bag-input=sport.txt --key-type=sport --default-count=1 \
    | rwbagcat
0| 1|
1| 1|
2| 1|
4| 1|
6| 1|
9| 1|
11| 1|
In fact, the --default-count switch causes rwbagbuild to ignore all text after the delimiter that follows the key.
$ echo '12|13 14' | rwbagbuild --bag-input=- --output=/dev/null
rwbagbuild: Error parsing line 1: Extra text after count
rwbagbuild: Error creating bag from text bag
$ echo '12|13 14' | rwbagbuild --bag-input=- --default-count=1 \
    | rwbagcat --key-format=decimal
12| 1|
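The line-parsing rules illustrated by these examples can be summarized in a small Python sketch. It models the documented behavior and is not rwbagbuild's implementation: it assumes integer keys and a non-whitespace delimiter, reuses the error text from the example above, and its names (parse_bag_line, ParseError) are hypothetical.

```python
from typing import Optional


class ParseError(ValueError):
    """Raised for a line that the simplified rules below reject."""


def parse_bag_line(line: str, delimiter: str = "|",
                   default_count: Optional[int] = None) -> Optional[tuple[int, int]]:
    """Return (key, counter) for one text line, or None if nothing is inserted."""
    text = line.split("#", 1)[0].strip()        # '#' starts a comment
    if not text:
        return None
    key_text, _, rest = text.partition(delimiter)
    key = int(key_text.strip())                 # integer keys only in this sketch
    if default_count is not None:
        return key, default_count               # text after the key's delimiter is ignored
    count_field = rest.partition(delimiter)[0]  # text after a second delimiter is ignored
    tokens = count_field.split()
    if not tokens:
        count = 1                               # a bare key gets a counter of 1
    elif len(tokens) > 1:
        raise ParseError("Extra text after count")
    else:
        count = int(tokens[0])
    if count == 0:
        return None                             # zero counters are not inserted
    return key, count
```

Applied to the lines of sport.txt, the sketch yields the same key/counter pairs that rwbagcat prints in both sport.txt examples.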
This environment variable allows the user to specify the country code mapping file that rwbagbuild uses when mapping an IP to a country for the sip-country, dip-country, or any-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
This environment variable gives the root of the install tree. When searching for the country code mapping file, rwbagbuild may use this environment variable. See the FILES section for details.
Possible locations for the country code mapping file required by the sip-country, dip-country, and any-country key-types.
rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk(7), zlib(3), cat(1)
rwbagbuild should verify the key’s value is within the allowed range for the specified --key-type.
rwbagbuild should accept non-numeric values for some fields, such as times and TCP flags.
The --default-count switch is poorly named.
Output a binary Bag file as text
rwbagcat [ --network-structure[=STRUCTURE] | --bin-ips[=SCALE] | --sort-counters[=ORDER]] [--print-statistics[=OUTFILE]] [--minkey=VALUE] [--maxkey=VALUE] [--mask-set=PATH] [--mincounter=VALUE] [--maxcounter=VALUE] [--zero-counts] [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }] [--key-format=FORMAT] [--integer-keys] [--zero-pad-ips] [--no-columns] [--column-separator=C] [--no-final-delimiter] [{--delimited | --delimited=C}] [--output-path=PATH] [--pager=PAGER_PROG] [--site-config-file=FILENAME] [BAGFILE [BAGFILE...]]
rwbagcat --help
rwbagcat --version
rwbagcat reads a binary Bag as created by rwbag(1) or rwbagbuild(1), converts it to text, and writes it to the standard output, to the pager, or to the specified output file. It can also print various statistics and summary information about the Bag.
As of SiLK 3.12.0, rwbagcat uses information in the Bag file’s header to determine how to display the key column.
A key that is an IP address is printed in the canonical format. Specifically, IPs are printed in the IPv4 canonical format if the Bag contains only IPv4 addresses; otherwise, in the IPv6 canonical format (with IPv4 mapped into the ::ffff:0:0/96 netblock). May be modified by --key-format.
A key that is a time is printed as a human-readable timestamp. May be modified by --key-format.
A sensor key prints the name of the sensor. The decimal and hexadecimal arguments to --key-format may be used.
A key holding TCP Flags is printed using the characters F,S,R,P,A,U,E,C. The decimal and hexadecimal arguments to --key-format may be used.
A key holding SiLK attributes is printed using the characters T,C,F,S. The decimal and hexadecimal arguments to --key-format may be used.
A country code key uses the abbreviations defined by ISO 3166-1 (see, for example, https://www.iso.org/iso-3166-country-codes.html or https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) or the following special codes: -- for N/A (e.g., private and experimental reserved addresses); a1 for an anonymous proxy; a2 for a satellite provider; o1 for other.
A key holding a value from prefix map requires that the --pmap-file switch be specified to display the value.
In addition, rwbagcat exits with an error when asked to use an IP format to display keys that are not IP addresses.
rwbagcat reads the BAGFILEs specified on the command line; if no BAGFILE arguments are given, rwbagcat attempts to read the Bag from the standard input. BAGFILE may be the keyword stdin or a hyphen (-) to allow rwbagcat to print data from both files and piped input. If any input does not contain a Bag, rwbagcat prints an error to the standard error and exits abnormally.
When multiple BAGFILEs are specified on the command line, each is handled individually. To process the files as a single Bag, use rwbagtool(1) to combine the bags and pipe the output of rwbagtool into rwbagcat.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
For each numeric value in STRUCTURE, group the IPs in the Bag into a netblock of that size and print the number of hosts, the sum of the counters, and, optionally, the number of smaller, occupied netblocks that each larger netblock contains. When STRUCTURE begins with v6:, the IPs in the Bag are treated as IPv6 addresses, and any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock. Otherwise, the IPs are treated as IPv4 addresses, and any IPv6 address outside the ::ffff:0:0/96 netblock is ignored. Aside from the initial v6: (or v4:, for consistency), STRUCTURE has one of the following forms:
NETBLOCK_LIST/SUMMARY_LIST. Group IPs into the sizes specified in either NETBLOCK_LIST or SUMMARY_LIST. rwbagcat prints a row for each occupied netblock specified in NETBLOCK_LIST, where the row lists the base IP of the netblock, the sum of the counters for that netblock, the number of hosts, and the number of smaller, occupied netblocks having a size that appears in either NETBLOCK_LIST or SUMMARY_LIST. (The values in SUMMARY_LIST are only summarized; they are not printed.)
NETBLOCK_LIST/. Similar to the first form, except all occupied netblocks are printed, and there are no netblocks that are only summarized.
NETBLOCK_LISTS. When the character S appears anywhere in the NETBLOCK_LIST, rwbagcat provides a default value for the SUMMARY_LIST. That default is 8,16,24,27 for IPv4, and 48,64 for IPv6.
NETBLOCK_LIST. When neither S nor / appear in STRUCTURE, the output does not include the number of smaller, occupied netblocks.
Empty. When STRUCTURE is empty or only contains v6: or v4:, rwbagcat prints a single row for the total network (the /0 netblock) giving the number of hosts, the sum of the counters, and the number of smaller, occupied netblocks, using the default SUMMARY_LIST described for the S form (8,16,24,27 for IPv4; 48,64 for IPv6).
NETBLOCK_LIST and SUMMARY_LIST contain a comma separated list of numbers between 0 (the total network) and the size for an individual host (32 for IPv4 or 128 for IPv6). The characters T and H may be used as aliases for 0 and the host netblock, respectively. In addition, when parsing the lists as IPv4 netblocks, the characters A, B, C, and X are supported as aliases for 8, 16, 24, and 27, respectively. A comma is not required between adjacent letters. The --network-structure switch disables printing of the IPs in the Bag file; specify the H argument to the switch to print each individual IP address and its counter.
The --network-structure switch may not be combined with the --bin-ips or --sort-counters switches. As of SiLK 3.12.0, rwbagcat exits with an error if the --network-structure switch is used on a Bag file whose key-type is neither custom nor an IP address type.
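The grouping that --network-structure performs for each netblock size can be approximated in Python with the standard ipaddress module. This sketch is limited to IPv4 keys given as strings, and its function name is hypothetical. It computes only the host count and counter sum for each occupied netblock; it does not reproduce rwbagcat's output layout or the count of smaller occupied netblocks.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(bag: dict[str, int], prefix_len: int) -> dict[str, tuple[int, int]]:
    """Map each occupied IPv4 netblock of the given size to (hosts, sum of counters)."""
    groups: dict = defaultdict(lambda: [0, 0])
    for ip_text, counter in bag.items():
        # strict=False masks off the host bits, giving the enclosing netblock
        net = ipaddress.ip_network(f"{ip_text}/{prefix_len}", strict=False)
        groups[net][0] += 1          # one more occupied host
        groups[net][1] += counter    # add this host's counter
    return {str(net): (hosts, total) for net, (hosts, total) in sorted(groups.items())}
```

Applied to the contents of mybag.bag in the EXAMPLES section below, grouping by /27 yields (4, 264) for 172.23.1.0/27, and grouping by /0 yields the TOTAL of 292.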
Invert the bag and count the total number of unique keys for a given value of the volume bin. For example, turn a Bag {sip:flow} into {flow:count(sip)}. SCALE is a string containing the value linear, binary, or decimal.
The default behavior is linear: Each distinct counter gets its own bin. Any counter in the input Bag file that is larger than the maximum possible key will be attributed to the maximum key; to prevent this, specify --maxcounter=4294967295 which discards bins whose counter value does not fit into a key.
binary creates a bag of {log2(flow):count(sip)}. Bin n contains counts in the range [ 2^n, 2^(n+1) ).
decimal creates one bin for each counter value in the range [1,100), one hundred bins for the counters in the range [100,1000), one hundred bins for the counters in the range [1000,10000), and so on. Within each range of 100 and above, counters are logarithmically distributed among the bins.
The --bin-ips switch may not be combined with the --network-structure or --sort-counters switches. See also the --invert switch on rwbagtool(1) which inverts a bag using a linear scale and creates a new binary bag file.
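The linear and binary scales can be sketched in Python as follows. The function name is hypothetical, and bins are keyed by the counter (linear) or by n (binary). The decimal scale is omitted because its exact bin boundaries are not spelled out here, and the sketch ignores rwbagcat's output format and the --maxcounter clamping described above.

```python
from collections import Counter


def bin_counters(bag: dict[str, int], scale: str = "linear") -> dict[int, int]:
    """Invert a bag: map each bin to the number of keys whose counter falls in it."""
    bins: Counter = Counter()
    for counter in bag.values():
        if counter <= 0:
            continue                              # keys with a zero counter are not in the bag
        if scale == "linear":
            bins[counter] += 1                    # every distinct counter is its own bin
        elif scale == "binary":
            bins[counter.bit_length() - 1] += 1   # n = floor(log2(counter))
        else:
            raise ValueError(f"unsupported scale: {scale}")
    return dict(sorted(bins.items()))
```

Using int.bit_length avoids floating-point log2, so a counter that is an exact power of two always lands in the bin whose lower bound it equals.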
Sort the output so the counters are presented in either decreasing or increasing order. Without this switch, the output is sorted by key. If the ORDER argument is not given to the switch, the counters are printed in decreasing order. Valid values for ORDER are:
Print the maximum counter first. This is the default.
Print the minimum counter first.
When two counters have the same value, the smaller key is displayed first. The --sort-counters switch may not be combined with the --network-structure or --bin-ips switches. Since SiLK 3.12.2.
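The ordering rule can be expressed as a Python sort key. This is a sketch with a hypothetical function name and integer keys (IPv4 addresses would be compared as 32-bit integers); a boolean stands in for the ORDER argument.

```python
def sort_by_counter(bag: dict[int, int], descending: bool = True) -> list[tuple[int, int]]:
    """Order (key, counter) pairs by counter; ties go to the smaller key."""
    if descending:
        # negate the counter so one ascending sort puts the largest counter first
        return sorted(bag.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(bag.items(), key=lambda kv: (kv[1], kv[0]))
```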
Print a breakdown of the network hosts seen, and print general statistics about the keys and counters. When --print-statistics is specified, no other output is produced unless one of --sort-counters, --network-structure, or --bin-ips is also specified. When the OUTFILE argument is not given, the statistics are written to the standard output or to the pager if output is to a terminal. OUTFILE is a filename, named pipe, the keyword stderr to write to the standard error, or the keyword stdout or - to write to the standard output. If OUTFILE names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case OUTFILE is overwritten. The output statistics produced by this switch are:
count of unique keys
sum of all the counters
minimum key
maximum key
minimum counter
maximum counter
mean of counters
variance of counters
standard deviation of counters
skew of counters
kurtosis of counters
count of nodes allocated
total bytes allocated for nodes
count of leaves allocated
total bytes allocated for leaves
density of the data
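The counter statistics in this list can be computed as in the following Python sketch. The conventions rwbagcat uses are an assumption here: the sketch divides by n (population moments) and reports plain rather than excess kurtosis, so its variance, skew, and kurtosis may differ from rwbagcat's output. The node, leaf, and density figures describe rwbagcat's internal data structure and are not modeled; the function name is hypothetical.

```python
import math


def counter_statistics(counters: list[int]) -> dict[str, float]:
    """Summarize a bag's counters using population moments (an assumed convention)."""
    n = len(counters)
    if n == 0:
        raise ValueError("the bag is empty")
    mean = sum(counters) / n
    # central moments, dividing by n rather than n - 1
    m2 = sum((c - mean) ** 2 for c in counters) / n
    m3 = sum((c - mean) ** 3 for c in counters) / n
    m4 = sum((c - mean) ** 4 for c in counters) / n
    return {
        "count": n,
        "sum": sum(counters),
        "min": min(counters),
        "max": max(counters),
        "mean": mean,
        "variance": m2,
        "stddev": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,   # plain, not excess, kurtosis
    }
```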
Output records whose key value is at least VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.
Output records whose key value is not more than VALUE. VALUE may be an IP address or an integer in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.
Output records whose key appears in the binary IPset read from the file PATH. (To build an IPset, use rwset(1) or rwsetbuild(1).) When used with --minkey and/or --maxkey, output records whose key is in the IPset and is also within the specified range. As of SiLK 3.12.0, rwbagcat exits with an error if the --mask-set switch is used on a Bag file whose key-type is neither custom nor an IP address type.
Output records whose counter value is at least VALUE. VALUE is an integer in the range 1 to 18446744073709551615. The default is to print all records with a non-zero counter; use --zero-counts to show records whose counter is 0.
Output records whose counter value is not more than VALUE. VALUE is an integer in the range 1 to 18446744073709551615, with the default being the maximum counter value.
Print keys whose counter is zero. Normally, keys with a counter of zero are suppressed since all keys have a default counter of zero. In order to use this flag, either --mask-set or both --minkey and --maxkey must be specified. When this switch is specified, any counter limit explicitly set by the --maxcounter switch is also applied.
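The way the key and counter limits combine can be sketched in Python. The sketch is hypothetical: it treats IPv4 keys as 32-bit integers, uses a Python set in place of the IPset read by --mask-set, and does not model --zero-counts.

```python
from typing import Optional


def select_records(bag: dict[int, int], minkey: int = 0, maxkey: int = 2**32 - 1,
                   mincounter: int = 1, maxcounter: int = 2**64 - 1,
                   mask_set: Optional[set[int]] = None) -> list[tuple[int, int]]:
    """Return the (key, counter) rows that pass every limit, sorted by key."""
    rows = []
    for key in sorted(bag):
        counter = bag[key]
        if not minkey <= key <= maxkey:
            continue                    # outside the --minkey/--maxkey range
        if mask_set is not None and key not in mask_set:
            continue                    # key not in the mask set
        if not mincounter <= counter <= maxcounter:
            continue                    # counter outside the allowed range
        rows.append((key, counter))
    return rows
```

All limits must be satisfied at once, so combining a mask set with a key range yields the intersection of the two.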
Use the prefix map file located at PATH to map the key to a string when the type of the Bag’s key is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap. This switch is required for Bag files whose key was derived from a prefix map file. The type of the prefix map file must match the key’s type, but a different prefix map file may be used. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagcat currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.
Specify the format to use when printing a key, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, rwbagcat uses the key’s type to determine how to format the key, and a key whose type is unknown or custom is assumed to be an IP address. rwbagcat exits with an error if the specified format is incompatible with the key’s type (for example, attempting to format a timestamp as an IP address).
Print keys as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp.
Print keys as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp. Note: This setting does not apply to CIDR prefix values, which are printed as decimal.
Print keys as IP addresses in the canonical format. If the key is an IPv4 address, use dotted decimal (192.0.2.1). If the key is an IPv6 address, use colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1). May be combined with zero-padded and either map-v4 or unmap-v6. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.
Print keys as IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. May be combined with zero-padded and either map-v4 or unmap-v6. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.
When the Bag’s key is an IPv4 address, change all IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. May be combined with one of the above settings. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.
When the Bag’s key is an IPv6 address, change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. May be combined with any one of the above settings except map-v4. rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type. Since SiLK 3.17.0.
Make all formatted key strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal. As of SiLK 3.18.0, CIDR prefix values are also zero-padded. rwbagcat exits with an error when this format is used on a Bag file whose key-type is a timestamp.
Print keys using the format map-v4,no-mixed. May be combined with zero-padded. As of SiLK 3.12.0, rwbagcat exits with an error when this format is used on a Bag file whose key-type is neither custom nor an IP address type.
Print keys as time in standard SiLK format: yyyy/mm/ddThh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
Print keys as time in the ISO time format yyyy-mm-dd hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
Print keys as time in the format mm/dd/yyyy hh:mm:ss. May be combined with utc or localtime. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
Print the keys as time in UTC. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
Print the keys as time and get the timezone from either the TZ environment variable or the local machine. If no other time-related key-format is provided, formats the time using the timestamp format. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
Print keys as seconds since UNIX epoch. May only be used on keys whose type is custom or a time value. Since SiLK 3.12.0.
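Several of the key formats above can be imitated with Python's ipaddress and datetime modules. The style names in this sketch are illustrative labels, not necessarily the exact --key-format arguments; only IPv4 keys and UTC times are handled, and combined formats such as zero-padded hexadecimal are not modeled. The function names are hypothetical.

```python
import ipaddress
from datetime import datetime, timezone


def format_ipv4_key(key: int, style: str) -> str:
    """Render a 32-bit key in one of several illustrative styles."""
    ip = ipaddress.IPv4Address(key)
    if style == "canonical":
        return str(ip)                                          # 192.0.2.1
    if style == "decimal":
        return str(key)                                         # 3221225985
    if style == "hexadecimal":
        return format(key, "x")                                 # c0000201
    if style == "zero-padded":
        return ".".join(f"{octet:03d}" for octet in ip.packed)  # 192.000.002.001
    raise ValueError(f"unknown style: {style}")


def format_time_key(seconds: int, style: str) -> str:
    """Render a seconds-since-epoch key in UTC."""
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    if style == "silk":
        return when.strftime("%Y/%m/%dT%H:%M:%S")   # yyyy/mm/ddThh:mm:ss
    if style == "iso":
        return when.strftime("%Y-%m-%d %H:%M:%S")   # yyyy-mm-dd hh:mm:ss
    if style == "mdy":
        return when.strftime("%m/%d/%Y %H:%M:%S")   # mm/dd/yyyy hh:mm:ss
    if style == "epoch":
        return str(seconds)                         # seconds since the UNIX epoch
    raise ValueError(f"unknown style: {style}")
```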
This switch is equivalent to --key-format=decimal. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.
This switch is equivalent to --key-format=zero-padded. It is deprecated as of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.
Disable fixed-width columnar output.
Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.
Do not print the column separator after the final column. Normally a delimiter is printed. When the network summary is requested (--network-structure=S), the separator is always printed before the summary column and never after that column.
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
Write the textual output of the --network-structure, --bin-ips, or --sort-counters switch to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwbagcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this option is not given, the output is either sent to the pager or written to the standard output.
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwbagcat searches for the site configuration file in the locations specified in the FILES section. Since SiLK 3.15.0.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
To print the contents of the bag file mybag.bag:
$ rwbagcat mybag.bag
172.23.1.1| 5|
172.23.1.2| 231|
172.23.1.3| 9|
172.23.1.4| 19|
192.168.0.100| 1|
192.168.0.101| 1|
192.168.0.160| 15|
192.168.20.161| 1|
192.168.20.162| 5|
192.168.20.163| 5|
To print the bag with a full network breakdown:
$ rwbagcat --network-structure=TABCHX mybag.bag
172.23.1.1 | 5|
172.23.1.2 | 231|
172.23.1.3 | 9|
172.23.1.4 | 19|
172.23.1.0/27 | 264|
172.23.1.0/24 | 264|
172.23.0.0/16 | 264|
172.0.0.0/8 | 264|
192.168.0.100 | 1|
192.168.0.101 | 1|
192.168.0.96/27 | 2|
192.168.0.160 | 15|
192.168.0.160/27 | 15|
192.168.0.0/24 | 17|
192.168.20.161 | 1|
192.168.20.162 | 5|
192.168.20.163 | 5|
192.168.20.160/27 | 11|
192.168.20.0/24 | 11|
192.168.0.0/16 | 28|
192.0.0.0/8 | 28|
TOTAL | 292|
In the above, lines that include a CIDR prefix display the sum of the counters of the preceding hosts. For example, the counters of the four hosts in the 172.23.1.0/27 net-block sum to 264 (5 + 231 + 9 + 19).
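The roll-up can be reproduced by hand to check what each CIDR line means. The following is a minimal Python sketch, not SiLK's implementation: it assumes the bag is held in a plain dict transcribed from the rwbagcat output above, and net_sums is a hypothetical helper that uses the standard ipaddress module to sum counters per net-block.

```python
import ipaddress
from collections import defaultdict

# Hypothetical in-memory copy of mybag.bag (IPv4 key -> counter),
# transcribed from the rwbagcat output shown earlier.
mybag = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}


def net_sums(bag, prefix_len):
    """Sum the counters of every key that falls in each /prefix_len block."""
    sums = defaultdict(int)
    for ip, count in bag.items():
        # strict=False masks off the host bits, e.g. 172.23.1.4 -> 172.23.1.0/27
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        sums[net] += count
    return dict(sums)


for prefix_len in (27, 24, 16, 8):
    for net, total in sorted(net_sums(mybag, prefix_len).items()):
        print(f"{str(net):<20}| {total}|")
```

Each printed line corresponds to one CIDR line of the TABCHX output; the sketch does not reproduce rwbagcat's column layout or its interleaving of hosts and blocks.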
To show an abbreviated network structure by class A and C only, including summary information:
$ rwbagcat --network-structure=ACS mybag.bag
172.23.1.0/24 | 264| 4 hosts in 1 /27
172.0.0.0/8 | 264| 4 hosts in 1 /16, 1 /24, and 1 /27
192.168.0.0/24 | 17| 3 hosts in 2 /27s
192.168.20.0/24 | 11| 3 hosts in 1 /27
192.0.0.0/8 | 28| 6 hosts in 1 /16, 2 /24s, and 3 /27s
Suppose the key type of a bag file is duration:
$ rwfileinfo --field=bag Bag2.bag
Bag2.bag:
  bag key: duration @ 4 octets; counter: custom @ 8 octets
rwbagcat complains when the --key-format switch lists a format that it thinks is ”nonsensical” for that type of key.
$ rwbagcat --key-format=utc Bag2.bag
rwbagcat: Invalid key-format 'utc': Nonsensical for Bag containing duration keys

$ rwbagcat --key-format=canonical Bag2.bag
rwbagcat: Invalid key-format 'canonical': Nonsensical for Bag containing duration keys
To apply such a --key-format once and leave the key type in the Bag file unchanged, merge the bag with an empty bag file: use rwbagbuild(1) to create an empty bag that uses the custom key type, add the empty bag to Bag2.bag using rwbagtool(1), then display the result:
$ rwbagbuild --bag-input=/dev/null \
      | rwbagtool --add Bag2.bag stdin \
      | rwbagcat --key-format=utc
1970/01/01T00:00:01| 1|
1970/01/01T00:00:04| 2|
1970/01/01T00:00:07| 32|
1970/01/01T00:00:08| 2|

$ rwbagbuild --bag-input=/dev/null \
      | rwbagtool --add Bag2.bag - \
      | rwbagcat --key-format=canonical
0.0.0.1| 1|
0.0.0.4| 2|
0.0.0.7| 32|
0.0.0.8| 2|
To rewrite the bag file with a different key type, print the bag file as text and use rwbagbuild to build a new bag file:
$ rwbagcat Bag2.bag \
      | rwbagbuild --bag-input=- --key-type=sipv4
Inverting a bag means counting the number of times each counter appears in the bag.
To bin the number of IP addresses that had each flow count:
$ rwbagcat --bin-ips mybag.bag
1| 3|
5| 3|
9| 1|
15| 1|
19| 1|
231| 1|
The output shows that the bag contains 3 source hosts that had a single flow, 3 hosts that had 5 flows, and four hosts that each had a unique flow count (9, 15, 19, and 231).
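Binning by counter amounts to counting how often each counter value occurs. A minimal Python sketch, assuming the counters of mybag.bag are available as a plain list (transcribed from the output above); it illustrates the idea and is not how rwbagcat computes --bin-ips.

```python
from collections import Counter

# Counter values of mybag.bag, transcribed from the rwbagcat output above.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

# Invert: map each distinct counter value to the number of keys that have it.
bins = Counter(counters)

for value in sorted(bins):
    print(f"{value}| {bins[value]}|")
```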
For a log2 breakdown of the counts:
$ rwbagcat --bin-ips=binary mybag.bag
2^0 to 2^1-1| 3|
2^2 to 2^3-1| 3|
2^3 to 2^4-1| 2|
2^4 to 2^5-1| 1|
2^7 to 2^8-1| 1|
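The binary bins place a counter c in bin n when 2^n <= c <= 2^(n+1)-1, which for a positive integer is its bit length minus one. A minimal Python sketch under the same assumption as before (counters transcribed into a list); log2_bin is a hypothetical helper, not part of SiLK.

```python
from collections import Counter

# Counter values of mybag.bag, transcribed from the rwbagcat output above.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]


def log2_bin(value):
    """Return n such that 2^n <= value <= 2^(n+1)-1, for value >= 1."""
    return value.bit_length() - 1


bins = Counter(log2_bin(value) for value in counters)

for n in sorted(bins):
    print(f"2^{n} to 2^{n + 1}-1| {bins[n]}|")
```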
rwbagcat normally presents the data in order of increasing key value. To sort based on the counter value, specify the --sort-counter switch. When sorting by the counter value, the default order is from maximum counter to minimum counter.
$ rwbagcat --sort-counter mybag.bag
172.23.1.2| 231|
172.23.1.4| 19|
192.168.0.160| 15|
172.23.1.3| 9|
172.23.1.1| 5|
192.168.20.162| 5|
192.168.20.163| 5|
192.168.0.100| 1|
192.168.0.101| 1|
192.168.20.161| 1|
To change the sort order, specify the increasing argument to the --sort-counter switch:
$ rwbagcat --sort-counter=increasing mybag.bag
192.168.0.100| 1|
192.168.0.101| 1|
192.168.20.161| 1|
172.23.1.1| 5|
192.168.20.162| 5|
192.168.20.163| 5|
172.23.1.3| 9|
192.168.0.160| 15|
172.23.1.4| 19|
172.23.1.2| 231|
For keys that have the same counter value, the order of the keys is consistent (always from low to high) regardless of how the counters are sorted. The following output is limited to those keys whose counter is 5. The output is first shown without the --sort-counter switch, then with the data sorted by increasing and decreasing counter value.
$ rwbagcat --delim=, mybag.bag | grep ,5
172.23.1.1,5
192.168.20.162,5
192.168.20.163,5

$ rwbagcat --delim=, --sort-counter=increasing mybag.bag | grep ,5
172.23.1.1,5
192.168.20.162,5
192.168.20.163,5

$ rwbagcat --delim=, --sort-counter=decreasing mybag.bag | grep ,5
172.23.1.1,5
192.168.20.162,5
192.168.20.163,5
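The ordering rule (counter first, then ascending key for ties) can be expressed as a compound sort key. A minimal Python sketch, assuming the bag is a dict of dotted-quad strings to counters; sort_by_counter is a hypothetical helper, and keys are compared as addresses via ipaddress rather than as strings so that, for example, 10.0.0.9 would sort before 10.0.0.10.

```python
import ipaddress

# Hypothetical in-memory copy of mybag.bag (IPv4 key -> counter).
mybag = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}


def sort_by_counter(bag, decreasing=True):
    """Order (key, counter) pairs by counter; ties always ascend by key."""
    sign = -1 if decreasing else 1
    return sorted(
        bag.items(),
        key=lambda item: (sign * item[1], ipaddress.ip_address(item[0])),
    )


for key, counter in sort_by_counter(mybag):
    print(f"{key}| {counter}|")
```

Negating only the counter reverses the counter order while leaving the key tie-breaker ascending, which matches the behavior shown above.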
As of SiLK 3.12.0, rwbag(1) and rwbagbuild(1) can use a prefix map file to generate the keys of a bag file. To display such a Bag file, you must specify the --pmap-file switch on the rwbagcat command line so that it can map each prefix map value to its label. If --pmap-file is not given, rwbagcat displays an error.
$ rwbagcat service.bag
rwbagcat: The --pmap-file switch is required for \
    Bags containing sport-pmap keys
In addition, the type of the prefix map file must match the key-type in the bag file: a prefix map type of IPv4-address or IPv6-address when the key was mapped from an IP address, and a prefix map type of proto-port when the key was mapped from a protocol-port pair. The type of key in a bag may be determined by rwfileinfo(1).
$ rwfileinfo --fields=bag service.bag
service.bag:
  bag key: sport-pmap @ 4 octets; counter: custom @ 8 octets

$ rwbagcat --pmap-file=ip-map.pmap service.bag
rwbagcat: Cannot use IPv4-address prefix map for \
    Bag containing sport-pmap keys

$ rwbagcat --pmap-file=port-map.pmap service.bag
TCP/SSH| 1|
TCP/SMTP| 800|
TCP/HTTP| 5642|
The only check rwbagcat makes is whether the prefix map file is the correct type, so a different prefix map file of that type may be used. If a value in the bag file has no corresponding label in the prefix map file, the numeric index of the label is displayed instead, as shown in the following example, which creates a prefix map with a single label.
$ echo 'label 1 none' \
      | rwpmapbuild --mode=proto-port --input-path=- \
            --output-path=tmp.pmap
$ rwbagcat --pmap-file=tmp.pmap service.bag
7| 1|
8| 800|
9| 5642|
$ rwbagcat --print-statistics mybag.bag
Statistics
  number of keys: 10
  sum of counters: 292
  minimum key: 172.23.1.1
  maximum key: 192.168.20.163
  minimum counter: 1
  maximum counter: 231
  mean: 29.2
  variance: 5064
  standard deviation: 71.16
  skew: 2.246
  kurtosis: 8.1
  nodes allocated: 0 (0 bytes)
  counter density: inf%
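The mean, variance, and standard deviation reported above appear to match the ordinary sample statistics of the ten counters, with the variance using an n - 1 denominator. A minimal Python sketch that checks this with the standard statistics module, assuming the counters are transcribed into a list; it does not attempt to reproduce the skew, kurtosis, or node-allocation lines.

```python
import math
import statistics

# Counter values of mybag.bag, transcribed from the rwbagcat output above.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

mean = statistics.mean(counters)
variance = statistics.variance(counters)  # sample variance: divides by n - 1
stddev = math.sqrt(variance)

print(f"number of keys: {len(counters)}")
print(f"sum of counters: {sum(counters)}")
print(f"mean: {mean:.4g}")
print(f"variance: {variance:.4g}")
print(f"standard deviation: {stddev:.4g}")
```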
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
When set to a non-empty string, rwbagcat automatically invokes this program to display its output a screen at a time. If set to an empty string, rwbagcat does not automatically page its output.
When set and SILK_PAGER is not set, rwbagcat automatically invokes this program to display its output a screen at a time.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of data repository. As described in the FILES section, rwbagcat may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwbagcat may use this environment variable. See the FILES section for details.
When the argument to the --key-format switch includes localtime or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwbagcat displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwbagcat --version.)
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwfileinfo(1), rwset(1), rwsetbuild(1), silk(7), tzset(3), environ(7)
Perform high-level operations on binary Bag files
rwbagtool { --add | --subtract | --minimize | --maximize | --divide | --scalar-multiply=VALUE | --compare={lt | le | eq | ge | gt} } [--intersect=SETFILE | --complement-intersect=SETFILE] [--mincounter=VALUE] [--maxcounter=VALUE] [--minkey=VALUE] [--maxkey=VALUE] [--invert] [--coverset] [--ipset-record-version=VERSION] [--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]] [--note-strip] [--note-add=TEXT] [--note-file-add=FILE] [--compression-method=COMP_METHOD] [BAGFILE[ BAGFILE...]]
rwbagtool --help
rwbagtool --version
rwbagtool performs various operations on binary Bag files (key-counter associations) and creates a new Bag file or an IPset file. rwbagtool can add Bags together, subtract a subset of data from a Bag, divide a Bag by another, compare the counters of two Bag files, perform key intersection of a Bag with an IPset, extract the keys of a Bag as an IPset, or filter Bag entries based on their key or counter values.
rwbagtool reads Bags from the files and named pipes specified on the command line. If no file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. The names stdin or - may be used to force rwbagtool to read from the standard input. The resulting Bag or IPset is written to the location specified by the --output-path switch or to the standard output if that switch is not provided. If a BAGFILE does not contain a Bag or an attempt is made to read binary input or write binary output to the terminal, rwbagtool prints an error to the standard error and exits abnormally.
In SiLK 3.21.0, rwbagtool added the --modify-inplace switch which correctly handles the case when an input file is also used as the output file. That switch causes rwbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname where the original output file is copied.
A Bag is a set where each key is associated with a counter. rwbag(1) and rwbagbuild(1) are the primary tools used to create a Bag file. rwbagcat(1) prints a binary Bag file as text.
SiLK 3.15.0 introduced Aggregate Bags that are capable of storing multiple keys and counters. See rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), and rwaggbagtool(1) for more information.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.
Sum the counters for each key for all Bag files given on the command line. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key is not present in an input file, a counter of zero is used. The result contains the union of the keys from the input Bag files. When no operation switch is specified on the command line, the add operation is the default. If addition causes a counter to exceed the maximum value, rwbagtool exits with an error.
Subtract from the first Bag file all subsequent Bag files. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If subtracting a key’s counters results in a non-positive number, the key does not appear in the resulting Bag file. The result contains a subset of the keys in the first Bag file.
Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags do not appear in the output. At least one Bag file must be specified, and any number of additional Bag files may be given.
Cause the output to contain the maximum counter seen for each key. The output contains each key that appears in any input Bag. At least one Bag file must be specified, and any number of additional Bag files may be given.
Divide the first Bag file by the second Bag file. It is an error if only one Bag file or more than two Bag files are given. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys do not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key does not appear in the output.
Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551614. This switch requires a single Bag as input. On overflow, the lower 64-bits of the result are used as the counter’s value.
Compare the key/counter pairs in exactly two Bag files. It is an error if only one Bag file or more than two Bag files are specified. The keys in the output Bag are only those for which the comparison denoted by OPERATION is true when comparing the key’s counter in the first Bag with the key’s counter in the second Bag. The counters for all keys in the output have the value 1. Any key that does not appear in both input Bag files does not appear in the result. The possible OPERATION values are the strings:
GetCounter(Bag1, key) < GetCounter(Bag2, key)
GetCounter(Bag1, key) <= GetCounter(Bag2, key)
GetCounter(Bag1, key) == GetCounter(Bag2, key)
GetCounter(Bag1, key) >= GetCounter(Bag2, key)
GetCounter(Bag1, key) > GetCounter(Bag2, key)
The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:
Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)
As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.
Cause the output to contain only those entries whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value (18446744073709551614); the default is 1.
Cause the output to contain only those entries whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.
Cause the output to contain only those entries whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.
Cause the output to contain only those entries whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.
The following switches control the output.
Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key is attributed to the counter for the maximum key; to prevent this, specify --maxcounter=4294967295 which removes all key-counter pairs whose counters do not fit into a key. (The --bin-ips switch on rwbagcat(1) allows one to invert a Bag file as it is being printed.) If inverting the Bag causes a counter to exceed the maximum value, rwbagtool exits with an error.
Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.
Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.11.0.
Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.
Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.
Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.
Write the resulting Bag or IPset to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.
Allow rwbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwbagtool writes the output to a temporary location first, then overwrites PATH. rwbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwbagtool exits with an error when this switch is given and PATH is not a regular file. If rwbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.
Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwbagtool exits with a message and an error. rwbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.
Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
The examples assume the following contents for the files:
Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set
3| 10|      1|  1|      2|  8|      1|  1|      2
4|  7|      4|  2|      4| 10|      4|  3|      4
6| 14|      7| 32|      6| 14|      6|  4|      6
7| 23|      8|  2|      7| 12|      7|  4|      8
8|  2|                  9|  8|      8|  6|
The examples use rwbagcat(1) to print the contents of the Bag files.
Adding Bag files produces a Bag whose keys are the set union of the keys in the input Bags. The counter for each key is the sum of the key’s counters in each input Bag.
$ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
$ rwbagcat --key-format=decimal Bag-sum.bag
1| 1|
3| 10|
4| 9|
6| 14|
7| 55|
8| 4|

$ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
$ rwbagcat --key-format=decimal Bag-sum2.bag
1| 1|
2| 8|
3| 10|
4| 19|
6| 28|
7| 67|
8| 4|
9| 8|
The --subtract switch subtracts from the key/counter pairs in the first Bag file the key/counter pairs in all other Bag file arguments. Keys that are not present in the first argument are ignored. If subtraction results in a counter value of zero or less, the key is removed from the result.
$ rwbagtool --subtract Bag1.bag Bag2.bag > Bag-diff.bag
$ rwbagcat --key-format=decimal Bag-diff.bag
3| 10|
4| 5|
6| 14|

$ rwbagtool --subtract Bag2.bag Bag1.bag > Bag-diff2.bag
$ rwbagcat --key-format=decimal Bag-diff2.bag
1| 1|
7| 9|
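The add and subtract rules can be modelled with plain dicts. A minimal Python sketch, assuming each Bag is a dict of integer keys to counters (the values are those of the example Bag files above); bag_add and bag_subtract are hypothetical helpers, and unlike rwbagtool they do not detect counter overflow because Python integers are unbounded.

```python
# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_add(*bags):
    """Union of all keys; a key missing from a bag contributes 0."""
    result = {}
    for bag in bags:
        for key, count in bag.items():
            result[key] = result.get(key, 0) + count
    return dict(sorted(result.items()))


def bag_subtract(first, *others):
    """Keys come only from `first`; results of 0 or less are dropped."""
    result = {}
    for key, count in sorted(first.items()):
        remaining = count - sum(other.get(key, 0) for other in others)
        if remaining > 0:
            result[key] = remaining
    return result


print(bag_add(bag1, bag2))
print(bag_subtract(bag1, bag2))
```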
The output produced by the --minimize switch contains only the keys that appear in all of input Bags. For each key, the counter is the minimum value for that key in any input Bag.
$ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
$ rwbagcat --key-format=decimal Bag-min.bag
4| 2|
7| 12|
The keys of the Bag file produced by --maximize are the same as the keys produced by --add; that is, the union of all keys in the input files. For each key, its counter is the maximum value seen for that key in any single input Bag file.
$ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
$ rwbagcat --key-format=decimal Bag-max.bag
1| 1|
2| 8|
3| 10|
4| 10|
6| 14|
7| 32|
8| 2|
9| 8|
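A similar sketch for --minimize and --maximize, again assuming dict stand-ins for the example Bags; bag_minimize and bag_maximize are hypothetical helpers, not SiLK functions.

```python
# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_minimize(*bags):
    """Keep only keys present in every bag, with their smallest counter."""
    common = set(bags[0]).intersection(*bags[1:])
    return {key: min(bag[key] for bag in bags) for key in sorted(common)}


def bag_maximize(*bags):
    """Keep every key seen in any bag, with its largest counter."""
    keys = set().union(*bags)
    return {key: max(bag.get(key, 0) for bag in bags) for key in sorted(keys)}


print(bag_minimize(bag1, bag2, bag3))
print(bag_maximize(bag1, bag2, bag3))
```

Using 0 as the default for a missing key in bag_maximize is safe here because Bag counters are positive.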
The --divide switch requires exactly two Bag files as input. The keys in the first Bag argument must be either the same as or a subset of those in the second argument. The counter for each key in the first Bag file is divided by that key’s counter in the second file. If the result of the division is less than 0.5, the key is not included in the output.
$ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
$ rwbagcat --key-format=decimal Bag-div1.bag
1| 1|
4| 1|
7| 8|
When the order of the Bag file arguments is reversed, an error is reported.
$ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
rwbagtool: Error dividing bags; key 6 not in divisor bag
To work around this issue, use the --coverset switch to create an IPset of the keys in Bag2.bag, then use --intersect with that IPset to create a copy of Bag4.bag that contains only those keys.
$ rwbagtool --coverset Bag2.bag > Bag2-keys.set
$ rwbagtool --intersect=Bag2-keys.set Bag4.bag > Bag4-small.bag
$ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
$ rwbagcat --key-format=decimal Bag-div2.bag
1| 1|
4| 2|
8| 3|
The following command is the same as the above except the IPset and Bag files are piped between the tools instead of being written to disk:
$ rwbagtool --coverset Bag2.bag \
      | rwbagtool --intersect=- Bag4.bag \
      | rwbagtool --divide - Bag2.bag \
      | rwbagcat --key-format=decimal
1| 1|
4| 2|
8| 3|
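Division rounds each quotient to the nearest integer, with .5 rounded up, and drops quotients below 0.5. A minimal Python sketch of that rule, assuming dict stand-ins for Bag2.bag and Bag4.bag; bag_divide is a hypothetical helper, and it uses fractions.Fraction so that a quotient of exactly 1.5 is not subject to floating-point error. The dict comprehension at the end mirrors the --coverset/--intersect workaround.

```python
import math
from fractions import Fraction

# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}


def bag_divide(dividend, divisor):
    """Divide counters key by key, rounding to the nearest integer (.5 up)."""
    result = {}
    for key, count in sorted(dividend.items()):
        if key not in divisor:
            raise KeyError(f"key {key} not in divisor bag")
        # Exact rational arithmetic: floor(q + 1/2) rounds .5 upward.
        rounded = math.floor(Fraction(count, divisor[key]) + Fraction(1, 2))
        if rounded > 0:  # quotients below 0.5 round to 0 and are dropped
            result[key] = rounded
    return result


# Keep only the keys of Bag4 that also appear in Bag2 (the workaround).
bag4_small = {key: count for key, count in bag4.items() if key in bag2}

print(bag_divide(bag2, bag4))
print(bag_divide(bag4_small, bag2))
```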
The --scalar-multiply switch multiplies each counter in the input Bag by the specified value. Exactly one Bag file argument is required.
$ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
$ rwbagcat --key-format=decimal Bag-multiply.bag
3| 70|
4| 49|
6| 98|
7| 161|
8| 14|
Use two rwbagtool commands if multiple operations are desired.
$ rwbagtool --add Bag1.bag Bag2.bag \
      | rwbagtool --scalar-multiply=3 --output-path=Bag12-multi.bag
$ rwbagcat --key-format=decimal Bag12-multi.bag
1| 3|
3| 30|
4| 27|
6| 42|
7| 165|
8| 12|
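The overflow rule for --scalar-multiply (keep the lower 64 bits of the product) is equivalent to masking the product with 2^64 - 1. A minimal Python sketch, assuming dict stand-ins for the Bags; bag_scalar_multiply is a hypothetical helper.

```python
MASK_64 = (1 << 64) - 1  # keeps the lower 64 bits of a product

# Hypothetical dict stand-ins: Bag1.bag, and the sum of Bag1.bag and Bag2.bag.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag12 = {1: 1, 3: 10, 4: 9, 6: 14, 7: 55, 8: 4}


def bag_scalar_multiply(bag, value):
    """Multiply every counter by value, keeping only the low 64 bits."""
    return {key: (count * value) & MASK_64 for key, count in bag.items()}


print(bag_scalar_multiply(bag1, 7))
print(bag_scalar_multiply(bag12, 3))
```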
The --compare switch takes an argument that specifies how to compare the counters in two Bag files, and it requires exactly two Bag files as input. For each key that appears in both Bag files, the counter value in the first file is compared to the counter value in the second file. If the comparison is true, the key appears in the resulting Bag file with a counter of 1. If the comparison is false, the key is not present in the output file. Keys that appear in only one of the input files are ignored.
The following comparisons operate on Bag1.bag and Bag2.bag, whose common keys are 4, 7, and 8.
Find counters in Bag1.bag that are less than those in Bag2.bag:
$ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
$ rwbagcat --key-format=decimal Bag-lt.bag
7| 1|
Find counters in Bag1.bag that are less than or equal to those in Bag2.bag:
$ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
$ rwbagcat --key-format=decimal Bag-le.bag
7| 1|
8| 1|
Find counters in Bag1.bag that are equal to those in Bag2.bag:
$ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
$ rwbagcat --key-format=decimal Bag-eq.bag
8| 1|
Find counters in Bag1.bag that are greater than or equal to those in Bag2.bag:
$ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
$ rwbagcat --key-format=decimal Bag-ge.bag
4| 1|
8| 1|
Find counters in Bag1.bag that are greater than those in Bag2.bag:
$ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
$ rwbagcat --key-format=decimal Bag-gt.bag
4| 1|
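The five comparisons map directly onto Python's operator module. A minimal Python sketch, assuming dict stand-ins for Bag1.bag and Bag2.bag; bag_compare and COMPARISONS are hypothetical names, not part of SiLK.

```python
import operator

# OPERATION name -> comparison applied as test(counter_in_first, counter_in_second)
COMPARISONS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt,
}

# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}


def bag_compare(op_name, first, second):
    """Keys in both bags whose counters satisfy the test, each with counter 1."""
    test = COMPARISONS[op_name]
    return {key: 1 for key in sorted(first.keys() & second.keys())
            if test(first[key], second[key])}


for op_name in COMPARISONS:
    print(op_name, bag_compare(op_name, bag1, bag2))
```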
A cover set is an IPset file that contains the keys that are present in any of the input Bag files. In other words, it is the union of the keys converted to an IPset. Since an operation switch is not provided in this command, an implicit --add operation is performed on the Bag files prior to creating the cover set. (rwsetcat(1) prints the contents of an IPset file as text.)
$ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
$ rwsetcat --ip-format=decimal Cover.set
1
2
3
4
6
7
8
9
One use of a cover set is to limit the contents of a Bag file to keys that are present in a second Bag file:
$ rwbagtool --coverset --output-path=Cover.set Bag1.bag
$ rwbagtool --intersect=Cover.set Bag2.bag > Bag1-mask-Bag2.bag
$ rwbagcat --key-format=decimal Bag1-mask-Bag2.bag
4| 2|
7| 32|
8| 2|
To limit the contents of Bag2.bag to the keys that are not present in Bag1.bag, use the same cover set with --complement-intersect:
$ rwbagtool --complement-intersect=Cover.set Bag2.bag \
      > Bag1-notmask-Bag2.bag
$ rwbagcat --key-format=decimal Bag1-notmask-Bag2.bag
1| 1|
The output of the --invert switch is a Bag file that counts the number of times each counter is present in the input Bag file.
$ rwbagtool --invert Bag1.bag > Bag-inv1.bag
$ rwbagcat --key-format=decimal Bag-inv1.bag
2| 1|
7| 1|
10| 1|
14| 1|
23| 1|

$ rwbagtool --invert Bag2.bag > Bag-inv2.bag
$ rwbagcat --key-format=decimal Bag-inv2.bag
1| 1|
2| 2|
32| 1|

$ rwbagtool --invert Bag3.bag > Bag-inv3.bag
$ rwbagcat --key-format=decimal Bag-inv3.bag
8| 2|
10| 1|
12| 1|
14| 1|
When multiple Bag files are specified on the command line, the files are added prior to creating the inverted Bag. Even though the counter 2 appears three times in the files Bag1.bag and Bag2.bag, the key 2 is not present in the following since the add operation is performed first.
$ rwbagtool --invert Bag1.bag Bag2.bag \
      | rwbagcat --key-format=decimal
1| 1|
4| 1|
9| 1|
10| 1|
14| 1|
55| 1|
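Because rwbagtool adds multiple input Bags before inverting, a model of --invert first sums the inputs and then counts the counter values. A minimal Python sketch, assuming dict stand-ins for the example Bags; bag_invert is a hypothetical helper, and it ignores the maximum-key clamping described for --invert.

```python
from collections import Counter

# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_invert(*bags):
    """Add the bags, then count how many keys carry each counter value."""
    summed = Counter()
    for bag in bags:
        summed.update(bag)  # Counter.update adds counts instead of replacing
    return dict(sorted(Counter(summed.values()).items()))


print(bag_invert(bag1))
print(bag_invert(bag1, bag2))
```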
The --intersect switch takes an IPset file as an argument and limits the keys of the Bag produced by rwbagtool to only those keys that appear in the IPset file.
$ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
$ rwbagcat --key-format=decimal Bag-mask.bag
4| 7|
6| 14|
8| 2|
The --complement-intersect switch limits the output to only those keys that do not appear in the IPset file.
$ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
$ rwbagcat --key-format=decimal Bag-mask2.bag
3| 10|
7| 23|
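Treating an IPset as a set of integer keys, --coverset, --intersect, and --complement-intersect reduce to set membership tests. A minimal Python sketch, assuming dict stand-ins for the Bags and a Python set standing in for Mask.set; the helper names are hypothetical, and a real IPset file is not a Python set. Because Bag counters are positive, the keys of the summed Bag are simply the union of the input keys, so coverset takes that union directly.

```python
# Hypothetical dict stand-ins for the example Bag files (key -> counter).
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
mask = {2, 4, 6, 8}  # stand-in for Mask.set, as integer keys


def coverset(*bags):
    """Keys present in any input bag."""
    return set().union(*bags)


def intersect(bag, ipset):
    """Keep only the entries whose key is in the set."""
    return {key: count for key, count in bag.items() if key in ipset}


def complement_intersect(bag, ipset):
    """Keep only the entries whose key is not in the set."""
    return {key: count for key, count in bag.items() if key not in ipset}


print(intersect(bag1, mask))
print(complement_intersect(bag1, mask))
```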
See also the next section.
In addition to limiting the result of rwbagtool to keys that appear or do not appear in an IPset file (cf. previous section), the --minkey, --maxkey, --mincounter, and --maxcounter switches may be used to impose numeric limits on the keys or counters that appear in the resulting Bag file.
$ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
$ rwbagcat --key-format=decimal Bag-res1.bag
1| 1|
3| 10|
4| 9|
$ rwbagtool --minkey=3 --maxkey=6 Bag1.bag Bag2.bag > Bag-res2.bag
$ rwbagcat --key-format=decimal Bag-res2.bag
3| 10|
4| 9|
6| 14|
$ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
$ rwbagcat --key-format=decimal Bag-res3.bag
7| 55|

$ rwbagtool --subtract --maxcounter=9 Bag1.bag Bag2.bag \
      > Bag-res4.bag
$ rwbagcat --key-format=decimal Bag-res4.bag
4| 5|
To share a Bag file with a user whose SiLK installation includes different compression libraries, it may be necessary to change the compression method of the Bag.
It is not possible to change the compression method of an existing file directly. Instead, create a new file first, then replace the old file with the new file.
To create a new file that uses a different compression method from the Bag file A.bag, use rwbagtool with the --add switch and specify the desired --compression-method:
$ rwbagtool --add --compression=none --output-path=A1.bag A.bag
Unfortunately, the Bag tools do not allow changing the key type or counter type of a Bag file. To change the types, use rwbagcat(1) to write the Bag as text and rwbagbuild(1) to convert the text back to a Bag file.
$ rwbagcat Bag1.bag \
      | rwbagbuild --bag-input=- --output-path=Bag1-typed.bag \
            --key-type=sport --counter-type=sum-bytes
Use rwfileinfo(1) to see the type of the key and counter.
$ rwfileinfo --field=bag Bag1-typed.bag
Bag1-typed.bag:
  bag          key: sPort @ 4 octets; counter: sum-bytes @ 8 octets
Alternatively, one may use PySiLK (see pysilk(3)) to modify the key type and counter type.
$ cat bag-type.py
import sys
from silk import Bag, IPv4Addr
# usage: python bag-type.py KEY_TYPE COUNTER_TYPE OLD_BAG NEW_BAG
key_type = sys.argv[1]
counter_type = sys.argv[2]
old_file = sys.argv[3]
new_file = sys.argv[4]
# Load the existing Bag, then copy its contents into a Bag with the new types
old = Bag.load(old_file, key_type=IPv4Addr)
new = Bag(old, key_type=key_type, counter_type=counter_type)
new.save(new_file)
$
$ python bag-type.py sipv4 sum-packets Bag1.bag Bag1-type2.bag
$ rwfileinfo --field=bag Bag1-type2.bag
Bag1-type2.bag:
  bag          key: sIPv4 @ 4 octets; counter: sum-packets @ 8 octets
This environment variable is used as the value for the --ipset-record-version when that switch is not provided. Since SiLK 3.7.0.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
The --modify-inplace switch was added in SiLK 3.21. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.
rwbagtool should handle counter overflow more consistently and gracefully.
rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), silk(7), zlib(3)
Concatenate SiLK Flow files into single stream
rwcat [--output-path=PATH] [--note-add=TEXT] [--note-file-add=FILE] [--print-filenames] [--byte-order={big | little | native}] [--ipv4-output] [--milliseconds] [--compression-method=COMP_METHOD] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}
rwcat --help
rwcat --version
rwcat reads SiLK Flow records and writes the records in the standard binary SiLK format to the specified output-path; rwcat writes the records to the standard output when stdout is not the terminal and --output-path is not provided.
rwcat reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcat reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
rwcat does not copy the invocation history and annotations (notes) from the header(s) of the source file(s) to the destination file. The --note-add or --note-file-add switch may be used to add a new annotation to the destination file.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcat exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. When PATH ends in .gz, the output is compressed using the library associated with gzip(1). If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcat to exit with an error.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Set the byte order for the output SiLK Flow records. The argument is one of the following:
Use the byte order of the machine where rwcat is running. This is the default.
Use network byte order (big endian) for the output.
Write the output in little endian format.
Force the output to contain only IPv4 flow records. When this switch is specified, IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix are converted to IPv4 and written to the output, and all other IPv6 records are ignored. When SiLK has not been compiled with IPv6 support, rwcat acts as if this switch were always in effect.
Force the output to use record formats and versions that use millisecond timestamps. This makes the output compatible with releases of SiLK prior to SiLK 3.23.0. To read the output, SiLK 3.10.0 or later is required. If the byte-count, packet-count, or SNMP values (in and out) exceed the maximum supported by that version of SiLK, the value is set to that maximum. Since SiLK 3.23.0.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Print the names of input files and the number of records each file contains as the files are read.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcat searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcat opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
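The --ipv4-output conversion relies on the fact that an address in ::ffff:0:0/96 is an IPv4-mapped IPv6 address whose low 32 bits hold the IPv4 address. The sketch below uses Python's standard ipaddress module to show the idea for a record's source and destination addresses. It is a simplified model, not rwcat's code: the function names are hypothetical, it checks only those two addresses, and how rwcat treats the other address fields of a record (such as nhIP) is not covered here.

```python
import ipaddress
from typing import Optional, Tuple, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def as_ipv4(addr: Address) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 form of addr, or None if it has no IPv4 equivalent."""
    if isinstance(addr, ipaddress.IPv4Address):
        return addr
    # ipv4_mapped is None unless addr lies in ::ffff:0:0/96
    return addr.ipv4_mapped

def ipv4_output(sip: Address, dip: Address
                ) -> Optional[Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]]:
    """Convert a record's addresses to IPv4, or return None to drop the record."""
    s, d = as_ipv4(sip), as_ipv4(dip)
    if s is None or d is None:
        return None
    return s, d

ip = ipaddress.ip_address
print(ipv4_output(ip("::ffff:10.0.0.1"), ip("::ffff:192.168.1.2")))
print(ipv4_output(ip("2001:db8::1"), ip("::ffff:10.0.0.1")))   # None: record ignored
```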
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
To combine the results of several rwfilter(1) runs (stored in the files run1.rw, run2.rw, ... runN.rw) into the single file combined.rw, you can use:
$ rwcat --output=combined.rw *.rw
If the shell complains about too many arguments, you can use the UNIX find(1) command and pipe its output to rwcat:
$ find . -name '*.rw' -print \
      | rwcat --xargs --output=combined.rw
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of data repository. As described in the FILES section, rwcat may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwcat may use this environment variable. See the FILES section for details.
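Taken together, the --compression-method description and the SILK_COMPRESSION_METHOD entry above define a precedence order: the switch, then the environment variable (only if it names an available method), then the compiled-in default, which is applied only when writing to a file. The decision function below is a hypothetical sketch of those documented rules, not SiLK's code; its name and parameters are invented for illustration, and returning none from best when no compression library is available is an assumption.

```python
from typing import Mapping, Optional, Sequence

def effective_compression(switch: Optional[str], env: Mapping[str, str],
                          libraries: Sequence[str], compiled_default: str,
                          writing_to_file: bool) -> str:
    """Hypothetical model of how a SiLK tool picks its compression method.

    libraries lists the optional methods found at build time (for example
    zlib, lzo1x, snappy); none and best are always accepted.
    """
    valid = set(libraries) | {"none", "best"}
    method = switch
    if method is not None and method not in valid:
        raise ValueError(f"compression method {method!r} is not available")
    if method is None:
        from_env = env.get("SILK_COMPRESSION_METHOD")
        if from_env in valid:          # used only if it names an available method
            method = from_env
    if method is None:
        # Nothing specified: files get the compiled default; stdout/pipes do not.
        return compiled_default if writing_to_file else "none"
    if method == "best":
        if not writing_to_file:
            return "none"              # best compresses only output written to a file
        for candidate in ("lzo1x", "snappy", "zlib"):
            if candidate in libraries:
                return candidate
        return "none"                  # assumption: no library was found
    return method                      # none, zlib, lzo1x, snappy: always honored

print(effective_compression(None, {}, ["zlib", "lzo1x"], "lzo1x", writing_to_file=False))  # none
print(effective_compression("best", {}, ["zlib", "snappy"], "zlib", writing_to_file=True))  # snappy
```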
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwfilter(1), rwfileinfo(1), silk(7), gzip(1), find(1), zlib(3)
Although rwcat will read from the standard input, this feature should be used with caution. rwcat will treat the standard input as a single file, as it has no way to know when one file ends and the next begins. The following will not work:
$ cat run1.rw run2.rw | rwcat --output=combined.rw # WRONG!
The header of run2.rw will be treated as data of run1.rw, resulting in corrupt output.
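The reason is that each SiLK file begins with a header describing the records that follow, and a reader consumes exactly one header per input. The toy model below shows the effect with a made-up format (an 8-byte header followed by 8-byte records); the format and function names are hypothetical. Real SiLK headers are variable length and the records may be compressed, so in practice the damage is usually worse than a single bogus record.

```python
import struct
from typing import List, Tuple

HEADER = b"TOYHDR\x00\x00"        # hypothetical 8-byte file header
RECORD = struct.Struct(">II")     # hypothetical 8-byte record: (bytes, packets)

def write_file(records: List[Tuple[int, int]]) -> bytes:
    """Serialize one toy file: a header followed by fixed-size records."""
    return HEADER + b"".join(RECORD.pack(*rec) for rec in records)

def read_stream(data: bytes) -> List[Tuple[int, int]]:
    """Read ONE header, then treat every remaining byte as record data."""
    if not data.startswith(HEADER):
        raise ValueError("input does not start with a header")
    body = data[len(HEADER):]
    return [RECORD.unpack_from(body, offset)
            for offset in range(0, len(body), RECORD.size)]

run1 = write_file([(100, 1), (200, 2)])
run2 = write_file([(300, 3)])

# Equivalent of `cat run1.rw run2.rw | rwcat ...`: one stream, one header read.
records = read_stream(run1 + run2)
print(len(records))   # 4, not 3
print(records[2])     # run2's header bytes decoded as if they were a record
```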
Combine flows denoting a long-lived session into a single flow
rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS] [--max-idle-time=NUM] [{--print-statistics | --print-statistics=FILENAME}] [--temp-directory=DIR_PATH] [--buffer-size=SIZE] [--note-add=TEXT] [--note-file-add=FILE] [--compression-method=COMP_METHOD] [--print-filenames] [--output-path=PATH] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcombine --help
rwcombine --help-fields
rwcombine --version
rwcombine reads SiLK Flow records from one or more input sources, searches for flow records where the attributes field denotes records that were prematurely created or were continuations of prematurely created flows, and attempts to combine those records into a single record. All the unmodified SiLK records and the combined records are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.
Some flow exporters, such as yaf(1), provide fields that describe characteristics of the flow record, and these characteristics are stored in the attributes field of SiLK Flow records. The two flags that rwcombine considers are:
The flow generator prematurely created a record for a long-lived session due to the connection’s lifetime reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a 32-bit value.)
The flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout. (yaf only sets this flag when it is invoked with the --silk switch.)
A very long-running session may be represented by multiple flow records, where the first record is marked with the T flag, the final record is marked with the C flag, and intermediate records are marked with both C (this record continues an earlier flow) and T (this record also met the active time-out). rwcombine attempts to combine these multiple flow records into a single record.
The input to rwcombine does not need to be sorted. As part of its processing, rwcombine may re-order the records before writing them.
rwcombine reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcombine reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
The algorithm rwcombine uses to combine records is:
1. rwcombine reads SiLK flow records, examines the attributes field on each record, and immediately writes to the destination stream all records where neither the time-out flag (T) nor the continuation flag (C) is set. Records where one or both of those flags are set are stored until all input records have been read.
2. rwcombine groups the stored records into bins where the following fields for each record in each bin are identical: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.
3. For each bin, the records are sorted by time (sTime and eTime).
4. Within a bin, rwcombine combines two records into a single record when the attributes field of the first record has the T (time-out) flag set and the second record has the C (continuation) flag set. When combining records, the bytes and packets fields are summed, the initialFlags field of the first record is used, the sessionFlags field becomes the bit-wise OR of both sessionFlags fields and the second record’s initialFlags field, and the eTime is set to that of the second record.
5. If the second record’s T flag was set, rwcombine checks to see if the third record’s C flag is set. If it is, the third record becomes part of the new record.
6. The previous step repeats for the records in the bin until the bin contains a single record, the most recently added record did not have the T flag set, or the next record in the bin does not have the C flag set.
7. After examining a bin, rwcombine writes the record(s) the bin contains to the destination stream.
8. Steps 3 through 7 are repeated for each bin.
The --ignore-fields switch allows the user to remove fields from the set that rwcombine uses when grouping records in Step 2.
When combining two records into one (Step 4), rwcombine completely disregards the difference between the first record’s end-time and the second record’s start-time (the idle time). To tell rwcombine not to combine those records when the difference is greater than a limit, specify that value as the argument to the --max-idle-time switch.
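The combining steps, including the optional idle-time limit, can be sketched in Python. This is a simplified model written for illustration, not rwcombine's source: the Flow class and its fields are hypothetical, a single key tuple stands in for the grouping fields of Step 2, times are floating-point seconds, and the way the sketch sets the T and C attributes of a combined record (T from the last record of the chain, C from the first) is an assumption chosen to match the example output in this manual page, where the combined records have no attributes set.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Flow:
    key: Tuple                  # stands in for sIP, dIP, sPort, dPort, protocol, ...
    stime: float                # start time, seconds
    etime: float                # end time, seconds
    bytes: int
    packets: int
    init_flags: int             # initialFlags
    session_flags: int          # sessionFlags
    timeout: bool = False       # T attribute
    continuation: bool = False  # C attribute

def combine(flows: List[Flow], max_idle: Optional[float] = None) -> List[Flow]:
    output: List[Flow] = []
    bins: Dict[Tuple, List[Flow]] = {}
    for flow in flows:
        if not (flow.timeout or flow.continuation):
            output.append(flow)                          # Step 1: write immediately
        else:
            bins.setdefault(flow.key, []).append(flow)   # Step 2: group into bins
    for records in bins.values():
        records.sort(key=lambda f: (f.stime, f.etime))   # Step 3
        i = 0
        while i < len(records):
            current = records[i]
            i += 1
            # Steps 4-6: absorb the next record while current has T, the next
            # has C, and the idle gap does not exceed max_idle (if given).
            while (current.timeout and i < len(records) and records[i].continuation
                   and (max_idle is None
                        or records[i].stime - current.etime <= max_idle)):
                nxt = records[i]
                current = replace(
                    current,
                    etime=nxt.etime,
                    bytes=current.bytes + nxt.bytes,
                    packets=current.packets + nxt.packets,
                    session_flags=current.session_flags | nxt.session_flags | nxt.init_flags,
                    timeout=nxt.timeout,                 # assumption: T from last record
                )
                i += 1
            output.append(current)                       # Step 7
    return output

key = ("192.168.126.252", "10.11.156.107", 28975, 22, 6)
pieces = [
    Flow(key, 0.0, 10.0, 100, 1, 0x02, 0x18, timeout=True),
    Flow(key, 10.0, 20.0, 200, 2, 0x18, 0x18, timeout=True, continuation=True),
    Flow(key, 20.0, 30.0, 300, 3, 0x18, 0x19, continuation=True),
]
(merged,) = combine(pieces)
print(merged.bytes, merged.packets, merged.stime, merged.etime)  # 600 6 0.0 30.0
```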
To see information on the number of flows combined and the minimum and maximum idle times, specify the --print-statistics switch.
During its processing, rwcombine will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.
By default, the temporary files are stored in the /tmp directory. Because the sizes of the temporary files may be large, it is strongly recommended that /tmp not be used as the temporary directory, and rwcombine will print a warning when /tmp is used. To modify the temporary directory used by rwcombine, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
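Spilling sorted in-memory chunks to disk and merging them afterward is the general external merge sort pattern. The sketch below shows that technique with Python's heapq.merge and tempfile modules, together with the temporary-directory precedence described above. It is illustrative only: pickled Python objects stand in for SiLK records, max_in_memory counts items rather than bytes, an empty environment variable is treated as unset, and none of the function names come from SiLK.

```python
import heapq
import os
import pickle
import tempfile
from typing import Iterable, Iterator, List, Mapping, Optional

def pick_temp_dir(cli_dir: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """--temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp."""
    env = os.environ if env is None else env
    return cli_dir or env.get("SILK_TMPDIR") or env.get("TMPDIR") or "/tmp"

def _spill(chunk: List, tmpdir: str) -> str:
    """Sort a full in-memory chunk and write it to a temporary file."""
    chunk.sort()
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        for item in chunk:
            pickle.dump(item, fh)
    return path

def _read(path: str) -> Iterator:
    """Yield the items of one spilled file in order."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return

def external_sort(items: Iterable, max_in_memory: int, tmpdir: str) -> Iterator:
    """Sort items holding at most max_in_memory items in RAM at once."""
    paths: List[str] = []
    chunk: List = []
    try:
        for item in items:
            chunk.append(item)
            if len(chunk) >= max_in_memory:
                paths.append(_spill(chunk, tmpdir))
                chunk = []
        chunk.sort()
        # Merge the in-memory remainder with every spilled file.
        yield from heapq.merge(chunk, *(_read(p) for p in paths))
    finally:
        for path in paths:
            os.remove(path)   # remove the temporary files
```

For example, list(external_sort([5, 3, 9, 1, 7, 2, 8], 3, pick_temp_dir())) spills two sorted chunks of three items and returns the fully sorted list.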
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Select the type of action(s) that rwcombine should take to combine the input records. The default action is all, and the following actions are supported:
Perform all the actions described below.
Combine into a single flow record those records where the timeout flags in the attributes field indicate that the flow exporter has divided a long-lived session into multiple flow records.
This switch is provided for future expansion of rwcombine, since at present rwcombine supports a single action. When writing a script that uses rwcombine, specify --action=timeout for compatibility with future versions of rwcombine.
Ignore the fields listed in FIELDS when determining if two flow records should be grouped into the same bin; that is, treat FIELDS as being identical across all flows. By default, rwcombine puts records into a bin when the records have identical values for the following fields: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.
FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:
--ignore-fields=sensor,12-15
The supported fields are:
source IP address
destination IP address
source port for TCP and UDP, or equivalent
destination port for TCP and UDP, or equivalent
IP protocol
name or ID of sensor at the collection point
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
router SNMP output interface or postVlanId
router next hop IP
class and type of sensor at the collection point (represented internally by a single value)
guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
Do not combine two flow records when the start time of the second flow record is more than NUM seconds after the end time of the first flow record. NUM may be a floating point value. If this switch is not specified, the maximum idle time is effectively infinite.
Print to the standard error or to the specified FILENAME the number of flow records read and written, the number of flows that did not require combining, the number of flows combined, the number that could not be combined, and the minimum and maximum idle times between combined flow records.
Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.
Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwcombine will allocate, since additional buffers will be allocated for reading the input and writing the output.)
Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwcombine exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwcombine to exit with an error.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Print to the standard error the names of input files as they are opened.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcombine searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcombine opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the description and alias(es) of each field and exit.
Print the version number and information about how SiLK was configured, then exit the application.
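Two of the argument formats above, the FIELDS list of --ignore-fields and the SIZE value of --buffer-size, can be illustrated with a small Python sketch. It is not SiLK's parser. The name-to-number table is a hypothetical subset that assumes the field numbers match rwcut(1) (use --help-fields for the authoritative names and aliases); the sketch also rejects descending ranges, accepts only the uppercase suffixes K, M, and G, and truncates fractional byte counts.

```python
from typing import Dict, Set

# Hypothetical subset of field names -> numbers (assumed to follow rwcut(1)).
FIELD_IDS: Dict[str, int] = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "sensor": 12, "in": 13, "out": 14, "nhip": 15,
}

SIZE_SUFFIXES: Dict[str, int] = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_fields(spec: str) -> Set[int]:
    """Parse a list such as 'sensor,12-15' into a set of field numbers."""
    fields: Set[int] = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:                       # a range of field numbers
            start, end = (int(part) for part in token.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range {token!r}")
            fields.update(range(start, end + 1))
        elif token.isdigit():                  # a single field number
            fields.add(int(token))
        else:                                  # case-insensitive name; KeyError if unknown
            fields.add(FIELD_IDS[token.lower()])
    return fields

def parse_size(text: str) -> int:
    """Parse a SIZE such as '4096' or '1.5K' into a number of bytes."""
    text = text.strip()
    if text and text[-1] in SIZE_SUFFIXES:
        return int(float(text[:-1]) * SIZE_SUFFIXES[text[-1]])
    return int(text)                           # without a suffix, an integer is required

print(sorted(parse_fields("sensor,12-15")))   # [12, 13, 14, 15]
print(parse_size("1.5K"))                      # 1536
```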
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
Use rwfilter(1) to find ssh flow records that involve the host 192.168.126.252. The output from rwcut(1) shows the flow exporter split this long-lived ssh session into multiple flow records:
$ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \
      | rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|
   PA   |TC      |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|
Here is the other half of that conversation:
$ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \
      | rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|
   PA   |TC      |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|
Use rwuniq(1) to compute the byte and packet counts for that ssh session:
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
      | rwuniq --fields=sip,dip,sport,dport --values=records,byte,packets
            sIP|            dIP|sPort|dPort|Records|  Bytes|Packets|
  10.11.156.107|192.168.126.252|   22|28975|      5|4677240|   3881|
192.168.126.252|  10.11.156.107|28975|   22|      5| 281939|   3891|
Invoke rwcombine on these records and store the result in the file combined.rw:
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
      | rwcombine --print-statistics --output-path=combined.rw
FLOW RECORD COUNTS:
Read:                         10
Initially Complete:         -  0 *
Sorted & Examined:          = 10
Missing end:                -  0 *
Missing start & end:        -  0 *
Missing start:              -  0 *
Prior to combining:         = 10
Eliminated:                 -  8
Made complete:              =  2 *
Written:                       2  (sum of *)
IDLE TIMES:
Minimum:       0:00:00:00.000
Penultimate:   0:00:00:00.000
Maximum:       0:00:00:00.003
View the resulting records:
$ rwcut --fields=sip,dip,sport,dport,bytes,packets,flags combined.rw
            sIP|            dIP|sPort|dPort|  bytes|packets|   flags|
  10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |
192.168.126.252|  10.11.156.107|28975|   22| 281939|   3891|FS PA   |
$ rwcut --fields=sip,attributes,stime,etime combined.rw
            sIP|attribut|                  sTime|                  eTime|
  10.11.156.107|        |2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|
192.168.126.252|        |2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|
When set and --temp-directory is not specified, rwcombine writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.
When set and SILK_TMPDIR is not set, rwcombine writes the temporary files it creates to this directory.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of data repository. As described in the FILES section, rwcombine may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwcombine may use this environment variable. See the FILES section for details.
When set to 1, rwcombine prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
Directory in which to create temporary files.
rwfilter(1), rwcut(1), rwuniq(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)
The first release of rwcombine occurred in SiLK 3.9.0.
Compare the records in two SiLK Flow files
rwcompare [--quiet] [--site-config-file=FILENAME] FILE1 FILE2
rwcompare --help
rwcompare --version
rwcompare opens the two files named on the command line and compares the SiLK Flow records they contain. If the records are identical, rwcompare exits with status 0. If any of the records differ, rwcompare prints a message and exits with status 1. If there is an issue reading either file, an error is printed and the exit status is 2. Use the --quiet switch to suppress all output (error messages included). You may use - or stdin for one of the file names, in which case rwcompare reads from the standard input.
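The comparison described above amounts to walking both record streams in lockstep. The sketch below models that logic, including the three exit statuses and the two kinds of difference message shown in the examples that follow. The function is hypothetical, records are plain Python values rather than SiLK Flow records, and an OSError raised while reading stands in for any read problem.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

_EOF = object()   # sentinel marking the end of the shorter stream

def compare_records(name1: str, recs1: Iterable, name2: str,
                    recs2: Iterable) -> Tuple[int, Optional[str]]:
    """Return (exit_status, message): 0 identical, 1 different, 2 read error."""
    try:
        pairs = zip_longest(recs1, recs2, fillvalue=_EOF)
        for number, (a, b) in enumerate(pairs, start=1):
            if a is _EOF or b is _EOF:
                ended = name1 if a is _EOF else name2
                return 1, f"{name1} {name2} differ: EOF {ended}"
            if a != b:
                return 1, f"{name1} {name2} differ: record {number}"
        return 0, None
    except OSError as err:
        return 2, f"error reading input: {err}"

print(compare_records("data.rw", [1, 2], "-", [1, 2, 1, 2]))
# (1, 'data.rw - differ: EOF data.rw')
```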
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Do not print a message if the files differ, and do not print an error message if a file cannot be opened or read.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcompare searches for the site configuration file in the locations specified in the FILES section.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. Some input lines are split over multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines. The examples assume the existence of the file data.rw that contains SiLK Flow records. The exit status of the most recent command is available in the shell variable $?.
Compare a file with itself:
$ rwcompare data.rw data.rw
$ echo $?
0
Compare a file with itself, where one instance of the file is read from the standard input:
$ rwcat data.rw | rwcompare - data.rw
$ echo $?
0
Use rwsort(1) to modify one instance of the file and compare the results:
$ rwsort --fields=proto data.rw | rwcompare - data.rw
- data.rw differ: record 1
$ echo $?
1
Run the command again and use the --quiet switch:
$ rwsort --fields=proto data.rw | rwcompare --quiet - data.rw
$ echo $?
1
Compare the file with input containing two copies of the file:
$ rwcat data.rw data.rw | rwcompare data.rw -
data.rw - differ: EOF data.rw
$ echo $?
1
Compare the file with /dev/null:
$ rwcompare --quiet /dev/null data.rw
$ echo $?
2
rwcompare checks whether two files contain the same records in the same order. To compare two files whose records may be in different orders, use rwsort(1) to reorder the records in both files first. Make certain to provide enough fields to the rwsort command so that identical records end up in the same order.
$ rwsort --fields=1-10,12-15,20-29 data.rw > /tmp/sorted-data.rw
$ rwsort --fields=1-10,12-15,20-29 other-data.rw \
      | rwcompare /tmp/sorted-data.rw -
/tmp/sorted-data.rw - differ: record 103363
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of data repository. As described in the FILES section, rwcompare may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwcompare may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwfileinfo(1), rwcat(1), rwsort(1), silk(7)
Print traffic summary across time
rwcount [--bin-size=SIZE] [--load-scheme=LOADSCHEME] [--start-time=START_TIME] [--end-time=END_TIME] [--skip-zeroes] [--bin-slots] [--epoch-slots] [--timestamp-format=FORMAT] [--no-titles] [--no-columns] [--column-separator=CHAR] [--no-final-delimiter] [{--delimited | --delimited=CHAR}] [--print-filenames] [--copy-input=PATH] [--output-path=PATH] [--pager=PAGER_PROG] [--site-config-file=FILENAME] [{--legacy-timestamps | --legacy-timestamps={1,0}}] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcount --help
rwcount --version
rwcount summarizes SiLK flow records across time. It counts the records in the input stream, and groups their byte and packet totals into time bins. rwcount produces textual output with one row for each bin.
rwcount reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcount reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
rwcount splits each flow record into bins whose size is determined by the argument to the --bin-size switch. When that switch is not provided, rwcount uses 30-second bins by default.
By default, the first row of data rwcount prints is the bin containing the starting time of the earliest record that appears in the input. rwcount then prints a row for every bin until it reaches the bin containing the most recent ending time. Rows whose counts are zero are printed unless the --skip-zeroes switch is specified.
The --start-time and --end-time switches tell rwcount to use a specific time for the first row and the final row. The --start-time switch always sets the time stamp on the first bin to the specified time. With the --end-time switch, rwcount computes a maximum end-time by setting any unspecified hour, minute, second, and millisecond field to its maximum value, and the final bin is that which contains the maximum end-time.
When --start-time and --end-time are both specified, rwcount reserves the memory for the bins before it begins processing the records. If the memory cannot be allocated, rwcount exits. If this happens, try reducing the time span or increasing the bin-size.
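The bin arithmetic described above can be sketched in Python as follows. This is an illustration, not rwcount’s implementation: it assumes times are integer milliseconds since the UNIX epoch and that bin 0 begins exactly at the start time, and the helper names bin_size_ms, bin_index, and bin_count are hypothetical.

```python
def bin_size_ms(size_seconds: float) -> int:
    """Convert a --bin-size value in seconds (fractional allowed) to whole milliseconds."""
    ms = round(size_seconds * 1000)
    if ms < 1:
        raise ValueError("bin size must be at least 0.001 seconds")
    return ms


def bin_index(t_ms: int, start_ms: int, size_ms: int) -> int:
    """Index of the bin that contains time t_ms when bin 0 begins at start_ms."""
    return (t_ms - start_ms) // size_ms


def bin_count(start_ms: int, end_ms: int, size_ms: int) -> int:
    """Number of bins from the one starting at start_ms through the one holding end_ms."""
    return bin_index(end_ms, start_ms, size_ms) + 1


if __name__ == "__main__":
    day_start = 1_234_396_800_000        # 2009/02/12:00:00:00 UTC, in milliseconds
    day_end = day_start + 86_399_999     # 2009/02/12:23:59:59.999
    print(bin_count(day_start, day_end, bin_size_ms(3600)))  # 24
    print(bin_count(day_start, day_end, bin_size_ms(30)))    # 2880
```

The number of bins grows with the time span and shrinks as the bin size grows, which is why reducing the span or increasing --bin-size helps when the bins cannot be allocated.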
A router or other flow generator summarizes the traffic it sees into records. In addition to the five-tuple (source port and address, destination port and address, and protocol), the record has its start time, end time, total byte count, and total packet count. There is no way to know how the bytes and packets were distributed during the duration of the record: their distribution could be front-loaded, back-loaded, uniform, et cetera.
When the start and end times of an individual flow record put that record into a single bin, rwcount can simply add that record’s volume (byte and packet counts) to the bin.
When the duration of a flow record causes it to span multiple bins, rwcount must be told how to allocate the volume among the bins. The --load-scheme switch determines this, and it supports the following allocation schemes:
time-proportional: Each bin a flow spans is allocated a percentage of the flow’s volume proportional to the amount of the flow’s active time that falls within the bin. Specifically, rwcount divides the total volume of the flow by the duration of the flow, and multiplies the quotient by the time spent in the bin. This models a flow where the volume/second ratio is uniform throughout the flow.
bin-uniform: Each bin a flow spans is allocated an equal portion of the flow’s volume. rwcount divides the volume of the flow by the number of bins the flow spans, and adds the quotient to each of the bins. In this scheme, the volume/bin ratio is uniform.
start-spike: The bin that contains the flow’s start time is allocated all of the flow’s volume regardless of the flow’s duration. rwcount adds the total volume for the flow into the bin containing the start time of the flow. This models a flow that is front-loaded to the point where the entire volume is a single spike occurring in the initial millisecond of the flow.
middle-spike: The bin that contains the midpoint between the flow’s start time and end time is allocated all of the flow’s volume regardless of the flow’s duration.
end-spike: The bin that contains the flow’s end time is allocated all of the flow’s volume regardless of the flow’s duration. This models a flow that is back-loaded to the point where the entire volume is a single spike occurring in the final millisecond of the flow.
maximum-volume: Each bin the flow spans is allocated all of the flow’s volume. rwcount adds the entire volume for the flow into every bin that contains any part of the flow. In theory, the distribution of the bytes in the record could be a spike that occurs at any point during the flow’s duration. This scheme allows one to determine, in aggregate, the maximum possible volume that could have occurred during this bin. In this scheme, the Records column gives the number of records that were active during the bin.
minimum-volume: For a record that spans multiple bins, each bin is allocated none of the flow’s volume. That is, rwcount acts as though the volume for the flow occurred in some other bin. Since it is possible that a record that spans multiple bins did not contribute any volume to the current bin, this scheme allows one to determine, in aggregate, the minimum possible volume that may have occurred during this bin. The Records column in this scheme, as in the maximum-volume scheme, gives the number of flow records that were active during the bin.
Be aware that the “spike” load-schemes allocate the entire flow to a single bin. This can create the impression that there is more traffic occurring during a particular time window than the physical network supports.
The maximum-volume and minimum-volume schemes are used to compute the maximum and minimum volumes that could have been transferred during any one bin. maximum-volume intentionally over-counts the flow volume and minimum-volume intentionally under-counts.
To see the effect of the various load-schemes, suppose rwcount is using 60-second bins and the input contains two records. The first record begins at 12:03:50, ends at 12:06:20, and contains 9,000 bytes (60 bytes/second for 150 seconds). This record may contribute to bins at 12:03, 12:04, 12:05, and 12:06. The second record begins at 12:04:05 and lasts 15 seconds; this record’s volume always contributes its 200 bytes to the 12:04 bin. The --load-scheme option splits the byte-counts of the records as follows:
BIN                  12:03:00  12:04:00  12:05:00  12:06:00
time-proportional         600      3800      3600      1200
bin-uniform              2250      2450      2250      2250
start-spike              9000       200         0         0
middle-spike                0       200      9000         0
end-spike                   0       200         0      9000
maximum-volume           9000      9200      9000      9000
minimum-volume              0       200         0         0
For the record that spans multiple bins: the time-proportional scheme assumes 60 bytes/second, the bin-uniform scheme divides the volume evenly among the four bins, the start-spike and end-spike schemes place the entire volume in the 12:03 and 12:06 bins respectively, the middle-spike scheme assumes all the volume occurs at 12:05:05, the maximum-volume scheme adds the volume to every bin, and the minimum-volume scheme ignores the record.
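To make the table concrete, the following Python sketch applies each load scheme to the two example records and reproduces the byte counts above. It is an illustration under stated assumptions, not rwcount’s code: times are whole seconds after 12:00:00, each bin is the half-open interval [b*size, (b+1)*size), and the names allocate and byte_counts are hypothetical.

```python
from collections import defaultdict

SCHEMES = ("time-proportional", "bin-uniform", "start-spike", "middle-spike",
           "end-spike", "maximum-volume", "minimum-volume")


def allocate(start, end, volume, size, scheme, totals):
    """Add one record's volume to the bins in `totals` according to `scheme`."""
    first, last = start // size, end // size
    spanned = range(first, last + 1)
    if scheme == "time-proportional":
        duration = end - start
        if duration == 0:
            totals[first] += volume          # instantaneous flow: one bin
            return
        for b in spanned:
            # seconds of the flow that fall inside bin b
            overlap = min(end, (b + 1) * size) - max(start, b * size)
            totals[b] += volume * overlap / duration
    elif scheme == "bin-uniform":
        for b in spanned:
            totals[b] += volume / len(spanned)
    elif scheme == "start-spike":
        totals[first] += volume
    elif scheme == "middle-spike":
        totals[(start + (end - start) // 2) // size] += volume
    elif scheme == "end-spike":
        totals[last] += volume
    elif scheme == "maximum-volume":
        for b in spanned:
            totals[b] += volume
    elif scheme == "minimum-volume":
        if first == last:                    # only single-bin records count
            totals[first] += volume
    else:
        raise ValueError(f"unknown load scheme: {scheme}")


# (start, end, bytes), with times in seconds after 12:00:00
RECORDS = [(230, 380, 9000),   # 12:03:50 to 12:06:20
           (245, 260, 200)]    # 12:04:05 to 12:04:20


def byte_counts(scheme, size=60):
    """Bytes in the 12:03, 12:04, 12:05, and 12:06 bins for one scheme."""
    totals = defaultdict(float)
    for start, end, volume in RECORDS:
        allocate(start, end, volume, size, scheme, totals)
    return [totals[b] for b in range(3, 7)]


if __name__ == "__main__":
    for scheme in SCHEMES:
        print(f"{scheme:18}", [round(v) for v in byte_counts(scheme)])
```

Running the sketch prints one row per scheme, matching the rows of the table above.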
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Set the size of each time bin, in seconds; defaults to 30 seconds. rwcount supports millisecond-size bins; SIZE may be a floating-point value equal to or greater than 0.001.
Specify how a flow record that spans multiple bins allocates its bytes and packets among the bins. The default scheme is time-proportional, which assumes the volume/second ratio of the flow record is constant. See the Load Scheme section for additional information on the load-scheme choices. The LOADSCHEME may be one of the following names or numbers; names may be abbreviated to the shortest prefix that is unique.
time-proportional: Allocate the volume in proportion to the amount of time the flow spent in the bin.
bin-uniform: Allocate the volume evenly across the bins that contain any part of the flow’s duration.
start-spike: Allocate the entire volume to the bin containing the start time of the flow.
middle-spike: Allocate the entire volume to the bin containing the time at the midpoint of the flow.
end-spike: Allocate the entire volume to the bin containing the end time of the flow.
maximum-volume: Allocate the entire volume to all of the bins containing any part of the flow.
minimum-volume: Allocate the flow’s volume to a bin only if the flow is completely contained within the bin; otherwise ignore the flow.
Set the time of the first bin to START_TIME. When this switch is not given, the first bin is the one that holds the starting time of the earliest record. The START_TIME may be specified in a format of yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] (or T may be used in place of : to separate the day and hour). The time must be specified to at least day precision, and unspecified hour, minute, second, and millisecond values are set to zero. Whether the date strings represent times in UTC or the local timezone depends on how SiLK was compiled, which can be determined from the Timezone support setting in the output from rwcount --version. Alternatively, the time may be specified as seconds since the UNIX epoch, and an unspecified milliseconds value is set to 0.
Set the time of the final bin to END_TIME. When this switch is not given, the final bin is the one that holds the ending time of the latest record. The format of END_TIME is the same as that for START_TIME. Unspecified hour, minute, second, and millisecond values are set to 23, 59, 59, and 999 respectively. When END_TIME is specified as seconds since the UNIX epoch, an unspecified milliseconds value is set to 999. When both --start-time and --end-time are used, the END_TIME is adjusted so that the final bin represents a complete interval.
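The defaulting rules for START_TIME and END_TIME can be sketched in Python. This is a minimal illustration that assumes a SiLK build using UTC; it pads a fractional part with fewer than three digits on the right (an assumption), does not model the end-time adjustment applied when both switches are given, and parse_time_ms is a hypothetical name.

```python
import re
from datetime import datetime, timezone

# yyyy/mm/dd[:HH[:MM[:SS[.sss]]]], with "T" allowed in place of the first ":"
_DATE_RE = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{1,3}))?)?)?)?")
# seconds since the UNIX epoch, optionally followed by milliseconds
_EPOCH_RE = re.compile(r"(\d+)(?:\.(\d{1,3}))?")


def parse_time_ms(text: str, is_end: bool = False) -> int:
    """Return milliseconds since the epoch (UTC) for a --start-time/--end-time value."""
    m = _EPOCH_RE.fullmatch(text)
    if m:
        secs, frac = m.groups()
        if frac is not None:
            ms = int(frac.ljust(3, "0"))
        else:
            ms = 999 if is_end else 0        # unspecified milliseconds
        return int(secs) * 1000 + ms

    m = _DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized time: {text!r}")
    year, month, day, hh, mm, ss, frac = m.groups()
    # unspecified hour, minute, second, millisecond take these values
    dh, dm, ds, dms = (23, 59, 59, 999) if is_end else (0, 0, 0, 0)
    hour = int(hh) if hh is not None else dh
    minute = int(mm) if mm is not None else dm
    second = int(ss) if ss is not None else ds
    ms = int(frac.ljust(3, "0")) if frac is not None else dms
    dt = datetime(int(year), int(month), int(day), hour, minute, second,
                  tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000 + ms
```

For example, 2009/02/12 yields midnight as a start time but 23:59:59.999 on the same day as an end time.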
Disable printing of bins with no traffic. By default, all bins are printed.
Use the internal bin index as the label for each bin in the output; the default is to label each bin with the time in a human-readable format.
Use the UNIX epoch time (number of seconds since midnight UTC on 1970-01-01) as the label for each bin in the output; the default is to label each bin with the time in a human-readable format. This switch is equivalent to --timestamp-format=epoch. This switch is deprecated as of SiLK 3.11.0, and it will be removed in the SiLK 4.0 release.
Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default: Print the timestamps as YYYY/MM/DDThh:mm:ss.
iso: Print the timestamps as YYYY-MM-DD hh:mm:ss.
m/d/y: Print the timestamps as MM/DD/YYYY hh:mm:ss.
epoch: Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:
utc: Use Coordinated Universal Time to print timestamps.
local: Use the TZ environment variable or the local timezone.
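A minimal Python sketch of how a FORMAT list such as iso,utc could be turned into a bin label is shown below. It uses the keywords default, iso, m/d/y, epoch, utc, and local for the formats and timezones listed above, assumes UTC when no timezone is given, and format_timestamp is a hypothetical name; it is not rwcount’s code.

```python
from datetime import datetime, timezone

_PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds: int, spec: str = "default") -> str:
    """Render a bin label according to a comma-separated FORMAT list."""
    fmt, tz = "default", timezone.utc
    for item in spec.split(","):
        if item in _PATTERNS or item == "epoch":
            fmt = item
        elif item == "utc":
            tz = timezone.utc
        elif item == "local":
            tz = None  # fromtimestamp(..., tz=None) gives local time (honors TZ)
        else:
            raise ValueError(f"unknown format or timezone: {item!r}")
    if fmt == "epoch":
        return str(epoch_seconds)
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime(_PATTERNS[fmt])
```

With epoch_seconds set to 1234396800, the default format gives 2009/02/12T00:00:00, matching the bin labels in the EXAMPLES section.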
Turn off column titles. By default, titles are printed.
Disable fixed-width columnar output.
Use specified character between columns and after the final column. When this switch is not specified, the default of ’|’ is used.
Do not print the column separator after the final column. Normally a delimiter is printed.
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default ’|’.
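The interaction of --column-separator, --no-columns, --no-final-delimiter, and --delimited can be sketched in Python as below. The right alignment and the column widths passed in are assumptions for illustration (rwcount chooses its own widths), and the function names are hypothetical.

```python
def format_row(values, widths, sep="|", columns=True, final_delimiter=True):
    """Join one output row of string cells using the column switches' rules."""
    # fixed-width columns pad each cell; --no-columns leaves cells as-is
    cells = [v.rjust(w) if columns else v for v, w in zip(values, widths)]
    line = sep.join(cells)
    # the separator also follows the final column unless --no-final-delimiter
    return line + sep if final_delimiter else line


def format_delimited(values, char="|"):
    """Equivalent of --delimited=CHAR: no columns, no final delimiter, separator CHAR."""
    return format_row(values, [0] * len(values), sep=char,
                      columns=False, final_delimiter=False)
```

For example, format_delimited(["a", "bb"], ",") returns a,bb, which is convenient for spreadsheet import.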
Print to the standard error the names of input files as they are opened.
Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcount’s textual output to a different location.
Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcount exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.
When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
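The precedence rules for choosing a pager can be summarized in a short Python sketch. It follows the order described here and in the ENVIRONMENT section (--pager, then SILK_PAGER, then PAGER, with an empty value meaning no paging); choose_pager and its parameters are hypothetical.

```python
def choose_pager(cli_pager, env, output_path_given, to_terminal):
    """Return the pager command to run, or None when output is not paged.

    cli_pager is the --pager value (None if the switch was not given);
    env is a mapping such as os.environ.
    """
    if output_path_given or not to_terminal:
        return None
    if cli_pager is not None:
        pager = cli_pager
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]      # an empty value disables paging
    else:
        pager = env.get("PAGER", "")
    return pager or None
```

Setting SILK_PAGER to the empty string therefore disables paging even when PAGER is set.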
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcount searches for the site configuration file in the locations specified in the FILES section.
When NUM (the optional argument to --legacy-timestamps) is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcount opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
Alias the --start-time switch. This switch is deprecated as of SiLK 3.8.0.
Alias the --end-time switch. This switch is deprecated as of SiLK 3.8.0.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.
To count all web traffic on Feb 12, 2009, into 1 hour bins:
$ rwfilter --pass=stdout --start-date=2009/02/12:00 \
      --end-date=2009/02/12:23 --proto=6 --aport=80 \
  | rwcount --bin-size=3600
               Date|   Records|        Bytes|   Packets|
2009/02/12T00:00:00|   1490.49| 578270918.16| 463951.55|
2009/02/12T01:00:00|   1459.33| 596455716.52| 457487.80|
2009/02/12T02:00:00|   1529.06| 562602842.44| 451456.41|
2009/02/12T03:00:00|   1503.89| 562683116.38| 455554.81|
2009/02/12T04:00:00|   1561.89| 590554569.78| 489273.81|
...
To bin the records according to their start times, use the --load-scheme switch:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --load-scheme=1
               Date|   Records|        Bytes|   Packets|
2009/02/12T00:00:00|   1494.00| 580350969.00| 464952.00|
2009/02/12T01:00:00|   1462.00| 596145212.00| 457871.00|
2009/02/12T02:00:00|   1526.00| 561629416.00| 451088.00|
2009/02/12T03:00:00|   1502.00| 563500618.00| 455262.00|
2009/02/12T04:00:00|   1562.00| 589265818.00| 489279.00|
...
To bin the records by their end times:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --load-scheme=2
               Date|   Records|        Bytes|   Packets|
2009/02/12T00:00:00|   1488.00| 577132372.00| 463393.00|
2009/02/12T01:00:00|   1458.00| 596956697.00| 457376.00|
2009/02/12T02:00:00|   1530.00| 562806395.00| 451551.00|
2009/02/12T03:00:00|   1506.00| 562101791.00| 455671.00|
2009/02/12T04:00:00|   1562.00| 591408602.00| 489371.00|
...
To force the hourly bins to run from 30 minutes past the hour, use the --start-time switch:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --start-time=2009/02/12:00:30
               Date|   Records|        Bytes|   Packets|
2009/02/12T00:30:00|   1483.26| 581251364.04| 456554.40|
2009/02/12T01:30:00|   1494.00| 575037453.00| 449280.00|
2009/02/12T02:30:00|   1486.36| 559700466.61| 447700.15|
2009/02/12T03:30:00|   1555.23| 588882400.58| 480724.48|
2009/02/12T04:30:00|   1537.79| 564756248.52| 472003.45|
...
This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.
When set to a non-empty string, rwcount automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcount does not automatically page its output.
When set and SILK_PAGER is not set, rwcount automatically invokes this program to display its output a screen at a time.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcount may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwcount may use this environment variable. See the FILES section for details.
When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcount displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcount --version.) The TZ environment variable is also used when rwcount parses the timestamp specified in the --start-time or --end-time switches if SiLK is built with local timezone support.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwfilter(1), rwuniq(1), silk(7), tzset(3), environ(7)
Unlike rwuniq(1), rwcount does not support counting the number of distinct IPs in a bin. However, using the --bin-time switch on rwuniq can provide time-based binning similar to what rwcount supports. Note that rwuniq always bins by each record’s start time (similar to rwcount --load-scheme=1), and there is no support in rwuniq for dividing a SiLK record among multiple time bins.
Print selected fields of binary SiLK Flow records
rwcut [{--fields=FIELDS | --all-fields}] {[--start-rec-num=START_NUM] [--end-rec-num=END_NUM] | [--tail-recs=TAIL_START_NUM]} [--num-recs=REC_COUNT] [--dry-run] [--icmp-type-and-code] [--timestamp-format=FORMAT] [--epoch-time] [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips] [--integer-sensors] [--integer-tcp-flags] [--no-titles] [--no-columns] [--column-separator=CHAR] [--no-final-delimiter] [{--delimited | --delimited=CHAR}] [--print-filenames] [--copy-input=PATH] [--output-path=PATH] [--pager=PAGER_PROG] [--site-config-file=FILENAME] [--ipv6-policy={ignore,asv4,mix,force,only}] [{--legacy-timestamps | --legacy-timestamps={1,0}}] [--plugin=PLUGIN [--plugin=PLUGIN ...]] [--python-file=PATH [--python-file=PATH ...]] [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]] [--pmap-column-width=NUM] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]] [--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]] [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwcut --version
rwcut reads binary SiLK Flow records and prints the user-selected record attributes (or fields) to the terminal in a textual, bar-delimited (|) format. See the EXAMPLES section below for sample output.
rwcut reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwcut reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
The user may provide the --fields switch to select the record attributes to print. When --fields is not specified, rwcut prints the source and destination IP address, source and destination port, protocol, packet count, byte count, TCP flags, start time, duration, end time, and the sensor name. The fields are printed in the order in which they occur in the --fields switch. Fields may be repeated.
A subset of the input records may be selected by using the --start-rec-num, --end-rec-num, --num-recs, and --tail-recs switches.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
FIELDS contains the list of flow attributes (a.k.a. fields or columns) to print. The columns will be displayed in the order the fields are specified. Fields may be repeated. FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:
--fields=stime,10,1-5
If the --fields switch is not given, FIELDS defaults to:
sIP,dIP,sPort,dPort,protocol,packets,bytes,flags,sTime,dur,eTime,sensor
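The grammar of FIELDS can be sketched in Python as follows. The sketch only expands the list; it lower-cases names because field names are case-insensitive, keeps integers and expanded ranges as integers, and does not map field numbers to field names. parse_fields is a hypothetical name.

```python
import re

_RANGE_RE = re.compile(r"(\d+)-(\d+)")


def parse_fields(spec: str) -> list:
    """Expand a --fields value into an ordered list of field names and numbers."""
    result = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            raise ValueError("empty entry in field list")
        m = _RANGE_RE.fullmatch(token)
        if m:
            # a range such as 1-5 expands to 1, 2, 3, 4, 5
            low, high = int(m.group(1)), int(m.group(2))
            if low > high:
                raise ValueError(f"invalid range: {token}")
            result.extend(range(low, high + 1))
        elif token.isdigit():
            result.append(int(token))
        else:
            result.append(token.lower())   # names are case-insensitive
    return result
```

Applied to the example above, parse_fields("stime,10,1-5") returns ["stime", 10, 1, 2, 3, 4, 5]; repeated fields are preserved in order.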
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.
source IP address
destination IP address
source port for TCP and UDP, or equivalent
destination port for TCP and UDP, or equivalent
IP protocol
packet count
byte count
bit-wise OR of TCP flags over all packets
starting time of flow in microsecond resolution
duration of flow in microsecond resolution
end time of flow in microsecond resolution
name or ID of sensor at the collection point
class of sensor at the collection point
type of sensor at the collection point
the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was introduced in SiLK 3.8.1.
the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at iType.
equivalent to iType,iCode. This field is deprecated as of SiLK 3.8.1.
Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
router SNMP output interface or postVlanId
router next hop IP
Enhanced flow metering software (such as yaf(1)) may provide flow information elements in addition to those found in NetFlow. SiLK stores some of these elements in the fields named below. For flows without this additional information, the field’s value is always 0.
TCP flags on first packet in the flow
bit-wise OR of TCP flags on the second through final packets in the flow
flow attributes set by the flow generator:
all the packets in this flow record are exactly the same size
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the active timeout, since the flow generator creates a flow for a connection that still has activity.) The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
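The marking pattern in the ssh example can be expressed as a small Python function. This is a sketch of the rule as described above (every record but the last timed out; every record but the first is a continuation), not code from any flow generator; session_attributes is a hypothetical name.

```python
def session_attributes(n_records: int) -> list:
    """Attribute letters for each record of one connection split by active timeouts."""
    if n_records < 1:
        raise ValueError("need at least one record")
    flags = []
    for i in range(n_records):
        letters = ""
        if i < n_records - 1:
            letters += "T"   # this record hit the active timeout
        if i > 0:
            letters += "C"   # this record continues an earlier one
        flags.append(letters)
    return flags
```

A session split into four records is marked T, TC, TC, C; a session that never times out produces a single unmarked record.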
guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:
for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).
as sType for the destination IP address
for the source IP address, a two-letter country code abbreviation denoting the country where that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The abbreviations are those defined by ISO 3166-1 (see for example https://www.iso.org/iso-3166-country-codes.html or https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) or the following special codes: -- N/A (e.g., private and experimental reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other.
as scc for the destination IP address
label contained in the prefix map file associated with map-name. If the prefix map is for IP addresses, the label is that associated with the source IP address. If the prefix map is for protocol/port pairs, the label is that associated with the protocol and source port. See also the description of the --pmap-file switch below and the pmapfilter(3) manual page.
as src-map-name for the destination IP address or the protocol and destination port.
as src-map-name when no map-name is associated with the prefix map file
as dst-map-name when no map-name is associated with the prefix map file
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.
Instruct rwcut to print all known fields. This switch may not be combined with the --fields switch. This switch suppresses error messages from the plug-ins.
Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwcut will attempt to find a file named PLUGIN in the directories listed in the FILES section. If rwcut finds the file, it uses that path. If PLUGIN contains a slash or if rwcut does not find the file, rwcut relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins.
Begin printing with the START_NUM’th record by skipping the first START_NUM-1 records. The default is 1; that is, to start printing at the first record; START_NUM must be a positive integer. If START_NUM is greater than the number of input records, rwcut only outputs the title. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.
Stop printing after the END_NUM’th record. When END_NUM is 0, the default, printing stops once all input records have been printed; that is, END_NUM is effectively infinity. If this value is non-zero, it must not be less than START_NUM. This switch may not be combined with the --tail-recs switch. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.
Begin printing once rwcut is TAIL_START_NUM records from end of the input stream, where TAIL_START_NUM is a positive integer. rwcut will print the remaining records in the input stream unless --num-recs is also specified and is less than TAIL_START_NUM. The --tail-recs switch is similar to the --start-rec-num switch except it counts from the end of the input stream. This switch may not be combined with the --start-rec-num and --end-rec-num switches. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.
Print no more than REC_COUNT records. Specifying a REC_COUNT of 0 will print all records, which is the default. This switch is ignored under the following conditions: When both --start-rec-num and --end-rec-num are specified; when only --end-rec-num is given and END_NUM is less than REC_COUNT; when --tail-recs is specified and TAIL_START_NUM is less than REC_COUNT. When using multiple input files, records are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the stream specified by --copy-input.
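The interaction of the four record-selection switches can be modeled with a small Python function. This sketch holds all records in memory (rwcut need not), treats multiple inputs as one list, and uses None to mean a switch was not given; select_records and its keyword arguments are hypothetical stand-ins for the switches.

```python
def select_records(records, start=None, end=None, tail=None, num=0):
    """Return the records that would be printed for the given switch values."""
    if tail is not None and (start is not None or end is not None):
        raise ValueError("--tail-recs cannot be combined with "
                         "--start-rec-num or --end-rec-num")
    records = list(records)               # all inputs form a single stream
    if tail is not None:
        if tail < 1:
            raise ValueError("TAIL_START_NUM must be a positive integer")
        chosen = records[-tail:]          # the last TAIL_START_NUM records
    else:
        first = 1 if start is None else start
        if first < 1:
            raise ValueError("START_NUM must be a positive integer")
        if end and end < first:
            raise ValueError("END_NUM must not be less than START_NUM")
        chosen = records[first - 1:end or None]   # END_NUM of 0 means no limit
    # --num-recs is ignored when both --start-rec-num and --end-rec-num are
    # given; in the other ignored cases it cannot shorten the selection anyway
    if num and not (start is not None and end is not None):
        chosen = chosen[:num]
    return chosen
```

For ten records, start=3 with num=2 selects records 3 and 4, while start=3, end=5, num=1 still selects records 3 through 5 because --num-recs is ignored.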
Causes rwcut to print the column headers and exit. Useful for testing.
Unlike TCP or UDP, ICMP messages do not use ports, but instead have types and codes. Specifying this switch causes rwcut to print, for ICMP records, the message’s type and code in the sPort and dPort columns, respectively. Use of this switch has been discouraged since SiLK 0.9.10. As of SiLK 3.8.1, this switch is deprecated and it will be removed in SiLK 4.0; use the iType and iCode fields instead.
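How iType and iCode relate to a record can be sketched in Python. The sketch assumes that the ICMP type is stored in the high byte of the record’s dPort value and the code in the low byte (type × 256 + code); treat that encoding as an assumption and check it against your data. icmp_type_code is a hypothetical name.

```python
ICMP_PROTOCOLS = {1, 58}   # ICMP (IPv4) and ICMPv6


def icmp_type_code(protocol: int, dport: int):
    """Return (iType, iCode) for an ICMP/ICMPv6 record, or (None, None) otherwise."""
    if protocol not in ICMP_PROTOCOLS:
        return None, None              # iType and iCode are empty for non-ICMP flows
    # assumed encoding: dPort = type * 256 + code
    return dport >> 8, dport & 0xFF
```

Under that assumption, an ICMP echo request (type 8, code 0) would carry a dPort of 2048.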
Specify the format, timezone, and/or precision (representation of fractional seconds) to use when printing timestamps and the duration. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and precision. If it is empty or contains invalid values, timestamps are printed in the default format with microseconds, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a precision in any order. The format is one of:
default: Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.
iso: Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.
m/d/y: Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.
epoch: Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
The --timestamp-format switch may change the representation of fractional seconds, or precision, of the timestamp and duration fields from their default of microseconds. Note: When using a precision less than that used by SiLK internally, the printed start time and duration may not equal the printed end time. The available precisions are:
Truncate the fractional seconds value on the timestamps and on the duration field. Previously this was called no-msec. Since SiLK 3.23.0.
Print the fractional seconds to 3 decimal places. Since SiLK 3.23.0.
Print the fractional seconds to 6 decimal places. Since SiLK 3.23.0.
Print the fractional seconds to 9 decimal places. Since SiLK 3.23.0.
Truncate the fractional seconds value on the timestamps and on the duration field. This is an alias for no-frac and is deprecated as of SiLK 3.23.0.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:
utc: Use Coordinated Universal Time to print timestamps.
local: Use the TZ environment variable or the local timezone.
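The precision note above (the printed start time plus duration may not equal the printed end time) can be illustrated in Python. This sketch assumes nanosecond internal resolution and that extra digits are truncated rather than rounded; the digits parameter is a hypothetical stand-in for the precision keywords, and format_seconds is a hypothetical name.

```python
def format_seconds(total_ns: int, digits: int) -> str:
    """Format a non-negative time in nanoseconds with 0, 3, 6, or 9 fractional digits.

    Extra digits are truncated rather than rounded.
    """
    if digits not in (0, 3, 6, 9):
        raise ValueError("digits must be 0, 3, 6, or 9")
    seconds, frac_ns = divmod(total_ns, 1_000_000_000)
    if digits == 0:
        return str(seconds)
    frac = frac_ns // 10 ** (9 - digits)
    return f"{seconds}.{frac:0{digits}d}"


if __name__ == "__main__":
    start = 1_000_500_000            # 1.0005 seconds
    duration = 900_000               # 0.0009 seconds
    end = start + duration           # 1.0014 seconds
    # prints "1.000 0.000 1.001": start + duration != end at 3 digits
    print(format_seconds(start, 3), format_seconds(duration, 3),
          format_seconds(end, 3))
```

At nine digits the three values print as 1.000500000, 0.000900000, and 1.001400000, which do add up.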
Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch; it is deprecated as of SiLK 3.0.0 and will be removed in the SiLK 4.0 release.
Specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical according to whether the individual flow record is marked as IPv4 or IPv6. Since SiLK 3.7.0.
Print IP addresses in the canonical format. For an IPv4 record, use dot-separated decimal (192.0.2.1). For an IPv6 record, use either colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).
Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.
Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.
Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.
Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.
The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.
Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.
Change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.
The following argument is also available:
Set FORMAT to map-v4,no-mixed.
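The integer and padded forms described above follow directly from the numeric value of an address. The Python sketch below uses the standard ipaddress module to reproduce the examples; the function names are hypothetical, and the code illustrates the output formats rather than SiLK's implementation.

```python
import ipaddress
from typing import Union

Addr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def as_decimal(addr: str) -> str:
    """Decimal form: the address as an unsigned integer."""
    return str(int(ipaddress.ip_address(addr)))

def as_hexadecimal(addr: str) -> str:
    """Hexadecimal form: the same integer in base 16, without leading zeros."""
    return format(int(ipaddress.ip_address(addr)), "x")

def as_zero_padded(addr: str) -> str:
    """Zero-padded form: fixed-width text; IPv6 never uses the mixed notation."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ".".join(f"{int(octet):03d}" for octet in str(ip).split("."))
    return ip.exploded  # every 16-bit group written as four hex digits

def map_v4(addr: str) -> ipaddress.IPv6Address:
    """Move an IPv4 address into the ::ffff:0:0/96 netblock."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))

def unmap_v6(addr: str) -> Addr:
    """Turn an IPv4-mapped IPv6 address back into IPv4; leave others alone."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip
```

For example, as_zero_padded(str(map_v4("192.0.2.1"))) yields 0000:0000:0000:0000:0000:ffff:c000:0201, matching the zero-padded IPv6 example above.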
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
Print the integer ID of the sensor rather than its name.
Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.
Turn off column titles. By default, titles are printed.
Disable fixed-width columnar output.
Use the specified character C between columns and after the final column. When this switch is not specified, the default of '|' is used.
Do not print the column separator after the final column. Normally a delimiter is printed.
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
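When rwcut output feeds another program, the padding and the final delimiter are the two details a parser has to handle. A minimal Python sketch, assuming the title line is present (that is, --no-titles was not given); the helper parse_rwcut and its final_delimiter parameter are hypothetical.

```python
from typing import Dict, List

def parse_rwcut(text: str, sep: str = "|", final_delimiter: bool = True) -> List[Dict[str, str]]:
    """Turn rwcut text output (title line first) into one dict per record."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = line.split(sep)
        if final_delimiter:
            cells = cells[:-1]  # the delimiter after the last column leaves an empty cell
        rows.append([cell.strip() for cell in cells])  # drop fixed-width padding
    titles, records = rows[0], rows[1:]
    return [dict(zip(titles, record)) for record in records]
```

Pass final_delimiter=False for output produced with --no-final-delimiter or --delimited, and sep="," if a comma was chosen as the column separator.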
Print to the standard error the names of input files as they are opened.
Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwcut’s textual output to a different location.
Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwcut exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.
When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only records marked as IPv4 will be printed.
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.
Process the input as a mixture of IPv4 and IPv6 flow records.
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.
Print only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
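The five policies can be summarized as a function that either keeps, converts, or drops a record. This Python sketch uses the policy names ignore, asv4, mix, force, and only, in the same order as the descriptions above. It models a record as a list of its addresses, and it assumes that asv4 converts a record only when every address in it is IPv4-mapped; apply_policy is a hypothetical name.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import List, Optional, Union

Addr = Union[IPv4Address, IPv6Address]

def apply_policy(addrs: List[Addr], policy: str) -> Optional[List[Addr]]:
    """Return the (possibly converted) addresses, or None if the record is dropped."""
    is_v6 = any(a.version == 6 for a in addrs)  # stand-in for the record's IPv6 marker
    if policy == "mix":
        return addrs
    if policy == "ignore":
        return None if is_v6 else addrs
    if policy == "only":
        return addrs if is_v6 else None
    if policy == "force":
        if is_v6:
            return addrs
        # Map each IPv4 address into ::ffff:0:0/96.
        return [IPv6Address((0xFFFF << 32) | int(a)) for a in addrs]
    if policy == "asv4":
        if not is_v6:
            return addrs
        mapped = [a.ipv4_mapped if a.version == 6 else a for a in addrs]
        # Assumption: any address outside ::ffff:0:0/96 causes the record to be dropped.
        return None if any(m is None for m in mapped) else mapped
    raise ValueError(f"unknown policy: {policy}")
```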
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwcut searches for the site configuration file in the locations specified in the FILES section.
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y,no-msec. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwcut opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit. Specifying switches that add new fields or additional switches before --help will allow the output to include descriptions of those fields or switches.
Print the description and alias(es) of each field and exit. Specifying switches that add new fields before --help-fields will allow the output to include descriptions of those fields.
Print the version number and information about how SiLK was configured, then exit the application.
Load the prefix map file located at PATH and create fields named src-map-name and dst-map-name where map-name is either the MAPNAME part of the argument or the map-name specified when the file was created (see rwpmapbuild(1)). If no map-name is available, rwcut names the fields sval and dval. Specify PATH as - or stdin to read from the standard input. The switch may be repeated to load multiple prefix map files, but each prefix map must use a unique map-name. The --pmap-file switch(es) must precede the --fields switch. See also pmapfilter(3).
When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.
When the SiLK Python plug-in is used, rwcut reads the Python code from the file PATH to define additional fields for possible output. This file should call register_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.
The standard output from rwcut resembles the following (with the text wrapped for readability):
            sIP|            dIP|sPort|dPort|pro|\
    10.30.30.31|    10.70.70.71|   80|36761|  6|\
   packets|     bytes|   flags|\
         7|      3227|FS PA   |\
                  sTime| duration|                  eTime|senso|
2003/01/01T00:00:14.625|    3.959|2003/01/01T00:00:18.584|EDGE1|
The first line of the output is the title line which shows the names of the selected fields; the --no-titles switch will disable the printing of the title line. The second line and onward will contain the printed representation of the records, with one line per record.
A common use of rwcut is to read the output of rwfilter(1). For example, to see representative TCP traffic:
$ rwfilter --start-date=2002/01/19:00 --end-date=2002/01/19:01 \
      --proto=6 --pass=stdout \
  | rwcut
To see only selected fields, use the --fields switch. For example, to print only the protocol for each record in the input file data.rw, use:
$ rwcut --fields=proto data.rw
The silkpython(3) manual page provides examples that use PySiLK to create and print arbitrary fields for rwcut.
The order of the FIELDS is significant, and fields can be repeated. For example, suppose that in addition to the default fields 1-12, you also want to prefix each row with an integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the default fields 1-12, you want to see dotted-decimal IP addresses. (The num2dot(1) tool converts the numeric fields in column positions three and four to dotted-quad IPs.)
$ rwfilter ... --pass=stdout \
  | rwcut --fields=2,9,1-12 --ip-format=decimal --timestamp-format=epoch \
  | num2dot --ip-field=3,4
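The conversion num2dot performs in that pipeline is simple enough to sketch. The following Python is not num2dot's implementation. It rewrites the chosen 1-based columns of one '|'-delimited line, treating only plain integers that fit in 32 bits as IPv4 addresses, so title text passes through unchanged. The helper num2dot_line is hypothetical.

```python
import ipaddress
from typing import Iterable

def num2dot_line(line: str, columns: Iterable[int], sep: str = "|") -> str:
    """Rewrite the given 1-based columns of one delimited line as dotted-quad IPv4."""
    cells = line.split(sep)
    for col in columns:
        raw = cells[col - 1]
        value = raw.strip()
        # Titles and values that are not 32-bit integers are left unchanged.
        if value.isdigit() and int(value) <= 0xFFFFFFFF:
            cells[col - 1] = raw.replace(value, str(ipaddress.IPv4Address(int(value))), 1)
    return sep.join(cells)
```

With --fields=2,9,1-12, the first column stays an integer (the destination IP), while columns three and four (sIP and dIP from the default set) become dotted quads.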
Both of the following commands print the title line and the first record in the input stream:
$ rwcut --num-recs=1 data.rw
$ rwcut --end-rec-num=1 data.rw
The following prints all records except the first (plus the title):
$ rwcut --start-rec-num=2 data.rw
These three commands print only the second record:
$ rwcut --no-title --start-rec-num=2 --num-recs=1 data.rw
$ rwcut --no-title --start-rec-num=2 --end-rec-num=2 data.rw
$ rwcut --no-title --end-rec-num=2 --num-recs=1 data.rw
This command prints the title line and the final record in the input stream:
$ rwcut --tail-recs=1 data.rw
This command prints the next-to-last record in the input stream:
$ rwcut --no-title --tail-recs=2 --num-recs=1 data.rw
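The way the record-selection switches combine in these examples can be captured in a few lines. This Python sketch is inferred only from the commands above (1-based record numbers, --num-recs counting forward from --start-rec-num or back from --end-rec-num, and --num-recs applied within the --tail-recs window). Combinations not shown on this page, such as all three of start, end, and count together, are not modeled. select_records is a hypothetical name.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def select_records(records: Sequence[T], start_rec_num: Optional[int] = None,
                   end_rec_num: Optional[int] = None, num_recs: Optional[int] = None,
                   tail_recs: Optional[int] = None) -> List[T]:
    """Pick records the way the examples above combine the selection switches."""
    recs = list(records)
    if tail_recs is not None:
        recs = recs[max(len(recs) - tail_recs, 0):]   # keep only the last tail_recs
        return recs if num_recs is None else recs[:num_recs]
    if start_rec_num is None and end_rec_num is not None and num_recs is not None:
        start_rec_num = end_rec_num - num_recs + 1    # count back from the end record
    first = max((start_rec_num or 1) - 1, 0)          # record numbers are 1-based
    if end_rec_num is None and num_recs is not None:
        end_rec_num = first + num_recs
    last = len(recs) if end_rec_num is None else end_rec_num
    return recs[first:last]
```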
Using the sIP and dIP fields can be confusing when the file you are examining contains both incoming and outgoing flow records. To make the output more clear, consider using the int-ext-fields(3) plug-in. The plug-in defines four additional fields representing the external IP address, the external port, the internal IP address, and the internal port. The plug-in requires the user to specify which class/type pairs are incoming and which are outgoing. See its manual page for additional information.
$ rwcut --fields=sip,sport,dip,dport,proto,type \
      --num-rec=8 data.rw
            sIP|sPort|            dIP|dPort|pro| type|
192.168.111.201|29617|   172.24.2.123|   53| 17|  out|
   172.24.2.123|   53|192.168.111.201|29617| 17|   in|
192.168.111.201|29618|  10.252.217.50|   22|  6|  out|
  10.252.217.50|   22|192.168.111.201|29618|  6|   in|
192.168.204.193|   68|    172.30.2.67|   67| 17|  out|
    172.30.2.67|   67|192.168.204.193|   68| 17|   in|
  10.239.85.193|29897|192.168.228.153|   25|  6|   in|
192.168.228.153|   25|  10.239.85.193|29897|  6|  out|
$ export INCOMING_FLOWTYPES=all/in,all/inweb
$ export OUTGOING_FLOWTYPES=all/out,all/outweb
$ rwcut --plugin=int-ext-fields.so \
      --fields=int-ip,int-port,ext-ip,ext-port,proto,type \
      --num-rec=8 data.rw
         int-ip|int-p|         ext-ip|ext-p|pro| type|
192.168.111.201|29617|   172.24.2.123|   53| 17|  out|
192.168.111.201|29617|   172.24.2.123|   53| 17|   in|
192.168.111.201|29618|  10.252.217.50|   22|  6|  out|
192.168.111.201|29618|  10.252.217.50|   22|  6|   in|
192.168.204.193|   68|    172.30.2.67|   67| 17|  out|
192.168.204.193|   68|    172.30.2.67|   67| 17|   in|
192.168.228.153|   25|  10.239.85.193|29897|  6|   in|
192.168.228.153|   25|  10.239.85.193|29897|  6|  out|
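The rule the two listings illustrate is that an incoming record's destination is internal and an outgoing record's source is internal. Here is a minimal Python sketch of that swap, using sets that stand in for the INCOMING_FLOWTYPES and OUTGOING_FLOWTYPES values above. The helper int_ext and the dictionary keys are hypothetical, and raising an error for other flowtypes is this sketch's choice, not documented plug-in behavior.

```python
from typing import Dict, Tuple

# Python stand-ins for the INCOMING_FLOWTYPES / OUTGOING_FLOWTYPES values above.
INCOMING = {"all/in", "all/inweb"}
OUTGOING = {"all/out", "all/outweb"}

def int_ext(rec: Dict[str, str]) -> Tuple[str, str, str, str]:
    """Return (int-ip, int-port, ext-ip, ext-port) for one flow record."""
    flowtype = f"{rec['class']}/{rec['type']}"
    if flowtype in INCOMING:   # traffic arriving: the destination is internal
        return rec["dip"], rec["dport"], rec["sip"], rec["sport"]
    if flowtype in OUTGOING:   # traffic leaving: the source is internal
        return rec["sip"], rec["sport"], rec["dip"], rec["dport"]
    raise ValueError(f"{flowtype} is neither incoming nor outgoing")
```

Applied to the first two rows of the sIP/dIP listing, both records yield the same (int-ip, int-port, ext-ip, ext-port) tuple, which matches the first two rows of the int-ip/ext-ip listing.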
This environment variable is used as the value for --ipv6-policy when that switch is not provided.
This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.
This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.
When set to a non-empty string, rwcut automatically invokes this program to display its output a screen at a time. If set to an empty string, rwcut does not automatically page its output.
When set and SILK_PAGER is not set, rwcut automatically invokes this program to display its output a screen at a time.
This environment variable is used by Python to locate modules. When --python-file is specified, rwcut must load the Python files that comprise the PySiLK package, such as silk/__init__.py. If this silk/ directory is located outside Python’s normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.
When set, Python plug-ins will output traceback information on Python errors to the standard error.
This environment variable allows the user to specify the country code mapping file that rwcut uses when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.
This environment variable allows the user to specify the address type mapping file that rwcut uses when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. See the FILES section for standard locations of this file.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwcut may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwcut may use this environment variable. See the FILES section for details.
When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwcut displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine’s default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwcut --version.)
When set to 1, rwcut prints status messages to the standard error as it attempts to find and open each of its plug-ins. In addition, when an attempt to register a field fails, rwcut prints a message specifying the additional function(s) that must be defined to register the field in rwcut. Be aware that the output can be rather verbose.
Possible locations for the address types mapping file required by the sType and dType fields.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
Possible locations for the country code mapping file required by the scc and dcc fields.
Directories that rwcut checks when attempting to load a plug-in.
Fields sTime+msec, eTime+msec, dur+msec, and their aliases (22, 23, 24) were removed in SiLK 3.23.0. Use fields sTime, eTime, and duration instead.
If you are interested in only a few fields, use the --fields option to reduce the volume of data to be produced. For example, if you are checking to see which internal host got hit with the slammer worm (signature: UDP, destPort 1434, pkt size 404), then the following rwfilter, rwcut combination will be much faster than simply using default values:
$ rwfilter --proto=17 --dport=1434 --bytes-per-packet=404-404 \
      --pass=stdout \
  | rwcut --fields=dip,stime
rwfilter(1), num2dot(1), rwpmapbuild(1), addrtype(3), ccfilter(3), int-ext-fields(3), pmapfilter(3), silk-plugin(3), silkpython(3), pysilk(3), sensor.conf(5), silk(7), yaf(1), dlopen(3), tzset(3), environ(7)
Eliminate duplicate SiLK Flow records
rwdedupe [--ignore-fields=FIELDS] [--packets-delta=NUM] [--bytes-delta=NUM] [--stime-delta=FLOAT] [--duration-delta=FLOAT] [--temp-directory=DIR_PATH] [--buffer-size=SIZE] [--note-add=TEXT] [--note-file-add=FILE] [--compression-method=COMP_METHOD] [--print-filenames] [--output-path=PATH] [--site-config-file=FILENAME] {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwdedupe --help
rwdedupe --help-fields
rwdedupe --version
rwdedupe reads SiLK Flow records from one or more input sources. Records that appear in the input file(s) multiple times will only appear in the output stream once; that is, duplicate records are not written to the output. The SiLK Flows are written to the file specified by the --output-path switch or to the standard output when the --output-path switch is not provided and the standard output is not connected to a terminal.
Note: As part of its processing, rwdedupe re-orders the records before writing them.
rwdedupe reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwdedupe reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.
By default, rwdedupe will consider one record to be a duplicate of another when all the fields in the records match exactly. From another point of view, any difference in two records results in both records appearing in the output. Note that all means every field that exists on a SiLK Flow record. The complete list of fields is specified in the description of --ignore-fields in the OPTIONS section below.
To have rwdedupe ignore fields in the comparison, specify those fields in the --ignore-fields switch. When --ignore-fields=FIELDS is specified, a record is considered a duplicate of another if all fields except those in FIELDS match exactly. rwdedupe will treat FIELDS as being identical across all records. Put another way, if the only difference between two records is in the FIELDS fields, only one of those records will be written to the output.
The --packets-delta, --bytes-delta, --stime-delta and --duration-delta switches allow for "fuzziness" in the input. For example, if --stime-delta=NUM is specified and the only difference between two records is in the sTime fields, and the fields are within NUM milliseconds of each other, only one record will be written to the output.
As of SiLK 3.23, the --stime-delta and --duration-delta switches accept a floating-point number to allow for sub-millisecond differences, reflecting the nanosecond resolution added in that release. The argument is still specified in terms of milliseconds: use --stime-delta=5000 for 5 seconds, --stime-delta=5 for 5 milliseconds, and --stime-delta=0.005 for 5 microseconds.
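The comparison rule behind --ignore-fields and the delta switches can be written as a pairwise predicate. This Python sketch models only that rule, not how rwdedupe finds candidate pairs. Records are dictionaries of integers with hypothetical field names, and time fields are assumed to be held in nanoseconds, so a millisecond argument such as 0.005 becomes 5000 ns.

```python
from typing import Iterable, Mapping, Optional

def ms_to_ns(ms: float) -> int:
    """Convert a millisecond delta (as given to --stime-delta) to nanoseconds."""
    return round(ms * 1_000_000)

def is_duplicate(a: Mapping[str, int], b: Mapping[str, int],
                 ignore: Iterable[str] = (),
                 deltas: Optional[Mapping[str, int]] = None) -> bool:
    """True when every compared field matches within its allowed delta."""
    deltas = deltas or {}
    skipped = set(ignore)
    for field, value in a.items():
        if field in skipped:
            continue  # ignored fields are treated as identical
        if abs(value - b[field]) > deltas.get(field, 0):
            return False
    return True

r1 = {"sip": 3221225985, "dip": 3221225986, "packets": 10, "bytes": 4040,
      "stime_ns": 1_000_000_000, "duration_ns": 0, "sensor": 1}
r2 = dict(r1, stime_ns=1_000_003_000)  # starts 3 microseconds later
print(is_duplicate(r1, r2, deltas={"stime_ns": ms_to_ns(0.005)}))  # True: within 5 us
```

Because a tolerance-based match is not transitive (A may match B and B match C while A does not match C), which records survive can depend on the order in which they are compared.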
During its processing, rwdedupe will try to allocate a large (near 2GB) in-memory array to hold the records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged to produce the output.
By default, the temporary files are stored in the /tmp directory. Because of the sizes of the temporary files, it is strongly recommended that /tmp not be used as the temporary directory, and rwdedupe will print a warning when /tmp is used. To modify the temporary directory used by rwdedupe, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Ignore the fields listed in FIELDS when determining if two flow records are identical; that is, treat FIELDS as being identical across all flows. By default, all fields are treated as significant.
FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:
--ignore-fields=stime,12-15
The list of supported fields is:
source IP address
destination IP address
source port for TCP and UDP, or equivalent
destination port for TCP and UDP, or equivalent
IP protocol
packet count
byte count
bit-wise OR of TCP flags over all packets
starting time of flow (microseconds resolution)
duration of flow (microseconds resolution)
name or ID of sensor at the collection point
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
router SNMP output interface or postVlanId
router next hop IP
class and type of sensor at the collection point (represented internally by a single value)
TCP flags on first packet in the flow
bit-wise OR of TCP flags over all packets except the first in the flow
flow attributes set by flow generator
guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
Treat the packets field on two records as being the same if the values differ by NUM packets or less. If not specified, the default is 0.
Treat the bytes field on two records as being the same if the values differ by NUM bytes or less. If not specified, the default is 0.
Treat the start-time field on two records as being the same if the values differ by FLOAT milliseconds or less. As of SiLK 3.23, the argument may be a floating-point number to support sub-millisecond differences. If not specified, the default is 0.
Treat the duration field on two records as being the same if the values differ by FLOAT milliseconds or less. As of SiLK 3.23, the argument may be a floating-point number to support sub-millisecond differences. If not specified, the default is 0.
Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.
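The precedence chain reads naturally as a first-match lookup. This Python sketch takes the environment as a parameter so that it can be tested. The name temp_directory is hypothetical, and treating an empty variable as unset is an assumption the text does not address.

```python
import os
from typing import Mapping, Optional

def temp_directory(cli_value: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """--temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp."""
    for candidate in (cli_value, env.get("SILK_TMPDIR"), env.get("TMPDIR")):
        if candidate:  # assumption: an empty value is treated as unset
            return candidate
    return "/tmp"
```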
Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwdedupe will allocate, since additional buffers will be allocated for reading the input and writing the output.)
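The suffix rule for SIZE is plain multiplication by a power of 1,024. A minimal Python sketch, with the hypothetical name parse_buffer_size; whether SiLK truncates or rounds a fractional byte count is not stated, so truncation here is an assumption.

```python
# Binary multipliers for the K, M, and G suffixes described above.
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_buffer_size(text: str) -> int:
    """Convert a --buffer-size style value such as '1.5K' or '4096' to bytes."""
    suffix = text[-1:]
    if suffix in MULTIPLIERS:
        # assumption: a fractional byte count is truncated toward zero
        return int(float(text[:-1]) * MULTIPLIERS[suffix])
    return int(text)
```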
Write the binary SiLK Flow records to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwdedupe exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwdedupe to exit with an error.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
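The fallback order of this last setting (named best in SiLK) can be expressed as a short preference search. This Python sketch is only a model of the description: choose_compression is hypothetical, and returning none when no library is available is an assumption.

```python
from typing import Iterable

PREFERENCE = ("lzo1x", "snappy", "zlib")  # order described above

def choose_compression(available: Iterable[str], writing_to_file: bool) -> str:
    """Pick a method the way the 'best' setting is described: file output only."""
    if not writing_to_file:
        return "none"  # pipes and the standard output stay uncompressed
    have = set(available)
    for method in PREFERENCE:
        if method in have:
            return method
    return "none"  # assumption: with no library available, nothing is compressed
```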
Print to the standard error the names of input files as they are opened.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwdedupe searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwdedupe opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Print the available options and exit.
Print the description and alias(es) of each field and exit.
Print the version number and information about how SiLK was configured, then exit the application.
When the temporary files and the final output are stored on the same file volume, rwdedupe will require approximately twice as much free disk space as the size of input data.
When the temporary files and the final output are on different volumes, rwdedupe will require between 1 and 1.5 times as much free space on the temporary volume as the size of the input data.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
Suppose you have made several rwfilter(1) runs to find interesting traffic:
$ rwfilter --start-date=2008/02/04 ... --pass=data1.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data2.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data3.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data4.rw
You now want to merge that traffic into a single output file, but you want to ensure that any records appearing in multiple output files are only counted once. You can use rwdedupe to merge the output files to a single file, data.rw:
$ rwdedupe data1.rw data2.rw data3.rw data4.rw --output=data.rw
When set and --temp-directory is not specified, rwdedupe writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.
When set and SILK_TMPDIR is not set, rwdedupe writes the temporary files it creates to this directory.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwdedupe may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwdedupe may use this environment variable. See the FILES section for details.
When set to 1, rwdedupe prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
Directory in which to create temporary files.
rwfilter(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)
Print files that rwfilter’s File Selection switches will access
rwfglob { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}] | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] } [--sensors=SENSOR[,SENSOR ...]] [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]] [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME] [--print-missing-files] [--no-block-check] [--no-file-names] [--no-summary]
rwfglob [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME] --help
rwfglob --version
rwfglob accepts the same File Selection Switches of rwfilter(1) and prints, to the standard output, the pathnames of the files that rwfilter would process, one file name per line. At the end, a summary is printed to the standard output of the number of files that rwfglob found. To suppress the printing of the file names and/or the summary, specify the --no-file-names and/or --no-summary switches, respectively.
By default, rwfglob only prints the names of files that exist. When the --print-missing-files switch is provided, rwfglob prints, to the standard error, the names of files that it did not find, one file name per line, preceded by the text 'Missing '. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:
$ rwfglob --print-missing-files ... 2>&1
As of SiLK 3.20, the Selection Switches --class, --type, --flowtypes, and --sensors accept a value in the form "@PATH", where @ is the "at" character (ASCII 0x40) and PATH names a file or a path to a file. For example, the following reads the names of types from the file t.txt and uses the sensors S3, S7, and the names and/or IDs read from /tmp/sensor.txt:
rwfglob --type=@t.txt --sensors=S3,@/tmp/sensor.txt,S7
Multiple @PATH values are allowed within a single argument. If the name of the file is -, the names are read from the standard input.
The file must be a text file. Blank lines are ignored as are comments, which begin with the # character and continue to the end of the line. Whitespace at the beginning and end of a line is ignored as is whitespace that surrounds commas; all other whitespace within a line is significant.
A file may contain a value on each line and/or multiple values on a line separated by commas and optional whitespace. For example:
# Sensor 4
S4
# The first sensors
S0, S1,S2
S3  # Sensor 3
An attempt to use an @PATH directive in a file is an error.
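The file-reading rules above can be illustrated with a short Python sketch. The function name parse_selection_file is hypothetical and this is not rwfglob's own (C) code; it also assumes that an empty field between two commas is ignored, which the text does not state.

```python
def parse_selection_file(text: str) -> list:
    """Return the values listed in the contents of an @PATH file (sketch)."""
    values = []
    for line in text.splitlines():
        # A comment runs from '#' to the end of the line.
        line = line.split("#", 1)[0]
        # Values are separated by commas; whitespace around each value (and
        # therefore at the ends of the line) is ignored, inner whitespace kept.
        for field in line.split(","):
            field = field.strip()
            if not field:
                continue  # blank line or empty field (assumed ignorable)
            if field.startswith("@"):
                raise ValueError(f"@PATH directive not allowed in a file: {field}")
            values.append(field)
    return values


example = """\
# Sensor 4
S4

# The first sensors
 S0, S1,S2

S3  # Sensor 3
"""
print(parse_selection_file(example))  # ['S4', 'S0', 'S1', 'S2', 'S3']
```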
When rwfglob is parsing the name of a file, it converts the sequences @, and @@ to , and @, respectively. For example, --class=@cl@@ss.txt@,v reads the class from the file cl@ss.txt,v. It is an error if any other character follows an embedded @ (--flowtypes=@f@il contains @i) or if a single @ occurs at the end of the name (--sensor=@errat@).
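A minimal Python sketch of how a selection argument could be split into plain values and @PATH file names while decoding the @, and @@ sequences. The function split_selection_arg is hypothetical; it assumes the escapes apply only inside a file name introduced by a leading @, and it does not model reading - as the standard input.

```python
def split_selection_arg(arg: str) -> list:
    """Split an argument into ("value", text) and ("file", path) items."""
    items = []
    i, n = 0, len(arg)
    while i < n:
        if arg[i] == "@":
            # @PATH: collect the file name up to the next unescaped comma,
            # decoding "@," to "," and "@@" to "@".
            i += 1
            path = []
            while i < n and arg[i] != ",":
                if arg[i] == "@":
                    if i + 1 >= n:
                        raise ValueError("single '@' at end of file name")
                    nxt = arg[i + 1]
                    if nxt not in ",@":
                        raise ValueError(f"invalid sequence '@{nxt}' in file name")
                    path.append(nxt)
                    i += 2
                else:
                    path.append(arg[i])
                    i += 1
            items.append(("file", "".join(path)))
        else:
            # A plain name or ID runs to the next comma.
            end = arg.find(",", i)
            if end == -1:
                end = n
            items.append(("value", arg[i:end]))
            i = end
        i += 1  # step over the separating comma
    return items


print(split_selection_arg("@cl@@ss.txt@,v"))  # [('file', 'cl@ss.txt,v')]
```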
For each file it finds, rwfglob will check the size of the file and the number of blocks allocated to the file. If the block count is zero but the file size is non-zero, rwfglob treats the file as existing but as residing on tape. The names of these files are printed to the standard output, but each name is preceded by the text ’ \t*** ON_TAPE ***’ where ’\t’ represents a tab character. The summary line will include the number of files that rwfglob believes are on tape. To suppress this check and to remove the count from the summary line, use the --no-block-check switch.
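The on-tape heuristic and the summary count can be sketched in Python as follows. The helper names are hypothetical, and the summary wording is copied from the example output later on this page; the sketch does not model --no-block-check. On a POSIX system the inputs would be os.stat() results (st_blocks is not available on every platform); the demo uses stand-in objects instead.

```python
from types import SimpleNamespace


def looks_on_tape(st) -> bool:
    """Non-empty file with no allocated blocks: treated as residing on tape."""
    return st.st_size > 0 and st.st_blocks == 0


def summary_line(stats) -> str:
    """Summary in the style of rwfglob's final line."""
    stats = list(stats)
    on_tape = sum(1 for st in stats if looks_on_tape(st))
    return f"globbed {len(stats)} files; {on_tape} on tape"


# Stand-ins for os.stat() results: one resident file, one apparently on tape.
# On a real system: summary_line(os.stat(p) for p in paths)
demo = [SimpleNamespace(st_size=572, st_blocks=8),
        SimpleNamespace(st_size=572, st_blocks=0)]
print(summary_line(demo))  # globbed 2 files; 1 on tape
```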
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
This set of switches is the same as the set rwfilter uses to select the files to process. At least one of these switches must be provided.
The --class switch is used to specify a group of files to print. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. The argument may be ”@PATH” which causes rwfglob to open the file PATH and read the class name from it; see Read Selection Argument Values from a File for details. Classes are defined in the silk.conf(5) site configuration file. If neither the --class nor --flowtypes option is given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfglob --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes either the keyword all to select all types for CLASS or a comma-separated list of type names and ”@PATH” directives, where @PATH tells rwfglob to read type names from the file PATH; see Read Selection Argument Values from a File for details. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When neither the --type nor --flowtypes switch is given, a list of default types is used: the default-type list is determined by the value of CLASS, and the default types often include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfglob or run rwsiteinfo with --fields=class,type,default-type.
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfglob invocation to print filenames from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types. As of SiLK 3.20.0, the arguments may also include ”@PATH” which causes rwfglob to open the file PATH and read the class/type pairs from it; see Read Selection Argument Values from a File.
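To illustrate how the all keyword widens a CLASS/TYPE pair, here is a Python sketch against a hypothetical site configuration (the class names edge and core and their types are invented for the example). When CLASS is all and a class lacks the named TYPE, the sketch skips that class; whether rwfglob does the same is an assumption.

```python
# Hypothetical site configuration: class name -> its type names.
SITE = {
    "edge": ["in", "out", "inweb", "outweb"],
    "core": ["in", "out"],
}


def expand_flowtypes(spec: str) -> list:
    """Expand a --flowtypes style argument into (class, type) pairs."""
    pairs = []
    for item in spec.split(","):
        cls, sep, typ = item.strip().partition("/")
        if not sep:
            raise ValueError(f"expected CLASS/TYPE, got {item!r}")
        for c in (list(SITE) if cls == "all" else [cls]):
            if c not in SITE:
                raise ValueError(f"unknown class {c!r}")
            if typ == "all":
                types = SITE[c]
            elif typ in SITE[c]:
                types = [typ]
            elif cls == "all":
                continue  # assumption: skip classes that lack the named type
            else:
                raise ValueError(f"type {typ!r} not in class {c!r}")
            for t in types:
                if (c, t) not in pairs:
                    pairs.append((c, t))
    return pairs


print(expand_flowtypes("all/in"))  # [('edge', 'in'), ('core', 'in')]
```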
The --sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), ranges of sensor IDs, sensor group names, and/or ”@PATH” directives. As described in Read Selection Argument Values from a File, @PATH tells rwfglob to read the names of the sensors from the file PATH. Sensors and sensor groups are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes (--fields=sensor,id-sensor,class:list). When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es). Support for using sensor group names was added in SiLK 3.21.0.
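The kinds of values that --sensors accepts (names, IDs, ID ranges, and group names) can be illustrated with the Python sketch below. The sensor table and the group border are hypothetical stand-ins for what silk.conf would define, and rejecting an unknown ID is a choice made by this sketch, not documented rwfglob behavior. @PATH handling is omitted.

```python
import re

# Hypothetical site configuration: sensor name -> sensor ID, plus one group.
SENSOR_IDS = {"S0": 0, "S1": 1, "S2": 2, "S3": 3, "S4": 4, "S5": 5}
SENSOR_GROUPS = {"border": ["S0", "S1"]}


def expand_sensors(spec: str) -> list:
    """Return the sorted sensor IDs named by a --sensors style argument."""
    known = set(SENSOR_IDS.values())
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        m = re.fullmatch(r"(\d+)-(\d+)", token)
        if m:
            wanted = range(int(m.group(1)), int(m.group(2)) + 1)  # ID range
        elif token.isdigit():
            wanted = [int(token)]                                  # single ID
        elif token in SENSOR_GROUPS:
            wanted = [SENSOR_IDS[s] for s in SENSOR_GROUPS[token]]  # group
        elif token in SENSOR_IDS:
            wanted = [SENSOR_IDS[token]]                           # name
        else:
            raise ValueError(f"unknown sensor {token!r}")
        for sid in wanted:
            if sid not in known:
                raise ValueError(f"no sensor has ID {sid}")
            ids.add(sid)
    return sorted(ids)


print(expand_sensors("S3,1-2,border"))  # [0, 1, 2, 3]
```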
The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfglob --version.
When times are expressed in YYYY/MM/DD[:HH] format:
When both --start-date and --end-date are specified to hour precision, all hours within that time range are processed.
When --start-date is specified to day precision, the hour specified in --end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.
When --start-date is specified to hour precision and --end-date is specified to day precision, the hour of the start-date is used as the hour for the end-date.
When --end-date is not specified and --start-date is specified to day precision, files for that complete day are processed.
When --end-date is not specified and --start-date is specified to hour precision, files for that single hour are processed.
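The five YYYY/MM/DD[:HH] rules above can be expressed as a small Python function that returns every hour to process. This is a sketch with hypothetical function names. It uses naive datetimes and so ignores the UTC-versus-local-time question, and it returns an empty list when the end precedes the start rather than reporting an error.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_date(text: str):
    """Parse YYYY/MM/DD[:HH] or YYYY/MM/DDTHH; return (datetime, has_hour)."""
    day, sep, hour = text.replace("T", ":").partition(":")
    start = datetime.strptime(day, "%Y/%m/%d")
    if sep:
        return start.replace(hour=int(hour)), True
    return start, False


def hours_to_process(start: str, end: Optional[str] = None) -> list:
    s, s_has_hour = parse_date(start)
    if end is None:
        # A single hour, or the complete day.
        last = s if s_has_hour else s.replace(hour=23)
    else:
        e, e_has_hour = parse_date(end)
        if not s_has_hour:
            # Day precision on start: ignore any end hour, run through 23:00.
            last = e.replace(hour=23)
        elif not e_has_hour:
            # End given to day precision borrows the start's hour.
            last = e.replace(hour=s.hour)
        else:
            last = e
    hours, t = [], s
    while t <= last:
        hours.append(t)
        t += timedelta(hours=1)
    return hours


print(len(hours_to_process("2003/10/11")))  # 24
```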
When at least one time is expressed as seconds since the UNIX epoch:
When --end-date is specified in epoch seconds, the given --start-date and --end-date are considered to be in hour precision.
When --start-date is specified in epoch seconds and --end-date is specified in YYYY/MM/DD[:HH] format, the start-date is considered to be in day precision if it is divisible by 86400, and hour precision otherwise.
When --start-date is specified in epoch seconds and --end-date is not given, the start-date is considered to be in hour precision.
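The three epoch-seconds rules reduce to a small decision function. This Python sketch (hypothetical name start_precision) only reports the precision assigned to --start-date; it treats any all-digit string as epoch seconds, which is a simplification.

```python
from typing import Optional

SECONDS_PER_DAY = 86400


def is_epoch(text: str) -> bool:
    return text.isdigit()


def start_precision(start: str, end: Optional[str] = None) -> str:
    """Precision ("day" or "hour") of --start-date under the epoch rules."""
    end_is_epoch = end is not None and is_epoch(end)
    if not is_epoch(start) and not end_is_epoch:
        raise ValueError("these rules apply only when a date is in epoch seconds")
    if end_is_epoch:
        return "hour"  # an epoch --end-date puts both dates at hour precision
    if end is None:
        return "hour"  # an epoch --start-date alone selects a single hour
    # Epoch --start-date with a YYYY/MM/DD[:HH] --end-date.
    return "day" if int(start) % SECONDS_PER_DAY == 0 else "hour"


# 1065830400 is 2003/10/11 00:00:00 UTC, an exact multiple of 86400.
print(start_precision("1065830400", "2003/10/12"))  # day
```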
When neither --start-date nor --end-date is given, rwfglob prints all files for the current day.
It is an error to specify --end-date without specifying --start-date.
Tell rwfglob to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfglob (/data).
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfglob searches for the site configuration file in the locations specified in the FILES section.
This option prints to the standard error the names of the files that rwfglob expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose a decommissioned sensor still appears in the silk.conf file; rwfglob reports that sensor’s data files as missing even though their absence is expected. Use the output from this switch judiciously.
This option instructs rwfglob not to check whether the file exists on tape by checking whether the number of blocks allocated to the file is zero. By default, rwfglob precedes a file name that has a block count of 0 with the text ’ \t*** ON_TAPE ***’.
This option instructs rwfglob not to print the names of the files that it successfully finds. By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found. When both this switch and --print-missing-files are specified, rwfglob prints only the names of missing files (and the summary).
This option instructs rwfglob not to print the summary line (that is, the line that shows the number of files found). By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.
Print the available options and exit. The available classes and types will be included in output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
Looking at a day on a single sensor:
$ rwfglob --start=2003/10/11 --sensor=2
/data/in/2003/10/11/in-GAMMA_20031011.23
/data/in/2003/10/11/in-GAMMA_20031011.22
/data/in/2003/10/11/in-GAMMA_20031011.21
/data/in/2003/10/11/in-GAMMA_20031011.20
/data/in/2003/10/11/in-GAMMA_20031011.19
/data/in/2003/10/11/in-GAMMA_20031011.18
/data/in/2003/10/11/in-GAMMA_20031011.17
/data/in/2003/10/11/in-GAMMA_20031011.16
/data/in/2003/10/11/in-GAMMA_20031011.15
/data/in/2003/10/11/in-GAMMA_20031011.14
/data/in/2003/10/11/in-GAMMA_20031011.13
/data/in/2003/10/11/in-GAMMA_20031011.12
/data/in/2003/10/11/in-GAMMA_20031011.11
/data/in/2003/10/11/in-GAMMA_20031011.10
/data/in/2003/10/11/in-GAMMA_20031011.09
/data/in/2003/10/11/in-GAMMA_20031011.08
/data/in/2003/10/11/in-GAMMA_20031011.07
/data/in/2003/10/11/in-GAMMA_20031011.06
/data/in/2003/10/11/in-GAMMA_20031011.05
/data/in/2003/10/11/in-GAMMA_20031011.04
/data/in/2003/10/11/in-GAMMA_20031011.03
/data/in/2003/10/11/in-GAMMA_20031011.02
/data/in/2003/10/11/in-GAMMA_20031011.01
/data/in/2003/10/11/in-GAMMA_20031011.00
globbed 24 files; 0 on tape
If you only want the summary, specify --no-file-names:
$ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names
globbed 24 files; 0 on tape
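The pathnames in the first example follow a visible pattern: ROOT/TYPE/YYYY/MM/DD/PREFIX-SENSOR_YYYYMMDD.HH. The Python sketch below rebuilds the 24 names for one day from that pattern. The helper hourly_paths is hypothetical; it assumes this layout only because the example output shows it (a real repository's layout comes from its site configuration), and it lists hours in ascending order whereas the example output is descending.

```python
from datetime import datetime, timedelta


def hourly_paths(root: str, type_dir: str, prefix: str, sensor: str, day: str) -> list:
    """Rebuild one day's hourly file names in the layout shown in the example."""
    start = datetime.strptime(day, "%Y/%m/%d")
    paths = []
    for h in range(24):
        t = start + timedelta(hours=h)
        paths.append(f"{root}/{type_dir}/{t:%Y/%m/%d}/{prefix}-{sensor}_{t:%Y%m%d.%H}")
    return paths


paths = hourly_paths("/data", "in", "in", "GAMMA", "2003/10/11")
print(paths[-1])  # /data/in/2003/10/11/in-GAMMA_20031011.23
```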
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. This value overrides the compiled-in value, and rwfglob uses it unless the --data-rootdir switch is specified. In addition, rwfglob may use this value when searching for the SiLK site configuration file. See the FILES section for details.
This environment variable gives the root of the install tree. When searching for configuration files, rwfglob may use this environment variable. See the FILES section for details.
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the Timezone support value in the output from rwfglob --version), the value of the TZ environment variable determines the timezone in which rwfglob parses timestamps. (The dates in the filenames that rwfglob returns are always in UTC.) If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses UTC. For system information on the TZ variable, see tzset(3) or environ(7).
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory rwfglob is using as the root of the data repository.
Locations for the root directory of the data repository when the --data-rootdir switch is not specified.
rwfilter(1), rwsiteinfo(1), silk.conf(5), silk(7), tzset(3), environ(7)
The ability to use @PATH in --class, --type, --flowtypes, and --sensors was added in SiLK 3.20.0.
As of SiLK 3.20.0, --types is an alias for --type.
The --sensors switch also accepts the names of groups defined in the silk.conf(5) file as of SiLK 3.21.0.
The output of --print-missing-files goes to the standard error, while all other output goes to the standard output. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:
$ rwfglob --print-missing-files ... 2>&1
The --print-missing-files option needs to be smarter about what files are really missing.
The block count check is of unknown portability across different tape-farm systems.
Print information about a SiLK file
rwfileinfo [--fields=FIELDS] [--summary] [--no-titles] [--site-config-file=FILENAME] {--xargs | --xargs=FILENAME | FILE [FILE...]}
rwfileinfo --help
rwfileinfo --help-fields
rwfileinfo --version
rwfileinfo prints information about a binary SiLK file that can be determined by reading the file’s header and by moving quickly over the data blocks in the file.
rwfileinfo requires one or more filename arguments to be given on the command line or the use of the --xargs switch. When the --xargs switch is provided, rwfileinfo reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line. rwfileinfo does not read a SiLK file’s content from the standard input by default, but it does when either - or stdin is given as a filename argument.
When the --summary switch is given, rwfileinfo first prints the information for each individual file and then prints the number of files processed, the sum of the individual file sizes, and the sum of the individual record counts.
By default, rwfileinfo prints the following information for each file argument. Use the --fields switch to modify which pieces of information are printed.
(rwfileinfo prints each field in the order in which support for that field was added to SiLK. The field descriptions are presented here in a more logical order.)
The size of the file on disk as reported by the operating system. rwfileinfo prints 0 for the file-size when reading from the standard input.
Every binary file written by SiLK has a version number field. Since SiLK 1.0.0, the version number field has been used to indicate the general structure (or layout) of the file. The file structure adopted in SiLK 1.0.0 uses a version number of 16 and has a header section and a data section. The header section begins with 16 bytes that specify well-defined values, and those bytes are followed by one or more variably-sized header entries. The specifics of the data section depend on the content of the file.
The header-length field shows the number of octets required by the header (i.e., the initial 16 bytes and the header entries). Since everything after the header is data, the header-length is the starting offset of the data section. The smallest header length is 24 bytes, but typically the header is padded to be an integer multiple of the record-length. The header-length that rwfileinfo prints for a file is determined dynamically by reading the file’s header.
When a SiLK tool creates a binary file, the tool writes the current SiLK release number (such as 3.9.0) into the file’s header as a way to help diagnose issues should a bug with a particular release of SiLK be discovered in the future.
Every SiLK file has a byte-order or endian field. SiLK uses the machine’s native representation of integers when writing data, and this field shows what representation the file contains. BigEndian is network byte order and littleEndian is used by Intel chips. The rwswapbytes(1) tool changes a file’s integer representation, and some tools have a --byte-order switch that allows the user to specify the integer representation of output files. The header-section of a file is always written in network byte order.
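The difference between the two byte orders can be seen by packing one integer both ways with Python's struct module. This is a generic illustration of the two layouts, not SiLK's record format, and it does not show how rwswapbytes itself is implemented.

```python
import struct

value = 0x01020304
big = struct.pack(">I", value)     # network byte order ("BigEndian")
little = struct.pack("<I", value)  # order used by Intel processors ("littleEndian")

print(big.hex(), little.hex())  # 01020304 04030201

# Reversing the bytes of a 4-byte integer converts one layout into the other.
assert struct.unpack(">I", little[::-1])[0] == value
```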
SiLK tools may use the zlib library (http://zlib.net/), the LZO library (http://www.oberhumer.com/opensource/lzo/), or the snappy library (http://google.github.io/snappy/) to compress the data section of a file. The compression field specifies which library (if any) was used to compress the data section. If a file is compressed with a library that was not included in an installation of SiLK, SiLK is unable to read the data section of the file. Many SiLK tools accept the --compression-method switch to choose a particular compression method. (The compression field does not indicate whether the entire file has been compressed with an external compression utility such as gzip(1).)
Every binary file written by SiLK has two fields in the header that specify exactly what the file contains: the format and the record-version. In general, the format indicates the content type of the file and the record-version indicates the evolution of that content.
The contents of a file whose format is FT_IPSET, FT_RWBAG, or FT_PREFIXMAP are fairly obvious (an IPset, a Bag, a prefix map).
There are many different file formats for writing SiLK Flow records, but the SiLK analysis tools largely use a single Flow file format. That format is FT_RWIPV6ROUTING if SiLK has been compiled with IPv6 support, or FT_RWGENERIC otherwise. A file that uses the FT_RWGENERIC format is only capable of holding IPv4 addresses.
The other SiLK Flow file formats are created by rwflowpack(8) as it writes flow records to the repository. These formats often omit fields and use reduced bit-sizes for fields to reduce the space required for an individual flow record.
The record-version field indicates changes within the general type specified by the format field. For example, SiLK incremented the record-version of the formats that hold flow records when the resolution of record timestamps changed from seconds to milliseconds and again from milliseconds to nanoseconds.
Together with the format field, the record-version specifies the contents of the file. See the discussion of format for details.
Files created by SiLK 1.0.0 and later have a record length field. This field contains the length of an individual record, and this value is dependent on the format and record-version fields described above. Some files (such as those containing IPsets or prefix maps) do not write individual records to the output, and the record length is 1 for these files.
The count-records field is generated dynamically by determining the length the data section would require if it were completely uncompressed and dividing it by the record-length. When the record-length is 1 (such as for IPset files), the count-records field does not provide much information beyond the length of the uncompressed data. For an uncompressed file, adding header-length to the product of count-records and record-length is equal to the file-size.
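The arithmetic relating these fields can be checked against the rwfileinfo example for tcp-data.rw shown later on this page (header-length 208, record-length 52, file-size 572, compression none). The helper names are hypothetical, and the shortcut of subtracting the header from the file size holds only for uncompressed files; for a compressed file the uncompressed data length must be obtained by reading the data blocks.

```python
def count_records(data_length: int, record_length: int) -> int:
    """Records in a data section whose uncompressed length is data_length."""
    return data_length // record_length


def uncompressed_file_size(header_length: int, record_length: int, records: int) -> int:
    """header-length plus count-records times record-length."""
    return header_length + records * record_length


# Values reported for the uncompressed file tcp-data.rw.
header_length, record_length, file_size = 208, 52, 572
records = count_records(file_size - header_length, record_length)
print(records)                                                        # 7
print(uncompressed_file_size(header_length, record_length, records))  # 572
print(header_length % record_length == 0)  # True: header padded to 4 records
```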
The fields given above are either present in the well-defined header or are computed by reading the file.
The following fields are generated by reading the header entries and determining if one or more header entries of the specified type are present. The field is not printed in the output when the header entry is not present in the file.
Many of the SiLK tools write a header entry to the output file that contains the command line invocation used to create that file, and some of the SiLK tools also copy the command line history from their input files to the output file. (The --invocation-strip switch on the tools can be used to prevent copying and recording of the invocation.) The command lines are stored in individual header entries and this field displays those entries with the most recent invocation at the end of the list.
The command line history has a couple of issues:
When multiple input files are used to create a single output, the entries are stored as a list, and this makes it difficult to know which set of command line entries are associated with which input file.
When a SiLK tool creates multiple output files (e.g., when using both --pass and --fail to rwfilter(1)), the tool writes the same command line entry to each output file. Some context in addition to the command line history may be needed to know which branch of that tool a particular file represents.
Most SiLK tools that create binary output files provide the --note-add and --note-file-add switches which allow an arbitrary annotation to be added to the header of a file. Some tools also copy the annotations from the source files to the destination files. The annotations are stored in individual header entries and this field displays those entries.
The IPset writing tools (rwset(1), rwsetbuild(1), rwsettool(1), rwaggbagtool(1), and rwbagtool(1)) support the following output formats for IPset data structures:
May hold only IPv4 addresses and does not have an ipset header entry.
May hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later. It contains a header entry that describes the IPset data structure, and the entry specifies the number of nodes, the number of branches from each node, the number of leaves, the size of the nodes and leaves, and which node is the root of the tree.
May hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. The file’s header entry specifies whether the file contains IPv4 addresses or IPv6 addresses.
May hold only IPv6 addresses and is readable by SiLK 3.14 and later. The header entry specifies that the file contains IPv6 data.
Since SiLK 3.0.0, the tools that write binary Bag files (rwbag(1), rwbagbuild(1), and rwbagtool(1)) have written a header entry that specifies the type and size of the key and of the counter in the file.
The tools rwaggbag(1), rwaggbagbuild(1), and rwaggbagtool(1) write a header entry that contains the field types that comprise the key and the counter.
When using rwpmapbuild(1) to create a prefix map file, a string that specifies a mapname may be provided. rwpmapbuild writes the mapname to a header entry in the prefix map file. The mapname is used to generate command line switches or field names when the --pmap-file switch is specified to several of the SiLK tools (see pmapfilter(3) for details). When displaying the mapname, rwfileinfo prefixes it with the string v1: which denotes a version number for the prefix-map header entry. (The version number is printed for completeness.)
When rwflowpack(8) creates a SiLK Flow file for the repository, all the records in the file have the same starting hour, the same sensor, and the same flowtype (class/type pair). rwflowpack writes a header entry to the file that contains these values, and this field displays those values. (To print the names for the sensor and flowtype, the silk.conf(5) file must be accessible.)
When flowcap(8) creates a SiLK flow file, it adds a header entry specifying the name of the probe from which the data was collected.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Specify what information to print for each file argument on the command line. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive and may be shortened to a unique prefix. When the --fields option is not given, all fields are printed if the file contains the necessary information. The fields are always printed in the order they appear here regardless of the order they are specified in FIELDS.
The possible field values are given next with a brief description of each. For a full description of each field, see Field Descriptions above.
The contents of the file as a name and the corresponding hexadecimal ID.
An integer describing the layout or structure of the file.
Either BigEndian or littleEndian to indicate the representation used to store integers in the file (network or non-network byte order).
The compression library (if any) used to compress the data-section of the file, specified as a name and its decimal ID.
The octet length of the file’s header; alternatively the offset where data begins.
The octet length of a single record or the value 1 if the file’s content is not record-based.
The number of records in the file, computed by dividing the uncompressed data length by the record-length.
The size of the file on disk as reported by the operating system.
The command line invocation used to generate this file.
The version of the records contained in the file.
The release of SiLK that wrote this file.
For a repository Flow file generated by rwflowpack(8), this prints the timestamp of the starting hour, the flowtype, and the sensor of each flow record in the file.
For a Flow file generated by flowcap(8), the name of the probe where the flow records were initially collected.
The notes (annotations) that users have added to the file’s header.
For a prefix map file, the mapname that was set when the file was created by rwpmapbuild(1).
For an IPset file whose record-version is 3, a description of the tree data structure. For an IPset file whose record-version is 4, the type of IP addresses (IPv4 or IPv6).
For a bag file, the type and size of the key and of the counter.
For an aggregate bag file, the field types that comprise the key and the counter.
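The --fields rules (names or unique case-insensitive prefixes, integers, hyphenated ranges, and output always in the tool's own order) can be sketched in Python. The FIELDS list is an illustrative subset of names used on this page, and the integer for each field is assumed, hypothetically, to be its 1-based position in that list; consult rwfileinfo --help-fields for the real numbering.

```python
import re

# Illustrative subset; integers are assumed to be 1-based list positions.
FIELDS = ["format", "version", "byte-order", "compression", "header-length",
          "record-length", "count-records", "file-size", "command-lines",
          "record-version"]


def resolve_name(token: str) -> int:
    """Map a case-insensitive name or unique prefix to a list index."""
    token = token.lower()
    if token in FIELDS:
        return FIELDS.index(token)  # an exact match always wins
    matches = [i for i, name in enumerate(FIELDS) if name.startswith(token)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous field {token!r}")
    return matches[0]


def parse_fields(spec: str) -> list:
    """Return the selected field names in canonical (list) order."""
    chosen = set()
    for token in spec.split(","):
        token = token.strip()
        m = re.fullmatch(r"(\d+)-(\d+)", token)
        if m:
            first, last = int(m.group(1)), int(m.group(2))
            if first > last:
                raise ValueError(f"bad range {token!r}")
            indexes = range(first - 1, last)
        elif token.isdigit():
            indexes = [int(token) - 1]
        else:
            indexes = [resolve_name(token)]
        for i in indexes:
            if not 0 <= i < len(FIELDS):
                raise ValueError(f"field number out of range in {token!r}")
            chosen.add(i)
    return [FIELDS[i] for i in sorted(chosen)]


print(parse_fields("file-size,FORMAT"))  # ['format', 'file-size']
```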
After the data for each individual file is printed, print a summary that shows the number of files processed, the sum of the individual file sizes, and the total number of records contained in those files.
Suppress printing of the file name and field names. The output contains only the values, where each value is printed left-justified on a single line.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfileinfo searches for the site configuration file in the locations specified in the FILES section.
Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwfileinfo opens each named file in turn and prints its information as if the filenames had been listed on the command line. Since SiLK 3.15.0.
Print the available options and exit.
Print a description of each field, its alias, and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
Get information about the file tcp-data.rw:
$ rwfileinfo tcp-data.rw
tcp-data.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       208
  record-length       52
  record-version      5
  silk-version        1.0.1
  count-records       7
  file-size           572
  command-lines
                   1  rwfilter --proto=6 --pass=tcp-data.rw ...
  annotations
                   1  This is some interesting TCP data
Return a single value which is the number of records in the file tcp-data.rw:
$ rwfileinfo --no-titles --field=count-records tcp-data.rw
7
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the FILES section, rwfileinfo may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwfileinfo may use this environment variable. See the FILES section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
rwfilter(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsettool(1), rwswapbytes(1), silk.conf(5), pmapfilter(3), flowcap(8), rwflowpack(8), silk(7), gzip(1)
Choose which SiLK Flow records to process
rwfilter INPUT_ARGS OUTPUT_ARGS PARTITIONING_ARGS [MISC_ARGS]
Selection switches, input switches, or input files are required:
rwfilter ... {{ [--class=CLASS] [--type={all | TYPE[,TYPE ...]}] | [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] } [--sensors=SENSOR[,SENSOR ...]] [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]] [--data-rootdir=ROOT_DIRECTORY] [--print-missing-files] } | [--input-pipe=INPUT_PATH] | [--xargs] | [--xargs=INPUT_PATH] | [INPUT_PATH [INPUT_PATH...]]
One or more output switches are required:
rwfilter ... [--all-destination=ALL_PATH [--all-destination=ALL_PATH ...]] [--fail-destination=FAIL_PATH [--fail-destination=FAIL_PATH ...]] [--pass-destination=PASS_PATH [--pass-destination=PASS_PATH ...]] [{ --print-statistics[=STATS_PATH] | --print-volume-statistics[=STATS_PATH] }]
One or more partitioning switches are often used:
rwfilter ... [--ack-flag=SCALAR] [--active-time=TIME_WINDOW] [{--any-address=IP_WILDCARD | --not-any-address=IP_WILDCARD}] [--any-cc=COUNTRY_CODE_LIST] [{--any-cidr=IP_OR_CIDR_LIST | --not-any-cidr=IP_OR_CIDR_LIST}] [--any-index=INTEGER_LIST] [{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}] [--aport=INTEGER_LIST] [--application=INTEGER_LIST] [--attributes=ATTRIBUTES_LIST] [--bytes=INTEGER_RANGE] [--bytes-per-packet=DECIMAL_RANGE] [--cwr-flag=SCALAR] [{--daddress=IP_WILDCARD | --not-daddress=IP_WILDCARD}] [--dcc=COUNTRY_CODE_LIST] [{--dcidr=IP_OR_CIDR_LIST | --not-dcidr=IP_OR_CIDR_LIST}] [{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}] [--dport=INTEGER_LIST] [--dtype=SCALAR] [--duration=DECIMAL_RANGE] [--ece-flag=SCALAR] [--etime=TIME_WINDOW] [--fin-flag=SCALAR] [--flags-all=HIGH_MASK_FLAGS_LIST] [--flags-initial=HIGH_MASK_FLAGS_LIST] [--flags-session=HIGH_MASK_FLAGS_LIST] [--icmp-code=INTEGER_LIST] [--icmp-type=INTEGER_LIST] [--input-index=INTEGER_LIST] [--ip-version=INTEGER_LIST] [--ipa-src-expr=IPA_EXPR] [--ipa-dst-expr=IPA_EXPR] [--ipa-any-expr=IPA_EXPR] [{--next-hop-id=IP_WILDCARD | --not-next-hop-id=IP_WILDCARD}] [{--nhcidr=IP_OR_CIDR_LIST | --not-nhcidr=IP_OR_CIDR_LIST}] [{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}] [--output-index=INTEGER_LIST] [--packets=INTEGER_RANGE] [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...] { [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS] [--pmap-any-MAPNAME=LABELS] } ] [--protocol=INTEGER_LIST] [--psh-flag=SCALAR] [--python-expr=PYTHON_EXPR] [--python-file=FILENAME [--python-file=FILENAME ...]] [--rst-flag=SCALAR] [{--saddress=IP_WILDCARD | --not-saddress=IP_WILDCARD}] [--scc=COUNTRY_CODE_LIST] [{--scidr=IP_OR_CIDR_LIST | --not-scidr=IP_OR_CIDR_LIST}] [{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}] [--sport=INTEGER_LIST] [--stime=TIME_WINDOW] [--stype=SCALAR] [--syn-flag=SCALAR] [--tcp-flags=TCP_FLAGS] [--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS] [--tuple-direction=DIRECTION] [--tuple-delimiter=CHAR] } ] [--urg-flag=SCALAR]
Miscellaneous switches:
rwfilter ... [--compression-method=COMP_METHOD] [--dry-run] [--max-fail-records=N] [--max-pass-records=N] [--note-add=TEXT] [--note-file-add=FILE] [--plugin=PLUGIN [--plugin=PLUGIN ...]] [--print-filenames] [--site-config-file=FILENAME] [--threads=N]
Help switches:
rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]] [--plugin=PLUGIN ...] [--python-file=PATH] [--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME] --help
rwfilter --version
rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams. Most invocations of rwfilter both select and partition records, but an invocation need not do both.
The Selection Switches let one choose flow records from the SiLK data store by specifying where the flow was collected (its sensor), the date of collection, and/or the flow’s direction. The act of selecting records from the data store is sometimes called a ”data pull”. If the --all-destination switch is given, all these selected records are written to the named stream (a file or the standard output), and partitioning is optional.
The Partitioning Switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). When a flow record matches all of the behaviors, it is written to the streams specified by the --pass-destination switches. If a record fails to match one or more of these behavior predicates, it is written to the streams specified by --fail-destination.
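The pass/fail rule can be sketched in Python. The dictionaries are simplified stand-ins for SiLK Flow records, and the lambdas stand in for partitioning switches such as --protocol=6 and --dport=80; neither reflects rwfilter's internal representation.

```python
def partition(records, predicates):
    """Split records into (passed, failed): pass only if every predicate holds."""
    passed, failed = [], []
    for rec in records:
        if all(pred(rec) for pred in predicates):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed


# Hypothetical flow records reduced to two fields.
flows = [{"proto": 6, "dport": 80},
         {"proto": 6, "dport": 22},
         {"proto": 17, "dport": 53}]
# Analogous to --protocol=6 --dport=80
predicates = [lambda r: r["proto"] == 6, lambda r: r["dport"] == 80]
passed, failed = partition(flows, predicates)
print(len(passed), len(failed))  # 1 2
```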
The all, pass, and fail output streams from rwfilter are always binary SiLK Flow records. The output must be either written to a file or piped into another tool in the SiLK Suite, and rwfilter complains if it determines you are attempting to send the stream to a terminal. To view the records, pipe the records into rwcut(1).
In addition to the partitioning switches built in to rwfilter, additional partitioning predicates can be created as C or PySiLK plug-ins, and these can be loaded into rwfilter using the --plugin and/or --python-file switches as described below.
Instead of using the selection switches to choose flow records from the data store, rwfilter can apply the partitioning switches to existing files of SiLK flow records, such as files generated by a previous invocation of rwfilter. To run rwfilter in this mode, you may
specify, on the command line, the files and/or named pipes from which rwfilter should read SiLK Flow records. Specifying stdin or - on the command line causes rwfilter to read flow records from the standard input.
use the --input-pipe switch to specify a named pipe, or specify stdin or - as the argument to this switch to have rwfilter read flow records from the standard input.
use the --xargs switch to specify a file that contains the names of the input files to process. When --xargs is used without an argument, rwfilter attempts to read the names of the files from the standard input. The name of each input file must appear on its own line.
When rwfilter is reading flow records from input files, some of the selection switches act as partitioning switches. The remaining selection switches may not be specified when using the alternate forms of input, and it is an error to specify multiple types of input.
Unlike many other tools in the SiLK tool suite, rwfilter requires that you specify one or more Output Switches that tell rwfilter what types of output to produce.
Finally, there are Miscellaneous Switches that control other aspects of rwfilter.
As of SiLK 3.20, the Selection Switches --class, --type, --flowtypes, and --sensors accept a value in the form "@PATH", where @ is the "at" character (ASCII 0x40) and PATH names a file or a path to a file. For example, the following reads the names of types from the file t.txt and uses the sensors S3, S7, and the names and/or IDs read from /tmp/sensor.txt:
rwfilter --type=@t.txt --sensors=S3,@/tmp/sensor.txt,S7 ...
Multiple @PATH values are allowed within a single argument. If the name of the file is -, the names are read from the standard input.
The file must be a text file. Blank lines are ignored as are comments, which begin with the # character and continue to the end of the line. Whitespace at the beginning and end of a line is ignored as is whitespace that surrounds commas; all other whitespace within a line is significant.
A file may contain a value on each line and/or multiple values on a line separated by commas and optional whitespace. For example:
# Sensor 4
S4
# The first sensors
S0, S1,S2
S3 # Sensor 3
An attempt to use an @PATH directive in a file is an error.
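The file rules above can be modeled in a few lines. The following is a minimal sketch in Python, not SiLK's implementation: the function name parse_selection_file is hypothetical, and two details are assumptions (empty entries between commas are skipped, and a value that begins with @ is treated as a nested @PATH directive).

```python
def parse_selection_file(text: str) -> list[str]:
    """Return the values listed in the contents of an @PATH file.

    Models the documented rules: '#' starts a comment that runs to the end
    of the line, blank lines are skipped, and whitespace at either end of a
    line or around a comma is dropped; interior whitespace is kept.
    """
    values = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comment, trim the line
        if not line:
            continue  # blank or comment-only line
        for item in line.split(","):
            item = item.strip()  # whitespace surrounding a comma
            if not item:
                continue  # empty entry; skipped here (assumed behavior)
            if item.startswith("@"):
                # nested @PATH directives are an error (detection rule assumed)
                raise ValueError(f"@PATH directive not allowed in a file: {item}")
            values.append(item)
    return values


example = """# Sensor 4
S4
# The first sensors
S0, S1,S2
S3 # Sensor 3
"""
print(parse_selection_file(example))  # ['S4', 'S0', 'S1', 'S2', 'S3']
```

Reading the example file this way yields the sensors S4, S0, S1, S2, and S3, in file order.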
When rwfilter is parsing the name of a file, it converts the sequences @, and @@ to , and @, respectively. For example, --class=@cl@@ss.txt@,v reads the class from the file cl@ss.txt,v. It is an error if any other character follows an embedded @ (--flowtypes=@f@il contains @i) or if a single @ occurs at the end of the name (--sensor=@errat@).
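The escaping rules can be illustrated with a small tokenizer. This is a sketch under stated assumptions, not rwfilter's parser: split_selection_arg is a hypothetical name, and the model assumes that an unescaped comma ends an @PATH name and that the @, and @@ escapes apply only inside file names, while ordinary values are split on commas unchanged.

```python
def split_selection_arg(arg: str) -> list[tuple[str, str]]:
    """Split a --class/--type/--flowtypes/--sensors value into items.

    Returns ("value", text) for ordinary items and ("path", name) for
    @PATH directives, decoding '@,' -> ',' and '@@' -> '@' in file names.
    """
    items = []
    i, n = 0, len(arg)
    while True:
        if arg.startswith("@", i):
            i += 1  # skip the leading '@' that marks a directive
            name = []
            while i < n and arg[i] != ",":  # an unescaped comma ends the name
                if arg[i] == "@":
                    if i + 1 >= n:
                        raise ValueError("single '@' at end of file name")
                    nxt = arg[i + 1]
                    if nxt not in ",@":
                        raise ValueError(f"invalid sequence '@{nxt}' in file name")
                    name.append(nxt)  # keep the escaped character literally
                    i += 2
                else:
                    name.append(arg[i])
                    i += 1
            items.append(("path", "".join(name)))
        else:
            end = arg.find(",", i)
            end = n if end == -1 else end
            items.append(("value", arg[i:end]))
            i = end
        if i >= n:
            break
        i += 1  # skip the separating comma
    return items


print(split_selection_arg("@cl@@ss.txt@,v"))  # [('path', 'cl@ss.txt,v')]
```

Under this model, --sensors=S3,@/tmp/sensor.txt,S7 yields the value S3, the file /tmp/sensor.txt, and the value S7, and the two erroneous arguments from the text raise errors.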
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
To read files from the data store, use the following options to specify which files to process. When rwfilter gets its input from files listed on the command line or from the --xargs or --input-pipe switches, the first four switches (--class, --type, --flowtypes, and --sensors) act as partitioning switches, and specifying any other selection switch produces an error.
The --class switch is used to specify a group of data files to process. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. The argument may be "@PATH", which causes rwfilter to open the file PATH and read the class name from it; see Read Selection Argument Values from a File for details. Classes are defined in the silk.conf(5) site configuration file. If neither the --class nor --flowtypes option is given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfilter --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes either the keyword all to select all types for CLASS or a comma-separated list of type names and "@PATH" directives, where @PATH tells rwfilter to read type names from the file PATH; see Read Selection Argument Values from a File for details. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When neither the --type nor --flowtypes switch is given, a list of default types is used: the default-type list is determined by the value of CLASS, and the default types often include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfilter or run rwsiteinfo with --fields=class,type,default-type.
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types. As of SiLK 3.20.0, the arguments may also include "@PATH", which causes rwfilter to open the file PATH and read the class/type pairs from it; see Read Selection Argument Values from a File.
The --sensors switch is used to select data from specific sensors. The parameter is a comma-separated list of sensor names, sensor IDs (integers), ranges of sensor IDs, sensor group names, and/or "@PATH" directives. As described in Read Selection Argument Values from a File, @PATH tells rwfilter to read the names of the sensors from the file PATH. Sensors and sensor groups are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes (--fields=sensor,id-sensor,class:list). When the --sensors switch is not specified, the default is to use all sensors that are valid for the specified class(es).
The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A T may be used in place of the : to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or in the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in the output from rwfilter --version.
When times are expressed in YYYY/MM/DD[:HH] format:
When both --start-date and --end-date are specified to hour precision, all hours within that time range are processed.
When --start-date is specified to day precision, the hour specified in --end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.
When --start-date is specified to hour precision and --end-date is specified to day precision, the hour of the start-date is used as the hour for the end-date.
When --end-date is not specified and --start-date is specified to day precision, files for that complete day are processed.
When --end-date is not specified and --start-date is specified to hour precision, files for that single hour are processed.
When at least one time is expressed as seconds since the UNIX epoch:
When --end-date is specified in epoch seconds, the given --start-date and --end-date are considered to be in hour precision.
When --start-date is specified in epoch seconds and --end-date is specified in YYYY/MM/DD[:HH] format, the start-date is considered to be in day precision if it is divisible by 86400, and in hour precision otherwise.
When --start-date is specified in epoch seconds and --end-date is not given, the start-date is considered to be in hour precision.
When neither --start-date nor --end-date is given, rwfilter processes all files for the current day.
It is an error to specify --end-date without specifying --start-date.
It is an error to specify --start-date when rwfilter believes there is some other input specified (see Non-Selection Input Switches).
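The precision rules for YYYY/MM/DD[:HH] dates can be summarized as a function that returns the first and last hour whose files would be examined. This is a hedged sketch of the documented rules only: hours_to_scan and its today parameter are hypothetical, epoch-second inputs are not handled, and timezones are ignored (naive datetimes stand in for whichever timezone SiLK was compiled to use).

```python
from datetime import datetime


def _parse_date(text):
    """Parse YYYY/MM/DD[:HH] (T may replace ':'); return (datetime, has_hour)."""
    day, sep, hour = text.replace("T", ":").partition(":")
    when = datetime.strptime(day, "%Y/%m/%d")
    return (when.replace(hour=int(hour)), True) if sep else (when, False)


def hours_to_scan(start=None, end=None, today=None):
    """Return the (first, last) hour, inclusive, whose files would be read."""
    if start is None:
        if end is not None:
            raise ValueError("--end-date requires --start-date")
        midnight = (today or datetime.now()).replace(
            hour=0, minute=0, second=0, microsecond=0)
        return midnight, midnight.replace(hour=23)  # the current day
    first, start_has_hour = _parse_date(start)
    if end is None:
        # a single hour, or the complete day, depending on start precision
        return (first, first) if start_has_hour else (first, first.replace(hour=23))
    last, end_has_hour = _parse_date(end)
    if not start_has_hour:
        # day-precision start: any hour given on the end date is ignored
        return first, last.replace(hour=23)
    if not end_has_hour:
        # hour-precision start, day-precision end: reuse the start hour
        return first, last.replace(hour=first.hour)
    return first, last


print(hours_to_scan("2024/03/01:05", "2024/03/03"))
```

For example, --start-date=2024/03/01:05 --end-date=2024/03/03 covers 05:00 on March 1 through the 05:00 hour on March 3.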
Tell rwfilter to use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfilter (/data). It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.
This option prints to the standard error the names of the files that rwfilter’s file selection switches expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfilter considers these data files as missing even though their absence is expected. Use the output from this switch judiciously. It is an error to specify this switch when files are specified on the command line or Non-Selection Input Switches are given.
Instead of using the Selection Switches to read flow records from files in the data store, you can tell rwfilter to process files named on the command line or use one (and only one) of the following switches. To have rwfilter read flow records from the standard input, specify stdin or - as the name of an input file or use the (deprecated) --input-pipe switch.
Read the names of the input files from INPUT_PATH or from the standard input if INPUT_PATH is not provided. The input is expected to have one filename per line. rwfilter opens each named file in turn and reads records from it as if the filenames had been listed on the command line.
Specify a source for SiLK Flow records, where INPUT_PATH is a named pipe or the string stdin or - to represent the standard input. You do not need to use this switch; you can simply specify the named pipe or the strings stdin or - on the command line. NOTE: This switch is deprecated, and it will be removed in the SiLK 4.0 release.
At least one of the following output switches must be provided:
Write every SiLK Flow record to ALL_PATH, where ALL_PATH refers to a file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write all input records to multiple locations. It is not necessary to specify Partitioning Switches when --all-destination is given and --fail-destination and --pass-destination are not.
Write SiLK Flow records that have failed ANY of the partitioning predicates to FAIL_PATH, where FAIL_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that fail any predicate to multiple locations. When using --fail-destination, partitioning switches are required.
Write SiLK Flow records that have passed ALL of the partitioning predicates to PASS_PATH, where PASS_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write records that pass every predicate to multiple locations. When using --pass-destination, partitioning switches are required.
Print a one line summary specifying the number of files processed, the total number of records read, the number of records that passed all partitioning predicates, and the number of records that failed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-volume-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter. When using this switch, either partitioning switches or --all-destination is required.
Print a four line summary of rwfilter’s processing. For each of all records, records that pass all the partitioning predicates, and records that fail, print the number of flow records and the number of packets and bytes represented by those flow records. The output also includes the number of files processed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard error. This switch cannot be mixed with --print-statistics. When running rwfilter with multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not match the number of records written by rwfilter. When using this switch, either partitioning switches or --all-destination is required.
rwfilter supports the following partitioning switches, at least one of which must be specified (unless the only Output Switch is --all-destination). The switches are AND’ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass is written to the fail-destination(s), if specified.
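The AND semantics can be pictured as follows. This Python sketch is illustrative only: partition is a hypothetical helper, the record fields proto and dport are invented for the example, and the lambdas merely stand in for switches such as --protocol=6 and --dport=80.

```python
def partition(records, predicates):
    """Split records into (passed, failed) lists.

    A record passes only when every predicate accepts it (the switches are
    AND'ed); failing any single predicate sends it to the fail list.
    """
    passed, failed = [], []
    for rec in records:
        target = passed if all(test(rec) for test in predicates) else failed
        target.append(rec)
    return passed, failed


# Hypothetical records and tests, roughly like --protocol=6 --dport=80
flows = [
    {"proto": 6, "dport": 80},
    {"proto": 17, "dport": 53},
    {"proto": 6, "dport": 443},
]
tests = [lambda r: r["proto"] == 6, lambda r: r["dport"] == 80]
passed, failed = partition(flows, tests)
print(len(passed), len(failed))  # 1 2
```

Only the TCP flow to port 80 satisfies both tests; the other two records each fail at least one test and go to the fail stream.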
Each partitioning switch defines a test. These tests can be grouped into several broad categories; within each category, the tests are applied in the order in which the switches appear on the command line. The categories of the partitioning tests are:
tests for IP addresses (including the IPset checks), ports, protocol, times, TCP flags, byte and packet counts, IP version, application, country codes
tests based on the --tuple-file switch
tests that use the address type or prefix map mapping files
tests that use the IP-Association plug-in
tests based on the --python-expr and --python-file switches
tests defined in C-plugins and loaded via --plugin
Partitioning Switches for IP Addresses
There are three families of switches that partition based on an IP address. Each family can partition by the source IP, the destination IP, the next hop IP, or either source or destination IP. Each family includes a --not-* variant to reverse the sense of the test.
The --*cidr-family takes as its argument an IP_OR_CIDR_LIST, which is one or more of the following separated by commas: an IPv4 address (10.1.2.3), an IPv6 address (2001:db8::10.1.2.3), an unsigned 32-bit integer representing an IPv4 address (167838211), or any of those with a CIDR block designation (192.168.0.0/16, 2001:db8::/32, 167772160/8).
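The accepted IP_OR_CIDR_LIST forms map naturally onto Python's standard ipaddress module. The sketch below is an approximation, not SiLK's parser: the helper names are hypothetical, host bits below the prefix are accepted (strict=False, an assumption about SiLK's behavior), and IPv4 addresses are not matched against IPv4-mapped IPv6 networks.

```python
import ipaddress


def parse_ip_or_cidr_list(text):
    """Turn an IP_OR_CIDR_LIST string into a list of ip_network objects."""
    networks = []
    for item in text.split(","):
        addr, slash, prefix = item.strip().partition("/")
        if addr.isdigit():
            # an unsigned 32-bit integer denotes an IPv4 address
            addr = str(ipaddress.IPv4Address(int(addr)))
        spec = f"{addr}/{prefix}" if slash else addr
        # strict=False tolerates host bits set below the prefix (assumption)
        networks.append(ipaddress.ip_network(spec, strict=False))
    return networks


def cidr_match(ip, networks):
    """True if ip falls inside any network; v4/v6 mismatches never match."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


nets = parse_ip_or_cidr_list("10.1.2.3,192.168.0.0/16,2001:db8::/32,167772160/8")
print(cidr_match("10.200.0.1", nets))  # True, via 167772160/8 (10.0.0.0/8)
```

The integer 167772160 is 10.0.0.0, so 167772160/8 is the same block as 10.0.0.0/8.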
The --*set-family requires that you store the IPs in a binary IPset file and pass the name of the file to the switch. IPset files are created from SiLK Flow records with rwset(1), or from textual input with rwsetbuild(1).
The --*address-family (which includes --next-hop-id) takes as its argument a single IP address, a single CIDR block, or a single SiLK IP Wildcard. A SiLK IP Wildcard may represent multiple, disjoint IPv4 or IPv6 addresses. An IP Wildcard contains an IP in its canonical form, except that each part of the IP (where a part is an octet for IPv4 or a hexadectet for IPv6) may be a single value, a range, a comma-separated list of values and ranges, or the letter x to signify any value for that part of the IP (that is, 0-255 for an IPv4 octet or 0-ffff for an IPv6 hexadectet). You may not specify a CIDR suffix when using the IP Wildcard notation. The following IP_WILDCARDs all represent the same value:
::ffff:0:0/112 ::ffff:0:x ::ffff:0:aaab-ffff,aaaa,0-aaa9 ::ffff:0.0.0.0/112 ::ffff:0.0.128-254,0-126,255,127.x
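To make the wildcard notation concrete, here is a Python sketch that handles IPv4 wildcards only. It is a simplified model, not SiLK's implementation: the function names are hypothetical, and IPv6 hexadectets, the :: shorthand, and plain CIDR arguments are deliberately left out.

```python
def parse_ipv4_wildcard(wildcard):
    """Return four sets of allowed octet values for an IPv4 IP Wildcard."""
    parts = wildcard.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dotted parts")
    allowed = []
    for part in parts:
        if part == "x":
            allowed.append(set(range(256)))  # 'x' means any octet value
            continue
        values = set()
        for piece in part.split(","):  # values and ranges, comma-separated
            lo, dash, hi = piece.partition("-")
            lo = int(lo)
            hi = int(hi) if dash else lo
            if not 0 <= lo <= hi <= 255:
                raise ValueError(f"bad octet range {piece!r}")
            values.update(range(lo, hi + 1))
        allowed.append(values)
    return allowed


def wildcard_match(ip, allowed):
    """True if every octet of the dotted-quad ip is in its allowed set."""
    octets = [int(o) for o in ip.split(".")]
    return all(o in s for o, s in zip(octets, allowed))


w = parse_ipv4_wildcard("10.1.x.5-7,9")
print(wildcard_match("10.1.200.6", w), wildcard_match("10.1.0.8", w))  # True False
```

The third part of the IPv4 portion of the last example above, 128-254,0-126,255,127, covers all 256 values, which is why it is equivalent to x.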
The next hop address often has a value of 0.0.0.0 since the default configuration of SiLK does not store the next hop address in the data repository.
With one restriction, any combination of IP partitioning switches is allowed in a single rwfilter invocation: A positive and negative version of the same switch (e.g., --sipset and --not-sipset) is not allowed. (--sipset and --not-scidr may be used together, as can --sipset and --not-dipset.)
The address-partitioning switches are:
Pass the record if its source IP address matches a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --saddress and --sipset.
Pass the record if its destination IP address matches a value in IP_OR_CIDR_LIST. See also --daddress and --dipset.
Pass the record if either its source or its destination IP address matches a value in IP_OR_CIDR_LIST. This switch does not consider the next hop IP address. See also --any-address and --anyset.
Pass the record if its next hop IP address matches a value in IP_OR_CIDR_LIST. See also --next-hop-id and --nhipset.
Pass the record if its source IP address does not match a value in IP_OR_CIDR_LIST, a comma separated list of IPs and/or CIDR blocks. See also --not-saddress and --not-sipset.
Pass the record if its destination IP address does not match a value in IP_OR_CIDR_LIST. See also --not-daddress and --not-dipset.
Pass the record if neither its source nor its destination IP address matches a value in IP_OR_CIDR_LIST. See also --not-any-address and --not-anyset.
Pass the record if its next hop IP address does not match a value in IP_OR_CIDR_LIST. See also --not-next-hop-id and --not-nhipset.
Pass the record if its source IP address is matched by the SiLK IP Wildcard IP_WILDCARD. To match on multiple IPs, use --scidr or create an IPset and use --sipset.
Pass the record if its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. See also --dcidr and --dipset.
Pass the record if either its source or its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. This switch does not consider the next hop IP address. See also --any-cidr and --anyset.
Pass the record if its next hop IP address is matched by this IP_WILDCARD, a SiLK IP Wildcard. To match on multiple IPs, use --nhcidr or create an IPset and use --nhipset.
Pass the record if its source IP address is not matched by this IP_WILDCARD, a SiLK IP Wildcard. See also --not-scidr and --not-sipset.
Pass the record if its destination IP address is not matched by this IP_WILDCARD. See also --not-dcidr and --not-dipset.
Pass the record if neither its source nor its destination IP address is matched by this IP_WILDCARD. Does not consider the next hop address. See also --not-any-cidr and --not-anyset.
Pass the record if its next hop IP address is not matched by this IP_WILDCARD. See also --not-nhcidr and --not-nhipset.
Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --scidr.
As --sipset for the destination IP address. See also --dcidr.
Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --any-cidr.
As --sipset for the next-hop IP address. See also --nhcidr.
Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME. See also --not-scidr.
As --not-sipset for the destination IP address. See also --not-dcidr.
Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --not-any-cidr.
As --not-sipset for the next hop IP address. See also --not-nhcidr.
Partitioning Switches for Remainder of Five-Tuple
The following switches partition based on the protocol and source or destination port. The parameter to each of these switches is an INTEGER_LIST, which is a comma-separated list of individual non-negative integer values and ranges of those values. For example, 1,2,3,5-10,99-103. A range may be specified without an upper limit, such as 1-, in which case the upper limit is set to the maximum value.
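An INTEGER_LIST expands to a set of values, with an open-ended range running up to the switch's maximum (65535 for ports, 255 for protocols and ICMP fields). This Python sketch of that expansion is illustrative; parse_integer_list and its max_value parameter are hypothetical names.

```python
def parse_integer_list(text, max_value):
    """Expand an INTEGER_LIST such as '1,2,3,5-10,99-103' into a set of ints."""
    result = set()
    for piece in text.split(","):
        lo, dash, hi = piece.partition("-")
        lo = int(lo)
        if not dash:
            hi = lo  # a single value
        elif hi == "":
            hi = max_value  # open-ended range such as '1-'
        else:
            hi = int(hi)
        if lo < 0 or hi > max_value or lo > hi:
            raise ValueError(f"invalid range {piece!r}")
        result.update(range(lo, hi + 1))
    return result


ports = parse_integer_list("1,2,3,5-10,99-103", 65535)
print(sorted(ports))  # [1, 2, 3, 5, 6, 7, 8, 9, 10, 99, 100, 101, 102, 103]
```

A record then passes a port or protocol test when its field value is a member of the resulting set.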
Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.
Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.
Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.
Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.
Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values are 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.
Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values are 0-255. This switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.
Partitioning Switches for Time
These switches partition based on whether the time stamps on the flow record occur within the specified time window. The argument is a range of two dates, start-window and end-window, each in the form YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]; for example, 2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last fifteen minutes of Jan 31, 2003. (A T may be used in place of : to separate the day and hour.) The start-window and end-window must be set to at least day precision. For the start-window, unspecified hour, minute, second, and nanosecond values are set to 0; for the end-window, those values are set to 23, 59, 59, and 999999999 respectively. Thus 2003/01/31:23-2003/01/31:23 becomes 2003/01/31:23:00:00.000-2003/01/31:23:59:59.999999999. If an end-window is not given, it is set to the start-window, giving a window of a single nanosecond. The date strings are considered to be in the timezone specified when SiLK was compiled, which you can determine from the output of rwfilter --version. You may also specify the times as seconds since the UNIX epoch; when the end-time is in epoch seconds, an unspecified nanoseconds value is set to 999999999 and otherwise the value is unchanged.
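The default-filling rules for a TIME_WINDOW can be expressed as a conversion to integer nanoseconds. This is a sketch only: the helper names are hypothetical, epoch-second arguments are not handled, and the dates are interpreted as UTC, whereas rwfilter uses the timezone chosen when SiLK was compiled.

```python
from datetime import datetime, timezone

NS_PER_SEC = 1_000_000_000


def _time_to_ns(text, is_end):
    """Convert YYYY/MM/DD[:HH[:MM[:SS[.frac]]]] to nanoseconds since the epoch."""
    day, _, rest = text.replace("T", ":", 1).partition(":")
    fields = rest.split(":") if rest else []
    if len(fields) > 3:
        raise ValueError(f"too many time fields in {text!r}")
    # unspecified fields: 0 for the start, the largest value for the end
    hms = [23, 59, 59] if is_end else [0, 0, 0]
    frac_ns = NS_PER_SEC - 1 if is_end else 0
    for i, field in enumerate(fields):
        if i == 2 and "." in field:
            field, frac = field.split(".", 1)
            frac_ns = int(frac.ljust(9, "0")[:9])  # scale fraction to ns
        hms[i] = int(field)
    when = datetime.strptime(day, "%Y/%m/%d").replace(
        hour=hms[0], minute=hms[1], second=hms[2], tzinfo=timezone.utc)
    return int(when.timestamp()) * NS_PER_SEC + frac_ns


def parse_time_window(window):
    """Return (start_ns, end_ns) for a TIME_WINDOW such as 'A-B' or 'A'."""
    start_text, dash, end_text = window.partition("-")
    start = _time_to_ns(start_text, is_end=False)
    end = _time_to_ns(end_text, is_end=True) if dash else start
    return start, end


start, end = parse_time_window("2003/01/31:23-2003/01/31:23")
print(end - start)  # 3599999999999, one hour minus one nanosecond
```

The output reproduces the expansion described above: 2003/01/31:23-2003/01/31:23 spans 23:00:00.000000000 through 23:59:59.999999999.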
Pass the record if the record was active at ANY time during this TIME_WINDOW. If a single time is specified, pass the record if it was active at that instant.
Pass the record if its starting time is in this TIME_WINDOW.
As --stime for the ending time.
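The difference between the three time tests is easiest to see side by side. In this hedged sketch the function names are hypothetical, times are plain integers (any consistent unit, such as nanoseconds), and both ends of the window are assumed to be inclusive.

```python
def stime_matches(rec_start, rec_end, window):
    """Record's start time lies inside the window."""
    lo, hi = window
    return lo <= rec_start <= hi


def etime_matches(rec_start, rec_end, window):
    """Record's end time lies inside the window."""
    lo, hi = window
    return lo <= rec_end <= hi


def active_time_matches(rec_start, rec_end, window):
    """Record was active at ANY moment of the window (intervals overlap)."""
    lo, hi = window
    return rec_start <= hi and rec_end >= lo


window = (100, 200)
# A flow that spans the whole window neither starts nor ends inside it,
# but it was active during it.
print(stime_matches(50, 300, window), active_time_matches(50, 300, window))  # False True
```

A long-lived flow that begins before the window and ends after it is therefore caught only by the active-time test.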
Pass the record if its duration (that is, the record’s end time minus its start time, measured in seconds) is in this DECIMAL_RANGE. Use floating point numbers to specify fractional second values. The range should be specified as MIN-MAX; for example, 5.0-10.031. If a single value is given, the duration must match that value exactly. The upper limit may be omitted; for example, a range of 1.5- passes records whose duration is at least 1.5 seconds.
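A DECIMAL_RANGE can be evaluated without binary floating-point rounding surprises by using Python's decimal module. This is a sketch under assumptions: the function names are hypothetical, and record times are taken to be integer milliseconds, which may not match the resolution SiLK actually stores.

```python
from decimal import Decimal


def parse_decimal_range(text):
    """Parse 'MIN-MAX', 'MIN-' (no upper limit), or a single exact value."""
    lo, dash, hi = text.partition("-")
    low = Decimal(lo)
    if not dash:
        return low, low  # single value: duration must match exactly
    return low, (Decimal(hi) if hi else None)  # None means no upper limit


def duration_in_range(start_ms, end_ms, rng):
    """Compare (end - start), converted to seconds, against the range."""
    dur = Decimal(end_ms - start_ms) / 1000  # exact decimal seconds
    low, high = rng
    return dur >= low and (high is None or dur <= high)


r = parse_decimal_range("5.0-10.031")
print(duration_in_range(0, 10031, r), duration_in_range(0, 10032, r))  # True False
```

Decimal keeps 10.031 exact, so a duration of exactly 10031 ms sits on the inclusive upper bound.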
Partitioning Switches for Volume
The following switches partition based on the volume of the flow; that is, the number of bytes or packets. For additional volume-related switches, load the flowrate plug-in as described in the