NAME

rwbagtool - Perform high-level operations on binary Bag files

SYNOPSIS

  rwbagtool { --add | --subtract | --minimize | --maximize
              | --divide | --scalar-multiply=VALUE
              | --compare={lt | le | eq | ge | gt} }
        [--intersect=SETFILE | --complement-intersect=SETFILE]
        [--mincounter=VALUE] [--maxcounter=VALUE]
        [--minkey=VALUE] [--maxkey=VALUE]
        [--invert] [--coverset] [--ipset-record-version=VERSION]
        [--output-path=PATH]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [BAGFILE[ BAGFILE...]]

  rwbagtool --help

  rwbagtool --version

DESCRIPTION

rwbagtool performs various operations on Bags. It can add Bags together, subtract a subset of data from a Bag, perform key intersection of a Bag with an IP set, extract the key list of a Bag as an IP set, or filter Bag records based on their counter value.

BAGFILE is the name of a file or a named pipe, or the names stdin or - to have rwbagtool read from the standard input. If no Bag file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. If BAGFILE does not contain a Bag, rwbagtool prints an error to stderr and exits abnormally.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Operation switches

The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.

--add

Sum the counters for each key for all Bag files given on the command line. If a key does not exist, it has a counter of zero. If no other operation is specified, the add operation is the default.

--subtract

Subtract from the first Bag file all subsequent Bag files. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If any counter subtraction results in zero or a negative number, the key will not appear in the resulting Bag file.

--minimize

Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags will not appear in the output.

--maximize

Cause the output to contain the maximum counter seen for each key. The output will contain each key that appears in any input Bag.

--divide

Divide the first Bag file by the second Bag file. It is an error if more than two Bag files are specified. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys will not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key will not appear in the output.

--scalar-multiply=VALUE

Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551615. This switch accepts a single Bag as input.

--compare=OPERATION

Compare the key/counter pairs in exactly two Bag files. It is an error if more than two Bag files are specified. The keys in the output Bag will only be those whose counter in the first Bag is OPERATION the counter in the second Bag. The counters for all keys in the output will be 1. Any key that does not appear in both input Bag files will not appear in the result. The possible OPERATION values are the strings:

lt

GetCounter(Bag1, key) < GetCounter(Bag2, key)

le

GetCounter(Bag1, key) <= GetCounter(Bag2, key)

eq

GetCounter(Bag1, key) == GetCounter(Bag2, key)

ge

GetCounter(Bag1, key) >= GetCounter(Bag2, key)

gt

GetCounter(Bag1, key) > GetCounter(Bag2, key)

Masking/Limiting switches

The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:

--intersect=SETFILE

Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--complement-intersect=SETFILE

As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.

--mincounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value; the default is 1.

--maxcounter=VALUE

Cause the output to contain only those records whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.

--minkey=VALUE

Cause the output to contain only those records whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.

--maxkey=VALUE

Cause the output to contain only those records whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.

Output switches

The following switches control the output.

--invert

Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key will be attributed to the maximum key; to prevent this, specify --maxcounter=4294967295.

--coverset

Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.11.0.

0

Use the default version for an IPv4 IPset and an IPv6 IPset, currently 2 and 3, respectively.

2

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

3

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

4

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

5

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--output-path=PATH

Write the resulting Bag to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The examples assume the following contents for the files:

 Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set
  3|  10|     1|   1|     2|   8|     1|   1|          2
  4|   7|     4|   2|     4|  10|     4|   3|          4
  6|  14|     7|  32|     6|  14|     6|   4|          6
  7|  23|     8|   2|     7|  12|     7|   4|          8
  8|   2|                 9|   8|     8|   6|

Adding Bag Files

 $ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
 $ rwbagcat --integer-keys Bag-sum.bag
  1|   1|
  3|  10|
  4|   9|
  6|  14|
  7|  55|
  8|   4|

 $ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
 $ rwbagcat --integer-keys Bag-sum2.bag
  1|   1|
  2|   8|
  3|  10|
  4|  19|
  6|  28|
  7|  67|
  8|   4|
  9|   8|

Subtracting Bag Files

 $ rwbagtool --sub Bag1.bag Bag2.bag > Bag-diff.bag
 $ rwbagcat --integer-keys Bag-diff.bag
  3|  10|
  4|   5|
  6|  14|

 $ rwbagtool --sub Bag2.bag Bag1.bag > Bag-diff2.bag
 $ rwbagcat --integer-keys Bag-diff2.bag
  1|   1|
  7|   9|
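
The subtraction rules can be modeled in a few lines of Python. This is only an illustrative sketch, not how rwbagtool is implemented: Bags are represented as plain dicts, and the name bag_subtract is hypothetical. It reproduces the two results above, including the removal of keys whose difference is not positive.

```python
# Illustrative model only: each Bag is a dict mapping key -> counter.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}


def bag_subtract(first, *others):
    """Subtract every later Bag from the first (hypothetical helper)."""
    result = dict(first)
    for other in others:
        for key, counter in other.items():
            # A key missing from the first Bag is treated as 0.
            result[key] = result.get(key, 0) - counter
    # Keys whose final counter is not positive are left out of the output.
    return {k: v for k, v in sorted(result.items()) if v > 0}


print(bag_subtract(bag1, bag2))  # {3: 10, 4: 5, 6: 14}
print(bag_subtract(bag2, bag1))  # {1: 1, 7: 9}
```

Key 8 disappears in both directions because its counters are equal, and key 7 survives only when the larger counter is in the first Bag.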

Getting the Minimum Value

 $ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
 $ rwbagcat --integer-keys Bag-min.bag
  4|   2|
  7|  12|

Getting the Maximum Value

 $ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
 $ rwbagcat --integer-keys Bag-max.bag
  1|   1|
  2|   8|
  3|  10|
  4|  10|
  6|  14|
  7|  32|
  8|   2|
  9|   8|
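
The difference in key handling between --minimize and --maximize can be sketched in Python. The dict representation and the names bag_minimize and bag_maximize are assumptions made for illustration; they are not part of SiLK.

```python
# Illustrative model only: each Bag is a dict mapping key -> counter.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_minimize(*bags):
    """Keep only keys present in every Bag, with the smallest counter."""
    common = set(bags[0]).intersection(*bags[1:])
    return {k: min(b[k] for b in bags) for k in sorted(common)}


def bag_maximize(*bags):
    """Keep every key seen in any Bag, with the largest counter."""
    keys = set().union(*bags)
    return {k: max(b.get(k, 0) for b in bags) for k in sorted(keys)}


print(bag_minimize(bag1, bag2, bag3))  # {4: 2, 7: 12}
print(bag_maximize(bag1, bag2, bag3))
# {1: 1, 2: 8, 3: 10, 4: 10, 6: 14, 7: 32, 8: 2, 9: 8}
```

Minimizing behaves like a key intersection, while maximizing behaves like a key union.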

Dividing Bag Files

 $ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
 $ rwbagcat --integer-keys Bag-div1.bag
   1|   1|
   4|   1|
   7|   8|

However, when the order is reversed:

 $ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
 rwbagtool: Error dividing bags; key 6 not in divisor bag

To work around this issue, use the --coverset switch to get the keys in Bag2.bag as an IPset, then use --intersect to create a copy of Bag4.bag that contains only those keys:

 $ rwbagtool --coverset Bag2.bag > Bag2-keys.set
 $ rwbagtool --intersect=Bag2-keys.set  Bag4.bag  > Bag4-small.bag
 $ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
 $ rwbagcat --integer-keys Bag-div2.bag
   1|   1|
   4|   2|
   8|   3|

Or, in a single piped command without writing the IPset to disk:

 $ rwbagtool --coverset Bag2.bag                \
   | rwbagtool --intersect=-  Bag4.bag          \
   | rwbagtool --divide -  Bag2.bag             \
   | rwbagcat --integer-keys
   1|   1|
   4|   2|
   8|   3|
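
The rounding and key rules for --divide can be checked with a small Python model. This is a sketch under stated assumptions: Bags are plain dicts, bag_divide is a hypothetical name, and the integer round-half-up formula is one way to match the documented behavior, not necessarily what rwbagtool does internally.

```python
# Illustrative model only: each Bag is a dict mapping key -> counter.
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}


def bag_divide(dividend, divisor):
    """Divide counters key by key, rounding halves up (hypothetical helper)."""
    result = {}
    for key, num in sorted(dividend.items()):
        if key not in divisor:
            raise ValueError(f"key {key} not in divisor bag")
        den = divisor[key]
        # Integer round-half-up: floor(num/den + 1/2) without floats.
        quotient = (2 * num + den) // (2 * den)
        if quotient > 0:  # quotients below 0.5 round to 0 and are dropped
            result[key] = quotient
    return result


print(bag_divide(bag2, bag4))  # {1: 1, 4: 1, 7: 8}

try:
    bag_divide(bag4, bag2)
except ValueError as err:
    print(f"error: {err}")  # error: key 6 not in divisor bag

# Restrict the dividend to the divisor's keys first, as in the workaround.
bag4_small = {k: v for k, v in bag4.items() if k in bag2}
print(bag_divide(bag4_small, bag2))  # {1: 1, 4: 2, 8: 3}
```

Key 7 drops out of the last result because 4/32 is below 0.5, and key 4 becomes 2 because 3/2 = 1.5 rounds up.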

Scalar Multiplication

 $ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
 $ rwbagcat --integer-keys Bag-multiply.bag
  3|  70|
  4|  49|
  6|  98|
  7| 161|
  8|  14|

Comparing Bag Files

 $ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
 $ rwbagcat --integer-keys Bag-lt.bag
  7|   1|

 $ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
 $ rwbagcat --integer-keys Bag-le.bag
  7|   1|
  8|   1|

 $ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
 $ rwbagcat --integer-keys Bag-eq.bag
  8|   1|

 $ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
 $ rwbagcat --integer-keys Bag-ge.bag
  4|   1|
  8|   1|

 $ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
 $ rwbagcat --integer-keys Bag-gt.bag
  4|   1|
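
The five comparison results can be reproduced with a short Python model. The dict representation and the name bag_compare are assumptions for illustration only.

```python
import operator

# Illustrative model only: each Bag is a dict mapping key -> counter.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

# Map each OPERATION string to the matching comparison function.
OPERATIONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ge": operator.ge,
    "gt": operator.gt,
}


def bag_compare(operation, first, second):
    """Return keys present in both Bags whose counters satisfy the test."""
    test = OPERATIONS[operation]
    return {
        key: 1  # every surviving key gets a counter of 1
        for key in sorted(first)
        if key in second and test(first[key], second[key])
    }


for name in ("lt", "le", "eq", "ge", "gt"):
    print(name, bag_compare(name, bag1, bag2))
# lt {7: 1}
# le {7: 1, 8: 1}
# eq {8: 1}
# ge {4: 1, 8: 1}
# gt {4: 1}
```

Keys 1, 3, and 6 never appear because each exists in only one of the two Bags.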

Making a Cover Set

 $ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
 $ rwsetcat --integer-ips Cover.set
  1
  2
  3
  4
  6
  7
  8
  9

Inverting a Bag

 $ rwbagtool --invert Bag1.bag > Bag-inv1.bag
 $ rwbagcat --integer-keys Bag-inv1.bag
  2|   1|
  7|   1|
 10|   1|
 14|   1|
 23|   1|

 $ rwbagtool --invert Bag2.bag > Bag-inv2.bag
 $ rwbagcat --integer-keys Bag-inv2.bag
  1|   1|
  2|   2|
 32|   1|

 $ rwbagtool --invert Bag3.bag > Bag-inv3.bag
 $ rwbagcat --integer-keys Bag-inv3.bag
  8|   2|
 10|   1|
 12|   1|
 14|   1|
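
Inversion amounts to counting how often each counter value occurs. The following Python sketch models that with collections.Counter; the name bag_invert is hypothetical, and MAX_KEY assumes 32-bit keys, matching the default --maxkey of 4294967295.

```python
from collections import Counter

# Illustrative model only: each Bag is a dict mapping key -> counter.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}

MAX_KEY = 4294967295  # assumed largest key value (32-bit keys)


def bag_invert(bag):
    """Map each counter value to the number of keys that had it."""
    # Counters larger than the largest key are attributed to MAX_KEY.
    seen = Counter(min(counter, MAX_KEY) for counter in bag.values())
    return dict(sorted(seen.items()))


print(bag_invert(bag1))  # {2: 1, 7: 1, 10: 1, 14: 1, 23: 1}
print(bag_invert(bag2))  # {1: 1, 2: 2, 32: 1}
print(bag_invert(bag3))  # {8: 2, 10: 1, 12: 1, 14: 1}
```

In Bag2.bag the counter 2 occurs for keys 4 and 8, so the inverted Bag holds 2 for key 2.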

Masking Bag Files

 $ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
 $ rwbagcat --integer-keys Bag-mask.bag
  4|   7|
  6|  14|
  8|   2|

 $ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
 $ rwbagcat --integer-keys Bag-mask2.bag
  3|  10|
  7|  23|
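
The two masking switches select complementary halves of the Bag. A minimal Python sketch, assuming a Bag is a dict and an IPset is a set of integer keys (the name bag_intersect is hypothetical):

```python
# Illustrative model only: a Bag is a dict, an IPset is a set of keys.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
mask = {2, 4, 6, 8}


def bag_intersect(bag, ipset, complement=False):
    """Keep pairs whose key is in the set, or not in it when complement."""
    return {
        key: counter
        for key, counter in sorted(bag.items())
        if (key in ipset) != complement
    }


print(bag_intersect(bag1, mask))                   # {4: 7, 6: 14, 8: 2}
print(bag_intersect(bag1, mask, complement=True))  # {3: 10, 7: 23}
```

Together the two results contain every pair of the input Bag exactly once.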

Restricting the Output

 $ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
 $ rwbagcat --integer-keys Bag-res1.bag
  1|   1|
  3|  10|
  4|   9|

 $ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag
 $ rwbagcat --integer-keys Bag-res2.bag
  3|  10|
  4|   7|
  6|  14|

 $ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
 $ rwbagcat --integer-keys Bag-res3.bag
  7|  55|

 $ rwbagtool --sub --maxcounter=9 Bag1.bag Bag2.bag > Bag-res4.bag
 $ rwbagcat --integer-keys Bag-res4.bag
  4|   5|
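
The limiting switches act as a filter on the intermediate Bag produced by the operation switch. The Python sketch below models that two-stage flow for the --add examples. The names bag_add and bag_limit are hypothetical, and MAX_COUNTER assumes an unsigned 64-bit counter.

```python
# Illustrative model only: each Bag is a dict mapping key -> counter.
bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}

MAX_KEY = 4294967295                # default --maxkey
MAX_COUNTER = 18446744073709551615  # assumed 64-bit counter limit


def bag_add(*bags):
    """Stage 1: sum the counters for each key across all Bags."""
    total = {}
    for bag in bags:
        for key, counter in bag.items():
            total[key] = total.get(key, 0) + counter
    return total


def bag_limit(bag, minkey=0, maxkey=MAX_KEY,
              mincounter=1, maxcounter=MAX_COUNTER):
    """Stage 2: keep pairs whose key and counter fall inside the bounds."""
    return {
        key: counter
        for key, counter in sorted(bag.items())
        if minkey <= key <= maxkey and mincounter <= counter <= maxcounter
    }


intermediate = bag_add(bag1, bag2)
print(bag_limit(intermediate, maxkey=5))       # {1: 1, 3: 10, 4: 9}
print(bag_limit(intermediate, mincounter=20))  # {7: 55}
```

All bounds are inclusive, so a key or counter equal to the limit is kept.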

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version switch when that switch is not provided. Since SiLK 3.7.0.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), silk(7), zlib(3)