NAME

rwbagtool - Perform high-level operations on binary Bag files

SYNOPSIS

  rwbagtool { --add | --subtract | --minimize | --maximize
              | --divide | --scalar-multiply=VALUE
              | --compare={lt | le | eq | ge | gt} }
        [--intersect=SETFILE | --complement-intersect=SETFILE]
        [--mincounter=VALUE] [--maxcounter=VALUE]
        [--minkey=VALUE] [--maxkey=VALUE]
        [--invert] [--coverset] [--ipset-record-version=VERSION]
        [--output-path=PATH]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [BAGFILE[ BAGFILE...]]

  rwbagtool --help

  rwbagtool --version

DESCRIPTION

rwbagtool performs various operations on binary Bag files and creates a new Bag file. A Bag is a set where each key is associated with a counter. rwbag(1) and rwbagbuild(1) are the primary tools used to create a Bag file. rwbagcat(1) prints a binary Bag file as text.

rwbagtool can add Bags together, subtract a subset of data from a Bag, divide a Bag by another, compare the counters of two Bag files, perform key intersection of a Bag with an IPset, extract the keys of a Bag as an IPset, or filter Bag entries based on their key or counter values.

In the command synopsis above, BAGFILE is the name of a file or a named pipe, or the names stdin or - to have rwbagtool read from the standard input. If no Bag file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. If BAGFILE does not contain a Bag, rwbagtool prints an error to stderr and exits abnormally.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Operation switches

The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.

--add

Sum the counters for each key for all Bag files given on the command line. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key is not present in an input file, a counter of zero is used. When no operation switch is specified on the command line, the add operation is the default.

--subtract

Subtract from the first Bag file all subsequent Bag files. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If subtracting a key's counters results in a non-positive number, the key does not appear in the resulting Bag file.

--minimize

Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags do not appear in the output. At least one Bag file must be specified, and any number of additional Bag files may be given.

--maximize

Cause the output to contain the maximum counter seen for each key. The output contains each key that appears in any input Bag. At least one Bag file must be specified, and any number of additional Bag files may be given.

--divide

Divide the first Bag file by the second Bag file. It is an error if only one Bag file or more than two Bag files are given. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys do not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). If the result of the division is less than 0.5, the key does not appear in the output.

--scalar-multiply=VALUE

Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551615. This switch requires a single Bag as input.

--compare=OPERATION

Compare the key/counter pairs in exactly two Bag files. It is an error if only one Bag file or more than two Bag files are specified. The keys in the output Bag are only those for which the comparison denoted by OPERATION is true when comparing the key's counter in the first Bag with the key's counter in the second Bag. The counters for all keys in the output have the value 1. Any key that does not appear in both input Bag files does not appear in the result. The possible OPERATION values are the strings:

lt

GetCounter(Bag1, key) < GetCounter(Bag2, key)

le

GetCounter(Bag1, key) <= GetCounter(Bag2, key)

eq

GetCounter(Bag1, key) == GetCounter(Bag2, key)

ge

GetCounter(Bag1, key) >= GetCounter(Bag2, key)

gt

GetCounter(Bag1, key) > GetCounter(Bag2, key)

Masking/Limiting switches

The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:

--intersect=SETFILE

Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--complement-intersect=SETFILE

As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.

--mincounter=VALUE

Cause the output to contain only those entries whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value; the default is 1.

--maxcounter=VALUE

Cause the output to contain only those entries whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.

--minkey=VALUE

Cause the output to contain only those entries whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.

--maxkey=VALUE

Cause the output to contain only those entries whose key value is VALUE or lower. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.

Output switches

The following switches control the output.

--invert

Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key is attributed to the counter for the maximum key; to prevent this, specify --maxcounter=4294967295 which removes all key-counter pairs whose counters do not fit into a key. (The --bin-ips switch on rwbagcat(1) allows one to invert a Bag file as it is being printed.)

--coverset

Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.7.0.

0

Use the default version for an IPv4 IPset and an IPv6 IPset, currently 2 and 3, respectively.

2

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

3

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

4

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

5

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--output-path=PATH

Write the resulting Bag to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

The examples assume the following contents for the files:

 Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set
  3|  10|     1|   1|     2|   8|     1|   1|          2
  4|   7|     4|   2|     4|  10|     4|   3|          4
  6|  14|     7|  32|     6|  14|     6|   4|          6
  7|  23|     8|   2|     7|  12|     7|   4|          8
  8|   2|                 9|   8|     8|   6|

The examples use rwbagcat(1) to print the contents of the Bag files.

Adding Bag Files

Adding Bag files produces a Bag whose keys are the set union of the keys in the input Bags. The counter for each key is the sum of the key's counters in each input Bag.

 $ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
 $ rwbagcat --key-format=decimal Bag-sum.bag
  1|   1|
  3|  10|
  4|   9|
  6|  14|
  7|  55|
  8|   4|

 $ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
 $ rwbagcat --key-format=decimal Bag-sum2.bag
  1|   1|
  2|   8|
  3|  10|
  4|  19|
  6|  28|
  7|  67|
  8|   4|
  9|   8|
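
The add operation can be modelled in a few lines of Python. This is only a sketch of the semantics described above, using plain dictionaries in place of Bag files; the helper name add_bags is hypothetical and is not part of SiLK or PySiLK.

```python
from collections import Counter

# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def add_bags(*bags):
    """Hypothetical helper: sum the counters for each key across all bags."""
    total = Counter()
    for bag in bags:
        # A key that is missing from a bag contributes a counter of zero.
        total.update(bag)
    # Return the keys in ascending order, as rwbagcat prints them.
    return dict(sorted(total.items()))


print(add_bags(BAG1, BAG2))
print(add_bags(BAG1, BAG2, BAG3))
```

The keys of the result are the union of the input keys, which is why key 1 (only in Bag2.bag) and key 3 (only in Bag1.bag) both survive.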

Subtracting Bag Files

The --subtract switch subtracts from the key/counter pairs in the first Bag file the key/counter pairs in all other Bag file arguments. Keys that are not present in the first argument are ignored. If subtraction results in a counter value of zero or less, the key is removed from the result.

 $ rwbagtool --subtract Bag1.bag Bag2.bag > Bag-diff.bag
 $ rwbagcat --key-format=decimal Bag-diff.bag
  3|  10|
  4|   5|
  6|  14|

 $ rwbagtool --subtract Bag2.bag Bag1.bag > Bag-diff2.bag
 $ rwbagcat --key-format=decimal Bag-diff2.bag
  1|   1|
  7|   9|
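
As a rough model of the subtraction rules above, the following Python sketch treats each Bag as a dictionary. The helper name subtract_bags is hypothetical; the code illustrates the documented behavior and makes no claim about how rwbagtool is implemented.

```python
# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}


def subtract_bags(first, *others):
    """Hypothetical helper: subtract every later bag from the first bag."""
    result = dict(first)
    for bag in others:
        for key, counter in bag.items():
            # Keys that are absent from the first bag are ignored.
            if key in result:
                result[key] -= counter
    # Keys whose counter is zero or negative are dropped from the output.
    return {key: counter for key, counter in sorted(result.items()) if counter > 0}


print(subtract_bags(BAG1, BAG2))
print(subtract_bags(BAG2, BAG1))
```

Subtraction is not symmetric: key 7 disappears from the first result because 23 - 32 is negative, but it remains in the second because 32 - 23 is 9.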

Getting the Minimum Value

The output produced by the --minimize switch contains only the keys that appear in all of the input Bags. For each key, the counter is the minimum value for that key in any input Bag.

 $ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
 $ rwbagcat --key-format=decimal Bag-min.bag
  4|   2|
  7|  12|

Getting the Maximum Value

The keys of the Bag file produced by --maximize are the same as the keys produced by --add; that is, the union of all keys in the input files. For each key, its counter is the maximum value seen for that key in any single input Bag file.

 $ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
 $ rwbagcat --key-format=decimal Bag-max.bag
  1|   1|
  2|   8|
  3|  10|
  4|  10|
  6|  14|
  7|  32|
  8|   2|
  9|   8|
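
The difference between --minimize (key intersection) and --maximize (key union) can be sketched in Python as follows. The dictionaries stand in for Bag files, and the helper names minimize_bags and maximize_bags are hypothetical.

```python
# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def minimize_bags(*bags):
    """Hypothetical helper: minimum counter for keys present in every bag."""
    common = set(bags[0]).intersection(*bags[1:])
    return {key: min(bag[key] for bag in bags) for key in sorted(common)}


def maximize_bags(*bags):
    """Hypothetical helper: maximum counter for keys present in any bag."""
    keys = set().union(*bags)
    # A bag that lacks the key contributes zero, which never wins the max.
    return {key: max(bag.get(key, 0) for bag in bags) for key in sorted(keys)}


print(minimize_bags(BAG1, BAG2, BAG3))
print(maximize_bags(BAG1, BAG2, BAG3))
```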

Dividing Bag Files

The --divide switch requires exactly two Bag files as input. The keys in the first Bag argument must be either the same as or a subset of those in the second argument. The counter for each key in the first Bag file is divided by that key's counter in the second file. If the result of the division is less than 0.5, the key is not included in the output.

 $ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
 $ rwbagcat --key-format=decimal Bag-div1.bag
   1|   1|
   4|   1|
   7|   8|

When the order of the Bag file arguments is reversed an error is reported.

 $ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
 rwbagtool: Error dividing bags; key 6 not in divisor bag

To work around this issue, use the --coverset and --intersect switches to create a copy of Bag4.bag that contains only the keys in Bag2.bag.

 $ rwbagtool --coverset Bag2.bag > Bag2-keys.set
 $ rwbagtool --intersect=Bag2-keys.set  Bag4.bag > Bag4-small.bag
 $ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
 $ rwbagcat --key-format=decimal Bag-div2.bag
   1|   1|
   4|   2|
   8|   3|

The following command is the same as the above except the IPset and Bag files are piped between the tools instead of being written to disk:

 $ rwbagtool --coverset Bag2.bag                \
   | rwbagtool --intersect=-  Bag4.bag          \
   | rwbagtool --divide -  Bag2.bag             \
   | rwbagcat --key-format=decimal
   1|   1|
   4|   2|
   8|   3|
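
The rounding and key rules for division can be checked with a small Python model. This sketch assumes every counter in the divisor is a positive integer and uses integer arithmetic to round halves up; the helper name divide_bags is hypothetical and the code is not taken from the rwbagtool source.

```python
# Example contents copied from the table at the start of EXAMPLES.
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}


def divide_bags(dividend, divisor):
    """Hypothetical helper: divide counters, rounding halves up."""
    result = {}
    for key, counter in sorted(dividend.items()):
        if key not in divisor:
            # Mirrors the documented error for a key missing from the divisor.
            raise KeyError(f"key {key} not in divisor bag")
        # floor((2a + b) / 2b) equals a/b rounded to nearest, halves up.
        quotient = (2 * counter + divisor[key]) // (2 * divisor[key])
        # A quotient below 0.5 rounds to zero and the key is dropped.
        if quotient > 0:
            result[key] = quotient
    return result


print(divide_bags(BAG2, BAG4))

# The workaround: keep only the keys of BAG4 that also appear in BAG2.
bag4_small = {key: counter for key, counter in BAG4.items() if key in BAG2}
print(divide_bags(bag4_small, BAG2))
```

In the second call, key 4 becomes 2 because 3 / 2 = 1.5 rounds up, and key 7 is dropped because 4 / 32 is below 0.5.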

Scalar Multiplication

The --scalar-multiply switch multiplies each counter in the input Bag by the specified value. Exactly one Bag file argument is required.

 $ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
 $ rwbagcat --key-format=decimal Bag-multiply.bag
  3|  70|
  4|  49|
  6|  98|
  7| 161|
  8|  14|

Use two rwbagtool commands if multiple operations are desired.

 $ rwbagtool --add Bag1.bag Bag2.bag   \
   | rwbagtool --scalar-multiply=3 --output-path=Bag12-multi.bag
 $ rwbagcat --key-format=decimal Bag12-multi.bag
  1|   3|
  3|  30|
  4|  27|
  6|  42|
  7| 165|
  8|  12|

Comparing Bag Files

The --compare switch takes an argument that specifies how to compare the counters in two Bag files, and it requires exactly two Bag files as input. For each key that appears in both Bag files, the counter value in the first file is compared to the counter value in the second file. If the comparison is true, the key appears in the resulting Bag file with a counter of 1. If the comparison is false, the key is not present in the output file. Keys that appear in only one of the input files are ignored.

The following comparisons operate on Bag1.bag and Bag2.bag which have as common keys 4, 7, and 8.

Find counters in Bag1.bag that are less than those in Bag2.bag:

 $ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
 $ rwbagcat --key-format=decimal Bag-lt.bag
  7|   1|

Find counters in Bag1.bag that are less than or equal to those in Bag2.bag:

 $ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
 $ rwbagcat --key-format=decimal Bag-le.bag
  7|   1|
  8|   1|

Find counters in Bag1.bag that are equal to those in Bag2.bag:

 $ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
 $ rwbagcat --key-format=decimal Bag-eq.bag
  8|   1|

Find counters in Bag1.bag that are greater than or equal to those in Bag2.bag:

 $ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
 $ rwbagcat --key-format=decimal Bag-ge.bag
  4|   1|
  8|   1|

Find counters in Bag1.bag that are greater than those in Bag2.bag:

 $ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
 $ rwbagcat --key-format=decimal Bag-gt.bag
  4|   1|
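
The five comparisons above can be reproduced with a short Python model that maps each OPERATION string to a function from the standard operator module. The helper name compare_bags and the COMPARISONS table are hypothetical names chosen for this sketch.

```python
import operator

# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}

# Map each OPERATION string to the matching comparison function.
COMPARISONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ge": operator.ge,
    "gt": operator.gt,
}


def compare_bags(operation, first, second):
    """Hypothetical helper: keys where the comparison holds, each with counter 1."""
    test = COMPARISONS[operation]
    return {
        key: 1
        for key in sorted(first)
        # Keys that appear in only one bag are ignored.
        if key in second and test(first[key], second[key])
    }


for name in ("lt", "le", "eq", "ge", "gt"):
    print(name, compare_bags(name, BAG1, BAG2))
```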

Making a Cover Set

A cover set is an IPset file that contains the keys that are present in any of the input Bag files. In other words, it is the union of the keys converted to an IPset. Since an operation switch is not provided in this command, an implicit --add operation is performed on the Bag files prior to creating the cover set. (rwsetcat(1) prints the contents of an IPset file as text.)

 $ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
 $ rwsetcat --ip-format=decimal Cover.set
  1
  2
  3
  4
  6
  7
  8
  9

One use of a cover set is to limit the contents of a Bag file to keys that are present in a second Bag file:

 $ rwbagtool --coverset --output-path=Cover.set Bag1.bag
 $ rwbagtool --intersect=Cover.set Bag2.bag > Bag1-mask-Bag2.bag
 $ rwbagcat --key-format=decimal Bag1-mask-Bag2.bag
  4|   2|
  7|  32|
  8|   2|

To mask the contents of Bag2.bag by the keys that are not present in Bag1.bag:

 $ rwbagtool --complement-intersect=Cover.set Bag2.bag \
        > Bag1-notmask-Bag2.bag
 $ rwbagcat --key-format=decimal Bag1-notmask-Bag2.bag
  1|   1|
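
In set terms, a cover set is simply the key set of a Bag, and the two intersect switches keep or discard keys by membership in an IPset. The following Python sketch models this with dictionaries and sets; the helper names coverset, intersect, and complement_intersect are hypothetical.

```python
# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}


def coverset(bag):
    """Hypothetical helper: the keys of a bag as a set."""
    return set(bag)


def intersect(bag, ipset):
    """Hypothetical helper: keep only entries whose key is in the set."""
    return {key: counter for key, counter in sorted(bag.items()) if key in ipset}


def complement_intersect(bag, ipset):
    """Hypothetical helper: keep only entries whose key is NOT in the set."""
    return {key: counter for key, counter in sorted(bag.items()) if key not in ipset}


cover = coverset(BAG1)
print(sorted(cover))
print(intersect(BAG2, cover))
print(complement_intersect(BAG2, cover))
```

The counters in the output always come from the Bag being masked; the set contributes only membership.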

Inverting a Bag

The output of the --invert switch is a Bag file that counts the number of times each counter is present in the input Bag file.

 $ rwbagtool --invert Bag1.bag > Bag-inv1.bag
 $ rwbagcat --key-format=decimal Bag-inv1.bag
  2|   1|
  7|   1|
 10|   1|
 14|   1|
 23|   1|

 $ rwbagtool --invert Bag2.bag > Bag-inv2.bag
 $ rwbagcat --key-format=decimal Bag-inv2.bag
  1|   1|
  2|   2|
 32|   1|

 $ rwbagtool --invert Bag3.bag > Bag-inv3.bag
 $ rwbagcat --key-format=decimal Bag-inv3.bag
  8|   2|
 10|   1|
 12|   1|
 14|   1|

When multiple Bag files are specified on the command line, the files are added prior to creating the inverted Bag. Even though the counter 2 appears three times in the files Bag1.bag and Bag2.bag, the key 2 is not present in the following since the add operation is performed first.

 $ rwbagtool --invert Bag1.bag Bag2.bag   \
   | rwbagcat --key-format=decimal
  1|   1|
  4|   1|
  9|   1|
 10|   1|
 14|   1|
 55|   1|
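
Inversion amounts to building a histogram of the counters. The Python sketch below models that, including the rule that counters larger than the maximum key are attributed to the maximum key. The helper name invert_bag is hypothetical, and MAX_KEY uses the default --maxkey value given in this manual page.

```python
from collections import Counter

# Example contents copied from the table at the start of EXAMPLES.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}

# Largest key value, taken from the documented --maxkey default.
MAX_KEY = 4294967295


def invert_bag(bag):
    """Hypothetical helper: count how many keys share each counter value."""
    inverted = Counter()
    for counter in bag.values():
        # Counters too large to be a key are folded into the maximum key.
        inverted[min(counter, MAX_KEY)] += 1
    return dict(sorted(inverted.items()))


print(invert_bag(BAG1))
print(invert_bag(BAG2))

# With several inputs the bags are added first, then inverted.
summed = Counter(BAG1)
summed.update(BAG2)
print(invert_bag(summed))
```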

Masking Bag Files

The --intersect switch takes an IPset file as an argument and limits the keys of the Bag produced by rwbagtool to only those keys that appear in the IPset file.

 $ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
 $ rwbagcat --key-format=decimal Bag-mask.bag
  4|   7|
  6|  14|
  8|   2|

The --complement-intersect switch limits the output to only those keys that do not appear in the IPset file.

 $ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
 $ rwbagcat --key-format=decimal Bag-mask2.bag
  3|  10|
  7|  23|

See also the next section.

Restricting the Output

In addition to limiting the result of rwbagtool to keys that appear or do not appear in an IPset file (cf. previous section), numeric limits may be used to restrict the keys or counters that appear in the resulting Bag file by using the --minkey, --maxkey, --mincounter, and --maxcounter switches.

 $ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
 $ rwbagcat --key-format=decimal Bag-res1.bag
  1|   1|
  3|  10|
  4|   9|

 $ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag
 $ rwbagcat --key-format=decimal Bag-res2.bag
  3|  10|
  4|   7|
  6|  14|

 $ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
 $ rwbagcat --key-format=decimal Bag-res3.bag
  7|  55|

 $ rwbagtool --subtract --maxcounter=9 Bag1.bag Bag2.bag  \
        > Bag-res4.bag
 $ rwbagcat --key-format=decimal Bag-res4.bag
  4|   5|
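
The four limiting switches act as inclusive range filters on the intermediate Bag, that is, on the result of the add or subtract operation. A minimal Python model follows; the helper name limit_bag is hypothetical, and an upper counter limit of None is this sketch's way of saying "no limit" rather than a documented value.

```python
# Intermediate bags: the sum and the difference of Bag1.bag and Bag2.bag.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
SUM12 = {1: 1, 3: 10, 4: 9, 6: 14, 7: 55, 8: 4}
DIFF12 = {3: 10, 4: 5, 6: 14}


def limit_bag(bag, minkey=0, maxkey=4294967295, mincounter=1, maxcounter=None):
    """Hypothetical helper: keep entries inside the inclusive key/counter ranges."""
    limited = {}
    for key, counter in sorted(bag.items()):
        if not minkey <= key <= maxkey:
            continue
        if counter < mincounter:
            continue
        if maxcounter is not None and counter > maxcounter:
            continue
        limited[key] = counter
    return limited


print(limit_bag(SUM12, maxkey=5))
print(limit_bag(BAG1, minkey=3, maxkey=6))
print(limit_bag(SUM12, mincounter=20))
print(limit_bag(DIFF12, maxcounter=9))
```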

Changing a File's Format

To share a Bag file with a user who has a version of SiLK that includes different compression libraries, it may be necessary to change the compression-method of the Bag.

It is not possible to change the compression-method directly. A new file must be created first, and then you may replace the old file with the new file.

To create a copy of the Bag file A.bag that uses a different compression-method, use rwbagtool with the --add switch and specify the desired compression-method:

 $ rwbagtool --add --compression=none --output-path=A1.bag A.bag

Changing the Key Type or Counter Type

Unfortunately, the Bag tools do not allow changing the key type or counter type of a Bag file. To change the types, use rwbagcat(1) to write the Bag as text and rwbagbuild(1) to convert the text back to a Bag file.

 $ rwbagcat Bag1.bag    \
   | rwbagbuild --bag-input=- --output-path=Bag1-typed.bag  \
        --key-type=sport --counter-type=sum-bytes

Use rwfileinfo(1) to see the type of the key and counter.

 $ rwfileinfo --field=bag Bag1-typed.bag
 Bag1-typed.bag:
   bag          key: sPort @ 4 octets; counter: sum-bytes @ 8 octets

Alternatively, one may use PySiLK (see pysilk(3)) to modify the key type and counter type.

 $ cat bag-type.py
 import sys
 from silk import *

 key_type = sys.argv[1]
 counter_type = sys.argv[2]
 old_file = sys.argv[3]
 new_file = sys.argv[4]

 old = Bag.load(old_file, key_type=IPv4Addr)
 new = Bag(old, key_type=key_type, counter_type=counter_type)
 new.save(new_file)
 $
 $ python bag-type.py sipv4 sum-packets Bag1.bag Bag1-type2.bag
 $ rwfileinfo --field=bag Bag1-type2.bag
 Bag1-type2.bag:
   bag          key: sIPv4 @ 4 octets; counter: sum-packets @ 8 octets

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --ipset-record-version when that switch is not provided. Since SiLK 3.7.0.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SEE ALSO

rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), silk(7), zlib(3)