NAME

rwaggbagtool - Manipulate binary Aggregate Bag files

SYNOPSIS

  rwaggbagtool
        [{ --remove-fields=REMOVE_LIST | --select-fields=SELECT_LIST
           | --to-bag=BAG_KEY,BAG_COUNTER
           | --to-ipset=FIELD [--ipset-record-version=VERSION] }]
        [--insert-field=FIELD=VALUE [--insert-field=FIELD2=VALUE2...]]
        [{ --add | --subtract | --divide }]
        [--zero-divisor-result={error | remove | nochange | maximum | VALUE}]
        [--scalar-multiply={VALUE | FIELD=VALUE}
          [--scalar-multiply={VALUE | FIELD=VALUE}...]]
        [--min-field=FIELD=VALUE [--min-field=FIELD=VALUE...]]
        [--max-field=FIELD=VALUE [--max-field=FIELD=VALUE...]]
        [--set-intersect=FIELD=FILE [--set-intersect=FIELD=FILE...]]
        [--set-complement=FIELD=FILE [--set-complement=FIELD=FILE...]]
        [--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [--site-config-file=FILENAME]
        [AGGBAG_FILE [AGGBAG_FILE ...]]

  rwaggbagtool --help

  rwaggbagtool --help-fields

  rwaggbagtool --version

DESCRIPTION

rwaggbagtool performs operations on one or more Aggregate Bag files and creates a new Aggregate Bag file, a new Bag file, or a new IPset file. An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. rwaggbag(1) and rwaggbagbuild(1) are the primary tools used to create an Aggregate Bag file. rwaggbagcat(1) prints a binary Aggregate Bag file as text.

The operations that rwaggbagtool supports are: field manipulation (inserting or removing keys or counters); adding, subtracting, or dividing counters across multiple Aggregate Bag files (all files must have the same keys and counters); multiplying all counters or only selected counters by a value; selecting rows based on minimum and maximum values of keys and counters; keeping or removing rows based on whether a field's value is in an IPset; and creating a new IPset or Bag file.

rwaggbagtool processes the Aggregate Bag files listed on the command line. When no file names are specified, rwaggbagtool attempts to read an Aggregate Bag from the standard input. To read the standard input in addition to the named files, use - or stdin as a file name. If any input is not an Aggregate Bag file, rwaggbagtool prints an error to the standard error and exits with an error status.

By default, rwaggbagtool's output is written to the standard output. Use --output-path to specify a different location. As of SiLK 3.21.0, rwaggbagtool supports the --modify-inplace switch, which correctly handles the case when an input file is also used as the output file. That switch causes rwaggbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname to which the original output file is moved.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The options are presented here in the order in which rwaggbagtool performs them: Field manipulation switches are applied to each file when it is read; multi-file operation switches combine the Aggregate Bags together; single-file operation switches are applied; filtering switches remove rows from the Aggregate Bag; the result is output as an Aggregate Bag, a standard Bag, or as an IPset.

Field manipulation switches

The following switches allow modification of the fields in the Aggregate Bag file. The --remove-fields and --select-fields switches are mutually exclusive, and they reduce the number of fields in the Aggregate Bag input files. Those switches also conflict with --to-ipset and --to-bag, which resemble field selectors. The --insert-field switch is applied after --remove-fields or --select-fields, and it adds a field unless that field is already present.

--remove-fields=REMOVE_LIST

Remove the fields specified in REMOVE_LIST from each of the Aggregate Bag input files, where REMOVE_LIST is a comma-separated list of field names. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. If a field name is included in this list and in a --insert-field switch, the field is given the value specified by the --insert-field switch, and the field is included in the output Aggregate Bag file. If removing a key field produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --select-fields, --to-ipset, or --to-bag.
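To make the merging rule concrete, the following Python sketch models --remove-fields on a tiny in-memory Aggregate Bag. It is a toy model, not SiLK code: the Aggregate Bag is a dict from a key tuple to a counter tuple, the helper name remove_fields is hypothetical, and it assumes that "merged" means the colliding counters are added (overflow is not modeled).

```python
# Toy model of rwaggbagtool --remove-fields; not the SiLK implementation.
# An Aggregate Bag is a dict mapping a key tuple to a counter tuple.
# ASSUMPTION: "merged" means the counters of colliding keys are added.

def remove_fields(bag, key_fields, counter_fields, remove_list):
    """Return (new_bag, new_key_fields, new_counter_fields)."""
    drop = set(remove_list)  # names not present in the bag are ignored
    key_idx = [i for i, f in enumerate(key_fields) if f not in drop]
    ctr_idx = [i for i, f in enumerate(counter_fields) if f not in drop]
    result = {}
    for key, counters in bag.items():
        new_key = tuple(key[i] for i in key_idx)
        new_ctr = tuple(counters[i] for i in ctr_idx)
        if new_key in result:
            # Removing a key field made two entries collide: merge them.
            result[new_key] = tuple(
                a + b for a, b in zip(result[new_key], new_ctr))
        else:
            result[new_key] = new_ctr
    return (result,
            [key_fields[i] for i in key_idx],
            [counter_fields[i] for i in ctr_idx])


# Keys are (sport, dport); counters are (sum-bytes, sum-packets).
bag = {(80, 1234): (10, 2), (80, 5678): (5, 1), (443, 1234): (7, 3)}
new_bag, keys, counters = remove_fields(
    bag, ["sport", "dport"], ["sum-bytes", "sum-packets"],
    ["dport", "sum-packets", "no-such-field"])
print(keys, counters, new_bag)
# ['sport'] ['sum-bytes'] {(80,): (15,), (443,): (7,)}
```

The two entries for source port 80 collapse into one once dport is gone, and the unknown name no-such-field is silently ignored.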

--select-fields=SELECT_LIST

For each Aggregate Bag input file, only use the fields in SELECT_LIST, a comma-separated list of field names. Alternatively, consider this switch as removing all fields that are not included in SELECT_LIST. This switch may include field names that are not in an Aggregate Bag input, and those field names are ignored. When a field name is included in this list and in a --insert-field switch, the field uses its value from the input Aggregate Bag file if present, and it uses the value specified in the --insert-field switch otherwise. If selecting only some key fields produces multiple copies of a key, the counters of those keys are merged. rwaggbagtool exits with an error when this switch is used with --remove-fields, --to-ipset, or --to-bag.

--insert-field=FIELD=VALUE

For each entry read from an Aggregate Bag input file, insert a field named FIELD and set its value to VALUE if one of the following is true: (1) the input file does not contain a field named FIELD, or (2) the input file does have a field named FIELD but it was removed by either (2a) being listed in the --remove-fields list or (2b) not being listed in the --select-fields list. That is, this switch inserts FIELD only when FIELD is absent after --remove-fields or --select-fields has been applied. VALUE is a textual representation of the field's value as described for the --fields switch of rwaggbagbuild(1). This switch may be repeated in order to insert multiple fields. If --to-ipset or --to-bag is specified, --insert-field may only name a field that is an argument to that switch.
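The interaction between --insert-field and the two field selectors can be sketched in a few lines of Python. This is a hypothetical helper that only tracks field names, not values stored in a real Aggregate Bag: each output field maps either to "input" (value kept from the file) or to the VALUE text that --insert-field would supply.

```python
# Toy model of which fields survive --remove-fields / --select-fields and
# which are added by --insert-field. Hypothetical helper, not SiLK code.

def resolve_fields(input_fields, inserts, remove=None, select=None):
    """Map each output field to "input" or to the VALUE that was inserted.

    input_fields: field names present in one Aggregate Bag input file
    inserts: dict FIELD -> VALUE from the --insert-field switches
    remove / select: the --remove-fields or --select-fields list
    """
    if remove is not None and select is not None:
        raise ValueError("--remove-fields and --select-fields are exclusive")
    fields = list(input_fields)
    if remove is not None:
        drop = set(remove)
        fields = [f for f in fields if f not in drop]
    elif select is not None:
        keep = set(select)
        fields = [f for f in fields if f in keep]
    result = {f: "input" for f in fields}
    for name, value in inserts.items():
        # FIELD is inserted only when it is absent at this point.
        result.setdefault(name, value)
    return result


# Replace the packet count with zeros, as in the EXAMPLES section.
print(resolve_fields(["sport", "dport", "sum-packets"],
                     {"sum-packets": "0"}, remove=["sum-packets"]))
# {'sport': 'input', 'dport': 'input', 'sum-packets': '0'}
```

With --select-fields instead, a selected field that exists in the input keeps its own value, which matches the rule given for that switch.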

Operations on multiple Aggregate Bag files

The following operations act on multiple Aggregate Bag files. These operations require all of the Aggregate Bag files to have the same set of key fields and counter fields. (Use the field manipulation switches to ensure this.) The values of the keys may differ, but the set of fields that comprise the key must match. It is an error if multiple operations are specified.

--add

Sum each of the counters for each key across all the Aggregate Bag input files. The keys in the result are the union of the keys that appear in the input files. A sum that overflows an unsigned 64-bit value is set to the maximum (18446744073709551615). If no other operation is specified, the add operation is the default.
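The add semantics (union of keys, saturating sums) can be modeled as follows. This is a sketch of the documented behavior, not SiLK's code; aggbag_add is a hypothetical name and the bags are plain Python dicts from key tuples to counter tuples.

```python
# Toy model of rwaggbagtool --add; not the SiLK implementation.
MAX_U64 = 2**64 - 1  # 18446744073709551615


def aggbag_add(*bags):
    """Union of all keys; each counter is summed and clamped at MAX_U64."""
    result = {}
    for bag in bags:
        for key, counters in bag.items():
            if key in result:
                result[key] = tuple(min(a + b, MAX_U64)
                                    for a, b in zip(result[key], counters))
            else:
                result[key] = tuple(counters)
    return result


# Keys are (proto, dport); the single counter is a record count.
in_bag = {(6, 80): (10,), (17, 53): (4,)}
inweb_bag = {(6, 80): (MAX_U64,), (6, 443): (3,)}
total = aggbag_add(in_bag, inweb_bag)
# total == {(6, 80): (MAX_U64,), (17, 53): (4,), (6, 443): (3,)}
```

Python integers do not overflow, so the sketch computes the exact sum and then clamps it, which reproduces the saturating behavior without relying on fixed-width arithmetic.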

--subtract

Subtract from the counters in the first Aggregate Bag file the counters in the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, its counters in the first file are unchanged. Subtraction operations that result in a negative value are set to zero. If all counters for a key are zero, the key does not appear in the output.
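The following Python sketch restates the subtraction rules on small dicts, so the three cases (key only in the first file, key only in a later file, counters reaching zero) are easy to see. It is a model of the text above, not SiLK's implementation, and aggbag_subtract is a hypothetical name.

```python
# Toy model of rwaggbagtool --subtract; not the SiLK implementation.

def aggbag_subtract(first, *others):
    result = dict(first)
    for bag in others:
        for key, counters in bag.items():
            if key not in result:
                continue  # keys missing from the first file are ignored
            # Differences below zero are clamped to zero.
            result[key] = tuple(max(a - b, 0)
                                for a, b in zip(result[key], counters))
    # A key whose counters are all zero is not written to the output.
    return {k: v for k, v in result.items() if any(v)}


first = {(1,): (10, 5), (2,): (3, 3), (3,): (1, 0)}
second = {(1,): (4, 9), (2,): (3, 3), (4,): (7, 7)}
diff = aggbag_subtract(first, second)
# diff == {(1,): (6, 0), (3,): (1, 0)}
```

Key (1,) keeps a zero packet count because its other counter is non-zero, key (2,) disappears, key (3,) is unchanged, and key (4,) never enters the result.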

--divide

Divide the counters in the first Aggregate Bag file by the counters in the second Aggregate Bag file, and repeat the process for each additional Aggregate Bag file. The keys in the result are a subset of the keys that appear in the first file: If a key does not appear in the first Aggregate Bag file, its counters are ignored in subsequent files. If a key does not appear in the second file, the divisor counters for that key are treated as zero and the outcome is determined by the action specified by --zero-divisor-result. That option also determines the result when the two Aggregate Bag files have matching keys but a counter in the second bag is zero. If --zero-divisor-result is not given, rwaggbagtool exits with an error if division by zero is detected. Since Aggregate Bag files do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5 are rounded up). Since SiLK 3.22.0.
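The rounding rule can be computed with integers alone, which matters because 64-bit counters can exceed the range that a double represents exactly. The Python function below is one integer-only way to round a quotient to the nearest integer with halves rounded up; it is an illustration of the arithmetic, and SiLK's own code may compute it differently.

```python
# Integer division rounded to the nearest integer, with .5 rounded up.
# Integer arithmetic avoids float error for counters above 2**53.

def divide_round_half_up(dividend, divisor):
    if divisor == 0:
        raise ZeroDivisionError("see --zero-divisor-result")
    # floor((2a + b) / 2b) == floor(a/b + 1/2) for non-negative a, b > 0
    return (2 * dividend + divisor) // (2 * divisor)


print(divide_round_half_up(7, 2))   # 3.5  -> 4
print(divide_round_half_up(5, 3))   # 1.67 -> 2
print(divide_round_half_up(4, 3))   # 1.33 -> 1
```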

While not an operation, the next switch is related to --divide and is described here.

--zero-divisor-result={ error | remove | nochange | maximum | VALUE }

Specify how to handle division by zero in the --divide operation, which can occur either because the first Aggregate Bag file (the dividend) contains a key that does not exist in the second file (the divisor) or because an individual counter in the divisor is zero. The supported arguments are:

error

Causes rwaggbagtool to exit with an error. This is the default when --zero-divisor-result is not given.

remove

Tells rwaggbagtool to remove this key from the output.

nochange

Tells rwaggbagtool to leave the individual counter in the first Aggregate Bag unchanged.

maximum

Sets the individual counter to the maximum value supported, which is the maximum unsigned 64-bit value (18446744073709551615).

VALUE

Sets the individual counter to VALUE, which can be any unsigned 64-bit value (0 to 18446744073709551615 inclusive).

This switch has no effect when --divide is not used. Since SiLK 3.22.0.
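The following Python sketch shows how each --zero-divisor-result argument could be applied to one key of the dividend. It is a hypothetical model of the rules listed above, not SiLK code: divide_key is an invented name, the policy is passed as a string or as an int VALUE, and a missing divisor key is represented by None.

```python
# Toy model of --divide with --zero-divisor-result for one key.
# Hypothetical helper; not the SiLK implementation.
MAX_U64 = 2**64 - 1


def divide_key(dividend, divisor, policy="error"):
    """Return the output counters for one key, or None to remove the key.

    divisor is None when the key is absent from the divisor file; each of
    its counters is then treated as zero.  policy is "error", "remove",
    "nochange", "maximum", or an int VALUE.
    """
    if divisor is None:
        divisor = (0,) * len(dividend)
    out = []
    for a, b in zip(dividend, divisor):
        if b != 0:
            out.append((2 * a + b) // (2 * b))  # nearest, .5 rounds up
        elif policy == "error":
            raise ZeroDivisionError("division by zero detected")
        elif policy == "remove":
            return None
        elif policy == "nochange":
            out.append(a)
        elif policy == "maximum":
            out.append(MAX_U64)
        elif isinstance(policy, int) and 0 <= policy <= MAX_U64:
            out.append(policy)
        else:
            raise ValueError(f"invalid policy {policy!r}")
    return tuple(out)


print(divide_key((10, 7), (0, 2), "nochange"))   # (10, 4)
```

Only the counter whose divisor is zero is affected by nochange, maximum, or VALUE; the other counters of the same key are still divided normally, while remove drops the whole key.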

Counter operations

The following switch modifies the counters in an Aggregate Bag file. The operation may be combined with any of those from the previous section. This operation occurs after the above and before any filtering operation.

--scalar-multiply=VALUE
--scalar-multiply=FIELD=VALUE

Multiply all counter fields or one counter field by a value. If the argument is a positive integer value (1 or greater), multiply all counters by that value. If the argument contains an equals sign, treat the part to the left as a counter's field name and the part to the right as the multiplier for that field: a non-negative integer value (0 or greater). The maximum VALUE is 18446744073709551615. This switch may be repeated; when a counter name is repeated or the all-counters form is repeated, the final multiplier is the product of all the values. Since SiLK 3.22.0.
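The rules for repeating --scalar-multiply can be sketched as an argument accumulator. This Python model is hypothetical: it validates and multiplies the repeated values as described above, but it deliberately does not model overflow of the product or how a per-field multiplier combines with the all-counters form, since neither is specified here.

```python
# Toy model of accumulating repeated --scalar-multiply arguments.
# Hypothetical helper; overflow and the interaction between the
# FIELD=VALUE and VALUE forms are not modeled.
MAX_U64 = 2**64 - 1


def parse_scalar_multiply(args):
    """Return (all_counters_multiplier, {counter_name: multiplier})."""
    all_mult = 1
    per_field = {}
    for arg in args:
        if "=" in arg:
            name, _, text = arg.partition("=")
            value = int(text)
            if not 0 <= value <= MAX_U64:    # FIELD=VALUE allows 0
                raise ValueError(f"invalid multiplier for {name}: {text}")
            per_field[name] = per_field.get(name, 1) * value
        else:
            value = int(arg)
            if not 1 <= value <= MAX_U64:    # VALUE alone must be >= 1
                raise ValueError(f"invalid multiplier: {arg}")
            all_mult *= value
    return all_mult, per_field


print(parse_scalar_multiply(["100", "2", "sum-bytes=8", "sum-bytes=0"]))
# (200, {'sum-bytes': 0})
```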

Filtering switches

The following switches remove entries from the Aggregate Bag file based on a field's value. These switches are applied immediately before the output is generated.

--min-field=FIELD=VALUE

Remove from the Aggregate Bag file all entries where the value of the field FIELD is less than VALUE, where VALUE is a textual representation of the field's value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.

--max-field=FIELD=VALUE

Remove from the Aggregate Bag file all entries where the value of the field FIELD is greater than VALUE, where VALUE is a textual representation of the field's value as described in the description of the --fields switch in the rwaggbagbuild(1) tool. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
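The combined effect of --min-field and --max-field is an inclusive range test on each named field. The Python sketch below models it on rows represented as tuples aligned with a list of field names; filter_min_max is a hypothetical name, and the values are assumed to be already converted from text into comparable Python values.

```python
# Toy model of --min-field / --max-field; not the SiLK implementation.

def filter_min_max(rows, field_names, mins=(), maxes=()):
    """mins/maxes: sequences of (FIELD, VALUE); repeats are allowed."""
    pos = {name: i for i, name in enumerate(field_names)}

    def keep(row):
        for field, value in mins:
            # Bounds on fields that are not in the bag are ignored.
            if field in pos and row[pos[field]] < value:
                return False
        for field, value in maxes:
            if field in pos and row[pos[field]] > value:
                return False
        return True

    return [row for row in rows if keep(row)]


rows = [(80, 5), (443, 50), (22, 500)]  # (dport, records)
print(filter_min_max(rows, ["dport", "records"],
                     mins=[("records", 10)],
                     maxes=[("records", 100), ("sport", 1)]))
# [(443, 50)]
```

A row whose value equals the bound is kept, because only values strictly less than the minimum or strictly greater than the maximum are removed.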

--set-intersect=FIELD=SET_FILE

Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is not present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.

--set-complement=FIELD=SET_FILE

Read an IPset from the stream SET_FILE, and remove from the Aggregate Bag file all entries where the value of the field FIELD is present in the IPset. SET_FILE may be the name of a file or the string - or stdin to read the IPset from the standard input. This switch is ignored if FIELD is not present in the Aggregate Bag. This switch may be repeated. Since SiLK 3.17.0.
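Both IPset filters reduce to a membership test on one field. In the Python sketch below, an IPset is stood in for by a list of CIDR blocks from the standard ipaddress module; this substitution, and the helper name set_filter, are assumptions for illustration only and say nothing about the IPset file format.

```python
# Toy model of --set-intersect / --set-complement using the standard
# ipaddress module in place of a SiLK IPset.
import ipaddress


def set_filter(rows, field_names, field, cidrs, complement=False):
    """complement=False models --set-intersect; True models --set-complement."""
    if field not in field_names:
        return list(rows)  # the switch is ignored when FIELD is absent
    i = field_names.index(field)
    ipset = [ipaddress.ip_network(c) for c in cidrs]

    def in_set(value):
        addr = ipaddress.ip_address(value)
        # Membership is False when the address families differ.
        return any(addr in net for net in ipset)

    return [row for row in rows if in_set(row[i]) != complement]


rows = [("10.0.0.5", 3), ("192.168.1.1", 4), ("10.1.2.3", 1)]
fields = ["sip", "records"]
kept = set_filter(rows, fields, "sip", ["10.0.0.0/16"])
removed_in_set = set_filter(rows, fields, "sip", ["10.0.0.0/16"],
                            complement=True)
```

The two calls partition the rows: kept holds only 10.0.0.5, and removed_in_set holds the other two entries.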

Output switches

The following switches control the output.

--to-bag=BAG_KEY,BAG_COUNTER

After operating on the Aggregate Bag input files, create a (normal) Bag file from the resulting Aggregate Bag. Use the BAG_KEY field as the key of the Bag, and the BAG_COUNTER field as the counter of the Bag. Write the Bag to the standard output or the destination specified by --output-path. When this switch is used, the only legal field names that may be used in the --insert-field switch are BAG_KEY and BAG_COUNTER. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-ipset.

--to-ipset=FIELD

After operating on the Aggregate Bag input files, create an IPset file from the resulting Aggregate Bag by treating the values in the field named FIELD as IP addresses, inserting the IP addresses into the IPset, and writing the IPset to the standard output or the destination specified by --output-path. When this switch is used, the only legal field name that may be used in the --insert-field switch is FIELD. rwaggbagtool exits with an error when this switch is used with --remove-fields, --select-fields, or --to-bag.
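Conceptually, --to-ipset keeps only the distinct addresses in one field and discards every other field and all counters. The Python sketch below models that projection with the standard ipaddress module, summarizing the addresses into CIDR blocks to mimic what an IPset holds; the helper name to_ipset and the use of ipaddress are assumptions for illustration.

```python
# Toy model of --to-ipset: collect the distinct addresses in FIELD.
import ipaddress


def to_ipset(rows, field_names, field):
    i = field_names.index(field)
    nets = {ipaddress.ip_network(row[i]) for row in rows}  # /32 or /128
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    # collapse_addresses requires one address family per call.
    return (list(ipaddress.collapse_addresses(v4))
            + list(ipaddress.collapse_addresses(v6)))


rows = [("2001:db8::1", 1), ("10.0.0.1", 2), ("10.0.0.0", 5), ("10.0.0.1", 7)]
print(to_ipset(rows, ["dipv6", "records"], "dipv6"))
```

Duplicate addresses collapse to one member, and the two adjacent IPv4 addresses are summarized as 10.0.0.0/31.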

--ipset-record-version=VERSION

Specify the format of the IPset records that are written to the output when the --to-ipset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0.

0

Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.

2

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

3

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

4

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

5

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.
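The precedence between the switch and the environment variable, plus the version 5 fallback, can be summarized in a short Python function. This is a hypothetical model: it returns 0 to mean "the installation's default" rather than resolving that default, and raising ValueError for an unknown version is the sketch's own choice, not documented behavior.

```python
# Toy model of choosing the IPset record version; not SiLK code.
import os

VALID_VERSIONS = {0, 2, 3, 4, 5}


def ipset_record_version(switch=None, env=None, ipv4_only=False):
    """Switch first, then SILK_IPSET_RECORD_VERSION, then 0 (default)."""
    env = os.environ if env is None else env
    if switch is not None:
        text = switch
    else:
        text = env.get("SILK_IPSET_RECORD_VERSION") or "0"
    version = int(text)
    if version not in VALID_VERSIONS:
        raise ValueError(f"invalid IPset record version: {text}")
    # Version 5 holds only IPv6; IPv4-only sets are written as version 4.
    if version == 5 and ipv4_only:
        return 4
    return version
```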

--output-path=PATH

Write the resulting Aggregate Bag, IPset (see --to-ipset), or Bag (see --to-bag) to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagtool to exit with an error.

--modify-inplace

Allow rwaggbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwaggbagtool writes the output to a temporary location first, then overwrites PATH. rwaggbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwaggbagtool exits with an error when this switch is given and PATH is not a regular file. If rwaggbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.

--backup-path=BACKUP

Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwaggbagtool exits with a message and an error. rwaggbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.
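The --modify-inplace and --backup-path sequence follows the common write-to-temporary-then-rename pattern. The Python sketch below illustrates that pattern under stated assumptions: write_in_place is a hypothetical helper, it copies only the permission bits (not owner and group), and it assumes BACKUP is on the same filesystem as PATH because os.replace cannot move files across filesystems.

```python
# Sketch of the write-to-temp-then-replace pattern behind --modify-inplace.
import os
import shutil
import tempfile


def write_in_place(path, data, backup=None):
    """Write data beside PATH, optionally move PATH to BACKUP, replace PATH."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".inplace-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
    except BaseException:
        os.remove(tmp)  # failure before the temp file was closed: discard it
        raise
    if os.path.exists(path):
        shutil.copymode(path, tmp)  # permission bits only
        if backup is not None:
            if os.path.isdir(backup):
                backup = os.path.join(backup, os.path.basename(path))
            if os.path.realpath(backup) != os.path.realpath(path):
                # If this fails, the exception propagates and the new
                # output is left in the temporary file.
                os.replace(path, backup)
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "total.aggbag")
    with open(out, "wb") as f:
        f.write(b"old")
    write_in_place(out, b"new", backup=os.path.join(d, "total.bak"))
    with open(out, "rb") as f:
        new_contents = f.read()
    with open(os.path.join(d, "total.bak"), "rb") as f:
        backup_contents = f.read()
    leftovers = sorted(os.listdir(d))
```

Between the two os.replace calls PATH briefly does not exist, which is the same small window described in the NOTES section.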

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real-time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
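The decision described above can be restated as a small Python function, which makes the difference between the default, best, and an explicit library easier to compare. The function is hypothetical, and two details are assumptions of the sketch: that none and best are always accepted, and that an environment value naming an unavailable method is simply ignored.

```python
# Toy model of choosing the output compression; not SiLK code.
ORDER_FOR_BEST = ("lzo1x", "snappy", "zlib")


def choose_compression(switch, env, available, compiled_default, to_file):
    """Return the method applied to the output ("none" = uncompressed).

    switch: --compression-method argument, or None
    env: mapping that may hold SILK_COMPRESSION_METHOD
    available: methods whose libraries were found when SiLK was compiled
    compiled_default: compile-time default used for files
    to_file: True for a regular file; False for stdout or a named pipe
    """
    method = switch
    if method is None:
        candidate = env.get("SILK_COMPRESSION_METHOD")
        if candidate in available or candidate in ("none", "best"):
            method = candidate
    if method is None:
        return compiled_default if to_file else "none"
    if method == "best":
        if not to_file:
            return "none"
        return next((m for m in ORDER_FOR_BEST if m in available), "none")
    if method != "none" and method not in available:
        raise ValueError(f"compression method {method!r} is not available")
    # none, zlib, lzo1x, and snappy apply regardless of the destination.
    return method
```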

Miscellaneous switches

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagtool searches for the site configuration file in the locations specified in the "FILES" section.

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the fields that may be used in the command line options that require a field name. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Add two files

Use rwfilter(1) to read today's flow records of the in and inweb types, and use rwaggbag(1) to create an Aggregate Bag file for each type, in.aggbag and inweb.aggbag, that counts records using the protocol and both ports as the key. Add the counters in the two files to create total.aggbag. Use rwaggbagcat(1) to display the result.

 $ rwfilter --type=in --all=-                               \
   | rwaggbag --key=sport,dport,proto --counter=records     \
        --output-path=in.aggbag
 $ rwfilter --type=inweb --all=-                            \
   | rwaggbag --key=sport,dport,proto --counter=records     \
        --output-path=inweb.aggbag
 $ rwaggbagtool --add in.aggbag inweb.aggbag --output-path=total.aggbag
 $ rwaggbagcat total.aggbag

Subtract a file

Subtract inweb.aggbag from total.aggbag.

 $ rwaggbagtool --subtract total.aggbag inweb.aggbag    \
   | rwaggbagcat

Percent of traffic

Compute the percent of all incoming traffic per protocol and ports that was stored in the inweb type by multiplying the counters in inweb.aggbag by 100 and dividing by total.aggbag.

 $ rwaggbagtool --scalar-multiply=100 inweb.aggbag  \
   | rwaggbagtool --divide stdin total.aggbag       \
   | rwaggbagcat

Create a file

Create an Aggregate Bag file from data.rw whose key is the source and destination ports and whose counters are the sums of the bytes and packets.

 $ rwaggbag --key=sport,dport                       \
        --counter=sum-bytes,sum-packets data.rw     \
        --output-path=my-ab.aggbag

Choose selected fields

Using the previous file, get just the source port and byte count from the file my-ab.aggbag. One approach is to remove the destination port and packet count.

 $ rwaggbagtool --remove=dport,sum-packets my-ab.aggbag  \
        --output-path=source-bytes.aggbag

The other approach selects the source port and byte count.

 $ rwaggbagtool --select=sport,sum-bytes my-ab.aggbag    \
        --output-path=source-bytes.aggbag

To replace the packet count in my-ab.aggbag with zeros, remove the field and insert it with the value you want.

 $ rwaggbagtool --remove=sum-packets --insert=sum-packets=0  \
        my-ab.aggbag --output-path=zero-packets.aggbag

Convert to different formats

To create a regular Bag with the source port and byte count from my-ab.aggbag, use the --to-bag switch:

 $ rwaggbagtool --to-bag=sport,sum-bytes my-ab.aggbag  \
        --output-path=sport-byte.bag

The --to-ipset switch works similarly:

 $ rwaggbag --key=sipv6,dipv6 --counter=records data-v6.rw  \
        --output-path=ips.aggbag
 $ rwaggbagtool --to-ipset=dipv6 ips.aggbag --output-path=dip.set

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for --ipset-record-version when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwaggbagtool may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagtool may use this environment variable. See the "FILES" section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
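The search can be pictured as trying each listed location in order and using the first one that exists. The Python sketch below models that reading of the list; the helper name is hypothetical, and treating an unset or empty environment variable as "skip this entry", as well as falling through when ${SILK_CONFIG_FILE} names a missing file, are assumptions of the sketch rather than documented behavior.

```python
# Toy model of the site configuration file search; not SiLK code.
import os


def find_site_config(switch=None, env=None, exists=os.path.isfile):
    """Return the first site configuration file found, or None."""
    if switch is not None:
        return switch  # --site-config-file is used as given
    env = os.environ if env is None else env
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(f"{env['SILK_DATA_ROOTDIR']}/silk.conf")
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        candidates.append(f"{env['SILK_PATH']}/share/silk/silk.conf")
        candidates.append(f"{env['SILK_PATH']}/share/silk.conf")
    candidates += ["/usr/share/silk/silk.conf", "/usr/share/silk.conf"]
    for path in candidates:
        if exists(path):
            return path
    return None


present = {"/opt/silk/share/silk.conf", "/usr/share/silk.conf"}
print(find_site_config(env={"SILK_PATH": "/opt/silk"},
                       exists=present.__contains__))
# /opt/silk/share/silk.conf
```

Passing a fake exists function lets the search order be checked without touching the real filesystem.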

NOTES

The Aggregate Bag tools were added in SiLK 3.15.0.

SiLK 3.17.0 added the --min-field, --max-field, --set-intersect, and --set-complement switches.

Support for country codes was added in SiLK 3.19.0.

The --modify-inplace switch was added in SiLK 3.21.0. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.

SEE ALSO

rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwfilter(1), rwfileinfo(1), silk(7), zlib(3)