NAME

rwbagbuild - Create a binary Bag from non-flow data

SYNOPSIS

  rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }
        [--delimiter=C] [--proto-port-delimiter=C]
        [--default-count=DEFAULTCOUNT]
        [--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--compression-method=COMP_METHOD]
        [--output-path=PATH]

  rwbagbuild --help

  rwbagbuild --version

DESCRIPTION

rwbagbuild builds a binary Bag file from an IPset file or from textual input. A Bag is a set of keys where each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum or bytes or packets) for the flow records that match that key.

Either --set-input or --bag-input must be provided to specify the type and the location of the input file. To read from the standard input, specify stdin or - as the argument to the switch.

Each occurrence of a unique key adds a counter value to the Bag file for that key, where the counter is the value specified by --default-count, a value specified on a line in the textual input, or a fallback value of 1. If the addition causes an overflow of the maximum counter value (18446744073709551614), the counter is set to the maximum. A message is printed to the standard error the first time an overflow condition is detected.

SET INPUT

When creating a Bag from an IPset, the count associated with each IP address is the value specified by the --default-count switch or 1 if the switch is not provided.

If the --key-type is sip-country, dip-country, or any-country, each IP address is mapped to its country code using the country code mapping file (see "FILES") and that key is added to the Bag file with the --default-count value.

If the --key-type is sip-pmap, dip-pmap, or any-ip-pmap, each IP address is mapped to a value found in the prefix map file specified in --pmap-file and that value is added to the Bag file with the --default-count value.

BAG (TEXTUAL) INPUT

The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments begin with a '#'-character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-counter pair; whitespace around the key and counter is ignored. The key and counter are separated by a one-character delimiter. The default delimiter is vertical bar ('|'); use --delimiter to specify a different delimiter.

Each line that is not ignored must begin with a key. The accepted formats of the key are described below.

When the --default-count switch is given, rwbagtool only parses the key and ignores everything on a line to the right of the first delimiter. To re-iterate, the --default-count switch overrides any counter present on the line.

If the delimiter is not present on a line, rwbagtool parses the key and adds the --default-count value (or the fallback value of 1) to the Bag for that key.

When --default-count is not given, any text between the first delimiter and optional second delimiter on a line is treated as the counter. If the counter contains only whitespace, the counter for the key is incremented by 1; otherwise, the counter must be a (decimal) number from 0 to 18446744073709551614 inclusive. If a second delimiter is present, it and any text that follows it is ignored.

rwbagbuild prints an error and exits when a key or counter cannot be parsed.

Format of the counter

The counter is any non-negative (decimal) integer value from 0 to 18446744073709551614 inclusive (the maximum is one less than the maximum unsigned 64-bit value). When writing the Bag file, keys whose counter is zero are not written to the file.

Format of the Key

The key is a 32-bit integer, an IP address, a CIDR block, a SiLK IPWildcard, or a pair of numbers when the key-type is a protocol-port prefix map file.

For key-types that use fewer than 32-bits, rwbagbuild does not verify the validity of the key. For example, it is possible to have 257 as a key in Bag whose key-type is protocol.

rwbagbuild parses specific key-types as follows:

sIPv4, dIPv4, nhIPv4, any-IPv4

key is an IPv4 address or a 32-bit value; key-type set to corresponding IPv6 type when an IPv6 address is present. A CIDR block or SiLK IPWildcard representing multiple addresses adds multiple entries to the Bag

sIPv6, dIPv6, nhIPv6, any-IPv6

key is an IPv6 address. An IPv4 address is mapped into the ::ffff:0:0/96 netblock. All keys must be IP addresses (integers are not allowed).

flags, initialFlags, sessionFlags

key is the numeric value of the flags, 17 = FIN|ACK

sTime, eTime, any-time

key is seconds since the UNIX epoch

duration

key represents seconds

sensor

key is the numeric sensor ID

sip-country, dip-country, any-country

key is an IP address; the country_codes.pmap prefix map file is used to map the IP to a country code that is stored in the Bag

sip-pmap, dip-pmap, any-ip-pmap

key is an IP address; the specified --prefix-map file is used to map the IP to a value that is stored in the Bag

sport-pmap, dport-pmap, any-port-pmap

key is comprised of two numbers separated by a delimiter: a protocol (8-bit number) and a port (16-bit number). Those values are looked up in the specified --prefix-map file and the result is stored in the Bag. The delimiter separating the protocol and port may be set by --proto-port-delimiter. If not explicitly set, it is the same as the delimiter specified to --delimiter. The default delimiter is '|'.

attributes

these bits of the key are relevant, though any 32-bit value is accepted: 0x08=F, 0x10=S, 0x20=T, 0x40=C

class, type

key is treated as a number

An IP address or integer key must be expressed in one of the following formats. rwbagbuild complains if the key field contains a mixture of IPv6 addresses and integer values.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The first two switches control the type of input; exactly one must be provided:

--set-input=SETFILE

Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to read the IPset from the standard input. Counts have a volume of 1 when the --default-count switch is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--bag-input=TEXTFILE

Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin or - to read the text from the standard input. See the "DESCRIPTION" section for the syntax of the TEXTFILE.

--delimiter=C

Expect the character C between each key-counter pair in the TEXTFILE read by the --bag-input switch. The default delimiter is the vertical pipe ('|'). The delimiter is ignored if the --set-input switch is specified. When the delimiter is a whitespace character, any amount of whitespace may surround and separate the key and counter. Since '#' is used to denote comments and newline is used to denote records, neither is a valid delimiter character.

--proto-port-delimiter=C

Expect the character C between the protocol and port that comprise a key when the --key-type is sport-pmap, dport-pmap, or any-port-pmap. Unless this switch is specified, rwbagbuild expects the key-counter delimiter to appear between the protocol and port.

--default-count=DEFAULTCOUNT

Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer from 1 to 18446744073709551614 inclusive.

--key-type=FIELD_TYPE

Write a entry into the header of the Bag file that specifies the key contains FIELD_TYPE values. When this switch is not specified, the key type of the Bag is set to custom. The FIELD_TYPE is case insensitive. The supported FIELD_TYPEs are:

sIPv4

source IP address, IPv4 only

dIPv4

destination IP address, IPv4 only

sPort

source port

dPort

destination port

protocol

IP protocol

packets

packets, see also sum-packets

bytes

bytes, see also sum-bytes

flags

an unsigned bitwise OR of TCP flags

sTime

starting time of the flow record, seconds resolution

duration

duration of the flow record, seconds resolution

eTime

ending time of the flow record, seconds resolution

sensor

sensor ID

input

SNMP input

output

SNMP output

nhIPv4

next hop IP address, IPv4 only

initialFlags

TCP flags on first packet in the flow

sessionFlags

bitwise OR of TCP flags on all packets in the flow except the first

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow, as set by the flow generator

class

class of the sensor

type

type of the sensor

icmpTypeCode

an encoded version of the ICMP type and code, where the type is in the upper byte and the code is in the lower byte

sIPv6

source IP, IPv6

dIPv6

destination IP, IPv6

nhIPv6

next hop IP, IPv6

records

count of flows

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

any-IPv4

a generic IPv4 address

any-IPv6

a generic IPv6 address

any-port

a generic port

any-snmp

a generic SNMP value

any-time

a generic time value, in seconds resolution

sip-country

the country code of the source IP address. For textual input, the key column must contain an IP address or an integer. rwbagbuild maps the IP address to a country code and stores the country code in the bag. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in "FILES". (See also ccfilter(3).) Since SiLK 3.12.0.

dip-country

the country code of the destination IP. See sip-country. Since SiLK 3.12.0.

any-country

the country code of any IP address. See sip-country. Since SiLK 3.12.0.

sip-pmap

a prefix map value found from a source IP address. Maps each IP address in the key column to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be IPv4-address or IPv4-address. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dip-pmap

a prefix map value found from a destination IP address. See sip-pmap. Since SiLK 3.12.0.

any-ip-pmap:PMAP_PATH

a prefix map value found from any IP address. See sip-pmap. Since SiLK 3.12.0.

sport-pmap

a prefix map value found from a protocol/source-port pair. Each key must contain two values, a protocol and a port. Maps each protocol/port pair to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be proto-port. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dport-pmap

a prefix map value found from a protocol/destination-port pair. See sport-pmap. Since SiLK 3.12.0.

any-port-pmap

a prefix map value found from a protocol/port pair. See sport-pmap. Since SiLK 3.12.0.

custom

a number

--counter-type=FIELD_TYPE

Write a entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values. When this switch is not specified, the counter type of the Bag is set to custom. Although the supported FIELD_TYPEs are the same as those for the key, the value is always treated as a number that can be summed. rwbagbuild does not use the country code or prefix map when parsing the value field.

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

When the key-type is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap, use the prefix map file located at PATH to map the key to a string. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagbuild currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Bag file in the output. When this switch is not given, the invocation is written to the file's header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.

Create a bag with IP addresses as keys from a text file

Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as a delimiter, a count, and ends with a newline.

 192.168.0.1,5
 192.168.0.2,500
 192.168.0.3,3
 192.168.0.4,14
 192.168.0.5,5

To build a bag with it:

 $ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag

Use rwbagcat(1) to view its contents:

 $ rwbagcat mybag.bag
     192.168.0.1|                   5|
     192.168.0.2|                 500|
     192.168.0.3|                   3|
     192.168.0.4|                  14|
     192.168.0.5|                   5|

Create a bag with protocols as keys from a text file

To create a Bag of protocol data from the text file myproto.txt:

   1|      4|
   6|    138|
  17|    131|

use

 $ rwbagbuild --key-type=proto --bag-input=myproto.txt > myproto.bag
 $ rwbagcat myproto.bag
          1|                   4|
          6|                 138|
         17|                 131|

When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and rwfileinfo(1) shows the type of the key:

 $ rwfileinfo --fields=bag myproto.bag
 myproto.bag:
   bag            key: protocol @ 4 octets; counter: custom @ 8 octets

Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat
         0.0.0.1|                   4|
         0.0.0.6|                 138|
        0.0.0.17|                 131|

Although the --key-format switch on rwbagcat may be used to choose how the keys are displayed, it is generally better to use the --key-type switch when creating the bag.

$ rwbagbuild --bag-input=myproto.txt | rwbagcat --key-format=decimal 1| 4| 6| 138| 17| 131|

Create a bag and override the existing counter

To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the --default-count switch which overrides the existing value:

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt  \
        --default-count=1 --output-path=myproto1.bag
 $ rwbagcat myproto1.bag
          1|                   1|
          6|                   1|
         17|                   1|

Create a bag from multiple text files

To create a bag from multiple text files (X.txt, Y.txt, and Z.txt), use the UNIX cat(1) utility to concatenate the files and have rwbagbuild read the combined input. To avoid creating a temporary file, feed the output of cat as the standard input to rwbagbuild.

 $ cat X.txt Y.txt Z.txt                                \
   | rwbagbuild --bag-input=- --output-path=xyz.bag

For each key that appears in multiple input files, rwbagbuild sums the counters for the key.

Create a bag with IP addresses as keys from an IPset file

Given the IP set myset.set, create a bag where every entry in the bag has a count of 3:

 $ rwbagbuild --set-input=myset.set --default-count=3  \
        --out=mybag2.bag

Create a bag from multiple IPset files

Suppose we have three IPset files, A.set, B.set, and C.set:

 $ rwsetcat A.set
 10.0.0.1
 10.0.0.2
 $ rwsetcat B.set
 10.0.0.2
 10.0.0.3
 $ rwsetcat C.set
 10.0.0.1
 10.0.0.2
 10.0.0.4

We want to create a bag file from these IPset files where the count for each IP address is the number of files that IP appears in. rwbagbuild accepts a single file as an argument, so we cannot do the following:

 $ rwbagbuild --set-input=A.set --set-input=B.set ...   # WRONG!

(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had 300 files instead of only 3.)

Since IPset files are (mathematical) sets, joining them together first with rwsettool(1) and then running rwbagbuild causes each IP address to get a count of 1:

 $ rwsettool --union A.set B.set C.set   \
   | rwbagbuild --set-input=-            \
   | rwbagcat
        10.0.0.1|                   1|
        10.0.0.2|                   1|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as single textual stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that rwbagbuild must process. This is probably the best approach to the problem:

 $ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag
 $ rwbagcat total1.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags together:

 $ for i in *.set ; do
        rwbagbuild --set-input=$i --output-path=/tmp/$i.bag ;
   done
 $ rwbagtool --add /tmp/*.set.bag > total2.bag
 $ rm /tmp/*.set.bag

There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:

 $ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
 $ for i in *.set ; do
        rwbagbuild --set-input=$i  \
        | rwbagtool --output-path=tmp.bag --add total3.bag stdin ;
        /bin/mv tmp.bag total3.bag ;
   done
 $ rwbagcat total3.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

Create a bag where the key is the country code

As of SiLK 3.12.0, a Bag file may contain a country code as its key. In rwbagbuild, specify the --key-type as sip-country, dip-country, or any-country. That key-type works with either textual input or IPset input. The form of the textual input when mapping an IP address to a country code is identical to that when building an ordinary bag.

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=any-country --output-path=scc1.bag
 $ rwbagcat scc1.bag
 --|                 527|

 $ rwbagbuild --set-input=A.set --key-type=any-country  \
        --output-path=scc2.bag
 $ rwbagcat scc2.bag
 --|                   2|

Create a bag using a prefix map value as the key

rwbagbuild and rwbag(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. Use the --pmap-file switch to specify the prefix map file, and specify the --key-type using one of the types that end in -pmap.

For a prefix map that maps by IP addresses, use a key-type of sip-pmap, dip-pmap, or any-ip-pmap. The input may be an IPset or text. The form of the textual input is the same as for a normal bag file.

 $ rwbagbuild --set-input=A.set --key-type=sip-pmap     \
        --pmap-file=ip-map.pmap --output=test1.bag

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=sip-pmap --pmap-file=ip-map.pmap     \
        --output-path=test2.bag

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat(1).

 $ rwbagcat --pmap-file=ip-map.pmap test2.bag
          internal|                 527|

For a prefix map file that maps by protocol-port pairs, the textual input must contain either three column (protocol, port, counter) or two columns (protocol and port) which uses the --default-counter.

 $ cat proto-port-count.txt
 6| 25|  800|
 6| 80| 5642|
 6| 22
 $ rwbagbuild --key-type=sport-pmap                 \
        --bag-input=proto-port-count.txt            \
        --pmap-file=proto-port-map.pmap             \
        --output-path=service.bag
 $ rwbagcat --pmap-file=port-map.pmap service.bag
   TCP/SSH|                   1|
  TCP/SMTP|                 800|
  TCP/HTTP|                5642|

Delimiter examples

A single value followed by an optional delimiter is treated as a key. The counter for those keys is set to 1. A delimiter may follow the count, and any text after that delimiter is ignored. When the counter is 0, the key is not inserted into the Bag.

 $ cat sport.txt
 0
 1|
 2|3
 4|5|
 6|7|8|
 9|10|||||
 11|0
 $ rwbagbuild --bag-input=sport.txt --key-type=sport \
   | rwbagcat
          0|                   1|
          1|                   1|
          2|                   3|
          4|                   5|
          6|                   7|
          9|                  10|

The --default-counter switch overrides the count.

 $ rwbagbuild --bag-input=sport.txt --key-type=sport --default-count=1 \
   | rwbagcat
          0|                   1|
          1|                   1|
          2|                   1|
          4|                   1|
          6|                   1|
          9|                   1|
         11|                   1|

In fact, the --default-counter switch causes rwbagbuild to ignore all text after the delimiter that follows the key.

 $ echo '12|13 14' | rwbagbuild --bag-input=- --output=/dev/null
 rwbagbuild: Error parsing line 1: Extra text after count
 rwbagbuild: Error creating bag from text bag

 $ echo '12|13 14' | rwbagbuild --bag-input=- --default-count=1 \
   | rwbagcat --key-format=decimal
         12|                   1|

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbagbuild uses when mapping an IP to a country for the sip-country, dip-country, or any-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the "FILES" section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_PATH

This environment variable gives the root of the install tree. When searching for the country code mapping file, rwbagbuild may use this environment variable. See the "FILES" section for details.

FILES

$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/share/silk/country_codes.pmap
/usr/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country, dip-country, and any-country key-types.

SEE ALSO

rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk(7), zlib(3), cat(1)

BUGS

rwbagbuild should verify the key's value is within the allowed range for the specified --key-type.

rwbagbuild should accept non-numeric values for some fields, such as times and TCP flags.

The --default-count switch is poorly named.