NAME

rwbagbuild - Create a binary Bag from non-flow data

SYNOPSIS

  rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }
        [--delimiter=C] [--proto-port-delimiter=C]
        [--default-count=DEFAULTCOUNT]
        [--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]
        [{ --pmap-file=PATH | --pmap-file=MAPNAME:PATH }]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--invocation-strip] [--compression-method=COMP_METHOD]
        [--output-path=PATH]

  rwbagbuild --help

  rwbagbuild --version

DESCRIPTION

rwbagbuild builds a binary Bag file from an IPset file or from textual input. A Bag is a set where each key is associated with a counter. Usually the key is some aspect of a flow record (an IP address, a port, the protocol, et cetera), and the counter is a volume (such as the number of flow records or the sum of bytes or packets) for the flow records that match that key.

Either --set-input or --bag-input must be provided to specify the type and the location of the input file. To read from the standard input, specify stdin or - as the argument to the switch.

SET INPUT

When creating a Bag from an IPset, the value associated with each IP address is the value specified by the --default-count switch or 1 if the switch is not provided.

If the --key-type is sip-country, dip-country, or any-country, each IP address is mapped to its country code using the country code mapping file (see "FILES") and that value is stored in the Bag file.

If the --key-type is sip-pmap, dip-pmap, or any-ip-pmap, each IP address is mapped to a value found in the prefix map file specified in --pmap-file and that value is stored in the Bag file.

BAG (TEXTUAL) INPUT

The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments begin with a '#'-character and continue to the end of the line; they are stripped from each line. Any line that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-counter pair; whitespace around the key and counter is ignored.
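The line-cleaning rules above can be approximated with standard tools. The following is a minimal sketch, not rwbagbuild's own parser; the file name sample.txt and its contents are made up for illustration.

```shell
# Hypothetical input containing comments, blank lines, and padded fields.
cat > sample.txt <<'EOF'
# full-line comment
10.0.0.1 | 5   # trailing comment

10.0.0.2
EOF

# Strip comments, trim whitespace at both ends of each line, and drop
# any line that is empty afterwards.
cleaned=$(sed -e 's/#.*//' \
              -e 's/^[[:space:]]*//' \
              -e 's/[[:space:]]*$//' \
              -e '/^$/d' sample.txt)
printf '%s\n' "$cleaned"
```

Only the two lines that hold a key survive; the whitespace inside "10.0.0.1 | 5" is left for the key and counter parsing step to ignore.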

The key is typically a 32-bit integer, an IP address, a CIDR block, or a SiLK IPWildcard. When the --key-type is sport-pmap, dport-pmap, or any-port-pmap, the key is comprised of two numbers: a protocol (8-bit number) and a port (16-bit number). The delimiter separating the protocol and port may be set by --proto-port-delimiter. If not explicitly set, it is the same as the delimiter specified to --delimiter. The default delimiter is '|'.
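To illustrate the shape of a protocol/port key when the two delimiters differ, the sketch below rewrites a three-column file so that a slash joins the protocol and port and the pipe separates only the key from the counter. The file names are hypothetical, and the rwbagbuild command in the trailing comment is an untested outline that assumes a prefix map named proto-port-map.pmap already exists.

```shell
# Hypothetical input in three-column form: protocol|port|count
printf '%s\n' '6|25|800' '6|80|5642' > three-col.txt

# Join protocol and port with "/" so that "|" separates only the key
# from the counter.
awk -F'|' '{ print $1 "/" $2 "|" $3 }' three-col.txt > slash-keys.txt
cat slash-keys.txt

# The result would be read with a command along these lines:
#   rwbagbuild --key-type=sport-pmap --proto-port-delimiter=/ \
#         --pmap-file=proto-port-map.pmap --bag-input=slash-keys.txt
```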

An IP address or integer key must be expressed in one of the following formats. rwbagbuild complains if the key field contains a mixture of IPv6 addresses and integer values.

A line may contain only a key or it may contain a key and counter. If the delimiter character is not present on a line, the line must contain only a key. If the delimiter is present, the line must contain a key before the delimiter and an integer counter after the delimiter. These lines may have a delimiter after the counter; this delimiter and any text following it are ignored.

When the --default-count switch is specified, its value is used as the count for each key, and any counter value present on the line is ignored. Otherwise, the parsed count is used, or 1 is used as the counter if no delimiter was present.

For each key-count pair, the key is inserted into the Bag with its count or, if the key is already present in the Bag, its total count is incremented by the count from this line. When using the --default-count switch, the count for a key that appears in the input N times is the product of N and DEFAULTCOUNT.
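The accumulation rule can be modeled in a few lines of awk(1). This is a rough sketch of the arithmetic only, not rwbagbuild's implementation: the function name bag_model is invented, keys are treated as opaque strings (no CIDR or IPWildcard expansion), and none of rwbagbuild's error checking is reproduced.

```shell
# Hypothetical helper: model how counts accumulate per key.
# $1 is the input file; $2 is an optional default count.
bag_model() {
    awk -F'|' -v dflt="${2:-}" '
        { sub(/#.*/, "") }                 # strip comments
        /^[ \t]*$/ { next }                # skip blank lines
        {
            key = $1; gsub(/[ \t]/, "", key)
            if (dflt != "")   n = dflt     # default count overrides
            else if (NF > 1)  n = $2 + 0   # count parsed from the line
            else              n = 1        # key-only line
            total[key] += n
        }
        END { for (k in total) print k "|" total[k] }
    ' "$1" | sort
}

# Made-up input: 10.0.0.1 appears twice, 10.0.0.2 has no counter.
printf '%s\n' '10.0.0.1|5' '10.0.0.2' '10.0.0.1|7  # repeat' > counts.txt

bag_model counts.txt      # 10.0.0.1|12 and 10.0.0.2|1
bag_model counts.txt 3    # 10.0.0.1|6 and 10.0.0.2|3
```

With a default count of 3, the key that appears twice ends at 2 x 3 = 6, matching the product of N and DEFAULTCOUNT described above.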

rwbagbuild prints an error and exits when a key or counter cannot be parsed or when a line contains a delimiter character after the key but has no count.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The following two switches control the type of input; one and only one must be provided:

--set-input=SETFILE

Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to read the IPset from the standard input. Counts have a volume of 1 when the --default-count switch is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)

--bag-input=TEXTFILE

Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin or - to read the text from the standard input. See the "DESCRIPTION" section for the syntax of the TEXTFILE.

--delimiter=C

Expect the character C between each key-counter pair in the TEXTFILE read by the --bag-input switch. The default delimiter is the vertical pipe ('|'). The delimiter is ignored if the --set-input switch is specified. When the delimiter is a whitespace character, any amount of whitespace may surround and separate the key and counter. Since '#' is used to denote comments and newline is used to denote records, neither is a valid delimiter character.

--proto-port-delimiter=C

Expect the character C between the protocol and port that comprise a key when the --key-type is sport-pmap, dport-pmap, or any-port-pmap. Unless this switch is specified, rwbagbuild expects the key-counter delimiter to appear between the protocol and port.

--default-count=DEFAULTCOUNT

Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive integer.

--key-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the key contains FIELD_TYPE values. When this switch is not specified, the key type of the Bag is set to custom. The FIELD_TYPE is case insensitive. The supported FIELD_TYPEs are:

sIPv4

source IP address, IPv4 only

dIPv4

destination IP address, IPv4 only

sPort

source port

dPort

destination port

protocol

IP protocol

packets

packets, see also sum-packets

bytes

bytes, see also sum-bytes

flags

bitwise OR of TCP flags

sTime

starting time of the flow record, seconds resolution

duration

duration of the flow record, seconds resolution

eTime

ending time of the flow record, seconds resolution

sensor

sensor ID

input

SNMP input

output

SNMP output

nhIPv4

next hop IP address, IPv4 only

initialFlags

TCP flags on first packet in the flow

sessionFlags

bitwise OR of TCP flags on all packets in the flow except the first

attributes

flow attributes set by the flow generator

application

guess as to the content of the flow, as set by the flow generator

class

class of the sensor

type

type of the sensor

icmpTypeCode

an encoded version of the ICMP type and code, where the type is in the upper byte and the code is in the lower byte

sIPv6

source IP, IPv6

dIPv6

destination IP, IPv6

nhIPv6

next hop IP, IPv6

records

count of flows

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

any-IPv4

a generic IPv4 address

any-IPv6

a generic IPv6 address

any-port

a generic port

any-snmp

a generic SNMP value

any-time

a generic time value, in seconds resolution

sip-country

the country code of the source IP. Maps each IP address in the key column to a country code and stores the country code in the bag. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable or the country_codes.pmap mapping file, as described in "FILES". (See also ccfilter(3).) The abbreviations are those used by the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- for N/A (e.g. private and experimental reserved addresses); a1 for anonymous proxy; a2 for satellite provider; o1 for other. Since SiLK 3.12.0.

dip-country

the country code of the destination IP. See sip-country. Since SiLK 3.12.0.

any-country

the country code of any IP address. See sip-country. Since SiLK 3.12.0.

sip-pmap

a prefix map value found from a source IP address. Maps each IP address in the key column to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be IPv4-address or IPv6-address. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dip-pmap

a prefix map value found from a destination IP address. See sip-pmap. Since SiLK 3.12.0.

any-ip-pmap

a prefix map value found from any IP address. See sip-pmap. Since SiLK 3.12.0.

sport-pmap

a prefix map value found from a protocol/source-port pair. Each key must contain two values, a protocol and a port. Maps each protocol/port pair to a value from a prefix map file and stores the value in the bag. The type of the prefix map must be proto-port. Use the --pmap-file switch to specify the path to the file. Since SiLK 3.12.0.

dport-pmap

a prefix map value found from a protocol/destination-port pair. See sport-pmap. Since SiLK 3.12.0.

any-port-pmap

a prefix map value found from a protocol/port pair. See sport-pmap. Since SiLK 3.12.0.

custom

a number
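As an illustration of the icmpTypeCode encoding described in the list above, the following sketch packs an ICMP type and code into a single integer using shell arithmetic and then unpacks it again. The variable names are arbitrary, and the type and code values are chosen only as an example.

```shell
# ICMP type 3 (destination unreachable), code 1 (host unreachable)
icmp_type=3
icmp_code=1

# Type in the upper byte, code in the lower byte.
encoded=$(( (icmp_type << 8) | icmp_code ))
echo "$encoded"                                   # 769

# Recover the two fields from the encoded value.
echo "$(( encoded >> 8 )) $(( encoded & 0xFF ))"  # 3 1
```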

--counter-type=FIELD_TYPE

Write an entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values. When this switch is not specified, the counter type of the Bag is set to custom. Although the supported FIELD_TYPEs are the same as those for the key, the value is always treated as a number that can be summed. rwbagbuild does not use the country code or prefix map when parsing the value field.

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

When the key-type is one of sip-pmap, dip-pmap, any-ip-pmap, sport-pmap, dport-pmap, or any-port-pmap, use the prefix map file located at PATH to map the key to a string. Specify PATH as - or stdin to read from the standard input. A map-name may be included in the argument to the switch, but rwbagbuild currently does not use the map-name. To create a prefix map file, use rwpmapbuild(1). Since SiLK 3.12.0.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Bag file in the output. When this switch is not given, the invocation is written to the file's header, and the invocation may be viewed with rwfileinfo(1). Since SiLK 3.12.0.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--output-path=PATH

Write the binary Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagbuild to exit with an error.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Create a bag with IP addresses as keys from a text file

Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as a delimiter, a count, and ends with a newline.

 192.168.0.1,5
 192.168.0.2,500
 192.168.0.3,3
 192.168.0.4,14
 192.168.0.5,5

To build a bag with it:

 $ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag

Use rwbagcat(1) to view its contents:

 $ rwbagcat mybag.bag
     192.168.0.1|                   5|
     192.168.0.2|                 500|
     192.168.0.3|                   3|
     192.168.0.4|                  14|
     192.168.0.5|                   5|

Create a bag with protocols as keys from a text file

To create a Bag of protocol data from the text file myproto.txt:

   1|      4|
   6|    138|
  17|    131|

use

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt > myproto.bag
 $ rwbagcat myproto.bag
          1|                   4|
          6|                 138|
         17|                 131|

When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and rwfileinfo(1) shows the type of the key:

 $ rwfileinfo --fields=bag myproto.bag
 myproto.bag:
   bag            key: protocol @ 4 octets; counter: custom @ 8 octets

Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat
         0.0.0.1|                   4|
         0.0.0.6|                 138|
        0.0.0.17|                 131|

Although the --integer-keys switch on rwbagcat forces it to print keys as integers, it is generally better to use the --key-type switch when creating the bag.

 $ rwbagbuild --bag-input=myproto.txt | rwbagcat --integer-keys
          1|                   4|
          6|                 138|
         17|                 131|

Create a bag and override the existing counter

To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the --default-count switch which overrides the existing value:

 $ rwbagbuild --key-type=protocol --bag-input=myproto.txt  \
        --default-count=1 --output-path=myproto1.bag
 $ rwbagcat myproto1.bag
          1|                   1|
          6|                   1|
         17|                   1|

Create a bag from multiple text files

To create a bag from multiple text files (X.txt, Y.txt, and Z.txt), use the UNIX cat(1) utility to concatenate the files and have rwbagbuild read the combined input. To avoid creating a temporary file, feed the output of cat as the standard input to rwbagbuild.

 $ cat X.txt Y.txt Z.txt                                \
   | rwbagbuild --bag-input=- --output-path=xyz.bag

For each key that appears in multiple input files, rwbagbuild sums the counters for the key.

Create a bag with IP addresses as keys from an IPset file

Given the IP set myset.set, create a bag where every entry in the bag has a count of 3:

 $ rwbagbuild --set-input=myset.set --default-count=3  \
        --out=mybag2.bag

Create a bag from multiple IPset files

Suppose we have three IPset files, A.set, B.set, and C.set:

 $ rwsetcat A.set
 10.0.0.1
 10.0.0.2
 $ rwsetcat B.set
 10.0.0.2
 10.0.0.3
 $ rwsetcat C.set
 10.0.0.1
 10.0.0.2
 10.0.0.4

We want to create a bag file from these IPset files where the count for each IP address is the number of files that IP appears in. rwbagbuild accepts a single file as an argument, so we cannot do the following:

 $ rwbagbuild --set-input=A.set --set-input=B.set ...   # WRONG!

(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had 300 files instead of only 3.)

Since IPset files are (mathematical) sets, joining them together first with rwsettool(1) and then running rwbagbuild causes each IP address to get a count of 1:

 $ rwsettool --union A.set B.set C.set   \
   | rwbagbuild --set-input=-            \
   | rwbagcat
        10.0.0.1|                   1|
        10.0.0.2|                   1|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as a single textual stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that rwbagbuild must process. This is probably the best approach to the problem:

 $ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag
 $ rwbagcat total1.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|
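Each total above is simply the number of input files that contain the address. As a rough cross-check that does not involve the SiLK tools, the same arithmetic can be sketched with sort(1) and uniq(1). The text files A.txt, B.txt, and C.txt are hypothetical stand-ins for the output of rwsetcat on each IPset.

```shell
# Hypothetical stand-ins for the rwsetcat output of A.set, B.set, C.set
printf '%s\n' 10.0.0.1 10.0.0.2          > A.txt
printf '%s\n' 10.0.0.2 10.0.0.3          > B.txt
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.4 > C.txt

# Count how many times each address appears across the files and print
# the result as key|count lines.
totals=$(cat A.txt B.txt C.txt | sort | uniq -c | awk '{ print $2 "|" $1 }')
printf '%s\n' "$totals"
```

The key|count lines this prints use the default delimiter, so they have the same form as the textual input accepted by --bag-input.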

A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags together:

 $ for i in *.set ; do
        rwbagbuild --set-input="$i" --output-path="/tmp/$i.bag" ;
   done
 $ rwbagtool --add /tmp/*.set.bag > total2.bag
 $ rm /tmp/*.set.bag

There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:

 $ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
 $ for i in *.set ; do
        rwbagbuild --set-input="$i"  \
        | rwbagtool --output-path=tmp.bag --add total3.bag stdin ;
        /bin/mv tmp.bag total3.bag ;
   done
 $ rwbagcat total3.bag
        10.0.0.1|                   2|
        10.0.0.2|                   3|
        10.0.0.3|                   1|
        10.0.0.4|                   1|

Create a bag where the key is the country code

As of SiLK 3.12.0, a Bag file may contain a country code as its key. In rwbagbuild, specify the --key-type as sip-country, dip-country, or any-country. That key-type works with either textual input or IPset input. The form of the textual input when mapping an IP address to a country code is identical to that when building an ordinary bag.

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=any-country --output-path=scc1.bag
 $ rwbagcat scc1.bag
 --|                 527|

 $ rwbagbuild --set-input=A.set --key-type=any-country  \
        --output-path=scc2.bag
 $ rwbagcat scc2.bag
 --|                   2|

Create a bag using a prefix map value as the key

rwbagbuild and rwbag(1) can use a prefix map file as the key in a Bag file as of SiLK 3.12.0. Use the --pmap-file switch to specify the prefix map file, and specify the --key-type using one of the types that end in -pmap.

For a prefix map that maps by IP addresses, use a key-type of sip-pmap, dip-pmap, or any-ip-pmap. The input may be an IPset or text. The form of the textual input is the same as for a normal bag file.

 $ rwbagbuild --set-input=A.set --key-type=sip-pmap     \
        --pmap-file=ip-map.pmap --output=test1.bag

 $ rwbagbuild --bag-input=mybag.txt --delimiter=,       \
        --key-type=sip-pmap --pmap-file=ip-map.pmap     \
        --output-path=test2.bag

The prefix map file is not stored as part of the Bag, so you must provide the name of the prefix map when running rwbagcat(1).

 $ rwbagcat --pmap-file=ip-map.pmap test2.bag
          internal|                 527|

For a prefix map file that maps by protocol-port pairs, the textual input must contain either three columns (protocol, port, counter) or two columns (protocol and port); in the two-column form the counter is 1 unless --default-count is given.

 $ cat proto-port-count.txt
 6| 25|  800|
 6| 80| 5642|
 6| 22
 $ rwbagbuild --key-type=sport-pmap                 \
        --bag-input=proto-port-count.txt            \
        --pmap-file=proto-port-map.pmap             \
        --output-path=service.bag
 $ rwbagcat --pmap-file=proto-port-map.pmap service.bag
   TCP/SSH|                   1|
  TCP/SMTP|                 800|
  TCP/HTTP|                5642|

ENVIRONMENT

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwbagbuild uses when mapping an IP to a country for the sip-country, dip-country, or any-country keys. The value may be a complete path or a file relative to the SILK_PATH. See the "FILES" section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_PATH

This environment variable gives the root of the install tree. When searching for the country code mapping file, rwbagbuild may use this environment variable. See the "FILES" section for details.

FILES

$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/share/silk/country_codes.pmap
/usr/share/country_codes.pmap

Possible locations for the country code mapping file required by the sip-country, dip-country, and any-country key-types.
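To see which of these locations holds a readable file on a given system, a small shell function such as the following may help. It is a sketch under two assumptions that this page does not confirm: that the locations are tried in the order listed, and that SILK_COUNTRY_CODES holds a complete path (the case where it names a file relative to SILK_PATH is not handled). The function name find_country_codes is made up.

```shell
# Hypothetical helper: print the first readable candidate, or fail.
find_country_codes() {
    for f in \
        "${SILK_COUNTRY_CODES:-}" \
        "${SILK_PATH:+$SILK_PATH/share/silk/country_codes.pmap}" \
        "${SILK_PATH:+$SILK_PATH/share/country_codes.pmap}" \
        /usr/share/silk/country_codes.pmap \
        /usr/share/country_codes.pmap
    do
        # Skip candidates that are empty because a variable is unset.
        if [ -n "$f" ] && [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

find_country_codes || echo "no country code mapping file found" >&2
```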

SEE ALSO

rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7), ccfilter(3), zlib(3)

BUGS

The --default-count switch is poorly named.