NAME

rwaggbagbuild - Create a binary aggregate bag from non-flow data

SYNOPSIS

  rwaggbagbuild [--fields=FIELDS]
        [--constant-field=FIELD=VALUE [--constant-field=FIELD=VALUE...]]
        [--column-separator=CHAR] [--no-titles]
        [--bad-input-lines=FILEPATH] [--verbose] [--stop-on-error]
        [--note-add=TEXT] [--note-file-add=FILENAME]
        [--invocation-strip] [--compression-method=COMP_METHOD]
        [--output-path=PATH] [--site-config-file=FILENAME]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}

  rwaggbagbuild --help

  rwaggbagbuild --help-fields

  rwaggbagbuild --version

DESCRIPTION

rwaggbagbuild builds a binary Aggregate Bag file by reading one or more files containing textual input. To build an Aggregate Bag from SiLK Flow records, use rwaggbag(1).

An Aggregate Bag is a binary file that maps a key to a counter, where the key and the counter are both composed of one or more fields. For example, an Aggregate Bag could contain the sum of the packet count and the sum of the byte count for each unique source IP and source port pair.
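As a mental model only, an Aggregate Bag behaves like a dictionary whose keys and values are both tuples. The Python sketch below uses hypothetical sample rows to sum packets and bytes for each (source IP, source port) pair; it does not read or write the SiLK file format.

```python
from collections import defaultdict

# Hypothetical input rows: (source IP, source port, packets, bytes).
rows = [
    ("10.0.0.1", 80, 10, 1500),
    ("10.0.0.1", 80, 5, 700),
    ("10.0.0.2", 443, 3, 300),
]

# Key: (sIPv4, sPort).  Counter: [sum-packets, sum-bytes].
aggbag = defaultdict(lambda: [0, 0])
for sip, sport, packets, nbytes in rows:
    counter = aggbag[(sip, sport)]
    counter[0] += packets
    counter[1] += nbytes

for (sip, sport), (sum_packets, sum_bytes) in sorted(aggbag.items()):
    print(f"{sip}|{sport}|{sum_packets}|{sum_bytes}|")
```

The two rows that share the key 10.0.0.1 and port 80 collapse into one entry whose counters are the sums of both rows.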

rwaggbagbuild reads its input from the files named on the command line or from the standard input when no file names are specified, when --xargs is not present, and when the standard input is not a terminal. To read the standard input in addition to the named files, use - or stdin as a file name. When the --xargs switch is provided, rwaggbagbuild reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The new Aggregate Bag file is written to the location specified by the --output-path switch. If that switch is not provided, output is written to the standard output, provided the standard output is not connected to a terminal.

The Aggregate Bag file must have at least one key field and at least one counter field. See the description of the --fields switch.

In general (and as detailed below), each line of the text input files becomes one entry in the Aggregate Bag file. It is also possible to specify that each entry in the Aggregate Bag file contains additional fields, each with a specific value. These fields are specified by the --constant-field switch whose argument is a field name, an equals sign ('='), and a textual representation of a value. The named field becomes one of the key or counter fields in the Aggregate Bag file, and that field is given the specified value for each entry that is read from an input file. See the --fields switch in the "OPTIONS" section for the names of the fields and the acceptable forms of the textual input for each field.

The remainder of this section details how rwaggbagbuild processes each text input file to create an Aggregate Bag file.

When the --fields switch is specified, its argument specifies the key and counter fields that the new Aggregate Bag file is to contain. If --fields is not specified, the first line of the first input file is expected to contain field names, and those names determine the Aggregate Bag's key and counter. A field name of ignore causes rwaggbagbuild to ignore the values in that field when parsing the input.

The textual input is processed one line at a time. Comments begin with a '#'-character and continue to the end of the line; they are stripped from each line. After removing the comments, any line that is blank or contains only whitespace is ignored.

All other lines must contain valid input, which is a set of fields separated by a delimiter. The default delimiter is the vertical bar ('|') and may be changed with the --column-separator switch. Whitespace around a delimiter is allowed; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and '#' is not a valid delimiter since it begins a comment.
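The comment, blank-line, and delimiter rules can be approximated in Python as follows. split_line is a hypothetical helper written for illustration; it is not SiLK's parser.

```python
def split_line(line, delimiter="|"):
    """Approximate the documented rules for one line of text input.

    Returns None when the line should be skipped, otherwise the list of
    field strings (a trailing delimiter yields a final empty string).
    """
    if delimiter in ("\n", "#"):
        raise ValueError("newline and '#' are not valid delimiters")
    # A comment starts at '#' and continues to the end of the line.
    line = line.split("#", 1)[0].rstrip("\r\n")
    # Lines that are empty or whitespace-only after comment removal are skipped.
    if not line.strip():
        return None
    fields = line.split(delimiter)
    if delimiter in (" ", "\t"):
        # With a space or tab separator, every such character is a delimiter,
        # so adjacent separators produce empty fields.
        return fields
    # Otherwise whitespace around each delimiter is tolerated and removed.
    return [field.strip() for field in fields]


print(split_line("   10.245.15.175|   80|       127|     12862|"))
# ['10.245.15.175', '80', '127', '12862', '']
print(split_line("   # a comment line"))   # None
print(split_line("1  2", delimiter=" "))  # ['1', '', '2']
```

The last call shows why a space separator behaves differently: the two adjacent spaces produce an empty middle field instead of being treated as padding.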

The first line of each input file may contain delimiter-separated field names denoting in which order the fields appear in this input file. As mentioned above, when the --fields switch is not given, the first line of the first file determines the Aggregate Bag's key and counter. To tell rwaggbagbuild to treat the first line of each file as field values to be parsed, specify the --no-titles switch.

Every other line must contain delimiter-separated field values. A delimiter may follow the final field on a line. rwaggbagbuild ignores lines that contain either too few or too many fields.
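A field-count check that tolerates a single trailing delimiter can be sketched in Python as below. check_field_count is a hypothetical helper that takes a line already split on the delimiter.

```python
def check_field_count(fields, expected):
    """Return the fields of a split line, or raise if the count is wrong.

    One optional trailing delimiter shows up as a final empty string,
    which is dropped before the count is compared with `expected`.
    """
    if len(fields) == expected + 1 and fields[-1] == "":
        fields = fields[:-1]
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    return fields


# A data row from the rec.txt example, already split on '|'.
row = ["10.245.15.175", "80", "127", "12862", ""]
print(check_field_count(row, 4))  # ['10.245.15.175', '80', '127', '12862']
```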

See the description of the --fields switch in the "OPTIONS" section for the names of the fields and the acceptable forms of the textual input for each field.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--fields=FIELDS

Specify the fields in the input files. FIELDS is a comma-separated list of field names. Field names are case-insensitive, and a name may be abbreviated to the shortest unique prefix. Other than the ignore field, a field name may not be specified more than once. The Aggregate Bag file must have at least one key field and at least one counter field.
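The FIELDS rules in the preceding paragraph can be checked with a sketch like the one below. It knows only a hypothetical subset of the field names listed next, prefers an exact name over a prefix match (an assumption), and lets names given with --constant-field count toward the key and counter requirement, since such a field becomes a key or counter in the output.

```python
# A subset of the documented names; extend as needed.
KEY_FIELDS = ["sIPv4", "dIPv4", "sIPv6", "dIPv6", "sPort", "dPort",
              "protocol", "sTime", "custom-key"]
COUNTER_FIELDS = ["records", "sum-packets", "sum-bytes",
                  "sum-duration", "custom-counter"]
ALL_FIELDS = ["ignore"] + KEY_FIELDS + COUNTER_FIELDS


def resolve(name):
    """Map a possibly abbreviated, any-case name to its canonical form."""
    lowered = name.strip().lower()
    for field in ALL_FIELDS:
        if field.lower() == lowered:  # assumption: exact match wins
            return field
    matches = [f for f in ALL_FIELDS if f.lower().startswith(lowered)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous field name: {name!r}")
    return matches[0]


def check_fields(spec, constant_fields=()):
    """Validate a --fields value; constant_fields are --constant-field names."""
    fields = [resolve(n) for n in spec.split(",")]
    seen = set()
    for field in fields:
        if field != "ignore" and field in seen:
            raise ValueError(f"field given more than once: {field}")
        seen.add(field)
    everything = fields + [resolve(n) for n in constant_fields]
    if not any(f in KEY_FIELDS for f in everything):
        raise ValueError("at least one key field is required")
    if not any(f in COUNTER_FIELDS for f in everything):
        raise ValueError("at least one counter field is required")
    return fields


print(check_fields("dipv4,dport,sum-packets,sum-bytes"))
# ['dIPv4', 'dPort', 'sum-packets', 'sum-bytes']
```

With this subset, an abbreviation such as "s" is rejected because it is a prefix of several names, while "record" resolves uniquely to "records".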

The names of the fields that are considered key fields, their descriptions, and the format of the input that each expects are:

ignore

field that rwaggbagbuild is to skip

sIPv4

source IP address, IPv4 only; either the canonical dotted-quad format or an integer from 0 to 4294967295 inclusive

dIPv4

destination IP address, IPv4 only; uses the same format as sIPv4

nhIPv4

next hop IP address, IPv4 only; uses the same format as sIPv4

any-IPv4

a generic IPv4 address; uses the same format as sIPv4

sIPv6

source IP address, IPv6 only; the canonical hex-encoded format for IPv6 addresses

dIPv6

destination IP address, IPv6 only; uses the same format as sIPv6

nhIPv6

next hop IP address, IPv6 only; uses the same format as sIPv6

any-IPv6

a generic IPv6 address; uses the same format as sIPv6

sPort

source port; an integer from 0 to 65535 inclusive

dPort

destination port; an integer from 0 to 65535 inclusive

any-port

a generic port; an integer from 0 to 65535 inclusive

protocol

IP protocol; an integer from 0 to 255 inclusive

packets

packet count; an integer from 1 to 4294967295 inclusive

bytes

byte count; an integer from 1 to 4294967295 inclusive

flags

bit-wise OR of TCP flags over all packets; a string containing F, S, R, P, A, U, E, C in upper- or lowercase

initialFlags

TCP flags on the first packet; uses the same form as flags

sessionFlags

bit-wise OR of TCP flags on the second through final packet; uses the same form as flags

sTime

starting time in seconds; uses the form YYYY/MM/DD[:hh[:mm[:ss[.sss]]]] (any milliseconds value is dropped). A T may be used in place of : to separate the day and hour fields. A floating point value between 536870912 and 2147483647 is also allowed and is treated as seconds since the UNIX epoch.

eTime

ending time in seconds; uses the same format as sTime

any-time

a generic time in seconds; uses the same format as sTime

duration

duration of flow; a floating point value from 0.0 to 4294967.295

sensor

sensor name or ID at the collection point; a string as given in silk.conf(5)

class

class at collection point; a string as given in silk.conf

type

type at collection point; a string as given in silk.conf

input

router SNMP ingress interface or vlanId; an integer from 0 to 65535

output

router SNMP egress interface or postVlanId; an integer from 0 to 65535

any-snmp

a generic SNMP value; an integer from 0 to 65535

attribute

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout or a byte-count threshold

C

flow generator created a record as a continuation of a previous record for a connection that exceeded a timeout or byte-count threshold

application

guess as to the content of the flow; an integer from 0 to 65535 inclusive

icmpType

ICMP type; an integer from 0 to 255 inclusive

icmpCode

ICMP code; an integer from 0 to 255 inclusive

scc

the country code of the source; accepts a two-character string to use as the country of the source IP. The code is not checked for validity against the country_codes.pmap file. The code must be ASCII and it may contain two letters, a letter followed by a number, or the string --. Since SiLK 3.19.0.

dcc

the country code of the destination. See scc. Since SiLK 3.19.0.

any-cc

a generic country code. See scc. Since SiLK 3.19.0.

custom-key

a generic key; an integer from 0 to 4294967295 inclusive

The names and descriptions of the fields that are considered counter fields are listed next. For each, the type of input is an unsigned 64-bit number; that is, an integer from 0 to 18446744073709551615.

records

count of records that match the key

sum-packets

sum of packet counts

sum-bytes

sum of byte counts

sum-duration

sum of duration values

custom-counter

a generic counter
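Several of the input forms above can be illustrated with small Python parsers. The functions below are hypothetical helpers covering a few representative fields (IPv4 addresses, TCP flags, times, country codes, and counters). They follow the rules as stated on this page, map flag letters to the standard TCP header bits, assume times are UTC (a SiLK build may use local time instead), and are not SiLK's implementation.

```python
import ipaddress
import re
from datetime import datetime, timezone

UINT64_MAX = 2**64 - 1  # upper bound for every counter field

# Standard TCP header flag bits: FIN, SYN, RST, PSH, ACK, URG, ECE, CWR.
TCP_FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
                 "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

# YYYY/MM/DD[:hh[:mm[:ss[.sss]]]], with 'T' allowed between day and hour.
TIME_RE = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.\d+)?)?)?)?"
)


def parse_ipv4(text):
    """sIPv4, dIPv4, nhIPv4, any-IPv4: dotted quad or 0..4294967295."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        return ipaddress.IPv4Address(int(text))  # raises if > 2**32 - 1
    return ipaddress.IPv4Address(text)


def parse_flags(text):
    """flags, initialFlags, sessionFlags: letters FSRPAUEC in any case."""
    mask = 0
    for ch in text.strip().upper():
        if ch not in TCP_FLAG_BITS:
            raise ValueError(f"invalid TCP flag character: {ch!r}")
        mask |= TCP_FLAG_BITS[ch]
    return mask


def parse_time(text):
    """sTime, eTime, any-time: whole seconds since the UNIX epoch (UTC)."""
    text = text.strip()
    match = TIME_RE.fullmatch(text)
    if match:
        year, month, day, hh, mm, ss = (int(g or 0) for g in match.groups())
        # datetime() rejects impossible dates; fractional seconds are dropped.
        moment = datetime(year, month, day, hh, mm, ss, tzinfo=timezone.utc)
        return int(moment.timestamp())
    value = float(text)  # otherwise: floating point epoch seconds
    if not 536870912 <= value <= 2147483647:
        raise ValueError(f"epoch seconds out of range: {text}")
    return int(value)


def valid_country_code(text):
    """scc, dcc, any-cc: two ASCII letters, a letter then a digit, or '--'."""
    if text == "--":
        return True
    if len(text) != 2 or not text.isascii():
        return False
    return text[0].isalpha() and (text[1].isalpha() or text[1].isdigit())


def parse_counter(text):
    """Counter fields: an unsigned 64-bit integer in plain ASCII digits."""
    text = text.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(text)
    if value > UINT64_MAX:
        raise ValueError(f"counter exceeds {UINT64_MAX}: {text}")
    return value


print(parse_ipv4("183832495"))                # 10.245.15.175
print(hex(parse_flags("sa")))                 # 0x12
print(parse_time("2009/02/13T23:31:30.750"))  # 1234567890
```

Accepting country-code letters in either case is an assumption of this sketch; the text above does not say whether lowercase or uppercase is required.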

--constant-field=FIELD=VALUE

For each entry (row) read from the input file(s), insert or replace a field named FIELD and set its value to VALUE. VALUE is a textual representation of the field's value in the form given for the --fields switch above. When FIELD is a counter field and the same key appears multiple times in the input, VALUE is added to the counter once for each occurrence. If a field named FIELD appears in an input file, its value from that file is ignored. Specify the --constant-field switch multiple times to insert multiple fields.
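To make the counter behavior concrete, the sketch below tallies the dPort column of rec.txt from the "EXAMPLES" section the way --constant-field=records=1 is described to work: each input row adds the constant VALUE to the counter for its key. It is a plain Python illustration, not a call into SiLK.

```python
from collections import Counter

# dPort values from the ten data rows of rec.txt ("EXAMPLES" section).
dports = [80, 29222, 80, 29362, 80, 28925, 80, 29246, 80, 29700]

CONSTANT_VALUE = 1  # the VALUE in --constant-field=records=1

records = Counter()
for port in dports:
    records[port] += CONSTANT_VALUE  # same key seen again: added again

for port in sorted(records):
    print(f"{port:5d}|{records[port]:10d}|")
```

Port 80 appears in five rows, so its counter ends at 5, matching the rwaggbagcat output shown in the examples.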

--column-separator=CHAR

When reading textual input, use the character CHAR as the delimiter between columns (fields) in the input. The default column separator is the vertical pipe ('|'). rwaggbagbuild normally ignores whitespace (space and tab) around the column separator; however, using space or tab as the separator causes each space or tab character to be treated as a field delimiter. The newline character is not a valid delimiter character since it is used to denote records, and '#' is not a valid delimiter since it begins a comment.

--bad-input-lines=FILEPATH

When parsing textual input, copy any lines that cannot be parsed to FILEPATH. The strings stdout and stderr may be used for the standard output and standard error, respectively. Each bad line is prepended by the name of the source input file, a colon, the line number, and a colon. On exit, rwaggbagbuild removes FILEPATH if all input lines were successfully parsed.
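The documented line prefix can be reproduced with a small helper. collect_bad_lines and its is_valid callback are hypothetical, and numbering lines from 1 is an assumption.

```python
def collect_bad_lines(source_name, lines, is_valid):
    """Return rejected lines prefixed with 'SOURCE:LINENO:'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not is_valid(line):
            bad.append(f"{source_name}:{number}:{line}")
    return bad


sample = ["10.0.0.1|80|", "not an entry", "10.0.0.2|443|"]
print(collect_bad_lines("rec.txt", sample, lambda text: "|" in text))
# ['rec.txt:2:not an entry']
```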

--verbose

When a textual input line fails to parse, print a message to the standard error describing the problem. When this switch is not specified, parsing failures are not reported. rwaggbagbuild continues to process the input after printing the message. To stop processing when a parsing error occurs, use --stop-on-error.

--stop-on-error

When a textual input line fails to parse, print a message to the standard error describing the problem and exit the program. When this occurs, the output file contains any records successfully created prior to reading the bad input line. The default behavior of rwaggbagbuild is to silently ignore parsing errors. To report parsing errors and continue processing the input, use --verbose.

--no-titles

Parse the first line of the input as field values. Normally when the --fields switch is specified, rwaggbagbuild examines the first line to determine if the line contains the names (titles) of fields and skips the line if it does. rwaggbagbuild exits with an error when --no-titles is given but --fields is not.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--invocation-strip

Do not record the command used to create the Aggregate Bag file in the output. When this switch is not given, the invocation is written to the file's header, and the invocation may be viewed with rwfileinfo(1).

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
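The preference order used by best can be expressed as a short function. The available set stands in for whatever libraries a particular SiLK build found, and returning none when no library is present is an assumption not stated above.

```python
def choose_best(available, writing_to_file):
    """Pick a compression method following the documented 'best' rules."""
    if not writing_to_file:
        return "none"  # best only compresses output written to a file
    for method in ("lzo1x", "snappy", "zlib"):
        if method in available:
            return method
    return "none"  # assumption: no library available means no compression


print(choose_best({"zlib", "snappy"}, writing_to_file=True))   # snappy
print(choose_best({"zlib", "snappy"}, writing_to_file=False))  # none
```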

--output-path=PATH

Write the binary Aggregate Bag output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwaggbagbuild exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwaggbagbuild to exit with an error.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwaggbagbuild searches for the site configuration file in the locations specified in the "FILES" section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwaggbagbuild opens each named file in turn and reads text from it as if the filenames had been listed on the command line.
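Reading an --xargs list amounts to collecting one file name per line. The sketch below does that from any text stream; read_xargs is a hypothetical name, and skipping blank lines is an assumption.

```python
import io


def read_xargs(stream):
    """Collect input file names from a text stream, one name per line."""
    names = []
    for raw in stream:
        name = raw.rstrip("\r\n")
        if name:  # assumption: empty lines are skipped
            names.append(name)
    return names


listing = io.StringIO("rec.txt\n/tmp/more-data.txt\n\n")
print(read_xargs(listing))  # ['rec.txt', '/tmp/more-data.txt']
```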

--help

Print the available options and exit.

--help-fields

Print the names and descriptions of the keys and counters that may be used in the --fields and --constant-field switches and exit. Since SiLK 3.22.0.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is used to indicate a wrapped line.

Assume the following textual data in the file rec.txt:

             dIP|dPort|   packets|     bytes|
   10.245.15.175|   80|       127|     12862|
 192.168.251.186|29222|       131|    351213|
  10.247.186.130|   80|       596|     38941|
 192.168.239.224|29362|       600|    404478|
 192.168.215.219|   80|       400|     32375|
   10.255.252.19|28925|       404|   1052274|
 192.168.255.249|   80|       112|      7412|
    10.208.7.238|29246|       109|    112977|
 192.168.254.127|   80|       111|      9759|
   10.218.34.108|29700|       114|    461845|

To create an Aggregate Bag file from this data, provide the --fields switch with the names used by the Aggregate Bag tools:

 $ rwaggbagbuild --fields=dipv4,dport,sum-packets,sum-bytes  \
        --output-path=ab.aggbag rec.txt

Use the rwaggbagcat(1) tool to view it:

 $ rwaggbagcat ab.aggbag
           dIPv4|dPort|    sum-packets|           sum-bytes|
    10.208.7.238|29246|            109|              112977|
   10.218.34.108|29700|            114|              461845|
   10.245.15.175|   80|            127|               12862|
  10.247.186.130|   80|            596|               38941|
   10.255.252.19|28925|            404|             1052274|
 192.168.215.219|   80|            400|               32375|
 192.168.239.224|29362|            600|              404478|
 192.168.251.186|29222|            131|              351213|
 192.168.254.127|   80|            111|                9759|
 192.168.255.249|   80|            112|                7412|

Create an Aggregate Bag that counts how many times each destination port appears in rec.txt. Ignore every field except dPort, and use --constant-field to add a records counter (abbreviated here as record) whose value is 1 for each input line:

 $ rwaggbagbuild --fields=ignore,dport,ignore,ignore  \
        --constant-field=record=1 rec.txt             \
   | rwaggbagcat
 dPort|   records|
    80|         5|
 28925|         1|
 29222|         1|
 29246|         1|
 29362|         1|
 29700|         1|

Alternatively, use rwaggbagtool(1) to get the same information from the ab.aggbag file created above:

 $ rwaggbagtool --select-fields=dport        \
        --insert-field=record=1 ab.aggbag    \
   | rwaggbagcat
 dPort|   records|
    80|         5|
 28925|         1|
 29222|         1|
 29246|         1|
 29362|         1|
 29700|         1|

ENVIRONMENT

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided.

SILK_CONFIG_FILE

This environment variable is used as the value for --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwaggbagbuild may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwaggbagbuild may use this environment variable. See the "FILES" section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
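The lookup order can be pictured with the sketch below. It treats the list literally, skips entries whose environment variable is unset, and returns the first path that exists. How SiLK handles a ${SILK_CONFIG_FILE} that is set but missing, and which entries come from build-time defaults, are not specified here, so treat those details as assumptions; both function names are hypothetical.

```python
import os


def site_config_candidates(env=None):
    """List silk.conf locations in the documented search order."""
    env = os.environ if env is None else env
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(os.path.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        candidates.append(os.path.join(env["SILK_PATH"], "share", "silk", "silk.conf"))
        candidates.append(os.path.join(env["SILK_PATH"], "share", "silk.conf"))
    candidates += ["/usr/share/silk/silk.conf", "/usr/share/silk.conf"]
    return candidates


def find_site_config(env=None):
    """Return the first candidate that is an existing file, else None."""
    for path in site_config_candidates(env):
        if os.path.isfile(path):
            return path
    return None


print(site_config_candidates({"SILK_PATH": "/opt/silk"}))
```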

SEE ALSO

rwaggbag(1), rwaggbagcat(1), rwaggbagtool(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), ccfilter(3), silk.conf(5), silk(7), zlib(3)

NOTES

rwaggbagbuild and the other Aggregate Bag tools were introduced in SiLK 3.15.0.