NAME
rwgroup - Tag similar SiLK records with a common next hop IP value
SYNOPSIS
rwgroup
{--id-fields=KEY | --delta-field=FIELD --delta-value=DELTA}
[--objective] [--summarize] [--plugin=PLUGIN]
[--rec-threshold=THRESHOLD] [--group-offset=IP]
[--note-add=TEXT] [--note-file-add=FILE] [--output-path=PATH]
[--copy-input=PATH] [--compression-method=COMP_METHOD]
[--site-config-file=FILENAME] [--python-file=PATH ...]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[FILE]
rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwgroup --version
DESCRIPTION
rwgroup reads sorted SiLK Flow records (c.f. rwsort(1)) from the standard input or from a single file name listed on the command line, marks records that form a group with an identifier in the Next Hop IP field, and prints the binary SiLK Flow records to the standard output. In some ways rwgroup is similar to rwuniq(1), but rwgroup writes SiLK flow records instead of textual output.
Two SiLK records are defined as being in the same group when the fields specified in the --id-fields switch match exactly and when the field listed in the --delta-field switch matches within the value given by the --delta-value switch. Either --id-fields or --delta-field is required; both may be specified. A --delta-value must be given when --delta-field is present.
The records that make up the first group will have the value 0 written into their Next Hop IP field. Each subsequent group will have its Next Hop IP value incremented by 1. The --group-offset switch will change the initial group's Next Hop IP value.
The --rec-threshold switch may be used to only print groups that contain a certain number of records. The --summarize switch attempts to merge records in the same group to a single output record.
rwgroup requires that the records are sorted on the fields listed in the --id-fields and --delta-field switches. For example, a call using
rwgroup --id-field=2 --delta-field=9 --delta-value=3
should read the output of
rwsort --field=2,9
otherwise the results are unpredictable.
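The grouping rule and the sorting requirement described above can be modelled in a few lines of Python. This is only a sketch of the documented behavior under stated assumptions, not rwgroup's implementation: records are plain dictionaries, the function name assign_groups and the dictionary keys are hypothetical, and the delta field is assumed to hold a non-IP numeric value.

```python
def assign_groups(records, id_fields, delta_field=None, delta_value=0, offset=0):
    """Return one group number per record, comparing consecutive records.

    A new group starts whenever an id field differs from the previous
    record or the delta field differs by more than delta_value.
    """
    group_ids = []
    current = offset          # models --group-offset; the default is 0
    prev = None
    for rec in records:
        if prev is not None:
            same_key = all(rec[f] == prev[f] for f in id_fields)
            within = (delta_field is None
                      or abs(rec[delta_field] - prev[delta_field]) <= delta_value)
            if not (same_key and within):
                current += 1
        group_ids.append(current)
        prev = rec
    return group_ids


# Hypothetical flows, already sorted by destination IP and start time.
flows = [
    {"dip": "10.0.0.1", "stime": 100},
    {"dip": "10.0.0.1", "stime": 102},
    {"dip": "10.0.0.1", "stime": 110},
    {"dip": "10.0.0.2", "stime": 111},
]
sorted_ids = assign_groups(flows, ["dip"], "stime", 3)

# The same flows in a different order: related records are no longer adjacent.
shuffled = [flows[0], flows[3], flows[1], flows[2]]
unsorted_ids = assign_groups(shuffled, ["dip"], "stime", 3)

print(sorted_ids)    # [0, 0, 1, 2]
print(unsorted_ids)  # [0, 1, 2, 3]
```

In the shuffled input, the first two flows to 10.0.0.1 are split into separate groups because only adjacent records are compared, which is why the input should be sorted on the same fields.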
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
At least one value for --id-field or --delta-field must be provided; rwgroup will terminate with an error if no fields are specified.
- --id-fields=KEY
-
KEY contains the list of flow attributes (a.k.a. fields or columns) that must match exactly for flows to be considered part of the same group. Each field may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:
-
--id-fields=stime,10,1-5
-
There is no default value for the --id-fields switch.
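As an illustration of the KEY syntax, the following Python sketch expands a KEY string into a list of field numbers. It is a simplified model, not SiLK's parser: the function name parse_key is hypothetical, and the name table covers only the built-in fields numbered 1 through 12 in this page.

```python
# Field names and numbers for the first twelve built-in fields.
FIELD_NUMBERS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8,
    "stime": 9, "dur": 10, "etime": 11, "sensor": 12,
}


def parse_key(key):
    """Expand a comma separated KEY into an ordered list of field numbers."""
    fields = []
    for token in key.split(","):
        token = token.strip()
        if "-" in token and token.replace("-", "").isdigit():
            # A range of field-integers such as 1-5.
            start, end = (int(part) for part in token.split("-"))
            numbers = list(range(start, end + 1))
        elif token.isdigit():
            numbers = [int(token)]
        else:
            # Field-names are case insensitive.
            numbers = [FIELD_NUMBERS[token.lower()]]
        for number in numbers:
            if number in fields:
                # Each field may be specified once only.
                raise ValueError(f"field {number} specified more than once")
            fields.append(number)
    return fields


print(parse_key("stime,10,1-5"))  # [9, 10, 1, 2, 3, 4, 5]
```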
-
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.
- sIP,1
-
source IP address
- dIP,2
-
destination IP address
- sPort,3
-
source port for TCP and UDP, or equivalent
- dPort,4
-
destination port for TCP and UDP, or equivalent
- protocol,5
-
IP protocol
- packets,pkts,6
-
packet count
- bytes,7
-
byte count
- flags,8
-
bit-wise OR of TCP flags over all packets
- sTime,9
-
starting time of flow (seconds resolution)
- dur,10
-
duration of flow (seconds resolution)
- eTime,11
-
end time of flow (seconds resolution)
- sensor,12
-
name or ID of sensor at the collection point
- class,20
-
class of sensor at the collection point
- type,21
-
type of sensor at the collection point
- icmpTypeCode,25
-
the ICMP type and code
- initialFlags,26
-
TCP flags on first packet in the flow
- sessionFlags,27
-
bit-wise OR of TCP flags over all packets except the first in the flow
- attributes,28
-
flow attributes set by the flow generator:
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
- application,29
-
guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
- stype,16
-
categorize the source IP address as non-routable, internal, or external and group based on the category. See addrtype(3).
- dtype,17
-
as stype for the destination IP address
- scc,18
-
the country code of the source IP address. See ccfilter(3).
- dcc,19
-
as scc for the destination IP
- src-MAPNAME
-
value determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the --pmap-file switch and the pmapfilter(3) manual page.
- dst-MAPNAME
-
as src-MAPNAME for the destination IP or protocol/destination-port.
- sval
- dval
-
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.
- --delta-field=FIELD
-
Specify a single field that can differ by a specified delta-value among the SiLK records that make up a group. The FIELD identifiers include most of those specified for --id-fields. The exceptions are that plug-in fields are not supported, nor are fields that do not have numeric values (e.g., class, type, flags). The most common value for this switch is stime, which allows records that are identical in the id-fields but temporally far apart to be in different groups. The switch takes a single argument; multiple delta fields cannot be specified. When this switch is specified, the --delta-value switch is required.
- --delta-value=DELTA_VALUE
-
Specify the acceptable difference between the values of the --delta-field. The --delta-value switch is required when the --delta-field switch is provided. For fields other than those holding IPs, two consecutive records are considered members of the same group when their values differ by DELTA_VALUE or less. When the delta-field refers to an IP field, DELTA_VALUE is the number of least significant bits of the IPs to remove before comparing them. For example, when --delta-field=sIP --delta-value=8 is specified, two records are in the same group if their source IPv4 addresses belong to the same /24 or if their source IPv6 addresses belong to the same /120. The --objective switch affects the meaning of this switch.
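The IP case can be illustrated with Python's standard ipaddress module. This is a sketch of the comparison as described here, not rwgroup's code; the function name same_ip_group is hypothetical, and treating addresses of different families as never matching is an assumption made only for this example.

```python
import ipaddress


def same_ip_group(ip_a, ip_b, delta_value):
    """Compare two addresses after dropping delta_value low-order bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        # Assumption for this sketch: mixed IPv4/IPv6 pairs never match.
        return False
    return (int(a) >> delta_value) == (int(b) >> delta_value)


# Removing 8 bits leaves a /24 for IPv4 and a /120 for IPv6.
print(same_ip_group("192.0.2.10", "192.0.2.200", 8))    # True
print(same_ip_group("192.0.2.10", "192.0.3.10", 8))     # False
print(same_ip_group("2001:db8::1", "2001:db8::ff", 8))  # True
print(same_ip_group("2001:db8::1", "2001:db8::100", 8)) # False
```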
- --objective
-
Change the behavior of the --delta-value switch so that a record is considered part of a group if the value of its --delta-field is within the DELTA_VALUE of the first record in the group. (When this switch is not specified, consecutive records are compared.)
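The difference between the two comparison modes can be seen in a small Python model. This is a sketch of the documented behavior only; the function name group_by_delta is hypothetical, and the input is a plain sorted list of numeric delta-field values (for example, start times in seconds).

```python
def group_by_delta(values, delta_value, objective=False):
    """Split sorted values into groups using the delta rule.

    Without objective, each value is compared with the previous value.
    With objective, each value is compared with the first value of the
    current group.
    """
    groups = []
    for value in values:
        if groups:
            reference = groups[-1][0] if objective else groups[-1][-1]
            if abs(value - reference) <= delta_value:
                groups[-1].append(value)
                continue
        groups.append([value])
    return groups


stimes = [0, 8, 16, 24]
print(group_by_delta(stimes, 10))                  # [[0, 8, 16, 24]]
print(group_by_delta(stimes, 10, objective=True))  # [[0, 8], [16, 24]]
```

With consecutive comparison a chain of closely spaced records can grow without bound; with --objective the span of a group is limited to DELTA_VALUE from its first record.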
- --summarize
-
Cause rwgroup to print (typically) a single record for each group. By default, all records in each group having at least --rec-threshold members are printed. When --summarize is active, the record that is written for the group is the first record in the group with the following modifications:
-
The packets and bytes values are the sum of the packets and bytes values, respectively, for all records in the group.
-
The start-time value is the earliest start time for the records in the group.
-
The end-time value is the latest end time for the records in the group.
-
The flags and session-flags values are the bitwise-OR of all flags and session-flags values, respectively, for the records in the group.
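The four modifications above can be sketched in Python as follows. This is an illustrative model rather than rwgroup's implementation: records are dictionaries with hypothetical key names, flags are plain integers, and the sketch ignores the case where a summed value is too large to store in a single record.

```python
def summarize_group(group):
    """Merge the records of one group into a single summary record."""
    summary = dict(group[0])  # start from the first record in the group
    summary["packets"] = sum(r["packets"] for r in group)
    summary["bytes"] = sum(r["bytes"] for r in group)
    summary["stime"] = min(r["stime"] for r in group)
    summary["etime"] = max(r["etime"] for r in group)
    flags = 0
    session_flags = 0
    for r in group:
        flags |= r["flags"]
        session_flags |= r["session_flags"]
    summary["flags"] = flags
    summary["session_flags"] = session_flags
    return summary


# Two hypothetical records belonging to the same group.
group = [
    {"sip": "10.0.0.1", "packets": 3, "bytes": 180, "stime": 100,
     "etime": 105, "flags": 0x12, "session_flags": 0x10},
    {"sip": "10.0.0.1", "packets": 5, "bytes": 400, "stime": 103,
     "etime": 120, "flags": 0x11, "session_flags": 0x11},
]
merged = summarize_group(group)
print(merged["packets"], merged["bytes"], merged["stime"], merged["etime"])
# 8 580 100 120
```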
- --plugin=PLUGIN
-
Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The creation of these plug-ins is beyond the scope of this manual page. When PLUGIN contains a slash (/), rwgroup assumes the path to PLUGIN is correct. Otherwise, rwgroup will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwgroup does not find the file, it assumes the plug-in is in the current directory. To force rwgroup to look in the current directory first, specify --plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwgroup prints status messages to the standard error as it tries to open each of its plug-ins.
- --rec-threshold=THRESHOLD
-
Specify the minimum number of SiLK records a group must contain before the records in the group are written to the output stream. The default is 1; i.e., write all records. The maximum threshold is 65535.
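The effect of the threshold can be modelled in Python as a filter on group sizes. This is a sketch only: the function name apply_threshold is hypothetical, the input is a list holding one group number per record, and the sketch says nothing about how rwgroup numbers the groups that remain.

```python
from collections import Counter


def apply_threshold(group_ids, threshold=1):
    """Return the positions of records whose group has enough members."""
    sizes = Counter(group_ids)
    return [i for i, g in enumerate(group_ids) if sizes[g] >= threshold]


group_ids = [0, 0, 1, 2, 2, 2]
print(apply_threshold(group_ids))     # [0, 1, 2, 3, 4, 5]
print(apply_threshold(group_ids, 2))  # [0, 1, 3, 4, 5]
```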
- --group-offset=IP
-
Specify the value to write into the Next Hop IP for the records that comprise the first group. The value IP may be an integer, or an IPv4 or IPv6 address in the canonical presentation form. If not specified, counting begins at 0. The value for each subsequent group is incremented by 1.
- --note-add=TEXT
-
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
- --note-file-add=FILENAME
-
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
- --copy-input=PATH
-
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the --output-path switch has been used to redirect rwgroup's output.
- --output-path=PATH
-
Determines where the output of rwgroup is written. If this option is not given, output is written to the standard output.
- --compression-method=COMP_METHOD
-
Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
- none
-
Do not compress the output using an external library
- zlib
-
Use the zlib(3) library for compressing the output
- lzo1x
-
Use the lzo1x algorithm from the LZO real time compression library for compression
- best
-
Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.
- --site-config-file=FILENAME
-
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
- --help
-
Print the available options and exit. Options that add fields can be specified before --help so that the new options appear in the output.
- --version
-
Print the version number and information about how SiLK was configured, then exit the application.
- --pmap-file=MAPNAME:PATH
- --pmap-file=PATH
-
When the prefix map plug-in is used, rwgroup reads the mapping file located at PATH. When MAPNAME is provided, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwgroup will check the prefix map file to see if a map-name was specified when the file was created. Using multiple --pmap-file switches allows additional prefix map files to be read as long as each uses a unique map-name. For more information, see pmapfilter(3).
- --python-file=PATH
-
When the SiLK Python plug-in is used, rwgroup reads the Python code from the file PATH to define additional fields that can be used as part of the group key. This file should call register_plugin_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.
Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field's value is always 0.
To illustrate the T and C attributes, consider a long-running ssh session that exceeds the flow generator's
active timeout. (This is the active timeout since the flow
generator creates a flow for a connection that still has activity).
The flow generator will create multiple flow records for this ssh
session, each spanning some portion of the total session. The first
flow record will be marked with a T indicating that it hit the
timeout. The second through next-to-last records will be marked with
TC indicating that this flow both timed out and is a continuation
of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
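The labelling in this ssh example can be written as a small Python function. It models only the pattern described here, not the behavior of any particular flow generator; the function name continuation_attributes is hypothetical, and returning an empty label for a session that fits in one record is an assumption.

```python
def continuation_attributes(num_records):
    """Return the T/C attribute label for each record of one session."""
    if num_records < 1:
        raise ValueError("a session produces at least one flow record")
    if num_records == 1:
        # Assumption: a session that never hit the timeout gets no label.
        return [""]
    # First record timed out, middle records timed out and continue a
    # previous record, and the final record only continues one.
    return ["T"] + ["TC"] * (num_records - 2) + ["C"]


print(continuation_attributes(4))  # ['T', 'TC', 'TC', 'C']
```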
The list of built-in fields may be augmented by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwgroup automatically looks for the following plug-ins:
ADDRESS TYPE (addrtype.so)
COUNTRY CODE (ccfilter.so)
PREFIX MAP (pmapfilter.so)
Note that when --summarize is used, multiple records for a group may be printed if the bytes, packets, or elapsed time values are too large to be stored in a SiLK flow record.
LIMITATIONS
rwgroup requires sorted data. The application works by comparing records in the order in which they are received (similar to the UNIX uniq(1) command), so input that is not sorted on the grouping fields will produce unexpected groupings.
EXAMPLES
As a rule of thumb, the --id-fields and --delta-field parameters should match rwsort(1)'s call, with --delta-field being the last parameter. A call to group all web traffic by queries from the same addresses (field=2) within 10 seconds (field=9) of the first query from that address will be:
rwfilter --proto=6 --dport=80 --pass=stdout | \
rwsort --field=2,9 | \
rwgroup --id-field=2 --delta-field=9 --delta-value=10 \
--objective
ENVIRONMENT
- PYTHONPATH
-
This environment variable is used by Python to locate modules. When --python-file is specified, rwgroup loads Python which in turn loads the PySiLK module which is comprised of several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python's normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.
- SILK_PYTHON_TRACEBACK
-
When set, Python plug-ins will output traceback information on Python errors to the standard error.
- SILK_COUNTRY_CODES
-
This environment variable allows the user to specify the country code mapping file that the ccfilter(3) plug-in will use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.
- SILK_CONFIG_FILE
-
This environment variable is used as the value for the --site-config-file when that switch is not provided.
- SILK_DATA_ROOTDIR
-
When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwgroup looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
- SILK_PATH
-
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwgroup checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwgroup looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
- SILK_PLUGIN_DEBUG
-
When set to 1, rwgroup prints status messages to the standard error as it tries to open each of its plug-ins.
SEE ALSO
rwfilter(1), rwfileinfo(1), rwsort(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3), silkpython(3), pysilk(3), uniq(1), yaf(1), zlib(3)


