NAME

rwuniq - Bin SiLK Flow records by a key and print each bin's volume

SYNOPSIS

  rwuniq --fields=KEY [--values=VALUES]
        [--threshold=MIN-MAX --threshold=MIN]
        [--presorted-input] [--sort-output]
        [{--bin-time=SECONDS | --bin-time}]
        [--timestamp-format=FORMAT] [--epoch-time]
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
        [--integer-sensors] [--integer-tcp-flags]
        [--no-titles] [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
        [--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]
        [--all-counts] [{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
        [{--packets | --packets=MIN | --packets=MIN-MAX}]
        [{--flows | --flows=MIN | --flows=MIN-MAX}]
        [--stime] [--etime]
        [{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
        [{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [--site-config-file=FILENAME]
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]
        [--python-file=PATH [--python-file=PATH ...]]
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--pmap-column-width=NUM]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

  rwuniq --version

DESCRIPTION

rwuniq reads SiLK Flow records and groups them by a key composed of user-specified attributes of the flows. For each group (or bin), a collection of user-specified aggregate values is computed; these values are typically related to the volume of the bin, such as the sum of the bytes fields for all records that match the key. Once all the SiLK Flow records are read, the key fields and the aggregate values are printed. For some of the built-in aggregate values, it is possible to limit the output to the bins where the aggregate value meets a user-specified minimum and/or maximum.

There is no need to sort the input to rwuniq since rwuniq normally rearranges the records as they are read. To have rwuniq sort its output, use the --sort-output switch.

rwuniq reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file is uncompressed as it is read. When the --xargs switch is provided, rwuniq reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

The user must provide the --fields switch to select the flow attribute(s) (or field(s)) that comprise the key for each bin. The available fields are similar to those supported by rwcut(1); see the description of the --fields switch in the "OPTIONS" section below for the details. The list of fields can be extended by loading PySiLK files (see silkpython(3)) or plug-ins (silk-plugin(3)). The fields are printed in the order in which they occur in the --fields switch. The size of the key is limited to 256 octets. A larger key uses the available memory more quickly, leading to slower performance.

The aggregate value(s) to compute for each bin are also chosen by the user. As with the key fields, the user can extend the list of aggregate fields by using PySiLK or plug-ins. Specify the aggregate fields with the --values switch; the aggregate fields are printed in the order they occur in the --values switch. If the user does not provide --values or a --threshold switch (described next), rwuniq defaults to computing the number of flow records for each bin. As with the key fields, requesting more aggregate values slows performance.
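The binning described above can be modelled in a few lines of Python. This is only a sketch of the behaviour, not rwuniq's implementation; the record dictionaries and the bin_records helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def bin_records(records, key_fields, value_field=None):
    """Group records by key; count records, or sum value_field if given."""
    bins = defaultdict(int)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        bins[key] += rec[value_field] if value_field else 1
    return dict(bins)

# Hypothetical flow records (only the attributes needed here).
flows = [
    {"sport": 53, "proto": 17, "bytes": 120},
    {"sport": 53, "proto": 17, "bytes": 80},
    {"sport": 80, "proto": 6, "bytes": 1500},
]

# With no value field, each bin holds a record count (the default behaviour).
record_counts = bin_records(flows, ["sport"])
# A two-field key with a summed byte count per bin.
byte_sums = bin_records(flows, ["sport", "proto"], "bytes")
```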

The --threshold switch (added in SiLK 3.17.0) allows the user to print only bins where a value field is within a certain range. The switch's argument contains the name of the value field, an equals sign, the minimum value (start of the range), and optionally a hyphen and the maximum value (end of the range); e.g., --threshold=bytes=1000-2000. The upper bound is unlimited when no maximum is specified. The --threshold switch may be repeated to set multiple thresholds, and only those bins that meet all thresholds are printed. Each field named by --threshold is appended to the set of aggregate value fields unless that field was named in the --values switch.
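As a rough illustration of the threshold syntax and the "all thresholds must be met" rule, the following Python sketch parses arguments of the form FIELD=MIN or FIELD=MIN-MAX and filters a set of bins. The function names and the sample bins are assumptions made for this example; rwuniq's own parser is not shown here.

```python
def parse_threshold(arg):
    """Split 'FIELD=MIN' or 'FIELD=MIN-MAX' into (field, min, max)."""
    field, _, bounds = arg.partition("=")
    lo, sep, hi = bounds.partition("-")
    # No hyphen means no maximum: the upper bound is unlimited.
    return field.lower(), int(lo), (int(hi) if sep else None)

def meets_all(bin_values, thresholds):
    """True when the bin satisfies every (field, min, max) threshold."""
    for field, lo, hi in thresholds:
        value = bin_values[field]
        if value < lo or (hi is not None and value > hi):
            return False
    return True

thresholds = [parse_threshold("bytes=1000-2000"), parse_threshold("records=2")]

# Hypothetical bins keyed by source port.
bins = {
    53: {"bytes": 1500, "records": 3},
    80: {"bytes": 2500, "records": 9},
    22: {"bytes": 1200, "records": 1},
}
printed = [key for key, vals in bins.items() if meets_all(vals, thresholds)]
```

Only the bin for port 53 satisfies both ranges, so it is the only one kept.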

The --presorted-input switch may allow rwuniq to process data more efficiently by causing rwuniq to assume the input has been previously sorted with the rwsort(1) command. With this switch, rwuniq typically does not need large amounts of memory because it does not bin each flow; instead, it keeps a running summation and outputs the bin whenever the key changes. For the output to be meaningful, rwsort and rwuniq must be invoked with the same --fields value. When multiple input files are specified and --presorted-input is given, rwuniq merge-sorts the flow records from the input files. rwuniq typically runs faster if you do not include the --presorted-input switch when counting distinct values, even when reading sorted input. Finally, you may get unusual results with --presorted-input when the --fields switch contains multiple time-related key fields (sTime, duration, eTime), or when the time-related key is not the final key listed in --fields; see the "NOTES" section for details.
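The running summation described above can be sketched with itertools.groupby, which also emits a group each time the key changes. This is a simplified model under the assumption that records are plain dictionaries; it is not how rwuniq is written.

```python
from itertools import groupby

def running_bins(sorted_records, key_fields, value_field):
    """Yield (key, sum) each time the key changes; holds one bin in memory."""
    def keyfunc(rec):
        return tuple(rec[f] for f in key_fields)
    for key, group in groupby(sorted_records, key=keyfunc):
        yield key, sum(rec[value_field] for rec in group)

# Hypothetical records, already sorted by the same key used for binning.
flows = [
    {"sport": 22, "bytes": 40},
    {"sport": 22, "bytes": 60},
    {"sport": 53, "bytes": 100},
]
result = list(running_bins(flows, ["sport"], "bytes"))

# Feeding unsorted input produces a separate row each time the key reappears.
unsorted_result = list(
    running_bins([flows[0], flows[2], flows[1]], ["sport"], "bytes"))
```

The second call shows why the input must be sorted on the same fields: the key 22 appears in two separate output rows.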

rwuniq attempts to keep all key and aggregate value data in the computer's memory. If rwuniq runs out of memory, the current key and aggregate value data is written to a temporary file. Once all input has been processed, the data from the temporary files is merged to produce the final output. By default, these temporary files are stored in the /tmp directory. Because these files can be large, it is strongly recommended that /tmp not be used as the temporary directory. To modify the temporary directory used by rwuniq, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The --fields switch is required. rwuniq fails when it is not provided.

--fields=KEY

KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key into which flows are binned. The columns are displayed in the order the fields are specified. Each field may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:

 --fields=stime,10,1-5

There is no default value for the --fields switch; the switch must be specified.
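To make the KEY syntax concrete, here is a small Python sketch that expands a mixture of field-names, field-integers, and ranges into an ordered list of names. It uses only a subset of the field-integers listed in this section, ignores aliases such as pkts, and does not handle prefix-map field names that contain hyphens; expand_fields is a hypothetical helper, not part of SiLK.

```python
# Subset of the field-integer table given in this manual page.
FIELD_NAMES = {1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
               6: "packets", 7: "bytes", 8: "flags", 9: "sTime",
               10: "duration", 11: "eTime", 12: "sensor"}
BY_NAME = {name.lower(): name for name in FIELD_NAMES.values()}

def expand_fields(spec):
    """Expand a KEY such as 'stime,10,1-5' into an ordered list of names."""
    fields = []
    for item in spec.split(","):
        if "-" in item:                      # range of field-integers
            start, end = (int(n) for n in item.split("-"))
            names = [FIELD_NAMES[i] for i in range(start, end + 1)]
        elif item.isdigit():                 # single field-integer
            names = [FIELD_NAMES[int(item)]]
        else:                                # case-insensitive field-name
            names = [BY_NAME[item.lower()]]
        for name in names:
            if name in fields:               # each field may appear once only
                raise ValueError(f"field {name} specified more than once")
            fields.append(name)
    return fields

key = expand_fields("stime,10,1-5")
```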

The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent. See note at iType.

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sTime,9

starting time of flow (seconds resolution unless --bin-time includes fractional seconds). When the time-related fields sTime,duration,eTime are all in use, rwuniq ignores the final time field when binning the records.

duration,10

duration of flow (seconds resolution unless --bin-time includes fractional seconds). This field is not adjusted by --bin-time unless --fields includes both sTime and eTime. See note at sTime,9.

eTime,11

end time of flow (seconds resolution unless --bin-time includes fractional seconds). See note at sTime,9.

sensor,12

name or ID of the sensor where the flow was collected

class,20

class assigned to the flow by rwflowpack(8). Binning by class and/or type equates to binning by the integer value used internally to represent the class/type pair. When --fields contains class but not type, rwuniq's output may contain multiple rows with the same value(s) for the key field(s).

type,21

type assigned to the flow by rwflowpack(8). See note on previous entry.

iType

the ICMP type value for ICMP or ICMPv6 flows and empty (numerically zero) for non-ICMP flows. Internally, SiLK stores the ICMP type and code in the dPort field. To avoid getting very odd results, either do not use the dPort field when your key includes ICMP field(s) or be certain to include the protocol field as part of your key. This field was introduced in SiLK 3.8.1.

iCode

the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at iType.

icmpTypeCode,25

equivalent to iType,iCode when used in --fields. This field may not be mixed with iType or iCode, and this field is deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, icmpTypeCode may no longer be used as the argument to the Distinct: value field; the dPort field provides an equivalent result as long as the input is limited to ICMP flow records.

Many SiLK file formats do not store the following fields and their values are always 0; they are listed here for completeness:

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

nhIP,15

router next hop IP

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field's value is always 0.

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator:

S

all the packets in this flow record are exactly the same size

F

flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)

T

flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a 32-bit value.)

C

flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).

Consider a long-running ssh session that exceeds the flow generator's active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity). The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T indicating that it hit the timeout. The second through next-to-last records will be marked with TC indicating that this flow both timed out and is a continuation of a flow that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
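The marking pattern in the ssh example can be written down as a short Python sketch. It only models the T and C attributes for a single session that the flow generator split into several records; timeout_attributes is a hypothetical name used for illustration.

```python
def timeout_attributes(num_records):
    """Attributes for each record of one session split by the active timeout."""
    attrs = []
    for i in range(num_records):
        first = i == 0
        last = i == num_records - 1
        # Every record but the last hit the timeout (T); every record but
        # the first continues an earlier record (C).
        attrs.append(("" if last else "T") + ("" if first else "C"))
    return attrs

labels = timeout_attributes(4)
```

A session that fits in a single record gets neither attribute.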

application,29

guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).

The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:

sType,16

for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).

dType,17

as sType for the destination IP address

scc,18

for the source IP address, a two-letter country code abbreviation denoting the country where that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The abbreviations are those defined by ISO 3166-1 (see for example https://www.iso.org/iso-3166-country-codes.html or https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) or the following special codes: -- N/A (e.g. private and experimental reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other

dcc,19

as scc for the destination IP

src-map-name

label contained in the prefix map file associated with map-name. If the prefix map is for IP addresses, the label is that associated with the source IP address. If the prefix map is for protocol/port pairs, the label is that associated with the protocol and source port. See also the description of the --pmap-file switch below and the pmapfilter(3) manual page.

dst-map-name

as src-map-name for the destination IP address or the protocol and destination port.

sval

as src-map-name when no map-name is associated with the prefix map file

dval

as dst-map-name when no map-name is associated with the prefix map file

Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.

--values=VALUES

Specify the aggregate values to compute for each bin as a comma separated list of names. Names are case insensitive. When the --threshold switch specifies an aggregate value field that does not appear in VALUES, that field is appended to VALUES. When neither the --values switch nor any --threshold switch is specified, rwuniq counts the number of flow records for each bin. The aggregate fields are printed in the order they occur in VALUES. The names of the built-in value fields follow. This list can be augmented through the use of PySiLK and plug-ins.

Records

Count the number of flow records that mapped to each bin.

Packets

Sum the number of packets across all records that mapped to each bin.

Bytes

Sum the number of bytes across all records that mapped to each bin.

sTime-Earliest

Keep track of the earliest start time (minimum time) seen across all records that mapped to each bin, in seconds resolution. The --bin-time switch does not normally affect this value; however, this value uses milliseconds resolution when --bin-time includes fractional seconds.

eTime-Latest

Keep track of the latest end time (maximum time) seen across all records that mapped to each bin, in seconds resolution. The --bin-time switch does not normally affect this value; however, this value uses milliseconds resolution when --bin-time includes fractional seconds.

sIP-Distinct

Count the number of distinct source IP addresses that were seen for each bin, an alias for Distinct:sIP.

dIP-Distinct

Count the number of distinct destination IP addresses that were seen for each bin, an alias for Distinct:dIP.

Distinct:KEY_FIELD

Count the number of distinct values for KEY_FIELD, where KEY_FIELD is any field that can be used as an argument to --fields except icmpTypeCode. For example, Distinct:sPort counts the number of distinct source ports for each bin. When this aggregate value field is used, the specified KEY_FIELD cannot be present in the argument to --fields.

Flows

Count the number of flow records that mapped to each bin; an alias for Records.
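The following Python sketch models several of the built-in value fields at once: a record count, the earliest start time, the latest end time, and a distinct count. The records are hypothetical dictionaries with integer times, and aggregate is an illustrative helper rather than anything provided by SiLK.

```python
def aggregate(records, key_field, distinct_field):
    """Per bin: record count, earliest start, latest end, distinct count."""
    bins = {}
    for rec in records:
        b = bins.setdefault(rec[key_field], {
            "records": 0, "stime_earliest": rec["stime"],
            "etime_latest": rec["etime"], "seen": set()})
        b["records"] += 1
        b["stime_earliest"] = min(b["stime_earliest"], rec["stime"])
        b["etime_latest"] = max(b["etime_latest"], rec["etime"])
        b["seen"].add(rec[distinct_field])   # set keeps one copy per value
    return {key: {"Records": b["records"],
                  "sTime-Earliest": b["stime_earliest"],
                  "eTime-Latest": b["etime_latest"],
                  "Distinct": len(b["seen"])}
            for key, b in bins.items()}

flows = [
    {"dip": "192.0.2.1", "sport": 1025, "stime": 100, "etime": 130},
    {"dip": "192.0.2.1", "sport": 1026, "stime": 90, "etime": 95},
    {"dip": "192.0.2.1", "sport": 1025, "stime": 110, "etime": 140},
    {"dip": "192.0.2.9", "sport": 53, "stime": 50, "etime": 51},
]
# Key on dIP and count distinct source ports, as Distinct:sPort would.
summary = aggregate(flows, "dip", "sport")
```

Note that the distinct field (sport) is not part of the key, matching the restriction stated for Distinct:KEY_FIELD.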

--plugin=PLUGIN

Augment the list of key fields and/or aggregate value fields by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwuniq attempts to find a file named PLUGIN in the directories listed in the "FILES" section. If rwuniq finds the file, it uses that path. If PLUGIN contains a slash or if rwuniq does not find the file, rwuniq relies on your operating system's dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwuniq prints status messages to the standard error as it attempts to find and open each of its plug-ins.

--threshold=VALUE_FIELD=MIN-MAX
--threshold=VALUE_FIELD=MIN

Limit the output of rwuniq to the bins where the value of the aggregate value field VALUE_FIELD is not less than MIN and not more than MAX. If MAX is not given, limit the output to the bins where the value of VALUE_FIELD is at least MIN. The VALUE_FIELD argument is case insensitive and may be abbreviated to the shortest unique prefix. This switch may be repeated to set thresholds for multiple fields, and rwuniq only prints bins that meet all thresholds. A MIN of 0 is treated as 1. If VALUE_FIELD is not present in the argument to the --values switch, it is appended to those aggregate values. VALUE_FIELD may be Records (or Flows), Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:KEY_FIELD. Setting thresholds for aggregate value fields defined by plug-ins is not supported. Since SiLK 3.17.0.

Miscellaneous options:

--presorted-input

Cause rwuniq to assume that it is reading sorted input; i.e., that rwuniq's input file(s) were generated by rwsort(1) using the exact same value for the --fields switch. When no distinct counts are being computed, rwuniq can process its input without needing to write temporary files. When multiple input files are specified, rwuniq merge-sorts the flow records from the input files. See the "NOTES" section for issues that may occur when using --presorted-input.

--sort-output

Cause rwuniq to present the output in sorted numerical order. The key rwuniq uses for sorting is the same key it uses to index each bin.

--bin-time=SECONDS
--bin-time

Adjust the times in the key fields sTime and eTime to appear on SECONDS-second boundaries (the floor of the time is used). As of SiLK 3.17.0, SECONDS may be a fractional value of 0.001 or greater, and rwuniq uses millisecond timestamps when SECONDS includes a fractional value that is non-zero. When this switch is not specified, times appear on 1-second boundaries. When the switch is used but no argument is given, rwuniq uses 60-second time bins. (When the start-time is the only key field and time binning is desired, consider using rwcount(1) instead.)
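Taking the floor of a time to a bin boundary can be sketched as follows. The example works in integer milliseconds to avoid floating-point rounding; that representation and the bin_time_ms name are assumptions of this sketch.

```python
def bin_time_ms(timestamp_ms, bin_seconds=60.0):
    """Floor a millisecond timestamp to the start of its bin."""
    bin_ms = round(bin_seconds * 1000)
    return timestamp_ms - timestamp_ms % bin_ms

t = 1_700_000_123_750            # hypothetical start time in milliseconds

minute_bin = bin_time_ms(t)            # switch given with no argument: 60 s
second_bin = bin_time_ms(t, 1)         # switch absent: 1-second boundaries
half_second_bin = bin_time_ms(t, 0.5)  # fractional SECONDS
```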

--timestamp-format=FORMAT

Specify the format and/or timezone to use when printing timestamps. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a default format and/or timezone. If it is empty or contains invalid values, timestamps are printed in the default format, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
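For readers who post-process rwuniq output, the named formats correspond roughly to the strftime patterns in this Python sketch. The mapping is an interpretation of the descriptions above, it always uses UTC, and format_timestamp is a hypothetical helper.

```python
from datetime import datetime, timezone

# strftime patterns matching the format descriptions above.
FORMATS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}

def format_timestamp(epoch_seconds, fmt="default"):
    """Render an epoch time in UTC using one of the named formats."""
    if fmt == "epoch":
        return str(epoch_seconds)
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[fmt])

stamp = 1234567890
rendered = {name: format_timestamp(stamp, name)
            for name in ("default", "iso", "m/d/y", "epoch")}
```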

--epoch-time

Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--ip-format=FORMAT

Specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical. Since SiLK 3.7.0.

canonical

Print IP addresses in the canonical format. If the key only contains IPv4 addresses, use dot-separated decimal (192.0.2.1). Otherwise, use colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).

no-mixed

Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.

decimal

Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.

zero-padded

Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.
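The decimal, hexadecimal, and zero-padded examples above can be reproduced with Python's standard ipaddress module. This sketch is meant for checking values by hand; format_ip is a hypothetical helper, and its fallback branch simply returns Python's own textual form rather than implementing the canonical and no-mixed rules.

```python
import ipaddress

def format_ip(text, fmt="canonical"):
    """Render an address in the style of one --ip-format argument."""
    addr = ipaddress.ip_address(text)
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return format(int(addr), "x")
    if fmt == "zero-padded":
        if addr.version == 4:
            # Pad each octet to three digits.
            return ".".join(f"{octet:03d}" for octet in addr.packed)
        return addr.exploded     # eight groups of four hex digits
    return str(addr)             # Python's textual form, not SiLK's rules

v4, v6 = "192.0.2.1", "2001:db8::1"
examples = {fmt: (format_ip(v4, fmt), format_ip(v6, fmt))
            for fmt in ("decimal", "hexadecimal", "zero-padded")}
```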

The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.

map-v4

Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.

unmap-v6

When the key contains IPv6 addresses, change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.

The following argument is also available:

force-ipv6

Set FORMAT to map-v4,no-mixed.

--integer-ips

Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--zero-pad-ips

Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.

--integer-sensors

Print the integer ID of the sensor rather than its name.

--integer-tcp-flags

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.

--no-titles

Turn off column titles. By default, titles are printed.

--no-columns

Disable fixed-width columnar output.

--column-separator=C

Use specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.

--no-final-delimiter

Do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.

--print-filenames

Print to the standard error the names of input files as they are opened.

--copy-input=PATH

Copy all binary SiLK Flow records read as input to the specified file or named pipe. PATH may be stdout or - to write flows to the standard output as long as the --output-path switch is specified to redirect rwuniq's textual output to a different location.

--output-path=PATH

Write the textual output to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output (and bypass the paging program). If PATH names an existing file, rwuniq exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is either sent to the pager or written to the standard output.

--pager=PAGER_PROG

When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the --output-path switch is given or if the value of the pager is determined to be the empty string, no paging is performed and all output is written to the terminal.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. When an IP address is used as part of the key or value, this policy is equivalent to force.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.

only

Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
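The effect of each POLICY can be approximated per address with the Python sketch below. It is a simplification: rwuniq applies the policy to whole flow records, whereas this model looks at a single address that is assumed to be part of the key. The apply_policy function is hypothetical.

```python
import ipaddress

def apply_policy(text, policy="mix"):
    """Return the address to process under POLICY, or None to skip it."""
    addr = ipaddress.ip_address(text)
    if policy == "ignore":
        return addr if addr.version == 4 else None
    if policy == "asv4":
        if addr.version == 4:
            return addr
        # ipv4_mapped is None unless the address is in ::ffff:0:0/96.
        return addr.ipv4_mapped
    if policy in ("mix", "force"):
        # The text above says mix is equivalent to force when an IP address
        # is part of the key or value, which is the case modelled here.
        if addr.version == 4:
            return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
        return addr
    if policy == "only":
        return addr if addr.version == 6 else None
    raise ValueError(f"unknown policy: {policy}")

mapped = apply_policy("192.0.2.1", "force")
```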

--temp-directory=DIR_PATH

Specify the name of the directory in which to store data files temporarily when the memory is not large enough to store all the bins and their aggregate values. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.
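The precedence among the switch, the two environment variables, and the default can be summarized in a short Python sketch. Treating an empty value as unset is an assumption of this example, and temp_directory is a hypothetical helper.

```python
import os

def temp_directory(switch_value=None, environ=os.environ):
    """Pick the directory using the precedence described above."""
    return (switch_value                     # --temp-directory
            or environ.get("SILK_TMPDIR")
            or environ.get("TMPDIR")
            or "/tmp")                       # built-in default

# Hypothetical environment with both variables set.
env = {"SILK_TMPDIR": "/var/silk/tmp", "TMPDIR": "/scratch"}
chosen = temp_directory(None, env)
```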

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwuniq searches for the site configuration file in the locations specified in the "FILES" section.

--legacy-timestamps
--legacy-timestamps=NUM

When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y. Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK 4.0 release.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwuniq opens each named file in turn and reads records from it as if the filenames had been listed on the command line.

--help

Print the available options and exit. Specifying switches that add new fields, values, or additional switches before --help allows the output to include descriptions of those fields or switches.

--help-fields

Print the description and alias(es) of each field and value and exit. Specifying switches that add new fields before --help-fields allows the output to include descriptions of those fields.

--version

Print the version number and information about how SiLK was configured, then exit the application.

--pmap-file=PATH
--pmap-file=MAPNAME:PATH

Load the prefix map file located at PATH and create fields named src-map-name and dst-map-name where map-name is either the MAPNAME part of the argument or the map-name specified when the file was created (see rwpmapbuild(1)). If no map-name is available, rwuniq names the fields sval and dval. Specify PATH as - or stdin to read from the standard input. The switch may be repeated to load multiple prefix map files, but each prefix map must use a unique map-name. The --pmap-file switch(es) must precede the --fields switch. See also pmapfilter(3).

--pmap-column-width=NUM

When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.

--python-file=PATH

When the SiLK Python plug-in is used, rwuniq reads the Python code from the file PATH to define additional fields that can be used as part of the key or as an aggregate value. This file should call register_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.

Deprecated volume switches

These options add the named aggregate field(s) to --values if the field is not present. When an argument is specified, the switch is equivalent to a --threshold switch. Use of these switches is deprecated.

--all-counts

Append the following fields to the argument of the --values switch unless the field is already present: Bytes, Packets, Records, sTime-Earliest, and eTime-Latest. Deprecated since SiLK 2.0.0.

--bytes

Append Bytes to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--bytes=MIN

Add --threshold=bytes=MIN to the options. Deprecated since SiLK 3.17.0.

--bytes=MIN-MAX

Add --threshold=bytes=MIN-MAX to the options. Deprecated since SiLK 3.17.0.

--packets

Append Packets to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--packets=MIN

Add --threshold=packets=MIN to the options. Deprecated since SiLK 3.17.0.

--packets=MIN-MAX

Add --threshold=packets=MIN-MAX to the options. Deprecated since SiLK 3.17.0.

--flows

Append Records to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--flows=MIN

Add --threshold=records=MIN to the options. Deprecated since SiLK 3.17.0.

--flows=MIN-MAX

Add --threshold=records=MIN-MAX to the options. Deprecated since SiLK 3.17.0.

--sip-distinct

Append Distinct:sIP to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--sip-distinct=MIN

Add --threshold=distinct:sip=MIN to the options. Deprecated since SiLK 3.17.0.

--sip-distinct=MIN-MAX

Add --threshold=distinct:sip=MIN-MAX to the options. Deprecated since SiLK 3.17.0.

--dip-distinct

Append Distinct:dIP to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--dip-distinct=MIN

Add --threshold=distinct:dip=MIN to the options. Deprecated since SiLK 3.17.0.

--dip-distinct=MIN-MAX

Add --threshold=distinct:dip=MIN-MAX to the options. Deprecated since SiLK 3.17.0.

--stime

Append sTime-Earliest to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

--etime

Append eTime-Latest to the argument of the --values switch unless it is already present. Deprecated since SiLK 2.0.0.

EXAMPLES

In these examples, the dollar sign ($) represents the shell prompt and a backslash (\) is used to continue a line for better readability. Many examples assume previous rwfilter(1) commands have written data files named data.rw and data-v6.rw.

The --fields switch is required to specify which field(s) comprise the key. By default, rwuniq counts the number of records for each key. This example uses the source port as the key.

 $ rwuniq --fields=sport data.rw | head
 sPort|   Records|
    53|     62216|
    22|     27994|
    67|      7807|
 29897|        78|
 28816|        24|
    80|     27044|
 28925|        22|
     0|      7801|
 29246|        63|

Notice how the keys are printed in an arbitrary order. Use the --sort-output switch to arrange the keys from lowest to highest.

 $ rwuniq --fields=sport --sort-output data.rw | head
 sPort|   Records|
     0|      7801|
    22|     27994|
    25|     15568|
    53|     62216|
    67|      7807|
    80|     27044|
   123|      7741|
   443|      7917|
  8080|      3946|

To sort the output by a volume field (such as the number of records), use rwstats(1).

 $ rwstats --fields=sport --count=10 data.rw
 INPUT: 250928 Records for 4739 Bins and 250928 Total Records
 OUTPUT: Top 10 Bins by Records
 sPort|   Records|  %Records|   cumul_%|
    53|     62216| 24.794363| 24.794363|
    22|     27994| 11.156188| 35.950552|
    80|     27044| 10.777594| 46.728145|
    25|     15568|  6.204170| 52.932315|
   443|      7917|  3.155088| 56.087404|
    67|      7807|  3.111251| 59.198655|
     0|      7801|  3.108860| 62.307515|
   123|      7741|  3.084949| 65.392463|
  8080|      3946|  1.572563| 66.965026|
 29921|       117|  0.046627| 67.011653|

Alternatively, process the textual output of rwuniq with the UNIX sort(1) utility.

 $ rwuniq --fields=sport data.rw  \
   | sort -r -t '|' -k 2 | head
 sPort|   Records|
    53|     62216|
    22|     27994|
    80|     27044|
    25|     15568|
   443|      7917|
    67|      7807|
     0|      7801|
   123|      7741|
  8080|      3946|
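
When the textual output is post-processed by a script instead of sort(1), the columns can be split on the vertical bar and compared as integers. The Python sketch below does this for a few rows copied from the first example; parse_rows() is a hypothetical helper, and the sketch assumes the default column separator and integer-valued columns.

```python
# Sample text copied from the first rwuniq example above.
RWUNIQ_TEXT = """\
sPort|   Records|
   53|     62216|
   22|     27994|
   67|      7807|
29897|        78|
   80|     27044|
"""


def parse_rows(text):
    """Split pipe-delimited rwuniq output into titles and integer rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Drop the final delimiter before splitting each line into columns.
    titles = [col.strip() for col in lines[0].rstrip("|").split("|")]
    rows = [
        tuple(int(col) for col in ln.rstrip("|").split("|"))
        for ln in lines[1:]
    ]
    return titles, rows


titles, rows = parse_rows(RWUNIQ_TEXT)
# Sort by the second column (Records), largest first.
by_records = sorted(rows, key=lambda row: row[1], reverse=True)
for sport, records in by_records:
    print(f"{sport:>5}|{records:>10}|")
```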

Use the --values switch to change the volume that rwuniq computes for each key. This example prints the byte, packet, and record counts for each protocol, sorting the results by protocol.

 $ rwuniq --fields=proto --values=bytes,packets,records --sort data.rw
 pro|               Bytes|        Packets|   Records|
   1|             5344836|          73473|      7801|
   6|         59945492930|       72127917|    165363|
  17|            17553593|          77764|     77764|
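
Conceptually, the binning in this example amounts to keeping one set of running totals per key. The Python sketch below illustrates the idea with made-up records; it says nothing about how rwuniq is actually implemented.

```python
from collections import defaultdict

# Made-up records as (protocol, bytes, packets); not taken from data.rw.
flows = [
    (6, 1500, 3),
    (17, 80, 1),
    (6, 400, 2),
    (1, 84, 1),
    (17, 120, 1),
]

# key -> [bytes, packets, records]
bins = defaultdict(lambda: [0, 0, 0])
for proto, nbytes, npackets in flows:
    totals = bins[proto]
    totals[0] += nbytes
    totals[1] += npackets
    totals[2] += 1

# Iterating over the sorted keys plays the role of --sort-output.
for proto in sorted(bins):
    nbytes, npackets, nrecords = bins[proto]
    print(f"{proto:>3}|{nbytes:>20}|{npackets:>15}|{nrecords:>10}|")
```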

The --threshold switch limits the output to rows where a value field meets a minimum value or falls within a specific range. For example, print the number of records and packets seen for each source port for bins having at least 1000 records.

 $ rwuniq --fields=sport --values=records,packets \
        --threshold=records=1000 data.rw
 sPort|   Records|        Packets|
    53|     62216|          62216|
    22|     27994|       23434615|
    67|      7807|           7807|
    80|     27044|        8271125|
     0|      7801|          73473|
   123|      7741|           7741|
    25|     15568|         427777|
   443|      7917|        2421124|
  8080|      3946|        1202528|

Multiple thresholds may be specified.

 $ rwuniq --fields=sport --values=records,packets                 \
        --threshold=records=1000-5000 --threshold=packets=1000000 \
        data.rw
 sPort|   Records|        Packets|
  8080|      3946|        1202528|
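
The effect of combining thresholds can be reproduced by filtering the rows of the previous example. In the Python sketch below, passes() is a hypothetical helper, and treating both ends of a MIN-MAX range as inclusive is an assumption of the sketch.

```python
# Rows copied from the --threshold=records=1000 example above:
# (sPort, Records, Packets)
rows = [
    (53, 62216, 62216),
    (22, 27994, 23434615),
    (67, 7807, 7807),
    (80, 27044, 8271125),
    (0, 7801, 73473),
    (123, 7741, 7741),
    (25, 15568, 427777),
    (443, 7917, 2421124),
    (8080, 3946, 1202528),
]

# Column index -> (minimum, maximum); None means no upper bound.
thresholds = {
    1: (1000, 5000),     # --threshold=records=1000-5000
    2: (1000000, None),  # --threshold=packets=1000000
}


def passes(row, thresholds):
    """True only when the row satisfies every threshold."""
    for column, (low, high) in thresholds.items():
        value = row[column]
        # Inclusive bounds are an assumption of this sketch.
        if value < low or (high is not None and value > high):
            return False
    return True


kept = [row for row in rows if passes(row, thresholds)]
print(kept)
```

Only the bin for port 8080 satisfies both limits, which agrees with the output shown above.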

The --bin-time switch adjusts the times used by the sTime and eTime key fields. An argument of 86400 moves the starting and ending time to day boundaries.

 $ rwuniq --bin-time=86400 --fields=stime,etime data.rw
               sTime|              eTime|   Records|
 2009/02/12T00:00:00|2009/02/12T00:00:00|     82969|
 2009/02/12T00:00:00|2009/02/13T00:00:00|       360|
 2009/02/13T00:00:00|2009/02/13T00:00:00|     83594|
 2009/02/13T00:00:00|2009/02/14T00:00:00|       332|
 2009/02/14T00:00:00|2009/02/14T00:00:00|     83673|

The --bin-time switch does not adjust the duration value unless both sTime and eTime are given.

 $ rwuniq --bin-time=86400 --fields=stime,dur --sort data.rw | head -6
               sTime|durat|   Records|
 2009/02/12T00:00:00|    0|     29523|
 2009/02/12T00:00:00|    1|      4312|
 2009/02/12T00:00:00|    2|      4376|
 2009/02/12T00:00:00|    3|      3986|
 2009/02/12T00:00:00|    4|       923|

 $ rwuniq --bin-time=86400 --fields=stime,dur,etime data.rw
               sTime|durat|              eTime|   Records|
 2009/02/12T00:00:00|    0|2009/02/12T00:00:00|     82969|
 2009/02/12T00:00:00|86400|2009/02/13T00:00:00|       360|
 2009/02/13T00:00:00|    0|2009/02/13T00:00:00|     83594|
 2009/02/13T00:00:00|86400|2009/02/14T00:00:00|       332|
 2009/02/14T00:00:00|    0|2009/02/14T00:00:00|     83673|
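
The duration of 86400 in the output above follows from the adjusted times. The Python sketch below models the case where sTime, duration, and eTime are all in the key; bin_times() is a hypothetical helper, and rounding each time down to a bin boundary is an assumption of the sketch.

```python
def bin_times(stime, etime, bin_size):
    """Floor both times to a bin boundary and recompute the duration.

    Times are whole seconds since the UNIX epoch. Flooring (rounding
    down) is an assumption of this sketch.
    """
    binned_stime = stime - stime % bin_size
    binned_etime = etime - etime % bin_size
    return binned_stime, binned_etime, binned_etime - binned_stime


DAY = 86400
MIDNIGHT = 1234396800  # 2009/02/12T00:00:00 UTC

# A 20-second flow that starts 10 seconds before the next midnight.
start = MIDNIGHT + DAY - 10
crossing = bin_times(start, start + 20, DAY)
# A 30-second flow that stays within a single day.
same_day = bin_times(MIDNIGHT + 60, MIDNIGHT + 90, DAY)
print(crossing)
print(same_day)
```

A flow that crosses midnight gets a binned duration of one full day, while a flow that stays within one day gets a binned duration of 0.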

As of SiLK 3.17.0, the --bin-time switch accepts a floating point value. When the fractional part is non-zero, rwuniq uses millisecond precision for the times and the duration.

 $ rwuniq --bin-time=0.001 --fields=duration data.rw | head -6
  duration|   Records|
     0.000|     85565|
  1791.045|         4|
     2.120|        19|
    22.263|         5|
    19.902|         3|

The --bin-time switch does not adjust the sTime-Earliest and eTime-Latest aggregate value fields, but it does determine whether those fields maintain millisecond precision.

 $ rwuniq --bin-time=86400 --fields=stime --value=etime data.rw
               sTime|       eTime-Latest|
 2009/02/12T00:00:00|2009/02/12T00:29:59|
 2009/02/13T00:00:00|2009/02/13T00:29:58|
 2009/02/14T00:00:00|2009/02/14T00:29:59|

 $ rwuniq --bin-time=0.001 --fields=proto --value=stime,etime data.rw
 pro|         sTime-Earliest|           eTime-Latest|
  17|2009/02/12T00:00:02.745|1970/01/15T06:57:35.997|
   6|2009/02/12T00:00:03.004|1970/01/15T06:57:35.998|
   1|2009/02/12T00:00:20.601|1970/01/15T06:57:35.992|

With an input of both IPv4 and IPv6 records, rwuniq maps the IPv4 records into the ::ffff:0:0/96 netblock. The data is normally mapped back to IPv4 on output. Given this input:

 $ rwcut --fields=sip,packets /tmp/v4v6.rw
                                     sIP|   packets|
                                     ::1|        45|
                              192.0.2.22|        87|
                    ::ffff:203.0.113.113|      2662|
                  2001:db8:54:32:ab:cd::|       345|

The rwuniq tool produces:

 $ rwuniq --fields=sip --values=packets /tmp/v4v6.rw
                                     sIP|        Packets|
                                     ::1|             45|
                              192.0.2.22|             87|
                           203.0.113.113|           2662|
                  2001:db8:54:32:ab:cd::|            345|

Set the --ip-format switch to map-v4 to leave the values as IPv4-mapped IPv6. (Using an --ipv6-policy of force has the same effect.)

 $ rwuniq --fields=sip --values=packets --ip-format=map-v4 /tmp/v4v6.rw
                                     sIP|        Packets|
                                     ::1|             45|
                       ::ffff:192.0.2.22|             87|
                    ::ffff:203.0.113.113|           2662|
                  2001:db8:54:32:ab:cd::|            345|
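
The mapping into and out of the ::ffff:0:0/96 netblock can be illustrated with Python's standard ipaddress module. This sketch only models the address arithmetic; to_mapped() and to_display() are hypothetical names and do not correspond to anything in SiLK.

```python
import ipaddress

MAPPED_NET = ipaddress.ip_network("::ffff:0:0/96")


def to_mapped(addr):
    """Return addr as IPv6, placing IPv4 addresses in ::ffff:0:0/96."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # The low 32 bits hold the IPv4 address; bits 32-47 are all ones.
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    return ip


def to_display(ip6):
    """Show IPv4-mapped addresses as IPv4 and leave other IPv6 alone."""
    mapped = ip6.ipv4_mapped
    return mapped if mapped is not None else ip6


for text in ["::1", "192.0.2.22", "::ffff:203.0.113.113",
             "2001:db8:54:32:ab:cd::"]:
    internal = to_mapped(text)
    print(internal in MAPPED_NET, to_display(internal))
```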

Print the source addresses that sent at least 10,000,000 bytes, and for each address print the number of unique destination hosts it contacted:

 $ rwuniq --fields=sip --values=bytes,distinct:dip \
        --threshold=bytes=10000000 data-v6.rw
                       sIP|               Bytes|dIP-Distin|
      2001:db8:a:fd::90:bd|            14529210|         2|

Print the number of bytes that host shared with each destination (first use rwfilter to limit the input to that host):

 $ rwfilter --saddr=2001:db8:a:fd::90:bd --pass=- data-v6.rw        \
   | rwuniq --fields=dip --values=bytes
                       dIP|               Bytes|
     2001:db8:c0:a8::fa:5d|             7097847|
      2001:db8:c0:a8::dd:6|             7431363|

Print the packet and byte counts for each IPv4 source-destination pair, where the prefix length is 16 (use rwnetmask(1) on the input to rwuniq):

 $ rwnetmask --4sip-prefix=16 --4dip-prefix=16 data.rw      \
   | rwuniq --fields=sip,dip --values=packet,byte | head
            sIP|            dIP|  Packets|        Bytes|
     10.139.0.0|    192.168.0.0|    33490|     22950353|
      10.40.0.0|    192.168.0.0|      258|        18544|
     10.204.0.0|    192.168.0.0|   353233|    288736424|
     10.106.0.0|    192.168.0.0|    13051|      3843693|
      10.71.0.0|    192.168.0.0|     4355|      1391194|
      10.98.0.0|    192.168.0.0|     7312|      7328359|
     10.114.0.0|    192.168.0.0|     2538|      4137927|
     10.168.0.0|    192.168.0.0|    92094|     86883062|
     10.176.0.0|    192.168.0.0|   122101|    116555051|

Given a file of scan traffic, print the source addresses that appear in at least 4 TCP flow records of no more than 3 packets each. First use rwfilter to limit the traffic to TCP flow records whose packet count is no more than 3.

 $ rwfilter --proto=6 --packets=1-3 --pass=- scandata.rw          \
   | rwuniq --field=sip --values=flow,packets --threshold=flows=4 \
   | head -5
             sIP|   Records|        Packets|
   10.249.216.38|       256|            256|
    10.155.55.93|       256|            256|
   10.61.255.154|       256|            256|
    10.60.122.82|       256|            256|

The silkpython(3) manual page provides examples that use PySiLK to create arbitrary fields to use as part of the key for rwuniq.

ENVIRONMENT

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_PAGER

When set to a non-empty string, rwuniq automatically invokes this program to display its output a screen at a time. If set to an empty string, rwuniq does not automatically page its output.

PAGER

When set and SILK_PAGER is not set, rwuniq automatically invokes this program to display its output a screen at a time.

SILK_TMPDIR

When set and --temp-directory is not specified, rwuniq writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.

TMPDIR

When set and SILK_TMPDIR is not set, rwuniq writes the temporary files it creates to this directory.
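
The precedence among --temp-directory, SILK_TMPDIR, and TMPDIR can be summarized in a few lines of Python. The function temp_directory() is a hypothetical illustration; treating an empty variable as unset is an assumption, and the /tmp fallback is taken from the "FILES" section.

```python
def temp_directory(switch_value, env):
    """Pick the temporary directory in the order the text describes.

    switch_value stands for the argument of --temp-directory (None when
    the switch is absent); env is a mapping such as os.environ.
    """
    if switch_value:
        return switch_value
    # Treating an empty variable as unset is an assumption of this sketch.
    for name in ("SILK_TMPDIR", "TMPDIR"):
        if env.get(name):
            return env[name]
    return "/tmp"


print(temp_directory(None, {"TMPDIR": "/var/tmp"}))
print(temp_directory(None, {"SILK_TMPDIR": "/scratch", "TMPDIR": "/var/tmp"}))
print(temp_directory("/fast", {"SILK_TMPDIR": "/scratch"}))
```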

PYTHONPATH

This environment variable is used by Python to locate modules. When --python-file is specified, rwuniq must load the Python files that comprise the PySiLK package, such as silk/__init__.py. If this silk/ directory is located outside Python's normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.

SILK_PYTHON_TRACEBACK

When set, Python plug-ins print traceback information on Python errors to the standard error.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that rwuniq uses when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. See the "FILES" section for standard locations of this file.

SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type mapping file that rwuniq uses when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. See the "FILES" section for standard locations of this file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwuniq may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files and plug-ins, rwuniq may use this environment variable. See the "FILES" section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwuniq displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine's default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwuniq --version.)

SILK_PLUGIN_DEBUG

When set to 1, rwuniq prints status messages to the standard error as it attempts to find and open each of its plug-ins. In addition, when an attempt to register a field fails, rwuniq prints a message specifying the additional function(s) that must be defined to register the field in rwuniq. Be aware that the output can be rather verbose.

SILK_TEMPFILE_DEBUG

When set to 1, rwuniq prints debugging messages to the standard error as it creates, re-opens, and removes temporary files.

SILK_UNIQUE_DEBUG

When set to 1, the binning engine used by rwuniq prints debugging messages to the standard error.

FILES

${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/share/silk/address_types.pmap
/usr/share/address_types.pmap

Possible locations for the address types mapping file required by the sType and dType fields.

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file, which are checked when the --site-config-file switch is not provided.
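
The list above can be expanded into concrete paths for a given environment. The Python sketch below assumes the locations are tried in the order listed and that the first existing file is used; both the search order and the helper names are assumptions made for this illustration.

```python
import os
import posixpath


def site_config_candidates(env):
    """List candidate silk.conf paths in the order shown above."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(
            posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share/silk/silk.conf"))
        candidates.append(posixpath.join(root, "share/silk.conf"))
    candidates += ["/usr/share/silk/silk.conf", "/usr/share/silk.conf"]
    return candidates


def find_site_config(env, exists=os.path.isfile):
    """Return the first candidate that exists, or None (assumed order)."""
    return next((p for p in site_config_candidates(env) if exists(p)), None)


for path in site_config_candidates({"SILK_PATH": "/opt/silk"}):
    print(path)
```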

${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/share/silk/country_codes.pmap
/usr/share/country_codes.pmap

Possible locations for the country code mapping file required by the scc and dcc fields.

${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/lib64/silk/
/usr/lib64/
/usr/lib/silk/
/usr/lib/

Directories that rwuniq checks when attempting to load a plug-in.

${SILK_TMPDIR}/
${TMPDIR}/
/tmp/

Directory in which to create temporary files.

NOTES

If multiple thresholds are given (e.g., --threshold=bytes=80 --threshold=flows=2), a bin is printed only when its values meet every threshold. For example, if a given key saw a single 100-byte flow, the entry would not be printed given the switches above.

rwuniq functionally replaces the combination of

 rwcut | sort | uniq -c

To get a list of unique IP addresses in a data set without the counting or threshold abilities of rwuniq, consider using the IPset tools rwset(1) and rwsetcat(1) for improved performance:

 rwset --sip-set=stdout | rwsetcat --print-ips

For situations where the key and value are each a single field, the Bag tools (rwbag(1), rwbagcat(1)) often provide better performance, especially when the key length is one or two bytes:

 rwbag --bag-file=sport,bytes,stdout | rwbagcat

To create a binary file that contains rwuniq-like output, use rwaggbag(1) or rwaggbagbuild(1). The content of these files may be printed with rwaggbagcat(1).

rwgroup(1) works similarly to rwuniq, except the data remains in the form of SiLK Flow records, and the next-hop-IP field is modified to denote the records that form a bin.

rwstats(1) can do the same binning as rwuniq, and then sort the data by an aggregate field.

When the --bin-time switch is given and the three time fields (starting-time (sTime), ending-time (eTime), and duration (duration)) are present in the key, the duration field's value will be modified to be the difference between the ending and starting times.

When the three time-related key fields (sTime,duration,eTime) are all in use, rwuniq will ignore the final time field when binning the records, but the field will appear in the output. Due to truncation of the milliseconds values, rwuniq will print a different number of rows depending on the order in which those three values appear in the --fields switch.

rwuniq supports counting distinct source and/or destination IPs. To see the number of distinct sources for each 10 minute bin, run:

 rwuniq --fields=stime --values=distinct:sip --bin-time=600 --sort-output

When computing distinct counts over a field, the field may not be part of the key; that is, you cannot have --fields=sip --values=distinct:sip.
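
A distinct count differs from a sum in that each bin must remember which values it has already seen. The Python sketch below models the 10-minute example with one set of source addresses per bin; the records are made up, and the sketch does not describe how rwuniq stores this state internally.

```python
from collections import defaultdict

BIN = 600  # seconds, as in --bin-time=600

# Made-up (start time in epoch seconds, source IP) pairs.
flows = [
    (1234396805, "10.0.0.1"),
    (1234396990, "10.0.0.2"),
    (1234397100, "10.0.0.1"),
    (1234397405, "10.0.0.3"),
    (1234397500, "10.0.0.3"),
]

# Start of the 10-minute bin -> set of source addresses seen in that bin.
sources = defaultdict(set)
for stime, sip in flows:
    sources[stime - stime % BIN].add(sip)

# Sorted keys play the role of --sort-output.
for bin_start in sorted(sources):
    print(bin_start, len(sources[bin_start]))
```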

Using the --presorted-input switch sometimes introduces more issues than it solves, and --presorted-input is less necessary now that rwuniq can use temporary files while processing input.

When computing distinct IP counts, rwuniq will typically run faster if you do not use the --presorted-input switch, even if the data was previously sorted.

When using the --presorted-input switch, it is highly recommended that you use no more than one time-related key field (sTime, duration, eTime) in the --fields switch and that the time-related key appear last in --fields. The issue is caused by rwsort considering the millisecond values on the times when sorting, while rwuniq truncates the millisecond value. The result may be unsorted output and multiple rows in the output that have the same values for the key fields:

 $ rwsort --fields=stime,duration data.rw       \
   | rwuniq --fields=stime,dur --presorted
               sTime|durat|   Records|
 ...
 2009/02/12T00:00:57|    0|         2|
 2009/02/12T00:00:57|   29|         2|
 2009/02/12T00:00:57|    0|         2|
 2009/02/12T00:00:57|   13|         2|
 ...
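
The repeated keys in this output can be reproduced with a small model. The Python sketch below sorts made-up records by their full millisecond values, truncates the keys to whole seconds, and merges only adjacent equal keys; modeling presorted processing with itertools.groupby is an assumption made for illustration.

```python
from itertools import groupby

# Made-up (sTime, duration) pairs in milliseconds, ordered the way a sort
# on the full millisecond values would order them.
records_ms = sorted([
    (57100, 200),
    (57100, 450),
    (57200, 29100),
    (57200, 29800),
    (57300, 400),
    (57300, 900),
    (57400, 13000),
    (57400, 13500),
])

# Truncate both fields to whole seconds to form the key.
keys = [(stime // 1000, dur // 1000) for stime, dur in records_ms]

# With presorted input, only adjacent equal keys are merged (an assumption
# of this sketch, modeled with groupby).
rows = [(key, len(list(group))) for key, group in groupby(keys)]
for key, count in rows:
    print(key, count)
```

The key (57, 0) appears in two separate rows because records that share a truncated key are not adjacent in the millisecond ordering.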

rwuniq's strength is its ability to build arbitrary keys and aggregate fields. For a key of a single IP address, see rwaddrcount(1) and rwbag(1); for a key made up of a single CIDR block (/8, /16, /24 only), a single port, or a single protocol, use rwtotal(1) or rwbag(1).

As of SiLK 3.17.0, fields that are specified with the legacy thresholding switches (e.g., --bytes) and not with --values are printed in the order in which those switches appear. Previously, the order was always bytes, packets, flows, stime, etime, sip-distinct, dip-distinct.

SEE ALSO

rwfilter(1), rwbag(1), rwbagcat(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwcut(1), rwset(1), rwsetcat(1), rwaddrcount(1), rwgroup(1), rwstats(1), rwnetmask(1), rwsort(1), rwtotal(1), rwcount(1), rwpmapbuild(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7), yaf(1), dlopen(3), tzset(3), environ(7)