NAME
rwuniq - Bin SiLK Flow records by a key and print the bins' volume
SYNOPSIS
rwuniq --fields=KEY [--dynamic-library=DYNLIB] [--presorted-input]
[--all-counts] [{--flows | --flows=MIN | --flows=MIN-MAX}]
[{--packets | --packets=MIN | --packets=MIN-MAX}]
[{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
[--stime] [--etime]
[{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
[{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
[{--bin-time | --bin-time=SECONDS}] [--epoch-time]
[{--integer-ips | --zero-pad-ips}] [--integer-sensors]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--site-config-file=FILENAME]
[{--legacy-timestamps | --legacy-timestamps=NUM}] [FILES...]
DESCRIPTION
rwuniq reads SiLK Flow records from files named on the command line, or from the standard input when no file names are given, and bins the flows by a key composed of user-specified attributes of the SiLK Flow records. For each bin, a user-selected combination of flow count, packet sum, byte sum, earliest start time, most recent end time, distinct source address count, and/or distinct destination address count is computed. Once all the input is processed (or rwuniq exhausts the machine's memory), the key fields and the computed values are printed for each bin that meets the user-specified minima and maxima.
The flow attribute(s) (or field(s)) that make up the KEY for each
bin are selected by the user, with the available fields being the same
as those supported by rwcut. See the description of the --fields
switch in the OPTIONS section below for the names of the keys. It is
an error if the user does not specify the --fields switch. The
size of the key in bytes is effectively unlimited, but a larger key
exhausts memory more quickly.
When rwuniq is not told what to compute, it computes the number of flows for each key and prints all keys that had at least one flow.
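The binning described above can be pictured with a minimal Python sketch. This is an illustration only, not rwuniq's implementation; the record layout and the bin_flows helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records; the dictionary keys mirror rwuniq field names.
flows = [
    {"sip": "10.1.1.1", "dport": 80, "packets": 3, "bytes": 180},
    {"sip": "10.1.1.2", "dport": 53, "packets": 1, "bytes": 76},
    {"sip": "10.1.1.1", "dport": 80, "packets": 5, "bytes": 400},
]


def bin_flows(records, key_fields):
    """Group records by the tuple of key_fields and total each bin."""
    bins = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for rec in records:
        key = tuple(rec[name] for name in key_fields)
        totals = bins[key]
        totals["flows"] += 1
        totals["packets"] += rec["packets"]
        totals["bytes"] += rec["bytes"]
    return dict(bins)


bins = bin_flows(flows, ["sip", "dport"])
for key, totals in bins.items():
    # With no value switches, only the flow count would be reported.
    print(key, totals["flows"])
```

Every distinct key occupies one entry in the table, which is why a wider key or more computed values raises memory use.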
The --presorted-input switch allows rwuniq to efficiently process data that has been previously sorted with the rwsort(1) command. With this switch, rwuniq does not need large amounts of memory because it does not bin each flow; instead, it keeps running summations and outputs a bin whenever the key changes. For the output to be meaningful, rwsort and rwuniq must be invoked with the same --fields value.
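The running-summation idea behind --presorted-input can be sketched in Python as follows. This is an assumption-laden illustration rather than rwuniq's code; uniq_presorted and the record layout are hypothetical names.

```python
from itertools import groupby


def uniq_presorted(records, key_fields):
    """Yield (key, flow_count, byte_total) each time the key changes."""
    def key_of(rec):
        return tuple(rec[name] for name in key_fields)

    # groupby only merges adjacent records, so it needs one bin in memory.
    for key, group in groupby(records, key=key_of):
        count = 0
        total_bytes = 0
        for rec in group:
            count += 1
            total_bytes += rec["bytes"]
        yield key, count, total_bytes


sorted_flows = [
    {"sip": "10.1.1.1", "bytes": 100},
    {"sip": "10.1.1.1", "bytes": 50},
    {"sip": "10.1.1.2", "bytes": 70},
]
result = list(uniq_presorted(sorted_flows, ["sip"]))

# Unsorted input splits a key across several bins, which is why the
# sort key and the bin key must match.
unsorted_flows = [sorted_flows[0], sorted_flows[2], sorted_flows[1]]
split_result = list(uniq_presorted(unsorted_flows, ["sip"]))
```

In this sketch the unsorted input produces three bins instead of two, which shows why the output is meaningful only for input sorted on the same fields.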
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The --fields switch is required. rwuniq will fail when it is not provided.
- --fields=KEY
- KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key into which flows are binned. The columns will be displayed in the order the fields are specified. Each field may be specified once only.
-
KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-), e.g.,
-
--fields=stime,10,1-5
-
There is no default value for the --fields switch; the switch must be specified.
-
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.
- sIP,sip,1
- source IP address
- dIP,dip,2
- destination IP address
- sPort,sport,3
- source port for TCP and UDP, or equivalent
- dPort,dport,4
- destination port for TCP and UDP, or equivalent
- protocol,5
- IP protocol
- packets,pkts,6
- packet count
- bytes,7
- byte count
- flags,8
- bit-wise OR of TCP flags over all packets
- sTime,stime,9
- starting time of flow (seconds resolution)
- dur,10
- duration of flow (seconds resolution)
- eTime,etime,11
- end time of flow (seconds resolution)
- sensor,12
- name or ID of sensor at the collection point
- class,20
- class of sensor at the collection point
- type,21
- type of sensor at the collection point
- icmpTypeCode,icmptypecode,25
- include two columns, iType and iCode, that contain the ICMP type and code for ICMP flows; for non-ICMP flows, these columns are empty
- initialFlags,initialflags,26
- TCP flags on first packet in the flow
- sessionFlags,sessionflags,27
- bit-wise OR of TCP flags over all packets except the first in the flow
- attributes,28
- flow attributes set by flow collector:
T
- flow collector generated a flow record for a long-running connection due to timeout.
C
- this flow is a continuation of a long-running connection that the collector terminated.
F
- additional non-ACK packets seen after a packet with the FIN flag set.
- application,29
- guess as to the application generating the flow; value will be standard port for the application, such as 80 for web traffic
- stype,16
- for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. See addrtype(3).
- dtype,17
- as stype for the destination IP address
- scc,18
- for the source IP, a two-letter country code abbreviation denoting the country that owns that IP address. See ccfilter(3).
- dcc,19
- as scc for the destination IP
- sval
- value from the user-defined mapping (see the --pmap-file switch) for the source. For an IP-based map, this corresponds to sip. For a proto-port-based map, it is protocol/sport. See pmapfilter(3).
- dval
- as sval for the destination IP or proto/dport.
- --dynamic-library=DYNLIB
- Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is DYNLIB. The creation of these plug-ins is beyond the scope of this manual page. When DYNLIB contains a slash (/), rwuniq assumes the path to DYNLIB is correct. Otherwise, rwuniq will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwuniq does not find the file, it assumes the plug-in is in the current directory. To force rwuniq to look in the current directory first, specify --dynamic-library=./DYNLIB. When the SILK_DYNLIB_DEBUG environment variable is non-empty, rwuniq prints status messages to the standard error as it tries to open each of its plug-ins.
- --presorted-input
- Cause rwuniq to assume that it is reading sorted input; i.e., that rwuniq's input was generated by rwsort(1) using the exact same value for the --fields switch. This option allows rwuniq to process an endless stream of records.
Many SiLK file formats do not store every field listed under --fields; a field that is not stored always has the value 0.
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of the additional fields in the --fields list above (initialFlags, sessionFlags, attributes, and application); for flows without this additional information, the field's value is always 0.
The list of built-in fields may be augmented by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwuniq automatically looks for the following plug-ins:
- ADDRESS TYPE (addrtype.so)
- provides the stype and dtype fields
- COUNTRY CODE (ccfilter.so)
- provides the scc and dcc fields
- PREFIX MAP (pmapfilter.so)
- provides the sval and dval fields
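The KEY syntax accepted by --fields (names, integers, and integer ranges) can be illustrated with a small Python sketch. The expand_key helper is hypothetical, the alias table covers only fields 1 through 12 from the list above, and lower-casing the names is a simplification; rwuniq's own parser may differ.

```python
# Name-to-integer aliases taken from the field list in this manual page.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "dur": 10, "etime": 11, "sensor": 12,
}


def expand_key(key):
    """Expand a KEY such as 'stime,10,1-5' into an ordered list of IDs."""
    ids = []
    for token in key.split(","):
        token = token.strip().lower()
        start, sep, end = token.partition("-")
        if sep:
            # A range must be two integers, lowest first.
            if not (start.isdigit() and end.isdigit()):
                raise ValueError(f"bad range: {token!r}")
            if int(start) > int(end):
                raise ValueError(f"bad range: {token!r}")
            candidates = list(range(int(start), int(end) + 1))
        elif token.isdigit():
            candidates = [int(token)]
        elif token in FIELD_IDS:
            candidates = [FIELD_IDS[token]]
        else:
            raise ValueError(f"unknown field: {token!r}")
        for field_id in candidates:
            # Each field may be specified once only.
            if field_id in ids:
                raise ValueError(f"field {field_id} given more than once")
            ids.append(field_id)
    return ids


print(expand_key("stime,10,1-5"))
```

The returned order is the order in which the columns would be displayed.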
The next eight options determine what summations are computed and printed. If none of these eight options are provided, flows are counted for consistency with earlier versions of rwuniq, though this behavior may change in a future release.
- --all-counts
- Enable the next five sets of options with their default thresholds; i.e., all possible counts (except the distinct counts) are computed and printed.
- --flows
- --flows=MIN
- --flows=MIN-MAX
- Cause rwuniq to sum the number of flow records in each uniquely keyed bin. When MIN is provided, bins are printed only when they had at least MIN number of flows. When MAX is also provided, bins are printed only when they had no more than MAX flows. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
- --packets
- --packets=MIN
- --packets=MIN-MAX
- Cause rwuniq to sum, for each unique key, the number of packets in each flow record. When MIN is provided, keys are printed only when they had at least MIN sum of packets. When MAX is also provided, bins are printed only when they had no more than MAX sum of packets. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
- --bytes
- --bytes=MIN
- --bytes=MIN-MAX
- Cause rwuniq to total, for each unique key, the number of bytes in each flow record. When MIN is provided, keys are printed only when they had at least MIN total bytes. When MAX is also provided, bins are printed only when they had no more than MAX total bytes. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
- --stime
- Cause rwuniq to keep track of the earliest time at which it saw a flow that matched each bin's unique key. This option does not support thresholds.
- --etime
- Cause rwuniq to keep track of the latest (most recent) time at which it saw a flow that matched each bin's unique key. This option does not support thresholds.
- --sip-distinct
- --sip-distinct=MIN
- --sip-distinct=MIN-MAX
- Cause rwuniq to count the number of distinct source IP addresses that were seen for each uniquely keyed bin. When MIN is provided, bins are printed only when they had at least MIN distinct sources. When MAX is also provided, bins are printed only when they had no more than MAX distinct sources. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used. When this switch is provided, the sIP field cannot be part of the key.
- --dip-distinct
- --dip-distinct=MIN
- --dip-distinct=MIN-MAX
- As --sip-distinct for destination IP addresses.
- --bin-time
- --bin-time=SECONDS
- Adjust the key fields 'sTime' and 'eTime' to appear on SECONDS-second boundaries (the floor of the time is used). When no value is provided to the switch, 60-second time bins are used. (When the start-time is the only key field and time binning is desired, consider using rwcount(1) instead.)
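The floor operation that --bin-time applies to a timestamp can be written as a one-line Python sketch. The bin_time function is a hypothetical name; timestamps are assumed to be integer epoch seconds.

```python
def bin_time(epoch_seconds, bin_size=60):
    """Round a timestamp down to the start of its bin_size-second bin."""
    return epoch_seconds - (epoch_seconds % bin_size)


# 125 seconds falls in the 60-second bin that starts at 120.
print(bin_time(125))
# With 600-second bins, 1259 falls in the bin that starts at 1200.
print(bin_time(1259, 600))
```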
The appearance of the output is controlled by this next set of switches:
- --epoch-time
- Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01).
- --integer-ips
- Print IPs as integers. By default, IP addresses are printed as dotted decimal.
- --zero-pad-ips
- Print IP addresses in dotted decimal, but use three digits per octet by adding zero-padding, e.g., 000.000.000.000.
- --integer-sensors
- Print the integer ID of the sensor rather than its name.
- --no-titles
- Turn off column titles. By default, titles are printed.
- --no-columns
- Disable fixed-width columnar output.
- --column-separator=C
- Use specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.
- --no-final-delimiter
- Do not print the column separator after the final column. Normally a delimiter is printed.
- --delimited
- --delimited=C
- Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
- --print-filenames
- Print to the standard error the names of input files as they are opened.
- --copy-input=PATH
- Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the standard output as long as the --output-path switch has been used to redirect rwuniq's ASCII output.
- --output-path=PATH
- Determines where the output of rwuniq (ASCII text) is written. If this option is not given, output is written to the standard output.
- --pager=PAGER_PROG
- When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.
- --ipv6-policy=POLICY
- Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:
- ignore
- Completely ignore IPv6 flows. Only IPv4 flows will be printed.
- asv4
- Convert IPv6 addresses to IPv4 if possible, otherwise ignore the IPv6 flows.
- mix
- Process the input as a mixture of IPv4 and IPv6 flows.
- force
- Force IPv4 flows to be converted to IPv6.
- only
- Only process flows that were marked as IPv6 and completely ignore IPv4 flows.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
- --legacy-timestamps
- --legacy-timestamps=NUM
- Specify the format for human readable timestamps, either the default (new) style, YYYY/MM/DDThh:mm:ss, or the legacy style, MM/DD/YYYY hh:mm:ss. When this switch is not present, the timestamps will be in the default format. When this switch is present and no argument is given, timestamps are in the legacy format. When an argument is supplied, timestamps will be in the new format if the argument begins with 0, and in the old format if the argument begins with 1. Any other argument to the switch is an error.
- --pmap-file=PATH
- When the pmapfilter(3) plug-in is used, this switch gives the path to the mapping file.
- --pmap-column-width=NUM
- When the pmapfilter plug-in is used, this switch gives the maximum number of characters to use when displaying the textual value of any field.
- --threshold=THRESHOLD
- The --threshold switch is deprecated. Use the --flows switch instead.
-
This switch causes rwuniq to count flows. THRESHOLD is an integer argument, and indicates the minimal number of records that must be seen before the key is reported.
When none of --flows, --packets, --bytes, --stime, or --etime is specified, rwuniq acts as if --flows=1 was given.
When one of --bytes, --packets, --stime, --etime, --sip-distinct, or --dip-distinct is given, rwuniq will not compute the total flows unless the --flows switch is also specified.
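The rules above for deciding which values are computed can be summarized in a short Python sketch. The values_to_compute helper and the switch names without leading dashes are hypothetical, and the column order it returns is an assumption rather than documented behavior.

```python
# The five switches that --all-counts enables, then the distinct counts.
ALL_COUNTS = ["flows", "packets", "bytes", "stime", "etime"]
VALUE_SWITCHES = ALL_COUNTS + ["sip-distinct", "dip-distinct"]


def values_to_compute(switches):
    """Return the value columns implied by the given switch names."""
    requested = set(switches)
    if "all-counts" in requested:
        requested.update(ALL_COUNTS)
    chosen = [name for name in VALUE_SWITCHES if name in requested]
    # With no value switch at all, fall back to counting flows.
    return chosen or ["flows"]


print(values_to_compute([]))
print(values_to_compute(["bytes", "sip-distinct"]))
```

Note that the second call does not include the flow count, because --flows was not requested alongside the other switches.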
EXAMPLES
To look for the most prevalent source IP addresses:
rwfilter ... --pass=stdout | rwuniq --field=1 | \
sort -t '|' -r -k 2 | head -11
sIP| Records|
10.1.1.9| 133|
10.1.1.1| 120|
10.1.1.5| 100|
10.1.1.2| 91|
10.1.1.4| 82|
10.1.1.6| 64|
10.1.1.3| 41|
10.1.1.7| 29|
10.1.1.10| 28|
10.1.1.8| 16|
To look for all TCP flows of 3 packets or fewer whose source address appears at least 3 times, combine rwfilter and rwuniq:
rwfilter --start-date=2003/01/01:00 --end-date=2003/01/01:23 \
--proto=6 --packets=1-3 --pass=stdout \
| rwuniq --field=1 --flows=3 | head -11
sIP| Records|
10.1.1.4| 6|
10.1.1.6| 4|
10.1.1.1| 102|
10.1.1.9| 3|
10.1.1.2| 4|
10.1.1.8| 17|
10.1.1.10| 6|
10.1.1.7| 7|
10.1.1.3| 7|
10.1.1.5| 2491|
ENVIRONMENT
- SILK_IPV6_POLICY
- This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
- SILK_PAGER
- When set to a non-empty string, rwuniq automatically invokes this program to display its output a screen at a time. If set to an empty string, rwuniq does not automatically page its output.
- PAGER
- When set and SILK_PAGER is not set, rwuniq automatically invokes this program to display its output a screen at a time.
- SILK_CONFIG_FILE
- This environment variable is used as the value for the --site-config-file when that switch is not provided.
- SILK_DATA_ROOTDIR
- When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwuniq looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
- SILK_PATH
- This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwuniq checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwuniq looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
- SILK_DYNLIB_DEBUG
- When set to 1, rwuniq prints status messages to the standard error as it tries to open each of its plug-ins.
NOTES
If multiple thresholds are given (e.g., --bytes=80 --flows=2),
the values must meet all thresholds before the record is printed. For
example, if a given key saw a single 100-byte flow, the entry would
not be printed given the switches above.
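The way MIN and MIN-MAX thresholds combine can be sketched in Python. The parse_threshold and passes helpers are hypothetical names; the sketch follows the rules stated in OPTIONS (a missing MIN defaults to 1, and a MIN of 0 is treated as 1) and is not rwuniq's code.

```python
def parse_threshold(arg=None):
    """Return (minimum, maximum) for a MIN or MIN-MAX argument."""
    if arg is None:
        return 1, None
    low, sep, high = arg.partition("-")
    minimum = max(int(low), 1)  # a MIN of 0 is treated as 1
    maximum = int(high) if sep else None
    return minimum, maximum


def passes(totals, thresholds):
    """True only when every thresholded value is within its limits."""
    for name, (minimum, maximum) in thresholds.items():
        value = totals[name]
        if value < minimum:
            return False
        if maximum is not None and value > maximum:
            return False
    return True


# Equivalent of --bytes=80 --flows=2
thresholds = {"bytes": parse_threshold("80"), "flows": parse_threshold("2")}
single_flow = {"bytes": 100, "flows": 1}
two_flows = {"bytes": 160, "flows": 2}
print(passes(single_flow, thresholds), passes(two_flows, thresholds))
```

The single 100-byte flow meets the byte threshold but not the flow threshold, so its bin is suppressed.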
Unless the --presorted-input switch is given, rwuniq uses a hash table internally. The more information requested (complex key or the total of flows, packets, and bytes for each key), the higher the memory usage.
If rwuniq's hash table does run out of memory, rwuniq will stop processing input, print a warning to the standard error, output the entries it has computed to that point, and exit with code 16.
rwuniq functionally replaces the combination of
rwcut | sort | uniq -c
however, rwuniq does not sort its output.
To sort the output of rwuniq that includes IP numbers, consider using the --zero-pad-ips switch or the --integer-ips switch. To convert integer IPs back to dotted-decimal notation, use num2dot(1):
rwuniq --fields=1 --integer-ip | sort -n | num2dot
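Why the integer and zero-padded forms sort correctly while plain dotted decimal does not can be seen in a short Python sketch using the standard ipaddress module. The helper names are hypothetical and only IPv4 addresses are handled.

```python
import ipaddress


def to_integer(dotted):
    """Convert dotted-decimal IPv4 text to its integer value."""
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(value):
    """Convert an integer back to dotted decimal, as num2dot does."""
    return str(ipaddress.IPv4Address(value))


def zero_pad(dotted):
    """Pad every octet to three digits."""
    return ".".join(f"{int(octet):03d}" for octet in dotted.split("."))


addresses = ["10.1.1.9", "10.1.1.10"]
# A plain text sort puts 10.1.1.10 ahead of 10.1.1.9.
text_order = sorted(addresses)
# Zero-padded text and integers both sort in numeric order.
padded_order = sorted(zero_pad(a) for a in addresses)
integer_order = [to_dotted(n) for n in sorted(to_integer(a) for a in addresses)]
print(text_order, padded_order, integer_order)
```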
To get a list of unique IP addresses in a data set without the counting or threshold abilities of rwuniq, consider using IPsets:
rwset --sip-set=stdout | rwsetcat --print-ips
For situations where the key and value are each a single field, consider using Bags:
rwbag --sip-bytes=stdout | rwbagcat
A Top-N list can be created from rwuniq's output by passing the output through the UNIX sort(1) and head(1) commands. For example, to see the top-10 dIP-dPort pairs by packet counts, one would issue:
rwuniq --fields=2,4 --byte --packet --flow --no-title | \
sort -n -r -t '|' -k 4 | head -10
For additional Top-N and Bottom-N lists (but with less control over the fields that make up the key), see rwstats(1).
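The Top-N selection performed above by sort(1) and head(1) amounts to the following Python sketch. The sample bins and the top_n helper are made up for illustration.

```python
import heapq

# Hypothetical bins: (dIP, dPort) key mapped to a packet total.
packet_totals = {
    ("10.1.1.5", 80): 2491,
    ("10.1.1.1", 25): 102,
    ("10.1.1.8", 53): 17,
    ("10.1.1.3", 443): 7,
}


def top_n(bins, n):
    """Return the n (key, value) pairs with the largest values."""
    return heapq.nlargest(n, bins.items(), key=lambda item: item[1])


for key, packets in top_n(packet_totals, 2):
    print(key, packets)
```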
rwuniq supports counting distinct source and/or destination IPs. To see the number of distinct sources for each 10 minute bin, run:
rwuniq --fields=9 --bin-time=600 --sip-distinct | sort
The final sort is required so that data appears in time-order.
Use of the --sip-distinct and/or --dip-distinct switches can quickly exhaust the available RAM; their implementation uses a binary IPset (see rwset(1)) to count the number of distinct IPs, and these can become large. For each unique key, a new IPset is created, and the size of an IPset depends on the number of unique /16's that are seen (once an IPset sees an IP in a /16, no additional memory is required to keep track of other IPs in that /16). It is strongly suggested that --sip-distinct or --dip-distinct be used only on small files where the known set of keys is reasonable.
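The memory cost of distinct counting comes from keeping one collection of addresses per bin. The Python sketch below uses a built-in set in place of the IPset that rwuniq uses, so its memory behavior differs; the helper names and sample records are hypothetical.

```python
from collections import defaultdict
import ipaddress


def distinct_sources(records, key_field):
    """Count distinct source addresses per bin, one set per key."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec[key_field]].add(rec["sip"])
    return {key: len(ips) for key, ips in seen.items()}


def slash16_count(ips):
    """Number of distinct /16 blocks touched by a group of IPv4 addresses."""
    return len({int(ipaddress.IPv4Address(ip)) >> 16 for ip in ips})


records = [
    {"dport": 80, "sip": "10.1.1.1"},
    {"dport": 80, "sip": "10.1.1.2"},
    {"dport": 80, "sip": "10.1.1.1"},
    {"dport": 53, "sip": "10.2.0.9"},
]
counts = distinct_sources(records, "dport")
blocks = slash16_count(rec["sip"] for rec in records)
print(counts, blocks)
```

Here slash16_count shows the quantity that the text says drives the size of an IPset: the number of /16 blocks seen, not the number of addresses.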
When computing distinct counts over a field, the field may not be part
of the key; that is, you cannot have --fields=sip --sip-distinct.
In a future release of rwuniq, the --sip-distinct and
--dip-distinct switches may be removed in favor of a
--distinct=FIELDS switch.
rwuniq's strength is its ability to build an arbitrary key. For a key of a single IP address, see rwaddrcount(1) and rwbag(1); for a key made up of a single CIDR block (/8, /16, /24 only), a single port, or a single protocol, use rwtotal(1) or rwbag(1).
SEE ALSO
rwfilter(1), rwbag(1), rwset(1), rwaddrcount(1), rwstats(1), rwsort(1), rwtotal(1), rwcount(1), addrtype(3), ccfilter(3), pmapfilter(3)
BUGS
rwuniq should support sorting its output, by subsets of the key or by value fields.
When time-binning is active and the three time fields (starting-time (sTime), ending-time (eTime), and duration) are present in the key, the duration field's value will be modified to be the difference between the ending and starting times.


