rwuniq - Bin SiLK Flow records by a key and print the bins' volume
rwuniq --fields=KEY [--values=VALUES]
[--all-counts] [{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
[{--packets | --packets=MIN | --packets=MIN-MAX}]
[{--flows | --flows=MIN | --flows=MIN-MAX}]
[--stime] [--etime]
[{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
[{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
[--presorted-input] [--sort-output]
[{--bin-time | --bin-time=SECONDS}] [--epoch-time]
[{--integer-ips | --zero-pad-ips}] [--integer-sensors]
[--integer-tcp-flags] [--no-titles] [--no-columns]
[--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
[{--legacy-timestamps | --legacy-timestamps=NUM}]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--pmap-column-width=NUM]
[FILES...]
rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwuniq --version
rwuniq reads SiLK Flow records and groups them by a key composed of user-specified attributes of the flows. For each group (or bin), a collection of user-specified aggregate values is computed; these values are typically related to the volume of the bin, such as the sum of the bytes fields for all records that match the key. Once all the SiLK Flow records are read, the key fields and the aggregate values are printed. For some of the built-in aggregate values, it is possible to limit the output to the bins where the aggregate value meets a user-specified minimum and/or maximum.
The SiLK Flow records are read from the files named on the command
line or from the standard input when no file names are given and the
standard input is not a terminal. To read from both standard input
and files, use stdin or - as the name of an input file.
The flow attribute(s) (or field(s)) that make up the key for each bin are selected by the user, with the available fields being similar to those supported by rwcut. See the description of the --fields switch in the OPTIONS section below for the names of the keys. The list of fields can be extended by loading PySiLK files (see silkpython(3)) or plug-ins. The user must specify the --fields switch. The fields will be printed in the order in which they occur in the --fields switch. There is no fixed limit on the size of the key, but a larger key uses the available memory more quickly, leading to slower performance.
The aggregate value(s) to compute for each bin are also chosen by the user. As with the key fields, the user can extend the list of aggregate fields by using PySiLK or plug-ins. The preferred way to specify the aggregate fields is to use the --values switch; the aggregate fields will be printed in the order they occur in the --values switch. The thresholding switches (e.g., --bytes) can also be used to specify the aggregate values to compute. Aggregate values that are only specified with thresholding switches will be printed after those that appear in --values, in the following order for backward compatibility: bytes, packets, flows, stime, etime, sip-distinct, dip-distinct. If the user does not select any aggregate value(s), rwuniq defaults to computing the number of flow records for each bin and printing all bins. As with the key fields, requesting more aggregate values slows performance.
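The column ordering rule above can be sketched in plain shell. This is only an illustration of the rule as described, not rwuniq's own code; the function name value_order is hypothetical, and the sketch assumes the value names are written in lowercase exactly as they appear in the ordering list above.

```shell
# value_order VALUES [SWITCH...]: print the resulting order of aggregate columns.
# VALUES is the comma-separated list given to --values (may be empty);
# each SWITCH is the name of a thresholding switch that was also given.
value_order() {
    out=$(printf '%s' "$1" | tr ',' ' ')
    shift
    # Fixed backward-compatible order for values named only by thresholding switches.
    for name in bytes packets flows stime etime sip-distinct dip-distinct; do
        for used in "$@"; do
            [ "$used" = "$name" ] || continue
            case " $out " in
                *" $name "*) ;;                 # already listed in --values
                *) out="${out:+$out }$name" ;;  # append after the --values entries
            esac
        done
    done
    printf '%s\n' "$out"
}

# Models: --values=packets --flows=3 --bytes=10
value_order 'packets' flows bytes
```

Under these assumptions the packets column comes first because it was named in --values, followed by bytes and then flows in the fixed order.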
The --presorted-input switch may allow rwuniq to process data
more efficiently by causing rwuniq to assume the input has been
previously sorted with the rwsort(1) command. With this switch,
rwuniq does not need large amounts of memory because it does not
bin each flow; instead, it keeps a running summation and outputs the
bin whenever the key changes. For the output to be meaningful,
rwsort and rwuniq must be invoked with the same --fields
value. When multiple input files are specified and
--presorted-input is given, rwuniq will merge-sort the flow
records from the input files. rwuniq will usually run faster if
you do not include the --presorted-input switch when counting
distinct IP addresses, even when reading sorted input. Finally, you
may get unusual results with --presorted-input when the --fields
switch contains multiple time-related key fields (sTime, dur,
eTime), or when the time-related key is not the final key listed in
--fields.
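The running summation described above can be illustrated with a small awk sketch. It is not rwuniq's implementation; it operates on hypothetical "key|bytes" text lines rather than SiLK Flow records, and the names sorted_input and running_sum are invented for this example. It shows why sorted input lets a tool hold only one bin in memory at a time.

```shell
# Hypothetical sample: "key|bytes" lines that are already sorted by key.
sorted_input='10.1.1.1|100
10.1.1.1|250
10.1.1.2|40
10.1.1.3|10
10.1.1.3|15'

# running_sum: print one "key|flows|bytes" line each time the key changes,
# keeping only the current bin in memory.
running_sum() {
    awk -F'|' '
        NR > 1 && $1 != key { print key "|" flows "|" bytes; flows = 0; bytes = 0 }
        { key = $1; flows++; bytes += $2 }
        END { if (NR > 0) print key "|" flows "|" bytes }
    '
}

presorted_result=$(printf '%s\n' "$sorted_input" | running_sum)
printf '%s\n' "$presorted_result"
```

If the input were not sorted, the same key would be emitted more than once, which mirrors the duplicate rows that can appear when rwsort and rwuniq disagree about the ordering.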
rwuniq attempts to keep all key and aggregate value data in the computer's memory. If rwuniq runs out of memory, the current key and aggregate value data is written to a temporary file. Once all input has been processed, the data from the temporary files is merged to produce the final output. By default, these temporary files are stored in the /tmp directory. Because these files can be large, it is strongly recommended that /tmp not be used as the temporary directory. To modify the temporary directory used by rwuniq, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
When SiLK is compiled with IPv6 support, computing the number of
distinct addresses is limited. Specifically, only one distinct IP
count is supported for unsorted input, and no distinct IP counts are
supported when --presorted-input is given. Setting the
--ipv6-policy switch to ignore or asv4 will get around this
limitation at the expense of ignoring IPv6 addresses.
rwuniq may run out of memory when computing distinct IP counts, causing the counts for some bins to be smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard error noting that rwuniq has run out of memory, processing continues, and rwuniq exits with status 16.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The --fields switch is required. rwuniq will fail when it is not provided.
KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key into which flows are binned. The columns will be displayed in the order the fields are specified. Each field may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:
--fields=stime,10,1-5
There is no default value for the --fields switch; the switch must be specified.
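To make the KEY syntax concrete, the following shell sketch expands a KEY string into its individual entries, turning each range of field-integers into the integers it covers. The function name expand_key is hypothetical, and the sketch does no validation and does not map names to integers; it only illustrates how names, integers, and ranges combine in one list.

```shell
# expand_key KEY: print one entry per line, expanding ranges such as 1-5.
expand_key() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r item; do
        case $item in
            [0-9]*-[0-9]*)
                # A range of field-integers: start-end
                lo=${item%-*}
                hi=${item#*-}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    printf '%s\n' "$i"
                    i=$((i + 1))
                done ;;
            *)
                # A field-name or a single field-integer
                printf '%s\n' "$item" ;;
        esac
    done
}

expanded_key=$(expand_key 'stime,10,1-5' | paste -sd' ' -)
printf '%s\n' "$expanded_key"
```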
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.
source IP address
destination IP address
source port for TCP and UDP, or equivalent
destination port for TCP and UDP, or equivalent
IP protocol
packet count
byte count
bit-wise OR of TCP flags over all packets
starting time of flow (seconds resolution). When the time-related
fields sTime,dur,eTime are all in use, rwuniq will ignore
the final time field when binning the records.
duration of flow (seconds resolution). See note at sTime,9.
end time of flow (seconds resolution). See note at sTime,9.
name or ID of the sensor where the flow was collected
class assigned to the flow by rwflowpack(8). Binning by class
and/or type equates to binning by the integer value used internally
to represent the class/type pair. When --fields contains class
but not type, rwuniq's output will have multiple rows with the
same value(s) for the key field(s).
type assigned to the flow by rwflowpack(8). See note on previous entry.
include two columns, iType and iCode, that contain the ICMP type
and code for ICMP flows; for non-ICMP flows, these columns are empty.
Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
router SNMP output interface or portVlanId
router next hop IP
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field's value is always 0.
TCP flags on first packet in the flow
bit-wise OR of TCP flags over all packets except the first in the flow
flow attributes set by the flow generator:
F: flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
T: flow generator prematurely created a record for a long-running
connection due to a timeout. (When the flow generator yaf(1) is
run with the --silk switch, it will prematurely create a flow and
mark it with T if the byte count of the flow cannot be stored in a
32-bit value.)
C: flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow generator's
active timeout. (This is the active timeout since the flow
generator creates a flow for a connection that still has activity).
The flow generator will create multiple flow records for this ssh
session, each spanning some portion of the total session. The first
flow record will be marked with a T indicating that it hit the
timeout. The second through next-to-last records will be marked with
TC indicating that this flow both timed out and is a continuation
of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
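The marking pattern in the ssh example can be written out as a short shell sketch. The function name continuation_marks is hypothetical and the code only restates the rule given above for a session that the flow generator split into N records; it does not inspect real flow data.

```shell
# continuation_marks N: print "index mark" for each of the N flow records
# that a flow generator produced for one long-running session.
continuation_marks() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        if [ "$n" -eq 1 ]; then
            mark=''      # the session never hit the timeout
        elif [ "$i" -eq 1 ]; then
            mark='T'     # first record: hit the timeout
        elif [ "$i" -eq "$n" ]; then
            mark='C'     # final record: continuation only
        else
            mark='TC'    # middle records: continuation that timed out again
        fi
        printf '%d %s\n' "$i" "$mark"
        i=$((i + 1))
    done
}

continuation_marks 4
```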
guess as to the content of the flow. Some software that generates flow records from packet data, such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external files to provide the mapping from the IP or port to the label:
for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2 if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the address_types.pmap mapping file, as described in addrtype(3).
as sType for the destination IP address
for the source IP address, a two-letter country code abbreviation denoting the country where that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The abbreviations are those used by the Root-Zone Whois Index (see for example http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other
as scc for the destination IP
label determined by passing the source IP or the protocol/source-port to the user-defined mapping defined in the prefix map associated with MAPNAME. See the description of the --pmap-file switch below and the pmapfilter(3) manual page.
as src-MAPNAME for the destination IP or protocol/destination-port.
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins written in C (also called shared object files or dynamic libraries), as described by the --python-file and --plugin switches.
Specify the aggregate values to compute for each bin as a comma-separated list of names. Names are case insensitive. When a thresholding switch specifies an aggregate value field that does not appear in VALUES, that field is added to the end of VALUES. When neither the --values switch nor any thresholding switch is specified, rwuniq counts the number of flow records for each bin. The aggregate fields are printed in the order they occur in VALUES. The names of the built-in value fields follow. This list can be augmented through the use of PySiLK and plug-ins.
Count the number of flow records that mapped to each bin.
Sum the number of packets across all records that mapped to each bin.
Sum the number of bytes across all records that mapped to each bin.
Count the number of distinct source IP addresses that were seen for each bin.
Count the number of distinct destination IP addresses that were seen for each bin.
Augment the list of key fields or aggregate value fields by using
run-time loading of the plug-in (shared object) whose path is
PLUGIN. The switch may be repeated to load multiple plug-ins. The
creation of these plug-ins is described in the silk-plugin(3)
manual page.
When PLUGIN contains a slash (/), rwuniq assumes the path to
PLUGIN is correct. Otherwise, rwuniq will attempt to find the
file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib,
$SILK_PATH/lib, and in these directories parallel to the
application's directory: lib/silk, share/lib, and lib. If
rwuniq does not find the file, it assumes the plug-in is in the
current directory. To force rwuniq to look in the current
directory first, specify --plugin=./PLUGIN. When the
SILK_PLUGIN_DEBUG environment variable is non-empty, rwuniq prints
status messages to the standard error as it tries to open each of its
plug-ins.
The next eight options will add the appropriate aggregate field to --values if the field is not present. The options are processed in the order they appear here, regardless of the order they occur on the command line. Use of these switches without a threshold value is deprecated.
Enable the next five sets of options with their default thresholds; i.e., all possible counts (except the distinct counts) are computed and printed. This switch is deprecated.
Cause rwuniq to total, for each unique key, the number of bytes in each flow record. When MIN is provided, bins are printed only when they had at least MIN total bytes. When MAX is also provided, bins are printed only when they had no more than MAX total bytes. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
Cause rwuniq to sum, for each unique key, the number of packets in each flow record. When MIN is provided, bins are printed only when they had at least MIN sum of packets. When MAX is also provided, bins are printed only when they had no more than MAX sum of packets. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
Cause rwuniq to count the number of flow records in each uniquely keyed bin. When MIN is provided, bins are printed only when they had at least MIN flows. When MAX is also provided, bins are printed only when they had no more than MAX flows. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used.
Cause rwuniq to keep track of the earliest time at which it saw a flow that matched each bin's unique key. This option does not support thresholds, and it is deprecated.
Cause rwuniq to keep track of the latest (most recent) time at which it saw a flow that matched each bin's unique key. This option does not support thresholds, and it is deprecated.
Cause rwuniq to count the number of distinct source IP addresses that were seen for each uniquely keyed bin. When MIN is provided, bins are printed only when they had at least MIN distinct sources. When MAX is also provided, bins are printed only when they had no more than MAX distinct sources. A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used. When this switch is provided, the sIP field cannot be part of the key.
As --sip-distinct for destination IP addresses.
Miscellaneous options:
Cause rwuniq to assume that it is reading sorted input; i.e., that rwuniq's input file(s) were generated by rwsort(1) using the exact same value for the --fields switch. This option allows rwuniq to process an endless stream of records. When multiple input files are specified, rwuniq will merge-sort the flow records from the input files.
Cause rwuniq to present the output in sorted numerical order. The key rwuniq uses for sorting is the same key it uses to index each bin.
Adjust the key fields 'sTime' and 'eTime' to appear on SECONDS-second boundaries (the floor of the time is used). When no value is provided to the switch, 60-second time bins are used. (When the start-time is the only key field and time binning is desired, consider using rwcount(1) instead.)
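The flooring arithmetic behind time binning can be sketched with shell integer arithmetic. The function name bin_floor is hypothetical, and the sketch works on epoch seconds only; it is meant to show the calculation, not how rwuniq stores times internally.

```shell
# bin_floor EPOCH [SECONDS]: round an epoch timestamp down to the start of
# its bin. SECONDS defaults to 60, matching --bin-time given without a value.
bin_floor() {
    secs=${2:-60}
    echo $(( $1 - $1 % secs ))
}

bin_floor 1000000057        # 60-second bins
bin_floor 1000000057 600    # 10-minute bins
```

With these inputs the first call yields 1000000020 and the second yields 999999600, so two flows that start within the same bin end up with the same key value.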
Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01).
Print IPs as integers. By default, IP addresses are printed in their canonical form.
Print IP addresses in their canonical form, but add zeros to the IP
address so it fully fills the width of the column. For IPv4, use three
digits per octet, e.g., 127.000.000.001. For IPv6, use four digits
per hexadectet and expand empty hexadectets, e.g.,
0000:0000:0000:0000:0000:FFFF:FF00:0001.
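For IPv4, the integer and zero-padded presentations can be reproduced with a few lines of shell, which may help when comparing rwuniq output against other tools. The function names zero_pad_ipv4 and ipv4_to_integer are hypothetical, and the sketch assumes a dotted-quad argument whose octets have no leading zeros.

```shell
# zero_pad_ipv4 ADDR: print ADDR with three digits per octet.
# The body runs in a subshell so the IFS change does not leak out.
zero_pad_ipv4() (
    IFS=.
    set -- $1
    printf '%03d.%03d.%03d.%03d\n' "$1" "$2" "$3" "$4"
)

# ipv4_to_integer ADDR: print ADDR as a single unsigned integer.
ipv4_to_integer() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

zero_pad_ipv4 127.0.0.1
ipv4_to_integer 127.0.0.1
```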
Print the integer ID of the sensor rather than its name.
Print the TCP flag fields (flags, initialFlags, sessionFlags) as an
integer value. Typically, the characters F,S,R,P,A,U,E,C are used
to represent the TCP flags.
Turn off column titles. By default, titles are printed.
Disable fixed-width columnar output.
Use specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.
Do not print the column separator after the final column. Normally a delimiter is printed.
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
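Delimited output is convenient for post-processing with standard text tools. The sketch below sums the second column of some hypothetical text of the shape one might expect from combining --delimited with --no-titles and a single key field; the sample rows and the variable names are assumptions made for illustration.

```shell
# Hypothetical delimited output: key field, then the record count.
delimited_sample='10.1.1.9|133
10.1.1.1|120
10.1.1.5|100'

# Sum the record counts (field 2) across all bins.
total_records=$(printf '%s\n' "$delimited_sample" |
    awk -F'|' '{ total += $2 } END { print total }')
echo "$total_records"
```

Because there is no padding and no trailing delimiter, each line splits cleanly on the delimiter character.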
Print to the standard error the names of input files as they are opened.
Copy all binary input to the specified file or named pipe. PATH
can be stdout to print flows to the standard output as long as the
--output-path switch has been used to redirect rwuniq's ASCII
output.
Determines where the output of rwuniq (ASCII text) is written. If this option is not given, output is written to the standard output.
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the terminal.
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:
Completely ignore IPv6 flows. Only IPv4 flows will be printed.
Convert IPv6 addresses to IPv4 if possible, otherwise ignore the IPv6 flows.
Process the input as a mixture of IPv4 and IPv6 flows.
Force IPv4 flows to be converted to IPv6.
Only process flows that were marked as IPv6 and completely ignore IPv4 flows.
Specify the name of the directory in which to store data files temporarily when the memory is not large enough to store all the bins and their aggregate values. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides the default, /tmp.
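The precedence described above can be sketched as a shell function, which may be handy in wrapper scripts that want to report where temporary files will go. The function name pick_tmpdir is hypothetical, and the sketch treats an empty variable the same as an unset one, which is an assumption rather than documented behavior.

```shell
# pick_tmpdir [SWITCH_VALUE]: print the directory that would be chosen,
# where SWITCH_VALUE is the argument to --temp-directory (empty if not given).
pick_tmpdir() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"              # the switch overrides everything
    elif [ -n "${SILK_TMPDIR:-}" ]; then
        printf '%s\n' "$SILK_TMPDIR"    # then SILK_TMPDIR
    elif [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR"         # then TMPDIR
    else
        printf '%s\n' /tmp              # the default
    fi
}

pick_tmpdir /var/rwuniq-tmp
```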
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
Specify the format for human readable timestamps, either the default
(new) style, YYYY/MM/DDThh:mm:ss, or the legacy
style, MM/DD/YYYY hh:mm:ss. When this switch
is not present, the timestamps will be in the default format. When
this switch is present and no argument is given, timestamps are in the
legacy format. When an argument is supplied, timestamps will be in
the new format if the argument begins with 0, and in the old format if
the argument begins with 1. Any other argument to the switch is an
error.
Print the available options and exit. Options that add fields can be specified before --help so that the new options appear in the output.
Print the version number and information about how SiLK was configured, then exit the application.
This switch is deprecated. It is an alias for --plugin.
Instruct rwuniq to load the mapping file located at PATH and create the src-MAPNAME and dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields specific to that prefix map. If MAPNAME is not provided, rwuniq will check the prefix map file to see if a map-name was specified when the file was created. If no map-name is available, rwuniq creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a unique value for map-name. The --pmap-file switch(es) must precede the --fields switch. For more information, see pmapfilter(3).
When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.
When the SiLK Python plug-in is used, rwuniq reads the Python code from the file PATH to define additional fields that can be used as part of the key. This file should call register_plugin_field() for each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.
To look for the most prevalent source IP addresses (similar to rwstats(1)):
rwuniq --field=sip mydata.rwf | \
sort -t '|' -r -k 2 | head -11
sIP| Records|
10.1.1.9| 133|
10.1.1.1| 120|
10.1.1.5| 100|
10.1.1.2| 91|
10.1.1.4| 82|
10.1.1.6| 64|
10.1.1.3| 41|
10.1.1.7| 29|
10.1.1.10| 28|
10.1.1.8| 16|
To find the source addresses that have at least 3 TCP flow records of 3 or fewer packets each, combine rwfilter and rwuniq:
rwfilter --start-date=2003/01/01:00 --end-date=2003/01/01:23 \
--proto=6 --packets=1-3 --pass=stdout \
| rwuniq --field=sip --flows=3 | head -11
sIP| Records|
10.1.1.4| 6|
10.1.1.6| 4|
10.1.1.1| 102|
10.1.1.9| 3|
10.1.1.2| 4|
10.1.1.8| 17|
10.1.1.10| 6|
10.1.1.7| 7|
10.1.1.3| 7|
10.1.1.5| 2491|
The silkpython(3) manual page provides examples that use PySiLK to create arbitrary fields to use as part of the key for rwuniq.
This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
When set to a non-empty string, rwuniq automatically invokes this program to display its output a screen at a time. If set to an empty string, rwuniq does not automatically page its output.
When set and SILK_PAGER is not set, rwuniq automatically invokes this program to display its output a screen at a time.
When set and --temp-directory is not specified, rwuniq writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.
When set and SILK_TMPDIR is not set, rwuniq writes the temporary files it creates to this directory.
This environment variable is used by Python to locate modules. When --python-file is specified, rwuniq loads Python which in turn loads the PySiLK module which is comprised of several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python's normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module.
When set, Python plug-ins will output traceback information on Python errors to the standard error.
This environment variable allows the user to specify the country code mapping file that rwuniq will use when computing the scc and dcc fields. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, rwuniq checks the following locations:
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
This environment variable allows the user to specify the address type mapping file that rwuniq will use when computing the sType and dType fields. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, rwuniq checks the following locations:
$SILK_PATH/share/silk/address_types.pmap
$SILK_PATH/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
This environment variable is used as the value for the --site-config-file when that switch is not provided.
When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwuniq looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwuniq checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwuniq looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
When set to 1, rwuniq prints status messages to the standard error as it tries to open each of its plug-ins.
If multiple thresholds are given (e.g., --bytes=80 --flows=2), a
bin must meet all thresholds before it is printed. For
example, if a given key saw a single 100-byte flow, the entry would
not be printed given the switches above.
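The way multiple minimum thresholds combine can be shown with an awk sketch over hypothetical "key|bytes" lines. The names sample_flows and filter_bins are invented for this example, only minimum thresholds are modeled, and the code illustrates the rule rather than rwuniq's internals.

```shell
# Hypothetical sample: one "key|bytes" line per flow record.
sample_flows='A|100
B|50
B|40
C|10
C|20'

# filter_bins MIN_BYTES MIN_FLOWS: bin by key, then print "key|bytes|flows"
# only for bins that satisfy both minimums.
filter_bins() {
    awk -F'|' -v minb="$1" -v minf="$2" '
        { flows[$1]++; bytes[$1] += $2 }
        END {
            for (k in flows)
                if (bytes[k] >= minb && flows[k] >= minf)
                    print k "|" bytes[k] "|" flows[k]
        }
    ' | sort
}

# Models: --bytes=80 --flows=2
printf '%s\n' "$sample_flows" | filter_bins 80 2
```

Key A has enough bytes but only one flow, and key C has enough flows but too few bytes, so only key B is printed.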
rwuniq functionally replaces the combination of
rwcut | sort | uniq -c
To get a list of unique IP addresses in a data set without the counting or threshold abilities of rwuniq, consider using IPsets:
rwset --sip-set=stdout | rwsetcat --print-ips
For situations where the key and value are each a single field, consider using Bags:
rwbag --sip-bytes=stdout | rwbagcat
rwgroup(1) works similarly to rwuniq, except the data remains in the form of SiLK Flow records, and the next-hop-IP field is modified to denote the records that form a bin.
rwstats(1) can do the same binning as rwuniq, and then sort the data by an aggregate field.
When the --bin-time switch is given and the three time fields
(starting-time (sTime), ending-time (eTime), and duration
(dur)) are present in the key, the duration field's value will be
modified to be the difference between the ending and starting times.
When the three time-related key fields (sTime,dur,eTime) are
all in use, rwuniq will ignore the final time field when binning
the records, but the field will appear in the output. Due to
truncation of the milliseconds values, rwuniq will print a
different number of rows depending on the order in which those three
values appear in the --fields switch.
rwuniq supports counting distinct source and/or destination IPs. To see the number of distinct sources for each 10 minute bin, run:
rwuniq --fields=stime --values=sip-distinct --bin-time=600 --sort-output
When computing distinct counts over a field, the field may not be part
of the key; that is, you cannot have --fields=sip
--values=sip-distinct.
When SiLK is compiled with IPv6 support, computing the number of
distinct addresses is limited. Specifically, only one distinct IP
count is supported for unsorted input, and no distinct IP counts are
supported when --presorted-input is given. Setting the
--ipv6-policy switch to ignore or asv4 will get around this
limitation at the expense of ignoring IPv6 addresses.
Using the --presorted-input switch sometimes introduces more issues than it solves, and --presorted-input is less necessary now that rwuniq can use temporary files while processing input.
When using the --presorted-input switch, it is highly recommended
that you use no more than one time-related key field (sTime,
dur, eTime) in the --fields switch and that the time-related
key appear last in --fields. The issue is caused by rwsort
considering the millisecond values on the times when sorting, while
rwuniq truncates the millisecond value. The result may be unsorted
output and multiple rows in the output that have the same values for
the key fields:
$ rwsort --fields=stime,dur data.rwf \
| rwuniq --fields=stime,dur --presorted
sTime| dur| Records|
...
2009/02/12T00:00:57| 0| 2|
2009/02/12T00:00:57| 29| 2|
2009/02/12T00:00:57| 0| 2|
2009/02/12T00:00:57| 13| 2|
...
When computing distinct IP counts, rwuniq will typically run faster if you do not use the --presorted-input switch, even if the data was previously sorted.
rwuniq may run out of memory when computing distinct IP counts, causing the counts for some bins to be smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard error noting that rwuniq has run out of memory, processing continues, and rwuniq exits with status 16.
rwuniq's strength is its ability to build arbitrary keys and aggregate fields. For a key of a single IP address, see rwaddrcount(1) and rwbag(1); for a key made up of a single CIDR block (/8, /16, /24 only), a single port, or a single protocol, use rwtotal(1) or rwbag(1).
rwfilter(1), rwbag(1), rwset(1), rwaddrcount(1), rwgroup(1), rwstats(1), rwsort(1), rwtotal(1), rwcount(1), addrtype(3), ccfilter(3), pmapfilter(3), silkpython(3), pysilk(3), sensor.conf(5), rwflowpack(8), yaf(1)