NAME
rwsort - Sort SiLK Flow records on one or more fields
SYNOPSIS
rwsort --fields=KEY [--presorted-input] [--dynamic-library=DYNLIB]
[--temp-directory=DIR_PATH] [--sort-buffer-size=SIZE]
[--compression-method=COMP_METHOD] [--output-path=PATH]
[--site-config-file=FILENAME]
[ {--input-pipe=PATH | FILE [FILES ...] } ]
DESCRIPTION
rwsort reads SiLK Flow records from the specified --input-pipe,
from the files named on the command line, or from the standard input.
The records are sorted on the field(s) listed by the --fields
switch, and the SiLK Flow records are written to the
--output-path or to the standard output if it is not connected to a
terminal. The output from rwsort is binary SiLK Flow records; the
output must be passed into another tool for human-readable output.
rwsort will try to allocate a large (near 2GB) in-memory array to hold the records. If 2GB cannot be allocated, rwsort reduces the requested size until it succeeds. (Use the --sort-buffer-size switch to change this default buffer size.) If more records are read than will fit into memory, the in-core records are sorted and temporarily stored on disk as described by the --temp-directory switch. When all records have been read, the on-disk files are merged.
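The spill-and-merge behavior described above is the classic external merge sort. The following is a minimal sketch of that technique, not rwsort's actual implementation: the function names, the use of pickle for the temporary files, and the record-count limit (instead of a byte-sized buffer) are all assumptions made for illustration.

```python
import heapq
import os
import pickle
import tempfile


def _write_run(chunk, key, tmp_dir):
    """Sort one in-memory chunk and spill it to a temporary file."""
    chunk.sort(key=key)
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "wb") as handle:
        for rec in chunk:
            pickle.dump(rec, handle)
    return path


def _read_run(path):
    """Yield the records of one sorted run back from disk, in order."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def external_sort(records, key, max_in_memory, tmp_dir=None):
    """Sort records by key, buffering at most max_in_memory records
    before spilling a sorted run to disk; merge all runs at the end.

    The merged result is returned as a list to keep the sketch short;
    a real tool would stream it to its output instead.
    """
    run_paths = []
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) >= max_in_memory:
            run_paths.append(_write_run(chunk, key, tmp_dir))
            chunk = []
    chunk.sort(key=key)  # records still in memory form the last run
    try:
        runs = [_read_run(path) for path in run_paths]
        return list(heapq.merge(*runs, chunk, key=key))
    finally:
        for path in run_paths:
            os.remove(path)
```

A larger max_in_memory produces fewer runs and therefore less temporary-file I/O, which is the trade-off the --sort-buffer-size switch controls.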
Because of the sizes of the temporary files, it is strongly recommended that /tmp not be used as the temporary directory, and rwsort will print a warning when /tmp is used. To modify the temporary directory used by rwsort, provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment variable.
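The precedence among the three ways of choosing the temporary directory can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating an empty environment variable the same as an unset one is an assumption, not documented rwsort behavior.

```python
import os


def choose_temp_directory(switch_value=None, environ=None):
    """Return the temporary directory using the documented precedence:
    --temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp.
    """
    env = os.environ if environ is None else environ
    candidates = (switch_value, env.get("SILK_TMPDIR"), env.get("TMPDIR"))
    for candidate in candidates:
        if candidate:  # assumption: empty values are treated as unset
            return candidate
    return "/tmp"
```

For example, with SILK_TMPDIR=/data/tmp and TMPDIR=/var/tmp both set and no switch given, the sketch returns /data/tmp.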
To merge previously sorted files into a sorted stream, run rwsort with the --presorted-input switch. rwsort will merge-sort all the input files, reducing its memory requirements considerably. It is the user's responsibility to ensure that all the input files have been sorted with the same --fields value. rwsort may still require use of a temporary directory while merging the files (for example, if rwsort does not have enough available file handles to open all the input files at once).
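A merge of presorted inputs only ever compares the current head of each input, which is why it needs little memory and why the inputs must already share the same sort key. The sketch below illustrates the idea with Python's heapq.merge; the function name is hypothetical and plain lists stand in for SiLK files.

```python
import heapq


def merge_presorted(inputs, key):
    """Lazily merge inputs that are each already sorted by key.

    Only one pending record per input is held at a time. Inputs are
    not checked for order: unsorted input gives unsorted output.
    """
    return heapq.merge(*inputs, key=key)


by_port = lambda rec: rec[0]
file_a = [(22, "10.0.0.1"), (80, "10.0.0.3")]
file_b = [(53, "10.0.0.5"), (443, "10.0.0.2")]
merged = list(merge_presorted([file_a, file_b], key=by_port))

# An input that was not sorted on the same key silently breaks the result.
broken = list(merge_presorted([[3, 1], [2]], key=lambda n: n))
```

Here merged is ordered by port, while broken comes out as 2, 3, 1 because the first input violated the presorted assumption.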
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The --fields switch is required. rwsort will fail when it is not provided.
- --fields=KEY
- KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key by which flows are sorted. The fields are listed in order from the primary sort key to the secondary key, and so on. Each field may be specified only once.
-
KEY is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-), e.g.,
-
--fields=stime,10,1-5
-
There is no default value for the --fields switch; the switch must be specified.
-
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all fields are present in all SiLK file formats; when a field is not present, its value is 0.
- sIP,sip,1
- source IP address
- dIP,dip,2
- destination IP address
- sPort,sport,3
- source port for TCP and UDP, or equivalent
- dPort,dport,4
- destination port for TCP and UDP, or equivalent
- protocol,5
- IP protocol
- packets,pkts,6
- packet count
- bytes,7
- byte count
- flags,8
- bit-wise OR of TCP flags over all packets
- sTime,stime,9,sTime+msec,stime+msec,22
- starting time of flow (milliseconds resolution)
- dur,10,dur+msec,24
- duration of flow (milliseconds resolution)
- eTime,etime,11,eTime+msec,etime+msec,23
- end time of flow (milliseconds resolution)
- sensor,12
- name or ID of sensor at the collection point
- class,20
- class of sensor at the collection point
- type,21
- type of sensor at the collection point
- icmpTypeCode,icmptypecode,25
- the ICMP type and code
- initialFlags,initialflags,26
- TCP flags on first packet in the flow
- sessionFlags,sessionflags,27
- bit-wise OR of TCP flags over all packets except the first in the flow
- attributes,28
- flow attributes set by flow collector:
T
- flow collector generated a flow record for a long-running connection due to timeout.
C
- this flow is a continuation of a long-running connection that the collector terminated.
F
- additional non-ACK packets seen after a packet with the FIN flag set.
- application,29
- guess as to the application generating the flow; value will be standard port for the application, such as 80 for web traffic
- stype,16
- categorize the source IP address as non-routable, internal, or external and sort based on the category. See addrtype(3).
- dtype,17
- as stype for the destination IP address
- scc,18
- the country code of the source IP address. See ccfilter(3).
- dcc,19
- as scc for the destination IP
- sval
- value from the user-defined mapping (see the --pmap-file switch) for the source. For an IP-based map, this corresponds to sip. For a proto-port-based map, it is protocol/sport. See pmapfilter(3).
- dval
- as sval for the destination IP or proto/dport.
- --temp-directory=DIR_PATH
- Specify the name of the directory in which to store data files temporarily when more records have been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR variable, which overrides /tmp.
- --sort-buffer-size=SIZE
- Set the initial (maximum) size of the buffer to use for sorting the records, in bytes. A larger buffer means fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this buffer is near 2GB. If the buffer cannot be allocated, the requested size is reduced by 25% and the allocation is attempted again. This cycle continues until a buffer is allocated or the minimum buffer size is reached. The SIZE may be given as an ordinary integer, or as a real number followed by a suffix K, M, or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount of RAM that rwsort will allocate, since additional buffers will be allocated for reading the input and writing the output.) The sort buffer is not used when the --presorted-input switch is specified.
- --presorted-input
- Instruct rwsort to merge-sort the input files; that is, rwsort assumes the input files have been previously sorted using the same --fields value that was given for this invocation. This switch can greatly reduce rwsort's memory requirements as a large buffer is not required for sorting the records.
- --dynamic-library=DYNLIB
- Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is DYNLIB. The creation of these plug-ins is beyond the scope of this manual page. When DYNLIB contains a slash (/), rwsort assumes the path to DYNLIB is correct. Otherwise, rwsort will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwsort does not find the file, it assumes the plug-in is in the current directory. To force rwsort to look in the current directory first, specify --dynamic-library=./DYNLIB. When the SILK_DYNLIB_DEBUG environment variable is non-empty, rwsort prints status messages to the standard error as it tries to open each of its plug-ins.
- --compression-method=COMP_METHOD
- Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
- none
- Do not compress the output using an external library
- zlib
- Use the zlib(3) library for compressing the output
- lzo1x
- Use the lzo1x algorithm from the LZO real time compression library for compression
- best
- Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
- --output-path=PATH
- The pathname of the file or named pipe to write the sorted records to. This switch must not name an existing regular file. When the standard output is not a terminal and this switch is not provided or its argument is stdout, the sorted records are written to the standard output.
- --input-pipe=PATH
- The pathname of a named pipe from which to read data. Use of this switch is not required, since rwsort will automatically read data from the standard input when no file names are specified on the command line.
- --pmap-file=PATH
- When the pmapfilter(3) plug-in is used, this switch gives the path to the mapping file.
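The SIZE argument of the --sort-buffer-size switch described above, and the 25% back-off applied when an allocation fails, can be worked through with a short sketch. The function names are hypothetical, the minimum buffer size is a caller-supplied parameter because this page does not state its value, and accepting only upper-case suffixes is an assumption.

```python
import re

# Multipliers for the K, M, and G suffixes given in the switch description.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a SIZE string such as '1.5K' or '2048' to a byte count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    if suffix == "" and "." in number:
        # Without a suffix, the value must be an ordinary integer.
        raise ValueError(f"fractional size needs a suffix: {text!r}")
    return int(float(number) * _MULTIPLIERS[suffix])


def allocation_attempts(requested, minimum):
    """Yield the buffer sizes that would be tried in turn, each 25%
    smaller than the last, stopping once the size drops below minimum.
    """
    size = requested
    while size > 0 and size >= minimum:
        yield size
        size = size * 3 // 4
```

With this sketch, parse_size("1.5K") gives 1536, matching the example in the switch description, and a 1000-byte request with a 500-byte minimum would be tried at 1000, 750, and 562 bytes.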
Many SiLK file formats do not store the following fields and their values will always be 0; they are listed here for completeness:
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional fields; for flows without this additional information, the field's value is always 0.
The list of built-in fields may be augmented by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwsort automatically looks for the following plug-ins:
ADDRESS TYPE (addrtype.so)
COUNTRY CODE (ccfilter.so)
PREFIX MAP (pmapfilter.so)
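The KEY syntax accepted by the --fields switch (field names, field integers, and hyphenated ranges of integers, each field at most once) can be illustrated with a small parser. This is a sketch rather than rwsort's parser: the name table covers only fields 1 through 12 from the list above, and matching names without regard to case is an assumption.

```python
# Subset of the built-in field names and their integer identifiers.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8,
    "stime": 9, "dur": 10, "etime": 11, "sensor": 12,
}


def parse_fields(key):
    """Expand a KEY string into an ordered list of field integers."""
    result = []
    for token in key.split(","):
        token = token.strip()
        if token.isdigit():
            ids = [int(token)]
        elif "-" in token:
            start, end = token.split("-", 1)
            if not (start.isdigit() and end.isdigit()):
                raise ValueError(f"bad range: {token!r}")
            if int(start) > int(end):
                raise ValueError(f"empty range: {token!r}")
            ids = list(range(int(start), int(end) + 1))
        elif token.lower() in FIELD_IDS:
            ids = [FIELD_IDS[token.lower()]]
        else:
            raise ValueError(f"unknown field: {token!r}")
        for field_id in ids:
            if field_id in result:
                raise ValueError(f"field {field_id} given more than once")
            result.append(field_id)
    return result
```

Under these assumptions, parse_fields("stime,10,1-5") yields 9, 10, 1, 2, 3, 4, 5: the start time is the primary key, duration the secondary key, and so on.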
LIMITATIONS
When the temporary files and the final output are stored on the same file volume, rwsort will require approximately twice as much free disk space as the size of data to be sorted.
When the temporary files and the final output are on different volumes, rwsort will require between 1 and 1.5 times as much free space on the temporary volume as the size of the data to be sorted.
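The two rules of thumb above amount to simple arithmetic. The sketch below merely restates them; the function name is hypothetical, and the figures are the approximations given in this section, not measured values.

```python
def estimate_free_space(data_bytes, same_volume):
    """Return a (low, high) estimate of the free space needed, in bytes.

    same_volume=True:  temporary files and output share one volume,
                       so roughly twice the data size is needed there.
    same_volume=False: the estimate covers the temporary volume only,
                       between 1 and 1.5 times the data size.
    """
    if same_volume:
        return (2 * data_bytes, 2 * data_bytes)
    return (data_bytes, data_bytes * 3 // 2)
```

For 10 GiB of input, the sketch suggests about 20 GiB free on a shared volume, or 10 to 15 GiB free on a separate temporary volume.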
EXAMPLES
To sort the records in fileA based primarily on destination port and secondarily on source IP and write the binary output to fileB, run:
rwsort --fields=4,1 --output-path=fileB fileA
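To show what "primarily on destination port and secondarily on source IP" means for the resulting order, the following sketch sorts a few made-up records with an equivalent compound key. The records are invented for illustration, and comparing addresses by their numeric value is an assumption about how IP addresses are ordered.

```python
import ipaddress

# Made-up records; only the two key fields from --fields=4,1 are shown.
flows = [
    {"sip": "10.0.0.9", "dport": 80},
    {"sip": "192.168.1.5", "dport": 22},
    {"sip": "10.0.0.2", "dport": 80},
    {"sip": "10.0.0.10", "dport": 22},
]


def sort_key(flow):
    """Primary key: destination port. Secondary key: numeric source IP."""
    return (flow["dport"], int(ipaddress.ip_address(flow["sip"])))


ordered = sorted(flows, key=sort_key)
for flow in ordered:
    print(flow["dport"], flow["sip"])
```

All port 22 records come before all port 80 records, and within each port the records are ordered by source address.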
ENVIRONMENT
- SILK_TMPDIR
- When set, rwsort writes the temporary files it creates to this directory. SILK_TMPDIR overrides the value of TMPDIR.
- TMPDIR
- When set and SILK_TMPDIR is not set, rwsort writes the temporary files it creates to this directory.
- SILK_CONFIG_FILE
- This environment variable is used as the value for the --site-config-file when that switch is not provided.
- SILK_DATA_ROOTDIR
- When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwsort looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
- SILK_PATH
- This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwsort checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwsort looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
- SILK_DYNLIB_DEBUG
- When set to 1, rwsort prints status messages to the standard error as it tries to open each of its plug-ins.
SEE ALSO
rwfilter(1), rwcut(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3)
NOTES
If an output path is not specified, rwsort will write to the standard output unless it is connected to a terminal, in which case an error is printed and rwsort exits.
If an input pipe or a set of input files are not specified, rwsort will read records from the standard input unless it is connected to a terminal, in which case an error is printed and rwsort exits.
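The terminal checks described in the two notes above follow a common pattern for tools that read or write binary data. The sketch below shows that pattern only; the function name and error text are hypothetical and do not reproduce rwsort's messages.

```python
import io


def require_non_terminal(stream, role):
    """Return stream unchanged, or raise if it is attached to a terminal.

    role is a label such as 'input' or 'output' used in the error text.
    """
    if stream.isatty():
        raise RuntimeError(f"refusing to use a terminal for binary {role}")
    return stream


# A pipe or file passes the check; io.BytesIO stands in for one here.
out = require_non_terminal(io.BytesIO(), "output")
```

In a real program the check would be applied to sys.stdin and sys.stdout before any records are read or written.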
Note that rwsort produces binary output. Use rwcut(1) to view the records.
Do not spend the resources to sort the data if you are going to be passing it to an aggregation tool like rwtotal or rwaddrcount, which have their own internal data structures that will ignore the sorted order.
rwuniq(1) can take advantage of previously sorted data if it is instructed to do so with its --presorted-input switch.


