CERT NetSA Security Suite 
Open Source Tools for Network Monitoring 


NAME

rwfilter - Choose which SiLK Flow records to process


SYNOPSIS

  rwfilter [--threads=N] [--dynamic-library=DYNLIB]
        [--pass-destination=PASS_PATH]
        [--fail-destination=FAIL_PATH] [--all-destination=ALL_PATH]
        [--input-pipe=INPUT_PATH] [--xargs=INPUT_STREAM]
        [{ --print-statistics | --print-volume-statistics }]
        [--print-filenames] [--print-missing-files]
        [--dry-run] [--max-pass-records=N]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
        [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
        [--sensors=SENSOR[,SENSOR ...]]
        [--data-rootdir=PATH] [--site-config-file=FILENAME]
        [--stime=DATE_RANGE] [--etime=DATE_RANGE]
        [--active-time=DATE_RANGE] [--duration=INTEGER_RANGE]
        [--sport=INTEGER_LIST] [--dport=INTEGER_LIST]
        [--aport=INTEGER_LIST] [--protocol=INTEGER_LIST]
        [--icmp-type=INTEGER_LIST] [--icmp-code=INTEGER_LIST]
        [--bytes=INTEGER_RANGE] [--packets=INTEGER_RANGE]
        [--bytes-per-packet=DECIMAL_RANGE]
        [{--saddress=IP_ADDR_MASK | --not-saddress=IP_ADDR_MASK}]
        [{--daddress=IP_ADDR_MASK | --not-daddress=IP_ADDR_MASK}]
        [{--any-address=IP_ADDR_MASK | --not-any-address=IP_ADDR_MASK}]
        [{--next-hop-id=IP_ADDR_MASK | --not-next-hop-id=IP_ADDR_MASK}]
        [{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]
        [{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]
        [{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]
        [{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]
        [--input-index=INTEGER_LIST] [--output-index=INTEGER_LIST]
        [--tcp-flags=TCP_FLAGS] [--flags-all=HIGH_MASK_FLAGS]
        [--fin-flag=SCALAR] [--syn-flag=SCALAR] [--rst-flag=SCALAR]
        [--psh-flag=SCALAR] [--ack-flag=SCALAR] [--urg-flag=SCALAR]
        [--ece-flag=SCALAR] [--cwr-flag=SCALAR]
        [--flags-initial=HIGH_MASK_FLAGS]
        [--flags-session=HIGH_MASK_FLAGS]
        [--attributes=ATTRIBUTES] [--application=INTEGER_LIST]
        [--ip-version=INTEGER_LIST]
        [--scc=COUNTRY_CODE_LIST] [--dcc=COUNTRY_CODE_LIST]
        [--stype=SCALAR] [--dtype=SCALAR]
        [--ippair-any=FILENAME] [--ipport-any=FILENAME]
        [--tuple-file=FILENAME { [--tuple-fields=FIELDS]
                                 [--tuple-direction=DIRECTION]
                                 [--tuple-delimiter=CHAR] } ]
        [--python-expr=PYTHON_EXPR] [--python-file=FILENAME]
        [--pmap-file=FILENAME { [--pmap-saddress=LABELS]
                                [--pmap-daddress=LABELS]
                                [--pmap-dport-proto=LABELS]
                                [--pmap-sport-proto=LABELS] } ]


DESCRIPTION

rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams.

The selection switches let one choose records by where the flow was collected (its sensor), the date of collection, and the flow's direction.

The partitioning switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). rwfilter identifies records matching or violating the behavior(s), and partitions them into appropriate output streams (i.e., files) as specified.

These output streams from rwfilter are always binary. The output must be passed through another tool in the SiLK Tool Suite for further processing to get human-readable output.


OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Output Switches

At least one of the following output switches must be provided:

--pass-destination=PASS_PATH
PASS_PATH refers to a non-existent file, a named pipe, or stdout. The pass-destination will output records which have passed ALL of the partitioning predicates.

--fail-destination=FAIL_PATH
FAIL_PATH refers to a non-existent file, a named pipe, or stdout. The fail-destination will output records which failed ANY of the partitioning predicates.

--all-destination=ALL_PATH
ALL_PATH refers to a file, a named pipe, or stdout. This destination receives every record read by rwfilter.

--print-statistics
--print-statistics=PATH
Prints statistics on the files read: the number of records that passed, the number that failed, and the total read. If a PATH is provided, the statistics are printed there; otherwise they are printed to the standard error.

--print-volume-statistics
--print-volume-statistics=PATH
An enhanced version of --print-statistics, in that the statistics include the number of records, packets, and bytes that passed and failed the filter.

Additional Switches

--threads=N
Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter runs with a single thread. Multiple threads greatly improve the performance of rwfilter for queries that look at many files but return few records. Preliminary testing has found that performance peaks around four threads per CPU, but performance will vary depending on the type of query and the number of records returned.

--input-pipe=INPUT_PATH
INPUT_PATH is a named pipe or the string stdin. This refers to another source of rwfilter records. Note that rwfilter does not read from the standard input by default; to get this behavior, you must use --input-pipe=stdin.

--xargs=INPUT_PATH
Causes rwfilter to read file names from INPUT_PATH; the input should have one file name per line. rwfilter will open each file in turn and read records from it.

--print-filenames
Print the names of input files as they are read.

--dry-run
Perform a sanity check on the input arguments to check that the arguments are acceptable.

--max-pass-records=N
Stop reading input after N records have been written to the pass-destination.

--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD
Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
none
Do not compress the output using an external library

zlib
Use the zlib(3) library for compressing the output

lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression

best
Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.

Selection Options

The following options determine which files are read from the data store to provide the records.

--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]
The date predicates indicate when to start and end the search; these predicates are expressed in YYYY/MM/DD:HH format. In all cases, write values less than 10 with a leading zero: 09 for 9, 08 for 8, and so on.

For example, 2003/01/18:00 represents the first hour of January 18th, 2003, while 2002/10/01:22 corresponds to 22:00 GMT on October 1st, 2002.

When the hour of the start-date is given and end-date is not specified, files for that single hour are processed.

When the hour of the start-date is not given, the hour of the end-date is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.

When --start-date is not given, rwfilter processes all files for the current day.
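The rules above can be sketched as a small Python function that turns the two switch values into an inclusive range of hours. This is only an illustration of the documented cases, not SiLK's implementation; the names parse_date and selected_hours are hypothetical, and the two cases marked as assumptions are not described above.

```python
from datetime import datetime, timedelta

def parse_date(text):
    """Parse YYYY/MM/DD[:HH]; return (datetime, hour_was_given)."""
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False

def selected_hours(start_date=None, end_date=None, now=None):
    """Return the (first_hour, last_hour) pair, inclusive, that would be searched."""
    if start_date is None:
        # No --start-date: all files for the current day.
        # Assumption: "current day" is taken from the local clock here.
        today = (now or datetime.now()).replace(
            hour=0, minute=0, second=0, microsecond=0)
        return today, today + timedelta(hours=23)
    start, start_has_hour = parse_date(start_date)
    if end_date is None:
        if start_has_hour:
            return start, start               # that single hour only
        # Assumption: a bare date with no end-date means that whole day.
        return start, start + timedelta(hours=23)
    end, _ = parse_date(end_date)
    if not start_has_hour:
        # The hour of the end-date is ignored: midnight through 23:59.
        return start, end.replace(hour=23)
    # Assumption: with both hours given, search through the end hour.
    return start, end
```

For example, selected_hours("2003/02/19", "2003/02/20:05") covers 00:00 on February 19th through the 23:00 hour on February 20th, because the start-date carries no hour.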

--class=CLASS
CLASS is used to select groups of data. Currently only a single class may be selected. If the --class option is not given, a class is selected by default. Use the --help option to see the list of available classes and the default class.

--type={all | TYPE[,TYPE ...]}
The --type predicate is used to further specify data by specifying the TYPE of traffic using the scheme for your deployment. TYPEs typically refer to the direction of the flow; TYPEs depend on the class and on the site where SiLK is installed. The switch takes a comma-separated list of types or the keyword all which specifies all types for the specified class. If the --type switch is not given, a list of default types is used. Use the --help option to get the list of available types for each class.

--sensors=SENSOR[,SENSOR ...]
Selects data files from specific sensors. The parameter is a comma-separated list of sensor names and/or sensor IDs (integers); the valid values depend on your installation. If not given, the default is all sensors.

--data-rootdir=PATH
This option causes rwfilter to use PATH as the root of the data store directory, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which overrides the location that was compiled into rwfilter. The default data store directory is available via the --version option.

--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory (see --data-rootdir); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
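The search order above can be written out as a short Python sketch. It only lists candidate paths in the documented order and does not reflect how rwfilter is actually coded. The function name silk_conf_candidates and its parameters are hypothetical, and "parallel to the application's directory" is read here as siblings of the directory that holds the rwfilter binary, which is an assumption.

```python
import os

def silk_conf_candidates(app_path, data_rootdir,
                         site_config_file=None, environ=None):
    """List the places a site configuration file would be looked for, in order."""
    environ = os.environ if environ is None else environ
    if site_config_file:
        return [site_config_file]              # --site-config-file wins
    env_file = environ.get("SILK_CONFIG_FILE")
    if env_file:
        return [env_file]                      # the variable names the file itself
    candidates = [os.path.join(data_rootdir, "silk.conf")]
    silk_path = environ.get("SILK_PATH")
    if silk_path:
        candidates.append(os.path.join(silk_path, "share", "silk", "silk.conf"))
        candidates.append(os.path.join(silk_path, "share", "silk.conf"))
    # Assumption: for /usr/local/bin/rwfilter the "parallel" directories
    # are /usr/local/share/silk and /usr/local/share.
    prefix = os.path.dirname(os.path.dirname(os.path.abspath(app_path)))
    candidates.append(os.path.join(prefix, "share", "silk", "silk.conf"))
    candidates.append(os.path.join(prefix, "share", "silk.conf"))
    return candidates
```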

--print-missing-files
This option prints to the standard error file names that the selection engine expected to find but did not. This list can be misleading, so use it judiciously.

Partitioning Switches

rwfilter supports the following partitioning switches, at least one of which must be specified. The switches are AND'ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass will be sent to the fail-destination(s), if specified.
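The AND semantics can be modeled in a few lines of Python. In this sketch each switch is represented by a hypothetical predicate function and each record by a plain dictionary; neither is a SiLK data structure.

```python
def partition(records, predicates):
    """Split records into (passed, failed).

    A record passes only if every predicate accepts it; failing any
    single predicate sends it to the fail stream.
    """
    passed, failed = [], []
    for rec in records:
        if all(pred(rec) for pred in predicates):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed

# Rough stand-ins for --protocol=6 and --dport=80.
predicates = [
    lambda rec: rec["protocol"] == 6,
    lambda rec: rec["dport"] == 80,
]
records = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]
passed, failed = partition(records, predicates)
```

Here only the first record lands in passed; the other two each fail one test and would go to the fail-destination.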

SWITCH PARAMETERS

The forms of the parameters to these partitioning switches are:

SWITCHES

The switches are:

--stime=DATE_RANGE
Pass the record if its starting time is in this DATE_RANGE.

--etime=DATE_RANGE
As --stime for the ending time.

--active-time=DATE_RANGE
Pass the record if the record was active at ANY time during this DATE_RANGE. If a single time is specified, pass the record if it was active at that instant.

--duration=INTEGER_RANGE
Pass the record if its duration (eTime-sTime) is in this INTEGER_RANGE.
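The difference between --stime, --etime, and --active-time is easiest to see as interval tests. The sketch below is a hedged model using plain dictionaries with datetime values; treating the range bounds as inclusive and the duration as whole seconds are assumptions, since the parameter forms are not reproduced here.

```python
from datetime import datetime

def stime_matches(rec, lo, hi):
    """--stime: the flow must start inside the range."""
    return lo <= rec["stime"] <= hi

def etime_matches(rec, lo, hi):
    """--etime: the flow must end inside the range."""
    return lo <= rec["etime"] <= hi

def active_time_matches(rec, lo, hi):
    """--active-time: the flow's lifetime must overlap the range at all."""
    return rec["stime"] <= hi and rec["etime"] >= lo

def duration_matches(rec, lo, hi):
    """--duration: eTime minus sTime, assumed here to be in seconds."""
    seconds = (rec["etime"] - rec["stime"]).total_seconds()
    return lo <= seconds <= hi

flow = {"stime": datetime(2003, 2, 19, 10, 0, 0),
        "etime": datetime(2003, 2, 19, 10, 5, 0)}
lo = datetime(2003, 2, 19, 10, 2, 0)
hi = datetime(2003, 2, 19, 10, 3, 0)
```

The sample flow neither starts nor ends between 10:02 and 10:03, so the first two tests reject it, but it was active during that minute, so active_time_matches accepts it. Passing the same instant for lo and hi models the single-time form of --active-time.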

--sport=INTEGER_LIST
Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.

--dport=INTEGER_LIST
Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.

--aport=INTEGER_LIST
Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.

--protocol=INTEGER_LIST
Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.

--icmp-type=INTEGER_LIST
Pass the record if its ICMP type is in this INTEGER_LIST; possible values 0-255. This switch will act as if --protocol=1 has been specified; it is an error to specify any other values for the protocol.

--icmp-code=INTEGER_LIST
Pass the record if its ICMP code is in this INTEGER_LIST; possible values 0-255. This switch will act as if --protocol=1 has been specified; it is an error to specify any other values for the protocol.

--bytes=INTEGER_RANGE
Pass the record if its byte count is in this INTEGER_RANGE.

--packets=INTEGER_RANGE
Pass the record if its packet count is in this INTEGER_RANGE.

--bytes-per-packet=DECIMAL_RANGE
Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.

--saddress=IP_ADDR_MASK
Pass the record if its source IP address is matched by this IP_ADDR_MASK. To match on multiple IPs, use an IPset (see --sipset).

--daddress=IP_ADDR_MASK
Pass the record if its destination IP address is matched by this IP_ADDR_MASK (see also --dipset).

--any-address=IP_ADDR_MASK
Pass the record if either its source or its destination IP address is matched by this IP_ADDR_MASK (see also --anyset). Does not consider the next-hop IP address.

--not-saddress=IP_ADDR_MASK
Pass the record if its source IP address is not matched by this IP_ADDR_MASK (see also --not-sipset).

--not-daddress=IP_ADDR_MASK
Pass the record if its destination IP address is not matched by this IP_ADDR_MASK (see also --not-dipset).

--not-any-address=IP_ADDR_MASK
Pass the record if neither its source nor its destination IP address is matched by this IP_ADDR_MASK (see also --not-anyset). Does not consider the next-hop IP address.
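The "any" and "not-any" forms can be modeled with Python's standard ipaddress module. This sketch assumes the mask is given as a CIDR block, which is only one possible form of IP_ADDR_MASK; the record dictionaries and function names are hypothetical.

```python
import ipaddress

def in_block(addr, block):
    """True if addr falls inside the CIDR block."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(block)

def saddress(rec, block):
    return in_block(rec["sip"], block)

def any_address(rec, block):
    # Source OR destination; the next-hop address is never consulted.
    return in_block(rec["sip"], block) or in_block(rec["dip"], block)

def not_any_address(rec, block):
    # Neither source NOR destination may match.
    return not any_address(rec, block)

rec = {"sip": "192.168.1.5", "dip": "10.1.2.3", "nhip": "10.9.9.9"}
```

With this record, any_address(rec, "10.0.0.0/8") is true because of the destination address, while saddress(rec, "10.0.0.0/8") is false. Note that not_any_address is the negation of any_address, not "source does not match or destination does not match".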

--sipset=IP_SET_FILENAME
Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME.

--dipset=IP_SET_FILENAME
As --sipset for the destination IP address.

--anyset=IP_SET_FILENAME
Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.

--nhipset=IP_SET_FILENAME
As --sipset for the next-hop IP address.

--not-sipset=IP_SET_FILENAME
Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME.

--not-dipset=IP_SET_FILENAME
As --not-sipset for the destination IP address.

--not-anyset=IP_SET_FILENAME
Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.

--not-nhipset=IP_SET_FILENAME
As --not-sipset for the next-hop IP address.

--tcp-flags=TCP_FLAGS
Pass the record if, for any one of its packets, any of the specified TCP_FLAGS was on.

--flags-all=HIGH_MASK_FLAGS
HIGH_MASK_FLAGS is a set of HIGH_FLAGS/MASK_FLAGS; HIGH_FLAGS must be a subset of MASK_FLAGS. Pass the record if the flags listed in HIGH_FLAGS are set and the flags listed in MASK_FLAGS but not listed in HIGH_FLAGS are not-set. This switch may be repeated up to eight times, so that --flags-all=S/S --flags-all=A/A will pass flows that have either only-SYN high or only-ACK high.

--fin-flag=SCALAR
When set to 0, pass only records where the FIN flag is low. When set to 1, pass only records where the FIN flag is high.

--syn-flag=SCALAR
As --fin-flag, except for the SYN flag.

--rst-flag=SCALAR
As --fin-flag, except for the RST flag.

--psh-flag=SCALAR
As --fin-flag, except for the PSH flag.

--ack-flag=SCALAR
As --fin-flag, except for the ACK flag.

--urg-flag=SCALAR
As --fin-flag, except for the URG flag.

--ece-flag=SCALAR
As --fin-flag, except for the ECE flag.

--cwr-flag=SCALAR
As --fin-flag, except for the CWR flag.
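The flag switches above reduce to bitmask tests on the flags recorded for the flow. The following Python sketch models --tcp-flags, --flags-all, and the single-flag switches. The bit values are the standard TCP header bits; the one-letter flag names (F, S, R, P, A, U, E, C) and the function names are assumptions made for this illustration.

```python
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

def to_bits(letters):
    """Convert a string such as 'SA' into a bitmask."""
    bits = 0
    for ch in letters.upper():
        bits |= FLAG_BITS[ch]
    return bits

def tcp_flags_match(flow_flags, letters):
    """--tcp-flags: pass if ANY of the listed flags was seen."""
    return (flow_flags & to_bits(letters)) != 0

def flags_all_match(flow_flags, specs):
    """--flags-all: each spec is HIGH/MASK; repeated specs are OR'ed."""
    for spec in specs:
        high_text, mask_text = spec.split("/")
        high, mask = to_bits(high_text), to_bits(mask_text)
        if high & ~mask:
            raise ValueError("HIGH_FLAGS must be a subset of MASK_FLAGS")
        # Flags in MASK but not in HIGH must be off; flags outside
        # MASK are ignored.
        if (flow_flags & mask) == high:
            return True
    return False

def single_flag_match(flow_flags, letter, scalar):
    """--fin-flag and friends: 1 requires the flag high, 0 requires it low."""
    return ((flow_flags & FLAG_BITS[letter]) != 0) == bool(scalar)
```

For a flow whose packets carried SYN and ACK (0x12), flags_all_match(0x12, ["S/SA"]) is false because ACK is in the mask but not in the high set, whereas flags_all_match(0x02, ["S/SA"]) is true.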

--dynamic-library=DYNLIB
Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path is DYNLIB. The creation of these plug-ins is beyond the scope of this manual page; the process is described in Analysts' Handbook: Using SiLK for Network Traffic Analysis. When multiple Partitioning Switches are given, the code specified by this plug-in will be last to be invoked.

When DYNLIB contains a slash (/), rwfilter assumes the path to DYNLIB is correct. Otherwise, rwfilter will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwfilter does not find the file, it assumes the plug-in is in the current directory. To force rwfilter to look in the current directory first, specify --dynamic-library=./DYNLIB.

When the SILK_DYNLIB_DEBUG environment variable is non-empty, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.

SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional switches; for flows without this additional information, the field's value is always 0.

--flags-initial=HIGH_MASK_FLAGS
As --flags-all, except this switch considers only the initial packet in the flow.

--flags-session=HIGH_MASK_FLAGS
As --flags-all, except this switch ignores the initial packet in the flow.

--attributes=HIGH_ATTRIBUTES/CARE_ATTRIBUTES
Passes the flow if the attribute of the flow matches this ATTRIBUTE. Attributes are F,T,C; see above for a description of these values.

--application=INTEGER_LIST
Passes the flow if the application that the flow collection software assigned to the flow is in the specified INTEGER_LIST. Some flow generation software will guess the application based on the contents of the packets that make up the flow. The value will be the standard port for that application; for example, HTTP traffic on non-standard ports will have an application of 80.

--ip-version=INTEGER_LIST
Passes the flow if the IP Version is in the specified INTEGER_LIST. INTEGER_LIST can be 4, 6, or 4,6 when SiLK has been compiled with IPv6 support. If SiLK does not have IPv6 support, the only legal value for this switch is 4.

For the following three filter tests, some file formats do not store these values, in which case the value is always 0:

--next-hop-id=IP_ADDR_MASK
Pass the record if its next hop IP address is matched by this IP_ADDR_MASK.

--not-next-hop-id=IP_ADDR_MASK
Pass the record if its next hop IP address is not matched by this IP_ADDR_MASK.

--input-index=INTEGER_LIST
Pass the record if its incoming SNMP interface is in this INTEGER_LIST.

--output-index=INTEGER_LIST
Pass the record if its outgoing SNMP interface is in this INTEGER_LIST.

Additional filtering switches are provided by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwfilter automatically looks for the following plug-ins:

ADDRESS TYPE (addrtype.so)

--stype=SCALAR
When SCALAR is 0, pass the record if its source IP address is non-routable. When 1, pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal (non-routable or external). See addrtype(3).

--dtype=SCALAR
As --stype for the destination IP address.
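The four SCALAR values can be summarized in a tiny Python sketch. It assumes the address has already been classified into one of three categories; the constant and function names are hypothetical and the classification itself is outside the scope of this illustration.

```python
NON_ROUTABLE, INTERNAL, EXTERNAL = 0, 1, 2

def type_switch_passes(scalar, addr_type):
    """Model --stype/--dtype for an address already classified as addr_type."""
    if scalar == 3:
        # "Not internal": non-routable or external.
        return addr_type != INTERNAL
    if scalar in (NON_ROUTABLE, INTERNAL, EXTERNAL):
        return addr_type == scalar
    raise ValueError("SCALAR must be 0, 1, 2, or 3")
```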

COUNTRY CODE (ccfilter.so)

--scc=COUNTRY_CODE_LIST
Pass the record if the country code of its source IP address is in the specified COUNTRY_CODE_LIST. See ccfilter(3).

--dcc=COUNTRY_CODE_LIST
As --scc for the destination IP address.

PREFIX MAP (pmapfilter.so)

--pmap-file=FILENAME
FILENAME refers to a prefixmap file generated using rwpmapbuild(1). This switch must precede all other --pmap-* switches. See pmapfilter(3).

--pmap-saddress=LABELS
For an IP prefix map, pass the record if the source IP address maps to a label contained in the list of labels in LABELS.

--pmap-daddress=LABELS
As --pmap-saddress for the destination IP address.

--pmap-sport-proto=LABELS
For a port/protocol map, pass the record if the source port and protocol combination maps to a label contained in the list of labels in LABELS.

--pmap-dport-proto=LABELS
As --pmap-sport-proto for destination port and protocol.

TUPLE (tuple.so)

This plug-in provides support for partitioning by arbitrary subsets of the basic five-tuple:

 {source-ip,destination-ip,source-port,destination-port,protocol}

For the plug-in to pass the SiLK Flow record, the record's fields must match one of the tuples. Any subset of the five-tuple is supported, but the same subset must be used per invocation of rwfilter. The tuples are read from a text file containing lines of delimited fields. The default delimiter is |, but may be specified with the --tuple-delimiter switch. Each field contains one member of the tuple; the fields may appear in any order. If you want the field to match any value, it is best that you not include that field in your input. A field that is present but has no value will generate an error.

The IP fields may contain an IPv4 address, an integer, or an IP address in CIDR block notation. Comma-separated lists (80,443) and ranges (0-1023,8080) are supported for the ports and protocol fields. Note that currently the code is not clever in its support for CIDR notation and ranges (each occurrence is fully expanded), and the memory required to hold the search tree can quickly grow.

In addition to the tuple-lines, FILENAME may contain blank lines and comments (which begin with # and continue to the end of the line).

The --tuple-fields switch must list the fields in FILENAME in the order in which they appear. When you do not specify the --tuple-fields switch, the plug-in will attempt to guess the fields from the first line in the input (a la rwtuc(1)), and exit if it cannot. If you do specify --tuple-fields, a title appearing on the first line will be ignored.

The --tuple-direction switch allows you to look for traffic in the reverse direction (or both directions) without having to write all of your rules twice.

--tuple-file=FILENAME
FILENAME refers to a file containing lines of delimited textual fields. This switch is required if the plug-in is to be used.

--tuple-fields=FIELDS
FIELDS contains the list of fields (columns) to parse. When this switch is not provided, the plug-in will attempt to parse the first line in the file to determine the fields. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated to their shortest unique prefix. The field names and their descriptions are:
sIP,sip,1
source IP address

dIP,dip,2
destination IP address

sPort,sport,3
source port

dPort,dport,4
destination port

protocol,5
IP protocol
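As an illustration of the FIELDS syntax, the following Python sketch expands a list of names, integers, and integer ranges into canonical field names. It is a hedged model, not the plug-in's parser; the function name parse_fields is hypothetical, and case-insensitive prefix matching is an assumption based on the sIP/sip spellings listed above.

```python
FIELD_NAMES = ["sIP", "dIP", "sPort", "dPort", "protocol"]  # field integers 1..5

def field_by_number(number):
    if not 1 <= number <= len(FIELD_NAMES):
        raise ValueError("field integer out of range: %d" % number)
    return FIELD_NAMES[number - 1]

def parse_fields(spec):
    """Expand e.g. 'sIP,dip,3-4,p' into a list of canonical field names."""
    result = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = (int(part) for part in item.split("-"))
            result.extend(field_by_number(n) for n in range(lo, hi + 1))
        elif item.isdigit():
            result.append(field_by_number(int(item)))
        else:
            # A name may be abbreviated as long as the prefix is unique.
            hits = [name for name in FIELD_NAMES
                    if name.lower().startswith(item.lower())]
            if len(hits) != 1:
                raise ValueError("unknown or ambiguous field: %r" % item)
            result.append(hits[0])
    return result
```

In this model, parse_fields("sIP,dip,3-4,p") yields all five fields in order, while the prefix "s" alone is rejected because it could mean either sIP or sPort.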

--tuple-direction=DIRECTION
Allows you to change the comparison between the tuple and the SiLK Flow record. The available directions are:
forward
The tuple's fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol. This is the default.

reverse
The tuple's fields are compared against the opposite fields on the flow; that is, sIP is compared with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.

both
Both of the above comparisons are performed.

--tuple-delimiter=CHAR
Specifies the character separating the input fields. When the switch is not provided, the default of | is used.
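The tuple-file format and the three directions can be modeled as follows. This is a deliberately simplified Python sketch: it compares field values as exact strings and does not expand CIDR blocks, lists, or ranges, and it does not guess fields from a title line. The function names and the dictionary-based records are hypothetical.

```python
REVERSED = {"sIP": "dIP", "dIP": "sIP",
            "sPort": "dPort", "dPort": "sPort",
            "protocol": "protocol"}

def parse_tuple_file(lines, fields, delimiter="|"):
    """Read delimited tuples, skipping blank lines and # comments."""
    tuples = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        values = [value.strip() for value in line.split(delimiter)]
        if len(values) != len(fields) or "" in values:
            # A field that is present but empty is an error.
            raise ValueError("bad tuple line: %r" % line)
        tuples.append(dict(zip(fields, values)))
    return tuples

def tuple_matches(rec, tup, reverse=False):
    for field, value in tup.items():
        rec_field = REVERSED[field] if reverse else field
        if str(rec[rec_field]) != value:
            return False
    return True

def tuple_filter(rec, tuples, direction="forward"):
    """Pass the record if it matches any tuple in the chosen direction(s)."""
    for tup in tuples:
        if direction in ("forward", "both") and tuple_matches(rec, tup):
            return True
        if direction in ("reverse", "both") and tuple_matches(rec, tup, reverse=True):
            return True
    return False

lines = ["# web servers", "10.0.0.1|80|6", "", "10.0.0.2|443|6  # https"]
tuples = parse_tuple_file(lines, ["dIP", "dPort", "protocol"])
reply = {"sIP": "10.0.0.1", "dIP": "192.168.1.5",
         "sPort": 80, "dPort": 40000, "protocol": 6}
```

The sample record is a reply coming from the web server, so it fails the forward comparison but passes with direction "reverse" or "both"; this is the case the --tuple-direction switch is meant to cover without duplicating rules.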

The following two switches are implemented in terms of the tuple plug-in and they are supported for backwards compatibility. They are deprecated and will be removed in a future release. These switches are incompatible with --tuple-file and with each other.

--ippair-any=FILENAME
Pass the record if the source IP and destination IP (in either order) match one of the IP-pairs listed in the text file FILENAME. Each line of FILENAME should contain two IP addresses separated by whitespace. This switch is equivalent to --tuple-file=FILENAME --tuple-fields=sIP,dIP --tuple-direction=both --tuple-delimiter=' '.

--ipport-any=FILENAME
Pass the record if either the source IP and port pair or the destination IP and port pair are listed in the text file FILENAME. Each line in FILENAME should contain an IP address and port list of interest for that IP separated by whitespace. The format of the IP address and port list may be any format supported by the plug-in.

PYTHON (python.so)

This plug-in provides support for filtering by expressions written in the Python programming language. Using Python, one can write complex expressions that cannot be written with a single rwfilter command line. See the SiLK in Python documentation for information on how to use Python to manipulate SiLK data structures.

When multiple Partitioning Switches are given, the Python plug-in will be the next-to-last to be invoked. Only the code specified by the --dynamic-library switch is called after the Python code.

--python-expr=PYTHON_EXPRESSION
Pass the record if the result of processing the flow with the specified PYTHON_EXPRESSION is true. The expression is evaluated in the following context:

--python-file=FILENAME
Pass the record if the result of processing the flow with the function named rwfilter in FILENAME is true. The function should take a single argument, which is a silk.RWRec object.
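A file suitable for --python-file might look like the sketch below. The function name rwfilter and its single argument come from the description above; the attribute names protocol, dport, and packets are assumed to be available on the record object and should be checked against the SiLK in Python documentation.

```python
def rwfilter(rec):
    """Pass TCP flows to port 80 or 443 that carried at least three packets.

    rec is expected to be a silk.RWRec object; only attribute access is
    used, so no import is needed in this file.
    """
    return (rec.protocol == 6
            and rec.dport in (80, 443)
            and rec.packets >= 3)
```

Assuming the file is saved as webflows.py (a hypothetical name), it would be selected with --python-file=webflows.py alongside the usual selection and output switches.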


EXAMPLES

The most basic filtering involves looking at specific traffic over a specific time. For example:

  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
        --pass=alltcp.rwf --proto=6

will create a file, alltcp.rwf, containing all TCP traffic. This file contains SiLK Flow data in a binary format. To examine the contents, use rwcut(1).

Please note that the output file described above could be extremely large.

Once a file is written, rwfilter can filter the file again, for example:

  rwfilter --aport=80 alltcp.rwf --pass=allweb.rwf

will generate allweb.rwf. The same progressive filtering can be done in a single command line, but writing interim files lets you examine them with rwcut, rwuniq, and other tools.

Multiple filters can be chained at the command line using pipes:

  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
        --proto=6 --pass=stdout | \
        rwfilter --input-pipe=stdin --aport=80 --packets=1-5 \
        --pass=smallweb.rwf


ENVIRONMENT

SILK_RWFILTER_THREADS
The number of threads to use while reading input files or files selected from the data store.

PYTHONPATH
The Python module for rwfilter (python.so) is installed under SiLK's installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so Python can find this module. For information on using Python from within rwfilter, see SiLK in Python.

SILK_DATA_ROOTDIR
When set, overrides the compiled-in value for the location of the directory tree containing the files of SiLK Flow records collected and stored by the packing system (rwflowpack(8)).

SILK_PATH
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwfilter checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwfilter looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.

SILK_DYNLIB_DEBUG
When set to 1, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.

SILK_LOGSTATS
When set to a non-empty value, rwfilter will treat the value as a program to execute with information about this rwfilter invocation. The arguments to the program are:

SILK_LOGSTATS_RWFILTER
If set, this environment variable overrides the value specified in SILK_LOGSTATS.


NOTES

rwfilter is the most commonly used application in the suite. It provides access to the data files and performs all the basic queries.

rwfilter supports a variety of I/O options - in addition to reading from the data store, rwfilter results can be chained together with named pipes to output results to multiple files simultaneously. An introduction to named pipes is outside the scope of this document, however.

Two often underused options are --dry-run and --print-statistics.

--dry-run does a sanity check on the input arguments and should be used, especially for complicated arguments, to check that the arguments are acceptable.

--print-statistics used without --pass-destination or --fail-destination simply dumps aggregate statistics to stderr (not stdout) in the following format:

  File <#input files> Read <# of recs read> \
  Pass <# of recs passing the filter> \
  Fail <# of recs failing the filter>

and can be used to make a quick pass through the data to get aggregate counts before digging deeper into the phenomenon being investigated.

--print-filenames can be used as a progress meter; during long jobs, it shows which file is currently being read by the application. --print-filenames will not provide meaningful results with piped input.

Filters are applied in the order given on the command line. It is best to put the filters that reject the most records first.

The switches used to create a filter output file are stored in the file itself. Use the rwfileinfo(1) command to see this information.


SEE ALSO

silk(7), Analysts' Handbook: Using SiLK for Network Traffic Analysis, SiLK in Python, rwcount(1), rwcut(1), rwfileinfo(1), rwset(1), rwsort(1), rwstats(1), rwtotal(1), rwuniq(1), rwsetbuild(1), addrtype(3), ccfilter(3), pmapfilter(3)