NAME
rwfilter - Choose which SiLK Flow records to process
SYNOPSIS
rwfilter [--threads=N] [--dynamic-library=DYNLIB]
[--pass-destination=PASS_PATH]
[--fail-destination=FAIL_PATH] [--all-destination=ALL_PATH]
[--input-pipe=INPUT_PATH] [--xargs=INPUT_STREAM]
[{ --print-statistics | --print-volume-statistics }]
[--print-filenames] [--print-missing-files]
[--dry-run] [--max-pass-records=N]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
[--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
[--sensors=SENSOR[,SENSOR ...]]
[--data-rootdir=PATH] [--site-config-file=FILENAME]
[--stime=DATE_RANGE] [--etime=DATE_RANGE]
[--active-time=DATE_RANGE] [--duration=INTEGER_RANGE]
[--sport=INTEGER_LIST] [--dport=INTEGER_LIST]
[--aport=INTEGER_LIST] [--protocol=INTEGER_LIST]
[--icmp-type=INTEGER_LIST] [--icmp-code=INTEGER_LIST]
[--bytes=INTEGER_RANGE] [--packets=INTEGER_RANGE]
[--bytes-per-packet=DECIMAL_RANGE]
[{--saddress=IP_ADDR_MASK | --not-saddress=IP_ADDR_MASK}]
[{--daddress=IP_ADDR_MASK | --not-daddress=IP_ADDR_MASK}]
[{--any-address=IP_ADDR_MASK | --not-any-address=IP_ADDR_MASK}]
[{--next-hop-id=IP_ADDR_MASK | --not-next-hop-id=IP_ADDR_MASK}]
[{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]
[{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]
[{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]
[{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]
[--input-index=INTEGER_LIST] [--output-index=INTEGER_LIST]
[--tcp-flags=TCP_FLAGS] [--flags-all=HIGH_MASK_FLAGS]
[--fin-flag=SCALAR] [--syn-flag=SCALAR] [--rst-flag=SCALAR]
[--psh-flag=SCALAR] [--ack-flag=SCALAR] [--urg-flag=SCALAR]
[--ece-flag=SCALAR] [--cwr-flag=SCALAR]
[--flags-initial=HIGH_MASK_FLAGS]
[--flags-session=HIGH_MASK_FLAGS]
[--attributes=ATTRIBUTES] [--application=INTEGER_LIST]
[--ip-version=INTEGER_LIST]
[--scc=COUNTRY_CODE_LIST] [--dcc=COUNTRY_CODE_LIST]
[--stype=SCALAR] [--dtype=SCALAR]
[--ippair-any=FILENAME] [--ipport-any=FILENAME]
[--tuple-file=FILENAME { [--tuple-fields=FIELDS]
[--tuple-direction=DIRECTION]
[--tuple-delimiter=CHAR] } ]
[--python-expr=PYTHON_EXPR] [--python-file=FILENAME]
[--pmap-file=FILENAME { [--pmap-saddress=LABELS]
[--pmap-daddress=LABELS]
[--pmap-dport-proto=LABELS]
[--pmap-sport-proto=LABELS] } ]
DESCRIPTION
rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams.
The selection switches let one choose records by where the flow was collected (its sensor), the date of collection, and the flow's direction.
The partitioning switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). rwfilter identifies records matching or violating the behavior(s), and partitions them into appropriate output streams (i.e., files) as specified.
These output streams from rwfilter are always binary. The output must be passed through another tool in the SiLK Tool Suite for further processing to get human-readable output.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Output Switches
At least one of the following output switches must be provided:
- --pass-destination=PASS_PATH
- PASS_PATH refers to a non-existent file, a named pipe, or stdout. The pass-destination will output records which have passed ALL of the partitioning predicates.
- --fail-destination=FAIL_PATH
- FAIL_PATH refers to a non-existent file, a named pipe, or stdout. The fail-destination will output records which failed ANY of the partitioning predicates.
- --all-destination=ALL_PATH
- ALL_PATH refers to a file, a named pipe, or stdout. The all-destination will output all records read by rwfilter.
- --print-statistics
- --print-statistics=PATH
- Prints out the statistics on files read - the number of records which passed, the number which failed and the total read. If a PATH is provided, the statistics will be printed there; otherwise they are printed to the standard error.
- --print-volume-statistics
- --print-volume-statistics=PATH
- An enhanced version of --print-statistics, in that the statistics include the number of records, packets, and bytes that passed and failed the filter.
Additional Switches
- --threads=N
- Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter runs with a single thread. Using multiple threads can greatly improve rwfilter's performance for queries that look at many files but return few records. Preliminary testing has found that performance peaks around four threads per CPU, but performance will vary depending on the type of query and the number of records returned.
- --input-pipe=INPUT_PATH
- INPUT_PATH is a named pipe or the string stdin. This refers to another source of rwfilter records. Note that rwfilter will not read from the standard input by default; to get this behavior, you must use --input-pipe=stdin.
- --xargs=INPUT_PATH
- Causes rwfilter to read file names from INPUT_PATH; the input should have one file name per line. rwfilter will open each file in turn and read records from it.
- --print-filenames
- Print the names of input files as they are read.
- --dry-run
- Perform a sanity check on the input arguments to check that the arguments are acceptable.
- --max-pass-records=N
- Stop reading input after N records have been written to the pass-destination.
- --note-add=TEXT
- Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
- --note-file-add=FILENAME
- Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
- --compression-method=COMP_METHOD
- Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
- none
- Do not compress the output using an external library
- zlib
- Use the zlib(3) library for compressing the output
- lzo1x
- Use the lzo1x algorithm from the LZO real time compression library for compression
- best
- Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.
Selection Options
The following options determine which files are read from the data store to provide the records.
- --start-date=YYYY/MM/DD[:HH]
- --end-date=YYYY/MM/DD[:HH]
- The date predicates indicate which time to start and end the search; these predicates are expressed in YYYY/MM/DD:HH format. In all cases, express values less than 10 with a leading zero, so 09 for 9, 08 for 8, and so on.
- For example, 2003/01/18:00 represents the first hour of January 18th, 2003, while 2002/10/01:22 corresponds to 22:00 GMT on October 1st, 2002.
- When the hour of the start-date is given and end-date is not specified, files for that single hour are processed.
- When the hour of the start-date is not given, the hour of the end-date is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.
- When --start-date is not given, rwfilter processes all files for the current day.
- --class=CLASS
- CLASS is used to select groups of data. Currently only a single class may be selected. If the --class option is not given, a class is selected by default. Use the --help option to see the list of available classes and the default class.
- --type={all | TYPE[,TYPE ...]}
- The --type predicate is used to further specify data by specifying the TYPE of traffic using the scheme for your deployment. TYPEs typically refer to the direction of the flow; TYPEs depend on the class and on the site where SiLK is installed. The switch takes a comma-separated list of types or the keyword all which specifies all types for the specified class. If the --type switch is not given, a list of default types is used. Use the --help option to get the list of available types for each class.
- --sensors=SENSOR[,SENSOR ...]
- Sensor is used to select data files from specific sensors. This is a comma separated list of sensor names and/or sensor IDs (integers) that will depend on your installation. If not given, the default is all sensors.
- --data-rootdir=PATH
- This option causes rwfilter to use PATH as the root of the data store directory, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which overrides the location that was compiled into rwfilter. The default data store directory is available via the --version option.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory (see --data-rootdir); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
- --print-missing-files
- This option prints to the standard error file names that the selection engine expected to find but did not. This list can be misleading, so use it judiciously.
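The --start-date and --end-date rules listed above can be sketched in Python. This is only an illustration of the stated rules, not rwfilter's implementation. The function name selection_window is hypothetical, and two cases the text does not spell out (a start-date without an hour and no end-date, and a start-date with an hour combined with an end-date) are handled by assumption, as marked in the comments.

```python
from datetime import datetime, timedelta


def _parse(text):
    """Parse YYYY/MM/DD[:HH]; return (datetime, hour_was_given)."""
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def selection_window(start_date, end_date=None):
    """Return (first_hour, last_hour) of the hourly files to process."""
    start, start_has_hour = _parse(start_date)
    if end_date is None:
        if start_has_hour:
            # Hour given, no end-date: that single hour.
            return start, start
        # ASSUMPTION: no hour and no end-date means the whole day.
        return start, start + timedelta(hours=23)
    end, end_has_hour = _parse(end_date)
    if not start_has_hour or not end_has_hour:
        # Start hour missing: the end-date hour is ignored and the window
        # runs through the last hour of the end-date.
        # ASSUMPTION: an end-date without an hour is treated the same way.
        end = end.replace(hour=23)
    # ASSUMPTION: when both hours are given, the window is start..end.
    return start, end
```

For instance, selection_window("2003/02/19", "2003/02/20:05") yields midnight on February 19 through the 23:00 hour of February 20, because the end-date hour is ignored.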
Partitioning Switches
rwfilter supports the following partitioning switches, at least one of which must be specified. The switches are AND'ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass will be sent to the fail-destination(s), if specified.
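The AND semantics described above can be illustrated with a small Python sketch. The record layout (plain dictionaries) and the function name partition are hypothetical and chosen only for illustration; rwfilter itself operates on binary SiLK Flow records.

```python
def partition(records, predicates):
    """Split records into (passed, failed).

    A record passes only if every predicate returns True; a record that
    fails any single predicate goes to the fail list.
    """
    passed, failed = [], []
    for rec in records:
        if all(pred(rec) for pred in predicates):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed


# Roughly analogous to: --protocol=6 --dport=80
records = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]
predicates = [
    lambda r: r["protocol"] == 6,
    lambda r: r["dport"] == 80,
]
passed, failed = partition(records, predicates)
```

Only the first record satisfies both tests; the other two would be written to the fail-destination, if one were specified.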
SWITCH PARAMETERS
The forms of the parameters to these partitioning switches are:
-
DATE_RANGE is a range of two dates, start-range and end-range, each
in the form
YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]], for example
2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last
fifteen minutes of Jan 31, 2003. The start-range and end-range must
be set to at least day precision. For the start-range, unspecified
hour, minute, second, and millisecond values are set to 0; for the
end-range, those values are set to 23, 59, 59, and 999 respectively.
Thus 2003/01/31:23-2003/01/31:23 will become
2003/01/31:23:00:00.000-2003/01/31:23:59:59.999. If an end-range
is not given, it is set to the start-range, giving a range of a single
millisecond.
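As a rough illustration of how the unspecified fields of a DATE_RANGE are filled in, the following Python sketch expands a range string into a pair of timestamps. The function names are hypothetical, the code is not taken from rwfilter, and fractional seconds are truncated to milliseconds as an assumption.

```python
from datetime import datetime


def _parse_bound(text, is_end):
    """Parse YYYY/MM/DD[:HH[:MM[:SS[.sss]]]], filling unspecified fields."""
    date_part, *time_parts = text.split(":")
    year, month, day = (int(x) for x in date_part.split("/"))
    # Defaults: 00:00:00.000 for the start-range, 23:59:59.999 for the end.
    hour, minute, second, msec = (23, 59, 59, 999) if is_end else (0, 0, 0, 0)
    if len(time_parts) > 0:
        hour = int(time_parts[0])
    if len(time_parts) > 1:
        minute = int(time_parts[1])
    if len(time_parts) > 2:
        sec_text, _, frac = time_parts[2].partition(".")
        second = int(sec_text)
        if frac:
            # ASSUMPTION: keep millisecond precision only.
            msec = int(frac[:3].ljust(3, "0"))
    return datetime(year, month, day, hour, minute, second, msec * 1000)


def expand_date_range(text):
    """Return (start, end) for a DATE_RANGE string."""
    start_text, sep, end_text = text.partition("-")
    start = _parse_bound(start_text, is_end=False)
    if not sep:
        # No end-range: it is set to the start-range (a single instant).
        return start, start
    return start, _parse_bound(end_text, is_end=True)
```

With this sketch, expand_date_range("2003/01/31:23-2003/01/31:23") covers 23:00:00.000 through 23:59:59.999 on that day, matching the example in the text.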
SCALAR is a single integer; for example 4.
INTEGER_RANGE is a range of two positive integers: MIN-MAX;
for example 1-500. For many options, the upper limit of the range
may be omitted, such as 1-.
INTEGER_LIST is a comma separated list of SCALARs and
INTEGER_RANGEs; for example, 1,2,3,5-10,99-103.
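A minimal Python sketch of how an INTEGER_LIST expands into individual values follows. The function name parse_integer_list is hypothetical, and the sketch deliberately does not handle open-ended ranges such as 1-.

```python
def parse_integer_list(text):
    """Expand an INTEGER_LIST such as '1,2,3,5-10' into a set of ints."""
    values = set()
    for item in text.split(","):
        low, sep, high = item.partition("-")
        if sep:
            # MIN-MAX range; both ends are inclusive.
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(low))
    return values
```

A port test such as --dport=80,443,8000-8080 then amounts to checking whether the record's destination port is a member of the expanded set.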
DECIMAL_RANGE is a range of decimal values with accuracy up to
10^-4 expressed as MIN-MAX; for example, 5.0-10.031.
IP_ADDR_MASK is expressed in one of two forms: as a CIDR block
(192.168.0.0/16) or as four INTEGER_LISTs joined by dots (.). The
character x can be used as an abbreviation for 0-255. For example,
10.10,16-31.x.x represents the following CIDR blocks:
10.10.0.0/16 10.16.0.0/12
TCP_FLAGS is any combination of the letters
F,S,R,P,A,U,E,C, where F=FIN flag;
S=SYN; R=RST; P=PSH; A=ACK; U=URG; E=ECE; C=CWR.
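The two IP_ADDR_MASK forms described above (a CIDR block, or four dot-separated INTEGER_LISTs with x as a wildcard) can be sketched in Python for IPv4 addresses. The function names are hypothetical and the code is an illustration only, not rwfilter's parser.

```python
import ipaddress


def _octet_values(spec):
    """Expand one octet spec: 'x', a single value, or a list with ranges."""
    if spec == "x":
        return set(range(256))
    values = set()
    for item in spec.split(","):
        low, sep, high = item.partition("-")
        values.update(range(int(low), int(high if sep else low) + 1))
    return values


def matches_ip_addr_mask(address, mask):
    """Return True if the IPv4 address is matched by the IP_ADDR_MASK."""
    addr = ipaddress.IPv4Address(address)
    if "/" in mask:
        # CIDR form, e.g. 192.168.0.0/16
        return addr in ipaddress.IPv4Network(mask)
    # Wildcard form, e.g. 10.10,16-31.x.x
    allowed = [_octet_values(spec) for spec in mask.split(".")]
    if len(allowed) != 4:
        raise ValueError("expected four dot-separated octet specifications")
    return all(octet in ok for octet, ok in zip(addr.packed, allowed))
```

Here the mask 10.10,16-31.x.x matches any address whose first octet is 10 and whose second octet is 10 or lies in 16 through 31, so 10.31.255.255 matches and 10.32.0.0 does not.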
HIGH_MASK_FLAGS is a pair of TCP_FLAGS strings separated
by a slash (/). Flags to the right of the slash are the
mask; any flag not listed in the mask may have any value
in the flow. Flags to the left of the slash are the
expected high flags; they must be set in the flow.
Thus, flags listed in mask but not in high must be off
for all packets in the flow. It is an error if a flag is
listed in high but not in mask. Some examples:
AS/ASFR
- means ACK,SYN must be high, FIN,RST must be low, and the other flags (PSH, URG, ECE, CWR) may have any value.
A/A
- means the ACK flag must be set. All other flags may have any value.
/F
- means the FIN flag must be off. All other flags may have any value.
F/S
- is an error; use F/FS instead, which means FIN must be high, SYN must be low, and other flags can have any value.
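The high/mask test reduces to a single bitwise comparison: mask the flow's flags and compare the result with the high flags. The Python sketch below illustrates this using the standard TCP header bit values; the function names are hypothetical and the code is not taken from rwfilter.

```python
# Standard TCP header flag bits.
FLAG_BITS = {
    "F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
    "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80,
}


def flags_to_bits(letters):
    """Convert a TCP_FLAGS string such as 'SA' to a bit field."""
    bits = 0
    for letter in letters.upper():
        bits |= FLAG_BITS[letter]
    return bits


def matches_high_mask(flow_flags, high_mask):
    """Test a flow's flags (e.g. 'SAP') against HIGH/MASK (e.g. 'AS/ASFR')."""
    high_text, _, mask_text = high_mask.partition("/")
    high = flags_to_bits(high_text)
    mask = flags_to_bits(mask_text)
    if high & ~mask:
        raise ValueError("flag listed in high but not in mask: " + high_mask)
    # Flags outside the mask are ignored; masked flags must equal high.
    return (flags_to_bits(flow_flags) & mask) == high
```

Under this sketch a flow with flags SAP matches AS/ASFR (PSH is outside the mask), a flow with flags SAF does not (FIN is in the mask but not in high), and F/S raises an error.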
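COUNTRY_CODE_LIST is a comma separated list of two-letter country codes, which may include the following special codes:
--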
- N/A (e.g. private and experimental reserved addresses)
a1
- anonymous proxy
a2
- satellite provider
o1
- other
An example: cx,uk,kr,jp,--
ATTRIBUTES is any combination of the letters F,T,C, where
F
- collector saw additional packets in the flow following a packet with a FIN flag (excluding ACK)
T
- collector prematurely created a record for a long-running connection due to a timeout
C
- this flow is a continuation of a flow that the collector was forced to close due to a timeout
SWITCHES
The switches are:
- --stime=DATE_RANGE
- Pass the record if its starting time is in this DATE_RANGE.
- --etime=DATE_RANGE
- As --stime for the ending time.
- --active-time=DATE_RANGE
- Pass the record if the record was active at ANY time during this DATE_RANGE. If a single time is specified, pass the record if it was active at that instant.
- --duration=INTEGER_RANGE
- Pass the record if its duration (eTime-sTime) is in this INTEGER_RANGE.
- --sport=INTEGER_LIST
- Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.
- --dport=INTEGER_LIST
- Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.
- --aport=INTEGER_LIST
- Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.
- --protocol=INTEGER_LIST
- Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.
- --icmp-type=INTEGER_LIST
- Pass the record if its ICMP type is in this INTEGER_LIST; possible values are 0-255. This switch will act as if --protocol=1 has been specified; it is an error to specify any other values for the protocol.
- --icmp-code=INTEGER_LIST
- Pass the record if its ICMP code is in this INTEGER_LIST; possible values are 0-255. This switch will act as if --protocol=1 has been specified; it is an error to specify any other values for the protocol.
- --bytes=INTEGER_RANGE
- Pass the record if its byte count is in this INTEGER_RANGE.
- --packets=INTEGER_RANGE
- Pass the record if its packet count is in this INTEGER_RANGE.
- --bytes-per-packet=DECIMAL_RANGE
- Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.
- --saddress=IP_ADDR_MASK
- Pass the record if its source IP address is matched by this IP_ADDR_MASK. To match on multiple IPs, use an IPset (see --sipset).
- --daddress=IP_ADDR_MASK
- Pass the record if its destination IP address is matched by this IP_ADDR_MASK (see also --dipset).
- --any-address=IP_ADDR_MASK
- Pass the record if either its source or its destination IP address is matched by this IP_ADDR_MASK (see also --anyset). Does not consider the next-hop IP address.
- --not-saddress=IP_ADDR_MASK
- Pass the record if its source IP address is not matched by this IP_ADDR_MASK (see also --not-sipset).
- --not-daddress=IP_ADDR_MASK
- Pass the record if its destination IP address is not matched by this IP_ADDR_MASK (see also --not-dipset).
- --not-any-address=IP_ADDR_MASK
- Pass the record if neither its source nor its destination IP address is matched by this IP_ADDR_MASK (see also --not-anyset). Does not consider the next-hop IP address.
- --sipset=IP_SET_FILENAME
- Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME.
- --dipset=IP_SET_FILENAME
- As --sipset for the destination IP address.
- --anyset=IP_SET_FILENAME
- Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.
- --nhipset=IP_SET_FILENAME
- As --sipset for the next-hop IP address.
- --not-sipset=IP_SET_FILENAME
- Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME.
- --not-dipset=IP_SET_FILENAME
- As --not-sipset for the destination IP address.
- --not-anyset=IP_SET_FILENAME
- Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.
- --not-nhipset=IP_SET_FILENAME
- As --not-sipset for the next-hop IP address.
- --tcp-flags=TCP_FLAGS
- Pass the record if, for any one of its packets, any of the specified TCP_FLAGS was on.
- --flags-all=HIGH_MASK_FLAGS
- HIGH_MASK_FLAGS is a set of HIGH_FLAGS/MASK_FLAGS; HIGH_FLAGS must be a subset of MASK_FLAGS. Pass the record if the flags listed in HIGH_FLAGS are set and the flags listed in MASK_FLAGS but not listed in HIGH_FLAGS are not set. This switch may be repeated up to eight times, and a record passes if it matches any one of them, so that --flags-all=S/S --flags-all=A/A will pass flows that have either SYN high or ACK high.
- --fin-flag=SCALAR
- When set to 0, pass only records where the FIN flag is low; when set to 1, pass only records where the FIN flag is high.
- --syn-flag=SCALAR
- As --fin-flag except for the SYN flag.
- --rst-flag=SCALAR
- As --fin-flag except for the RST flag.
- --psh-flag=SCALAR
- As --fin-flag except for the PSH flag.
- --ack-flag=SCALAR
- As --fin-flag except for the ACK flag.
- --urg-flag=SCALAR
- As --fin-flag except for the URG flag.
- --ece-flag=SCALAR
- As --fin-flag except for the ECE flag.
- --cwr-flag=SCALAR
- As --fin-flag except for the CWR flag.
- --dynamic-library=DYNLIB
- Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path is DYNLIB. The creation of these plug-ins is beyond the scope of this manual page; the process is described in Analysts' Handbook: Using SiLK for Network Traffic Analysis. When multiple Partitioning Switches are given, the code specified by this plug-in will be last to be invoked. When DYNLIB contains a slash (/), rwfilter assumes the path to DYNLIB is correct. Otherwise, rwfilter will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwfilter does not find the file, it assumes the plug-in is in the current directory. To force rwfilter to look in the current directory first, specify --dynamic-library=./DYNLIB. When the SILK_DYNLIB_DEBUG environment variable is non-empty, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional switches; for flows without this additional information, the field's value is always 0.
- --flags-initial=HIGH_MASK_FLAGS
- As --flags-all, except this switch considers only the initial packet in the flow.
- --flags-session=HIGH_MASK_FLAGS
- As --flags-all, except this switch ignores the initial packet in the flow.
- --attributes=HIGH_ATTRIBUTES/CARE_ATTRIBUTES
- Passes the flow if the attribute of the flow matches this ATTRIBUTE. Attributes are F,T,C; see above for a description of these values.
- --application=INTEGER_LIST
- Passes the flow if the application that the flow collection software assigned to the flow is in the specified INTEGER_LIST. Some flow generation software will guess the application based on the contents of the packets that make up the flow. The application value is the standard port for that application; for example, HTTP traffic on non-standard ports will have an application of 80.
- --ip-version=INTEGER_LIST
- Passes the flow if the IP Version is in the specified INTEGER_LIST. INTEGER_LIST can be 4, 6, or 4,6 when SiLK has been compiled with IPv6 support. If SiLK does not have IPv6 support, the only legal value for this switch is 4.
For the following three filter tests, some file formats do not store these values, in which case the value is always 0:
- --next-hop-id=IP_ADDR_MASK
- Pass the record if its next hop IP address is matched by this IP_ADDR_MASK.
- --not-next-hop-id=IP_ADDR_MASK
- Pass the record if its next hop IP address is not matched by this IP_ADDR_MASK.
- --input-index=INTEGER_LIST
- Pass the record if its incoming SNMP interface is in this INTEGER_LIST.
- --output-index=INTEGER_LIST
- Pass the record if its outgoing SNMP interface is in this INTEGER_LIST.
Additional filtering switches are provided by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwfilter automatically looks for the following plug-ins:
ADDRESS TYPE (addrtype.so)
- --stype=SCALAR
- When SCALAR is 0, pass the record if its source IP address is non-routable. When 1, pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal (non-routable or external). See addrtype(3).
- --dtype=SCALAR
- As --stype for the destination IP address.
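The four SCALAR values accepted by --stype and --dtype can be summarized as a lookup table. In the Python sketch below, the category labels ("non-routable", "internal", "external") and the function name are hypothetical names chosen for illustration; how an address is assigned to a category is the job of the plug-in and is not modeled here.

```python
# Which address categories each SCALAR value accepts.
ACCEPTED_CATEGORIES = {
    0: {"non-routable"},
    1: {"internal"},
    2: {"external"},                   # routable but not internal
    3: {"non-routable", "external"},   # anything that is not internal
}


def passes_address_type(scalar, category):
    """Return True if an address of the given category passes the test."""
    return category in ACCEPTED_CATEGORIES[scalar]
```

Seen this way, a value of 3 is the complement of 1: every address that is not internal passes.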
COUNTRY CODE (ccfilter.so)
- --scc=COUNTRY_CODE_LIST
- Pass the record if the country code of its source IP address is in the specified COUNTRY_CODE_LIST. See ccfilter(3).
- --dcc=COUNTRY_CODE_LIST
- As --scc for the destination IP address.
PREFIX MAP (pmapfilter.so)
- --pmap-file=FILENAME
- FILENAME refers to a prefixmap file generated using rwpmapbuild(1). This switch must precede all other --pmap-* switches. See pmapfilter(3).
- --pmap-saddress=LABELS
- For an IP prefix map, pass the record if the source IP address maps to a label contained in the list of labels in LABELS.
- --pmap-daddress=LABELS
- As --pmap-saddress for the destination IP address.
- --pmap-sport-proto=LABELS
- For a port/protocol map, pass the record if the source port and protocol combination maps to a label contained in the list of labels in LABELS.
- --pmap-dport-proto=LABELS
- As --pmap-sport-proto for destination port and protocol.
TUPLE (tuple.so)
This plug-in provides support for partitioning by arbitrary subsets of the basic five-tuple:
{source-ip,destination-ip,source-port,destination-port,protocol}
For the plug-in to pass the SiLK Flow record, the record's fields must
match one of the tuples. Any subset of the five-tuple is supported,
but the same subset must be used per invocation of rwfilter. The
tuples are read from a text file containing lines of delimited fields.
The default delimiter is |, but may be specified with the
--tuple-delimiter switch. Each field contains one member of the
tuple; the fields may appear in any order. If you want the field to
match any value, it is best that you not include that field in your
input. A field that is present but has no value will generate an
error.
The IP fields may contain an IPv4 address, an integer, or an IP in CIDR
block notation. Comma-separated lists (80,443) and ranges
(0-1023,8080) are supported for the ports and protocol fields.
Note that currently the code is not clever in its support for CIDR
notation and ranges (each occurrence is fully expanded), and the
memory required to hold the search tree can quickly grow.
In addition to the tuple-lines, FILENAME may contain blank lines
and comments (which begin with # and continue to the end of the
line).
The --tuple-fields switch must list the fields in FILENAME in the order in which they appear. When you do not specify the --tuple-fields switch, the plug-in will attempt to guess the fields from the first line in the input (a la rwtuc(1)), and exit if it cannot. If you do specify --tuple-fields, a title appearing on the first line will be ignored.
The --tuple-direction allows you to look for traffic in the reverse direction (or both directions) without having to write all of your rules twice.
- --tuple-file=FILENAME
- FILENAME refers to a file containing lines of delimited textual fields. This switch is required if the plug-in is to be used.
- --tuple-fields=FIELDS
- FIELDS contains the list of fields (columns) to parse. When this switch is not provided, the plug-in will attempt to parse the first line in the file to determine the fields. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated to their shortest unique prefix. The field names and their descriptions are:
- sIP,sip,1
- source IP address
- dIP,dip,2
- destination IP address
- sPort,sport,3
- source port
- dPort,dport,4
- destination port
- protocol,5
- IP protocol
- --tuple-direction=DIRECTION
- Allows you to change the comparison between the tuple and the SiLK Flow record. The available directions are:
- forward
- The tuple's fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol. This is the default.
- reverse
- The tuple's fields are compared against the opposite fields on the flow; that is, sIP is compared with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.
- both
- Both of the above comparisons are performed.
- --tuple-delimiter=CHAR
- Specifies the character separating the input fields. When the switch is not provided, the default of | is used.
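The effect of --tuple-direction can be sketched in Python as follows. The Flow type, the dictionary representation of a tuple rule, and the function name are all hypothetical; the sketch compares exact values only and does not model CIDR blocks, lists, or ranges. Treating both as "forward or reverse" is an assumption based on the description above.

```python
from collections import namedtuple

Flow = namedtuple("Flow", "sip dip sport dport protocol")

# Field each tuple field is compared against in the reverse direction.
REVERSED = {
    "sip": "dip", "dip": "sip",
    "sport": "dport", "dport": "sport",
    "protocol": "protocol",
}


def tuple_matches(flow, rule, direction="forward"):
    """rule maps a subset of the five-tuple field names to required values."""
    forward = all(getattr(flow, name) == value
                  for name, value in rule.items())
    reverse = all(getattr(flow, REVERSED[name]) == value
                  for name, value in rule.items())
    if direction == "forward":
        return forward
    if direction == "reverse":
        return reverse
    if direction == "both":
        # ASSUMPTION: a match in either direction passes the record.
        return forward or reverse
    raise ValueError("unknown direction: " + direction)
```

For a flow from 10.0.0.1 port 1234 to 10.0.0.2 port 80, a rule with sip=10.0.0.2 and sport=80 does not match in the forward direction but does match with reverse or both, which is what lets one rule describe the response side of a conversation.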
The following two switches are implemented in terms of the tuple plug-in and they are supported for backwards compatibility. They are deprecated and will be removed in a future release. These switches are incompatible with --tuple-file and with each other.
- --ippair-any=FILENAME
- Pass the record if the source IP and destination IP (in either order) match one of the IP-pairs listed in the text file FILENAME. Each line of FILENAME should contain two IP addresses separated by whitespace. This switch is equivalent to --tuple-file=FILENAME --tuple-fields=sIP,dIP --tuple-direction=both --tuple-delimiter=' '.
- --ipport-any=FILENAME
- Pass the record if either the source IP and port pair or the destination IP and port pair are listed in the text file FILENAME. Each line in FILENAME should contain an IP address and port list of interest for that IP separated by whitespace. The format of the IP address and port list may be any format supported by the plug-in.
PYTHON (python.so)
This plug-in provides support for filtering by expressions written in the Python programming language. Using Python, one can write complex expressions that cannot be written with a single rwfilter command line. See the SiLK in Python documentation for information on how to use Python to manipulate SiLK data structures.
When multiple Partitioning Switches are given, the Python plug-in will be the next-to-last to be invoked. Only the code specified by the --dynamic-library switch is called after the Python code.
- --python-expr=PYTHON_EXPRESSION
- Pass the record if the result of processing the flow with the specified PYTHON_EXPRESSION is true. The expression is evaluated in the following context:
- The record is represented by the variable named rec.
- There is an implicit from silk import * in effect.
- --python-file=FILENAME
- Pass the record if the result of processing the flow with the function named rwfilter in FILENAME is true. The function should take a single argument, which is a silk.RWRec object.
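As a hedged illustration of the --python-file interface, a file might define the rwfilter function as shown below. The attribute names protocol, dport, and bytes are assumed to be available on the silk.RWRec object; consult the SiLK in Python documentation for the attributes your installation provides. The file name used afterwards is hypothetical.

```python
def rwfilter(rec):
    """Pass TCP flows to port 80 that carried more than 1000 bytes.

    rec is expected to be a silk.RWRec object; only the attributes
    protocol, dport, and bytes are used here (assumed names).
    """
    return rec.protocol == 6 and rec.dport == 80 and rec.bytes > 1000
```

If this were saved as webfilter.py, it could be used by adding --python-file=webfilter.py to an rwfilter command line. The same test could be given inline as --python-expr='rec.protocol == 6 and rec.dport == 80 and rec.bytes > 1000'.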
EXAMPLES
The most basic filtering involves looking at specific traffic over a specific time. For example:
rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
--pass=alltcp.rwf --proto=6
will create a file, alltcp.rwf, containing all TCP traffic. This file contains SiLK Flow data in a binary format. To examine the contents, use the command rwcut(1).
Please note that the output file described above could be extremely large.
Once a file is written, rwfilter can filter the file again, for example:
rwfilter --aport=80 alltcp.rwf --pass=allweb.rwf
will generate allweb.rwf. This progressive filtering can also be done at the command line, but the interim files can be examined with rwcut, rwuniq and other tools.
Multiple filters can be chained at the command line using pipes:
rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
--proto=6 --pass=stdout | \
rwfilter --input-pipe=stdin --aport=80 --packets=1-5 \
--pass=smallweb.rwf
ENVIRONMENT
- SILK_RWFILTER_THREADS
- The number of threads to use while reading input files or files selected from the data store.
- PYTHONPATH
- The Python module for rwfilter (python.so) is installed under SiLK's installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so Python can find this module. For information on using Python from within rwfilter, see SiLK in Python.
- SILK_DATA_ROOTDIR
- When set, overrides the compiled-in value for the location of the directory tree containing the files of SiLK Flow records collected and stored by the packing system (rwflowpack(8)).
- SILK_PATH
- This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwfilter checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwfilter looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
- SILK_DYNLIB_DEBUG
- When set to 1, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.
- SILK_LOGSTATS
- When set to a non-empty value, rwfilter will treat the value as a program to execute with information about this rwfilter invocation. The arguments to the program are, in order:
- The application name, i.e., rwfilter. Note that rwfilter is always used as this argument, regardless of the name of the executable.
- The version number of this command line, currently v0001.
- The start time of this invocation, as seconds since the UNIX epoch.
- The end time of this invocation, as seconds since the UNIX epoch.
- The number of data files opened for reading.
- The number of records read.
- The number of records written.
- A variable number of arguments that are the complete command line used to invoke rwfilter, including the name of the executable.
- SILK_LOGSTATS_RWFILTER
- If set, this environment variable overrides the value specified in SILK_LOGSTATS.
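A program named in SILK_LOGSTATS receives the arguments listed above in that order. The Python sketch below shows one way such a program might unpack them; the function name and dictionary keys are hypothetical, and the time and count arguments are assumed to be plain integers.

```python
def parse_logstats_args(args):
    """Unpack the SILK_LOGSTATS arguments (excluding the program's own name).

    In a script this would typically be called as
    parse_logstats_args(sys.argv[1:]).
    """
    return {
        "application": args[0],           # always "rwfilter"
        "version": args[1],               # e.g. "v0001"
        "start_time": int(args[2]),       # seconds since the UNIX epoch
        "end_time": int(args[3]),         # seconds since the UNIX epoch
        "files_opened": int(args[4]),
        "records_read": int(args[5]),
        "records_written": int(args[6]),
        "command_line": list(args[7:]),   # full rwfilter invocation
    }
```

From the result, a logging script could, for example, compute the run time as end_time minus start_time and record it alongside the command line.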
NOTES
rwfilter is the most commonly used application in the suite. It provides access to the data files and performs all the basic queries.
rwfilter supports a variety of I/O options - in addition to reading from the data store, rwfilter results can be chained together with named pipes to output results to multiple files simultaneously. An introduction to named pipes is outside the scope of this document, however.
Two often underused options are --dry-run and --print-statistics.
--dry-run does a sanity check on the input arguments and should be used, especially for complicated arguments, to check that the arguments are acceptable.
--print-statistics used without --pass-destination or --fail-destination simply dumps aggregate statistics to stderr (not stdout) in the following format:
File <#input files> Read <# of recs read> \ Pass <# of recs passing the filter> \ Fail <# of recs failing the filter>
and can be used to do a quick pass through the data to get aggregate counts before going in deeper into the phenomenon being investigated.
--print-filenames can be used as a progress meter; during long jobs, it shows which file is currently being read by the application. --print-filenames will not provide meaningful results with piped input.
Filters are applied in the order given on the command line. It is best to place the filters that eliminate the most records first.
The switches used to create a filter output file are stored in the file itself. Use the rwfileinfo(1) command to see this information.
SEE ALSO
silk(7), Analysts' Handbook: Using SiLK for Network Traffic Analysis, SiLK in Python, rwcount(1), rwcut(1), rwfileinfo(1), rwset(1), rwsort(1), rwstats(1), rwtotal(1), rwuniq(1), rwsetbuild(1), addrtype(3), ccfilter(3), pmapfilter(3)


