NAME
rwfilter - Choose which SiLK Flow records to process
SYNOPSIS
rwfilter [--threads=N] [--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--pass-destination=PASS_PATH]
[--fail-destination=FAIL_PATH] [--all-destination=ALL_PATH]
[--input-pipe=INPUT_PATH] [--xargs=INPUT_STREAM]
[{ --print-statistics | --print-volume-statistics }]
[--print-filenames] [--print-missing-filenames]
[--dry-run] [--max-pass-records=N] [--max-fail-records=N]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
{ [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
| [--flowtype=CLASS/TYPE[,CLASS/TYPE ...]] }
[--sensors=SENSOR[,SENSOR ...]]
[--data-rootdir=PATH] [--site-config-file=FILENAME]
[--stime=DATE_RANGE] [--etime=DATE_RANGE]
[--active-time=DATE_RANGE] [--duration=DECIMAL_RANGE]
[--sport=INTEGER_LIST] [--dport=INTEGER_LIST]
[--aport=INTEGER_LIST] [--protocol=INTEGER_LIST]
[--icmp-type=INTEGER_LIST] [--icmp-code=INTEGER_LIST]
[--bytes=INTEGER_RANGE] [--packets=INTEGER_RANGE]
[--bytes-per-packet=DECIMAL_RANGE]
[{--saddress=IP_ADDR_MASK | --not-saddress=IP_ADDR_MASK}]
[{--daddress=IP_ADDR_MASK | --not-daddress=IP_ADDR_MASK}]
[{--any-address=IP_ADDR_MASK | --not-any-address=IP_ADDR_MASK}]
[{--next-hop-id=IP_ADDR_MASK | --not-next-hop-id=IP_ADDR_MASK}]
[{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]
[{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]
[{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]
[{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]
[--input-index=INTEGER_LIST] [--output-index=INTEGER_LIST]
[--tcp-flags=TCP_FLAGS] [--flags-all=HIGH_MASK_FLAGS_LIST]
[--fin-flag=SCALAR] [--syn-flag=SCALAR] [--rst-flag=SCALAR]
[--psh-flag=SCALAR] [--ack-flag=SCALAR] [--urg-flag=SCALAR]
[--ece-flag=SCALAR] [--cwr-flag=SCALAR]
[--flags-initial=HIGH_MASK_FLAGS_LIST]
[--flags-session=HIGH_MASK_FLAGS_LIST]
[--attributes=ATTRIBUTES_LIST] [--application=INTEGER_LIST]
[--ip-version=INTEGER_LIST]
[--scc=COUNTRY_CODE_LIST] [--dcc=COUNTRY_CODE_LIST]
[--stype=SCALAR] [--dtype=SCALAR]
[--ippair-any=FILENAME] [--ipport-any=FILENAME]
[--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS]
[--tuple-direction=DIRECTION]
[--tuple-delimiter=CHAR] } ]
[--python-expr=PYTHON_EXPR]
[--python-file=FILENAME [--python-file=FILENAME ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]
{ [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]
[--pmap-any-MAPNAME=LABELS] } ]
rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH]
[--data-rootdir=PATH] [--site-config-file=FILENAME]
--help
rwfilter --version
DESCRIPTION
rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records to process, and (2) it partitions those records into one or more pass and/or fail streams.
The selection switches let one choose records by where the flow was collected (its sensor), the date of collection, and the flow's direction.
The partitioning switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to port 80). rwfilter identifies records matching or violating the behavior(s), and partitions them into appropriate output streams (i.e., files) as specified.
These output streams from rwfilter are always binary. The output must be passed through another tool in the SiLK Tool Suite for further processing to get human-readable output.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
Output Switches
At least one of the following output switches must be provided:
- --pass-destination=PASS_PATH
-
PASS_PATH refers to a non-existent file, a named pipe, or stdout. The pass-destination will output records which have passed ALL of the partitioning predicates.
- --fail-destination=FAIL_PATH
-
FAIL_PATH refers to a non-existent file, a named pipe, or stdout. The fail-destination will output records which failed ANY of the partitioning predicates.
- --all-destination=ALL_PATH
-
ALL_PATH refers to a file, a named pipe, or stdout. The all-destination will output all records read by rwfilter.
- --print-statistics
- --print-statistics=PATH
-
Prints out the statistics on files read - the number of records which passed, the number which failed and the total read. If a PATH is provided, the statistics will be printed there; otherwise they are printed to the standard error.
- --print-volume-statistics
- --print-volume-statistics=PATH
-
An enhanced version of --print-statistics, in that the statistics include the number of records, packets, and bytes that passed and failed the filter.
- --help
-
Print the available options and exit. Options that add fields can be specified before --help so that the new options appear in the output. The available classes and types will be included in output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.
- --version
-
Print the version number and information about how SiLK was configured, then exit the application.
Additional Switches
- --threads=N
-
Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter runs with a single thread. Using multiple threads greatly improves the performance of rwfilter for queries that look at many files but return few records. Preliminary testing has found that performance peaks around four threads per CPU, but performance will vary depending on the type of query and the number of records returned.
- --input-pipe=INPUT_PATH
-
INPUT_PATH is a named pipe or the string stdin. This refers to another source of rwfilter records. Note that rwfilter will not read from the standard input by default; to get this behavior, you must use --input-pipe=stdin.
- --xargs=INPUT_PATH
-
Causes rwfilter to read file names from INPUT_PATH; the input should have one file name per line. rwfilter will open each file in turn and read records from it.
- --print-filenames
-
Print the names of input files as they are read. This can be useful feedback for a long-running rwfilter process.
- --dry-run
-
Perform a sanity check to verify that the input arguments are acceptable. In addition, print to the standard output the names of the files that would be accessed (and the names of missing files if --print-missing-files is specified). rwfglob(1) can also be used to generate the lists of files that rwfilter will access.
- --max-pass-records=N
-
Write N records to each --pass-destination. rwfilter will stop reading input once it has written these N records unless the --fail-destination or --all-destination switches were specified.
- --max-fail-records=N
-
Write N records to each --fail-destination. rwfilter will stop reading input once it has written these N records unless the --pass-destination or --all-destination switches were specified.
- --note-add=TEXT
-
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
- --note-file-add=FILENAME
-
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
- --compression-method=COMP_METHOD
-
Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
- none
-
Do not compress the output using an external library
- zlib
-
Use the zlib(3) library for compressing the output
- lzo1x
-
Use the lzo1x algorithm from the LZO real time compression library for compression
- best
-
Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.
File Selection Options
The following options determine which files are read from the data store to provide the records.
- --start-date=YYYY/MM/DD[:HH]
- --end-date=YYYY/MM/DD[:HH]
-
The date predicates indicate which days and hours to consider when creating the list of files. The dates are expressed in YYYY/MM/DD:HH format. For example, 2003/01/18:00 represents the first hour of January 18th, 2003, while 2002/10/01:22 corresponds to 22:00 on October 1st, 2002.
-
Whether the date strings represent times in GMT or the local timezone depends on how SiLK was compiled. See the output from --help or check the Timezone support setting in the --version output to determine how your version of SiLK was compiled.
-
When both --start-date and --end-date are specified to hour precision, all hours within that time range are processed.
-
When --start-date is specified to day precision, the hour specified in --end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.
-
When --end-date is not specified and --start-date is specified to day precision, files for that complete day are processed.
-
When --end-date is not specified and --start-date is specified to hour precision, files for that single hour are processed.
-
It is an error to specify --end-date without specifying --start-date.
-
When neither --start-date nor --end-date is given, rwfilter processes all files for the current day.
- --class=CLASS
-
The --class switch is used to specify a group of data to process. Only a single class may be selected. Classes are defined in the silk.conf(5) site configuration file. If the --class option is not given, the default-class as specified in silk.conf is used. Use the --help option to see the list of available classes and the default class.
- --type={all | TYPE[,TYPE]}
-
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes a comma-separated list of types or the keyword all, which specifies all types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. Classes typically define default-types to use when the --type switch is not specified. Use the --help option to get the list of available types for each class.
- --flowtypes=CLASS/TYPE[,CLASS/TYPE ...]
-
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may be used for the CLASS and/or TYPE to select all classes and/or types.
- --sensors=SENSOR[,SENSOR ...]
-
The --sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the silk.conf(5) site configuration file, and the mapsid(1) command can be used to print a mapping of sensor names to IDs and classes. When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es).
- --data-rootdir=PATH
-
This option causes rwfilter to use PATH as the root of the data store directory, which overrides the location given in the SILK_DATA_ROOTDIR environment variable, which overrides the location that was compiled into rwfilter. The default data store directory will be shown when the --version option is given.
- --site-config-file=FILENAME
-
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory (see --data-rootdir); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
- --print-missing-files
-
This option prints to the standard error file names that rwfilter's file selection switches expected to find but did not. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file to permit retrieval of historical data; these data files will be missing even though their absence is expected. Use the output from this switch judiciously.
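The precision rules for --start-date and --end-date can be modeled with a short sketch. This is an illustration of the rules stated above, not rwfilter's implementation; the function names are hypothetical, the case where neither switch is given (the current day) is left out, and the combination of an hour-precision start with a day-precision end is rejected because the text above does not describe it.

```python
from datetime import datetime, timedelta


def _parse(text):
    """Return (datetime, has_hour) for YYYY/MM/DD or YYYY/MM/DD:HH."""
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def selected_hours(start_date, end_date=None):
    """List the hourly bins covered by a start/end pair (hypothetical helper)."""
    start, start_has_hour = _parse(start_date)
    if end_date is None:
        # Hour precision selects that single hour; day precision the whole day.
        last = start if start_has_hour else start + timedelta(hours=23)
    else:
        end, end_has_hour = _parse(end_date)
        if start_has_hour and end_has_hour:
            last = end
        elif not start_has_hour:
            # Day-precision start: any hour given on the end date is ignored.
            last = end.replace(hour=23)
        else:
            raise ValueError("combination not described in this manual page")
    hours = []
    current = start
    while current <= last:
        hours.append(current)
        current += timedelta(hours=1)
    return hours
```

For instance, `selected_hours("2003/01/18")` covers 24 hourly bins, while `selected_hours("2003/01/18:05")` covers only the 05:00 bin.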
Partitioning Switches
rwfilter supports the following partitioning switches, at least one of which must be specified. The switches are AND'ed together; i.e., to pass the filter, the record must pass the test implied by each switch. Any record that does not pass will be sent to the fail-destination(s), if specified.
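The AND behavior can be pictured with a small model. The sketch below is only an illustration: records are represented as plain dictionaries and each partitioning switch as a hypothetical predicate function, neither of which reflects how rwfilter stores or tests SiLK Flow records.

```python
def partition(records, predicates):
    """Split records into (passed, failed); a record passes only if
    every predicate accepts it."""
    passed, failed = [], []
    for record in records:
        if all(predicate(record) for predicate in predicates):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


# Hypothetical records with just the fields the predicates need.
records = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]

# Stand-ins for --protocol=6 and --dport=80.
predicates = [
    lambda r: r["protocol"] == 6,
    lambda r: r["dport"] == 80,
]

passed, failed = partition(records, predicates)
```

Only the first record satisfies both tests, so it alone lands in `passed`; the other two go to `failed`, which corresponds to the fail-destination when one is specified.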
SWITCH PARAMETERS
The forms of the parameters to these partitioning switches are:
-
DATE_RANGE is a range of two dates, start-range and end-range, each in the form YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]; for example, 2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last fifteen minutes of Jan 31, 2003. The start-range and end-range must be set to at least day precision. For the start-range, unspecified hour, minute, second, and millisecond values are set to 0; for the end-range, those values are set to 23, 59, 59, and 999 respectively. Thus 2003/01/31:23-2003/01/31:23 will become 2003/01/31:23:00:00.000-2003/01/31:23:59:59.999. If an end-range is not given, it is set to the start-range, giving a range of a single millisecond.
-
SCALAR is a single integer; for example, 4.
-
INTEGER_RANGE is a range of two positive integers: MIN-MAX; for example, 1-500. If a single value is given, the range consists of that single value. For many options, the upper limit of the range may be omitted, such as 1-, in which case the limit is set to the maximum value.
-
INTEGER_LIST is a comma separated list of SCALARs and INTEGER_RANGEs; for example, 1,2,3,5-10,99-103.
-
DECIMAL_RANGE is a range of decimal values with accuracy up to 10^-4 expressed as MIN-MAX; for example, 5.0-10.031. If a single value is given, the range consists of that single value. The upper limit of the range may be omitted, such as 1.5-, in which case the limit is set to the maximum value.
-
IP_ADDR_MASK is expressed in one of two forms: as a CIDR block (192.168.0.0/16), or as four INTEGER_LISTs joined by dots (.). The character x can be used as an abbreviation for 0-255. For example, 10.10,16-31.x.x represents the following CIDR blocks: 10.10.0.0/16 and 10.16.0.0/12.
-
TCP_FLAGS is any combination of the letters F,S,R,P,A,U,E,C, where F=FIN flag; S=SYN; R=RST; P=PSH; A=ACK; U=URG; E=ECE; C=CWR.
-
HIGH_MASK_FLAGS is a pair of TCP_FLAGS strings separated by a slash (/). Flags to the right of the slash are the mask; any flag not listed in the mask may have any value. Flags to the left of the slash are the expected high flags; they must be set in the flow. Thus, flags listed in mask but not in high must be off for all packets in the flow. It is an error if a flag is listed in high but not in mask. Some examples:
- AS/ASFR
-
means ACK,SYN must be high, FIN,RST must be low, and the other flags (PSH, URG, ECE, CWR) may have any value.
- A/A
-
means the ACK flag must be set. All other flags may have any value.
- /F
-
means the FIN flag must be off. All other flags may have any value.
- F/S
-
is an error; use F/FS instead, which means FIN must be high, SYN must be low, and other flags can have any value.
-
HIGH_MASK_FLAGS_LIST is a comma separated list of HIGH_MASK_FLAGS.
-
IP_SET_FILENAME is the name of a file containing a binary IPset. Binary IPsets are created from rwfilter output with the rwset tool, or from text input with the rwsetbuild(1) tool.
-
COUNTRY_CODE_LIST is a comma separated list of lowercase two-letter country codes, as well as the following special codes:
- --
-
N/A (e.g. private and experimental reserved addresses)
- a1
-
anonymous proxy
- a2
-
satellite provider
- o1
-
other
-
ATTRIBUTES is any combination of the letters F,T,C, where
- F
-
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
- T
-
flow generator prematurely created a record for a long-running connection due to a timeout. (When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
- C
-
flow generator created this flow as a continuation of a long-running connection, where the previous flow for this connection met a timeout
-
HIGH_MASK_ATTRIBUTES is similar to HIGH_MASK_FLAGS: It is a pair of ATTRIBUTES strings separated by a slash (/). Attributes to the right of the slash are the mask; an attribute not listed in the mask may have any value in the flow. Attributes to the left of the slash are the expected high attributes; they must be set in the flow. Thus, attributes listed in mask but not in high must be off for all packets in the flow. It is an error if an attribute is listed in high but not in mask.
-
ATTRIBUTES_LIST is a comma separated list of HIGH_MASK_ATTRIBUTES.
An example of a COUNTRY_CODE_LIST: cx,uk,kr,jp,--
As an example of ATTRIBUTES, consider a long-running ssh session that exceeds the flow generator's active timeout. (This is the active timeout since the flow generator creates a flow for a connection that still has activity.) The flow generator will create multiple flow records for this ssh session, each spanning some portion of the total session. The first flow record will be marked with a T, indicating that it hit the timeout. The second through next-to-last records will be marked with TC, indicating that this flow timed out and that this flow is a continuation of a connection that timed out. The final flow will be marked with a C, indicating that it was created as a continuation of an active flow.
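The wildcard form of IP_ADDR_MASK can be explored with the following sketch, which expands each dot-separated INTEGER_LIST and tests an address octet by octet. It is a model of the notation described above, not rwfilter's parser: all function names are hypothetical, open-ended ranges such as 1- are not handled, and the CIDR conversion is deliberately restricted to masks ending in .x.x.

```python
import ipaddress


def parse_integer_list(text):
    """Expand an octet INTEGER_LIST such as '10,16-31' or 'x' into a set."""
    values = set()
    for part in text.split(","):
        if part == "x":
            values.update(range(256))  # x abbreviates 0-255
        elif "-" in part:
            low, high = part.split("-")
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(part))
    return values


def mask_matches(mask, address):
    """Return True if a dotted-quad address is matched by a wildcard mask."""
    allowed = [parse_integer_list(field) for field in mask.split(".")]
    octets = [int(octet) for octet in address.split(".")]
    if len(allowed) != 4 or len(octets) != 4:
        raise ValueError("expected four dot-separated fields")
    return all(octet in choices for octet, choices in zip(octets, allowed))


def leading_octets_to_cidr(mask):
    """Collapse a mask whose last two fields are 'x' into CIDR blocks."""
    first, second, third, fourth = mask.split(".")
    if (third, fourth) != ("x", "x"):
        raise ValueError("this sketch only handles masks ending in .x.x")
    networks = [
        ipaddress.ip_network(f"{a}.{b}.0.0/16")
        for a in parse_integer_list(first)
        for b in parse_integer_list(second)
    ]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]
```

For the mask 10.10,16-31.x.x, `leading_octets_to_cidr` returns 10.10.0.0/16 and 10.16.0.0/12, and `mask_matches` accepts 10.20.3.4 but rejects 10.11.0.1.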
SWITCHES
The switches are:
- --stime=DATE_RANGE
-
Pass the record if its starting time is in this DATE_RANGE.
- --etime=DATE_RANGE
-
As --stime for the ending time.
- --active-time=DATE_RANGE
-
Pass the record if the record was active at ANY time during this DATE_RANGE. If a single time is specified, pass the record if it was active at that instant.
- --duration=DECIMAL_RANGE
-
Pass the record if its duration (eTime-sTime) is in this DECIMAL_RANGE. The DECIMAL_RANGE represents the time in seconds; use floating point numbers to specify millisecond ranges.
- --sport=INTEGER_LIST
-
Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.
- --dport=INTEGER_LIST
-
Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.
- --aport=INTEGER_LIST
-
Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they originated.
- --protocol=INTEGER_LIST
-
Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.
- --icmp-type=INTEGER_LIST
-
Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values are 0-255. This switch will also verify that the flow's protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.
- --icmp-code=INTEGER_LIST
-
Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values are 0-255. This switch will also verify that the flow's protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a --protocol that does not include 1 and/or 58.
- --bytes=INTEGER_RANGE
-
Pass the record if its byte count is in this INTEGER_RANGE.
- --packets=INTEGER_RANGE
-
Pass the record if its packet count is in this INTEGER_RANGE.
- --bytes-per-packet=DECIMAL_RANGE
-
Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.
- --saddress=IP_ADDR_MASK
-
Pass the record if its source IP address is matched by this IP_ADDR_MASK. To match on multiple IPs, use an IPset (see --sipset).
- --daddress=IP_ADDR_MASK
-
Pass the record if its destination IP address is matched by this IP_ADDR_MASK (see also --dipset).
- --any-address=IP_ADDR_MASK
-
Pass the record if either its source or its destination IP address is matched by this IP_ADDR_MASK (see also --anyset). Does not consider the next-hop IP address.
- --not-saddress=IP_ADDR_MASK
-
Pass the record if its source IP address is not matched by this IP_ADDR_MASK (see also --not-sipset).
- --not-daddress=IP_ADDR_MASK
-
Pass the record if its destination IP address is not matched by this IP_ADDR_MASK (see also --not-dipset).
- --not-any-address=IP_ADDR_MASK
-
Pass the record if neither its source nor its destination IP address is matched by this IP_ADDR_MASK (see also --not-anyset). Does not consider the next-hop IP address.
- --sipset=IP_SET_FILENAME
-
Pass the record if its source IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME.
- --dipset=IP_SET_FILENAME
-
As --sipset for the destination IP address.
- --anyset=IP_SET_FILENAME
-
Pass the record if either its source IP address or its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.
- --nhipset=IP_SET_FILENAME
-
As --sipset for the next-hop IP address.
- --not-sipset=IP_SET_FILENAME
-
Pass the record if its source IP address is not in the list of IPs contained in the binary set file IP_SET_FILENAME.
- --not-dipset=IP_SET_FILENAME
-
As --not-sipset for the destination IP address.
- --not-anyset=IP_SET_FILENAME
-
Pass the record if neither its source IP address nor its destination IP address is in the list of IPs contained in the binary set file IP_SET_FILENAME. Does not consider the next-hop IP.
- --not-nhipset=IP_SET_FILENAME
-
As --not-sipset for the next-hop IP address.
- --tcp-flags=TCP_FLAGS
-
Pass the record if, for any one of its packets, any of the specified TCP_FLAGS was on.
- --flags-all=HIGH_MASK_FLAGS_LIST
-
HIGH_MASK_FLAGS_LIST is a comma separated list of up to 16 HIGH_FLAGS/MASK_FLAGS pairs, where HIGH_FLAGS and MASK_FLAGS are lists of TCP_FLAGS. HIGH_FLAGS must be a subset of MASK_FLAGS. Pass the record if the flags listed in HIGH_FLAGS are set and the flags listed in MASK_FLAGS but not listed in HIGH_FLAGS are not set. This switch accepts a list of values, so that --flags-all=S/S,A/A will pass flows that have either SYN high or ACK high, regardless of the other flags.
- --fin-flag=SCALAR
-
When set to 0, pass only records where the FIN flag is low. When set to 1, pass only records where the FIN flag is high.
- --syn-flag=SCALAR
-
As --fin-flag except for the SYN flag.
- --rst-flag=SCALAR
-
As --fin-flag except for the RST flag.
- --psh-flag=SCALAR
-
As --fin-flag except for the PSH flag.
- --ack-flag=SCALAR
-
As --fin-flag except for the ACK flag.
- --urg-flag=SCALAR
-
As --fin-flag except for the URG flag.
- --ece-flag=SCALAR
-
As --fin-flag except for the ECE flag.
- --cwr-flag=SCALAR
-
As --fin-flag except for the CWR flag.
- --tuple-file=TUPLE_FILENAME
-
This switch provides support for partitioning by arbitrary subsets of the basic five-tuple:
-
{source-ip,destination-ip,source-port,destination-port,protocol}
-
A SiLK Flow record will pass the test when the record's fields match one of the tuples; if the SiLK record does not match any tuple, the record fails. The tuples are read from the text file TUPLE_FILENAME, which must contain lines of delimited fields. The default delimiter is |, but it may be specified with the --tuple-delimiter switch. Each field contains one member of the tuple; the fields may appear in any order. The fields may represent any subset of the five-tuple, but each line in the file must define the same subset. A field that is present but has no value will generate an error. If you want the field to match any value, it is best that you not include that field in your input.
-
In addition to the tuple-lines, TUPLE_FILENAME may contain blank lines and comments (which begin with # and continue to the end of the line). The first line of TUPLE_FILENAME may contain a title labeling the fields in the file. This title line will be ignored when the --tuple-fields switch is given.
-
The IP fields may contain an IPv4 address, an integer, or an IP address in CIDR block notation. Comma-separated lists (80,443) and ranges (0-1023,8080) are supported for the ports and protocol fields. NOTE: Currently the code is not clever in its support for CIDR notation and ranges in that each occurrence is fully expanded. When this occurs, the memory required to hold the search tree will quickly grow.
- --tuple-fields=FIELDS
-
FIELDS contains the list of fields (columns) to parse from the TUPLE_FILENAME in the order in which they appear in the file. When this switch is not provided, rwfilter will treat the first line in TUPLE_FILENAME as a title line and attempt to determine the fields (a la rwtuc(1)); rwfilter will exit if it cannot determine the fields.
-
FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated to their shortest unique prefix. The field names and their descriptions are:
- sIP,sip,1
-
source IP address
- dIP,dip,2
-
destination IP address
- sPort,sport,3
-
source port
- dPort,dport,4
-
destination port
- protocol,5
-
IP protocol
- --tuple-direction=DIRECTION
-
Allows you to change the comparison between the tuple and the SiLK Flow record. This switch allows one to look for traffic in the reverse direction (or both directions) without having to write all of the rules twice. The available directions are:
- forward
-
The tuple's fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol. This is the default.
- reverse
-
The tuple's fields are compared against the opposite fields on the flow; that is, sIP is compared with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.
- both
-
Both of the above comparisons are performed.
- --tuple-delimiter=CHAR
-
Specifies the character separating the input fields. When the switch is not provided, the default of | is used.
- --ippair-any=FILENAME
-
Pass the record if the source IP and destination IP (in either order) match one of the IP-pairs listed in the text file FILENAME. Each line of FILENAME should contain two IP addresses separated by whitespace. This switch is equivalent to --tuple-file=FILENAME --tuple-fields=sIP,dIP --tuple-direction=both --tuple-delimiter=' '. You cannot use this switch in conjunction with --tuple-file or --ipport-any. This switch is deprecated and it exists for backward compatibility only; it may be removed in a future release.
- --ipport-any=FILENAME
-
Pass the record if either the source IP and port pair or the destination IP and port pair are listed in the text file FILENAME. Each line in FILENAME should contain an IP address and port list of interest for that IP separated by whitespace. This switch is equivalent to --tuple-file=FILENAME --tuple-fields=sIP,sPort --tuple-direction=both --tuple-delimiter=' '. You cannot use this switch in conjunction with --tuple-file or --ippair-any. This switch is deprecated and it exists for backward compatibility only; it may be removed in a future release.
- --plugin=PLUGIN
-
Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is beyond the scope of this manual page; the process is described in Analysts' Handbook: Using SiLK for Network Traffic Analysis. When multiple Partitioning Switches are given, the code specified by the --plugin switch(es) will be last to be invoked. When PLUGIN contains a slash (/), rwfilter assumes the path to PLUGIN is correct. Otherwise, rwfilter will attempt to find the file in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the application's directory: lib/silk, share/lib, and lib. If rwfilter does not find the file, it assumes the plug-in is in the current directory. To force rwfilter to look in the current directory first, specify --plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.
- --dynamic-library=PLUGIN
-
This switch is deprecated. It is an alias for --plugin.
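The three --tuple-direction settings described earlier in this list can be illustrated with a small model. The sketch assumes exact-value comparison and represents both the tuple and the flow as hypothetical dictionaries keyed by the field names sIP, dIP, sPort, dPort, and protocol; it does not handle CIDR blocks, lists, or ranges, and it is not how rwfilter stores its tuples.

```python
# Field that each tuple field is compared against in the reverse direction.
OPPOSITE = {
    "sIP": "dIP",
    "dIP": "sIP",
    "sPort": "dPort",
    "dPort": "sPort",
    "protocol": "protocol",
}


def tuple_matches(rule, flow, direction="forward"):
    """Compare a tuple (any subset of the five fields) against a flow."""

    def forward():
        return all(flow[field] == value for field, value in rule.items())

    def reverse():
        return all(flow[OPPOSITE[field]] == value for field, value in rule.items())

    if direction == "forward":
        return forward()
    if direction == "reverse":
        return reverse()
    if direction == "both":
        return forward() or reverse()
    raise ValueError(f"unknown direction: {direction}")


# Hypothetical tuple and flow: a response from a web server at 10.0.0.1.
rule = {"sIP": "10.0.0.1", "dPort": 80}
flow = {"sIP": "192.0.2.5", "dIP": "10.0.0.1",
        "sPort": 80, "dPort": 40000, "protocol": 6}
```

With these values the tuple fails in the forward direction but matches in the reverse direction, so it also matches when the direction is both.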
SiLK can store flows generated by enhanced collection software that provides more information than NetFlow v5. These flows may support some or all of these additional switches; for flows without this additional information, the field's value is always 0.
- --flags-initial=HIGH_MASK_FLAGS_LIST
-
As --flags-all, except this switch considers only the initial packet in the flow.
- --flags-session=HIGH_MASK_FLAGS_LIST
-
As --flags-all, except this switch ignores the initial packet in the flow.
- --attributes=ATTRIBUTES_LIST
-
ATTRIBUTES_LIST is a comma separated list of up to 8 HIGH_ATTRIBUTES/MASK_ATTRIBUTES pairs, where HIGH_ATTRIBUTES and MASK_ATTRIBUTES are strings of the ATTRIBUTES characters F,T,C; see above for a description of these values. HIGH_ATTRIBUTES must be a subset of MASK_ATTRIBUTES. Pass the record if the attributes listed in HIGH_ATTRIBUTES are set and the attributes listed in MASK_ATTRIBUTES but not listed in HIGH_ATTRIBUTES are not set.
- --application=INTEGER_LIST
-
Some software that generates flow records from packet data, such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as the appLabel. The application is the port number that is traditionally used for that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard HTTP/web port (80). The flow generator uses a value of 0 if the application cannot be determined. The --application switch passes the flow if the flow's application value is in the specified INTEGER_LIST. For example, passing a value of 21 to this switch will find traffic that the flow generation software labeled as FTP regardless of which port the traffic actually used.
- --ip-version=INTEGER_LIST
-
Passes the flow if the IP Version is in the specified INTEGER_LIST. INTEGER_LIST can be 4, 6, or 4,6 when SiLK has been compiled with IPv6 support. If SiLK does not have IPv6 support, the only legal value for this switch is 4.
- --scc=COUNTRY_CODE_LIST
-
Pass the record if the country code of its source IP address is in the specified COUNTRY_CODE_LIST. This switch requires that the country code mapping file is installed. See ccfilter(3).
- --dcc=COUNTRY_CODE_LIST
-
As --scc for the destination IP address.
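The HIGH/MASK test shared by --flags-all, --flags-initial, and --flags-session can be sketched as a bitmask comparison. The bit values below are the standard TCP header flag bits; the function names are hypothetical, and the flow's flags are passed as a string of flag letters purely for illustration.

```python
# Standard TCP header bit for each flag letter.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def flags_to_bits(letters):
    """Combine flag letters such as 'AS' into a single bitmask."""
    bits = 0
    for letter in letters:
        bits |= FLAG_BITS[letter]
    return bits


def parse_high_mask(text):
    """Parse one HIGH/MASK pair, rejecting a high flag missing from the mask."""
    high_text, mask_text = text.split("/")
    high, mask = flags_to_bits(high_text), flags_to_bits(mask_text)
    if high & ~mask:
        raise ValueError(f"flag listed in high but not in mask: {text}")
    return high, mask


def flags_match(spec_list, flow_flags):
    """True if the flow's flags satisfy any pair in a comma separated list."""
    observed = flags_to_bits(flow_flags)
    for spec in spec_list.split(","):
        high, mask = parse_high_mask(spec)
        # Masked flags must equal the high set; unmasked flags are ignored.
        if (observed & mask) == high:
            return True
    return False
```

Under this model, AS/ASFR accepts a flow whose flags are SAP (PSH is outside the mask) and rejects one whose flags are SAF (FIN is in the mask but not high), while F/S raises an error as the parameter description requires.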
For the following filter tests, some file formats do not store these values, in which case the value is always 0:
- --next-hop-id=IP_ADDR_MASK
-
Pass the record if its next hop IP address is matched by this IP_ADDR_MASK.
- --not-next-hop-id=IP_ADDR_MASK
-
Pass the record if its next hop IP address is not matched by this IP_ADDR_MASK.
- --input-index=INTEGER_LIST
-
Pass the record if its incoming SNMP interface is in this INTEGER_LIST.
- --output-index=INTEGER_LIST
-
Pass the record if its outgoing SNMP interface is in this INTEGER_LIST.
Additional filtering switches are provided by run-time loading of plug-ins (shared object files or dynamic libraries) when the plug-in is available. rwfilter automatically looks for the following plug-ins:
ADDRESS TYPE (addrtype.so)
- --stype=SCALAR
-
When SCALAR is 0, pass the record if its source IP address is non-routable. When 1, pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal (non-routable or external). See addrtype(3).
- --dtype=SCALAR
-
As --stype for the destination IP address.
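The SCALAR values accepted by --stype and --dtype can be read as a small decision table. The following is only a sketch of that table as described above, not the plug-in's code; the constants NON_ROUTABLE, INTERNAL, and EXTERNAL and the function name stype_passes are hypothetical names introduced for illustration.

```python
# Hypothetical category codes for an address, chosen to line up with the
# SCALAR values 0, 1, and 2 described for --stype and --dtype.
NON_ROUTABLE, INTERNAL, EXTERNAL = 0, 1, 2


def stype_passes(scalar, category):
    """Return True if an address in `category` passes --stype=scalar."""
    if scalar not in (0, 1, 2, 3):
        raise ValueError("SCALAR must be 0, 1, 2, or 3")
    if scalar == 3:
        # 3 means "not internal": non-routable or external.
        return category != INTERNAL
    # 0, 1, and 2 each select exactly one category.
    return scalar == category
```

Under this reading, --stype=3 is the complement of --stype=1.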
PREFIX MAP (pmapfilter.so)
- --pmap-file=MAPNAME:PATH
- --pmap-file=PATH
-
When the prefix map plug-in is used, rwfilter reads the mapping file located at PATH. When MAPNAME is provided, it will be used to refer to the switches specific to that prefix map. If MAPNAME is not provided, rwfilter will check the prefix map file to see if a map-name was specified when the file was created. Using multiple --pmap-file switches allows additional prefix map files to be read as long as each uses a unique map-name. The --pmap-file switch(es) must precede all other --pmap-* switches. For more information, see pmapfilter(3).
- --pmap-src-MAPNAME=LABELS
-
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IPv4 address that maps to a label contained in the list of labels in LABELS.
-
If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a protocol and source port combination that maps to a label contained in the list of labels in LABELS.
- --pmap-dst-MAPNAME=LABELS
-
Similar to --pmap-src-MAPNAME, but uses the destination IP or the protocol and destination port.
- --pmap-any-MAPNAME=LABELS
-
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source IP address or a destination IP address that maps to a label contained in the list of labels in LABELS.
-
If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a protocol and source port or destination port combination that maps to a label contained in the list of labels in LABELS.
- --pmap-saddress=LABELS
- --pmap-daddress=LABELS
- --pmap-any-address=LABELS
-
These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME, --pmap-dst-MAPNAME, and --pmap-any-MAPNAME, respectively. These switches are available when an IP prefix map is used that is not associated with a MAPNAME.
- --pmap-sport-proto=LABELS
- --pmap-dport-proto=LABELS
- --pmap-any-port-proto=LABELS
-
These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME, --pmap-dst-MAPNAME, and --pmap-any-MAPNAME, respectively. These switches are available when a proto-port prefix map is used that is not associated with a MAPNAME.
PYTHON (silkpython.so)
The SiLK Python plug-in provides support for filtering by expressions or complex functions written in the Python programming language. See the silkpython(3) and pysilk(3) manual pages for information and examples for how to use Python to manipulate SiLK data structures. When multiple Partitioning Switches are given, the Python plug-in will be the next-to-last to be invoked. Only the code specified by the --plugin switch is called after the Python code.
- --python-file=FILENAME
-
Pass the record if the result of processing the flow with the function named rwfilter() in FILENAME is true. The function should take a single silk.RWRec object as an argument. See silkpython(3) for details.
- --python-expr=PYTHON_EXPRESSION
-
Pass the record if the result of processing the flow with the specified PYTHON_EXPRESSION is true. The expression is evaluated as if it appeared in the following context:
-
from silk import *
def rwfilter(rec):
    return (PYTHON_EXPRESSION)
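As a minimal sketch of a file that could be given to --python-file, the following defines rwfilter() so that it passes short TCP flows to destination port 443. It assumes the record exposes the protocol, dport, and packets attributes described for silk.RWRec in pysilk(3); the file name, the port, and the packet threshold are illustrative choices, not values taken from this manual page.

```python
# filter_https.py (hypothetical file name), intended for use as:
#   rwfilter ... --python-file=filter_https.py --pass-destination=out.rwf
# rwfilter is expected to call rwfilter(rec) once per flow record.

TCP = 6           # IP protocol number for TCP
HTTPS_PORT = 443  # illustrative destination port
MAX_PACKETS = 5   # illustrative threshold for a "short" flow


def rwfilter(rec):
    """Return True to pass the record, False to fail it."""
    return (rec.protocol == TCP
            and rec.dport == HTTPS_PORT
            and rec.packets <= MAX_PACKETS)
```

The same test could be written inline with --python-expr; a file is easier to maintain once the logic grows beyond a single expression.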
EXAMPLES
The most basic filtering involves looking at specific traffic over a specific time. For example:
rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
--pass=alltcp.rwf --proto=6
will create a file, alltcp.rwf, containing all TCP traffic. This file contains SiLK Flow data in a binary format. To examine the contents, use the command rwcut(1).
Please note that the output file described above could be extremely large.
Once a file is written, rwfilter can filter the file again, for example:
rwfilter --aport=80 alltcp.rwf --pass=allweb.rwf
will generate allweb.rwf. This progressive filtering can also be done at the command line, but the interim files can be examined with rwcut, rwuniq(1) and other tools.
Multiple filters can be chained at the command line using pipes:
rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
--proto=6 --pass=stdout | \
rwfilter --input-pipe=stdin --aport=80 --packets=1-5 \
--pass=smallweb.rwf
ENVIRONMENT
- SILK_RWFILTER_THREADS
-
The number of threads to use while reading input files or files selected from the data store.
- PYTHONPATH
-
This environment variable is used by Python to locate modules. When --python-file or --python-expr is specified, rwfilter loads Python, which in turn loads the PySiLK module, which comprises several files (silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python's normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK module. For information on using Python from within rwfilter, see pysilk(3).
- SILK_PYTHON_TRACEBACK
-
When set, Python plug-ins will output traceback information on Python errors to stderr.
- SILK_COUNTRY_CODES
-
This environment variable allows the user to specify the country code mapping file that the --scc and --dcc switches use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.
- SILK_CONFIG_FILE
-
This environment variable is used as the value for the --site-config-file switch when that switch is not provided.
- SILK_DATA_ROOTDIR
-
When set, overrides the compiled-in value for the location of the directory tree containing the files of SiLK Flow records collected and stored by the packing system (rwflowpack(8)). In addition, when the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwfilter looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
- SILK_PATH
-
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwfilter checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. These directories are also searched when any other configuration file is required (e.g., the country code map). In addition, rwfilter looks for plug-ins in $SILK_PATH/lib/silk, $SILK_PATH/share/lib and $SILK_PATH/lib.
- SILK_PLUGIN_DEBUG
-
When set to 1, rwfilter prints status messages to the standard error as it tries to open each of its plug-ins.
- SILK_LOGSTATS
-
When set to a non-empty value, rwfilter will treat the value as the path to an external program to execute with information about this rwfilter invocation. If the value in SILK_LOGSTATS does not contain a slash or if it references a file that does not exist, is not a regular file, or is not executable, the SILK_LOGSTATS value is silently ignored. The arguments to the external program are:
-
The application name, i.e., rwfilter. Note that rwfilter is always used as this argument, regardless of the name of the executable.
-
The version number of this command line, currently v0001.
-
The start time of this invocation, as seconds since the UNIX epoch.
-
The end time of this invocation, as seconds since the UNIX epoch.
-
The number of data files opened for reading.
-
The number of records read.
-
The number of records written.
-
A variable number of arguments that are the complete command line used to invoke rwfilter, including the name of the executable.
- SILK_LOGSTATS_RWFILTER
-
If set, this environment variable overrides the value specified in SILK_LOGSTATS.
- SILK_LOGSTATS_DEBUG
-
If the environment variable is set to a non-empty value, rwfilter will print messages to the standard error about the SILK_LOGSTATS value being used and either the reason why the value cannot be used or the arguments to the external program being executed.
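The argument list passed to the SILK_LOGSTATS program can be unpacked positionally. The following is a sketch of such a program, assuming the arguments arrive in exactly the order listed under SILK_LOGSTATS above and that the times parse as numbers; the function name parse_logstats and the dictionary keys are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical SILK_LOGSTATS target. Point SILK_LOGSTATS at the full path
# of this script (the value must contain a slash) and make it executable.
import sys


def parse_logstats(args):
    """Map the SILK_LOGSTATS arguments (excluding argv[0]) to a dict."""
    if len(args) < 7:
        raise ValueError("expected at least 7 arguments, got %d" % len(args))
    app, version, start, end, files, read, written = args[:7]
    if version != "v0001":
        # Only the argument layout documented as v0001 is understood here.
        raise ValueError("unsupported argument version: %s" % version)
    start, end = float(start), float(end)
    return {
        "application": app,
        "start": start,
        "end": end,
        "elapsed": end - start,
        "files_opened": int(files),
        "records_read": int(read),
        "records_written": int(written),
        "command_line": args[7:],  # the rwfilter command line as invoked
    }


if __name__ == "__main__" and len(sys.argv) >= 8:
    stats = parse_logstats(sys.argv[1:])
    print("%(application)s read %(records_read)d records from "
          "%(files_opened)d files in %(elapsed).0f s" % stats)
```

Setting SILK_LOGSTATS_DEBUG while testing such a script shows the arguments rwfilter actually passes.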
NOTES
rwfilter is the most commonly used application in the suite. It provides access to the data files and performs all the basic queries.
rwfilter supports a variety of I/O options - in addition to reading from the data store, rwfilter results can be chained together with named pipes to output results to multiple files simultaneously. An introduction to named pipes is outside the scope of this document, however.
Two often underused options are --dry-run and --print-statistics.
--dry-run does a sanity check on the input arguments and should be used, especially for complicated arguments, to check that the arguments are acceptable.
--print-statistics used without --pass-destination or --fail-destination simply dumps aggregate statistics to stderr (not stdout) in the following format:
File <#input files> Read <# of recs read> \
Pass <# of recs passing the filter> \
Fail <# of recs failing the filter>
and can be used to do a quick pass through the data to get aggregate counts before going deeper into the phenomenon being investigated.
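When the aggregate counts feed a script, the --print-statistics line captured from stderr can be parsed. The sketch below assumes only that each label (File or Files, Read, Pass, Fail) is followed by whitespace and an integer; the exact spacing and punctuation may differ between versions, and the function names are hypothetical.

```python
import re

# Label followed by whitespace and an integer count.
_STAT_RE = re.compile(r"\b(Files?|Read|Pass|Fail)\s+(\d+)")


def parse_statistics(line):
    """Return the counts found in one --print-statistics line as a dict."""
    counts = {}
    for label, value in _STAT_RE.findall(line):
        key = "files" if label.startswith("File") else label.lower()
        counts[key] = int(value)
    return counts


def pass_ratio(counts):
    """Fraction of records read that passed the filter (0.0 if none read)."""
    read = counts.get("read", 0)
    return counts.get("pass", 0) / read if read else 0.0
```

A low pass ratio from such a quick run suggests the filter is selective enough to be worth writing to a file for further analysis.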
--print-filenames can be used as a progress meter; during long jobs, it shows which file is currently being read by the application. --print-filenames will not provide meaningful results with piped input.
Filters are applied in the order given on the command line. It is best to put the filters that eliminate the most records first.
The switches used to create a filter output file are stored in the file itself. Use the rwfileinfo(1) command to see this information.
SEE ALSO
rwcount(1), rwcut(1), rwfglob(1), rwfileinfo(1), rwset(1), rwsort(1), rwstats(1), rwtotal(1), rwuniq(1), rwtuc(1), rwsetbuild(1), mapsid(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk.conf(5), silk(7), rwflowpack(8), yaf(1), zlib(3), Analysts' Handbook: Using SiLK for Network Traffic Analysis


