NAME

rwmatch - Match SiLK records from two streams into a common stream

SYNOPSIS

rwmatch --relate=FIELD_PAIR [--relate=FIELD_PAIR ...]
      [--time-delta=DELTA] [--symmetric-delta]
      [{ --absolute-delta | --relative-delta | --infinite-delta }]
      [--unmatched={q|r|b}]
      [--note-add=TEXT] [--note-file-add=FILE]
      [--ipv6-policy={ignore,asv4,mix,force,only}]
      [--compression-method=COMP_METHOD]
      [--site-config-file=FILENAME]
      QUERY_FILE RESPONSE_FILE OUTPUT_FILE

rwmatch --help

rwmatch --help-relate

rwmatch --version

DESCRIPTION

rwmatch provides a facility for relating (or matching) SiLK Flow records contained in two sorted input files, labeling those flow records, and writing the records to an output file.

The two input files are called QUERY_FILE and RESPONSE_FILE, respectively. The purpose of rwmatch is to find a record in QUERY_FILE that represents some network stimulus that caused a reply which is represented by a record in RESPONSE_FILE. When rwmatch discovers this relationship, it assigns a numeric ID to the match, searches both input files for additional records that are part of the same event, stores the numeric ID in each matching record's next hop IP field, and writes all records that are part of that event to OUTPUT_FILE.

When the --symmetric-delta switch is specified, rwmatch also checks for a stimulus in RESPONSE_FILE that triggered a reply in QUERY_FILE. This is useful when matching flows where either side may have initiated the conversation.

The input files must be sorted as described in "Sorting the input" below. To use the standard input in place of one of the input streams, specify stdin or - in its place.

The criteria for defining a match are given by one or more uses of the --relate switch and by the timestamps on the flow records.

Once rwmatch establishes a match between records in the two input files, it searches for additional records from both input files to add to the match.

To do this, rwmatch denotes one of the records that comprise the initial match pair as a base record. When possible, the base record is the record with the earlier start time. If the start times tie and the protocol is TCP or UDP, the base record is the one with the lower port, provided that one port is above 1024 and the other is below 1024. If that also fails, the base record is the record read from QUERY_FILE. With millisecond time resolution, ties should be rare.

To determine whether a match exists between the base record and a candidate record, rwmatch uses the FIELD_PAIRs specified by --relate. When the base record and the candidate record were read from the same file, only one side of each FIELD_PAIR is used.

In addition to the records having identical values for each field in FIELD_PAIRs, the candidate record must be within a time window determined by the --time-delta switch and the --absolute-delta, --relative-delta, and --infinite-delta switches.

Because long-lived sessions are often broken into multiple flows, rwmatch may discard records that are part of a long-lived session. The --relative-delta switch may compensate for this if the gap between flows is less than the time specified in the --time-delta switch. The --infinite-delta switch compensates for arbitrarily long gaps, but it may add records to a match that are not part of a true session. DNS flows that use port 53/udp as both a service and reply port are an example.

When rwmatch establishes a match, it increments the match ID, with the first match having a match ID of 1. To label the records that comprise the match, rwmatch uses a 32-bit number where the lower 24 bits hold the match ID and the upper 8 bits are set to 0 or 255 to indicate whether the record was read from QUERY_FILE or RESPONSE_FILE, respectively. rwmatch stores this 32-bit number in the next hop IP field of the records. If the record is IPv6, rwmatch maps the number into the ::ffff:0:0/96 netblock before setting the next hop IP. Apart from the change to the next hop IP field, the query and response records are not modified.

By default, only matched records are written to the OUTPUT_FILE and any record that could not be determined to be part of a match is discarded.

Specifying the --unmatched switch tells rwmatch to write unmatched query and/or response records to OUTPUT_FILE. The required parameter is one of q, r, or b to write the query records, the response records, or both to OUTPUT_FILE. Unmatched query records have their next hop IP set to 0.0.0.0, and unmatched response records have their next hop IP set to 255.0.0.0.
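The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration of the bit layout, not code taken from rwmatch; the function names make_label and read_label are hypothetical, and treating an unmatched record as match ID 0 is an inference from the 0.0.0.0 and 255.0.0.0 values given above.

```python
import ipaddress

QUERY_MARK = 0       # upper 8 bits for records read from QUERY_FILE
RESPONSE_MARK = 255  # upper 8 bits for records read from RESPONSE_FILE


def make_label(match_id, from_response, ipv6=False):
    """Build the next hop IP value for a record (hypothetical helper)."""
    if not 0 <= match_id < (1 << 24):
        raise ValueError("match ID must fit in 24 bits")
    mark = RESPONSE_MARK if from_response else QUERY_MARK
    value = (mark << 24) | match_id
    if ipv6:
        # Map the 32-bit number into the ::ffff:0:0/96 netblock.
        return ipaddress.IPv6Address((0xFFFF << 32) | value)
    return ipaddress.IPv4Address(value)


def read_label(nhip):
    """Split a next hop IP label into (side, match_id)."""
    value = int(ipaddress.ip_address(nhip)) & 0xFFFFFFFF
    mark = value >> 24
    if mark not in (QUERY_MARK, RESPONSE_MARK):
        raise ValueError("not an rwmatch label: %s" % nhip)
    side = "response" if mark == RESPONSE_MARK else "query"
    return side, value & 0xFFFFFF


print(make_label(1, from_response=False))  # label of the first matched query
print(make_label(1, from_response=True))   # label of its response
print(read_label("255.0.0.0"))             # an unmatched response record
```

Because the match ID occupies 24 bits, a label of this shape can distinguish at most 16,777,215 matches in a single output file.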

Sorting the input

As rwmatch reads QUERY_FILE and RESPONSE_FILE, it expects the SiLK Flow records to appear in a particular order that is best achieved by using rwsort(1). In particular:

When rwmatch processes the following command

$ rwmatch --relate=1,2 --relate=2,1 --relate=5,5 Q.rw R.rw out.rw

it assumes that Q.rw and R.rw were created by

$ rwsort --fields=1,2,5,stime --output=Q.rw input1.rw ....
$ rwsort --fields=2,1,5,stime --output=R.rw input2.rw ....

If the files source_ips.s.rw and dest_ips.s.rw are created by the following commands:

$ rwsort --field=1,9 source_ips.rw > source_ips.s.rw
$ rwsort --field=2,9 dest_ips.rw > dest_ips.s.rw

The following call to rwmatch works correctly:

$ rwmatch --relate=1,2 source_ips.s.rw dest_ips.s.rw matched.rw

Note that the following command produces very few matches since source_ips.s.rw was sorted on field 1 and dest_ips.s.rw was sorted on field 2.

$ rwmatch --relate=2,1 source_ips.s.rw dest_ips.s.rw stdout

The recommended sort ordering for TCP and UDP is shown below. This correctly handles multiple flows occurring during the same time interval which involve multiple ports:

$ rwsort --fields=1,4,2,3,5,stime incoming.rw > incoming-query.rw
$ rwsort --fields=2,3,1,4,5,stime outgoing.rw > outgoing-response.rw

The corresponding rwmatch command is:

$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4 \
       --relate=5,5 incoming-query.rw outgoing-response.rw matched.rw
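The relationship between the --relate pairs and the rwsort field lists in the examples above can be expressed as a small helper. The sketch below assumes the pattern those examples follow: the query file is sorted on the first field of each pair, the response file on the second, in the order the pairs are given, with stime last. The function name sort_fields is hypothetical.

```python
def sort_fields(relate_pairs):
    """Return the rwsort --fields values for the query and response files.

    relate_pairs is a list of (query_field, response_field) tuples in the
    same order as the --relate switches passed to rwmatch.
    """
    query = [str(q) for q, _ in relate_pairs] + ["stime"]
    response = [str(r) for _, r in relate_pairs] + ["stime"]
    return ",".join(query), ",".join(response)


# The recommended TCP/UDP ordering shown above.
pairs = [(1, 2), (4, 3), (2, 1), (3, 4), (5, 5)]
query_fields, response_fields = sort_fields(pairs)
print("rwsort --fields=" + query_fields)
print("rwsort --fields=" + response_fields)
```

Deriving both sort orders from one list of pairs avoids the mismatch shown earlier, where the files were sorted on fields that did not correspond to the --relate arguments.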

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

--relate=FIELD_PAIR

Specify a pair of fields where the value of these fields in two records must be identical for the records to be considered part of a match. The first field is for records from QUERY_FILE and the second for records from RESPONSE_FILE. At least one FIELD_PAIR must be provided; up to 128 FIELD_PAIRs may be provided. The FIELD_PAIR must contain two field names or field IDs separated by a comma, such as --relate=dip,sip or --relate=proto,proto.

Each FIELD_PAIR is unidirectional; specifying --relate=sip,dip matches records where the query record's source IP matches the response record's destination IP, but does not imply any relationship between the response's source IP and query's destination IP. To match symmetric flow records between hosts, specify:

--relate=sip,dip --relate=dip,sip

When using a port-based protocol (e.g., TCP or UDP), refine the match further by specifying the ports:

--relate=2,1 --relate=1,2 --relate=3,4 --relate=4,3

Matching becomes more specific as more fields are added. Since rwmatch discards unmatched records, a highly specific match (such as the last one specified above) generates more matches (resulting in higher match IDs), but may result in fewer total flows due to certain records being unmatched.

The available fields are listed here. For a better description of some of these fields, see the rwcut(1) manual page.

sIP,1

source IP address

dIP,2

destination IP address

sPort,3

source port for TCP and UDP, or equivalent

dPort,4

destination port for TCP and UDP, or equivalent

protocol,5

IP protocol

packets,pkts,6

packet count

bytes,7

byte count

flags,8

bit-wise OR of TCP flags over all packets

sensor,12

name or ID of sensor at the collection point

class,20

class of sensor at the collection point

type,21

type of sensor at the collection point

iType

the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was introduced in SiLK 3.8.1.

iCode

the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at iType.

in,13

router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))

out,14

router SNMP output interface or postVlanId

initialFlags,26

TCP flags on first packet in the flow

sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

attributes,28

flow attributes set by the flow generator

application,29

guess as to the content of the flow

--time-delta=DELTA

Specify the number of seconds by which a response record may start after a query record has ended. DELTA may contain fractional seconds to millisecond precision; for example, 0.500 represents a 500 millisecond delay. Responses match queries if

query.sTime <= response.sTime <= query.eTime + DELTA

When --time-delta is not specified, DELTA defaults to 0 and the response must begin before the query ends.
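The timing test can be written as a short predicate. The following is a minimal sketch, assuming times are held as integer milliseconds; the names to_ms and is_initial_match are hypothetical, and the optional symmetric flag models the mirrored test that --symmetric-delta is documented to add.

```python
from decimal import Decimal


def to_ms(delta_text):
    """Convert a DELTA string such as "0.500" to integer milliseconds."""
    return int(Decimal(delta_text) * 1000)


def is_initial_match(q_start, q_end, r_start, r_end, delta_ms=0, symmetric=False):
    """Apply the timing test to one query record and one response record.

    All times are integer milliseconds. The --relate field comparison is
    not modelled here.
    """
    # query.sTime <= response.sTime <= query.eTime + DELTA
    if q_start <= r_start <= q_end + delta_ms:
        return True
    # With --symmetric-delta the response may be the initial flow:
    # response.sTime <= query.sTime <= response.eTime + DELTA
    return symmetric and r_start <= q_start <= r_end + delta_ms


delta = to_ms("0.500")
# The response starts 300 ms after the query ends.
print(is_initial_match(1000, 2000, 2300, 2500))                  # DELTA of 0
print(is_initial_match(1000, 2000, 2300, 2500, delta_ms=delta))  # DELTA of 0.500
```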

--symmetric-delta

Allow matching of flows where the RESPONSE_FILE contains the initial flow. In this case, a query record matches a response record when

response.sTime <= query.sTime <= response.eTime + DELTA

--absolute-delta

When adding additional records to an established match, only include candidate flows that start less than DELTA seconds after the end of the initial flow. This is the default behavior. This switch is incompatible with --relative-delta and --infinite-delta.

--relative-delta

When adding additional records to an established match, include candidate flows that start within DELTA seconds of the greatest end time for all records in the current match. This switch is incompatible with --absolute-delta and --infinite-delta.

--infinite-delta

When adding additional records to an established match, include candidate records based on the FIELD_PAIRs alone, ignoring time. This switch is incompatible with --absolute-delta and --relative-delta.
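The difference between the three window switches is easiest to see side by side. The sketch below is an interpretation of the descriptions above, not rwmatch's implementation: the function name in_window is hypothetical, only the upper bound of the window is modelled, and the boundary is treated as inclusive to agree with the --time-delta formula.

```python
def in_window(mode, candidate_start, initial_end, latest_end, delta_ms):
    """Decide whether a candidate record may join an established match.

    mode            -- "absolute", "relative", or "infinite"
    candidate_start -- start time of the candidate record (ms)
    initial_end     -- end time of the initial flow of the match (ms)
    latest_end      -- greatest end time of all records in the match (ms)
    delta_ms        -- DELTA from --time-delta, in milliseconds
    """
    if mode == "infinite":
        return True  # FIELD_PAIRs alone decide; time is ignored
    if mode == "absolute":
        return candidate_start <= initial_end + delta_ms
    if mode == "relative":
        return candidate_start <= latest_end + delta_ms
    raise ValueError("unknown mode: %s" % mode)


# A long-lived session: the initial flow ended at 2000 ms, later flows in
# the match extend to 5000 ms, and the candidate starts at 5200 ms.
for mode in ("absolute", "relative", "infinite"):
    print(mode, in_window(mode, 5200, 2000, 5000, delta_ms=500))
```

In this example only the absolute window rejects the candidate, which is the situation in which a continuation of a long-lived session is discarded.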

--unmatched=q|r|b

Write unmatched query and/or response records to OUTPUT_FILE. The parameter determines whether the query records, the response records, or both are written to OUTPUT_FILE. Unmatched query records have their next hop IPv4 address set to 0.0.0.0, and unmatched response records have their next hop IPv4 address set to 255.0.0.0. When the b value is used, OUTPUT_FILE contains a complete merge of QUERY_FILE and RESPONSE_FILE.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--ipv6-policy=POLICY

Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support. When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy. If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in the SILK_IPV6_POLICY variable. The supported values for POLICY are:

ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses it contains.

asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all other IPv6 flow records.

mix

Process the input as a mixture of IPv4 and IPv6 flow records. Should rwmatch need to compare an IPv4 and IPv6 address, it maps the IPv4 address into the ::ffff:0:0/96 netblock.

force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 netblock.

only

Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
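The effect of each policy on a record's addresses can be sketched with Python's ipaddress module. This is a simplified model, not SiLK code: a record is represented only by a tuple of its addresses, a record counts as IPv6 when those addresses are IPv6, and the function name apply_ipv6_policy is hypothetical.

```python
import ipaddress

V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def apply_ipv6_policy(policy, addrs):
    """Return the addresses as processed, or None if the record is ignored.

    addrs is a tuple of ipaddress objects that all share one IP version.
    """
    is_v6 = addrs[0].version == 6
    if policy == "mix":
        return addrs
    if policy == "ignore":
        return None if is_v6 else addrs
    if policy == "only":
        return addrs if is_v6 else None
    if policy == "asv4":
        if not is_v6:
            return addrs
        if all(a in V4_MAPPED for a in addrs):
            # IPv4-mapped IPv6 addresses are converted back to IPv4.
            return tuple(a.ipv4_mapped for a in addrs)
        return None
    if policy == "force":
        if is_v6:
            return addrs
        # Map each IPv4 address into the ::ffff:0:0/96 netblock.
        return tuple(ipaddress.IPv6Address("::ffff:%s" % a) for a in addrs)
    raise ValueError("unknown policy: %s" % policy)


record = (ipaddress.ip_address("10.4.52.235"),
          ipaddress.ip_address("192.168.233.171"))
print(apply_ipv6_policy("force", record))
print(apply_ipv6_policy("only", record))
```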

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwmatch searches for the site configuration file in the locations specified in the "FILES" section.

--help

Print the available options and exit.

--help-relate

Print the description and aliases of each field that may be used as arguments to the --relate switch and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Matching TCP Flows

rwmatch is a generalized matching tool; its most basic function is to match both sides of a TCP connection. Given incoming and outgoing web traffic in two files, web_in.rw and web_out.rw, the following sequence of commands generates a file, web-sessions.rw, consisting of matched sessions for every complete web session in web_in.rw and web_out.rw:

$ rwsort --field=1,2,3,4,stime web_in.rw  > web_in-s.rw
$ rwsort --field=2,1,4,3,stime web_out.rw > web_out-s.rw

$ rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3      \
       web_in-s.rw  web_out-s.rw  web-sessions.rw

Finding Responses to a Scan

Because rwmatch can match fields arbitrarily, you can also match records across different protocols. Suppose there are two SiLK Flow files, indata.rw and outdata.rw, that contain the incoming and outgoing data, respectively, for a particular time period.

To trace responses to a scan attempt, we start by identifying a specific horizontal scan. In this example, we use an SMTP scan on TCP port 25. Assume that we have an IPset file, smtp-scanners.set, that contains the external IP addresses that scanned us on port 25. (Perhaps this file was obtained by using rwscan(1) and rwscanquery(1).)

First, use rwfilter(1) to find the flow records matching these scan attempts in the incoming data file. Sort the output of rwfilter by source IP, source port, destination IP, destination port, and time, and store the results in smtp-scans.rw:

$ rwfilter --proto=6 --sip-set=smtp-scanners.set --dport=25        \
       --pass=-  indata.rw                                         \
  | rwsort --field=sip,sport,dip,dport,stime > smtp-scans.rw

We can identify hosts that responded to the scan (we consider accepting the TCP connection to be a response) by finding potential replies in the outgoing data file, sorting them, and storing the results in scan-response.rw. For this command on the outgoing data, note that we must swap source and destination from the values used for the incoming data:

$ rwfilter --proto=6 --dip-set=smtp-scanners.set --sport=25        \
       --pass=-  outdata.rw                                        \
  | rwsort --field=dip,dport,sip,sport,stime > scan-response.rw

We can now match the flow records to produce the file matched-scans.rw:

$ rwmatch --relate=1,2 --relate=3,4 --relate=2,1 --relate=4,3      \
       smtp-scans.rw  scan-response.rw  matched-scans.rw

The results file, matched-scans.rw, will contain all the exchanges between the scanning hosts and the responders on port 25. Examination of these flows may show evidence of buffer overflows, data exfiltration, or similar attacks.

Next, we want to identify responses to the scan that were produced by our routers, such as ICMP destination unreachable messages.

Use rwfilter to find the ICMP messages going to the scanning hosts, sort the flow records, and store the results in icmp.rw:

$ rwfilter --proto=1 --icmp-type=3 --pass=stdout  outdata.rw       \
  | rwsort --field=dip,stime > icmp.rw

Run rwmatch and match exclusively on the IP address.

$ rwmatch --relate=2,1  icmp.rw  smtp-scans.rw  result.rw

The resulting file, result.rw, will consist of single packet flows (from smtp-scans.rw) with an ICMP response (from icmp.rw).

Similar queries can be used to identify other multiple-protocol phenomena, such as the results of a traceroute.

Displaying the Results

These examples assume matched.rw is an output file produced by rwmatch.

When using rwcut(1) to display the records in matched.rw, you may specify the next hop IP field (nhIP) to see the match identifier:

$ rwcut --num-rec=8 --fields=sip,sport,dip,dport,type,nhip matched.rw
            sIP|sPort|            dIP|dPort|   type|           nhIP|
    10.4.52.235|29631|192.168.233.171|   80|  inweb|        0.0.0.1|
192.168.233.171|   80|    10.4.52.235|29631| outweb|      255.0.0.1|
    10.9.77.117|29906| 192.168.184.65|   80|  inweb|        0.0.0.2|
 192.168.184.65|   80|    10.9.77.117|29906| outweb|      255.0.0.2|
  10.14.110.214|29989| 192.168.249.96|   80|  inweb|        0.0.0.3|
 192.168.249.96|   80|  10.14.110.214|29989| outweb|      255.0.0.3|
    10.18.66.79|29660| 192.168.254.69|   80|  inweb|        0.0.0.4|
 192.168.254.69|   80|    10.18.66.79|29660| outweb|      255.0.0.4|

The first record is a query from the external host 10.4.52.235 to the web server on the internal host 192.168.233.171, and the second record is the web server's response. The third and fourth records represent another query/response pair.
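When the rwcut output is post-processed by a script, the nhIP column is enough to regroup the rows into query/response sets. The following is a minimal sketch that assumes the pipe-delimited layout shown above and IPv4 labels only; the function name group_matches is hypothetical, and the sample text is the first two rows of the output above.

```python
import ipaddress
from collections import defaultdict

SAMPLE = """\
            sIP|sPort|            dIP|dPort|   type|           nhIP|
    10.4.52.235|29631|192.168.233.171|   80|  inweb|        0.0.0.1|
192.168.233.171|   80|    10.4.52.235|29631| outweb|      255.0.0.1|
"""


def group_matches(rwcut_text):
    """Group rows of rwcut output by match ID using the nhIP column."""
    lines = [ln for ln in rwcut_text.splitlines() if ln.strip()]
    # Each line ends with "|", so the final split element is empty.
    header = [col.strip() for col in lines[0].split("|")[:-1]]
    nhip_col = header.index("nhIP")
    groups = defaultdict(lambda: {"query": [], "response": []})
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")[:-1]]
        label = int(ipaddress.IPv4Address(cells[nhip_col]))
        side = "response" if label >> 24 == 255 else "query"
        groups[label & 0xFFFFFF][side].append(dict(zip(header, cells)))
    return dict(groups)


for match_id, sides in group_matches(SAMPLE).items():
    print(match_id, len(sides["query"]), len(sides["response"]))
```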

The cutmatch(3) plug-in is an alternate way to display the match parameter that rwmatch writes into the next hop IP field. The cutmatch plug-in defines a match field that displays the direction of the flow (-> represents a query and <- a response) and the match ID. To use the plug-in, you must explicitly load it into rwcut by specifying the --plugin switch. You can then add match to the list of --fields to print:

$ rwcut --plugin=cutmatch.so --num-rec=8  \
       --fields=sip,sport,match,dip,dport,type matched.rw
            sIP|sPort| <->Match#|            dIP|dPort|   type|
    10.4.52.235|29631|->       1|192.168.233.171|   80|  inweb|
192.168.233.171|   80|<-       1|    10.4.52.235|29631| outweb|
    10.9.77.117|29906|->       2| 192.168.184.65|   80|  inweb|
 192.168.184.65|   80|<-       2|    10.9.77.117|29906| outweb|
  10.14.110.214|29989|->       3| 192.168.249.96|   80|  inweb|
 192.168.249.96|   80|<-       3|  10.14.110.214|29989| outweb|
    10.18.66.79|29660|->       4| 192.168.254.69|   80|  inweb|
 192.168.254.69|   80|<-       4|    10.18.66.79|29660| outweb|

Using the sIP and dIP fields is confusing when the file you are examining contains both incoming and outgoing flow records. To make the output from rwmatch clearer, use the int-ext-fields(3) plug-in as well. That plug-in allows you to display the external IPs in one column and the internal IPs in another column. See its manual page for additional information.

$ export INCOMING_FLOWTYPES=all/in,all/inweb
$ export OUTGOING_FLOWTYPES=all/out,all/outweb
$ rwcut --plugin=cutmatch.so --plugin=int-ext-fields.so --num-rec=8 \
     --fields=ext-ip,ext-port,match,int-ip,int-port,type matched.rw
        ext-ip|ext-p| <->Match#|         int-ip|int-p|   type|
   10.4.52.235|29631|->       1|192.168.233.171|   80|  inweb|
   10.4.52.235|29631|<-       1|192.168.233.171|   80| outweb|
   10.9.77.117|29906|->       2| 192.168.184.65|   80|  inweb|
   10.9.77.117|29906|<-       2| 192.168.184.65|   80| outweb|
 10.14.110.214|29989|->       3| 192.168.249.96|   80|  inweb|
 10.14.110.214|29989|<-       3| 192.168.249.96|   80| outweb|
   10.18.66.79|29660|->       4| 192.168.254.69|   80|  inweb|
   10.18.66.79|29660|<-       4| 192.168.254.69|   80| outweb|

ENVIRONMENT

SILK_IPV6_POLICY

This environment variable is used as the value for --ipv6-policy when that switch is not provided.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for --site-config-file when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwmatch may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwmatch may use this environment variable. See the "FILES" section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
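To see which of these locations apply in a given environment, the list can be expanded with a short script. The sketch below assumes that the locations are tried in the order listed and that entries whose environment variable is unset are skipped; neither assumption is stated by this manual page, and the function name candidate_config_paths is hypothetical.

```python
import os
import posixpath


def candidate_config_paths(env=None):
    """List possible silk.conf locations for the given environment."""
    env = os.environ if env is None else env
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        paths.append(posixpath.join(env["SILK_PATH"], "share/silk/silk.conf"))
        paths.append(posixpath.join(env["SILK_PATH"], "share/silk.conf"))
    paths.append("/usr/share/silk/silk.conf")
    paths.append("/usr/share/silk.conf")
    return paths


for path in candidate_config_paths():
    marker = "found" if os.path.isfile(path) else "missing"
    print(marker, path)
```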

SEE ALSO

rwfilter(1), rwsort(1), rwcut(1), rwfileinfo(1), rwscan(1), rwscanquery(1), cutmatch(3), int-ext-fields(3), sensor.conf(5), silk(7), zlib(3)

NOTES

SiLK 3.9.0 expanded the set of fields accepted by the --relate switch and added support for IPv6 flow records.