NAME
rwmatch - Matches SiLK records from two streams into a common stream
SYNOPSIS
rwmatch --relate=FIELD_PAIR [--relate=FIELD_PAIR ...]
[--time-delta=DELTA] [--symmetric-delta] [--unmatched={q|r|b}]
[--absolute-delta] [--relative-delta] [--infinite-delta]
[--site-config-file=FILENAME]
QUERY_FILE RESPONSE_FILE OUTPUT_FILE
DESCRIPTION
rwmatch provides a facility for relating SiLK Flow records contained in two files, called the QUERY_FILE and the RESPONSE_FILE. This relationship results in an OUTPUT_FILE which contains records from the QUERY_FILE and their matching responses from the RESPONSE_FILE.
Matching criteria are defined by using the --relate switch, which specifies that two fields (one in the QUERY_FILE, one in the RESPONSE_FILE) are related. A query and a response record match if all the field pairs specified by --relate are equal, and they occur close to each other in time. For the default case, absolute matches, the time period is defined as being the interval between the start time and end time of the query flow: if the start time of the response flow is greater than or equal to the start time of the query flow and less than or equal to the end time of the query flow, it is considered to match. The end time can be extended using the --time-delta switch. When the --symmetric-delta switch is provided, the response may precede the query by time-delta seconds as well. This is useful when matching flows where either side may have initiated the conversation. The --relative-delta switch alters the interval so that a matching flow may start up to time-delta seconds after the end of the most recently ending member of the match. The --infinite-delta switch ignores time-delta after the initial match pair is formed and continues the match as long as the match fields are equal.
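The time test for the default (absolute) case and for --symmetric-delta can be sketched as a small shell function. This is only an illustration of the rule as described above, not rwmatch's implementation: the name in_window and its argument order are hypothetical, times are given in seconds, and the --relative-delta and --infinite-delta modes are not modeled.

```shell
# in_window is a hypothetical helper, not part of SiLK.
# Arguments: query start, query end, response start, DELTA (seconds,
# fractions allowed), and an optional fifth argument "symmetric".
# Exit status 0 means the response start time falls inside the window.
in_window() {
    awk -v qs="$1" -v qe="$2" -v rs="$3" -v d="$4" -v mode="${5:-absolute}" '
        BEGIN {
            lower = qs + 0
            upper = qe + d
            rs = rs + 0
            # Assumption: with --symmetric-delta the response may start
            # up to DELTA seconds before the query starts.
            if (mode == "symmetric") lower = qs - d
            exit !(lower <= rs && rs <= upper)
        }'
}
```

For example, `in_window 100 105 105.25 0.5` succeeds because the response starts within 0.5 seconds of the end of the query, while `in_window 100 105 99.5 0.5` fails unless `symmetric` is given as the fifth argument.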
The match direction is determined by the source of the earliest flow of the initial pair. In the case of a tie, the direction is determined by ports for TCP and UDP with the direction being towards the low port if one is above 1023 and the other below 1024. The default direction is that of the QUERY_FILE. In most cases of millisecond time resolution, ties should be rare.
Matched records are written to the OUTPUT_FILE. Each set of matching records is assigned a match ID which is written into the next hop IP field. The last 24 bits of the next hop IP are used to store the match ID, while the first 8 bits are used to indicate whether the record in question came from the query file (first octet set to 0), or the response file (first octet set to 255). Therefore, the first match identified will have 0.0.0.1 in the query record's next hop IP field, the query record for the second match will have 0.0.0.2 and so on. The corresponding response records will have 255.0.0.1, 255.0.0.2 and so on. Apart from the change to the next hop IP, query and response records are not modified.
Unmatched query and/or response records may be written to the
OUTPUT_FILE by specifying the --unmatched switch. The required
parameter is one of q, r, or b to write the query records, the
response records, or both to OUTPUT_FILE. Unmatched query records
have their next hop IP set to 0.0.0.0, and unmatched response records
have their next hop IP set to 255.0.0.0.
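The next hop IP encoding described above can be decoded with a few lines of shell. This is a minimal sketch that assumes the address is supplied in dotted-quad form; the function name decode_nhip is hypothetical and is not part of SiLK.

```shell
# decode_nhip is a hypothetical helper, not part of SiLK.
# It splits a dotted-quad next hop IP into the originating file
# (first octet: 0 = query, 255 = response) and the match ID
# (last 24 bits). A match ID of 0 marks an unmatched record.
decode_nhip() (
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    case "$o1" in
        0)   side=query ;;
        255) side=response ;;
        *)   echo "not an rwmatch next hop IP: $1" >&2; exit 1 ;;
    esac
    id=$(( o2 * 65536 + o3 * 256 + o4 ))
    if [ "$id" -eq 0 ]; then
        echo "$side unmatched"
    else
        echo "$side $id"
    fi
)
```

Calling `decode_nhip 255.0.0.2` prints `response 2`, and `decode_nhip 0.0.0.0` prints `query unmatched`.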
When the --unmatched switch is not specified, the rwmatch application only writes records which have a query and a response; if a query record does not have a corresponding response, the query record is discarded. Similarly, if responses do not have a corresponding query, they are discarded. Once a match is found, multiple records from both sides may be included. The matched members are written to OUTPUT_FILE in order of their start times with ties being broken in favor of the match direction.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
- --relate=FIELD_PAIR
- Specifies a pair of fields separated by a comma, one in the query record and one in the response record. The data in these fields must be identical for two records to match. At least one FIELD_PAIR must be provided; up to 8 FIELD_PAIRs may be provided. The available fields are listed at the end of this section.
- --time-delta=DELTA
- Specifies the number of seconds by which a response can follow a query. DELTA may contain fractional seconds; for example, 0.500 represents a 500 millisecond delay. Responses match queries if
-
query:sTime <= response:sTime
response:sTime <= query:eTime + DELTA
-
Use the --time-delta switch to allow for a delay in the response. Although responses usually occur within a second of the query, delays of several seconds are not uncommon due to combinations of host and network processing delays. The DELTA value can also compensate for timing errors between multiple sensors.
- --symmetric-delta
- Allows matching of flows where the RESPONSE_FILE contains the initial flow. This switch allows the response to precede the query by DELTA seconds.
- --unmatched=q|r|b
- Causes unmatched query and/or response records to be written to the OUTPUT_FILE. The parameter determines whether the query records, the response records, or both are written to OUTPUT_FILE. Unmatched query records have their next hop IP set to 0.0.0.0, and unmatched response records have their next hop IP set to 255.0.0.0. When the b value is used, OUTPUT_FILE contains a complete merge of QUERY_FILE and RESPONSE_FILE.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
-
1 source IP
2 destination IP
3 source port
4 destination port
5 protocol
6 packets
7 bytes
8 flags
FIELD_PAIR is always given as a pair, such as:
-
--relate=2,1 (query:destination IP :: response:source IP)
--relate=5,5 (query:protocol :: response:protocol)
FIELD_PAIR is unidirectional; specifying
--relate=1,2
will match records where the query record's source IP matches the response record's destination IP, regardless of the values of the other IP addresses in the set. To match symmetric flow records between hosts, match both the source and destination IP addresses by specifying:
--relate=2,1 --relate=1,2
This command will match source and destination IP addresses. When using a port-based protocol (e.g., TCP or UDP), refine the match further by specifying the ports:
--relate=2,1 --relate=1,2 --relate=3,4 --relate=4,3
Matching becomes more specific as more fields are added. Since rwmatch discards unmatched records, a highly specific match (such as the last one specified above) will generate more matches (resulting in higher match IDs), but may result in fewer total flows due to certain records being unmatched.
LIMITATIONS
rwmatch works by walking through each record in QUERY_FILE; for every query record, it walks through RESPONSE_FILE to look for records which match the query record according to user specification. If it finds matching records, it outputs the query and all corresponding additional queries and responses. All records are examined once, and if not output, are discarded.
rwmatch requires that the SiLK Flow records in the QUERY_FILE and RESPONSE_FILE be sorted. In particular:
-
Sorting must be done for all fields specified in the --relate
switches as well as time and must be done in exactly the same order.
The sort ordering must exactly match the --relate ordering. The
first relate pair must be the first sort field for the QUERY_FILE
and the first sort field for the RESPONSE_FILE. Similarly for the
remaining fields.
The recommended sort ordering for TCP and UDP is shown below. This
correctly handles multiple flows occurring during the same time
interval which involve multiple ports:
rwsort --fields=1,4,2,3,5,9 incoming.rwf > incoming-query.rwf
rwsort --fields=2,3,1,4,5,9 outgoing.rwf > outgoing-response.rwf
For example, to match on the --relate=1,2 case shown above, sort as follows:
rwsort --field=1,9 source_ips.rwf > source_ips.s.rwf
rwsort --field=2,9 dest_ips.rwf > dest_ips.s.rwf
And then call rwmatch with
rwmatch --relate=1,2 source_ips.s.rwf dest_ips.s.rwf matched.rwf
But note that
rwmatch --relate=2,1 source_ips.s.rwf dest_ips.s.rwf stdout
will not produce meaningful results, because source_ips.s.rwf was sorted on field 1 and dest_ips.s.rwf was sorted on field 2.
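Because the sort keys must follow the --relate ordering, the two rwsort field lists can be derived mechanically from the FIELD_PAIRs. The sketch below does only that string manipulation and does not call any SiLK tool. The name relate_to_sort is hypothetical, and appending field 9 as the time key is an assumption taken from the rwsort examples in this page.

```shell
# relate_to_sort is a hypothetical helper, not part of SiLK.
# Given FIELD_PAIRs in --relate order (for example: 1,2 3,4), it prints
# the rwsort field list for the query file on the first line and for
# the response file on the second line. Field 9 is appended as the
# time key, as in the rwsort examples above.
relate_to_sort() (
    q= r=
    for pair in "$@"; do
        q="$q${pair%,*},"    # left half of the pair: query field
        r="$r${pair#*,},"    # right half of the pair: response field
    done
    echo "${q}9"
    echo "${r}9"
)
```

For instance, `relate_to_sort 1,2` prints `1,9` and `2,9`, which are the field lists used with rwsort for the --relate=1,2 case.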
rwmatch assumes that there is parity between queries and responses; that is, for every response record, there exists one query record. Because long-lived sessions are often broken into multiple flows, rwmatch may discard records that are part of a long-lived session. The --relative-delta switch will compensate for this if the gap between flows is less than the time specified in the --time-delta switch. The --infinite-delta switch will compensate for arbitrarily long gaps, but it may group flows that are not part of a true session. DNS flows that use port 53/udp as both a service and reply port are an example.
EXAMPLES
rwmatch is a generalized matching tool; the most basic function provided by rwmatch is the ability to match both sides of a TCP connection. For example, given incoming and outgoing web traffic in two files web_in.rwf and web_out.rwf
rwsort --field=1,2,3,4,9 web_in.rwf > web_in.s.rwf
rwsort --field=2,1,4,3,9 web_out.rwf > web_out.s.rwf
rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3 \
web_in.s.rwf web_out.s.rwf web.sessions.rwf
will generate a file consisting of matched sessions for every complete web session in web_in.rwf and web_out.rwf.
Because rwmatch can match fields arbitrarily, you can also match records across different protocols. For example, to trace all router responses to a scan attempt, we start by identifying a specific horizontal scan. Assume that we have a list of scanners for port X, perhaps obtained from rwscan. The list is an IPset in Xscan.set.
rwfilter --proto=6 --sip-set=Xscan.set --dport=X --pass=Xscanners.rwf indata
rwsort --field=1,3,2,4,9 Xscanners.rwf > Xscanners.s.rwf
We can identify hosts that responded to the scan by accepting the TCP connection. To do so, we extract the potential replies and sort them:
rwfilter --proto=6 --dip-set=Xscan.set --sport=X --pass=XRscanners.rwf indata
rwsort --field=2,4,1,3,9 XRscanners.rwf > XRscanners.s.rwf
rwmatch --relate=1,2 --relate=3,4 --relate=2,1 --relate=4,3 \
Xscanners.s.rwf XRscanners.s.rwf XMscanners.s.rwf
The results file, XMscanners.s.rwf, will contain all the exchanges between the scanning hosts and the responders on port X. Examination of these flows may show evidence of buffer overflows or similar attacks.
We then identify all ICMP destination unreachable messages:
rwfilter --proto=1 --dport=768-1023 --pass=icmp.rwf outdata
rwsort --field=2,9 icmp.rwf > icmp.s.rwf
We then match exclusively on IP address:
rwmatch --relate=2,1 icmp.s.rwf Xscanners.s.rwf result.rwf
The resulting file will consist of single packet flows (from Xscanners.s.rwf) with an ICMP response (from icmp.s.rwf). Similar queries can be used to identify other multiple-protocol phenomena, such as the results of a traceroute.
When using rwcut to display the output file produced by rwmatch,
consider using the cutmatch.so plug-in to display the match
parameter that rwmatch writes into the next hop IP field. The
cutmatch.so plug-in displays the direction of the flow and the
match ID. To use the plug-in, specify the --dynamic-library switch
to rwcut and modify the list of --fields to include match:
$ rwcut --dynamic-lib=cutmatch.so --fields=1,3,match,2,4,5 matched.rwf
            sIP|sPort| <->Match#|            dIP|dPort|pro|
 192.168.251.79|49636|->       1|    10.10.10.65|   80|  6|
    10.10.10.65|   80|<-       1| 192.168.251.79|49636|  6|
 192.168.251.79|49637|->       2|    10.10.10.65|   80|  6|
    10.10.10.65|   80|<-       2| 192.168.251.79|49637|  6|
ENVIRONMENT
- SILK_CONFIG_FILE
- This environment variable is used as the value for the --site-config-file switch when that switch is not provided.
- SILK_DATA_ROOTDIR
- When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwmatch looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
- SILK_PATH
- This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwmatch checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.
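The part of the configuration file search that depends on these environment variables can be sketched as follows. This is an illustration of the documented search order only: the name silk_conf_candidates is hypothetical, and the compiled-in data root and the share directories parallel to the application's directory are left out because they depend on the installation.

```shell
# silk_conf_candidates is a hypothetical helper, not part of SiLK.
# It prints, in search order, the candidate locations of the site
# configuration file that are derived from environment variables.
# The compiled-in data root and the directories parallel to the
# application's directory are not covered by this sketch.
silk_conf_candidates() {
    # A non-empty SILK_CONFIG_FILE names the file directly.
    if [ -n "${SILK_CONFIG_FILE:-}" ]; then
        echo "$SILK_CONFIG_FILE"
        return 0
    fi
    if [ -n "${SILK_DATA_ROOTDIR:-}" ]; then
        echo "$SILK_DATA_ROOTDIR/silk.conf"
    fi
    if [ -n "${SILK_PATH:-}" ]; then
        echo "$SILK_PATH/share/silk/silk.conf"
        echo "$SILK_PATH/share/silk.conf"
    fi
}
```

Running `silk_conf_candidates` in a shell shows which paths would be considered under the current environment, which can help when rwmatch reports that it cannot find silk.conf.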
SEE ALSO
rwfilter(1), rwsort(1), rwgroup(1), rwcut(1)
BUGS
When used in an IPv6 environment, rwmatch will attempt to convert any IPv6 addresses to IPv4. Records that can be converted will be processed; all other records will be silently ignored.


