CERT NetSA Security Suite 
Open Source Tools for Network Monitoring 
YAF 0.8.1 | NAF 0.6.0 | SiLK 1.0.1 | RAVE 1.9.9
SiLK - Documentation - rwscan


NAME

rwscan - Detect scanning activity in a SiLK dataset


SYNOPSIS

  rwscan [--scan-model=MODEL] [--output-path=OUTFILE]
        [--trw-sip-set=SETFILE] [--trw-theta0=PROB] [--trw-theta1=PROB]
        [--no-titles] [--no-columns]
        [--column-separator=CHAR] [{--delimited | --delimited=CHAR}]
        [--integer-ips] [--model-fields] [--scandb]
        [--threads=THREADS] [--queue-depth=DEPTH]
        [--verbose-progress=CIDR] [--verbose-flows] [FILES...]


DESCRIPTION

rwscan performs scan detection analysis on SiLK flow records. Input can be read from standard input (for example, from a pipe) or from the files listed on the command line. The input should be pre-sorted with rwsort(1) by sip, proto, and dip, in that order.
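Sorting by sip, proto, and dip places all records from one source IP next to each other, so a detector can examine each source in a single pass. The Python sketch below illustrates why the order matters; the `Flow` record type is a hypothetical stand-in for a SiLK record, not part of any SiLK API.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical minimal flow record (not a SiLK data structure)."""
    sip: int    # source IPv4 address as an integer
    proto: int  # IP protocol number (6 = TCP, 17 = UDP, 1 = ICMP)
    dip: int    # destination IPv4 address as an integer


def sort_key(flow: Flow) -> Tuple[int, int, int]:
    # Same ordering as `rwsort --fields=sip,proto,dip`.
    return (flow.sip, flow.proto, flow.dip)


def is_sorted(flows: List[Flow]) -> bool:
    """Return True if the records are already in sip, proto, dip order."""
    return all(sort_key(a) <= sort_key(b) for a, b in zip(flows, flows[1:]))


def flows_by_source(flows: Iterable[Flow]) -> Iterator[Tuple[int, List[Flow]]]:
    """Yield (sip, flows) groups; correct only if the input is sorted.

    groupby() merges only *consecutive* records with equal keys, so
    unsorted input would split one source into several groups.
    """
    for sip, group in groupby(flows, key=lambda f: f.sip):
        yield sip, list(group)
```

With unsorted input, a source that appears twice, separated by another source, is reported as two separate groups, and each group would be judged on only part of its traffic.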


OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
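The abbreviation rule can be illustrated with a short Python sketch. It implements the rule as stated here (an exact match first, otherwise a unique prefix) and is not SiLK's own option parser; the option list is copied from the SYNOPSIS.

```python
from typing import Sequence

# Long option names taken from the SYNOPSIS.
RWSCAN_OPTIONS = [
    "scan-model", "output-path", "trw-sip-set", "trw-theta0", "trw-theta1",
    "no-titles", "no-columns", "column-separator", "delimited",
    "integer-ips", "model-fields", "scandb", "threads", "queue-depth",
    "verbose-progress", "verbose-flows",
]


def resolve_option(name: str, options: Sequence[str] = RWSCAN_OPTIONS) -> str:
    """Map a possibly abbreviated option name to its full name."""
    if name in options:                      # an exact match always wins
        return name
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:                    # a unique prefix is accepted
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option --{name}")
    raise ValueError(f"ambiguous option --{name}: " + ", ".join(matches))
```

Under this rule, --column-sep resolves to --column-separator, while --scan is rejected because it is a prefix of both --scan-model and --scandb.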

--scan-model=MODEL
Select a specific scan detection model. If not specified, the default value for MODEL is 0. See the METHOD OF OPERATION section for more details.
 0 
Use the Threshold Random Walk (TRW) and Bayesian Logistic Regression (BLR) scan detection models in series.

 1 
Use only the TRW scan detection model.

 2 
Use only the BLR scan detection model.

--output-path=OUTFILE
Specify the output file that scan records will be written to. If not specified, the scan records are written to scans.dat.

--trw-sip-set=SETFILE
Specify an IPset file containing all valid internal IP addresses. This switch is required when using the TRW scan detection model, because TRW must know which addresses on the target network are active in order to tell hits from misses (attempts to contact addresses that do not exist). This switch is ignored when the TRW model is not used.
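A minimal sketch of how an internal-address set lets TRW recognize a miss, assuming (as described under METHOD OF OPERATION) that an attempt to an address outside the set counts as a miss. The IPset is modeled here as a list of CIDR blocks using Python's `ipaddress` module; real IPset files are binary files built with rwset(1), and `load_internal_set` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List


def load_internal_set(cidrs: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Hypothetical stand-in for an IPset: a list of IPv4 CIDR blocks."""
    return [ipaddress.IPv4Network(c) for c in cidrs]


def is_miss(dip: str, internal: List[ipaddress.IPv4Network]) -> bool:
    """True if the destination is not an active internal address."""
    addr = ipaddress.IPv4Address(dip)
    return not any(addr in net for net in internal)
```

A linear scan over the blocks is enough for illustration; a real IPset uses a data structure that answers membership queries much faster.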

--trw-theta0=PROB
Set the theta_0 parameter for the TRW scan model to PROB, which must be a floating point number between 0 and 1. theta_0 is defined as the probability that a connection succeeds given the hypothesis that the remote source is benign (not a scanner). The default value for this option is 0.8. This option should only be used by experts familiar with the TRW algorithm.

--trw-theta1=PROB
Set the theta_1 parameter for the TRW scan model to PROB, which must be a floating point number between 0 and 1. theta_1 is defined as the probability that a connection succeeds given the hypothesis that the remote source is malicious (a scanner). The default value for this option is 0.2. This option should only be used by experts familiar with the TRW algorithm.
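With theta_0 and theta_1 defined this way, the standard TRW formulation multiplies its likelihood ratio (scanner versus benign) by a fixed factor for each observed outcome: theta_1/theta_0 for a successful connection and (1 - theta_1)/(1 - theta_0) for a failed one. The Python sketch below computes those factors. It illustrates the textbook formulation and is not taken from rwscan's source; the check that theta_1 < theta_0 is an assumption of the sketch.

```python
from typing import Tuple


def trw_step_factors(theta0: float = 0.8, theta1: float = 0.2) -> Tuple[float, float]:
    """Return (hit_factor, miss_factor) for the TRW likelihood ratio.

    The ratio compares H1 (scanner) against H0 (benign):
      hit  (success): P(success | H1) / P(success | H0) = theta1 / theta0
      miss (failure): P(failure | H1) / P(failure | H0) = (1 - theta1) / (1 - theta0)
    """
    for name, p in (("theta0", theta0), ("theta1", theta1)):
        # Strict bounds avoid division by zero in the ratios above.
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1, got {p}")
    # Assumption of this sketch: scanners succeed less often than benign hosts.
    if theta1 >= theta0:
        raise ValueError("expected theta1 < theta0")
    return theta1 / theta0, (1.0 - theta1) / (1.0 - theta0)
```

With the defaults, each hit multiplies the ratio by 0.25 and each miss by about 4, so misses push a source toward the scanner hypothesis as quickly as hits push it away.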

--no-titles
Turn off column titles. By default, titles are printed.

--no-columns
Disable fixed-width columnar output.

--column-separator=CHAR
Use the specified character between columns. If this switch is not specified, '|' is used.

--delimited
--delimited=CHAR
Run as if --no-columns --column-separator=CHAR had been specified. That is, disable fixed-width columnar output; if CHAR is provided, it is used as the delimiter between columns instead of the default '|'.
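A Python sketch of how the column switches combine. The field values and column widths are hypothetical placeholders, and right-alignment is an assumption; this page does not specify rwscan's actual output layout.

```python
from typing import Sequence


def format_row(fields: Sequence[str], widths: Sequence[int],
               columns: bool = True, sep: str = "|") -> str:
    """Format one output row.

    columns=True pads each field to its (hypothetical) width, as in the
    default fixed-width output; columns=False corresponds to --no-columns
    or --delimited and emits the raw values. In both cases `sep` (the
    --column-separator character) goes between fields.
    """
    if columns:
        fields = [f.rjust(w) for f, w in zip(fields, widths)]
    return sep.join(fields)
```

For example, `format_row(["10.0.0.1", "6"], [15, 3], columns=False, sep=",")` behaves like --delimited=, and yields `10.0.0.1,6`.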

--model-fields
Show scan model detail fields. This switch controls whether additional informational fields about the scan detection models are printed.

--integer-ips
Print IP addresses as decimal integers instead of the more human-readable dotted quad notation.
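The decimal form is the address read as an unsigned 32-bit number, so 10.0.0.1 becomes 10 * 2^24 + 1. A minimal Python sketch using the standard `ipaddress` module; the helper names are hypothetical.

```python
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Dotted-quad IPv4 address -> decimal integer."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(value: int) -> str:
    """Decimal integer -> dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(value))
```

`ip_to_int("10.0.0.1")` returns 167772161. Because values reach 4294967295, they do not fit in a signed 32-bit column, which is presumably why the PostgreSQL schema under EXAMPLES uses BIGINT for sip.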

--scandb
Produce output suitable for loading into a database. Sample database schemas are given below under EXAMPLES. This option is equivalent to --no-titles --no-columns --model-fields --integer-ips.

--threads=THREADS
Specify the number of worker threads to create for scan detection processing. By default, one thread is used; setting this to the number of available CPUs will often yield a large performance improvement.

--queue-depth=DEPTH
Specify the depth of the work queue. By default, the work queue has the same depth as the number of worker threads; this default is normally adequate.
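A sketch of the worker-thread and work-queue arrangement these two options describe, assuming each unit of work is the set of flows from one source IP. It uses Python's `threading` and a bounded `queue.Queue`; the `classify` callback and `WorkItem` shape are hypothetical, and error handling is omitted.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional, Tuple

WorkItem = Tuple[int, List[int]]  # hypothetical: (source IP, that source's flows)


def run_workers(items: Iterable[WorkItem],
                classify: Callable[[WorkItem], bool],
                threads: int = 1,
                depth: Optional[int] = None) -> Dict[int, bool]:
    """Classify work items with `threads` workers fed by a bounded queue."""
    # As with --queue-depth, the depth defaults to the number of threads.
    work: queue.Queue = queue.Queue(maxsize=depth or threads)
    results: Dict[int, bool] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:              # sentinel: no more work
                break
            verdict = classify(item)
            with lock:
                results[item[0]] = verdict

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for item in items:                    # blocks while the queue is full
        work.put(item)
    for _ in pool:                        # one sentinel per worker
        work.put(None)
    for t in pool:
        t.join()
    return results
```

The bounded queue makes the reader wait whenever DEPTH items are pending, which caps memory use when the workers fall behind.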

--verbose-progress=CIDR
Report progress as rwscan processes input data. The CIDR argument should be an integer that corresponds to the netblock size of each line of progress. For example, --verbose-progress=8 would print a progress message for each /8 network processed.
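Because input is sorted by source IP, each /CIDR netblock arrives as one contiguous run, so progress can be reported each time the current block changes. A Python sketch of that idea; the message text is hypothetical, not rwscan's actual output.

```python
import ipaddress
from typing import Iterable, Iterator


def progress_blocks(sorted_sips: Iterable[str], cidr: int) -> Iterator[str]:
    """Yield one (hypothetical) message per /cidr network reached.

    Assumes the source IPs arrive sorted, so each netblock appears as
    one contiguous run.
    """
    current = None
    for sip in sorted_sips:
        # strict=False masks off host bits: 10.1.2.3/8 -> 10.0.0.0/8
        block = ipaddress.IPv4Network(f"{sip}/{cidr}", strict=False)
        if current is None or block != current:
            current = block
            yield f"processing {block}"
```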

--verbose-flows
Print very verbose information for each flow. This is primarily useful for debugging.


METHOD OF OPERATION

rwscan's default behavior is to consult two scan detection models to determine whether a source is a scanner. The primary model used is the Threshold Random Walk (TRW) model. The TRW algorithm takes advantage of the tendency of scanners to attempt to contact a large number of IPs that do not exist on the target network.

By keeping track of the number of "hits" (successful connections) and "misses" (attempts to connect to IP addresses that are not active on the target network), rwscan can detect scanners quickly and with a high degree of accuracy. Sequential hypothesis testing is used to update the probability that a source is a scanner as each flow record is processed. Once the scan probability exceeds a configured threshold, the source is flagged as a scanner, and no further analysis of traffic from that host is necessary.
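In the standard TRW formulation, the sequential test keeps a running log-likelihood ratio and stops as soon as it crosses an upper threshold (scanner) or a lower one (benign). The Python sketch below assumes Wald-style thresholds derived from hypothetical detection and false-positive targets (`P_DETECT`, `P_FALSE_POS`); these are not rwscan options, and rwscan's actual threshold values are not documented here.

```python
import math
from typing import Iterable

# Hypothetical targets; rwscan does not expose these as options.
P_DETECT = 0.99       # desired probability of flagging a true scanner
P_FALSE_POS = 0.01    # tolerated probability of flagging a benign source


def trw_decide(outcomes: Iterable[bool], theta0: float = 0.8,
               theta1: float = 0.2) -> str:
    """Sequential hypothesis test over one source's connection outcomes.

    `outcomes` holds True for a hit and False for a miss, in processing
    order. Returns "scanner", "benign", or "pending" if neither
    threshold was crossed.
    """
    upper = math.log(P_DETECT / P_FALSE_POS)              # accept H1: scanner
    lower = math.log((1 - P_DETECT) / (1 - P_FALSE_POS))  # accept H0: benign
    hit_step = math.log(theta1 / theta0)
    miss_step = math.log((1 - theta1) / (1 - theta0))

    llr = 0.0  # log of P(observations | scanner) / P(observations | benign)
    for hit in outcomes:
        llr += hit_step if hit else miss_step
        if llr >= upper:
            return "scanner"   # stop early: later flows need not be examined
        if llr <= lower:
            return "benign"
    return "pending"
```

With the default theta values and these assumed targets, four consecutive misses flag a source, which shows why TRW can decide after only a few flows.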

The TRW model is not 100% accurate, however, and it finds scans only in TCP flow data. When the TRW model is inconclusive, a secondary model called BLR ("Bayesian Logistic Regression") is invoked. Unlike TRW, the BLR approach must analyze all traffic from a given source IP to determine whether that IP is a scanner.

BLR is therefore much slower than TRW. However, the BLR model has been shown to detect scans that TRW misses, particularly scans in UDP and ICMP data, and vertical TCP scans, which probe for services on a single host. It does this by calculating metrics from each source's flow data and combining those metrics into an overall likelihood that the flow data represents scanning activity.
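The final step of a logistic-regression model combines per-source metrics into a likelihood with the logistic function. The Python sketch below shows only that combination step. The metric names and weights are invented placeholders, since the actual BLR metrics and fitted coefficients are not listed here; the "Bayesian" part of BLR concerns how the weights are fitted and is outside this sketch.

```python
import math
from typing import Dict


def blr_likelihood(metrics: Dict[str, float], weights: Dict[str, float],
                   intercept: float) -> float:
    """Logistic model: P(scan) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(weights[name] * value for name, value in metrics.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)           # stable form for large negative z (no overflow)
    return e / (1.0 + e)


# Hypothetical metrics and weights, for illustration only.
EXAMPLE_WEIGHTS = {"frac_flows_no_response": 3.0, "distinct_dips": 0.05}
EXAMPLE_METRICS = {"frac_flows_no_response": 0.9, "distinct_dips": 40.0}
```

With these placeholder values and an intercept of -4.0, the linear score is 0.7 and the resulting likelihood is a little above one half.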

The metrics BLR uses for detecting scans in TCP flow data are:

The metrics BLR uses for detecting scans in UDP flow data are:

The metrics BLR uses for detecting scans in ICMP flow data are:

Because the TRW model has a lower false positive rate than the BLR model, any source identified as a scanner by TRW will be identified as a scanner by the hybrid model without consulting BLR. BLR is only invoked in the following cases:

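When only one model is wanted, the other can be disabled with the --scan-model switch. This trades accuracy against speed: TRW alone (--scan-model=1) avoids the slower BLR analysis but finds scans only in TCP data, while BLR alone (--scan-model=2) must examine all traffic from every source and gives up TRW's lower false positive rate.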
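The model selection can be summarized in a short Python sketch. It encodes only what this page states: under model 0 a TRW scanner verdict is final and BLR is consulted when TRW is inconclusive, while models 1 and 2 run a single model. rwscan's exact conditions for invoking BLR are more specific than this, and the `trw` and `blr` callables are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical model interfaces: TRW may be inconclusive (None).
TrwModel = Callable[[object], Optional[bool]]
BlrModel = Callable[[object], bool]


def is_scanner(source: object, model: int, trw: TrwModel, blr: BlrModel) -> bool:
    """Apply the --scan-model selection to one source."""
    if model == 1:                  # TRW only; inconclusive counts as "no"
        return trw(source) is True
    if model == 2:                  # BLR only
        return blr(source)
    if model == 0:                  # default: TRW first, BLR as fallback
        verdict = trw(source)
        if verdict is None:         # e.g. non-TCP traffic TRW cannot judge
            return blr(source)
        return verdict              # a TRW verdict skips BLR entirely
    raise ValueError(f"unknown scan model {model}")
```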


LIMITATIONS

rwscan detects scans in IPv4 flows only.


EXAMPLES

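Basic usage requires only input and output file arguments, plus the IPset of internal addresses that the default (TRW-based) model needs:

  $ rwscan --trw-sip-set=sip.set --output-path=scans.dat data.rw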

Typically, though, data will be piped into rwscan from rwfilter(1) and rwsort(1), e.g.:

  $ rwfilter --start=2004/12/29:00 --type=in,inweb --all-dest=stdout \
        | rwsort --fields=sip,proto,dip \
        | rwscan --trw-sip-set=sip.set --scan-model=0 \
            --output-path=scans.dat

rwscan's --scandb output is suitable for loading into a database of scans. Here are schemas for such databases in PostgreSQL, Oracle, and MySQL.

Sample Schema for PostgreSQL

  CREATE DATABASE scans;
  -- then connect to the scans database (e.g., with \c scans in psql)
  CREATE SCHEMA scans
  CREATE SEQUENCE scans_id_seq
  CREATE TABLE scans (
    id          BIGINT      NOT NULL    DEFAULT nextval('scans_id_seq'),
    sip         BIGINT      NOT NULL,
    proto       SMALLINT    NOT NULL,
    stime       TIMESTAMP without time zone NOT NULL,
    etime       TIMESTAMP without time zone NOT NULL,
    flows       BIGINT      NOT NULL,
    packets     BIGINT      NOT NULL,
    bytes       BIGINT      NOT NULL,
    scan_model  INTEGER     NOT NULL,
    scan_prob   FLOAT       NOT NULL,
    PRIMARY KEY (id)
  )
  CREATE INDEX scans_stime_idx ON scans (stime)
  CREATE INDEX scans_etime_idx ON scans (etime)
  ;

A database user should be created for the purposes of populating the scan database, e.g.:

    CREATE USER rwscan WITH PASSWORD 'secret';
    GRANT USAGE ON SCHEMA scans TO rwscan;
    GRANT ALL PRIVILEGES ON scans.scans TO rwscan;
    GRANT USAGE ON SEQUENCE scans.scans_id_seq TO rwscan;

Additionally, a user with read-only access should be created for use by the rwscanquery(1) tool:

    CREATE USER rwscanquery WITH PASSWORD 'secret';
    GRANT USAGE ON SCHEMA scans TO rwscanquery;
    GRANT SELECT ON scans.scans TO rwscanquery;

Importing Scan Data into PostgreSQL

To import rwscan's --scandb output into a PostgreSQL database, use a command similar to the following:

    cat /tmp/scans.import.dat |
    psql -c \
        "COPY scans.scans \
            (sip, proto, stime, etime, \
            flows, packets, bytes, \
            scan_model, scan_prob) \
        FROM stdin DELIMITER as '|'" scans
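Before running the import, it can help to check that each line has the nine fields named in the COPY column list. A Python sketch, assuming one record per line with '|' as the delimiter and fields in the same order as that list; time values are not checked because their format is not documented here, and the helper names are hypothetical.

```python
from typing import Dict, List

SCANDB_COLUMNS = ["sip", "proto", "stime", "etime", "flows",
                  "packets", "bytes", "scan_model", "scan_prob"]
INTEGER_COLUMNS = {"sip", "proto", "flows", "packets", "bytes", "scan_model"}


def parse_scandb_line(line: str, sep: str = "|") -> Dict[str, str]:
    """Split one --scandb line into named fields, validating numeric ones."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(SCANDB_COLUMNS):
        raise ValueError(f"expected {len(SCANDB_COLUMNS)} fields, got {len(values)}")
    record = dict(zip(SCANDB_COLUMNS, values))
    for name in INTEGER_COLUMNS:
        int(record[name])               # raises ValueError if not an integer
    prob = float(record["scan_prob"])
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"scan_prob out of range: {prob}")
    return record


def bad_lines(lines: List[str]) -> List[int]:
    """Return the 1-based numbers of lines that fail to parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            parse_scandb_line(line)
        except ValueError:
            bad.append(number)
    return bad
```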

Sample Schema for Oracle

  CREATE TABLE scans (
    id          number(20)          not null,
    sip         number(10)          not null,
    proto       number(3)           not null,
    stime       date                not null,
    etime       date                not null,
    flows       number(20)          not null,
    packets     number(20)          not null,
    bytes       number(20)          not null,
    scan_model  number(10)          not null,
    scan_prob   float               not null,
    primary key (id)
  );

Sample Schema for MySQL

  CREATE TABLE scans (
    id          integer unsigned    not null auto_increment,
    sip         integer unsigned    not null,
    proto       tinyint unsigned    not null,
    stime       datetime            not null,
    etime       datetime            not null,
    flows       integer unsigned    not null,
    packets     integer unsigned    not null,
    bytes       integer unsigned    not null,
    scan_model  integer unsigned    not null,
    scan_prob   float unsigned      not null,
    primary key (id),
    INDEX (stime),
    INDEX (etime)
  ) ENGINE=InnoDB;


SEE ALSO

rwfilter(1), rwsort(1), rwset(1), rwscanquery(1)


BUGS

When used in an IPv6 environment, rwscan attempts to convert IPv6 addresses to IPv4. Records that can be converted are processed; all other records are silently ignored.
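A sketch of the conversion described above, assuming that "can be converted" means IPv4-mapped IPv6 addresses (::ffff:0:0/96); this page does not say which IPv6 forms rwscan accepts. It uses the `ipv4_mapped` property of Python's `ipaddress.IPv6Address`; the helper names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional


def to_ipv4(addr: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 form of an address, or None if it cannot be converted."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv4Address):
        return ip
    return ip.ipv4_mapped       # None unless addr is in ::ffff:0:0/96


def convertible(addrs: Iterable[str]) -> List[str]:
    """Keep convertible addresses, silently dropping the rest."""
    kept = []
    for addr in addrs:
        v4 = to_ipv4(addr)
        if v4 is not None:
            kept.append(str(v4))
    return kept
```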