Introduction
------------

The RAVE 1.9.0 release is delivered with an experimental set of
RAVE-enabled wrappers for commonly used SiLK rw tools. These wrappers
are under active development and will become more capable over time;
they are included in this release to allow experimentation with
RAVE-managed workflows using SiLK tools.

The module is installed by default with RAVE 1.9.0, and requires the
SiLK analysis tools to be installed in order to run. See the SiLK
Installation Handbook, sections 4 and 5, for details on installing the
SiLK analysis tools.

Simple Usage
------------

A Python script using these wrappers must import the functions from the
silk module using the following statement:

from silk import *

Generally speaking, a wrapper takes the same arguments as the rw tool
it wraps, written as Python keyword arguments instead of UNIX
command-line options. Hyphens in option names become underscores. For
example:

rwfilter --start-date=2007/03/03:02 --end-date=2007/03/03:04 \
         --protocol=6,17 --dport=53 --type=out --pass-destination=stdout

becomes

rwdata_dns = rwfilter(start_date="2007/03/03:02", end_date="2007/03/03:04",
                      protocol="6,17", dport="53", type="out")

The --pass-destination argument is implicit; the return value contains a
reference to the output data.

The output data can then be fed into another rw command; for example:

rwdata_dns_10 = rwfilter(rwdata_dns, saddress="10/8")
rwdata_dns_10_counts = rwcount(rwdata_dns_10)

Unnamed (positional) arguments are assumed to be inputs and, as Python
requires, must appear in the argument list before named arguments.
Flags (SiLK command-line options that take no arguments) should be
given the value True.
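
The mapping described above can be sketched in a few lines of Python.
This is an illustration only, not the RAVE implementation:
build_command and the input name flows.rw are hypothetical, and the
real wrappers accept the return values of other wrappers as inputs
rather than file names. Options are emitted in sorted order purely to
make the result predictable.

```python
def build_command(tool, *inputs, **options):
    """Translate wrapper-style arguments into a command-line argument list.

    Hypothetical helper for illustration; not part of the RAVE silk module.
    """
    argv = [tool]
    for name in sorted(options):
        value = options[name]
        # Python identifiers use underscores where SiLK options use hyphens.
        switch = "--" + name.replace("_", "-")
        if value is True:
            # A flag: the switch appears with no value.
            argv.append(switch)
        elif value is False or value is None:
            # An unset flag is simply omitted.
            continue
        else:
            argv.append("%s=%s" % (switch, value))
    # Unnamed arguments are treated as inputs and placed last.
    argv.extend(str(item) for item in inputs)
    return argv


print(" ".join(build_command("rwfilter", "flows.rw",
                             start_date="2007/03/03:02",
                             protocol="6,17", dport="53", type="out")))
print(" ".join(build_command("rwtotal", "flows.rw", proto=True)))
```

The second call shows a flag: proto=True becomes a bare --proto switch
with no value attached.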

Textual data can be retrieved using the get_data method on the return value:

print rwdata_dns_10_counts.get_data()

Coverage
--------

The following rw tools are presently wrapped:

rwcat
rwuniq
rwtotal
rwcount
rwcut
rwsort
rwfilter

The RAVE 1.9.0 release of these tools caches the output data in the
RAVE cache, but does nothing special to let data be reused between
different queries.  Future releases will retrieve rwfilter data more
intelligently in order to increase cache utilization.

rwfilter with Multiple Outputs
------------------------------

In order to work with multiple outputs from rwfilter in Python, you
must tell the rwfilter function which outputs you wish to receive by
using the 'rwfilter_output' argument.  For example, the following
command line and Python are loosely equivalent:

rwfilter --start-date=2007/03/03:02 --end-date=2007/03/03:04 \
         --protocol=6,17 --dport=53 --type=out --pass-destination=a \
         --fail-destination=b

(a, b) = rwfilter(start_date="2007/03/03:02", end_date="2007/03/03:04",
                  protocol="6,17", dport="53", type="out",
                  rwfilter_output=PASS_FAIL)

The following values are supported for rwfilter_output:

    PASS
    FAIL
    ALL
    PASS_FAIL
    PASS_ALL
    FAIL_ALL
    PASS_FAIL_ALL
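
One way to read these names is as an ordered list of the rwfilter
output streams being requested. The sketch below rests on two
assumptions that this document does not confirm: that the returned
values follow the order given in the name (as in the PASS_FAIL example
above), and that ALL corresponds to rwfilter's --all-destination
switch. The function destination_switches is hypothetical, and plain
strings stand in for the module's constants.

```python
# The names listed above, as plain strings (stand-ins for the constants).
SUPPORTED_OUTPUTS = ("PASS", "FAIL", "ALL", "PASS_FAIL",
                     "PASS_ALL", "FAIL_ALL", "PASS_FAIL_ALL")


def destination_switches(output_name):
    """Return the rwfilter destination switches implied by an output name.

    Hypothetical helper: assumes the name lists the streams in the
    order in which they are returned.
    """
    if output_name not in SUPPORTED_OUTPUTS:
        raise ValueError("unsupported rwfilter_output: %r" % (output_name,))
    streams = output_name.lower().split("_")
    return ["--%s-destination" % stream for stream in streams]


for name in SUPPORTED_OUTPUTS:
    print("%-14s %s" % (name, " ".join(destination_switches(name))))
```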

Working with Datasets
---------------------

All of the tools that output textual data (rwuniq, rwtotal, rwcount,
and rwcut) are handled specially.  Instead of the raw text output
being read into Python, the data is read into a special Dataset object
to allow for easier manipulation.  Here are some simple things you can
do using a dataset:

print data            - Outputs the dataset in human-readable columnar format.
data[0]               - The 0th (first) row in the dataset, as a dictionary.
data['foo']           - The column of the dataset named 'foo'.
data[0:10]            - The first 10 rows of the dataset.
len(data)             - The number of rows in the dataset.
data.sort_col('foo')  - Returns the dataset sorted by column 'foo' (supports
                        cmp, key, and reverse arguments, like list.sort).
data.columns          - The text column names.

To output the top ten protocols by byte count, you could use the
following commands:

a = rwfilter(..., protocol='0-255')
b = rwtotal(a, proto=True)
result = b.get_data()
result = result.sort_col('bytes', reverse=True)
print result[0:10]
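
To try these operations without SiLK data at hand, the following
stand-in mimics the Dataset behavior described above. It is a sketch
under stated assumptions, not the RAVE Dataset class: MiniDataset is a
hypothetical name, the column names and values are invented for
illustration, and the cmp argument of sort_col is omitted.

```python
class MiniDataset(object):
    """Tiny stand-in that mimics the Dataset operations listed above."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = [tuple(row) for row in rows]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice yields a smaller dataset with the same columns.
            return MiniDataset(self.columns, self._rows[key])
        if isinstance(key, str):
            # A column name yields that column's values.
            index = self.columns.index(key)
            return [row[index] for row in self._rows]
        # An integer yields one row as a dictionary.
        return dict(zip(self.columns, self._rows[key]))

    def sort_col(self, name, key=None, reverse=False):
        # Return a new dataset sorted by one column; self is unchanged.
        index = self.columns.index(name)
        if key is None:
            sort_key = lambda row: row[index]
        else:
            sort_key = lambda row: key(row[index])
        ordered = sorted(self._rows, key=sort_key, reverse=reverse)
        return MiniDataset(self.columns, ordered)

    def __str__(self):
        # Simple right-aligned columnar layout.
        table = [self.columns] + [list(row) for row in self._rows]
        return "\n".join("  ".join("%10s" % value for value in row)
                         for row in table)


# Invented sample values; real rwtotal output will differ.
data = MiniDataset(["protocol", "records", "bytes"],
                   [(1, 12, 1008), (6, 340, 512000), (17, 95, 14250)])
top = data.sort_col("bytes", reverse=True)[0:2]
print(top)
```

Here sort_col returns a new dataset rather than sorting in place,
matching the way the example above reassigns result.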

Specific Warnings
-----------------

Error checking is currently very primitive.  Certain options (such as
--print-volume-statistics) are not handled well, and the functions
give no warning when these options are used.

The rwcut tool should be used sparingly, and only at the very last
stage of analysis.  Reading a large number of flows into a Python
Dataset is slow and very inefficient.  Cut down your data to the
smallest possible number of flows before using rwcut.

As a final reminder: This is an experimental API.  The performance is
not yet up to snuff, nor is the error checking.  Nothing important
should be built using this API yet.
