rwflowpack - Collects flow data and stores it in binary SiLK Flow files
rwflowpack [--input-mode=MODE] INPUT_MODE_SPECIFIC_SWITCHES
[--output-mode=MODE] OUTPUT_MODE_SPECIFIC_SWITCHES
{ --log-destination=DESTINATION
| --log-directory=DIR_PATH [--log-basename=BASENAME]
| --log-pathname=FILE_PATH }
[--no-file-locking] [--flush-timeout=VAL]
[--file-cache-size=VAL] [--pack-interfaces]
[--byte-order=ENDIAN] [--compression-method=COMP_METHOD]
[--error-directory=DIR_PATH] [--archive-directory=DIR_PATH]
[--flat-archive] [--post-archive-command=COMMAND]
[--site-config-file=FILENAME] [--log-level=LEVEL]
[--log-sysfacility=NUMBER] [--pidfile=FILE_PATH] [--no-daemon]
To collect flow data over the network or by directory polling (default):
rwflowpack [--input-mode=stream] --sensor-configuration=FILE_PATH
[--packing-logic=PLUGIN] [--sensor-name=SENSOR]
[--polling-interval=NUMBER] ...
To collect from a file containing NetFlow v5 PDUs:
rwflowpack --input-mode=pdufile --netflow-file=FILE_PATH
--sensor-configuration=FILE_PATH [--packing-logic=PLUGIN]
[--sensor-name=SENSOR] ...
To collect from local files containing flows created by flowcap(8):
rwflowpack --input-mode=fcfiles --incoming-directory=DIR_PATH
--sensor-configuration=FILE_PATH [--packing-logic=PLUGIN]
[--polling-interval=NUMBER] ...
To respool SiLK Flow files without modifying the class, type, or sensor:
rwflowpack --input-mode=respool --incoming-directory=DIR_PATH
[--polling-interval=NUMBER] ...
To store the SiLK Flow files on the local machine (default):
rwflowpack ... [--output-mode=local-storage]
--root-directory=DIR_PATH ...
To forward the SiLK Flow files to a remote machine:
rwflowpack ... --output-mode=sending --sender-directory=DIR_PATH
--incremental-directory=DIR_PATH ...
Help options:
rwflowpack --sensor-configuration=FILE_PATH [--packing-logic=PLUGIN]
{ --verify-sensor-config | --verify-sensor-config=VERBOSE }
rwflowpack --help
rwflowpack --version
rwflowpack is a daemon that runs as part of the SiLK flow collection and packing tool-chain. The primary job of rwflowpack is to convert each incoming flow record to the SiLK Flow format, categorize each incoming flow record (e.g., as incoming or outgoing), set the sensor value for the record, and determine which hourly file will ultimately store the record.
(Processing of IPFIX is only available when SiLK is compiled with support for libfixbuf, which is available from http://tools.netsa.cert.org/.)
The settings that rwflowpack uses to categorize each flow record are determined by a configuration file and rules that are referred to as packing logic. The --sensor-configuration switch specifies the configuration file to use. The syntax of that file is described in the sensor.conf(5) manual page. The packing logic used by rwflowpack is usually loaded as a run-time plug-in. The path to this plug-in may be specified using the --packing-logic switch. When that switch is not provided, rwflowpack uses the default value specified in the silk.conf(5) file. (The --packing-logic switch will not be available if the packing logic to use was compiled into rwflowpack.)
The value into which rwflowpack categorizes each flow record is called the flowtype or the class/type pair. See the sensor.conf(5) manual page or the SiLK Installation Handbook for an explanation of how SiLK categorizes flows and converts data to the SiLK format.
There are several ways to input data to rwflowpack; the method to use is determined by the --input-mode switch.
The default value for --input-mode is stream, which causes
rwflowpack to accept NetFlow v5 or IPFIX (Internet Protocol Flow
Information eXport) data from a network socket and/or to poll one or
more directories, where each directory may contain files of NetFlow v5
records, IPFIX records, or SiLK flow records. Which type(s) of input
rwflowpack should accept are determined by the sensor.conf(5)
file.
Instead of having rwflowpack listen for flow records, you may
configure your site to use the flowcap(8) daemon to collect NetFlow
v5 or IPFIX data and write the data to files. To have rwflowpack
process the files created by flowcap, specify the --input-mode
as fcfiles. Typically, flowcap and rwflowpack run on
separate machines, and the rwsender(8) and rwreceiver(8) daemons
are used to move the files between the machines.
Setting the --input-mode to pdufile tells rwflowpack to read
a single file containing NetFlow v5 PDU records and then exit.
When the --input-mode is respool, rwflowpack polls a
directory for SiLK flow files and it uses the existing class/type pair
and sensor values to determine where to store the flow record. That
is, rwflowpack will put the data into the appropriate hourly file, but
it does not change any other settings on the flow records. Since no
categorization occurs in respool mode, the --sensor-configuration
and --packing-logic switches are not required, and their use will
cause rwflowpack to exit with an error code.
Once rwflowpack has determined the sensor, class/type, and starting hour for a flow record, it uses those three values to select the hourly file that will contain that record. Each data file that rwflowpack creates will have a unique combination of sensor, class/type, and starting hour. Where rwflowpack creates these files is determined by the --output-mode switch:
By default, rwflowpack writes each record directly to the hourly
files in the final data repository. This is called local-storage
mode.
When the --output-mode is sending, rwflowpack creates
temporary files and relies on the rwflowappend(8) daemon to combine
the temporary files into the hourly files in the data repository.
Specifically, rwflowpack writes the flow records into files called
incremental files. Periodically (two minutes by default),
rwflowpack closes all the incremental files and moves them to a
directory where another daemon can process them. Typically,
rwflowpack and rwflowappend run on different machines, and an
rwsender(8)/rwreceiver(8) pair moves the files between the
machines.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
rwflowpack supports multiple ways of getting data (the input mode) and storing data (the output mode).
Determine how rwflowpack will gather data. The default input
MODE is stream. The available modes are
stream - rwflowpack uses the probes in the sensor configuration file that specify a network port, a UNIX domain socket, or a polling directory. For these probes, rwflowpack opens the ports and/or begins processing data files in the named directories.
fcfiles - rwflowpack polls a local directory for files that were created by the flowcap(8) daemon. Typically flowcap runs on a separate machine near a router or other flow meter that is generating NetFlow v5 or IPFIX records. flowcap collects the records, compresses them, and stores them on its local disk. For the fcfiles input mode, the files are moved between the flowcap and rwflowpack machines by separate programs, typically the rwsender(8) and rwreceiver(8) daemons. In this mode, rwflowpack ignores the probe definitions in the sensor configuration file.
pdufile - rwflowpack reads NetFlow v5 PDUs from a file, where the file's format is that created by NetFlow Collector: the file's size must be an integer multiple of 1464, where each 1464-byte chunk contains a 24-byte NetFlow v5 header and space for thirty 48-byte NetFlow records. The number of valid records per chunk is specified in the header.
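respool - rwflowpack polls a directory for SiLK Flow files and stores each record in the appropriate hourly file without changing its class/type pair or sensor.
Determines what rwflowpack will do with the data as it is packed
into SiLK binary files. The default output MODE is
local-storage. The available modes are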
local-storage - rwflowpack writes the data on the local machine into a directory tree with a specific structure.
sending - rwflowpack writes the data into a temporary location on the local disk. A separate program, rwsender(8), moves the data from the local machine to remote machines, where rwreceiver(8) working in concert with rwflowappend(8) writes the data into a directory tree with a specific structure.
When the --input-mode switch is set to stream or when the switch
is not provided, rwflowpack expects to do its own collection of
flow records. In this mode, rwflowpack gets flow records from
multiple sources. Specifically, rwflowpack processes every
probe listed in the sensor.conf(5) configuration file, and
rwflowpack will
listen on a network socket for NetFlow records
listen on the network for IPFIX (Internet Protocol Flow Information eXport) records
poll a directory for files containing NetFlow v5 PDUs (see the description of the pdufile input-mode for the required format of these files)
poll a directory for files containing IPFIX records (as generated by yaf(1))
poll a directory for files containing SiLK Flow records
This input mode accepts the following switches; the --sensor-configuration switch is required, and all other switches are optional.
Give the path to the configuration file that rwflowpack will consult to determine whether a record represents an incoming or outgoing flow. The complete syntax of the configuration file is described in the sensor.conf(5) manual page; see also the SiLK Installation Handbook.
Specify the plug-in that rwflowpack should load, where the plug-in
provides functions that determine into which class and type each flow
record will be categorized and the format of the files that
rwflowpack will write. When SiLK has been configured with
hard-coded packing logic (i.e., when --enable-packing-logic was
specified to the configure script), this switch will not be present
on rwflowpack. A default value for this switch may be specified in
the silk.conf(5) site configuration file (see the description of
the --site-config-file switch). When PLUGIN contains a slash
(/), rwflowpack assumes the path to PLUGIN is correct.
Otherwise, rwflowpack will attempt to find the file in
$SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib,
and in these directories parallel to the application's directory:
lib/silk, share/lib, and lib. If rwflowpack does not find
the file, it assumes the plug-in is in the current directory. To
force rwflowpack to look in the current directory first, specify
--packing-logic=./PLUGIN. When the SILK_PLUGIN_DEBUG
environment variable is non-empty, rwflowpack prints status
messages to the standard error as it tries to open the plug-in.
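The search order described above can be sketched as a small function. This is only an illustration of the documented order, not rwflowpack's own code: the function name, the example plug-in name mylogic.so, and the reading of "parallel to the application's directory" as siblings of the directory that holds the binary are all assumptions.

```python
import posixpath


def plugin_candidates(plugin, silk_path, app_dir):
    """Return the paths to try, in order, for a --packing-logic value.

    plugin    -- the PLUGIN argument (hypothetical example: "mylogic.so")
    silk_path -- value of $SILK_PATH, or an empty string when unset
    app_dir   -- directory that holds the rwflowpack binary
    """
    # A value that contains a slash is used as given, with no searching.
    if "/" in plugin:
        return [plugin]

    subdirs = ("lib/silk", "share/lib", "lib")
    candidates = []

    # Directories under $SILK_PATH come first, when the variable is set.
    if silk_path:
        for sub in subdirs:
            candidates.append(posixpath.join(silk_path, sub, plugin))

    # Assumption: "parallel to the application's directory" means siblings
    # of the directory that holds the binary (e.g. /usr/local/bin -> /usr/local).
    prefix = posixpath.dirname(app_dir.rstrip("/"))
    for sub in subdirs:
        candidates.append(posixpath.join(prefix, sub, plugin))

    # Last resort: the bare name, taken relative to the current directory.
    candidates.append(plugin)
    return candidates
```

For example, plugin_candidates("mylogic.so", "/opt/silk", "/usr/local/bin") lists the three $SILK_PATH locations, then the three locations under /usr/local, then the bare name, while "./mylogic.so" is returned unchanged.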
Cause rwflowpack to ignore all probes in the sensor configuration file except the probes for SENSOR. Only data for this sensor will be collected. This allows a common configuration file to be used by multiple rwflowpack invocations while each rwflowpack instance collects data for only a single sensor. There must be a sensor definition for SENSOR in the configuration file. When this switch is not present, rwflowpack will collect and pack data for all sensors.
Specify the number of seconds rwflowpack will wait between queries
of the directories named by poll-directory probes. This defaults to 15 seconds.
When the --input-mode=fcfiles switch is provided, rwflowpack will process files created by another SiLK daemon called flowcap(8). Typically flowcap runs near a router or other flow meter that is generating NetFlow v5 or IPFIX records. flowcap collects the records, compresses them, and stores them on the local disk. These files are transferred between the flowcap machine and rwflowpack machine by external programs (typically the rwsender(8) and rwreceiver(8) daemons). rwflowpack polls a local directory for these files, and then processes the files to generate SiLK Flow files.
In fcfiles mode, rwflowpack ignores the probe definitions in the sensor.conf file since flowcap labeled the files with the probe where the flows were collected. rwflowpack will use the sensor definitions in sensor.conf.
When operating in fcfiles input mode, the --sensor-configuration and --incoming-directory switches are required.
Give the path to the configuration file that rwflowpack will consult to determine whether a record represents an incoming or outgoing flow. The complete syntax of the configuration file is described in the sensor.conf(5) manual page; see also the SiLK Installation Handbook.
Name the full path of the directory which rwflowpack will monitor for files created by flowcap. Once processed by rwflowpack, files are moved from this directory to the archive-directory, if it has been specified.
Specify the plug-in that rwflowpack should load for the packing logic. For more detail, see the description above.
Specify the number of seconds rwflowpack will wait between polls of the incoming-directory for new files created by flowcap. If not given, the default value is 15 seconds.
In this mode, rwflowpack processes a single file of NetFlow v5 data. Typically these files are generated by a NetFlow collector. rwflowpack will not become a daemon in this mode; instead it will remain in the foreground, process the NetFlow file, and exit.
The NetFlow v5 file has a particular format: the file's length should be an integer multiple of 1464 bytes, where 1464 is the maximum length of a NetFlow v5 PDU. Each 1464-byte block should contain the 24-byte NetFlow v5 header and space for thirty 48-byte flow records, even if only one of those records holds valid data.
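A minimal sketch of a pre-flight check for such a file follows. It relies on the standard NetFlow v5 header layout, in which the first two 16-bit big-endian fields are the version and the record count. The function name and the error messages are illustrative and are not part of SiLK.

```python
import struct

PDU_SIZE = 1464      # fixed size of each block in the file
HEADER_SIZE = 24     # NetFlow v5 header
RECORD_SIZE = 48     # one NetFlow v5 flow record
MAX_RECORDS = 30     # (PDU_SIZE - HEADER_SIZE) // RECORD_SIZE


def count_valid_records(data):
    """Return the total number of valid flow records in a PDU file image."""
    if len(data) % PDU_SIZE != 0:
        raise ValueError("length is not a multiple of %d bytes" % PDU_SIZE)

    total = 0
    for offset in range(0, len(data), PDU_SIZE):
        # The header starts with two big-endian 16-bit fields:
        # the NetFlow version and the count of valid records in this block.
        version, count = struct.unpack_from("!HH", data, offset)
        if version != 5:
            raise ValueError("block at offset %d is not NetFlow v5" % offset)
        if count > MAX_RECORDS:
            raise ValueError("block at offset %d claims %d records" % (offset, count))
        total += count
    return total
```

Reading the file with open(path, "rb").read() and passing the bytes to count_valid_records gives a quick sanity check before handing the file to --netflow-file.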
The --sensor-configuration switch is required. The
--netflow-file switch is required in this mode; it specifies the
NetFlow file to process. Any value specified in the read-from-file
command in the sensor.conf file is ignored; the value is typically
set to /dev/null. The --sensor-name switch is also required in
pdufile mode unless the sensor.conf file contains a single
sensor.
The following switches are available in PDU File mode:
Give the path to the configuration file that rwflowpack will consult to determine whether a record represents an incoming or outgoing flow.
Name the full path of the file from which rwflowpack reads NetFlow v5 PDUs. This switch is required in PDU File mode.
Cause rwflowpack to ignore all probes in the sensor configuration file except the probes for SENSOR. There must be a sensor definition for SENSOR in the configuration file. This switch is required in this mode unless the sensor.conf file only defines a single sensor.
Specify the plug-in that rwflowpack should load for the packing logic. For more detail, see the description of this switch in the stream input-mode.
When the --input-mode=respool switch is provided, rwflowpack polls a directory for SiLK Flow files, and writes the records it finds into new hourly files, leaving the sensor and class/type values unchanged in the records. (In --input-mode=stream, rwflowpack can poll a directory for files of SiLK Flow records, but in that mode the records are re-categorized which can change the sensor and class/type values, whereas --input-mode=respool keeps the sensor and class/type the same.)
The first of the following switches is required:
Name the full path of the directory which rwflowpack will monitor for SiLK Flow files. Once processed by rwflowpack, files are moved from this directory to the archive-directory, if it has been specified.
Specify the number of seconds rwflowpack will wait between polls of the incoming-directory. If not given, the default value is 15 seconds.
Once rwflowpack has collected data, categorized it, and written it into files, it can do one of two things with the files:
Store the files on the local disk in a well-defined location.
Transfer the files to another machine and store them in a well-defined location (see sending mode below).
(The data files must be stored in a well-defined location so that rwfilter(1) can find them. To see rwfilter's idea of the well-defined location, run rwfilter --version.)
The default output-mode is to store the files on the local disk (i.e., local-storage). When operating in this mode, the following switch is required:
Name the full path of the directory under which the files containing the packed SiLK Flow records will be stored. rwflowpack will create subdirectories below DIR_PATH based on the data received.
To transfer the packed SiLK Flow files to another machine, specify the --output-mode=sending switch and run rwsender(8) to transfer the files. When rwflowpack is used with rwsender, the following three switches must be provided:
Name the full path of the directory under which packed SiLK files will initially be created. Files in this directory are considered to be incomplete; any files in this directory will be removed when rwflowpack is started. Once complete, files are moved from this directory to the sender-directory.
Name the full path of the directory under which completed
incremental files are stored while awaiting action by rwsender.
The rwsender is responsible for removing files from this directory.
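As an illustration, the argument list for a stream-input, sending-output invocation might be assembled as below. Only switches named in this manual page are used; the helper name, the choice of --log-directory as the logging switch, and the check that each directory is a full path are assumptions made for the sketch.

```python
def sending_mode_argv(sensor_conf, incremental_dir, sender_dir, log_dir):
    """Build an argument list for rwflowpack in sending output mode."""
    # The manual asks for full paths for these directories, so reject
    # relative ones early (this check is a convenience of the sketch).
    for path in (incremental_dir, sender_dir, log_dir):
        if not path.startswith("/"):
            raise ValueError("expected a full path: %r" % path)

    return [
        "rwflowpack",
        "--input-mode=stream",
        "--sensor-configuration=" + sensor_conf,
        "--output-mode=sending",
        "--incremental-directory=" + incremental_dir,
        "--sender-directory=" + sender_dir,
        # One of the logging switches is required; --log-directory is
        # chosen here only as an example.
        "--log-directory=" + log_dir,
    ]
```

The resulting list can be passed to subprocess.run on a machine where rwflowpack is installed. An rwsender process watching the sender directory is still needed to move the files.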
The following switches are optional:
Do not use advisory write locks. Normally, rwflowpack will attempt to obtain a write lock on the data files prior to writing records to them; these locks prevent two instances of rwflowpack from writing to the same data file. However, not all file systems support advisory write locks, and this switch must be used when writing data to such a file system.
Set the timeout for flushing any in-memory records to disk to VAL seconds. If not specified, the default is 2 minutes (120 seconds). When using local-storage output mode, this value specifies how often the files are flushed to disk to ensure that any records in memory are written to disk. When using sending output mode, this value specifies how often to close the files and move them from the incremental-directory to the sender-directory.
Set the maximum number of data files to have open for writing at any one time to VAL. If not specified, the default is 128 files. This switch also determines how many files rwflowpack will read from simultaneously when using probes that poll directories for files (see sensor.conf(5)). The maximum number of input files open at any one time is limited to one eighth of VAL, and the number of directory polling operations to perform simultaneously is limited to one sixteenth of VAL.
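The derived limits can be worked out with simple arithmetic. The sketch below assumes that the fractions are rounded down; the manual page does not say how non-multiples of 16 are handled, and the function and key names are illustrative.

```python
def cache_limits(file_cache_size=128):
    """Return the limits implied by a --file-cache-size value."""
    return {
        # Maximum number of output files open for writing.
        "output_files": file_cache_size,
        # One eighth of VAL (assumption: rounded down).
        "input_files": file_cache_size // 8,
        # One sixteenth of VAL (assumption: rounded down).
        "polling_operations": file_cache_size // 16,
    }
```

With the default of 128, this gives 16 simultaneous input files and 8 simultaneous directory polling operations.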
Allow one to override the default file output format of the packed
SiLK Flow files that rwflowpack writes. When this switch is
present, rwflowpack writes additional information into the packed
files: the router's SNMP input and output interfaces and the next-hop
IP address. (When the sensor.conf file contains an
interface-values attribute whose value is vlan, the input and
output fields contain the vlan IDs instead of SNMP interface values.)
The extra data produced by this switch is useful for determining why
traffic is being stored in certain files. Note that this switch will
only affect newly created files. New records will always be appended
to an existing file in the file's current output format to maintain
file integrity.
Set the byte order for newly created SiLK Flow files. When appending records to an existing file, the byte order of the file is maintained. The argument is one of the following:
Set the compression method for newly created SiLK Flow files to COMP_METHOD. When appending records to an existing file, the compression method of the file is maintained.
In addition to the packing (shrinking) of the flow records that SiLK normally does, rwflowpack can use an external library to further reduce the size of the records on disk. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support the following:
Do not compress the SiLK Flow records using an external library.
Use the zlib(3) library for compressing the flow records.
Use the lzo1x algorithm from the LZO real-time compression library for compressing the flow records.
Use whichever available method gives the best compression in
general, though not necessarily the best for this particular
file.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the root of the data directory specified in the --root-directory switch; the directory specified in the SILK_DATA_ROOTDIR environment variable (sending mode only); the data root directory that is compiled into SiLK (sending mode only); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
The following switches determine how rwflowpack handles input files
once it has processed them. These switches have no effect when
rwflowpack reads all of its data directly from the network.
Otherwise, the switches affect the named --netflow-file in
pdufile mode, the files read from the --incoming-directory in
fcfiles mode, and files read from poll-directory probes
(see sensor.conf(5)) in stream mode.
Move input files that cannot be opened, have an unexpected format, contain an unrecognized probe name (in fcfiles input mode), or are not successfully processed into the directory DIR_PATH. DIR_PATH must be a complete directory path. If this switch is not provided, problem files remain in place and cause rwflowpack to exit.
Move input files that rwflowpack processes successfully into the directory DIR_PATH. DIR_PATH must be a complete directory path. When this switch is not provided and the input mode is pdufile, the original NetFlow source file is not modified, moved, or deleted. In all other input modes, omitting the --archive-directory switch causes rwflowpack to delete each input file after successfully processing it. When the --flat-archive switch is also provided, incoming files are moved into the top of DIR_PATH; when --flat-archive is not given, each file is moved to a subdirectory based on the current UTC time: DIR_PATH/YEAR/MONTH/DAY/HOUR/. Removing files from the archive-directory is not the job of rwflowpack; the system administrator should implement a separate process to clean this directory. This switch is required when the --post-archive-command switch is present.
When archiving input files via the --archive-directory switch, move the files into the top of the archive-directory, not into subdirectories of the archive-directory. This switch has no effect if --archive-directory is not also specified. This switch can be used to allow another process to watch for new files appearing in the archive-directory.
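The two archive layouts can be sketched as follows. The function name is hypothetical, and the zero-padded four-digit year and two-digit month, day, and hour are an assumption; the manual page gives only the YEAR/MONTH/DAY/HOUR pattern.

```python
import posixpath
from datetime import datetime, timezone


def archive_destination(archive_dir, input_path, flat=False, now=None):
    """Return where an input file would land in the archive-directory."""
    name = posixpath.basename(input_path)

    # With --flat-archive, files go into the top of the archive-directory.
    if flat:
        return posixpath.join(archive_dir, name)

    # Otherwise the subdirectory comes from the current UTC time.
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)

    # Assumption: components are zero-padded (2003/10/04/13).
    return posixpath.join(archive_dir, now.strftime("%Y/%m/%d/%H"), name)
```

A process that watches for archived files only has to monitor one directory in the flat layout, which is the use case the --flat-archive description mentions.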
Run COMMAND on each input file after rwflowpack has successfully
processed the file and moved the file into the archive-directory.
Each occurrence of the string %s in COMMAND will be replaced
with the full path to the input file in the archive-directory. When
using this feature, the --archive-directory switch must be
specified.
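The %s substitution described above amounts to a plain string replacement, sketched here. The function name and the example command are illustrative. The sketch does not quote the path for a shell, and the manual page does not say whether rwflowpack does.

```python
def expand_post_archive_command(command, archived_path):
    """Replace every %s in COMMAND with the archived file's full path."""
    # Every occurrence is replaced, not only the first one.
    return command.replace("%s", archived_path)
```

For example, the hypothetical command "cp %s /backup/ && gzip %s" would name the archived file twice after expansion.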
One of the following mutually-exclusive switches is required:
Specify the destination where logging messages are written. When
DESTINATION begins with a slash /, it is treated as a file
system path and all log messages are written to that file; there is no
log rotation. When DESTINATION does not begin with /, it must
be one of the following strings:
none - Messages are not written anywhere.
stdout - Messages are written to the standard output.
stderr - Messages are written to the standard error.
syslog - Messages are written using the syslog(3) facility.
both - Messages are written to the syslog facility and to the standard error (this option is not available on all platforms).
Use DIR_PATH as the directory where the log files are written. DIR_PATH must be a complete directory path. The log files have the form
DIR_PATH/LOG_BASENAME-YYYYMMDD.log
where YYYYMMDD is the current date and LOG_BASENAME is the application name or the value passed to the --log-basename switch when provided. The log files will be rotated: at midnight local time a new log will be opened and the previous day's log file will be compressed using gzip(1). (Old log files are not removed by rwflowpack; the administrator should use another tool to remove them.) When this switch is provided, a process-ID file (PID) will also be written in this directory unless the --pidfile switch is provided.
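The log file name for a given day can be reconstructed as below, which may help a clean-up job locate old logs. The function name is hypothetical, and the default basename of rwflowpack follows from the statement that the application name is used when --log-basename is absent.

```python
import posixpath
from datetime import date


def log_file_path(log_dir, day, basename="rwflowpack"):
    """Return DIR_PATH/LOG_BASENAME-YYYYMMDD.log for the given date."""
    return posixpath.join(log_dir, "%s-%s.log" % (basename, day.strftime("%Y%m%d")))
```

Logs from earlier days are compressed with gzip(1) after rotation, so a clean-up script would typically look for the same name with a compression suffix added. The exact suffix is not stated in this manual page.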
Use FILE_PATH as the complete path to the log file. The log file will not be rotated.
The following set of switches is optional:
Set the severity of messages that will be logged. The levels from
most severe to least are: emerg, alert, crit, err,
warning, notice, info, debug. The default is info.
Set the facility that syslog(3) uses for logging messages. This
switch takes a number as an argument. The default is a value that
corresponds to LOG_USER on the system where rwflowpack is
running. This switch produces an error unless
--log-destination=syslog is specified.
Use LOG_BASENAME in place of the application name for the files in the log directory. See the description of the --log-directory switch.
Set the complete path to the file in which rwflowpack writes its process ID (PID) when it is running as a daemon. No PID file is written when --no-daemon is given. When this switch is not present, no PID file is written unless the --log-directory switch is specified, in which case the PID is written to LOGPATH/rwflowpack.pid.
Force rwflowpack to stay in the foreground; it does not become a daemon. Useful for debugging.
Verify that the syntax of the sensor configuration file is correct and
then exit rwflowpack. If the file is incorrect or if it does not
define any sensors, an error message is printed and rwflowpack
exits abnormally. If the file is correct and no argument is provided
to the --verify-sensor-config switch, rwflowpack simply exits
with status 0. If an argument (other than the empty string and 0)
is provided to the switch, the names of the probes and sensors found
in the sensor configuration file are printed to the standard output,
and then rwflowpack exits.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
This environment variable is used as the value for the --site-config-file when that switch is not provided.
When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable is not set, rwflowpack looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, rwflowpack checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.
When set to 1, rwflowpack prints status messages to the standard error as it tries to open the packing logic plug-in.
The root of the directory tree that contains the packed, binary SiLK Flow files is set by the --root-directory switch; this directory is called the SILK_DATA_ROOTDIR. Immediately underneath it are subdirectories corresponding to the traffic categories (directions) discussed above. Under these are directories representing the year, month, and day in YYYY/MM/DD format. That is:
$SILK_DATA_ROOTDIR/in/{$YEAR}/{$MONTH}/{$DAY}/*
$SILK_DATA_ROOTDIR/inweb/{$YEAR}/{$MONTH}/{$DAY}/*
$SILK_DATA_ROOTDIR/innull/{$YEAR}/{$MONTH}/{$DAY}/*
$SILK_DATA_ROOTDIR/out/{$YEAR}/{$MONTH}/{$DAY}/*
$SILK_DATA_ROOTDIR/outweb/{$YEAR}/{$MONTH}/{$DAY}/*
$SILK_DATA_ROOTDIR/outnull/{$YEAR}/{$MONTH}/{$DAY}/*
For example, output web files for October 4th, 2003 are recorded in $SILK_DATA_ROOTDIR/outweb/2003/10/04/.
The names of the files in these directories include all of this information, and are written in the form:
flowType-sensorName_YYYYMMDD.HH
where flowType encodes the category and sensorName is the sensor on which the flow was collected.
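Putting the directory layout and the file name pattern together gives the sketch below. It assumes that the flowType prefix of the file name is the same string as the category directory (for example outweb); the manual page says only that flowType "encodes the category", so the actual prefix may differ. The function name and the sensor name S0 are hypothetical.

```python
import posixpath
from datetime import datetime


def hourly_file_path(root_dir, category, sensor, start):
    """Return the hourly file path for a record's starting hour."""
    # Directory: ROOT/category/YYYY/MM/DD
    day_dir = posixpath.join(root_dir, category, start.strftime("%Y/%m/%d"))

    # File name: flowType-sensorName_YYYYMMDD.HH
    # Assumption: flowType is spelled the same as the category directory.
    file_name = "%s-%s_%s" % (category, sensor, start.strftime("%Y%m%d.%H"))
    return posixpath.join(day_dir, file_name)
```

For a record from sensor S0 that starts at 13:45 on October 4, 2003 in the outweb category, this yields a file under outweb/2003/10/04/ whose name ends in _20031004.13.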
SiLK Installation Handbook, sensor.conf(5), silk.conf(5), flowcap(8), rwfilter(1), rwflowappend(8), rwreceiver(8), rwsender(8), silk(7), syslog(3), cron(8)