pipeline - Examine SiLK Flow records as they arrive

To run as a daemon:

  pipeline --configuration-file=FILE_PATH --alert-log-file=FILE_PATH
        --incoming-directory=DIR_PATH --error-directory=DIR_PATH
        [--archive-directory=DIR_PATH] [--flat-archive]
        [--aux-alert-file=FILE_PATH | --aux-alert-basename=BASENAME]
        [--polling-interval=NUMBER] [--country-code-file=FILE_PATH]
        [--integer-sensors] [--site-config-file=FILENAME]
        { --log-destination=DESTINATION
          | --log-directory=DIR_PATH [--log-basename=BASENAME]
          | --log-pathname=FILE_PATH }
        [--log-level=LEVEL] [--log-sysfacility=NUMBER]
        [--pidfile=FILE_PATH] [--no-daemon]

To run over specific files:

  pipeline --configuration-file=FILE_PATH --alert-log-file=FILE_PATH
        --name-files [--aux-alert-file=FILENAME]
        [--country-code-file=FILE_PATH] [--integer-sensors]
        [--site-config-file=FILENAME] SILK_FILE [SILK_FILE ...]

Help options:

  pipeline --configuration-file=FILE_PATH --verify-configuration
  pipeline --help
  pipeline --version

The Analysis Pipeline program, pipeline, is designed to be run over files of SiLK Flow records as they are processed by the SiLK packing system.

pipeline requires a configuration file that specifies filters and evaluations. The filter blocks determine which flow records are of interest (similar to SiLK's rwfilter(1) command). The evaluation blocks can compute aggregate information over the flow records (similar to rwuniq(1)) to determine whether the flow records should generate an alert. Information on the syntax of the configuration file is available in a separate document.

The output of pipeline is a textual file in pipe-delimited (|-delimited) format describing which flow records raised an alert and the type of alert that was raised. The location of the output file must be specified via the --alert-log-file switch. The file is in a format that a properly configured ArcSight Log File Flexconnector can use. The pipeline.sdkfilereader.properties file in the share/analysis-pipeline/ directory can be used to configure the ArcSight Flexconnector to process the file.
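Because the alert log is |-delimited text, ordinary text tools can read it. The following is only a minimal sketch of extracting one column with awk(1). The helper name alert_column is hypothetical, and the meaning of each column is not described here; consult the Analysis Pipeline Handbook before relying on a particular field position.

```shell
# Print column $2 (1-based) of every line in the |-delimited file $1.
# alert_column is a hypothetical helper; it is not part of pipeline.
alert_column() {
    awk -F'|' -v col="$2" '{ print $col }' "$1"
}
```

For example, alert_column /path/to/alert.log 3 would print the third field of each alert record, whatever that field happens to hold.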

pipeline can provide information about the alert in a separate file, called the auxiliary alert file. To use this feature, specify either --aux-alert-file or --aux-alert-basename. The --aux-alert-file switch specifies that all auxiliary information go to the named file for the entire duration of pipeline's execution. The --aux-alert-basename switch causes pipeline to create the file LOG_DIRECTORY/BASENAME-DATE.log, where the file is rotated daily at midnight local time.

The output from pipeline includes country code information. To map the IP addresses to country codes, a SiLK prefix map file, country_codes.pmap, must be available to pipeline. This file can be installed in SiLK's install tree, or its location can be specified with the SILK_COUNTRY_CODES environment variable or the --country-codes-file command line switch.

Normally pipeline is run as a daemon during SiLK's collection and packing process. pipeline runs on the flow records after they have been processed by rwflowpack(8), since pipeline may need to use the class, type, and sensor data that rwflowpack assigns to each flow record.

pipeline should get a copy of each incremental file that rwflowpack generates. There are three places where pipeline can be inserted so that it sees every incremental file: rwsender(8), rwreceiver(8), and rwflowappend(8).

We describe each of these in turn. If none of these daemons are in use at your site, you must modify how rwflowpack runs, which is also described below.

To use pipeline with the rwsender in SiLK 2.2 or later, specify a --local-directory argument to rwsender, and have pipeline use that directory as its incoming-directory, for example:

 rwsender ... --local-directory=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

When pipeline is running on a dedicated machine separate from the machine where rwflowpack is running, one can use a dedicated rwreceiver to receive the incremental files from an rwsender running on the machine where rwflowpack is running. In this case, the incoming-directory for pipeline will be the destination-directory for rwreceiver. For example:

 rwreceiver ... --destination-dir=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

When pipeline is running on a machine where an rwreceiver (version 2.2 or newer) is already running, one can specify an additional --duplicate-destination directory to rwreceiver, and have pipeline use that directory as its incoming directory. For example:

 rwreceiver ... --duplicate-dest=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

One way to use pipeline with rwflowappend is to have rwflowappend store incremental files into an archive-directory, and have pipeline process those files. This is not as straightforward as it would first seem, however. Since rwflowappend stores the incremental files in subdirectories under the archive-directory, you must specify a --post-command to rwflowappend to move (or copy) the files into another directory where pipeline can process them. For example:

 rwflowappend ... --archive-dir=/var/silk/rwflowappend/archive
       --post-command='mv %s /var/silk/pipeline/incoming' ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

Note: Newer versions of rwflowappend support a --flat-archive switch, which places the files into the root of the archive-directory. For this situation, make the archive-directory of rwflowappend the incoming-directory of pipeline:

 rwflowappend ... --flat-archive --archive-dir=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

If none of the above daemons are in use at your site because rwflowpack writes files directly into the data repository, you must modify how rwflowpack runs so it uses a temporary directory that rwflowappend monitors, and you can then insert pipeline after rwflowappend has processed the files.

Assuming your current configuration for rwflowpack is:

 rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf
       --log-directory=/var/silk/rwflowpack/log
       --root-directory=/data

You can modify it as follows:

 rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf
       --log-directory=/var/silk/rwflowpack/log
       --output-mode=sending
       --incremental-dir=/var/silk/rwflowpack/incremental
       --sender-dir=/var/silk/rwflowappend/incoming
 rwflowappend --root-directory=/data
       --log-directory=/var/silk/rwflowappend/log
       --incoming-dir=/var/silk/rwflowappend/incoming
       --error-dir=/var/silk/rwflowappend/error
       --archive-dir=/var/silk/rwflowappend/archive
       --post-command='mv %s /var/silk/pipeline/incoming' ...
 pipeline --incoming-directory=/var/silk/pipeline/incoming
       --error-directory=/var/silk/pipeline/error
       --log-directory=/var/silk/pipeline/log
       --configuration-file=/var/silk/pipeline/pipeline.conf

It is possible to run pipeline over files whose names are specified on the command line. In this mode, pipeline stays in the foreground, processes the files, and exits. None of the files specified on the command line are changed in any way---they are neither moved nor deleted. To run pipeline in this mode, specify the --name-files switch and the names of the files to process.
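As a sketch of this mode, the wrapper below assembles the switches named above and passes the remaining arguments as the SiLK files to process. The function name and the PIPELINE variable (which lets the executable be overridden, for example for a dry run) are assumptions made for this illustration and are not part of pipeline.

```shell
# Run pipeline once over the named files and return its exit status.
# $1 = configuration file, $2 = alert log file, remaining args = SiLK files.
# PIPELINE is a hypothetical override for the executable name.
run_pipeline_over_files() {
    conf=$1
    alert_log=$2
    shift 2
    "${PIPELINE:-pipeline}" \
        --configuration-file="$conf" \
        --alert-log-file="$alert_log" \
        --name-files "$@"
}
```

Setting PIPELINE=echo prints the command line that would be run instead of running it.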

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

These switches affect general configuration of pipeline. The first two switches are required:

--configuration-file=FILE_PATH

Give the path to the configuration file that specifies the filters that determine which flow records are of interest and the evaluations that signify when an alert is to be raised. This switch is required.

--alert-log-file=FILE_PATH

Specify the path to the file where pipeline will write the alert records. The full path to the log file must be specified. pipeline assumes that this file will be under control of the logrotate(8) command.

--country-codes-file=FILE_PATH

Use the designated country code prefix mapping file instead of the default.

--integer-sensors

In the alert log, print the integer ID of the sensor rather than its name.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
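The search order described above can be sketched as a small shell function that prints the candidate locations in the order they would be tried. This is an illustration of the documented order only, not the code pipeline uses; the function name and its second argument (the directory holding the pipeline executable) are hypothetical.

```shell
# Print, in search order, the locations described for the site
# configuration file.
# $1 = value given to --site-config-file (empty if the switch is absent)
# $2 = directory containing the pipeline executable (hypothetical argument)
silk_conf_candidates() {
    explicit=$1
    app_dir=$2
    # The switch, when given, names the file outright.
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
        return 0
    fi
    # Next, a non-empty SILK_CONFIG_FILE names the file.
    if [ -n "${SILK_CONFIG_FILE:-}" ]; then
        printf '%s\n' "$SILK_CONFIG_FILE"
        return 0
    fi
    # Otherwise look for silk.conf under SILK_PATH, if it is set ...
    if [ -n "${SILK_PATH:-}" ]; then
        printf '%s\n' "$SILK_PATH/share/silk/silk.conf" \
                      "$SILK_PATH/share/silk.conf"
    fi
    # ... and in the share directories parallel to the application's directory.
    parent=$(dirname "$app_dir")
    printf '%s\n' "$parent/share/silk/silk.conf" "$parent/share/silk.conf"
}
```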

The following switches are used when pipeline is run as a daemon. They may not be mixed with the switches related to Processing Existing Files described below. The first two switches are required, and at least one switch related to logging is required.

--incoming-directory=DIR_PATH

Watch this directory for new SiLK Flow files that are to be processed by pipeline. pipeline ignores any files in this directory whose names begin with a dot (.). In addition, new files will only be considered when their size is constant for one polling-interval after they are first noticed.
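Because files whose names begin with a dot are ignored, a process that places files into the incoming directory by hand can copy each file under a dot-prefixed temporary name and rename it once the copy is complete. This is a common delivery pattern and is offered as a sketch, not as something pipeline requires; the function name and the .partial suffix are hypothetical.

```shell
# Copy file $1 into incoming directory $2 so that a watcher never sees a
# partially written file: write under a dot-prefixed name, then rename.
# deliver_file and the ".partial" suffix are hypothetical.
deliver_file() {
    src=$1
    incoming=$2
    base=$(basename "$src")
    tmp="$incoming/.$base.partial"
    # mv within one directory is a rename, so the final name appears
    # only after the copy has finished.
    cp "$src" "$tmp" && mv "$tmp" "$incoming/$base"
}
```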

--error-directory=DIR_PATH

Store in this directory SiLK files that were NOT successfully processed by pipeline.

One of the following mutually-exclusive logging-related switches is required:

--log-destination=DESTINATION

Specify the destination where logging messages are written. When DESTINATION begins with a slash /, it is treated as a file system path and all log messages are written to that file; there is no log rotation. When DESTINATION does not begin with /, it must be one of the following strings:

none

Messages are not written anywhere.

stdout

Messages are written to the standard output.

stderr

Messages are written to the standard error.

syslog

Messages are written using the syslog(3) facility.

both

Messages are written to the syslog facility and to the standard error (this option is not available on all platforms).
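The rule for interpreting DESTINATION can be sketched as a shell case statement, which may be useful in a start-up script that wants to reject a bad value before launching the daemon. The function name and the words it prints are hypothetical; only the rule itself comes from the description above.

```shell
# Classify a --log-destination value according to the rule above.
# Prints "file" for a path, "keyword" for a recognized string, and
# "invalid" (with a non-zero return) for anything else.
classify_log_destination() {
    case $1 in
        /*) echo "file" ;;
        none|stdout|stderr|syslog|both) echo "keyword" ;;
        *) echo "invalid"; return 1 ;;
    esac
}
```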

--log-directory=DIR_PATH

Use DIR_PATH as the directory where the log files are written. DIR_PATH must be a complete directory path. The log files have the form

  DIR_PATH/LOG_BASENAME-YYYYMMDD.log

where YYYYMMDD is the current date and LOG_BASENAME is the application name or the value passed to the --log-basename switch when provided. The log files will be rotated: at midnight local time a new log will be opened and the previous day's log file will be compressed using gzip(1). (Old log files are not removed by pipeline; the administrator should use another tool to remove them.) When this switch is provided, a process ID (PID) file will also be written in this directory unless the --pidfile switch is provided.
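Since old log files are left in place, the administrator needs some way to find them. The sketch below lists compressed logs older than a given number of days using find(1). It assumes the rotated files carry the .log.gz suffix that gzip(1) produces by default; the function name is hypothetical.

```shell
# List compressed log files in directory $1 with basename $2 that were
# last modified more than $3 days ago.
# Assumes rotated logs are named BASENAME-YYYYMMDD.log.gz.
old_pipeline_logs() {
    log_dir=$1
    base=$2
    days=$3
    find "$log_dir" -type f -name "$base-*.log.gz" -mtime +"$days" -print
}
```

The output can be reviewed first and then passed to rm(1), for example from a cron(8) job.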

--log-pathname=FILE_PATH

Use FILE_PATH as the complete path to the log file. The log file will not be rotated.

The following switches are optional:

--archive-directory=DIR_PATH

Move incoming SiLK Flow files that pipeline processes successfully into the directory DIR_PATH. DIR_PATH must be a complete directory path. When this switch is not provided, the SiLK Flow files are deleted once they have been successfully processed. When the --flat-archive switch is also provided, incoming files are moved into the top of DIR_PATH; when --flat-archive is not given, each file is moved to a subdirectory based on the current local time: DIR_PATH/YEAR/MONTH/DAY/HOUR/. Removing files from the archive-directory is not the job of pipeline; the system administrator should implement a separate process to clean this directory.
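To show where a file archived right now would land when --flat-archive is not given, the following sketch builds the DIR_PATH/YEAR/MONTH/DAY/HOUR/ path from the current local time with date(1). The zero-padded, four-digit-year layout is an assumption, and the function name is hypothetical.

```shell
# Print the archive subdirectory under $1 for the current local hour,
# in the form DIR_PATH/YEAR/MONTH/DAY/HOUR.
# Zero-padded fields are an assumption about the layout.
archive_subdir() {
    printf '%s/%s\n' "$1" "$(date +%Y/%m/%d/%H)"
}
```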

--flat-archive

When archiving incoming SiLK Flow files via the --archive-directory switch, move the files into the top of the archive-directory, not into subdirectories of the archive-directory. This switch has no effect if --archive-directory is not also specified. This switch can be used to allow another process to watch for new files appearing in the archive-directory.

--aux-alert-file=FILE_PATH

Have pipeline provide additional information about an alert to FILE_PATH. When a record causes an alert, pipeline writes the record in textual format to the alert-log-file. Often there is additional information associated with an alert that cannot be captured in a single record; this is especially true for statistic-type alerts. The aux-alert-file is a location for pipeline to write that additional information. When pipeline is running as a daemon, the FILE_PATH must be an absolute path. See also --aux-alert-basename.

--aux-alert-basename=BASENAME

Have pipeline provide additional information about an alert to the file BASENAME-DATE.log in the log-directory. This switch is similar to --aux-alert-file, except the file is written in the log directory and rotated at midnight local time. This switch is only valid in daemon mode, it is incompatible with --aux-alert-file, and it requires the --log-directory switch to be specified.

--polling-interval=NUMBER

Configure pipeline to check the incoming directory for new files every NUMBER seconds. The default polling interval is 15 seconds.

--log-level=LEVEL

Set the severity of messages that will be logged. The levels from most severe to least are: emerg, alert, crit, err, warning, notice, info, debug. The default is info.

--log-sysfacility=NUMBER

Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument. The default is a value that corresponds to LOG_USER on the system where pipeline is running. This switch produces an error unless --log-destination=syslog is specified.

--log-basename=LOG_BASENAME

Use LOG_BASENAME in place of the application name for the files in the log directory. See the description of the --log-directory switch.

--pidfile=FILE_PATH

Set the complete path to the file in which pipeline writes its process ID (PID) when it is running as a daemon. No PID file is written when --no-daemon is given. When this switch is not present, no PID file is written unless the --log-directory switch is specified, in which case the PID is written to LOGPATH/pipeline.pid, where LOGPATH is the directory given to --log-directory.
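A monitoring script can use the PID file to check whether the daemon is still alive. The sketch below uses kill -0, which tests for the existence of a process without sending it a signal. The function name is hypothetical, and the check reports failure when the caller lacks permission to signal the process, so it is best run as the same user as pipeline.

```shell
# Return 0 if the process recorded in PID file $1 exists, non-zero otherwise.
# pipeline_is_running is a hypothetical helper.
pipeline_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # Signal 0 performs the existence and permission checks only.
    kill -0 "$pid" 2>/dev/null
}
```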

--no-daemon

Force pipeline to stay in the foreground---it does not become a daemon. Useful for debugging.

--name-files

Cause pipeline to run its analysis over a specific set of files named on the command line. Once pipeline has processed those files, it exits. This switch cannot be mixed with the Daemon Mode and Logging and Daemon Configuration switches described above. When using files named on the command line, pipeline will not move or delete the files.

--verify-configuration

Verify that the syntax of the configuration file is correct and then exit pipeline. If the file is incorrect or if it does not define any evaluations, an error message is printed and pipeline exits abnormally. If the file is correct, pipeline simply exits with status 0.
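Because the exit status distinguishes a good configuration from a bad one, a start-up script can check the file before launching the daemon. The following is a minimal sketch; the function name, the messages it prints, and the PIPELINE variable used to override the executable are assumptions made for this illustration.

```shell
# Check configuration file $1 with --verify-configuration and report
# the result. Returns non-zero when pipeline rejects the file.
# PIPELINE is a hypothetical override for the executable name.
check_pipeline_config() {
    if "${PIPELINE:-pipeline}" --configuration-file="$1" --verify-configuration
    then
        echo "configuration OK: $1"
    else
        echo "configuration rejected: $1" >&2
        return 1
    fi
}
```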

--help

Print the available options and exit.

--version

Print the version number and information about how the SiLK library used by pipeline was configured, then exit the application.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code mapping file that pipeline will use. The value may be a complete path or a path relative to SILK_PATH. If the variable is not specified, pipeline looks for a file named country_codes.pmap in the location specified by SILK_PATH.

SILK_PATH

This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, pipeline checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. To find the country code prefix map file, pipeline checks those same directories for a file named country_codes.pmap.

silk(7), rwflowappend(8), rwflowpack(8), rwreceiver(8), rwsender(8), rwfilter(1), rwuniq(1), syslog(3), Analysis Pipeline Handbook, The SiLK Installation Handbook