pipeline - Examine SiLK Flow, YAF, or IPFIX records as they arrive
There are four possible data sources: SiLK, YAF, IPFIX, or a data source configuration file that provides all of the details.
There are four possible input modes, three of which run continuously and are run as a daemon by default: a UDP socket, a TCP socket (both of which require --break-on-recs), and polling a directory for new files. The fourth is a finite list of files to process, which is never run as a daemon.
Allowable combinations: SiLK with directory polling or named files. YAF with UDP or TCP sockets or named files. IPFIX with UDP or TCP sockets, directory polling, or named files.
A data source configuration file contains all necessary details of both the data source and the input method.
There are four general input modes for pipeline, each of which can be run with or without snarf.
To run pipeline when built with snarf, a snarf destination can be specified with: --snarf-destination=ENDPOINT.
To run pipeline when built without snarf, alert log files must be specified with: --alert-log-file=FILE_PATH --aux-alert-file=FILE_PATH
In the examples below, substitute the above alerting configurations in place of "ALERT CONFIGURATION OPTIONS".
To run pipeline continuously but not as a daemon:
pipeline --configuration-file=FILE_PATH ALERT CONFIGURATION OPTIONS { --silk | --yaf | --ipfix } { --udp-port=NUMBER | --tcp-port=NUMBER | --incoming-directory=DIR_PATH --error-directory=DIR_PATH [--archive-directory=DIR_PATH] [--flat-archive] } [--break-on-recs=NUMBER] { [--time-is-clock] | [--time-field-name=STRING] | [--time-from-schema] | [--time-field-ent=NUMBER --time-field-id=NUMBER] } [--polling-interval=NUMBER] [--polling-timeout=NUMBER] [--country-code-file=FILE_PATH] [--site-config-file=FILENAME] --do-not-daemonize
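For example, a foreground run over YAF data arriving on a TCP socket, for a build without snarf, might look like the following (all paths and the port number are hypothetical):
pipeline --configuration-file=/var/silk/pipeline/pipeline.conf --alert-log-file=/var/log/pipeline/alert.log --aux-alert-file=/var/log/pipeline/aux.log --yaf --tcp-port=18000 --break-on-recs=5000 --time-from-schema --do-not-daemonize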
To run pipeline over a finite list of files:
pipeline --configuration-file=FILE_PATH ALERT CONFIGURATION OPTIONS { --silk | --yaf | --ipfix } --name-files [--break-on-recs=NUMBER] { [--time-is-clock] | [--time-field-name=STRING] | [--time-from-schema] | [--time-field-ent=NUMBER --time-field-id=NUMBER] } [--polling-interval=NUMBER] [--polling-timeout=NUMBER] [--country-code-file=FILE_PATH] [--site-config-file=FILENAME]
To run pipeline using a configuration file specifying all data source and data input options (daemonizing can be turned off if needed):
pipeline --configuration-file=FILE_PATH ALERT CONFIGURATION OPTIONS --data-source-configuration-file=FILE_PATH [--country-code-file=FILE_PATH] [--site-config-file=FILENAME] { --do-not-daemonize | { --log-destination=DESTINATION | --log-directory=DIR_PATH [--log-basename=BASENAME] | --log-pathname=FILE_PATH } [--log-level=LEVEL] [--log-sysfacility=NUMBER] [--pidfile=FILE_PATH] }
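For example, with hypothetical paths, a run driven entirely by a data source configuration file might look like:
pipeline --configuration-file=/var/silk/pipeline/pipeline.conf --alert-log-file=/var/log/pipeline/alert.log --aux-alert-file=/var/log/pipeline/aux.log --data-source-configuration-file=/var/silk/pipeline/datasource.conf --log-directory=/var/log/pipeline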
To run pipeline continuously as a daemon:
pipeline --configuration-file=FILE_PATH ALERT CONFIGURATION OPTIONS { --silk | --yaf | --ipfix } { --udp-port=NUMBER | --tcp-port=NUMBER | --incoming-directory=DIR_PATH --error-directory=DIR_PATH [--archive-directory=DIR_PATH] [--flat-archive] } [--break-on-recs=NUMBER] { [--time-is-clock] | [--time-field-name=STRING] | [--time-from-schema] | [--time-field-ent=NUMBER --time-field-id=NUMBER] } [--polling-interval=NUMBER] [--polling-timeout=NUMBER] [--country-code-file=FILE_PATH] [--site-config-file=FILENAME] { --log-destination=DESTINATION | --log-directory=DIR_PATH [--log-basename=BASENAME] | --log-pathname=FILE_PATH } [--log-level=LEVEL] [--log-sysfacility=NUMBER] [--pidfile=FILE_PATH]
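For example, with hypothetical paths, a daemonized SiLK deployment polling a directory might look like:
pipeline --configuration-file=/var/silk/pipeline/pipeline.conf --alert-log-file=/var/log/pipeline/alert.log --aux-alert-file=/var/log/pipeline/aux.log --silk --incoming-directory=/var/silk/pipeline/incoming --error-directory=/var/silk/pipeline/error --log-directory=/var/log/pipeline --pidfile=/var/run/pipeline.pid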
Help options:
pipeline --configuration-file=FILE_PATH --verify-configuration
pipeline --help
pipeline --version
The Analysis Pipeline program, pipeline, is designed to be run over three different types of input. The first, as in version 4.x, is files of SiLK Flow records as they are processed by the SiLK packing system. The second is data coming directly from YAF (or super_mediator), including deep packet inspection information. The last is any raw IPFIX records.
pipeline requires a configuration file that specifies filters and evaluations. The filter blocks determine which flow records are of interest (similar to SiLK's rwfilter(1) command). The evaluation blocks can compute aggregate information over the flow records (similar to rwuniq(1)) to determine whether the flow records should generate an alert. Information on the syntax of the configuration file is available in the Analysis Pipeline Handbook.
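As an illustrative sketch only, with invented field names, threshold, and time window (consult the Handbook for the authoritative grammar), a filter paired with an evaluation might look like:
# flag web flows, then alert when they exceed a volume threshold
FILTER web-traffic
    DPORT == 80
END FILTER
EVALUATION high-volume-web
    FILTER web-traffic
    CHECK THRESHOLD
        SUM BYTES > 100000000
        TIME_WINDOW 10 MINUTES
    END CHECK
END EVALUATION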
The output that pipeline produces depends on whether support for the snarf alerting library was compiled into the pipeline binary, as described in the next subsections.
Either form of output from pipeline includes country code information. To map IP addresses to country codes, a SiLK prefix map file, country_codes.pmap, must be available to pipeline. This file can be installed in SiLK's install tree, or its location can be specified with the SILK_COUNTRY_CODES environment variable or the --country-code-file command line switch.
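For example, assuming a hypothetical install location:
export SILK_COUNTRY_CODES=/usr/local/share/silk/country_codes.pmap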
When pipeline is built with support for the snarf alerting library (http://tools.netsa.cert.org/snarf/), the --snarf-destination switch can be used to specify where to send the alerts. The parameter to the switch takes the form tcp://HOST:PORT, which specifies that a snarfd process is running on HOST at PORT. When --snarf-destination is not specified, pipeline uses the value in the SNARF_ALERT_DESTINATION environment variable. If it is not set, pipeline prints the alerts encoded in JSON (JavaScript Object Notation). The output goes to the log file when running as a daemon, or to the standard output when the --name-files switch is specified.
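For example, if a snarfd process were listening on a hypothetical host and port:
pipeline ... --snarf-destination=tcp://snarf.example.net:8111 ...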
When snarf support is not built into pipeline, the output of pipeline is a textual file in pipe-delimited (|-delimited) format describing which flow records raised an alert and the type of alert that was raised. The location of the output file must be specified via the --alert-log-file switch. The file is in a format that a properly configured ArcSight Log File Flexconnector can use. The pipeline.sdkfilereader.properties file in the share/analysis-pipeline/ directory can be used to configure the ArcSight Flexconnector to process the file.
pipeline can provide additional information about an alert in a separate file, called the auxiliary alert file. Specify the complete path to this file with the --aux-alert-file switch; this switch is required.
pipeline assumes that both the alert log file and the auxiliary alert file are under the control of the logrotate(8) command. See the Analysis Pipeline Handbook for details.
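For example, a minimal logrotate(8) stanza covering both files might look like the following sketch (paths hypothetical; adjust rotation frequency and retention to local policy):
/var/log/pipeline/alert.log /var/log/pipeline/aux.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}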
Normally pipeline is run as a daemon during SiLK's collection and packing process. pipeline runs on the flow records after they have been processed by rwflowpack(8), since pipeline may need to use the class, type, and sensor data that rwflowpack assigns to each flow record.
pipeline should get a copy of each incremental file that rwflowpack generates. There are three places that pipeline can be inserted so it will see every incremental file:
rwsender(8)
rwreceiver(8)
rwflowappend(8)
We describe each of these in turn. If none of these daemons are in use at your site, you must modify how rwflowpack runs, which is also described below.
To use pipeline with the rwsender in SiLK 2.2 or later, specify a --local-directory argument to rwsender, and have pipeline use that directory as its incoming-directory, for example:
rwsender ... --local-directory=/var/silk/pipeline/incoming ...
pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...
When pipeline is running on a dedicated machine separate from the machine where rwflowpack is running, one can use a dedicated rwreceiver to receive the incremental files from an rwsender running on the machine where rwflowpack is running. In this case, the incoming-directory for pipeline will be the destination-directory for rwreceiver. For example:
rwreceiver ... --destination-dir=/var/silk/pipeline/incoming ...
pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...
When pipeline is running on a machine where an rwreceiver (version 2.2 or newer) is already running, one can specify an additional --duplicate-destination directory to rwreceiver, and have pipeline use that directory as its incoming directory. For example:
rwreceiver ... --duplicate-dest=/var/silk/pipeline/incoming ...
pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...
One way to use pipeline with rwflowappend is to have rwflowappend store incremental files into an archive-directory, and have pipeline process those files. However, since rwflowappend stores the incremental files in subdirectories under the archive-directory, you must specify a --post-command to rwflowappend to move (or copy) the files into another directory where pipeline can process them. For example:
rwflowappend ... --archive-dir=/var/silk/rwflowappend/archive --post-command='mv %s /var/silk/pipeline/incoming' ...
pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...
Note: Newer versions of rwflowappend support a --flat-archive switch, which places the files into the root of the archive-directory. For this situation, make the archive-directory of rwflowappend the incoming-directory of pipeline:
rwflowappend ... --archive-dir=/var/silk/pipeline/incoming
pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...
If none of the above daemons are in use at your site because rwflowpack writes files directly into the data repository, you must modify how rwflowpack runs so it uses a temporary directory that rwflowappend monitors, and you can then insert pipeline after rwflowappend has processed the files.
Assuming your current configuration for rwflowpack is:
rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf --log-directory=/var/silk/rwflowpack/log --root-directory=/data
You can modify it as follows:
rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf --log-directory=/var/silk/rwflowpack/log --output-mode=sending --incremental-dir=/var/silk/rwflowpack/incremental --sender-dir=/var/silk/rwflowappend/incoming
rwflowappend --root-directory=/data --log-directory=/var/silk/rwflowappend/log --incoming-dir=/var/silk/rwflowappend/incoming --error-dir=/var/silk/rwflowappend/error --archive-dir=/var/silk/rwflowappend/archive --post-command='mv %s /var/silk/pipeline/incoming' ...
pipeline --silk --incoming-directory=/var/silk/pipeline/incoming --error-directory=/var/silk/pipeline/error --log-directory=/var/silk/pipeline/log --configuration-file=/var/silk/pipeline/pipeline.conf
There are two ways to run pipeline in non-daemon mode. The first is to use one of the modes above that runs forever (socket or directory polling) without running it as a daemon; use --do-not-daemonize to keep the process in the foreground.
The other way is to run pipeline over files whose names are specified on the command line. In this mode, pipeline stays in the foreground, processes the files, and exits. None of the files specified on the command line are changed in any way; they are neither moved nor deleted. To run pipeline in this mode, specify the --name-files switch and the names of the files to process.
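For example, to process two existing SiLK files (all paths hypothetical):
pipeline --configuration-file=/var/silk/pipeline/pipeline.conf --alert-log-file=/var/log/pipeline/alert.log --aux-alert-file=/var/log/pipeline/aux.log --silk --name-files /data/incremental/file1.rw /data/incremental/file2.rw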
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
These switches affect the general configuration of pipeline. The first switch is required:
--configuration-file=FILE_PATH  Give the path to the configuration file that specifies the filters that determine which flow records are of interest and the evaluations that signify when an alert is to be raised. This switch is required.
--country-code-file=FILE_PATH  Use the designated country code prefix mapping file instead of the default.
--site-config-file=FILENAME  Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty; the value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
pipeline comes with a public suffix file provided by Mozilla at https://publicsuffix.org/list/public_suffix_list.dat. To provide pipeline with a different list, use this option to provide a file formatted the same way as Mozilla's file. This is optional.
The number of minutes between pipeline logging statistics regarding records processed and memory usage. Setting this value to 0 turns off this feature. This is optional; the default value is 5 minutes.
pipeline needs to know what general type of data it will be receiving: SiLK flows, YAF data, or raw IPFIX. If there are multiple data sources, a data source configuration file is required. If using a daemon config file, the data source configuration file variable is required.
If there is a single data source, the data source type can be specified on the command line. Depending on the type of data, there are different available options for receiving data.
The records are SiLK flows. The data input method options are the same as in past versions:
--incoming-directory=DIR_PATH  pipeline will poll a directory forever for new flow files.
--name-files  The files pipeline will process are listed on the command line as the last group of arguments.
The records are coming directly from a YAF sensor (or from an instance of super_mediator). The data input options are:
--udp-port=NUMBER and --break-on-recs=NUMBER  UDP socket to listen on for YAF data, and how many records to process before breaking and running evaluations.
--tcp-port=NUMBER and --break-on-recs=NUMBER  TCP socket to listen on for YAF data, and how many records to process before breaking and running evaluations.
--name-files  Process YAF data files listed on the command line.
The records are raw IPFIX records, not coming directly from YAF. The data input options are:
--udp-port=NUMBER and --break-on-recs=NUMBER  UDP socket to listen on for IPFIX data, and how many records to process before breaking and running evaluations.
--tcp-port=NUMBER and --break-on-recs=NUMBER  TCP socket to listen on for IPFIX data, and how many records to process before breaking and running evaluations.
--name-files  Process IPFIX data files listed on the command line.
--incoming-directory=DIR_PATH  pipeline will poll a directory forever for new IPFIX files.
An example invocation appears after the timing-source options below.
--data-source-configuration-file=FILE_PATH  The data source and input options are detailed in a configuration file. The syntax for the file is described in the Analysis Pipeline Handbook.
If the primary (or only) data source is SiLK, the following timing options are not used; the flow end time still serves as the timing source.
Otherwise, one of these options is required to provide a timing source.
--time-is-clock  Use the system clock time as the timing source.
--time-field-name=STRING  Use the provided field name as the timing source.
--time-field-ent=NUMBER --time-field-id=NUMBER  These must be used together, as it takes an enterprise ID and an element ID to define an information element. That element will be used as the timing source.
--time-from-schema  Use the timing source specified by the schema. If no timing source is specified by the schema(s) in use, pipeline reports an error.
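For example, a hypothetical IPFIX feed over TCP that uses the standard IPFIX flowEndMilliseconds element as its timing source (paths, port, and record count are illustrative):
pipeline --configuration-file=/var/silk/pipeline/pipeline.conf --alert-log-file=/var/log/pipeline/alert.log --aux-alert-file=/var/log/pipeline/aux.log --ipfix --tcp-port=18001 --break-on-recs=10000 --time-field-name=flowEndMilliseconds --do-not-daemonize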
--break-on-recs=NUMBER  Version 4.x worked only on SiLK files, which provided a natural point at which to stop processing/filtering records and run evaluations. When accepting a stream of records from a socket, there is no such break, so pipeline needs to know how many records to process/filter before running evaluations. Use this option to tell pipeline how many records to process. This option is required for socket connections.
When pipeline is built with support for snarf (http://tools.netsa.cert.org/snarf/), the following switch is available. Its use is optional.
--snarf-destination=ENDPOINT  Specify where pipeline is to send alerts. The ENDPOINT has the form tcp://HOST:PORT, which specifies that a snarfd process is running on HOST at PORT. When this switch is not specified, pipeline uses the value in the SNARF_ALERT_DESTINATION environment variable. If that variable is not set, pipeline prints the alerts locally, either to the log file (when running as a daemon) or to the standard output.
When pipeline is built without support for snarf, the following switches are available, and the --alert-log-file switch is required.
--alert-log-file=FILE_PATH  Specify the path to the file where pipeline will write the alert records. The full path to the log file must be specified. pipeline assumes that this file will be under control of the logrotate(8) command.
--aux-alert-file=FILE_PATH  Have pipeline provide additional information about an alert to FILE_PATH. When a record causes an alert, pipeline writes the record in textual format to the alert log file. Often there is additional information associated with an alert that cannot be captured in a single record; this is especially true for statistic-type alerts. The auxiliary alert file is a location for pipeline to write that additional information. The FILE_PATH must be an absolute path, and pipeline assumes that this file will be under control of the logrotate(8) command.
The following switches are used when pipeline is run as a daemon. They may not be mixed with the switches related to Processing Existing Files described below. The first two switches are required, and at least one switch related to logging is required.
--incoming-directory=DIR_PATH  Watch this directory for new SiLK Flow files that are to be processed by pipeline. pipeline ignores any files in this directory whose names begin with a dot (.). In addition, new files will only be considered when their size is constant for one polling-interval after they are first noticed.
--polling-interval=NUMBER  Sets the interval in seconds at which pipeline checks for new files when polling a directory using --incoming-directory.
--polling-timeout=NUMBER  Sets the amount of time in seconds pipeline will wait for a new file when polling a directory using --incoming-directory.
--udp-port=NUMBER  Listen on a UDP port for YAF or IPFIX records, not SiLK records. pipeline will reestablish this connection if the sender closes the socket, unless --do-not-reestablish is used.
--tcp-port=NUMBER  Listen on a TCP port for YAF or IPFIX records, not SiLK records. pipeline will reestablish this connection if the sender closes the socket, unless --do-not-reestablish is used.
--error-directory=DIR_PATH  Store in this directory SiLK files that were NOT successfully processed by pipeline.
One of the following mutually-exclusive logging-related switches is required:
--log-destination=DESTINATION  Specify the destination where logging messages are written. When DESTINATION begins with a slash (/), it is treated as a file system path and all log messages are written to that file; there is no log rotation. When DESTINATION does not begin with /, it must be one of the following strings:
none  Messages are not written anywhere.
stdout  Messages are written to the standard output.
stderr  Messages are written to the standard error.
syslog  Messages are written using the syslog(3) facility.
both  Messages are written to the syslog facility and to the standard error (this option is not available on all platforms).
--log-directory=DIR_PATH  Use DIR_PATH as the directory where the log files are written. DIR_PATH must be a complete directory path. The log files have the form DIR_PATH/LOG_BASENAME-YYYYMMDD.log, where YYYYMMDD is the current date and LOG_BASENAME is the application name or the value passed to the --log-basename switch when provided. The log files are rotated: at midnight local time a new log is opened and the previous day's log file is compressed using gzip(1). (Old log files are not removed by pipeline; the administrator should use another tool to remove them.) When this switch is provided, a process-ID file (PID) will also be written in this directory unless the --pidfile switch is provided.
--log-pathname=FILE_PATH  Use FILE_PATH as the complete path to the log file. The log file will not be rotated.
The following switches are optional:
--archive-directory=DIR_PATH  Move incoming SiLK Flow files that pipeline processes successfully into the directory DIR_PATH. DIR_PATH must be a complete directory path. When this switch is not provided, the SiLK Flow files are deleted once they have been successfully processed. When the --flat-archive switch is also provided, incoming files are moved into the top of DIR_PATH; when --flat-archive is not given, each file is moved to a subdirectory based on the current local time: DIR_PATH/YEAR/MONTH/DAY/HOUR/. Removing files from the archive directory is not the job of pipeline; the system administrator should implement a separate process to clean this directory.
--flat-archive  When archiving incoming SiLK Flow files via the --archive-directory switch, move the files into the top of the archive directory, not into subdirectories of it. This switch has no effect if --archive-directory is not also specified. This switch can be used to allow another process to watch for new files appearing in the archive directory.
--polling-interval=NUMBER  Configure pipeline to check the incoming directory for new files every NUMBER seconds. The default polling interval is 15 seconds.
--log-level=LEVEL  Set the severity of messages that will be logged. The levels from most severe to least are: emerg, alert, crit, err, warning, notice, info, debug. The default is info.
--log-sysfacility=NUMBER  Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument. The default is a value that corresponds to LOG_USER on the system where pipeline is running. This switch produces an error unless --log-destination=syslog is specified.
--log-basename=LOG_BASENAME  Use LOG_BASENAME in place of the application name for the files in the log directory. See the description of the --log-directory switch.
--pidfile=FILE_PATH  Set the complete path to the file in which pipeline writes its process ID (PID) when it is running as a daemon. No PID file is written when --do-not-daemonize is given. When this switch is not present, no PID file is written unless the --log-directory switch is specified, in which case the PID is written to LOGPATH/pipeline.pid.
--do-not-daemonize  Force pipeline to stay in the foreground; it does not become a daemon. Useful for debugging.
--name-files  Cause pipeline to run its analysis over a specific set of files named on the command line. Once pipeline has processed those files, it exits. This switch cannot be mixed with the Daemon Mode and Logging and Daemon Configuration switches described above. When using files named on the command line, pipeline will not move or delete the files.
--verify-configuration  Verify that the syntax of the configuration file is correct and then exit pipeline. If the file is incorrect or if it does not define any evaluations, an error message is printed and pipeline exits abnormally. If the file is correct, pipeline simply exits with status 0.
Print the information elements available based on the schemas that arrive. When using any data source other than SiLK flows, this feature requires data to arrive such that templates/schemas can be read and information elements made available. This option will not verify your configuration file.
Print the information elements available based on the schemas that arrive, and verify the syntax of the configuration file. When using any data source other than SiLK flows, this feature requires data to arrive such that templates/schemas can be read and information elements made available.
--help  Print the available options and exit.
--version  Print the version number and information about how the SiLK library used by pipeline was configured, then exit the application.
SILK_CONFIG_FILE  This environment variable is used as the value for --site-config-file when that switch is not provided.
SILK_COUNTRY_CODES  This environment variable allows the user to specify the country code mapping file that pipeline will use. The value may be a complete path or a file relative to SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.
SILK_PATH  This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, pipeline checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. To find the country code prefix map file, pipeline checks those same directories for a file named country_codes.pmap.
SNARF_ALERT_DESTINATION  When pipeline is built with snarf support (http://tools.netsa.cert.org/snarf/), this environment variable specifies the location to send the alerts. The --snarf-destination switch has precedence over this variable.
silk(7), rwflowappend(8), rwflowpack(8), rwreceiver(8), rwsender(8), rwfilter(1), rwuniq(1), syslog(3), logrotate(8), http://tools.netsa.cert.org/snarf, Analysis Pipeline Handbook, The SiLK Installation Handbook