NAME

rwflowappend - Append incremental SiLK Flow files to hourly files

SYNOPSIS

rwflowappend --incoming-directory=DIR_PATH --root-directory=DIR_PATH
      --error-directory=DIR_PATH [--archive-directory=DIR_PATH]
      [--flat-archive] [--post-command=COMMAND]
      [--hour-file-command=COMMAND] [--threads=N]
      [--reject-hours-past=NUM] [--reject-hours-future=NUM]
      [--no-file-locking] [--polling-interval=NUM]
      [--byte-order=ENDIAN] [--pad-header]
      [--compression-method=COMP_METHOD]
      [--site-config-file=FILENAME]
      { --log-destination=DESTINATION
        | --log-pathname=FILE_PATH
        | --log-directory=DIR_PATH [--log-basename=LOG_BASENAME]
          [--log-post-rotate=COMMAND] }
      [--log-level=LEVEL] [--log-sysfacility=NUMBER]
      [--pidfile=FILE_PATH] [--no-chdir] [--no-daemon]

rwflowappend --help

rwflowappend --version

DESCRIPTION

rwflowappend is a daemon that watches a directory for files that contain small numbers of SiLK Flow records---these files are called incremental files---as generated by rwflowpack(8) when it is run with --output-mode=incremental-files or --output-mode=sending. rwflowappend appends these SiLK Flow records to the hourly files stored in the SiLK data repository whose directory tree root is specified by the --root-directory switch.

The directory that rwflowappend watches for incremental files is specified by --incoming-directory. As rwflowappend scans this directory, it ignores a file if its size is 0 bytes or if its name begins with a dot (.). On each scan, if rwflowappend detects a file name that was not present in the previous scan, it records the name and size of the file. If the file has a different size on the next scan, the new size is recorded. Once the file has the same size on two consecutive scans, rwflowappend appends the file to the appropriate hourly file.
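The size-stability rule described above can be sketched in Python. This is only an illustration of the rule as documented, not the code rwflowappend uses; the function names update_scan and list_sizes are hypothetical.

```python
import os


def list_sizes(directory):
    """Return {file name: size in bytes} for regular files in directory."""
    return {e.name: e.stat().st_size
            for e in os.scandir(directory) if e.is_file()}


def update_scan(previous, listing):
    """Apply one scan of the incoming directory.

    previous: {name: size} recorded on the prior scan.
    listing:  {name: size} observed on this scan.
    Returns (ready, tracked): names whose size was unchanged since the
    prior scan, and the {name: size} record to carry into the next scan.
    """
    ready = []
    tracked = {}
    for name, size in listing.items():
        # Dot-files and zero-byte files are ignored entirely.
        if name.startswith(".") or size == 0:
            continue
        if previous.get(name) == size:
            # Same size on two consecutive scans: safe to append.
            ready.append(name)
        else:
            # New file, or size changed: record the size and wait.
            tracked[name] = size
    return sorted(ready), tracked
```

A caller would invoke update_scan(seen, list_sizes(incoming_dir)) once per polling interval and process each name returned in ready.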

After rwflowappend processes an incremental file, the file is deleted unless the --archive-directory switch is specified, in which case the incremental file is moved to that directory or to a subdirectory of that directory depending on whether --flat-archive was specified. The --post-command switch allows a command to be executed on the incremental file after it has been moved to the archive directory.

If a fatal write error occurs (for example, the disk containing the data repository becomes full), rwflowappend exits. Before exiting, rwflowappend attempts to truncate the hourly file to the size it had when it was opened, and rwflowappend moves the incremental file it was reading to the directory specified by --error-directory.
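The truncate-on-error behavior can be illustrated with a short Python sketch. It assumes the data to append arrives as an iterable of byte strings; append_with_rollback is a hypothetical name, and the sketch omits the move of the incremental file to the error directory.

```python
import os


def append_with_rollback(hourly_path, chunks):
    """Append byte chunks to hourly_path.

    If an OSError occurs (for example, the disk is full), attempt to
    truncate the file back to the size it had when it was opened, then
    re-raise so the caller can handle the incremental file and exit.
    """
    with open(hourly_path, "ab") as out:
        # Remember the size of the hourly file at open time.
        original_size = out.seek(0, os.SEEK_END)
        try:
            for chunk in chunks:
                out.write(chunk)
            out.flush()
        except OSError:
            # Best-effort rollback to the original size.
            out.truncate(original_size)
            raise
```

The rollback is only an attempt: if the truncate itself fails, the hourly file may still contain partial data.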

Run rwflowappend separately from rwflowpack when you wish to copy the packed SiLK Flow records from the machine doing the packing to multiple machines for use by analysts. Almost any network file transport protocol may be used to move the files from the packing machine to the destination machine where rwflowappend is running, though we have written the rwsender(8) and rwreceiver(8) daemons to perform this task.

Separate rwflowpack and rwflowappend processes are also recommended if you want another process (such as the Analysis Pipeline http://tools.netsa.cert.org/analysis-pipeline/) to process the SiLK Flow records as they are generated.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

General Configuration

The following switches are required:

--incoming-directory=DIR_PATH

Periodically scan the directory DIR_PATH for incremental files to append to the hourly files. As rwflowappend scans DIR_PATH, it ignores a file if its name begins with a dot (.) or if its size is 0 bytes. When a file is first detected, its size is recorded, and the file must have the same size for two consecutive scans before rwflowappend will append it to the appropriate hourly file. The interval between scans is set by --polling-interval. DIR_PATH must be a complete directory path.

--root-directory=DIR_PATH

Append to existing hourly files and create new hourly files in the directory tree rooted at this location. The directory tree has the same subdirectory structure as that created by rwflowpack. DIR_PATH must be a complete directory path.

--error-directory=DIR_PATH

Store in this directory incremental files that were NOT successfully appended to an hourly file. DIR_PATH must be a complete directory path.

The following switches are optional:

--archive-directory=DIR_PATH

Move each incremental file to DIR_PATH or a subdirectory of it after rwflowappend has successfully appended the incremental file to an hourly file. If this switch is not provided, the incremental files are deleted once they are successfully appended to an hourly file. When the --flat-archive switch is also provided, incremental files are moved into the top of DIR_PATH; when --flat-archive is not given, each incremental file is moved to a subdirectory of DIR_PATH that mirrors the path of the hourly file to which the incremental file was appended. Removing files from the archive-directory is not the job of rwflowappend; the system administrator should implement a separate process to clean this directory. This switch is required when the --post-command switch is present.

--flat-archive

When archiving incremental files via --archive-directory, move the files into the top of the archive-directory, not into subdirectories of it. This switch has no effect if --archive-directory is not also specified. This switch may be used to allow another process to watch for new files appearing in the archive-directory.
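The difference between the flat and mirrored archive layouts can be sketched as follows. The exact subdirectory rwflowappend creates is not spelled out here, so this sketch assumes the subdirectory is the hourly file's directory relative to the root directory; the function name and the example paths are hypothetical.

```python
import posixpath


def archive_destination(archive_dir, root_dir, hourly_file,
                        incremental_file, flat=False):
    """Return the path an archived incremental file would be moved to."""
    name = posixpath.basename(incremental_file)
    if flat:
        # --flat-archive: everything lands in the top of the archive.
        return posixpath.join(archive_dir, name)
    # Assumption: mirror the hourly file's directory relative to the root.
    relative_dir = posixpath.relpath(posixpath.dirname(hourly_file), root_dir)
    return posixpath.normpath(posixpath.join(archive_dir, relative_dir, name))
```

With a root of /data and an hourly file at /data/in/2024/01/02/in-S0_20240102.18, the mirrored form places the incremental file under /archive/in/2024/01/02/, while the flat form places it directly in /archive/.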

--post-command=COMMAND

Run COMMAND on each incremental file after rwflowappend has successfully appended it to an hourly file and moved it into the archive-directory. Each occurrence of the string %s in COMMAND is replaced with the full path to the incremental file in the archive-directory, and each occurrence of %% is replaced with %. If any other character follows %, rwflowappend exits with an error. When using this feature, the --archive-directory must be specified. The exit status of COMMAND is ignored. See also the rwpollexec(8) daemon.
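The %s and %% substitution rules can be sketched in Python as shown below. The name expand_command is hypothetical, and treating a lone % at the end of COMMAND as an error is an assumption, since the text above only covers a % followed by another character.

```python
def expand_command(template, path):
    """Expand %s to path and %% to a literal % in a command template.

    Raises ValueError when % is followed by any other character
    (or, by assumption, when % is the last character).
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if nxt == "s":
            out.append(path)
        elif nxt == "%":
            out.append("%")
        else:
            raise ValueError("invalid %% sequence at offset %d" % i)
        i += 2
    return "".join(out)
```

Validating the template once at startup, before any file is processed, matches the documented behavior of exiting with an error on a bad sequence.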

--hour-file-command=COMMAND

Run COMMAND upon creation of a new hourly file. The string %s in COMMAND is replaced with the full path to the hourly file, and the string %% is replaced with %. If any other character follows %, rwflowappend exits with an error. The exit status of COMMAND is ignored.

--threads=N

Invoke rwflowappend with N threads reading the incremental files and writing to the repository. When this switch is not provided, rwflowappend runs with a single thread. Since SiLK 3.8.2.

--reject-hours-past=NUM

Reject incremental files containing records whose starting hour occurs more than this number of hours in the past relative to the current hour. Incremental files that violate this value are moved into the error directory. Times are compared using the starting hour of the flow record and the current hour. For example, flow records that start at 18:02:56 and 18:58:04 are considered 1 hour in the past whether the current time is 19:01:47 or 19:59:33. When performing live data collection, it is not uncommon to get flows one to two hours in the past due to the flow generator's active timeout (often 30 minutes) and the time to transfer the flow records through the collection system. The default is to accept all incremental files.
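The hour comparison can be made concrete with a small Python sketch. Both times are truncated to the start of their hour before subtracting, which reproduces the 18:02:56 and 18:58:04 example above. The function names are hypothetical, and the future check follows the description of the companion --reject-hours-future switch.

```python
from datetime import datetime


def hours_in_past(record_start, now):
    """Whole hours between the record's starting hour and the current hour.

    Negative values mean the record starts in a future hour.
    """
    start_hour = record_start.replace(minute=0, second=0, microsecond=0)
    current_hour = now.replace(minute=0, second=0, microsecond=0)
    return int((current_hour - start_hour).total_seconds() // 3600)


def should_reject(record_start, now,
                  reject_hours_past=None, reject_hours_future=None):
    """Return True if the record falls outside the accepted window.

    A limit of None means no limit, matching the default of accepting
    all incremental files.
    """
    age = hours_in_past(record_start, now)
    if reject_hours_past is not None and age > reject_hours_past:
        return True
    if reject_hours_future is not None and -age > reject_hours_future:
        return True
    return False
```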

--reject-hours-future=NUM

Similar to --reject-hours-past, but reject incremental files containing records whose starting hour occurs more than this number of hours in the future relative to the current hour. Future dated flow records are rare, but can occur due to time drift at the sensor. The default is to accept all incremental files.

--no-file-locking

Do not use advisory write locks. Normally, rwflowappend obtains a write lock on an hourly file prior to writing records to it. The write lock prevents two instances of rwflowappend from writing to the same hourly file simultaneously. However, attempting to use a write lock on some file systems causes rwflowappend to exit with an error, and this switch can be used when writing data to these file systems.

--polling-interval=NUM

Check the incoming directory for new incremental files every NUM seconds. The default polling interval is 15 seconds.

--byte-order=ENDIAN

Set the byte order for newly created SiLK Flow files. When appending records to an existing file, the byte order of the file is maintained. The argument is one of the following:

as-is

Maintain the byte order of the incremental files (i.e., the byte order specified to rwflowpack). This is the default.

native

Use the byte order of the machine where rwflowappend is running.

big

Use network byte order (big endian) for the flow files.

little

Write the flow files in little endian format.

--compression-method=COMP_METHOD

Specify the compression library to use when creating new hourly files. When this switch is not given, newly created hourly files maintain the compression method used by the incremental file (i.e., the compression method specified to rwflowpack). When appending to an existing hourly file, the compression method of the file is maintained. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available.
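The preference order for best can be sketched in a few lines of Python. The function name is hypothetical, and falling back to none when no library is available is an assumption that the text above does not state.

```python
def choose_best(available):
    """Pick the method that 'best' resolves to, given available methods."""
    for method in ("lzo1x", "snappy", "zlib"):
        if method in available:
            return method
    # Assumption: with no compression library present, use no compression.
    return "none"
```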

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwflowappend searches for the site configuration file in the locations specified in the "FILES" section.

Logging and Daemon Configuration

One of the following mutually-exclusive switches is required:

--log-destination=DESTINATION

Specify the destination where logging messages are written. When DESTINATION begins with a slash /, it is treated as a file system path and all log messages are written to that file; there is no log rotation. When DESTINATION does not begin with /, it must be one of the following strings:

none

Messages are not written anywhere.

stdout

Messages are written to the standard output.

stderr

Messages are written to the standard error.

syslog

Messages are written using the syslog(3) facility.

both

Messages are written to the syslog facility and to the standard error (this option is not available on all platforms).

--log-directory=DIR_PATH

Use DIR_PATH as the directory where the log files are written. DIR_PATH must be a complete directory path. The log files have the form

DIR_PATH/LOG_BASENAME-YYYYMMDD.log

where YYYYMMDD is the current date and LOG_BASENAME is the application name or the value passed to the --log-basename switch when provided. The log files are rotated: At midnight local time, a new log is opened, the previous file is closed, and the command specified by --log-post-rotate is invoked on the previous day's log file. (Old log files are not removed by rwflowappend; the administrator should use another tool to remove them.) When this switch is provided, a process-ID file (PID) is also written in this directory unless the --pidfile switch is provided.
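As an illustration of the naming scheme, the following Python sketch builds the log path for a given day and the default PID file path. The function names are hypothetical; the PID file name is taken from the description of --pidfile, and it does not change when --log-basename is used.

```python
import posixpath
from datetime import date


def log_file_path(log_directory, basename="rwflowappend", day=None):
    """Return DIR_PATH/LOG_BASENAME-YYYYMMDD.log for the given day."""
    day = day or date.today()
    return posixpath.join(log_directory, f"{basename}-{day:%Y%m%d}.log")


def default_pid_path(log_directory):
    """Return the PID file path used when --pidfile is not given."""
    return posixpath.join(log_directory, "rwflowappend.pid")
```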

--log-pathname=FILE_PATH

Use FILE_PATH as the complete path to the log file. The log file is not rotated.

The following set of switches is optional:

--log-level=LEVEL

Set the severity of messages that are logged. The levels from most severe to least are: emerg, alert, crit, err, warning, notice, info, debug. The default is info.

--log-sysfacility=NUMBER

Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument. The default is a value that corresponds to LOG_USER on the system where rwflowappend is running. This switch produces an error unless --log-destination=syslog is specified.

--log-basename=LOG_BASENAME

Use LOG_BASENAME in place of the application name in the name of log files in the log directory. See the description of the --log-directory switch. This switch does not affect the name of the process-ID file.

--log-post-rotate=COMMAND

Run COMMAND on the previous day's log file after log rotation. When this switch is not specified, the previous day's log file is compressed with gzip(1). When the switch is specified and COMMAND is the empty string, no action is taken on the log file. Each occurrence of the string %s in COMMAND is replaced with the full path to the log file, and each occurrence of %% is replaced with %. If any other character follows %, rwflowappend exits with an error. Specifying this switch without also using --log-directory is an error.

--pidfile=FILE_PATH

Set the complete path to the file in which rwflowappend writes its process ID (PID) when it is running as a daemon. No PID file is written when --no-daemon is given. When this switch is not present, no PID file is written unless the --log-directory switch is specified, in which case the PID is written to LOGPATH/rwflowappend.pid, where LOGPATH is the directory given to --log-directory.

--no-chdir

Do not change directory to the root directory. When rwflowappend becomes a daemon process, it changes its current directory to the root directory so as to avoid potentially running on a mounted file system. Specifying --no-chdir prevents this behavior, which may be useful during debugging. The application does not change its directory when --no-daemon is given.

--no-daemon

Force rwflowappend to run in the foreground---it does not become a daemon process. This may be useful during debugging.

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file when that switch is not provided.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwflowappend may use this environment variable. See the "FILES" section for details.

FILES

${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory specified to the --root-directory switch.
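The search can be sketched in Python as follows. The sketch assumes the locations are tried in the order listed and that the first existing file wins; it also skips entries whose environment variable is unset. The function names are hypothetical.

```python
import os
import posixpath


def candidate_site_config_paths(root_directory, environ=None):
    """List the locations to check, in the order given in FILES."""
    env = os.environ if environ is None else environ
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    paths.append(posixpath.join(root_directory, "silk.conf"))
    silk_path = env.get("SILK_PATH")
    if silk_path:
        paths.append(posixpath.join(silk_path, "share/silk/silk.conf"))
        paths.append(posixpath.join(silk_path, "share/silk.conf"))
    paths.append("/usr/share/silk/silk.conf")
    paths.append("/usr/share/silk.conf")
    return paths


def find_site_config(root_directory, environ=None):
    """Return the first candidate that exists, or None (assumed order)."""
    for path in candidate_site_config_paths(root_directory, environ):
        if os.path.isfile(path):
            return path
    return None
```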

SEE ALSO

rwflowpack(8), rwreceiver(8), rwsender(8), rwpollexec(8), rwfilter(1), silk(7), gzip(1), syslog(3), zlib(3), The SiLK Installation Handbook

NOTES

rwflowappend does not check the integrity of an hourly file before appending records to it.

Prior to SiLK 3.6.0, when a write error occurred, rwflowappend could leave a partially written record or compressed block in the hourly file. If a partially written compressed block remained and additional compressed blocks were appended, these compressed blocks could not be read by other SiLK tools. If a partially written record remained and additional records were appended, SiLK tools would read the unaligned data as if it were aligned and produce garbage records. Although SiLK 3.6.0 works around the issue on write errors, similar issues can occur if rwflowappend is suddenly killed (e.g., by kill -9).

When a write error occurs, rwflowappend may leave a zero byte file in the data repository. Such files do not affect the exit status of rwfilter(1), though rwfilter warns about being unable to read the header from the file.

As of SiLK 3.1.0, rwflowappend obtains an advisory write lock on the hourly file it is writing, allowing multiple rwflowappend processes to write to the same hourly file. File locking may be disabled by using the --no-file-locking switch. If file locking is disabled, the administrator must ensure that multiple rwflowappend processes do not attempt to write to the same hourly file simultaneously.