NAME

rwrecgenerator - Generate random SiLK Flow records

SYNOPSIS

rwrecgenerator { --silk-output-path=PATH | --text-output-path=PATH
                 | { --output-directory=DIR_PATH
                     --processing-directory=DIR_PATH }}
      --log-destination=DESTINATION [--log-level=LEVEL]
      [--log-sysfacility=NUMBER] [--seed=SEED]
      [--start-time=START_DATETIME --end-time=END_DATETIME]
      [--time-step=MILLISECONDS] [--events-per-step=COUNT]
      [--num-subprocesses=COUNT] [--flush-timeout=MILLISEC]
      [--file-cache-size=SIZE] [--compression-method=COMP_METHOD]
      [--timestamp-format=FORMAT] [--epoch-time]
      [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
      [--integer-sensors] [--integer-tcp-flags] [--no-titles]
      [--no-columns] [--column-separator=CHAR]
      [--no-final-delimiter] [--delimited[=CHAR]]
      [--site-config-file=FILENAME] [--sensor-prefix-map=FILE]
      [--flowtype-in=CLASS/TYPE] [--flowtype-inweb=CLASS/TYPE]
      [--flowtype-out=CLASS/TYPE] [--flowtype-outweb=CLASS/TYPE]

rwrecgenerator --help

rwrecgenerator --version

DESCRIPTION

rwrecgenerator uses pseudo-random numbers to generate events, where each event consists of one or more SiLK Flow records. These flow records can be written as a single binary file, as text (in either a columnar or a comma-separated value format) similar to the output from rwcut(1), or as a directory of small binary files to mimic the incremental files produced by rwflowpack(8). The type of output to produce must be specified using the appropriate switches. Currently only one type of output may be produced in a single invocation.

rwrecgenerator works through a time window, where the starting and ending times for the window may be specified on the command line. When not specified, the window defaults to the previous hour. By default, rwrecgenerator will generate one event at the start time and one event at the end time. To modify the size of the steps rwrecgenerator takes across the window, specify the --time-step switch. The number of events to create at each step may be specified with the --events-per-step switch.
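The stepping behavior described above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name event_times_ms is hypothetical, times are epoch milliseconds, and treating the end time as inclusive when the step divides the window evenly is an assumption drawn from the statement that the default produces one step at each end of the window.

```python
def event_times_ms(start_ms, end_ms, time_step_ms=None):
    """Return the times (epoch milliseconds) at which events would start.

    Hypothetical helper; it mirrors the prose description of --start-time,
    --end-time, and --time-step, not the actual rwrecgenerator source.
    """
    if end_ms < start_ms:
        raise ValueError("end time must not be less than start time")
    # Default step: the whole window, giving one step at each end.
    if time_step_ms is None:
        time_step_ms = end_ms - start_ms
    # A step of 0 means events are created only at the start time.
    if time_step_ms == 0:
        return [start_ms]
    # Assumption: the end time itself is included when the step lands on it.
    return list(range(start_ms, end_ms + 1, time_step_ms))


def total_events(start_ms, end_ms, time_step_ms=None, events_per_step=1):
    """Number of events created: steps times --events-per-step."""
    return len(event_times_ms(start_ms, end_ms, time_step_ms)) * events_per_step
```

For a one-hour window with no --time-step, the sketch yields two steps; with a 15-minute step (900000 milliseconds) it yields five.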

The time window specifies when the events begin. Since most events create multiple flow records with small time offsets between them (and some events may create flow records across multiple hours), flow records will exist that begin after the time window.

To generate a single SiLK flow file, specify its location with the --silk-output-path switch. A value of - will write the output to the standard output unless the standard output is connected to a terminal.

To produce textual output, specify --text-output-path. rwrecgenerator has numerous switches to control the appearance of the text; however, currently rwrecgenerator produces a fixed set of fields.

When creating incremental files, the --output-directory and --processing-directory switches are required. rwrecgenerator creates files in the processing directory, and moves the files to the output directory when the flush timeout arrives. The default flush timeout is 30,000 milliseconds (30 seconds); the user may modify the value with the --flush-timeout switch. Any files in the processing directory are removed when rwrecgenerator starts.

The --num-subprocesses switch tells rwrecgenerator to use multiple subprocesses when creating incremental files. When the switch is specified, rwrecgenerator will split the time window into multiple pieces and give each subprocess its own time window to create. The initial rwrecgenerator process then waits for the subprocesses to complete. When --num-subprocesses is specified, rwrecgenerator will create subdirectories under the --processing-directory, where each subprocess gets its own processing directory.

The --seed switch may be specified to provide a consistent set of flow records across multiple invocations. (Note that the names of the incremental files will differ across invocations since those names are created with the mkstemp(3) function.)

Given the same seed for the pseudo-random number generator and assuming --num-subprocesses is not specified, the output from rwrecgenerator will contain the same data regardless of whether the output is written to a single SiLK flow file, a text file, or a series of incremental files.

When both --seed and --num-subprocesses are specified, the incremental files will contain the same flow records across invocations, but the flow records will not be consistent with those created by --silk-output-path or --text-output-path.

rwrecgenerator must have access to a silk.conf(5) site configuration file, either specified by the --site-config-file switch on the command line or specified by the typical methods.

The --flowtype-in, --flowtype-inweb, --flowtype-out, and --flowtype-outweb switches may be used to specify the flowtype (that is, the class/type pair) that rwrecgenerator uses for its flow records. When these switches are not specified, rwrecgenerator attempts to use the flowtypes defined in the silk.conf file for the twoway site. Specifically, it attempts to use "all/in", "all/inweb", "all/out", and "all/outweb", respectively.

Use of the --sensor-prefix-map switch is recommended. The argument should name a prefix map file that maps from an internal IP address to a sensor number. If the switch is not provided, all flow records will use the first sensor in the silk.conf file that is supported by the class specified by the flowtypes. When using the --sensor-prefix-map switch, make certain the sensors you choose are in the class specified in the --flowtype-* switches.

When using the --sensor-prefix-map switch and creating incremental files, it is recommended that you use the --file-cache-size switch to increase the size of the stream cache to be approximately 12 to 16 times the number of sensors. This will reduce the amount of time spent closing and reopening the files.

The --log-destination switch is required. Specify none to disable logging.
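As a hedged illustration of how the required and optional switches fit together, the following Python sketch assembles an argument list for a reproducible single-file run. It only builds the list and does not execute anything. The helper name build_command and the sample values passed to it are hypothetical; the switch names are the ones documented on this page.

```python
def build_command(output_path, start, end, time_step_ms, events_per_step, seed):
    """Assemble an rwrecgenerator argument list for a single binary file.

    Hypothetical helper for illustration; all argument values are supplied
    by the caller and none are defaults of the tool.
    """
    return [
        "rwrecgenerator",
        "--silk-output-path=" + output_path,   # exactly one output switch
        "--log-destination=none",              # required; none disables logging
        "--seed=%d" % seed,                    # fixed seed for repeatable data
        "--start-time=" + start,
        "--end-time=" + end,
        "--time-step=%d" % time_step_ms,
        "--events-per-step=%d" % events_per_step,
    ]


# Example values are assumptions chosen only to show the shape of the call.
argv = build_command("flows.rw", "2014/01/01:00", "2014/01/01:01", 60000, 10, 42)
```

The resulting list could be passed to subprocess.run() on a system where rwrecgenerator and a silk.conf file are available.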

Currently, rwrecgenerator only supports generating IPv4 addresses. Addresses in 0.0.0.0/1 are considered internal, and addresses in 128.0.0.0/1 are considered external. All flow records are between an internal and an external address. Whether the internal address is the source or destination of the unidirectional flow record is determined randomly.
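The internal/external split amounts to a test on the high bit of the IPv4 address. A minimal Python sketch using the standard ipaddress module follows; the function name side is hypothetical.

```python
import ipaddress

# The two halves of the IPv4 space named in the text above.
INTERNAL = ipaddress.ip_network("0.0.0.0/1")
EXTERNAL = ipaddress.ip_network("128.0.0.0/1")


def side(addr):
    """Return "internal" or "external" for an IPv4 address string."""
    ip = ipaddress.IPv4Address(addr)
    # Equivalent to checking whether the most significant bit is clear.
    return "internal" if ip in INTERNAL else "external"
```

Under this rule 10.1.2.3 and 127.255.255.255 are internal, while 128.0.0.0 and 192.0.2.1 are external.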

The types of flow records that rwrecgenerator creates are:

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

Output Switches

Exactly one of the following switches is required.

--silk-output-path=PATH

Tell rwrecgenerator to create a single binary file of SiLK flow records at the specified location. If PATH is -, the records are written to the standard output. rwrecgenerator does not support writing binary data to a terminal.

--output-directory=DIR_PATH

Name the directory into which the incremental files are written once the flush timeout is reached.

--text-output-path=PATH

Tell rwrecgenerator to convert the flow records it creates to text and to print the result in a format similar to that created by rwcut(1). The output is written to the specified location. If PATH is -, the records are written to the standard output.

Logging Switches

The --log-destination switch is required. Use a value of none to disable logging.

--log-destination=DESTINATION

Specify the destination where logging messages are written. When DESTINATION begins with a slash /, it is treated as a file system path and all log messages are written to that file; there is no log rotation. When DESTINATION does not begin with /, it must be one of the following strings:

none

Messages are not written anywhere.

stdout

Messages are written to the standard output.

stderr

Messages are written to the standard error.

syslog

Messages are written using the syslog(3) facility.

both

Messages are written to the syslog facility and to the standard error (this option is not available on all platforms).

--log-level=LEVEL

Set the severity of messages that will be logged. The levels from most severe to least are: emerg, alert, crit, err, warning, notice, info, debug. The default is info.

--log-sysfacility=NUMBER

Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument. The default is a value that corresponds to LOG_USER on the system where rwrecgenerator is running. This switch produces an error unless --log-destination=syslog is specified.

General Switches

The following are general purpose switches. None are required.

--seed=SEED

Seed the pseudo-random number generator with the value SEED. When not specified, rwrecgenerator creates its own seed. Specifying the seed allows different invocations of rwrecgenerator to produce the same output (assuming the same value is given for all switches and that the time window is specified).

--start-time=YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]
--start-time=EPOCH_SECONDS_PLUS_MILLISECONDS

Specify the earliest date and time at which an event is started. The specified time must be given to at least day precision. Any parts of the date-time string that are not specified are set to 0. The switch also accepts UNIX epoch seconds with optional fractional seconds. When not specified, defaults to the beginning of the previous hour.

--end-time=YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]
--end-time=EPOCH_SECONDS_PLUS_MILLISECONDS

Specify the latest date and time at which an event is started. This time does not specify the latest end-time for the flow records or even the latest start-time, since many events simulate a query/response pair, with the response following the query by a few milliseconds. The specified time must be given to at least day precision, and it must not be less than the start-time. Any parts of the date-time string that are not specified are set to 0. The switch also accepts UNIX epoch seconds with optional fractional seconds. When not specified, defaults to the end of the previous hour.
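The two accepted forms for --start-time and --end-time can be sketched with a small parser. This is an approximation for illustration, not the parser SiLK uses: the name parse_time_ms is hypothetical, the result is in epoch milliseconds, and the date-time form is interpreted as UTC here, whereas the tool may apply the TZ variable when SiLK is built with local timezone support.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]
_DATE_RE = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?::(\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{1,6}))?)?)?)?"
)
# Epoch seconds with optional fractional seconds.
_EPOCH_RE = re.compile(r"(\d+)(?:\.(\d+))?")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_time_ms(text):
    """Return epoch milliseconds for either accepted form (UTC assumed)."""
    m = _DATE_RE.fullmatch(text)
    if m:
        year, month, day = (int(g) for g in m.group(1, 2, 3))
        # Parts that are not specified are set to 0.
        hour, minute, second = (int(g or 0) for g in m.group(4, 5, 6))
        micros = int((m.group(7) or "").ljust(6, "0"))
        moment = datetime(year, month, day, hour, minute, second, micros,
                          tzinfo=timezone.utc)
        return (moment - _EPOCH) // timedelta(milliseconds=1)
    m = _EPOCH_RE.fullmatch(text)
    if m:
        millis = int((m.group(2) or "")[:3].ljust(3, "0"))
        return int(m.group(1)) * 1000 + millis
    raise ValueError("unrecognized time: %r" % text)
```

A caller would also need to check that the parsed end time is not less than the parsed start time.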

--time-step=MILLISECONDS

Move forward MILLISECONDS milliseconds at each step as rwrecgenerator moves through the time window. When not specified, defaults to the difference between the start-time and end-time; that is, rwrecgenerator will generate events at the start-time and then at the end-time. A MILLISECONDS value of 0 indicates rwrecgenerator should only create events at the start-time.

--events-per-step=COUNT

Create COUNT events at each time step. The default is 1.

--help

Print the available options and exit.

--version

Print the version number and information about how rwrecgenerator was configured, then exit the application.

Incremental Files Switches

The following switches are used when creating incremental files.

--processing-directory=DIR_PATH

Name the directory under which the incremental files are initially created. Any files in this directory are removed when rwrecgenerator is started. When the flush timeout is reached, the files are closed and moved from this directory to the output-directory. If --num-subprocesses is specified, subdirectories are created under DIR_PATH, and each subprocess is given its own subdirectory.

--num-subprocesses=COUNT

Tell rwrecgenerator to create COUNT subprocesses to generate incremental files. This switch is ignored when incremental files are not being created. When this switch is specified, rwrecgenerator creates subdirectories below the processing directory. The default value for COUNT is 0.

--flush-timeout=MILLISECONDS

Set the timeout for flushing any in-memory records to disk to MILLISECONDS milliseconds. At this time, the incremental files are closed and the files are moved from the processing directory to the output directory. The timeout uses the internal time as rwrecgenerator moves through the time window. If not specified, the default is 30,000 milliseconds (30 seconds). This switch is ignored when incremental files are not being created.

--file-cache-size=SIZE

Set the maximum number of data files to have open for writing at any one time to SIZE. If not specified, the default is 32 files.

--compression-method=COMP_METHOD

Specify the compression library to use when writing binary output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, binary output is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the SiLK Flow records using an external library.

zlib

Use the zlib(3) library for compressing the flow records.

lzo1x

Use the lzo1x algorithm from the LZO real-time compression library for compressing the flow records.

snappy

Use the snappy library for compressing the flow records. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available.
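The selection order described for --compression-method can be summarized as a short Python sketch. The function names are hypothetical, and two details are assumptions not stated above: that best falls back to none when no library is available, and that none and best are accepted as values of SILK_COMPRESSION_METHOD.

```python
def resolve_best(available):
    """Resolve "best": lzo1x, else snappy, else zlib.

    Assumption: falls back to "none" when no library is available.
    """
    for name in ("lzo1x", "snappy", "zlib"):
        if name in available:
            return name
    return "none"


def choose_method(switch_value, environ, available, compiled_default):
    """Pick a method: switch first, then environment, then compiled default."""
    # Assumption: "none" and "best" are always acceptable names.
    valid = set(available) | {"none", "best"}
    if switch_value is not None:
        chosen = switch_value
    else:
        env_value = environ.get("SILK_COMPRESSION_METHOD")
        # The environment value is used only if it names an available method.
        chosen = env_value if env_value in valid else compiled_default
    return resolve_best(available) if chosen == "best" else chosen
```

The methods actually available on a given installation are reported by the --help or --version switch.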

Text File Switches

The following switches can be used when creating textual output.

--timestamp-format=FORMAT

When producing textual output, specify the format, timezone, and/or precision (representation of fractional seconds) to use when printing timestamps and the duration. When this switch is not specified, the SILK_TIMESTAMP_FORMAT environment variable is checked for a format, timezone, and precision. If it is empty or contains invalid values, timestamps are printed in the default format with milliseconds, and the timezone is UTC unless SiLK was compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone, and/or a precision in any order. The format is one of:

default

Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.

iso

Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.

m/d/y

Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.

epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.

The --timestamp-format switch may change the representation of fractional seconds, or precision, of the timestamp and duration fields from their default of milliseconds. Note: When using a precision less than that used by SiLK internally, the sum of the printed start time and the printed duration may not equal the printed end time. The available precisions are:

no-frac

Truncate the fractional seconds value on the timestamps and on the duration field. Previously this was called no-msec. Since SiLK 3.23.0.

milli

Print the fractional seconds to 3 decimal places. Since SiLK 3.23.0.

micro

Print the fractional seconds to 6 decimal places. Since SiLK 3.23.0.

nano

Print the fractional seconds to 9 decimal places. Since SiLK 3.23.0.

no-msec

Truncate the fractional seconds value on the timestamps and on the duration field. This is an alias for no-frac and is deprecated as of SiLK 3.23.0.

When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK. The timezone is one of:

utc

Use Coordinated Universal Time to print timestamps.

local

Use the TZ environment variable or the local timezone.
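To make the format names concrete, the following Python sketch renders one timestamp in each of the four formats, in UTC, with either millisecond precision or no fractional seconds. The function name format_timestamp is hypothetical, the micro and nano precisions are omitted, and appending fractional seconds to the epoch format is an assumption of this sketch rather than a documented behavior.

```python
from datetime import datetime, timezone

_PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_ms, fmt="default", no_frac=False):
    """Format epoch milliseconds as text in UTC (illustrative only)."""
    seconds, millis = divmod(epoch_ms, 1000)
    if fmt == "epoch":
        text = str(seconds)
    else:
        moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
        text = moment.strftime(_PATTERNS[fmt])
    if not no_frac:
        # milli precision: three decimal places (assumed for epoch as well).
        text += ".%03d" % millis
    return text
```

For the instant 1234567890500 milliseconds after the epoch, the default format gives 2009/02/13T23:31:30.500 and the iso format gives 2009-02-13 23:31:30.500.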

--epoch-time

When producing textual output, print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release.

--ip-format=FORMAT

When producing textual output, specify how IP addresses are printed, where FORMAT is a comma-separated list of the arguments described below. When this switch is not specified, the SILK_IP_FORMAT environment variable is checked for a value and that format is used if it is valid. The default FORMAT is canonical. Since SiLK 3.8.1.

canonical

Print IP addresses in the canonical format. For an IPv4 record, use dot-separated decimal (192.0.2.1). For an IPv6 record, use either colon-separated hexadecimal (2001:db8::1) or a mixed IPv4-IPv6 representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 netblock, e.g., ::ffff:192.0.2.1) and IPv4-compatible IPv6 addresses (the ::/96 netblock other than ::/127, e.g., ::192.0.2.1).

no-mixed

Print IP addresses in the canonical format (192.0.2.1 or 2001:db8::1) but do not use the mixed IPv4-IPv6 representations. For example, use ::ffff:c000:201 instead of ::ffff:192.0.2.1. Since SiLK 3.17.0.

decimal

Print IP addresses as integers in decimal format. For example, print 192.0.2.1 and 2001:db8::1 as 3221225985 and 42540766411282592856903984951653826561, respectively.

hexadecimal

Print IP addresses as integers in hexadecimal format. For example, print 192.0.2.1 and 2001:db8::1 as c0000201 and 20010db8000000000000000000000001, respectively.

zero-padded

Make all IP address strings contain the same number of characters by padding numbers with leading zeros. For example, print 192.0.2.1 and 2001:db8::1 as 192.000.002.001 and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively. For IPv6 addresses, this setting implies no-mixed, so that ::ffff:192.0.2.1 is printed as 0000:0000:0000:0000:0000:ffff:c000:0201. As of SiLK 3.17.0, may be combined with any of the above, including decimal and hexadecimal.

The following arguments modify certain IP addresses prior to printing. These arguments may be combined with the above formats.

map-v4

Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) prior to formatting. Since SiLK 3.17.0.

unmap-v6

Change any IPv4-mapped IPv6 addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses prior to formatting. Since SiLK 3.17.0.

The following argument is also available:

force-ipv6

Set FORMAT to map-v4,no-mixed.
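The integer and zero-padded representations listed above can be reproduced with Python's standard ipaddress module. The sketch below is illustrative only: the function name format_ip is hypothetical, it accepts a single format name rather than a comma-separated list, and it does not cover no-mixed, map-v4, unmap-v6, or force-ipv6.

```python
import ipaddress


def format_ip(addr, fmt="canonical"):
    """Render an address in one of four of the formats described above."""
    ip = ipaddress.ip_address(addr)
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "zero-padded":
        if ip.version == 4:
            # Pad each octet to three digits.
            return ".".join("%03d" % int(octet) for octet in str(ip).split("."))
        # Fully expanded IPv6: eight groups of four hex digits.
        return ip.exploded
    return str(ip)
```

For example, format_ip("192.0.2.1", "decimal") gives 3221225985 and format_ip("192.0.2.1", "zero-padded") gives 192.000.002.001.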

--integer-ips

When producing textual output, print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release.

--zero-pad-ips

When producing textual output, print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release.

--integer-sensors

When producing textual output, print the integer ID of the sensor rather than its name.

--integer-tcp-flags

When producing textual output, print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.

--no-titles

When producing textual output, turn off column titles. By default, titles are printed.

--no-columns

When producing textual output, disable fixed-width columnar output.

--column-separator=C

When producing textual output, use the specified character between columns and after the final column. When this switch is not specified, the default of '|' is used.

--no-final-delimiter

When producing textual output, do not print the column separator after the final column. Normally a delimiter is printed.

--delimited
--delimited=C

When producing textual output, run as if --no-columns --no-final-delimiter --column-separator=C had been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as the delimiter between columns instead of the default '|'.
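The interaction of --no-columns, --column-separator, --no-final-delimiter, and --delimited can be sketched as follows. The function names and the column widths are hypothetical, and right-justifying values within fixed-width columns is an assumption of this sketch; the fields and widths that rwrecgenerator prints are not specified here.

```python
def format_row(values, widths, columnar=True, separator="|", final_delimiter=True):
    """Join one row of values according to the text-output switches."""
    if columnar:
        # Fixed-width columns; right justification is assumed.
        cells = [str(v).rjust(w) for v, w in zip(values, widths)]
    else:
        # --no-columns: no padding at all.
        cells = [str(v) for v in values]
    line = separator.join(cells)
    if final_delimiter:
        # Dropped by --no-final-delimiter.
        line += separator
    return line


def format_row_delimited(values, separator="|"):
    """--delimited[=C]: no columns, no final delimiter, separator C."""
    return format_row(values, [], columnar=False,
                      separator=separator, final_delimiter=False)
```

With these definitions, format_row_delimited(["10.0.0.1", 80], ",") produces a comma-separated line with no trailing comma.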

SiLK Site Specific Switches

The following switches control the class/type and sensor that rwrecgenerator assigns to every flow record.

--sensor-prefix-map=FILE

Load a prefix map from FILE and use it to map from the internal IP addresses to sensor numbers. If the switch is not provided, all flow records will use the first sensor in the silk.conf file that is supported by the class named in the flowtype. The sensor IDs specified in FILE should agree with the class specified in the --flowtype-* switches.

--flowtype-in=CLASS/TYPE

Use CLASS/TYPE as the class/type pair for flow records where the source IP is external, the destination IP is internal, and the flow record is not considered to represent a web record. Web records are those that appear on ports 80/tcp, 443/tcp, and 8080/tcp. When not specified, rwrecgenerator attempts to find the flowtype "all/in" in the silk.conf file.

--flowtype-inweb=CLASS/TYPE

Use CLASS/TYPE as the class/type pair for flow records representing web records where the source IP is external and the destination IP is internal. When not specified and the --flowtype-in switch is given, that CLASS/TYPE pair will be used. When neither this switch nor --flowtype-in is given, rwrecgenerator attempts to find the flowtype "all/inweb" in the silk.conf file.

--flowtype-out=CLASS/TYPE

Use CLASS/TYPE as the class/type pair for flow records where the source IP is internal, the destination IP is external, and the flow record is not considered to represent a web record. When not specified, rwrecgenerator attempts to find the flowtype "all/out" in the silk.conf file.

--flowtype-outweb=CLASS/TYPE

Use CLASS/TYPE as the class/type pair for flow records representing web records where the source IP is internal and the destination IP is external. When not specified and the --flowtype-out switch is given, that CLASS/TYPE pair will be used. When neither this switch nor --flowtype-out is given, rwrecgenerator attempts to find the flowtype "all/outweb" in the silk.conf file.
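The fallback rules for the four --flowtype-* switches, and the way a record is matched to one of them, can be sketched in Python. The function names are hypothetical. Treating a record as a web record when either its source or its destination port is a web port is an assumption; the text above says only that web records appear on ports 80/tcp, 443/tcp, and 8080/tcp.

```python
WEB_PORTS = {80, 443, 8080}
TCP = 6  # IP protocol number for TCP


def resolve_flowtypes(flowtype_in=None, flowtype_inweb=None,
                      flowtype_out=None, flowtype_outweb=None):
    """Apply the documented defaults and fallbacks for the four switches."""
    return {
        "in": flowtype_in or "all/in",
        # The web variants fall back to the non-web switch when it is given.
        "inweb": flowtype_inweb or flowtype_in or "all/inweb",
        "out": flowtype_out or "all/out",
        "outweb": flowtype_outweb or flowtype_out or "all/outweb",
    }


def classify(src_is_internal, protocol, sport, dport):
    """Return which of "in", "inweb", "out", "outweb" applies to a record."""
    # Assumption: a web port on either side marks a web record.
    is_web = protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS)
    direction = "out" if src_is_internal else "in"
    return direction + ("web" if is_web else "")
```

For instance, with only --flowtype-in given, web records arriving from an external source use that same class/type pair instead of "all/inweb".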

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwrecgenerator searches for the site configuration file in the locations specified in the "FILES" section.

ENVIRONMENT

SILK_IP_FORMAT

This environment variable is used as the value for --ip-format when that switch is not provided. Since SiLK 3.11.0.

SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for --timestamp-format when that switch is not provided. Since SiLK 3.11.0.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwrecgenerator may use this environment variable when searching for the SiLK site configuration file.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwrecgenerator may use this environment variable. See the "FILES" section for details.

TZ

When the argument to the --timestamp-format switch includes local or when a SiLK installation is built to use the local timezone, the value of the TZ environment variable determines the timezone in which rwrecgenerator displays timestamps. (If both of those are false, the TZ environment variable is ignored.) If the TZ environment variable is not set, the machine's default timezone is used. Setting TZ to the empty string or 0 causes timestamps to be displayed in UTC. For system information on the TZ variable, see tzset(3) or environ(7). (To determine if SiLK was built with support for the local timezone, check the Timezone support value in the output of rwrecgenerator --version.) The TZ environment variable is also used when rwrecgenerator parses the timestamp specified in the --start-time or --end-time switches if SiLK is built with local timezone support.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
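The list of locations can be expressed as a small Python sketch that builds the candidate paths from the environment. The function name silk_conf_candidates is hypothetical, and two points are assumptions: that the locations are checked in the order listed, and that entries depending on an unset environment variable are skipped.

```python
import os
import posixpath


def silk_conf_candidates(environ=None):
    """Return candidate silk.conf paths, in the order listed above."""
    env = os.environ if environ is None else environ
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share", "silk", "silk.conf"))
        candidates.append(posixpath.join(root, "share", "silk.conf"))
    candidates.append("/usr/share/silk/silk.conf")
    candidates.append("/usr/share/silk.conf")
    return candidates


def find_silk_conf(environ=None):
    """Return the first candidate that exists as a file, or None."""
    for path in silk_conf_candidates(environ):
        if os.path.isfile(path):
            return path
    return None
```

Passing a dictionary instead of the real environment makes it easy to check which paths a given configuration would consult.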

SEE ALSO

silk(7), rwcut(1), rwflowpack(8), silk.conf(5), syslog(3), zlib(3), tzset(3), environ(7)