rwfileinfo - Print information about a SiLK file
rwfileinfo [--fields=FIELDS] [--summary] [--no-titles]
      [--site-config-file=FILENAME] FILE [ FILE ... ]
rwfileinfo --help
rwfileinfo --help-fields
rwfileinfo --version
rwfileinfo prints information about a binary SiLK file that can be determined by reading the file's header and by moving quickly over the data blocks in the file.
rwfileinfo requires one or more file arguments to be given on the command line. To have rwfileinfo read a file's contents from the standard input, use stdin as a file argument.
When the --summary switch is given, rwfileinfo first prints the information for each individual file and then prints the number of files processed, the sum of the individual file sizes, and the sum of the individual record counts.
By default, rwfileinfo prints the following information for each file argument. Use the --fields switch to modify which pieces of information are printed.
(rwfileinfo prints the fields in the order in which support for each field was added to SiLK. The field descriptions are presented here in a more logical order.)
The size of the file on disk as reported by the operating system. rwfileinfo prints 0 for the file-size when reading from the standard input.
Every binary file written by SiLK has a version number field. Since SiLK 1.0.0, the version number field has been used to indicate the general structure (or layout) of the file. The file structure adopted in SiLK 1.0.0 uses a version number of 16 and has a header section and a data section. The header section begins with 16 bytes that specify well-defined values, and those bytes are followed by one or more variably-sized header entries. The specifics of the data section depend on the content of the file.
The header-length field shows the number of octets required by the header (i.e., the initial 16 bytes and the header entries). Since everything after the header is data, the header-length is also the starting offset of the data section. The smallest header length is 24 bytes, but typically the header is padded to be an integer multiple of the record-length. The header-length that rwfileinfo prints for a file is determined dynamically by reading the file's header.
When a SiLK tool creates a binary file, the tool writes the current SiLK release number (such as 3.9.0) into the file's header as a way to help diagnose issues should a bug with a particular release of SiLK be discovered in the future.
Every SiLK file has a byte-order or endian field. SiLK uses the machine's native representation of integers when writing data, and this field shows what representation the file contains.
BigEndian is network byte order and littleEndian is used by Intel chips. The rwswapbytes(1) tool changes a file's integer representation, and some tools have a --byte-order switch that allows the user to specify the integer representation of output files. The header-section of a file is always written in network byte order.
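To see which representation a tool would use by default on a given machine, the host's native byte order can be checked from the shell. The following is a minimal sketch that is independent of SiLK; the function name host_byte_order is hypothetical, and the labels it prints simply reuse the names shown in the byte-order field.

```sh
# Report the host's native byte order using the labels from the byte-order field.
host_byte_order() {
    # Emit the two bytes 0x01 0x00 and let od read them as one 16-bit integer
    # in the host's native representation.
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    case $word in
        0001) echo littleEndian ;;   # low-order byte stored first
        0100) echo BigEndian ;;      # high-order byte stored first
        *)    echo unknown ;;
    esac
}

host_byte_order
```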
SiLK tools may use the zlib library (http://zlib.net/) or the LZO library (http://www.oberhumer.com/opensource/lzo/) to compress the data section of a file. The compression field specifies which library (if any) was used to compress the data section. If a file is compressed with a library that was not included in an installation of SiLK, SiLK is unable to read the data section of the file. Many SiLK tools accept the --compression-method switch to choose a particular compression method. (The compression field does not indicate whether the entire file has been compressed with an external compression utility such as gzip(1).)
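Because the compression field says nothing about compression applied to the whole file by an external utility, a script may want to test for that separately. The sketch below checks for the two-byte gzip magic number (0x1f 0x8b) at the start of a file. The function name is_gzip and the file name in the usage comment are hypothetical, and the check is not part of rwfileinfo.

```sh
# Succeed when the named file begins with the gzip magic number 0x1f 0x8b.
is_gzip() {
    # Read only the first two bytes and print them as hexadecimal.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage (hypothetical file name):
#   if is_gzip data.rw.gz; then echo "externally compressed with gzip"; fi
```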
Every binary file written by SiLK has two fields in the header that specify exactly what the file contains: the format and the record-version. In general, the format indicates the content type of the file and the record-version indicates the evolution of that content.
The contents of a file whose format names a single data structure, such as FT_PREFIXMAP, are fairly obvious (an IPset, a Bag, a prefix map).
There are many different file formats for writing SiLK Flow records, but the SiLK analysis tools largely use a single Flow file format. That format is FT_RWIPV6ROUTING if SiLK has been compiled with IPv6 support, or FT_RWGENERIC otherwise. A file that uses the FT_RWGENERIC format is only capable of holding IPv4 addresses.
The other SiLK Flow file formats are created by rwflowpack(8) as it writes flow records to the repository. These formats often omit fields and use reduced bit-sizes for fields to reduce the space required for an individual flow record.
The record-version field indicates changes within the general type specified by the format field. For example, SiLK incremented the record-version of the formats that hold flow records when the resolution of record timestamps was changed from seconds to milliseconds.
Together with the format field, the record-version field specifies the contents of the file. See the discussion of format for details.
Files created by SiLK 1.0.0 and later have a record length field. This field contains the length of an individual record, and this value is dependent on the format and record-version fields described above. Some files (such as those containing IPsets or prefix maps) do not write individual records to the output, and the record length is 1 for these files.
The count-records field is generated dynamically by determining the length the data section would require if it were completely uncompressed and dividing it by the record-length. When the record-length is 1 (such as for IPset files), the count-records field does not provide much information beyond the length of the uncompressed data. For an uncompressed file, adding header-length to the product of count-records and record-length is equal to the file-size.
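The relationship among these fields for an uncompressed file can be checked with shell arithmetic. The following is a small sketch; the function name expected_file_size is hypothetical, and the numbers are the ones shown for tcp-data.rw in the examples on this page.

```sh
# Compute the file-size implied by the other fields of an uncompressed file.
#   $1 = header-length, $2 = record-length, $3 = count-records
expected_file_size() {
    echo $(( $1 + $2 * $3 ))
}

# Values from the tcp-data.rw example: 208 + 52 * 7
expected_file_size 208 52 7    # prints 572, the file-size in that example
```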
The fields given above are either present in the well-defined header or are computed by reading the file.
The following fields are generated by reading the header entries and determining if one or more header entries of the specified type are present. The field is not printed in the output when the header entry is not present in the file.
Many of the SiLK tools write a header entry to the output file that contains the command line invocation used to create that file, and some of the SiLK tools also copy the command line history from their input files to the output file. (The --invocation-strip switch on the tools can be used to prevent copying and recording of the invocation.) The command lines are stored in individual header entries and this field displays those entries with the most recent invocation at the end of the list.
The command line history has a couple of issues:
When multiple input files are used to create a single output, the entries are stored as a list, and this makes it difficult to know which set of command line entries is associated with which input file.
When a SiLK tool creates multiple output files (e.g., when using both --pass and --fail to rwfilter(1)), the tool writes the same command line entry to each output file. Some context in addition to the command line history may be needed to know which branch of that tool a particular file represents.
Most of the SiLK tools that create binary output files provide the --note-add and --note-file-add switches, which allow an arbitrary annotation to be added to the header of a file. Some tools also copy the annotations from the source files to the destination files. The annotations are stored in individual header entries and this field displays those entries.
SiLK 3.0.0 and SiLK 3.7.0 introduced new output formats for IPset data structures, and these formats are denoted by record-versions 3 and 4, respectively. (To select these formats, use the --record-version switch on rwset(1), rwsetbuild(1), or rwsettool(1), or use the --ipset-record-version switch on rwbagtool(1).) When the record-version is 3, the file contains a version of the IPset data structure that can be read directly into memory, and the file contains a header entry that specifies the number of nodes, the number of branches from each node, the number of leaves, the size of the nodes and leaves, and which node is the root of the tree. When the record-version is 4, the header entry specifies whether the file contains IPv4 addresses or IPv6 addresses.
Since SiLK 3.0.0, the tools that write binary Bag files (rwbag(1), rwbagbuild(1), and rwbagtool(1)) have written a header entry that specifies the type and size of the key and of the counter in the file.
When using rwpmapbuild(1) to create a prefix map file, a string that specifies a mapname may be provided. rwpmapbuild writes the mapname to a header entry in the prefix map file. The mapname is used to generate command line switches or field names when the --pmap-file switch is specified to several of the SiLK tools (see pmapfilter(3) for details). When displaying the mapname, rwfileinfo prefixes it with the string v1: which denotes a version number for the prefix-map header entry. (The version number is printed for completeness.)
When rwflowpack(8) creates a SiLK Flow file for the repository, all the records in the file have the same starting hour, the same sensor, and the same flowtype (class/type pair). rwflowpack writes a header entry to the file that contains these values, and this field displays those values. (To print the names for the sensor and flowtype, the silk.conf(5) file must be accessible.)
When flowcap(8) creates a SiLK flow file, it adds a header entry specifying the name of the probe from which the data was collected.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
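The abbreviation rule can be modeled as a prefix match in which an exact match always wins. The sketch below illustrates the rule as stated above using the option names from the synopsis; it is not SiLK's actual option parser, and the function name resolve_option is hypothetical.

```sh
# Resolve an option abbreviation (given without the leading "--").
# Prints the full option name, or fails when the abbreviation is ambiguous
# or matches nothing.
resolve_option() {
    abbrev=$1
    match=""
    count=0
    for opt in fields summary no-titles site-config-file help help-fields version; do
        case $opt in
            "$abbrev")                 # exact match: accept immediately
                echo "$opt"
                return 0 ;;
            "$abbrev"*)                # prefix match: remember it and keep counting
                match=$opt
                count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "$match"
    else
        return 1
    fi
}

resolve_option su      # unique prefix of "summary"
resolve_option help    # exact match, even though "help-fields" shares the prefix
```

Under this model, "he" and "s" are rejected because each is a prefix of more than one option.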
Specify what information to print for each file argument on the command line. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive and may be shortened to a unique prefix. When the --fields option is not given, all fields are printed if the file contains the necessary information. The fields are always printed in the order they appear here regardless of the order they are specified in FIELDS.
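To make the FIELDS syntax concrete, the following sketch expands a value containing field-integers and ranges into the individual integers it denotes. It is only an illustration of the syntax described above: the function name expand_fields is hypothetical, field-names are passed through without being resolved, and rwfileinfo's own parsing may differ in edge cases.

```sh
# Expand a FIELDS value such as "1-3,7" into the individual items it denotes.
expand_fields() {
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the argument on commas
    IFS=$old_ifs
    out=""
    for item in "$@"; do
        case $item in
            *[!0-9-]*|-*|*-|"")
                # Contains a non-digit, or is not a well-formed range:
                # treat it as a field-name and pass it through unchanged.
                out="$out $item" ;;
            *-*)
                # A range START-END of field-integers.
                start=${item%%-*}
                end=${item##*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done ;;
            *)
                # A single field-integer.
                out="$out $item" ;;
        esac
    done
    echo "${out# }"
}

expand_fields "1-3,7"              # prints: 1 2 3 7
expand_fields "count-records,5"    # prints: count-records 5
```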
The possible field values are given next with a brief description of each. For a full description of each field, see "Field Descriptions" above.
The contents of the file as a name and the corresponding hexadecimal ID.
An integer describing the layout or structure of the file.
Either BigEndian or littleEndian, indicating the representation used to store integers in the file (network or non-network byte order).
The compression library (if any) used to compress the data-section of the file, specified as a name and its decimal ID.
The octet length of the file's header; alternatively the offset where data begins.
The octet length of a single record or the value 1 if the file's content is not record-based.
The number of records in the file, computed by dividing the uncompressed data length by the record-length.
The size of the file on disk as reported by the operating system.
The command line invocation used to generate this file.
The version of the records contained in the file.
The release of SiLK that wrote this file.
For a repository Flow file generated by rwflowpack(8), this prints the timestamp of the starting hour, the flowtype, and the sensor of each flow record in the file.
For a Flow file generated by flowcap(8), the name of the probe where the flow records were initially collected.
The notes (annotations) that users have added to the file's header.
For a prefix map file, the mapname that was set when the file was created by rwpmapbuild(1).
For an IPset file whose record-version is 3, a description of the tree data structure. For an IPset file whose record-version is 4, the type of IP addresses (IPv4 or IPv6).
For a bag file, the type and size of the key and of the counter.
After the data for each individual file is printed, print a summary that shows the number of files processed, the sum of the individual file sizes, and the total number of records contained in those files.
Suppress printing of the file name and field names. The output contains only the values, where each value is printed left-justified on a single line.
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfileinfo searches for the site configuration file in the locations specified in the "FILES" section.
Print the available options and exit.
Print a description of each field and its alias, then exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
Get information about the file tcp-data.rw:
 $ rwfileinfo tcp-data.rw
 tcp-data.rw:
   format(id)          FT_RWGENERIC(0x16)
   version             16
   byte-order          littleEndian
   compression(id)     none(0)
   header-length       208
   record-length       52
   record-version      5
   silk-version        1.0.1
   count-records       7
   file-size           572
   command-lines
                    1  rwfilter --proto=6 --pass=tcp-data.rw ...
   annotations
                    1  This is some interesting TCP data
Return a single value which is the number of records in the file tcp-data.rw:
 $ rwfileinfo --no-titles --field=count-records tcp-data.rw
 7
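When several files are named, the same switches print one value per line, so a script can total the record counts itself. The following is a minimal sketch: the helper name sum_values and the file names in the usage comment are hypothetical, and it assumes that each output line holds a single integer. (The --summary switch prints such totals directly; the helper is useful when only the number is wanted.)

```sh
# Sum the integers read from the standard input, one per line.
sum_values() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage (hypothetical file names):
#   rwfileinfo --no-titles --fields=count-records a.rw b.rw | sum_values
```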
This environment variable is used as the value for the --site-config-file when that switch is not provided.
This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwfileinfo may use this environment variable when searching for the SiLK site configuration file.
This environment variable gives the root of the install tree. When searching for configuration files, rwfileinfo may use this environment variable. See the "FILES" section for details.
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.