rwfglob - Print files that rwfilter's File Selection switches will access
rwfglob { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
| [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }
[--sensors=SENSOR[,SENSOR ...]]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
[--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
[--print-missing-files] [--no-block-check] [--no-file-names]
[--no-summary]
rwfglob [--data-rootdir=ROOT_DIRECTORY]
[--site-config-file=FILENAME] --help
rwfglob --version
rwfglob accepts the same file "Selection Switches" as rwfilter(1) and prints, to the standard output, the pathnames of the files that rwfilter would process, one file name per line. After the file names, rwfglob prints to the standard output a summary of the number of files it found. To suppress the printing of the file names and/or the summary, specify the --no-file-names and/or --no-summary switches, respectively.
By default, rwfglob only prints the names of files that exist. When the --print-missing-files switch is provided, rwfglob prints, to the standard error, the names of files that it did not find, one file name per line, preceded by the text 'Missing '. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:
$ rwfglob --print-missing-files ... 2>&1
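When a script needs the missing-file names kept apart from the found files rather than merged into one stream, it can capture the two streams separately. The following is a minimal Python sketch, not part of SiLK. The helper name `split_rwfglob_output` is hypothetical, and the sketch assumes only what is stated above: missing files are written to the standard error, one per line, preceded by the text 'Missing '.

```python
import subprocess

MISSING_PREFIX = "Missing "


def split_rwfglob_output(command):
    """Run `command` and return (stdout_lines, missing_files).

    `command` is an argument list, for example
    ["rwfglob", "--print-missing-files", "--start-date=2003/10/11"].
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stdout_lines = result.stdout.splitlines()
    # Keep only the stderr lines that carry the documented prefix.
    missing = [
        line[len(MISSING_PREFIX):]
        for line in result.stderr.splitlines()
        if line.startswith(MISSING_PREFIX)
    ]
    return stdout_lines, missing
```

The standard output lines still contain the summary line unless --no-summary is given.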
As of SiLK 3.20, the "Selection Switches" --class, --type, --flowtypes, and --sensors accept a value in the form "@PATH", where "@" is the "at" character (ASCII 0x40) and PATH names a file or a path to a file. For example, the following reads the names of the types from the file t.txt and uses the sensors S3, S7, and the names and/or IDs read from /tmp/sensor.txt:
rwfglob --type=@t.txt --sensors=S3,@/tmp/sensor.txt,S7
Multiple @PATH values are allowed within a single argument. If the name of the file is "-", the names are read from the standard input.
The file must be a text file. Blank lines are ignored as are comments, which begin with the "#" character and continue to the end of the line. Whitespace at the beginning and end of a line is ignored as is whitespace that surrounds commas; all other whitespace within a line is significant.
A file may contain a value on each line and/or multiple values on a line separated by commas and optional whitespace. For example:
# Sensor 4
S4
# The first sensors
S0, S1,S2
S3 # Sensor 3
An attempt to use an @PATH directive in a file is an error.
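The file format described above can be sketched in a few lines of Python. This is an illustration of the stated rules, not rwfglob's own parser; the function name `parse_value_file` is hypothetical, and skipping empty items between commas and treating any value that begins with "@" as a nested directive are assumptions, since the text does not say how rwfglob handles or detects them.

```python
def parse_value_file(text):
    """Return the list of values found in the text of an @PATH file."""
    values = []
    for raw_line in text.splitlines():
        # A comment starts at '#' and continues to the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        for item in line.split(","):
            value = item.strip()  # whitespace around commas is ignored
            if not value:
                continue  # assumption: empty items are skipped
            if value.startswith("@"):
                # Assumption: a leading '@' marks a nested @PATH directive.
                raise ValueError(f"@PATH directive not allowed in a file: {value!r}")
            values.append(value)
    return values


EXAMPLE = """\
# Sensor 4
S4
# The first sensors
S0, S1,S2
S3 # Sensor 3
"""

print(parse_value_file(EXAMPLE))  # ['S4', 'S0', 'S1', 'S2', 'S3']
```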
When rwfglob is parsing the name of a file, it converts the sequences "@," and "@@" to "," and "@", respectively. For example, --class=@cl@@ss.txt@,v reads the class from the file cl@ss.txt,v. It is an error if any other character follows an embedded "@" (--flowtypes=@f@il contains "@i") or if a single "@" occurs at the end of the name (--sensor=@errat@).
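The escape rules for file names can be illustrated with a short Python sketch. It is not taken from the SiLK source; the function name `unescape_at_path` is hypothetical, and the input is assumed to be the text that follows the leading "@" of the directive.

```python
def unescape_at_path(name):
    """Convert '@,' to ',' and '@@' to '@' in the PATH part of an @PATH value."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "@":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(name):
            # A single '@' at the end of the name is an error.
            raise ValueError("single '@' at end of file name")
        nxt = name[i + 1]
        if nxt not in ",@":
            # Only ',' and '@' may follow an embedded '@'.
            raise ValueError(f"invalid sequence '@{nxt}' in file name")
        out.append(nxt)
        i += 2
    return "".join(out)


print(unescape_at_path("cl@@ss.txt@,v"))  # cl@ss.txt,v
```

With this sketch, the names "f@il" and "errat@" from the error cases above both raise `ValueError`.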
For each file it finds, rwfglob will check the size of the file and the number of blocks allocated to the file. If the block count is zero but the file size is non-zero, rwfglob treats the file as existing but as residing on tape. The names of these files are printed to the standard output, but each name is preceded by the text ' \t*** ON_TAPE ***' where '\t' represents a tab character. The summary line will include the number of files that rwfglob believes are on tape. To suppress this check and to remove the count from the summary line, use the --no-block-check switch.
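The on-tape test described above compares two values that stat(2) reports for a file. A minimal Python sketch of that rule follows; the function names are hypothetical, and returning False when the platform does not report a block count is an assumption made for the sketch, not documented rwfglob behavior.

```python
import os


def looks_on_tape(size, blocks):
    """Apply the rule above: the file has data but no allocated blocks."""
    return blocks == 0 and size > 0


def file_looks_on_tape(path):
    """Check a real file using the values from os.stat()."""
    st = os.stat(path)
    # st_blocks is not available on every platform; assumption: when it
    # is missing the check cannot be made and the file is treated as on disk.
    blocks = getattr(st, "st_blocks", None)
    if blocks is None:
        return False
    return looks_on_tape(st.st_size, blocks)
```

Some file systems can report zero blocks for small files that are stored on disk, which is one reason a check of this kind can misclassify a file.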
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
These switches are the same as those used by rwfilter to select the files to process. At least one of these switches must be provided.
The --class switch is used to specify a group of files to print. Only a single class may be selected with the --class switch; for multiple classes, use the --flowtypes switch. The argument may be "@PATH" which causes rwfglob to open the file PATH and read the class name from it; see "Read Selection Argument Values from a File" for details. Classes are defined in the silk.conf(5) site configuration file. If neither the --class nor --flowtypes option is given, the default-class as specified in silk.conf is used. To see the available classes and the default class, either examine the output from rwfglob --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.
--type={all | TYPE[,TYPE,@PATH ...]}
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic to process. The switch takes either the keyword "all" to select all types for CLASS or a comma-separated list of type names and "@PATH" directives, where @PATH tells rwfglob to read type names from the file PATH; see "Read Selection Argument Values from a File" for details. Types are defined in silk.conf; they typically refer to the direction of the flow, and they may vary by class. When neither the --type nor --flowtypes switch is given, a list of default types is used. The default-type list is determined by the value of CLASS, and the default types often include only incoming traffic. To see the available types and the default types for each class, examine the --help output of rwfglob or run rwsiteinfo with --fields=class,type,default-type.
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes switch allows a single rwfglob invocation to print filenames from multiple classes. The keyword "all" may be used for the CLASS and/or TYPE to select all classes and/or types. As of SiLK 3.20.0, the arguments may also include "@PATH" which causes rwfglob to open the file PATH and read the class/type pairs from it; see "Read Selection Argument Values from a File".
The --sensors switch is used to select data from specific sensors. The parameter is a comma separated list of sensor names, sensor IDs (integers), ranges of sensor IDs, sensor group names, and/or "@PATH" directives. As described in "Read Selection Argument Values from a File", @PATH tells rwfglob to read the names of the sensors from the file PATH. Sensors and sensor groups are defined in the silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping of sensor names to IDs and classes (--fields=sensor,id-sensor,class:list). When the --sensors switch is not specified, the default is to use all sensors which are valid for the specified class(es). Support for using sensor group names was added in SiLK 3.21.0.
The date predicates indicate which days and hours to consider when creating the list of files. The dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the hour is optional. A "T" may be used in place of the ":" to separate the day and hour. Whether the YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was compiled. To determine how your version of SiLK was compiled, see the "Timezone support" setting in the output from rwfglob --version.
When times are expressed in YYYY/MM/DD[:HH] format:
When both --start-date and --end-date are specified to hour precision, all hours within that time range are processed.
When --start-date is specified to day precision, the hour specified in --end-date (if any) is ignored, and files for all dates between midnight on start-date and 23:59 on end-date are processed.
When --start-date is specified to hour precision and --end-date is specified to day precision, the hour of the start-date is used as the hour for the end-date.
When --end-date is not specified and --start-date is specified to day precision, files for that complete day are processed.
When --end-date is not specified and --start-date is specified to hour precision, files for that single hour are processed.
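The five rules above for YYYY/MM/DD[:HH] dates can be expressed as a small function that returns the first and last hour to process. This is a sketch of the documented rules only, written in Python; the names `hour_range` and `_parse` are hypothetical, the function ignores timezones and epoch-second values, and it does not check that the end is later than the start.

```python
import re
from datetime import datetime, timedelta

# Accept ':' or 'T' between the day and the optional hour.
_DATE_RE = re.compile(r"^(\d{4})/(\d{2})/(\d{2})(?:[:T](\d{2}))?$")


def _parse(text):
    """Return (datetime at the given hour or at midnight, has_hour)."""
    m = _DATE_RE.match(text)
    if m is None:
        raise ValueError(f"not in YYYY/MM/DD[:HH] format: {text!r}")
    year, month, day = (int(m.group(i)) for i in (1, 2, 3))
    hour = int(m.group(4)) if m.group(4) is not None else 0
    return datetime(year, month, day, hour), m.group(4) is not None


def hour_range(start, end=None):
    """Return (first_hour, last_hour), both inclusive, as naive datetimes."""
    first, start_has_hour = _parse(start)
    if end is None:
        # Hour precision: that single hour. Day precision: the complete day.
        last = first if start_has_hour else first + timedelta(hours=23)
        return first, last
    last, end_has_hour = _parse(end)
    if not start_has_hour:
        # Day-precision start: any hour given on the end date is ignored.
        last = last.replace(hour=23)
    elif not end_has_hour:
        # Hour-precision start, day-precision end: reuse the start hour.
        last = last.replace(hour=first.hour)
    return first, last
```

For example, `hour_range("2003/10/11")` spans hours 00 through 23 of that day, while `hour_range("2003/10/11:05", "2003/10/12")` ends at hour 05 on the 12th.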
When at least one time is expressed as seconds since the UNIX epoch:
When --end-date is specified in epoch seconds, the given --start-date and --end-date are considered to be in hour precision.
When --start-date is specified in epoch seconds and --end-date is specified in YYYY/MM/DD[:HH] format, the start-date is considered to be in day precision if it is divisible by 86400, and hour precision otherwise.
When --start-date is specified in epoch seconds and --end-date is not given, the start-date is considered to be in hour precision.
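The three epoch-second rules above reduce to one decision about the precision of the start date. The Python sketch below restates them; the function name and the labels "epoch" and "formatted" for the kind of --end-date are hypothetical and exist only for this illustration.

```python
SECONDS_PER_DAY = 86400


def start_precision(start_epoch, end_kind):
    """Return "day" or "hour" for a --start-date given in epoch seconds.

    `end_kind` is None when --end-date is absent, "epoch" when it is in
    epoch seconds, and "formatted" when it is in YYYY/MM/DD[:HH] format.
    """
    if end_kind == "formatted" and start_epoch % SECONDS_PER_DAY == 0:
        # Only a formatted end date can make an epoch start day precision.
        return "day"
    return "hour"
```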
When neither --start-date nor --end-date is given, rwfglob prints all files for the current day.
It is an error to specify --end-date without specifying --start-date.
The --data-rootdir switch tells rwfglob to use ROOT_DIRECTORY as the root of the data repository. This value overrides the location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled into rwfglob (/data).
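The order of precedence for the data root can be summarized in a short Python sketch. The function name `resolve_data_rootdir` is hypothetical, and the sketch treats an environment variable that is set to the empty string as a valid value, which is an assumption; the text above does not say how rwfglob handles that case.

```python
import os

COMPILED_DEFAULT = "/data"  # the compiled-in location named above


def resolve_data_rootdir(switch_value=None, environ=None):
    """Return the data root: switch, then environment, then compiled default."""
    environ = os.environ if environ is None else environ
    if switch_value is not None:
        return switch_value  # --data-rootdir wins over everything else
    if "SILK_DATA_ROOTDIR" in environ:
        return environ["SILK_DATA_ROOTDIR"]
    return COMPILED_DEFAULT
```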
The --site-config-file switch tells rwfglob to read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfglob searches for the site configuration file in the locations specified in the "FILES" section.
The --print-missing-files switch prints to the standard error the names of the files that rwfglob expected to find but did not. The file names are preceded by the text 'Missing '; each file name appears on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfglob considers that sensor's data files missing even though their absence is expected. Use the output from this switch judiciously.
The --no-block-check switch instructs rwfglob not to check whether a file resides on tape, a check it makes by testing whether the number of blocks allocated to the file is zero. By default, rwfglob precedes the name of a file that has a block count of 0 and a non-zero size with the text ' \t*** ON_TAPE ***'.
The --no-file-names switch instructs rwfglob not to print the names of the files that it successfully finds. By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found. When both this switch and --print-missing-files are specified, rwfglob prints only the names of missing files (and the summary).
The --no-summary switch instructs rwfglob not to print the summary line (that is, the line that shows the number of files found). By default, rwfglob prints the names of the files it finds and a summary line showing the number of files it found.
The --help switch prints the available options and exits. The available classes and types are included in the output; you may specify a different root directory or site configuration file before --help to see the classes and types available for that site.
The --version switch prints the version number and information about how SiLK was configured, then exits the application.
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.
Looking at a day on a single sensor:
$ rwfglob --start=2003/10/11 --sensor=2
/data/in/2003/10/11/in-GAMMA_20031011.23
/data/in/2003/10/11/in-GAMMA_20031011.22
/data/in/2003/10/11/in-GAMMA_20031011.21
/data/in/2003/10/11/in-GAMMA_20031011.20
/data/in/2003/10/11/in-GAMMA_20031011.19
/data/in/2003/10/11/in-GAMMA_20031011.18
/data/in/2003/10/11/in-GAMMA_20031011.17
/data/in/2003/10/11/in-GAMMA_20031011.16
/data/in/2003/10/11/in-GAMMA_20031011.15
/data/in/2003/10/11/in-GAMMA_20031011.14
/data/in/2003/10/11/in-GAMMA_20031011.13
/data/in/2003/10/11/in-GAMMA_20031011.12
/data/in/2003/10/11/in-GAMMA_20031011.11
/data/in/2003/10/11/in-GAMMA_20031011.10
/data/in/2003/10/11/in-GAMMA_20031011.09
/data/in/2003/10/11/in-GAMMA_20031011.08
/data/in/2003/10/11/in-GAMMA_20031011.07
/data/in/2003/10/11/in-GAMMA_20031011.06
/data/in/2003/10/11/in-GAMMA_20031011.05
/data/in/2003/10/11/in-GAMMA_20031011.04
/data/in/2003/10/11/in-GAMMA_20031011.03
/data/in/2003/10/11/in-GAMMA_20031011.02
/data/in/2003/10/11/in-GAMMA_20031011.01
/data/in/2003/10/11/in-GAMMA_20031011.00
globbed 24 files; 0 on tape
If you only want the summary, specify --no-file-names:
$ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names
globbed 24 files; 0 on tape
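A script that consumes this output needs to separate file names, on-tape entries, and the summary line. The Python sketch below does so for text shaped like the examples above. It is not part of SiLK: the function name is hypothetical, the regular expressions are derived only from the sample output and the documented ' \t*** ON_TAPE ***' prefix, and the form of the summary line without the tape count (as when --no-block-check is used) is an assumption.

```python
import re

# "globbed 24 files; 0 on tape"; the tape count is treated as optional.
SUMMARY_RE = re.compile(r"^globbed (\d+) files(?:; (\d+) on tape)?$")
# Leading whitespace, the marker, then the file name.
ON_TAPE_RE = re.compile(r"^\s*\*\*\* ON_TAPE \*\*\*\s*(.+)$")


def parse_rwfglob_stdout(text):
    """Return (files, on_tape_files, summary) from rwfglob's standard output.

    `summary` is (file_count, tape_count_or_None), or None when no
    summary line is present.
    """
    files, on_tape, summary = [], [], None
    for line in text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            tape = int(m.group(2)) if m.group(2) is not None else None
            summary = (int(m.group(1)), tape)
            continue
        m = ON_TAPE_RE.match(line)
        if m:
            on_tape.append(m.group(1))
        elif line.strip():
            files.append(line)
    return files, on_tape, summary
```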
The SILK_CONFIG_FILE environment variable is used as the value for the --site-config-file switch when that switch is not provided.
The SILK_DATA_ROOTDIR environment variable specifies the root directory of the data repository. This value overrides the compiled-in value, and rwfglob uses it unless the --data-rootdir switch is specified. In addition, rwfglob may use this value when searching for the SiLK site configuration file. See the "FILES" section for details.
The SILK_PATH environment variable gives the root of the install tree. When searching for configuration files, rwfglob may use this environment variable. See the "FILES" section for details.
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the "Timezone support" value in the output from rwfglob --version), the value of the TZ environment variable determines the timezone in which rwfglob parses timestamps. (The dates in the file names that rwfglob returns are always in UTC.) If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses utc. For system information on the TZ variable, see tzset(3) or environ(7).
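Because file names always carry UTC dates, a start hour given in local time can map to a file whose name shows a different hour or even a different day. The Python sketch below illustrates the conversion with a fixed UTC offset. The function name is hypothetical, a fixed offset is a simplification of real TZ values (which may include daylight-saving rules), and the YYYYMMDD.HH text is modeled on the file names shown in the examples.

```python
from datetime import datetime, timedelta, timezone


def utc_hour_suffix(local_text, utc_offset_hours):
    """Return the UTC YYYYMMDD.HH text for a local 'YYYY/MM/DD:HH' time."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, "%Y/%m/%d:%H").replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y%m%d.%H")


print(utc_hour_suffix("2003/10/11:20", -4))  # 20031012.00
print(utc_hour_suffix("2003/10/11:20", 0))   # 20031011.20
```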
Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided, where ROOT_DIRECTORY/ is the directory rwfglob is using as the root of the data repository.
Locations for the root directory of the data repository when the --data-rootdir switch is not specified.
rwfilter(1), rwsiteinfo(1), silk.conf(5), silk(7), tzset(3), environ(7)
The ability to use @PATH in --class, --type, --flowtypes, and --sensors was added in SiLK 3.20.0.
As of SiLK 3.20.0, --types is an alias for --type.
The --sensors switch also accepts the names of groups defined in the silk.conf(5) file as of SiLK 3.21.0.
The output of --print-missing-files goes to the standard error, while all other output goes to the standard output. To redirect the output of --print-missing-files to the standard output, use the following in a Bourne-compatible shell:
$ rwfglob --print-missing-files ... 2>&1
The --print-missing-files option needs to be smarter about what files are really missing.
The block count check is of unknown portability across different tape-farm systems.