NAME

mothra-filesanitizer - Remove Information Elements from a Mothra repository

SYNOPSIS

  mothra-filesanitizer [--help] [--version]

  mothra-filesanitizer TARGET-1 [ ... TARGET-N ]
                       --remove-ie=NAME-1 [ ... --remove-ie=NAME-N ]
                       [--compression=CODEC] [--maximum-size=N]
                       [--max-threads=N] [--spawn-thread=MODE]

DESCRIPTION

mothra-filesanitizer removes Information Element fields from the data files in a Mothra repository. In addition, when multiple files share the same name except for the UUID, mothra-filesanitizer combines those files into a single file.

Multiple directories (whether part of a single Mothra repository or of several distinct Mothra repositories) may be processed at the same time. Multiple Information Elements may be removed in the same invocation of mothra-filesanitizer.

This tool runs as a batch process, never as a daemon.

It makes a single recursive scan of the target directories TARGET-1 ... TARGET-N for files whose names match the pattern YYYYMMDD.HH. or YYYYMMDD.HH-PTNNH. (Specifically, it uses the regular expression ^\d{8}\.\d{2}(?:-PT\d\d?H)?\. to match file names; the trailing \. is part of the expression.) Files whose names match that pattern are processed by mothra-filesanitizer to remove the named Information Elements. All files for which the regular expression matched the same string are joined into a single file, similar to the behavior of mothra-filejoiner. Finally, the original files are removed.
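The matching and grouping rule described above can be illustrated with a short Python sketch. This is not the tool's own code: the function name group_by_prefix and the example file names (including their suffixes) are hypothetical, and the sketch groups by name only, ignoring which directory a file lives in.

```python
import re
from collections import defaultdict

# The regular expression quoted in the DESCRIPTION above.
PREFIX_RE = re.compile(r"^\d{8}\.\d{2}(?:-PT\d\d?H)?\.")

def group_by_prefix(filenames):
    """Map each matched prefix to the list of file names that share it."""
    groups = defaultdict(list)
    for name in filenames:
        m = PREFIX_RE.match(name)
        if m:  # names that do not match the pattern are skipped
            groups[m.group(0)].append(name)
    return dict(groups)

# Hypothetical file names; the suffixes after the prefix are made up.
names = [
    "20240102.03.aaaa.ipfix",
    "20240102.03.bbbb.ipfix",
    "20240102.03-PT6H.cccc.ipfix",
    "README.txt",
]
groups = group_by_prefix(names)
```

In this sketch the first two names share the matched string 20240102.03. and would be candidates for joining, the third forms its own group, and README.txt is ignored.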

OPTIONS

--compression=CODEC

If --compression is provided, files written to HDFS will use the compression codec named CODEC. If this codec cannot be found, mothra-filesanitizer will exit with an error. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. If CODEC is none or the empty string, or if this option is not specified, no compression will be used.

--help

Print the available options and exit.

--max-threads=N

When --max-threads is specified, it determines the maximum number of threads which will be used to sanitize and join files simultaneously. One thread is always used to recursively scan the target directories. This value determines the number of threads started as described in the --spawn-thread option.

--maximum-size=N

If --maximum-size is provided, it sets the approximate maximum size, in bytes, of an output file. Once at least N compressed bytes have been written, the output file is closed and a new output file is created. Files may therefore be slightly larger than N bytes, since a file is not closed until it reaches or exceeds this size.
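The following Python sketch shows why output files can end up slightly larger than N. It is an illustration only, under the assumption that data is written in indivisible chunks and the size check happens after each write; the function name split_by_maximum_size and the chunk sizes are hypothetical.

```python
def split_by_maximum_size(chunk_sizes, maximum_size):
    """Return one list of chunk sizes per output file.

    A file is closed only after its running total has reached
    maximum_size, so the final chunk may push it past that value.
    """
    files, current, written = [], [], 0
    for size in chunk_sizes:
        current.append(size)
        written += size
        if written >= maximum_size:  # check happens after the write
            files.append(current)
            current, written = [], 0
    if current:  # whatever remains goes into a last, smaller file
        files.append(current)
    return files

# Five hypothetical 40-byte chunks with a limit of 100 bytes.
layout = split_by_maximum_size([40, 40, 40, 40, 40], 100)
```

Here the first file holds three chunks (120 bytes, which is over the limit of 100) and the second holds the remaining two.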

--remove-ie=NAME

The required option --remove-ie, which may be specified multiple times but must be specified at least once, determines which IPFIX Information Elements should be removed from files in the target Mothra spool directories. For example, specifying --remove-ie=ingressInterface would remove Information Element 10 (the element to which IANA assigns this name) from all templates and records in the affected files. Additionally specifying --remove-ie=egressInterface would also remove Information Element 14. Related Information Elements, such as 368 and 369 (ingressInterfaceType and egressInterfaceType), will not be changed.

The specified IEs will, however, be removed from all possible locations in the target files. This includes from all templates (including those used in subTemplateList- and subTemplateMultiList-typed elements), and it also includes basicList-typed elements which contain the excluded Information Elements.

--spawn-thread=MODE

The --spawn-thread option determines how mothra-filesanitizer allocates work to individual threads. If MODE is by-directory, then a single thread is used to process all of the files in each directory which contains files to process. If MODE is by-prefix, then within each directory one thread is used for all of the files sharing a common YYYYMMDD.HH. or YYYYMMDD.HH-PTNNH. prefix. The number of threads which run simultaneously is determined by --max-threads. If --spawn-thread is not specified, the default value of by-directory is used.
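The difference between the two modes can be sketched in Python as follows. This is a minimal illustration rather than the tool's implementation: the names work_units and run, the example paths, and the use of a thread pool are assumptions, and the task here merely counts the files in each unit instead of sanitizing them.

```python
import os
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

PREFIX_RE = re.compile(r"^\d{8}\.\d{2}(?:-PT\d\d?H)?\.")

def work_units(paths, mode):
    """Group matching files into the units that one thread would handle."""
    units = defaultdict(list)
    for path in paths:
        directory, name = os.path.split(path)
        m = PREFIX_RE.match(name)
        if not m:
            continue  # not a file the tool would process
        if mode == "by-directory":
            key = (directory,)
        elif mode == "by-prefix":
            key = (directory, m.group(0))
        else:
            raise ValueError("unknown mode: " + mode)
        units[key].append(path)
    return dict(units)

def run(paths, mode, max_threads, task=len):
    """Run task on each unit, with at most max_threads running at once."""
    units = work_units(paths, mode)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(task, units.values()))
    return dict(zip(units.keys(), results))

# Hypothetical paths in two directories.
paths = [
    "/repo/a/20240102.03.x.ipfix",
    "/repo/a/20240102.04.y.ipfix",
    "/repo/b/20240102.03.z.ipfix",
]
by_dir = run(paths, "by-directory", max_threads=2)
by_prefix = run(paths, "by-prefix", max_threads=2)
```

With these paths, by-directory yields two units (one per directory) while by-prefix yields three, since /repo/a contains files with two distinct prefixes. In both cases max_threads caps how many units are processed at once.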

--version

Print the version number and information about how Mothra was configured, then exit.