NAME
rwsettool - Perform set operations on IPset files to produce a new IPset
SYNOPSIS
rwsettool { --union | --intersect | --difference | --mask=MASK
| --sample { --size=SIZE | --ratio=RATIO } [--seed=SEED] }
[--output-path=OUTPUT_PATH]
[--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [INPUT_SET ...]
DESCRIPTION
rwsettool performs a single operation on one or more IPset file(s)
to produce a new IPset file. rwsettool reads the IPsets specified
on the command line; if no IPsets are listed, rwsettool attempts to
read an IPset from the standard input. The string stdin can be
used as the name of an input file to force rwsettool to read from
the standard input. The output is written to the specified
OUTPUT_PATH or, when no path is given, to the standard output,
provided it is not connected to a terminal. Specifying the string
stdout as the OUTPUT_PATH also causes rwsettool to write the IPset
to the standard output.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
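The abbreviation rule can be illustrated with a short Python sketch. It is not rwsettool's option parser: resolve_option is a hypothetical helper, and only the switch names come from this page. It assumes that an abbreviation is a leading prefix of the full switch name.

```python
# Switch names listed on this page; the resolver is a hypothetical helper.
OPTIONS = [
    "union", "intersect", "difference", "mask", "sample",
    "size", "ratio", "seed",
    "output-path", "note-strip", "note-add", "note-file-add",
    "compression-method",
]

def resolve_option(name, options=OPTIONS):
    """Return the full switch name for a possibly abbreviated one."""
    if name in options:
        return name  # an exact match is always accepted
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]  # unique abbreviation
    if not matches:
        raise ValueError(f"unknown option: --{name}")
    raise ValueError(f"ambiguous option --{name}: {', '.join(matches)}")

print(resolve_option("inter"))   # intersect
print(resolve_option("note-a"))  # note-add
```

Under this model, --s would be rejected because it could mean --sample, --size, or --seed.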
Operation Switches
Exactly one of the following operation switches must be provided:
- --union
- Perform the set union operation: The resulting IPset will contain the IPs that exist in any of the input IPsets.
- --intersect
- Perform the set intersection operation: The resulting IPset will contain the IPs that exist in all of the input IPsets.
- --difference
- Perform the set difference (relative complement) operation: The resulting IPset will contain all IPs from the first input IPset that do not exist in any of the subsequent input IPsets.
- --mask=MASK
- Perform the set union operation on the input IPsets and mask the IPs in the resulting IPset by the specified MASK, an integer value from 1 to 32. When the final IPset has multiple IP addresses within the specified MASK, the IPs are merged into one IP address within the masked block.
- --sample
- Select a random sample of IPs from the input IPsets. The size of the subset must be specified by either the --size or --ratio switches described below. In the case of multiple input IPsets, the resulting IPset is the union of all IP addresses sampled from each of the input IPsets.
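As a rough illustration of the --mask operation, the following Python sketch unions the inputs and then keeps one address per masked block. The function name mask_ipset and the sample addresses are hypothetical. The text above does not say which address represents a block, so the sketch assumes it is the block's base (lowest) address.

```python
import ipaddress

def mask_ipset(ipsets, mask):
    """Union the input sets, then keep one address per /mask block."""
    if not 1 <= mask <= 32:
        raise ValueError("MASK must be an integer from 1 to 32")
    union = set().union(*ipsets)
    # Assumption: the surviving address is the base address of the block.
    return {
        str(ipaddress.ip_network(f"{ip}/{mask}", strict=False).network_address)
        for ip in union
    }

a = {"10.1.1.5", "10.1.1.200"}   # both fall in 10.1.1.0/24
b = {"10.1.2.7", "192.168.0.9"}
result = mask_ipset([a, b], 24)
print(sorted(result))  # ['10.1.1.0', '10.1.2.0', '192.168.0.0']
```

The two addresses in 10.1.1.0/24 collapse into a single entry, which is the merging behavior described for --mask.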
Sampling Switches
These switches control how records are sampled by the --sample operation.
- --size=SIZE
- Select SIZE records at random from each input IPset. If an input IPset contains fewer than SIZE records, all of its IPs are selected.
- --ratio=RATIO
- Select a random sample where the selection probability for each record of each input set is RATIO, specified as a decimal number between 0.0 and 1.0. The exact size of the subset selected from each file will vary between different runs with the same data.
- --seed=SEED
- Seed the pseudo-random number generator with value SEED. By default, the seed will vary between runs. Seeding with specific values will produce repeatable results given the same input sets.
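The two sampling modes can be modeled in a few lines of Python. This is a sketch of the documented behavior only: sample_by_size and sample_by_ratio are hypothetical names, and Python's random module is not the generator rwsettool uses, so a given seed will not reproduce the tool's output.

```python
import random

def sample_by_size(ipsets, size, seed=None):
    """Pick up to `size` members from each input set and union the picks."""
    rng = random.Random(seed)  # seed=None gives a different sample each run
    out = set()
    for ipset in ipsets:
        members = sorted(ipset)  # fixed order so a seed gives repeatable picks
        out |= set(rng.sample(members, min(size, len(members))))
    return out

def sample_by_ratio(ipsets, ratio, seed=None):
    """Keep each member of each input set with probability `ratio`."""
    rng = random.Random(seed)
    out = set()
    for ipset in ipsets:
        out |= {ip for ip in sorted(ipset) if rng.random() < ratio}
    return out

A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
print(sorted(sample_by_size([A, B], 2, seed=7)))
print(sorted(sample_by_ratio([A, B], 0.5, seed=7)))
```

Because the per-set picks are unioned, an address chosen from both A and B is counted once, so the size-based result here holds either 3 or 4 addresses.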
Output Switches
These switches control the output:
- --output-path=OUTPUT_PATH
- Write the resulting IPset to OUTPUT_PATH. If this switch is not provided, rwsettool will attempt to write the IPset to the standard output, unless it is connected to a terminal.
- --note-strip
- Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.
- --note-add=TEXT
- Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
- --note-file-add=FILENAME
- Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
- --compression-method=COMP_METHOD
- Set the compression method of the output to COMP_METHOD. Some SiLK tools can use an external library to compress their binary output. The list of available compression methods and the default method are set when SiLK is compiled (the --help and --version switches print the available and default compression methods) and depend on which supported libraries are found. SiLK can support:
- none
- Do not compress the output using an external library
- zlib
- Use the zlib(3) library for compressing the output
- lzo1x
- Use the lzo1x algorithm from the LZO real time compression library for compression
- best
- Use whichever available method gives the best compression in general, though not necessarily the best for this particular output.
EXAMPLES
Assume the following IPsets:
A.set = { 1, 2, 4, 6 }
B.set = { 1, 3, 5, 7 }
C.set = { 1, 3, 6, 8 }
D.set = { } (empty set)
Then the following commands will produce the following result IPsets:
+---------------------------------+----------------------------+
| OPTIONS                         | RESULT                     |
+---------------------------------+----------------------------+
| --union A.set B.set             | { 1, 2, 3, 4, 5, 6, 7 }    |
| --union A.set C.set             | { 1, 2, 3, 4, 6, 8 }       |
| --union A.set B.set C.set       | { 1, 2, 3, 4, 5, 6, 7, 8 } |
| --union C.set D.set             | { 1, 3, 6, 8 }             |
| --intersect A.set B.set         | { 1 }                      |
| --intersect A.set C.set         | { 1, 6 }                   |
| --intersect A.set B.set C.set   | { 1 }                      |
| --intersect A.set D.set         | { }                        |
| --difference A.set B.set        | { 2, 4, 6 }                |
| --difference B.set A.set        | { 3, 5, 7 }                |
| --difference A.set B.set C.set  | { 2, 4 }                   |
| --difference C.set B.set A.set  | { 8 }                      |
| --difference C.set D.set        | { 1, 3, 6, 8 }             |
| --difference D.set C.set        | { }                        |
+---------------------------------+----------------------------+
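The results for --union, --intersect, and --difference follow ordinary set semantics, so they can be checked with Python's built-in set type. The sketch below is only a model of those semantics, not of how rwsettool processes IPset files. The helper names are hypothetical, and small integers stand in for IP addresses as in the example sets.

```python
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
C = {1, 3, 6, 8}
D = set()  # empty set

def union(*ipsets):
    """Members that exist in any input set."""
    return set().union(*ipsets)

def intersect(first, *rest):
    """Members that exist in all input sets."""
    return set(first).intersection(*rest)

def difference(first, *rest):
    """Members of the first set that appear in none of the later sets."""
    return set(first).difference(*rest)

print(sorted(union(A, B, C)))       # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(intersect(A, C)))      # [1, 6]
print(sorted(difference(C, B, A)))  # [8]
```

Only --difference depends on argument order: difference(A, B) and difference(B, A) give different results, while union and intersect do not change when the inputs are reordered.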
Sampling yields variable results, but here are some example runs:
+----------------------------------+----------------------------+
| COMMAND                          | RESULT                     |
+----------------------------------+----------------------------+
| --sample --size 2 A.set          | { 1, 4 }                   |
| --sample --size 2 A.set          | { 1, 6 }                   |
| --sample --size 3 A.set          | { 2, 4, 6 }                |
| --sample --size 2 A.set B.set    | { 1, 2, 5, 7 }             |
| --sample --size 2 A.set B.set    | { 3, 4, 5, 6 }             |
| --sample --size 2 A.set B.set    | { 1, 4, 5 }                |
| --sample --ratio 0.5 A.set       | { 2, 6 }                   |
| --sample --ratio 0.5 A.set       | { 4 }                      |
| --sample --ratio 0.5 A.set B.set | { 1 }                      |
| --sample --ratio 0.5 A.set B.set | { 2, 3, 5, 6, 7 }          |
+----------------------------------+----------------------------+
These examples demonstrate some important points about sampling from IPsets:
- When using --size, an exact number of items is selected from each input set.
- When using --size with multiple input sets, the number of records in the output set may not be (num_input_sets*size) in all cases, because an IP sampled from more than one input set appears only once in the output.
- When using --ratio, the number of items sampled is not stable between runs.
SEE ALSO
rwset(1), rwsetbuild(1), rwsetcat(1), rwsetintersect(1), rwsetunion(1), rwfileinfo(1)
NOTES
rwsettool supersedes the rwsetunion(1) and rwsetintersect(1) tools.


