NAME

rwsettool - Operate on IPset files to produce a new IPset

SYNOPSIS

  rwsettool { --union | --intersect | --difference
              | --symmetric-difference
              | --sample {--size=SIZE | --ratio=RATIO} [--seed=SEED]
              | --mask=NET_BLOCK_SIZE | --fill-blocks=NET_BLOCK_SIZE }
        [--output-path=PATH] [--record-version=VERSION]
        [--invocation-strip]
        [--note-strip] [--note-add=TEXT] [--note-file-add=FILENAME]
        [--compression-method=COMP_METHOD] [INPUT_SET ...]

  rwsettool --help

  rwsettool --version

DESCRIPTION

rwsettool performs a single operation on one or more IPset file(s) to produce a new IPset file.

The operations that rwsettool provides are:

union

The union (or addition) of two IPsets is the set of IP addresses that are members in either set.

intersection

The intersection of two IPsets is the set of IP addresses that are members of both sets.

difference

The difference (or relative complement) of two IPsets is the set of IP addresses that are members of the first set but not members of the second.

symmetric-difference

The symmetric difference (or disjunctive union) of two IPsets is the set of IP addresses that are members of either set but not members of both. This is equivalent to the intersection of the IPsets subtracted from the union of the IPsets. It is also equivalent to computing the union of both relative complements (the first set from the second and the second set from the first).

sample

The set of IP addresses in an IPset is randomly selected to produce a subset.

mask

For each CIDR-block (or net-block) of a user-specified size in the IPset, the IP addresses that are members of that net-block are replaced by a single IP address at the start of the net-block. Empty net-blocks are not changed.

fill-blocks

For each CIDR-block (or net-block) of a user-specified size in the IPset, the IP addresses that are members of that net-block are extended so that every IP address in that net-block is a member of the set. Empty net-blocks are not changed.

More details are provided in the "OPTIONS" section.
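
The two equivalences stated for the symmetric difference can be checked with ordinary set arithmetic. The following is only an illustrative model in Python: built-in sets of small integers stand in for IPsets, the names first and second are made up for the example, and nothing here reflects how rwsettool stores or processes IPsets.

```python
# Illustrative model only: Python sets of integers stand in for IPsets.
first = {1, 2, 4, 6}
second = {1, 3, 5, 7}

# Members of either set but not members of both.
sym_diff = first ^ second

# Equivalent: the intersection subtracted from the union.
union_minus_intersection = (first | second) - (first & second)

# Equivalent: the union of both relative complements.
both_complements = (first - second) | (second - first)

print(sym_diff == union_minus_intersection == both_complements)
```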

rwsettool reads the IPsets specified on the command line; when no IPsets are listed, rwsettool attempts to read an IPset from the standard input. The strings stdin or - can be used as the name of an input file to force rwsettool to read from the standard input. The resulting IPset is written to the location specified by the --output-path switch or to the standard output if that switch is not provided. Using the strings stdout or - as the argument to --output-path causes rwsettool to write the IPset to the standard output. rwsettool exits with an error if an attempt is made to read an IPset from the terminal or write an IPset to the terminal.

To create an IPset file from SiLK Flow records, use rwset(1), and to create one from text, use rwsetbuild(1). rwsetcat(1) prints an IPset file as text. To determine whether an IPset file contains an IP address, use rwsetmember(1).

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
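
The abbreviation rule can be modeled as an exact-match check followed by a unique-prefix search. The sketch below is an assumption about the behavior described in the sentence above, not rwsettool's actual option parser; resolve_option is a hypothetical helper, and the option list is copied from the SYNOPSIS of this page.

```python
# Hypothetical helper; rwsettool's real option parser is not shown on this page.
OPTIONS = [
    "union", "intersect", "difference", "symmetric-difference",
    "sample", "size", "ratio", "seed", "mask", "fill-blocks",
    "output-path", "record-version", "invocation-strip",
    "note-strip", "note-add", "note-file-add",
    "compression-method", "help", "version",
]

def resolve_option(name, options=OPTIONS):
    """Return the full option name for an exact match or a unique prefix."""
    if name in options:
        # An exact match is accepted even if it is a prefix of another name.
        return name
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: --{name}")
    raise ValueError(f"ambiguous option: --{name} matches {matches}")

print(resolve_option("symmetric"))
print(resolve_option("fill-block"))
```

Under this model, --symmetric and --fill-block (both used in the EXAMPLES section) resolve to a single switch, while --note is rejected because it is a prefix of three switches.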

Operation Switches

Exactly one of the following operation switches must be provided:

--union

Perform the set union operation: The resulting IPset contains each IP address that is a member of any of the input IPsets.

--intersect

Perform the set intersection operation: The resulting IPset contains each IP address that is a member of all of the input IPsets.

--difference

Perform the set difference operation: The resulting IPset contains each IP address that is a member of the first IPset and not a member of any subsequent IPsets.

--symmetric-difference

Perform the symmetric difference operation: For two input sets, the resulting IPset contains each IP address that is a member of one of the input IPsets but not both. For each additional IPset, rwsettool computes the symmetric difference of the current result with the additional IPset. For three input sets, the output IPset contains each IP address that is a member of exactly one of the IPsets or of all three IPsets. Since SiLK 3.13.0.

--sample

Select a random sample of IPs from the input IPsets. The size of the subset must be specified by either the --size or --ratio switches described next. In the case of multiple input IPsets, the resulting IPset is the union of all IP addresses sampled from each of the input IPsets. That is, each IPset is individually sampled, and the results are merged.

--size=SIZE

Randomly select exactly SIZE IP addresses from each input IPset and create an IPset containing the union of those selections. If the number of IP addresses in an input IPset is less than or equal to SIZE, all members of that IPset are included in the result. When the input sets are completely disjoint and each set has at least SIZE members, the number of IP addresses in the result is the product of SIZE and the number of inputs.

--ratio=RATIO

Create an IPset where the probability of including each IP address of each input IPset in the result is RATIO, specified as a floating point number between 0.0 and 1.0. For each input IP address, rwsettool computes a pseudo-random number between 0 and 1 and adds the IP address to the result when the number is less than RATIO. The exact size of the subset may vary with each invocation.

--seed=SEED

Seed the pseudo-random number generator with value SEED. By default, the seed varies for each invocation. Seeding with a specific value produces repeatable results given the same input sets.

--mask=NET_BLOCK_SIZE

Perform a (sparse) masking operation: The resulting IPset contains one IP address for each /NET_BLOCK_SIZE CIDR block in the input IPset(s) that contains one or more IP addresses in that CIDR block. That is, rwsettool visits each /NET_BLOCK_SIZE CIDR block in the IPset. If the block is empty, no change is made; otherwise the block is cleared (all IPs removed) and the lowest IP address in that block is made a member of the set. NET_BLOCK_SIZE should be a value between 1 and 32 for IPv4 sets and between 1 and 128 for IPv6 sets. Contrast with --fill-blocks.

--fill-blocks=NET_BLOCK_SIZE

Perform a (non-sparse) masking operation: The resulting IPset contains a completely full /NET_BLOCK_SIZE block for each /NET_BLOCK_SIZE CIDR block in the input IPset(s) that contains one or more IP addresses in that CIDR block. That is, rwsettool visits each /NET_BLOCK_SIZE CIDR block in the IPset. If the block is empty, no change is made; otherwise all IP addresses in the block are made members of the set. NET_BLOCK_SIZE should be a value between 1 and 32 for IPv4 sets and between 1 and 128 for IPv6 sets. Contrast with --mask.

Output Switches

These switches control the output:

--output-path=PATH

Write the resulting IPset to PATH, where PATH is a filename, a named pipe, the keyword stderr to write the output to the standard error, or the keyword stdout or - to write the output to the standard output. If PATH names an existing file, rwsettool exits with an error unless the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If this switch is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwsettool to exit with an error.

--record-version=VERSION

Specify the format of the IPset records that are written to the output. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0.

0

Use the default version for an IPv4 IPset and an IPv6 IPset, currently 2 and 3, respectively.

2

Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.

3

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.

4

Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact than version 3 and often more compact than version 2.

5

Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact than version 4.

--invocation-strip

Do not record any command line history; that is, do not copy the invocation history from the input files to the output file, and do not record the current command line invocation in the output.

--note-strip

Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.

--note-add=TEXT

Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.

--note-file-add=FILENAME

Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.

--compression-method=COMP_METHOD

Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.

none

Do not compress the output using an external library.

zlib

Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.

lzo1x

Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.

snappy

Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.

best

Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.

Additional Switches

rwsettool supports these additional switches:

--help

Print the available options and exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Assume the following IPsets:

 A.set = { 1, 2, 4, 6 }
 B.set = { 1, 3, 5, 7 }
 C.set = { 1, 3, 6, 8 }
 D.set = { } (empty set)

Set Union Examples

The union of two IPsets contains the IP addresses that are members of either IPset. The union of multiple IPsets contains the IP addresses that are members of any of the sets. The resulting IPset does not depend on the order of the input IPsets. The union of a single IPset, of an IPset with itself, and of an IPset with an empty IPset is the original IPset.

 +---------------------------------+----------------------------+
 | OPTIONS                         | RESULT                     |
 +---------------------------------+----------------------------+
 | --union A.set B.set             | { 1, 2, 3, 4, 5, 6, 7 }    |
 | --union A.set C.set             | { 1, 2, 3, 4, 6, 8 }       |
 | --union A.set B.set C.set       | { 1, 2, 3, 4, 5, 6, 7, 8 } |
 | --union C.set D.set             | { 1, 3, 6, 8 }             |
 | --union A.set                   | { 1, 2, 4, 6 }             |
 | --union A.set A.set             | { 1, 2, 4, 6 }             |
 +---------------------------------+----------------------------+

Set Intersection Examples

The intersection of two IPsets contains the IP addresses that are members of both IPsets (that is, the IP addresses they have in common). The intersection of multiple IPsets contains the IP addresses that are members of all of the sets. The resulting IPset does not depend on the order of the input IPsets. The intersection of a single IPset is the original IPset. The intersection of an IPset with itself is the original IPset. The intersection of an IPset with an empty IPset is an empty IPset.

 +---------------------------------+----------------------------+
 | OPTIONS                         | RESULT                     |
 +---------------------------------+----------------------------+
 | --intersect A.set B.set         | { 1 }                      |
 | --intersect A.set C.set         | { 1, 6 }                   |
 | --intersect B.set C.set         | { 1, 3 }                   |
 | --intersect A.set B.set C.set   | { 1 }                      |
 | --intersect A.set D.set         | { }                        |
 | --intersect A.set               | { 1, 2, 4, 6 }             |
 | --intersect A.set A.set         | { 1, 2, 4, 6 }             |
 +---------------------------------+----------------------------+

Set Difference Examples

The difference of two IPsets contains the IP addresses that are members of the first set but not members of the second. The difference of multiple IPsets contains the IP addresses in the first set that are not members of any other IPset. The resulting IPset is dependent on the order of the input IPsets. Using the difference operation on a single IPset gives that IPset. The difference of an IPset with an empty IPset is the first IPset. The difference of an IPset with itself is the empty IPset.

 +---------------------------------+----------------------------+
 | OPTIONS                         | RESULT                     |
 +---------------------------------+----------------------------+
 | --difference A.set B.set        | { 2, 4, 6 }                |
 | --difference B.set A.set        | { 3, 5, 7 }                |
 | --difference A.set B.set C.set  | { 2, 4 }                   |
 | --difference C.set B.set A.set  | { 8 }                      |
 | --difference C.set D.set        | { 1, 3, 6, 8 }             |
 | --difference D.set C.set        | { }                        |
 | --difference A.set              | { 1, 2, 4, 6 }             |
 | --difference A.set A.set        | { }                        |
 +---------------------------------+----------------------------+
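
The order dependence shown in the table can be reproduced with a small model. This is only a sketch in Python, with integers standing in for IP addresses and the sets copied from the example definitions above; the function name difference is made up for illustration and says nothing about rwsettool's internals.

```python
from functools import reduce

# Integers stand in for IP addresses; values copied from the example sets.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
C = {1, 3, 6, 8}
D = set()

def difference(*ipsets):
    """Model of --difference: the first set minus every subsequent set."""
    first, *rest = ipsets
    return reduce(lambda acc, s: acc - s, rest, set(first))

print(sorted(difference(A, B, C)))  # [2, 4]
print(sorted(difference(C, B, A)))  # [8]
```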

Set Symmetric Difference Examples

The symmetric difference (or disjunctive union) of two IPsets contains the IP addresses that are members of either set but not members of both sets. For each additional input IPset, rwsettool computes the symmetric difference of the current result with that IPset. The resulting IPset contains the IP addresses that are members of an odd number of the input sets. The resulting IPset does not depend on the order of the input IPsets. Using the symmetric difference operation on a single IPset gives that IPset. The symmetric difference of an IPset with an empty IPset is the first IPset. The symmetric difference of an IPset with itself is the empty IPset.

 +---------------------------------+----------------------------+
 | OPTIONS                         | RESULT                     |
 +---------------------------------+----------------------------+
 | --symmetric A.set B.set         | { 2, 3, 4, 5, 6, 7 }       |
 | --symmetric A.set C.set         | { 2, 3, 4, 8 }             |
 | --symmetric A.set D.set         | { 1, 2, 4, 6 }             |
 | --symmetric C.set B.set         | { 5, 6, 7, 8 }             |
 | --symmetric A.set B.set C.set   | { 1, 2, 4, 5, 7, 8 }       |
 | --symmetric A.set               | { 1, 2, 4, 6 }             |
 | --symmetric A.set A.set         | { }                        |
 +---------------------------------+----------------------------+
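
The statement that the result holds the addresses that are members of an odd number of inputs can be checked against the pairwise fold. The following Python sketch models both views, with integers standing in for IP addresses; the function names are hypothetical and the code is not taken from rwsettool.

```python
from collections import Counter
from functools import reduce

# Integers stand in for IP addresses; values copied from the example sets.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
C = {1, 3, 6, 8}

def symmetric_difference(*ipsets):
    """Fold the inputs pairwise, one additional set at a time."""
    return reduce(lambda acc, s: acc ^ s, ipsets, set())

def odd_membership(*ipsets):
    """Addresses that are members of an odd number of the inputs."""
    counts = Counter(ip for s in ipsets for ip in s)
    return {ip for ip, n in counts.items() if n % 2 == 1}

print(sorted(symmetric_difference(A, B, C)))  # [1, 2, 4, 5, 7, 8]
print(symmetric_difference(A, B, C) == odd_membership(A, B, C))  # True
```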

Finding IP Addresses Unique to an Input Set

Using the symmetric difference on three or more IPsets does not result in an IPset containing the IP addresses that are members of a single input set. To compute that, use the Bag tools as follows.

  1. First, use rwbagbuild(1) to create an empty bag file /tmp/b.bag.

     $ echo "" | rwbagbuild --bag-input=stdin --output-path=/tmp/b.bag

  2. For each input IPset, i.set, use rwbagbuild to create a bag from the IPset, and use rwbagtool(1) to add that bag to b.bag.

     $ rwbagbuild --set-input=i.set   \
       | rwbagtool --add - /tmp/b.bag --output-path=/tmp/b2.bag
     $ mv /tmp/b2.bag /tmp/b.bag

    To do that in a loop, run

     $ for i in *.set ; do \
             rwbagbuild --set-input=$i  \
             | rwbagtool --add - /tmp/b.bag --output-path=/tmp/b2.bag ; \
             mv /tmp/b2.bag /tmp/b.bag ; \
       done

  3. Use rwbagtool to create a coverset named unique.set that contains the IP addresses in b.bag whose counter is 1.

     $ rwbagtool --maxcounter=1 --coverset --output-path=unique.set \
             /tmp/b.bag

A different approach may be used which does not require temporary files. Use rwsetcat(1) to convert the IPset files to text and feed that data to rwbagbuild. (When rwsetcat is invoked on multiple IPset files, it prints the contents of each individual IPset file, and as rwbagbuild processes the text, it increments an IP address's counter each time the IP appears in the input.) Use rwbagtool to create the IPset as shown in Step 3 above.

 $ rwsetcat --cidr-blocks=1 *.set   \
   | rwbagbuild --bag-input=-       \
   | rwbagtool  --maxcounter=1 --coverset --output=unique.set
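
The counting idea behind both approaches can be sketched without the Bag tools. In the Python model below, a Counter plays the role of b.bag and integers stand in for IP addresses; unique_members is a hypothetical name, and the code only illustrates the logic, not the behavior of rwbagbuild or rwbagtool.

```python
from collections import Counter

def unique_members(*ipsets):
    """Addresses that are members of exactly one input set."""
    bag = Counter()              # plays the role of b.bag
    for ipset in ipsets:
        bag.update(ipset)        # add 1 to the counter of each member
    # Keep the keys whose counter is 1, like --maxcounter=1 --coverset.
    return {ip for ip, count in bag.items() if count == 1}

# Integers stand in for IP addresses; values copied from the example sets.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
C = {1, 3, 6, 8}

print(sorted(unique_members(A, B, C)))  # [2, 4, 5, 7, 8]
```

Compare this with the symmetric difference of the same three sets, { 1, 2, 4, 5, 7, 8 }, which also includes address 1 because it is a member of all three inputs.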

Set Sampling Examples

The --sample switch creates a subset that contains IP addresses that have been randomly selected from the input IPset(s).

The --size switch selects exactly SIZE IP addresses from each input set, but the number of IP addresses in the result may be less than the product of SIZE and the number of inputs when the input sets have IPs in common or when an IPset has fewer than SIZE members.

When using the --size switch, the probability of selecting an individual IP address varies with the number of IPs to be selected and the number of IPs remaining in the set. If N is the number of IPs in a set, the probability of selecting the first IP is SIZE/N. If that IP is selected, the probability of selecting the second is (SIZE-1)/(N-1), but if the first IP is not selected, the probability of selecting the second is SIZE/(N-1).

 +----------------------------------+----------------------------+
 | COMMAND                          | RESULT                     |
 +----------------------------------+----------------------------+
 | --sample --size 2 A.set          | { 1, 4 }                   |
 | --sample --size 2 A.set          | { 1, 6 }                   |
 | --sample --size 3 A.set          | { 2, 4, 6 }                |
 | --sample --size 2 A.set B.set    | { 1, 2, 5, 7 }             |
 | --sample --size 2 A.set B.set    | { 3, 4, 5, 6 }             |
 | --sample --size 2 A.set B.set    | { 1, 4, 5 }                |
 +----------------------------------+----------------------------+
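
The selection probabilities described above match the classic selection-sampling technique, in which each IP is chosen with probability (number still needed)/(number not yet visited). The Python sketch below implements that technique as an illustration; whether rwsettool is implemented this way is an assumption, the function names are hypothetical, and the values produced for a given seed will not match rwsettool's output for the same --seed.

```python
import random

def sample_size(ipset, size, rng):
    """Pick exactly `size` members, or all members if the set is smaller."""
    remaining = len(ipset)           # IPs not yet visited: N, N-1, ...
    needed = min(size, remaining)    # IPs still to be selected
    chosen = set()
    for ip in sorted(ipset):
        # Select this IP with probability needed/remaining.
        if rng.random() * remaining < needed:
            chosen.add(ip)
            needed -= 1
        remaining -= 1
    return chosen

def sample(ipsets, size, seed=None):
    """Sample each input individually and merge the results."""
    rng = random.Random(seed)        # a fixed seed gives repeatable results
    result = set()
    for ipset in ipsets:
        result |= sample_size(ipset, size, rng)
    return result

# Integers stand in for IP addresses; values copied from the example sets.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
print(sorted(sample([A, B], size=2, seed=42)))
```

Because A and B have only address 1 in common, the merged result for a size of 2 holds either three or four addresses.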

The argument to the --ratio switch is the probability of choosing an individual IP address. For each IP address in the input, the IP is added to the output when a pseudo-random number between 0 and 1 is less than the argument to --ratio. The number of IP addresses in the result varies with each invocation.

 +----------------------------------+----------------------------+
 | COMMAND                          | RESULT                     |
 +----------------------------------+----------------------------+
 | --sample --ratio 0.5 A.set       | { 2, 6 }                   |
 | --sample --ratio 0.5 A.set       | { 4 }                      |
 | --sample --ratio 0.5 A.set B.set | { 1, 3 }                   |
 | --sample --ratio 0.5 A.set B.set | { 2, 3, 5, 6, 7 }          |
 +----------------------------------+----------------------------+
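
The per-address test used by --ratio can be modeled in a few lines. This Python sketch follows the description above (draw a pseudo-random number in [0, 1) and keep the IP when the number is less than the ratio); sample_ratio is a hypothetical name, and Python's generator is not the one rwsettool uses, so results for a given seed will differ from the tool's.

```python
import random

def sample_ratio(ipsets, ratio, seed=None):
    """Keep each address of each input with probability `ratio`."""
    rng = random.Random(seed)        # a fixed seed gives repeatable results
    result = set()
    for ipset in ipsets:
        for ip in sorted(ipset):
            # random() returns a float in [0.0, 1.0).
            if rng.random() < ratio:
                result.add(ip)
    return result

# Integers stand in for IP addresses; values copied from the example sets.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
print(sorted(sample_ratio([A, B], 0.5, seed=7)))
```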

Set Masking and Block-Filling Examples

The goal of the --mask and --fill-blocks switches is to produce an IPset whose members are on user-defined CIDR-block (or net-block) boundaries. (In some ways, these switches produce output that is similar to the --network-structure switch on rwsetcat(1).)

The --mask and --fill-blocks switches require a decimal argument that is a CIDR-block network mask size. For example, the argument 24 represents 256 IPv4 addresses. rwsettool visits each block of that size in the input IPset. If no IP addresses appear in that block, the result also has no IPs in the block. If one or more IP addresses appear in that block, the output IPset has either the lowest address in that block as a member (for --mask) or all IP addresses in that block as members (for --fill-blocks).

For example, consider the IPset s.set that contains three IP addresses.

 $ rwsetcat --cidr-blocks=1 s.set
 10.1.1.1
 10.1.1.2
 10.1.3.1

Specifying --mask=24 produces an IPset containing two IP addresses.

 $ rwsettool --mask=24 s.set | rwsetcat --cidr-blocks=1
 10.1.1.0
 10.1.3.0

Specifying --fill-blocks=24 produces an IPset containing 512 IP addresses.

 $ rwsettool --fill-block=24 s.set | rwsetcat --cidr-blocks=1
 10.1.1.0/24
 10.1.3.0/24

Consider t.set that contains four IP addresses.

 $ rwsetcat --cidr-blocks=1 t.set
 10.1.1.1
 10.1.1.2
 10.1.2.5
 10.1.3.1

Running --mask=24 and --fill-blocks=24 on that file produces the following.

 $ rwsettool --mask=24 t.set | rwsetcat --cidr-blocks=1
 10.1.1.0
 10.1.2.0
 10.1.3.0

 $ rwsettool --fill-block=24 t.set | rwsetcat --cidr-blocks=1
 10.1.1.0/24
 10.1.2.0/23

rwsetcat merges 10.1.2.0/24 and 10.1.3.0/24 into a single /23.

When multiple IPsets are specified on the command line, the union of the IPsets is computed prior to performing the mask or fill-blocks operation. The result is not dependent on the order of the IPsets.
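
The two operations can be modeled with Python's standard ipaddress module. The sketch below is an illustration under stated assumptions: the helper names are made up, the inputs are lists of address strings from a single address family rather than IPset files, and collapse_addresses stands in for the CIDR merging that rwsetcat performs when printing.

```python
import ipaddress

def occupied_blocks(addresses, prefix_len):
    """Each /prefix_len block that contains at least one input address."""
    return sorted({ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
                   for a in addresses})

def mask(addresses, prefix_len):
    """Model of --mask: the lowest address of each non-empty block."""
    return [str(b.network_address)
            for b in occupied_blocks(addresses, prefix_len)]

def fill_blocks(addresses, prefix_len):
    """Model of --fill-blocks: every non-empty block, merged for display."""
    merged = ipaddress.collapse_addresses(occupied_blocks(addresses, prefix_len))
    return [str(net) for net in merged]

s = ["10.1.1.1", "10.1.1.2", "10.1.3.1"]
t = ["10.1.1.1", "10.1.1.2", "10.1.2.5", "10.1.3.1"]

print(mask(s, 24))         # ['10.1.1.0', '10.1.3.0']
print(fill_blocks(s, 24))  # ['10.1.1.0/24', '10.1.3.0/24']
print(mask(t, 24))         # ['10.1.1.0', '10.1.2.0', '10.1.3.0']
print(fill_blocks(t, 24))  # ['10.1.1.0/24', '10.1.2.0/23']
```

To model multiple input IPsets, concatenate the address lists before calling either function, which corresponds to taking the union first.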

Mixed IPv4 and IPv6 Examples

Suppose the IPset file mixed.set contains IPv4 and IPv6 addresses. To create an IPset file that contains only the IPv4 addresses, intersect mixed.set with the IPset all-v4.set, which is an IPset that contains all of IPv4 space (::ffff:0:0/96).

 $ echo '::ffff:0:0/96' | rwsetbuild - all-v4.set

 $ rwsettool --intersect mixed.set all-v4.set > subset-v4.set

To create an IPset file that contains only the IPv6 addresses, subtract all-v4.set from mixed.set:

 $ rwsettool --difference mixed.set all-v4.set > subset-v6.set

The previous two commands may also be performed without having to create the all-v4.set IPset file.

 $ echo '::ffff:0:0/96'   \
   | rwsetbuild - -       \
   | rwsettool --intersect mixed.set - > subset-v4.set

 $ echo '::ffff:0:0/96'   \
   | rwsetbuild - -       \
   | rwsettool --difference mixed.set - > subset-v6.set
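
The split relies on IPv4 addresses occupying the ::ffff:0:0/96 block when they share a set with IPv6 addresses. The Python sketch below models that with the standard ipaddress module; the contents of mixed are hypothetical sample addresses, as_ipv6 is a made-up helper, and the code is an illustration rather than a description of the IPset file format.

```python
import ipaddress

# All of IPv4 space as it appears in a mixed IPset, per the text above.
ALL_V4 = ipaddress.ip_network("::ffff:0:0/96")

def as_ipv6(text):
    """Parse an address; represent IPv4 as an IPv4-mapped IPv6 address."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return ipaddress.IPv6Address(f"::ffff:{addr}")
    return addr

# Hypothetical contents for mixed.set.
mixed = {as_ipv6(t) for t in ("10.1.1.1", "192.0.2.7", "2001:db8::1")}

subset_v4 = {a for a in mixed if a in ALL_V4}   # model of --intersect
subset_v6 = mixed - subset_v4                   # model of --difference

print(sorted(str(a.ipv4_mapped) for a in subset_v4))
print(len(subset_v6))
```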

Comparing Two IPsets Example

To determine if two IPset files contain the same set of IP addresses, use the --symmetric-difference switch and then count the number of IP addresses of the result with rwsetcat. If the count is 0, the files contain the same IP addresses.

 $ cp A.set A2.set
 $ rwsettool --symmetric-difference A.set A2.set  \
   | rwsetcat --count
 0

Changing a File's Format

To share an IPset file with a user who has an older version of SiLK that includes different compression libraries, it may be necessary to change the record-version or the compression-method of an IPset file.

It is not possible to change those aspects of the file directly. A new file must be created first, and then you may replace the old file with the new file.

To create a new file that uses a different record-version or compression-method of the IPset file A.set, use rwsettool with the --union switch and specify the desired arguments:

 $ rwsettool --union --record-version=5 --output-path=A2.set A.set

 $ rwsettool --union --compression=none --output-path=A3.set A.set

 $ rwsettool --union --record-version=2 --compression=best \
        --output-path=A4.set A.set

ENVIRONMENT

SILK_IPSET_RECORD_VERSION

This environment variable is used as the value for the --record-version when that switch is not provided. Since SiLK 3.7.0.

SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.

SILK_COMPRESSION_METHOD

This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.

SEE ALSO

rwset(1), rwsetbuild(1), rwsetcat(1), rwsetmember(1), rwbagbuild(1), rwbagtool(1), rwfileinfo(1), silk(7), zlib(3)

NOTES

Prior to SiLK 3.0, an IPset file could not contain IPv6 addresses and the record version was 2. The --record-version switch was added in SiLK 3.0 and its default was 3. In SiLK 3.6, an argument of 0 was allowed and made the default. Version 4 was added in SiLK 3.7 as was support for the SILK_IPSET_RECORD_VERSION environment variable. Version 5 was added in SiLK 3.14.