ryhilbert

SYNOPSIS

ryhilbert [options]

DESCRIPTION

ryhilbert plots values on a Hilbert curve. A Hilbert curve preserves locality: a range of values (ryhilbert accepts numbers or IP addresses) is mapped into two-dimensional space in such a way that values which are close to one another numerically are also close spatially. Because a Hilbert curve traces a single line, one value is enough to determine a two-dimensional position (versus the two values required to plot a point in a scatterplot, for example).

Hilbert curves always take up a square; therefore, the range of numbers plotted is always a power of two.
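
The index-to-position mapping can be illustrated with the widely used iterative Hilbert conversion. This is a minimal Python sketch and not ryhilbert's own implementation; the function name d2xy and the orientation of the curve are assumptions, and ryhilbert may orient its curve differently.

```python
def d2xy(order, d):
    """Map index d to (x, y) on a Hilbert curve filling a 2**order by 2**order grid.

    Hypothetical helper for illustration; the orientation is an assumption.
    """
    x = y = 0
    t = d
    s = 1
    n = 1 << order  # side length of the square
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

A grid of side 2**order holds 4**order indexes, and consecutive indexes always land on neighboring cells, which is the locality property described above.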

ryhilbert input should be a column of numbers (or IP addresses) representing position (called the indexes or index input) and a column of numbers representing values to plot at that position. Every index must be greater than or equal to 0. The values can be any decimal number.

Value data will be plotted either in a single color for any observation at that index, or as a color gradient indicating the quantity at that index.

CONFIGURATION

Every option available at the command line may be specified in a configuration file. For more information on the format of the configuration file, see ryrc(5).

REQUIRED ARGUMENTS

--input-path <file>
Required. A file containing the input data. If --input-path is a hyphen (-), input will be read from standard input. The data should be in the format described in rydataformat(5).
--output-path <file>
Required. The file to which the output visualization is written. If the file does not exist, it will be created; if it does exist, it will be overwritten. The extension of the file determines the output file format. Understood extensions are .png (PNG), .svg (SVG), .ps (PostScript) and .pdf (PDF).

INPUT ASSOCIATION

ryhilbert requires one of two kinds of input:

  • A single column of indexes. (E.g., the output of rwsetcat(1).) In this case, --binary-plot is a required option.
  • Two columns, one of which contains index data, and one of which contains value data.

Additional columns may be present in the data, but will be ignored.

The following options associate data with dimensions in the visualization.

--index-input <colspec>
A column specification (see ryspecs(5)) of the data column to be used for position.
--val-input <colspec>
A column specification (see ryspecs(5)) of the data column to be used for value input.

SCALING DATA

Index data is mapped 1:1 to discrete values on the Hilbert curve, and so is not scaled further. (The range of plottable values may be adjusted, however.) Value data is normally scaled linearly, but may be logarithmically scaled.
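
The difference between linear and logarithmic scaling of value data can be sketched in Python as follows. The function name scale_value and the clamping to the range [0, 1] are assumptions made for illustration; the exact formula ryhilbert uses is not documented here.

```python
import math


def scale_value(value, floor, ceil, log_scale=False):
    """Return where value falls between floor and ceil, as a fraction in [0, 1].

    Hypothetical helper; not part of ryhilbert.
    """
    if log_scale:
        # Assumption: logarithmic scaling requires strictly positive inputs.
        value, floor, ceil = math.log(value), math.log(floor), math.log(ceil)
    if ceil == floor:
        return 0.0
    fraction = (value - floor) / (ceil - floor)
    # Clamp so out-of-range values map to the nearest end of the scale.
    return min(1.0, max(0.0, fraction))
```

With a range of 1 to 100, a value of 10 falls at about 0.09 on the linear scale but at the midpoint of the logarithmic scale, which is why logarithmic scaling helps when a few large values would otherwise dominate the gradient.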

--log-colors
Flag. Compute color values on a logarithmic scale.
--index-max
The largest index value to be plotted on the Hilbert curve. This can be numeric (e.g., 255) or an IPv4 address in dotted-quad notation (e.g., 255.255.255.255, the largest allowable value). The value will be rounded up to the nearest power of 2.
--cidr-netmask <num>
A numeric value indicating the “resolution” of the visualization. Points on the curve will represent bins of size 2^(32 - x), where x is the value of --cidr-netmask. This notation was chosen to work well with IPv4 data, but can be used with both IP and numeric indexes.
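
The bin arithmetic implied by --cidr-netmask can be sketched in Python. The helper names to_index, bin_size and bin_number are hypothetical, and treating any input containing a dot as a dotted-quad address is an assumption of this sketch.

```python
import ipaddress


def to_index(text):
    """Convert a decimal number or dotted-quad IPv4 address to an integer index."""
    if "." in text:
        return int(ipaddress.IPv4Address(text))
    return int(text)


def bin_size(cidr_netmask):
    """Number of index values covered by one point on the curve: 2^(32 - x)."""
    return 2 ** (32 - cidr_netmask)


def bin_number(index, cidr_netmask):
    """Bin into which an index falls at the given resolution."""
    return index >> (32 - cidr_netmask)
```

For example, a netmask of 24 gives bins of 256 values, so 10.1.2.3 and 10.1.2.200 fall in the same bin.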

FILTERING DATA

It may sometimes be desirable to ignore some of the input data. To facilitate this, ryhilbert defines four levels in the data:

  • The floor is the “lowest plotted point.”
  • The ceiling (abbreviated ceil) is the “highest plotted point.”
  • Points below the floor are said to be in the basement. Basement points may optionally be visualized in a different color.
  • Conversely, points above the ceiling are said to be in the attic. Like basement points, attic points may optionally be visualized in a different color.

These values apply only to the value data. Data is not filtered based on index.
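
The four levels can be summarized by a small Python sketch. The function name classify and its return strings are hypothetical, and treating values exactly equal to the floor or ceiling as plotted is an assumption.

```python
def classify(value, floor, ceil):
    """Return the level a value belongs to, given the floor and ceiling.

    Hypothetical helper; boundary values are assumed to be plotted.
    """
    if value < floor:
        return "basement"
    if value > ceil:
        return "attic"
    return "plotted"
```

Only the value column is classified this way; the index column is never used for filtering.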

--floor <num>
A numeric value. Data below this point in the input will not be visualized.
--floor-pct <pct>
A percentile value between 1 and 100. Data below this percentile in the input will not be visualized.
--ceil <num>
A numeric value. Data above this point in the input will not be visualized.
--ceil-pct <pct>
A percentile value between 1 and 100. Data above this percentile in the input will not be visualized.
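
A percentile threshold such as the one given to --floor-pct or --ceil-pct can be turned into a numeric cutoff along these lines. This Python sketch uses the nearest-rank method, which is an assumption; the percentile method ryhilbert actually uses is not specified here, and the function name percentile is hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of values, for pct between 1 and 100."""
    if not values:
        raise ValueError("no values given")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct percent of the data at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

The returned value would then play the same role as an explicit --floor or --ceil number.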

Colors for the floor, ceiling, basement and attic may be selected using options in the DISPLAY section.

DISPLAY

The following options control how the data is displayed and what decorations are applied to the visualization.

Size

The size of the Hilbert curve is a function of the resolution, as given by --cidr-netmask. After producing the visualization at that resolution, it may then be scaled to an arbitrary width and height using the following options. (This may introduce some distortions. At a minimum, it is recommended that width and height be equal, to preserve the aspect ratio of the original visualization.)

--width <num>
The width of the image, as a number of points or pixels.
--height <num>
The height of the image, as a number of points or pixels.

Selecting Colors

--num-colors <num>
The number of color “steps” to be used in the color gradient between the floor and the ceiling.
--floor-color <color>
A specification of the color to use for the “lowest” plotted value. (See ryspecs(5).)
--ceil-color <color>
A specification of the color to use for the “highest” plotted value. (See ryspecs(5).)
--basement-color <color>
A specification of the color to use for all points below the “floor” value. (See ryspecs(5).)
--attic-color <color>
A specification of the color to use for all points above the “ceiling” value. (See ryspecs(5).)
--binary-plot
Flag. Instead of plotting quantity on a color gradient, color each position on the curve containing any data. --ceil-color will be used as the color.
--background-color <color>
A specification of the color to use for the background of the visualization. (See ryspecs(5).)
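
The relationship between --floor-color, --ceil-color and --num-colors can be sketched as a linear interpolation in RGB space. This Python sketch assumes six-digit hexadecimal colors like those in EXAMPLES (the full color syntax is defined in ryspecs(5)) and assumes per-channel linear blending; the helper names parse_hex and gradient are hypothetical.

```python
def parse_hex(color):
    """Split a six-digit hex color such as '0000ff' into (r, g, b) integers."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def gradient(floor_color, ceil_color, num_colors):
    """Return num_colors hex colors running from floor_color to ceil_color."""
    start, end = parse_hex(floor_color), parse_hex(ceil_color)
    if num_colors < 2:
        return [floor_color.lower()]
    steps = []
    for i in range(num_colors):
        t = i / (num_colors - 1)  # 0.0 at the floor, 1.0 at the ceiling
        rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
        steps.append("".join(f"{c:02x}" for c in rgb))
    return steps
```

A value's scaled position between the floor and the ceiling would then select one of these steps, while basement and attic points use their own colors.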

Overlays

A PNG image highlighting regions of the curve may be overlaid on the visualization to give the reader context. ryhilbert ships with an overlay subdividing the 2^32 IPv4 addresses by the entities to which they were assigned by the Internet Assigned Numbers Authority.

--overlay-file <file>
A PNG image to use as the overlay. The image should be square and should match the resolution of the visualization (given using --cidr-netmask), so that it measures 2^(x/2) pixels on a side, where x is the value of --cidr-netmask.
--no-overlay
Flag. Don’t use an overlay image.
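
The expected overlay size follows directly from the 2^(x/2) rule. In this Python sketch the function name overlay_side is hypothetical, and rejecting odd netmask values is an assumption, since the handling of odd values is not described here.

```python
def overlay_side(cidr_netmask):
    """Side length, in pixels, of an overlay matching the given resolution."""
    if cidr_netmask % 2:
        # Assumption: an odd netmask has no whole-pixel square, so refuse it.
        raise ValueError("cidr_netmask must be even for a square overlay")
    return 2 ** (cidr_netmask // 2)
```

For instance, a netmask of 16 calls for a 256 by 256 pixel overlay, and a netmask of 24 for 4096 by 4096 pixels.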

GENERAL OPTIONS

--quiet
Flag. Produce no output on standard output or standard error. Otherwise, certain warnings may be emitted for values ryhilbert deems questionable.

EXAMPLES

Create a PNG visualization using the default options:

ryhilbert --input-path foo.txt --output-path bar.png

Generate a binary plot from a SiLK IP set using the SiLK rwsetcat(1) tool:

rwsetcat foo.set | \
ryhilbert --input-path - --output-path bar.png --binary-plot

Generate a plot from a SiLK bag using the SiLK rwbagcat(1) tool, with lower values in dark blue, higher values in bright blue, and values over the 90th percentile in red:

rwbagcat foo.bag | \
ryhilbert --input-path - --output-path bar.png \
    --floor-color 000088 \
    --ceil-color  0000ff \
    --ceil-pct 90 \
    --attic-color ff0000

SEE ALSO

rytools(5)