rydataformat

DESCRIPTION

The rytools take the Rayon data format as input. The Rayon data format is primarily delimited text, with extensions to permit the description and proper typing of input data.

USING SILK OUTPUT WITH RAYON

The text format of most SiLK tools’ output is a subset of the Rayon input format, provided the tool is run with the --no-columns option. This data will contain no type hinting or column names. To retain the SiLK column names, it is sufficient to prepend a single octothorpe (#) to the SiLK output. Rayon will generally infer the correct types from SiLK input data; if SiLK plugins that alter display are used, it may occasionally be necessary to prepend a header line describing the data to the SiLK output. (See SPECIAL HEADERS.)

BASIC FORMAT

Rayon’s default delimiter is the pipe character (|). The delimiting character must not appear in the input. An example of valid input is:

foo|1|2
bar|3|4

Whitespace around delimiters will be removed, so the following is equivalent to the above:

foo| 1| 2
bar| 3| 4

Whitespace lines are also ignored, so this is also equivalent to the above:

foo|1|2

bar|3|4

The delimiter may be changed using the Delimiter header. (See SPECIAL HEADERS.)
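The splitting rules above can be illustrated with a minimal Python sketch. It is not Rayon's own parser; the function name parse_rows is hypothetical, and the sketch handles only delimiters, surrounding whitespace, and whitespace-only lines:

```python
def parse_rows(text, delimiter="|"):
    """Split delimited text into rows of stripped fields.

    Illustrative only: this mimics the documented rules, not Rayon's code.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # whitespace-only lines are ignored
        # whitespace around delimiters is removed from each field
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


compact = "foo|1|2\nbar|3|4\n"
padded = "foo| 1| 2\n\nbar| 3| 4\n"

print(parse_rows(padded))
print(parse_rows(compact) == parse_rows(padded))
```

Both inputs yield the same two rows, which is the equivalence the examples above describe.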

COMMENTS

Lines beginning with an octothorpe (#) are comments, and are ignored (with the exceptions outlined in SPECIAL HEADERS and COLUMN NAMES). Therefore, this is also equivalent to the above:

foo| 1| 2
# this line will be ignored
bar| 3| 4

The Rayon data format does not support infix or postfix comments. In the following, the text “# this is not” will be interpreted as content:

# this is a comment
foo | a | a
bar | b | b # this is not
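A short Python sketch of this behavior follows. It is an illustration under assumptions, not Rayon's implementation: the function name data_fields is hypothetical, a comment is taken to be any line whose first character is #, and headers and column-name comments receive no special treatment here:

```python
def data_fields(text, delimiter="|"):
    """Return the data rows, skipping blank lines and whole-line comments."""
    rows = []
    for line in text.splitlines():
        # Assumption: only a '#' in the first column marks a comment.
        if not line.strip() or line.startswith("#"):
            continue
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


sample = "# this is a comment\nfoo | a | a\nbar | b | b # this is not\n"
rows = data_fields(sample)
print(rows[1][2])  # prints: b # this is not
```

Because a # is only recognized at the start of a line, the trailing text stays part of the last field of the second row.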

HEADERS

Headers are special comments at the top of the file (or beginning of the stream) that start with exactly two octothorpes (##). Headers are used to store metadata about a dataset.

A header contains a name, followed by a colon, followed by a value. There may be whitespace between the colon and either the name or value. Here is an example of a header:

## Description: This is an example dataset
foo| 1| 2
bar| 3| 4

This dataset contains a header named Description, whose value is “This is an example dataset”.

Header names may contain upper- and lower-case letters, digits, underscore (_), and hyphen (-). Header values may contain any ASCII character between 0x20 and 0x7e, inclusive. Certain header names are reserved, specifically those in SPECIAL HEADERS and names beginning with Rayon-; apart from these restrictions, users may create arbitrary headers as they see fit. The case of header names is preserved, but header lookups are case-insensitive.

The first line of data terminates the processing of headers; any subsequent lines beginning with any number of octothorpes will be treated as comments.
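The header rules can be sketched in Python as follows. This is an approximation written for illustration, not Rayon's parser: the names HEADER_RE and read_headers are hypothetical, and the sketch assumes that blank lines and ordinary comments may appear among the headers without ending header processing:

```python
import re

# Exactly two octothorpes, a name made of letters, digits, '_' or '-',
# a colon, then a value restricted to ASCII 0x20 through 0x7e.
HEADER_RE = re.compile(r"^##(?!#)\s*([A-Za-z0-9_-]+)\s*:\s*([\x20-\x7e]*?)\s*$")


def read_headers(text):
    """Collect headers up to the first data line.

    Keys are lower-cased so lookups are case-insensitive; the original
    spelling of each name is kept alongside its value.
    """
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = HEADER_RE.match(line)
        if match:
            name, value = match.groups()
            headers[name.lower()] = (name, value)
            continue
        if not line.startswith("#"):
            break  # the first line of data ends header processing
    return headers


sample = (
    "## Description: This is an example dataset\n"
    "foo| 1| 2\n"
    "## Late: treated as a comment\n"
    "bar| 3| 4\n"
)
headers = read_headers(sample)
print(headers["DESCRIPTION".lower()])
```

The line starting with ## after the first data row is not recorded, matching the rule that such lines are plain comments.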

SPECIAL HEADERS

Some headers have special meaning when parsed. For instance, the Delimiter header may be used to change the delimiting character of the file:

## Delimiter: ,
foo,1,2
bar,3,4

The following header names have special meanings:

Title
The title of the dataset.
Delimiter
A single character to be used as the delimiting character between items in a row.
Typemap
A set of type names used to convert data in the file from text to a native data type.
Column-Names
A list of column names, delimited by the same character as the data. (See COLUMN NAMES)
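As an illustration of the Delimiter header, the following Python sketch reads the header and then splits the data rows on the character it names. It is a sketch under assumptions rather than Rayon's code: the function name parse_with_delimiter is hypothetical, and because the value is stripped of whitespace, a whitespace delimiter such as a tab is not handled:

```python
def parse_with_delimiter(text, default="|"):
    """Return (delimiter, rows), honoring a leading Delimiter header."""
    delimiter = default
    rows = []
    in_headers = True
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            is_header = line.startswith("##") and not line.startswith("###")
            if in_headers and is_header:
                name, colon, value = line[2:].partition(":")
                # Header lookups are case-insensitive.
                if colon and name.strip().lower() == "delimiter":
                    candidate = value.strip()
                    if len(candidate) == 1:  # a single delimiting character
                        delimiter = candidate
            continue
        in_headers = False  # the first data line ends header processing
        rows.append([field.strip() for field in line.split(delimiter)])
    return delimiter, rows


sample = "## Delimiter: ,\nfoo,1,2\nbar,3,4\n"
print(parse_with_delimiter(sample))
```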

COLUMN NAMES

Column names may be specified with the Column-Names header:

## Column-Names: label|value1|value2
foo|1|2
bar|3|4

This input will generate a dataset with the column names “label”, “value1” and “value2”, respectively.
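One way to picture the effect of the Column-Names header is to pair each name with the matching field of every row. The Python sketch below does this with plain dictionaries; it is illustrative only, the function name rows_as_dicts is hypothetical, and the dictionaries stand in for whatever dataset structure Rayon actually builds:

```python
def rows_as_dicts(text, delimiter="|"):
    """Label each data row with the names from a Column-Names header."""
    names = None
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##") and not records:
            name, colon, value = line[2:].partition(":")
            if colon and name.strip().lower() == "column-names":
                names = [n.strip() for n in value.split(delimiter)]
            continue
        if line.startswith("#"):
            continue  # ordinary comment, or a '##' line after data began
        fields = [field.strip() for field in line.split(delimiter)]
        records.append(dict(zip(names, fields)) if names else fields)
    return records


sample = "## Column-Names: label|value1|value2\nfoo|1|2\nbar|3|4\n"
records = rows_as_dicts(sample)
print(records[0]["label"], records[1]["value2"])  # prints: foo 4
```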

As a convenience, there is an alternate syntax for specifying column names. The last comment line before the first data row may optionally specify the names of the columns in the dataset. If the last comment line before the first data row is delimited with the delimiting character and contains as many elements as the first data line of the file, its contents will be used as the names of the dataset columns. The following is equivalent to the previous example:

# label| value1| value2
foo|1|2
bar|3|4

As with headers, whitespace between the comment character and the column name designation will be ignored, but multiple comment characters will probably give unwanted results. Thus, the following is legal:

#label| value1| value2
foo|1|2
bar|3|4

The following is also legal; whitespace surrounding column names is ignored, so it may be omitted entirely:

#label|value1|value2
foo|1|2
bar|3|4
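The comment-line convention for column names can be sketched in Python as below. This is a reading of the rule as described here, not Rayon's implementation: the function name infer_column_names is hypothetical, and the sketch assumes that a final comment beginning with two or more octothorpes never supplies column names:

```python
def infer_column_names(text, delimiter="|"):
    """Return names from the last comment before the first data row, or None."""
    last_comment = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            last_comment = line
            continue
        # This is the first data row: compare its width with the comment's.
        width = len(line.split(delimiter))
        if last_comment is None or last_comment.startswith("##"):
            return None  # assumption: headers are never column names
        names = [n.strip() for n in last_comment[1:].split(delimiter)]
        return names if len(names) == width else None
    return None


variants = [
    "# label| value1| value2\nfoo|1|2\nbar|3|4\n",
    "#label| value1| value2\nfoo|1|2\nbar|3|4\n",
    "#label|value1|value2\nfoo|1|2\nbar|3|4\n",
]
for text in variants:
    print(infer_column_names(text))
```

All three variants produce the same list of names, while a comment whose element count differs from the first data row is left as an ordinary comment.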