Packages

package root


This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.

Please see the documentation for the individual packages for more details on their use.

Scala Packages

These packages are useful in Scala code without involving Spark:

org.cert.netsa.data

This package, which is collected as the netsa-data library, provides types for working with various kinds of information:

org.cert.netsa.data.net - types for working with network data
org.cert.netsa.data.time - types for working with time data
org.cert.netsa.data.unsigned - types for working with unsigned integral values

org.cert.netsa.io.ipfix

The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.

org.cert.netsa.io.silk

To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.

org.cert.netsa.util

The "junk drawer" of netsa-util so far provides only two features: a method for equipping Scala Iterators with exception handling, and a way to query the versions of NetSA libraries present in a JVM at runtime.
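To illustrate what equipping an Iterator with exception handling can look like, here is a minimal sketch in plain Scala. It is not the netsa-util API: the object `SafeIteration` and the method `stopOnError` are hypothetical names chosen for this example, and ending the iteration on the first error is just one possible policy.

```scala
import scala.util.control.NonFatal

// Hypothetical helper for illustration only; not part of netsa-util.
object SafeIteration {
  /** Wraps `underlying` so that a non-fatal exception thrown while fetching
    * the next element is passed to `onError` and ends the iteration,
    * instead of propagating to the caller.
    */
  def stopOnError[A](underlying: Iterator[A])(onError: Throwable => Unit): Iterator[A] =
    new Iterator[A] {
      private var failed = false
      private var buffered: Option[A] = None

      def hasNext: Boolean = {
        if (buffered.isEmpty && !failed) {
          try {
            // Fetch eagerly so that an exception surfaces here, not in next().
            if (underlying.hasNext) buffered = Some(underlying.next())
          } catch {
            case NonFatal(e) =>
              failed = true
              onError(e)
          }
        }
        buffered.nonEmpty
      }

      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val value = buffered.get
        buffered = None
        value
      }
    }
}
```

A caller might wrap a record iterator this way so that a corrupt record ends the loop cleanly while the error is logged by the handler.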

Spark Packages

These packages require the use of Apache Spark:

org.cert.netsa.mothra.datasources

Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls.

The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.

org.cert.netsa.mothra.analysis

A grab-bag of analysis helper functions and example analyses.

org.cert.netsa.mothra.functions

This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.

Definition Classes: root

package org

Definition Classes: root

package cert

Definition Classes: org

package netsa

Definition Classes: cert

package data


The org.cert.netsa.data.net package is for working with network-related data. This includes types for IP addresses, port numbers, protocol numbers, and the like. Many of these types have namespaces managed by IANA, and the types provide mechanisms for looking up names from numbers and vice-versa based on embedded copies of IANA's tables.
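The idea of looking up names from numbers and vice-versa can be sketched with a pair of maps. This is only an illustration, not the netsa-data API: `ProtocolTableSketch` and its methods are hypothetical, and the table holds just three well-known IANA protocol numbers rather than a full embedded copy.

```scala
import java.util.Locale

// Hypothetical illustration; the real types live in org.cert.netsa.data.net.
object ProtocolTableSketch {
  // A tiny excerpt of the IANA protocol number registry.
  private val byNumber: Map[Int, String] =
    Map(1 -> "ICMP", 6 -> "TCP", 17 -> "UDP")

  // The reverse table is derived from the forward one, so the two cannot drift apart.
  private val byName: Map[String, Int] =
    byNumber.map { case (number, name) => name -> number }

  /** Name for a protocol number, if the table knows it. */
  def nameOf(number: Int): Option[String] = byNumber.get(number)

  /** Protocol number for a name, ignoring case, if the table knows it. */
  def numberOf(name: String): Option[Int] =
    byName.get(name.toUpperCase(Locale.ROOT))
}
```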

In org.cert.netsa.data.time you can find an Ordering for Java LocalDate objects, and a type LocalDateSet for working with sets of those dates.
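As a rough picture of what an Ordering for LocalDate makes possible, the sketch below defines one by hand and uses it to build a sorted set of dates. The names `DateOrderingSketch`, `chronological`, and `sortedDates` are assumptions made for this example; the Ordering and the LocalDateSet type provided by the package may be defined quite differently.

```scala
import java.time.LocalDate
import scala.collection.immutable.SortedSet

// Hypothetical illustration; not the definitions from org.cert.netsa.data.time.
object DateOrderingSketch {
  // toEpochDay is a day count, so comparing it orders dates chronologically.
  val chronological: Ordering[LocalDate] =
    Ordering.by((d: LocalDate) => d.toEpochDay)

  /** Builds a sorted, duplicate-free set of dates using the ordering above. */
  def sortedDates(dates: LocalDate*): SortedSet[LocalDate] =
    SortedSet(dates: _*)(chronological)
}
```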

Finally, org.cert.netsa.data.unsigned contains types for working with unsigned integer values.
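Unsigned types matter on the JVM because its primitive integers are signed, so a raw 16-bit port or 32-bit counter read from the wire can appear negative. The sketch below shows the underlying conversions using only standard Java methods; `UnsignedSketch` is a hypothetical name and does not reflect how the types in this package are implemented.

```scala
// Hypothetical illustration using only java.lang conversions.
object UnsignedSketch {
  /** Interprets a signed 16-bit value as unsigned (0 to 65535). */
  def u16(raw: Short): Int = java.lang.Short.toUnsignedInt(raw)

  /** Interprets a signed 32-bit value as unsigned (0 to 4294967295). */
  def u32(raw: Int): Long = java.lang.Integer.toUnsignedLong(raw)

  /** Renders a signed 64-bit value as its unsigned decimal string. */
  def u64Text(raw: Long): String = java.lang.Long.toUnsignedString(raw)
}
```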

Definition Classes: netsa

package io

Definition Classes: netsa

package ipfix

The ipfix package provides classes and objects for reading and writing IPFIX data.

Class / Object Overview

(For a quick overview of the IPFIX format, see the end of this description.)

The Message trait describes the attributes of an IPFIX message, and the CollectedMessage class and object are implementations of that trait when reading data. (Record export does not create a specific Message instance.)

The IpfixSet abstract class and object hold the attributes of a Set. The TemplateSet class may represent a Template Set or an Options Template Set.

The Template class and object are used to represent a Template Record or an Options Template Record.

The IEFieldSpecifier class and object represent a Field Specifier within an existing Template. To search for a field within a Template, the user of the ipfix package creates a FieldSpec (the companion object) and attempts to find it within a Template.

The Field Specifier uses the numeric Identifier to identify an Information Element, and an Element is represented by the InfoElement class and object. The InfoModel class and object represent the Information Model.

To describe the attributes of an InfoElement, several support classes are defined: DataTypes is an enumeration that describes the type of data that the element contains, and DataType is a class that extracts a Field Value with that DataType. IESemantics describes the data semantics of an Information Element (e.g., a counter, an identifier, a set of flags), and IEUnits describes its units.

The Data Set is represented by the RecordSet class and object.

A Data Record is represented by the Record abstract class. This class has three subclasses:

The CollectedRecord class and object are its implementation when reading data. Its members are always referenced by numeric position.
The ArrayRecord class and object may be used to build a Record from Scala objects; its fields are also referenced by numeric position.
ExportRecord is an abstract class that also supports building a Record from Scala objects. The user extends the class and uses the IPFIXExtract annotation to mark the members of the subclass that are to be used when writing the Record.

A user-defined class that extends the Fillable trait may use the Record's fill() method to copy fields from a Record to the user's class. It also uses the IPFIXExtract annotation.

A Structured Data Field Value in a Data Record is represented by the ListElement abstract class. That abstract class has three abstract subclasses, and each of those has two concrete subclasses (one for reading and one for writing):

The BasicList abstract class (object) has subclasses CollectedBasicList and ExportBasicList.
The SubTemplateList abstract class (object) has subclasses CollectedSubTemplateList and ExportSubTemplateList.
The SubTemplateMultiList abstract class (object) has subclasses CollectedSubTemplateMultiList and ExportSubTemplateMultiList.

Reading data

When reading data, a Record instance is returned by a RecordReader. The RecordReader uses a class that extends the MessageReader trait. The ipfix package includes two: ByteBufferMessageReader and StreamMessageReader.

A Session value represents an IPFIX session, which is part of a SessionGroup.

Writing data

For writing data, an instance of an ExportStream must be created using a Session and the destination FileChannel. The user adds Records or Templates to the ExportStream and they are written to the FileChannel.

Overview of IPFIX

An IPFIX stream is composed of Messages. Each Message has a 16-byte Message Header followed by one or more Sets. There are three types of Sets: A Data Set, a Template Set, and an Options Template Set.

Each Set has a 4-byte set header followed by one or more Records. A Data Set contains Data Records and a Template Set contains Template Records.
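The two header layouts described above can be sketched with java.nio.ByteBuffer. This is not how the ipfix package reads data; `IpfixHeaderSketch` and its members are hypothetical names, and the field layout (version, length, export time, sequence number, observation domain ID, then set ID and set length) is taken from RFC 7011 rather than from this documentation.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical illustration of the RFC 7011 header layouts.
object IpfixHeaderSketch {
  final case class MessageHeader(
      version: Int, length: Int, exportTime: Long,
      sequenceNumber: Long, observationDomainId: Long)

  final case class SetHeader(setId: Int, length: Int)

  /** Reads the 16-byte Message Header at the buffer's current position. */
  def readMessageHeader(buf: ByteBuffer): MessageHeader = {
    buf.order(ByteOrder.BIG_ENDIAN) // IPFIX uses network byte order
    val version = buf.getShort() & 0xFFFF
    val length = buf.getShort() & 0xFFFF
    val exportTime = buf.getInt() & 0xFFFFFFFFL
    val sequenceNumber = buf.getInt() & 0xFFFFFFFFL
    val domain = buf.getInt() & 0xFFFFFFFFL
    MessageHeader(version, length, exportTime, sequenceNumber, domain)
  }

  /** Reads the 4-byte Set Header at the buffer's current position. */
  def readSetHeader(buf: ByteBuffer): SetHeader = {
    buf.order(ByteOrder.BIG_ENDIAN)
    val setId = buf.getShort() & 0xFFFF
    val length = buf.getShort() & 0xFFFF
    SetHeader(setId, length)
  }

  /** Classifies a Set by its Set ID, following RFC 7011. */
  def describeSet(setId: Int): String = setId match {
    case 2              => "Template Set"
    case 3              => "Options Template Set"
    case id if id >= 256 => "Data Set"
    case _              => "reserved"
  }
}
```

For a Data Set, the Set ID is the ID of the Template that describes its records, which is why Data Set IDs start at 256.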

A Template Record describes the shape of the data that appears in a Data Record. A Template Record contains a 4-byte header followed by zero or more Field Specifiers. Each Field Specifier is either a 4-byte or an 8-byte value that describes a field in the Data Record.

A Field Specifier has two parts. The first is the numeric Information Element Identifier that is defined in an Information Model. The second is the number of octets the field occupies in the Data Record.
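The difference between the 4-byte and 8-byte forms of a Field Specifier can be shown with a short decoding sketch. It follows the RFC 7011 layout, in which the high bit of the first 16 bits signals that a 4-byte enterprise number follows the length. `FieldSpecifierSketch` is a hypothetical name and is unrelated to the IEFieldSpecifier class in this package.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical illustration of the RFC 7011 Field Specifier layout.
object FieldSpecifierSketch {
  final case class FieldSpecifier(
      elementId: Int, length: Int, enterpriseNumber: Option[Long])

  /** Reads one Field Specifier (4 or 8 bytes) at the buffer's current position. */
  def read(buf: ByteBuffer): FieldSpecifier = {
    buf.order(ByteOrder.BIG_ENDIAN) // IPFIX uses network byte order
    val first = buf.getShort() & 0xFFFF
    // A length of 65535 marks a variable-length field in RFC 7011.
    val length = buf.getShort() & 0xFFFF
    // The enterprise bit selects the 8-byte form with a trailing enterprise number.
    val enterprise =
      if ((first & 0x8000) != 0) Some(buf.getInt() & 0xFFFFFFFFL) else None
    FieldSpecifier(first & 0x7FFF, length, enterprise)
  }
}
```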

A Data Set contains one or more Data Records of the same type, where the type is determined by the Template Record that the Data Set Header refers to. Each Data Record contains one or more Field Values, where the order and length of the Field Values are given by the Template.

A Field Value in a Data Record may be Structured Data. There are three types of Structured Data:

A Basic List contains one or more instances of a single Information Element.
A SubTemplateList references a single Template ID, and it contains one or more Records that match that Template.
A SubTemplateMultiList contains a series of Template IDs, each with the Records that match that Template ID.

An IPFIX stream exists in a Transport Session, where a Transport Session is part of a Session Group. All Sessions in a Session Group use the same Transport Protocol and differ only in the numeric Observation Domain that is part of the Message Header.

package silk

SiLK file formats, data types, and methods to read them, including support for reading them from Spark.

RWRec is the type of SiLK flow records.

You can use RWRecReader to read SiLK files from Scala, including compressed files if Hadoop native libraries are available. For example:

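```
import java.io.FileInputStream
import org.cert.netsa.io.silk.RWRecReader

// Open the SiLK flow file as an ordinary Java input stream.
val inputFile = new FileInputStream("path/to/silk/rw/file")

// Iterate over the flow records and print each record's source IP address.
for ( rec <- RWRecReader.ofInputStream(inputFile) ) {
  println(rec.sIP)
}
```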

See also: org.cert.netsa.mothra.datasources.silk.flow for working with SiLK data in Spark using the Mothra SiLK datasource.

package mothra

Definition Classes: netsa

package util

Definition Classes: netsa

