Packages

  • package root

    This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.

    Please see the documentation for the individual packages for more details on their use.

    Scala Packages

    These packages are useful in Scala code without involving Spark:

    org.cert.netsa.data

    This package, which is collected as the netsa-data library, provides types for working with various kinds of information.

    org.cert.netsa.io.ipfix

    The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.

    org.cert.netsa.io.silk

    To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.

    org.cert.netsa.util

    The "junk drawer" netsa-util library so far provides only two features: a method for equipping Scala scala.collection.Iterators with exception handling, and a way to query the versions of the NetSA libraries present in a JVM at runtime.
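
    As a purely illustrative sketch (this is not the netsa-util API, only the general shape of the iterator-handling idea), wrapping an Iterator so that a failure while advancing ends iteration rather than propagating might look like this:

    import scala.util.control.NonFatal

    // Hypothetical helper for illustration only; see the org.cert.netsa.util
    // scaladoc for the library's actual iterator-handling method.
    def stopOnError[A](it: Iterator[A]): Iterator[A] = new Iterator[A] {
      def hasNext: Boolean =
        try it.hasNext catch { case NonFatal(_) => false }
      def next(): A = it.next()
    }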

    Spark Packages

    These packages require the use of Apache Spark:

    org.cert.netsa.mothra.datasources

    Spark datasources for CERT file types. This package contains utility features that add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened with simple spark.read... calls.
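
    For example, after importing from the datasources package, opening data is a one-line call (the paths here are placeholders; the full set of variants appears in the datasources package documentation below):

    import org.cert.netsa.mothra.datasources._
    val silkDF  = spark.read.silkFlow()                     // default SiLK repository
    val ipfixDF = spark.read.ipfix("/path/to/ipfix/files")  // loose IPFIX files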

    The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.

    org.cert.netsa.mothra.analysis

    A grab-bag of analysis helper functions and example analyses.

    org.cert.netsa.mothra.functions

    This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.

    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package cert
    Definition Classes
    org
  • package netsa
    Definition Classes
    cert
  • package mothra
    Definition Classes
    netsa
  • package datasources

    This package contains the Mothra datasources, along with mechanisms for working with those datasources. The primary novel feature of these datasources is the fields mechanism.

    To use the IPFIX or SiLK data sources, you can use the following methods, which become available on DataFrameReader after importing from this package:

    import org.cert.netsa.mothra.datasources._
    val silkDF = spark.read.silkFlow()                                    // to read from the default SiLK repository
    val silkRepoDF = spark.read.silkFlow(repository="...")                // to read from an alternate SiLK repository
    val silkFilesDF = spark.read.silkFlow("/path/to/silk/files")          // to read from loose SiLK files
    val ipfixDF = spark.read.ipfix(repository="/path/to/mothra/data/dir") // for packed Mothra IPFIX data
    val ipfixS3DF = spark.read.ipfix(s3Repository="bucket-name")          // for packed Mothra IPFIX data from an S3 bucket
    val ipfixFilesDF = spark.read.ipfix("/path/to/ipfix/files")           // for loose IPFIX files

    (The additional methods are defined on the implicit class CERTDataFrameReader.)

    Using the fields method allows you to configure which SiLK or IPFIX fields you wish to retrieve. (This is particularly important for IPFIX data, as IPFIX files may contain a great many possible fields organized in various ways.)

    import org.cert.netsa.mothra.datasources._
    val silkDF = spark.read.fields("sIP", "dIP").silkFlow(...)
    val ipfixDF = spark.read.fields("sourceIPAddress", "destinationIPAddress").ipfix(...)

    Both of these dataframes will contain only the source and destination IP addresses from the specified data sources. You may also provide column names different from the source field names:

    val silkDF = spark.read.fields("server" -> "sIP", "client" -> "dIP").silkFlow(...)
    val ipfixDF = spark.read.fields("server" -> "sourceIPAddress", "client" -> "destinationIPAddress").ipfix(...)

    You may also mix the mapped and the default names in one call:

    val df = spark.read.fields("sIP", "dIP", "s" -> "sensor").silkFlow(...)
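
    For orientation, the mixed call above should produce a three-column DataFrame. This sketch assumes the SiLK field types listed in the flow package documentation below (sIP, dIP, and sensor are all strings) and omits nullability:

    df.printSchema()
    // sIP: string
    // dIP: string
    // s:   string   (renamed from "sensor")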
    Definition Classes
    mothra
    See also

    IPFIX datasource

    SiLK flow datasource

  • package silk
    Definition Classes
    datasources
  • package flow

    A data source as defined by the Spark Data Source API for reading SiLK records from SiLK data spools and from loose files.
    Definition Classes
    silk
  • DefaultSource
  • GlobInfoSparkVerImpl
  • SilkFields

package flow

A data source as defined by the Spark Data Source API for reading SiLK records from SiLK data spools and from loose files.

You can use this by importing org.cert.netsa.mothra.datasources._ like this:

import org.cert.netsa.mothra.datasources._
val df1 = spark.read.silkFlow() // to read from the default SiLK repository
val df2 = spark.read.silkFlow("path/to/files") // to read from loose files
val df3 = spark.read.silkFlow(..., repository="/path/to/repo") // to override the default repo
val df4 = spark.read.silkFlow(..., configFile="/path/to/silk.conf") // to use a specific non-default silk.conf file

The default SiLK data repository location is defined by the Java system property org.cert.netsa.mothra.datasources.silk.defaultRepository. The default configuration file is silk.conf under the default repository directory.
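
One deployment-dependent way to point at a particular repository (an assumption about packaging, not part of the datasource contract) is to set that property before reading, either as a JVM option or programmatically on the driver:

import org.cert.netsa.mothra.datasources._

// Assumption: setting the property in the driver JVM is sufficient here; in
// cluster deployments it can instead be passed as a JVM option, e.g. via
//   --conf spark.driver.extraJavaOptions=-D<property>=<path>
System.setProperty(
  "org.cert.netsa.mothra.datasources.silk.defaultRepository", "/data/silk")
val repoDF = spark.read.silkFlow()   // now resolves against /data/silk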

If you don't have a SiLK data repository or a silk.conf file, you can still work with loose SiLK data files; however, class, type, and sensor names will not be available. (Any numeric IDs in the input data will still be usable.)
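
When no silk.conf is available, one workable pattern (a sketch built only from the field list below, not from any additional datasource behavior) is to read loose files and lean on the numeric ID columns rather than the name columns:

import org.cert.netsa.mothra.datasources._

// "sensorId" and "flowTypeId" remain usable without a silk.conf, whereas
// "sensor", "class", and "type" would be unavailable.
val loose = spark.read
  .fields("sTime", "sIP", "dIP", "sensorId", "flowTypeId")
  .silkFlow("/path/to/loose/silk/files")
loose.groupBy("sensorId").count().show()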

The SiLK flow datasource uses the fields mechanism from org.cert.netsa.mothra.datasources, with arbitrary sets of fields and field-name mappings, as in these examples:

import org.cert.netsa.mothra.datasources._
spark.read.fields("sTime", "eTime", "sIP", "dIP").silkFlow(...)
spark.read.fields("sTime", "sId" -> "sensorId").silkFlow(...)

import org.cert.netsa.mothra.datasources.silk.flow.SilkFields
spark.read.fields(SilkFields.default, "sId" -> "sensorId").silkFlow(...)

See SilkFields for details about the default set of fields.

SiLK field names match the names used by the SiLK rwcut tool when possible. Fields that have several variants in rwcut, such as duration and dur+msec, all refer to the same value in the SiLK flow datasource, at the maximum available resolution. Note also that unsigned numeric fields generally use a Spark type one size larger than their native width, so that values too large for the corresponding signed type can still be represented. The full set of fields (along with aliases and Spark types) is listed below, followed by a short usage sketch:

  • "application": Int
  • "attributes": Byte
  • "bytes": Long
  • "class": String
  • "dIP": String
  • "dPort": Int
  • "duration", "dur", "dur+msec": Long (in milliseconds)
  • "eTime", "eTime+msec": Timestamp
  • "filename": String
  • "flags": Byte
  • "flowType": String
  • "flowTypeId": Short
  • "iCode": Short
  • "iType": Short
  • "in": Int
  • "initialFlags": Byte
  • "isIPv6": Boolean
  • "memo": Short
  • "nhIP": String
  • "out": Int
  • "packets", "pkts": Long
  • "protocol": Short
  • "sIP": String
  • "sPort": Int
  • "sTime", "sTime+msec": Timestamp
  • "sensor": String
  • "sensorId": Int
  • "sessionFlags": Byte
  • "type": String
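
As a usage sketch tying these fields and types together (the path is a placeholder; the aggregation itself is ordinary Spark):

import org.apache.spark.sql.functions.sum
import org.cert.netsa.mothra.datasources._

val flows = spark.read
  .fields("sIP", "dIP", "bytes", "protocol")
  .silkFlow("/path/to/silk/files")

// Total bytes per source address for TCP traffic; per the list above,
// "bytes" is a Long and "protocol" a Short, so comparing against 6 works
// directly.
flows.filter(flows("protocol") === 6)
  .groupBy("sIP")
  .agg(sum("bytes").as("totalBytes"))
  .show()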

Type Members

  1. class DefaultSource extends RelationProvider

    A Spark datasource for working with SiLK flow data. This is the entry point through which Spark instantiates the datasource; see the org.cert.netsa.mothra.datasources.silk.flow package documentation for details on how to use it.
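
    For context only (an assumption based on Spark's usual DefaultSource lookup convention for RelationProvider implementations, not something documented here), the datasource would typically also be reachable by package name through the generic reader; the silkFlow() helper remains the documented route:

    // Hypothetical: Spark resolves the package name to this DefaultSource.
    // Any options such a call requires are not documented here.
    val df = spark.read
      .format("org.cert.netsa.mothra.datasources.silk.flow")
      .load("/path/to/silk/files")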

  2. trait GlobInfoSparkVerImpl extends AnyRef

Value Members

  1. object SilkFields

    Useful collections of fields relevant to SiLK data sources.
