package mothra
Package Members
- package analysis
- package datasources
This package contains the Mothra datasources, along with mechanisms for working with those datasources. The primary novel feature of these datasources is the fields mechanism.
To use the IPFIX or SiLK data sources, you can use the following methods, added to DataFrameReader after importing from this package:
import org.cert.netsa.mothra.datasources._
val silkDF = spark.read.silkFlow()                                    // to read from the default SiLK repository
val silkRepoDF = spark.read.silkFlow(repository="...")                // to read from an alternate SiLK repository
val silkFilesDF = spark.read.silkFlow("/path/to/silk/files")          // to read from loose SiLK files
val ipfixDF = spark.read.ipfix(repository="/path/to/mothra/data/dir") // for packed Mothra IPFIX data
val ipfixS3DF = spark.read.ipfix(s3Repository="bucket-name")          // for packed Mothra IPFIX data from an S3 bucket
val ipfixFilesDF = spark.read.ipfix("/path/to/ipfix/files")           // for loose IPFIX files
(The additional methods are defined on the implicit class CERTDataFrameReader.)
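The enrichment pattern behind CERTDataFrameReader can be sketched in plain Scala. The Reader type and its option method below are hypothetical stand-ins (not Mothra or Spark APIs) that show how an implicit class adds methods to an existing type without modifying it:

```scala
// Illustration only: the implicit-class enrichment pattern that
// CERTDataFrameReader uses on Spark's DataFrameReader.
// `Reader` and its `option` method are hypothetical stand-ins.
final case class Reader(options: Map[String, String])

object Enrichment {
  implicit class RichReader(val r: Reader) extends AnyVal {
    // Adds a new method to Reader without modifying Reader itself.
    def option(key: String, value: String): Reader =
      Reader(r.options + (key -> value))
  }
}

object Demo extends App {
  import Enrichment._
  // The enriched method is available once the implicit is in scope.
  val r = Reader(Map.empty).option("fields", "sIP,dIP")
  println(r.options("fields")) // prints "sIP,dIP"
}
```

Importing org.cert.netsa.mothra.datasources._ brings the real implicit class into scope in the same way, which is why the import is required before spark.read.silkFlow() and spark.read.ipfix(...) become available.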
Using the fields method allows you to configure which SiLK or IPFIX fields you wish to retrieve. (This is particularly important for IPFIX data, as IPFIX files may contain many possible fields organized in various ways.)
import org.cert.netsa.mothra.datasources._
val silkDF = spark.read.fields("sIP", "dIP").silkFlow(...)
val ipfixDF = spark.read.fields("sourceIPAddress", "destinationIPAddress").ipfix(...)
Both of these dataframes will contain only the source and destination IP addresses from the specified data sources. You may also provide column names different from the source field names:
val silkDF = spark.read.fields("server" -> "sIP", "client" -> "dIP").silkFlow(...)
val ipfixDF = spark.read.fields("server" -> "sourceIPAddress", "client" -> "destinationIPAddress").ipfix(...)
You may also mix the mapped and the default names in one call:
val df = spark.read.fields("sIP", "dIP", "s" -> "sensor").silkFlow(...)
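One way a single vararg method can accept both plain names ("sIP") and renamings ("s" -> "sensor"), as in the calls above, is a small wrapper type with implicit conversions from both String and (String, String). This is a hypothetical sketch of that Scala pattern, not Mothra's actual implementation:

```scala
// Illustration only: mixing plain field names and renamed fields in one
// vararg list. `Field` and `FieldsDemo` are hypothetical, not Mothra APIs.
final case class Field(column: String, source: String)

object Field {
  import scala.language.implicitConversions
  // "sIP" becomes Field("sIP", "sIP"): column name equals source name.
  implicit def fromName(name: String): Field = Field(name, name)
  // "s" -> "sensor" becomes Field("s", "sensor"): column "s" reads "sensor".
  implicit def fromPair(p: (String, String)): Field = Field(p._1, p._2)
}

object FieldsDemo {
  // Both forms convert to Field, so one vararg parameter accepts either.
  def fields(fs: Field*): Seq[Field] = fs

  def main(args: Array[String]): Unit = {
    val fs = fields("sIP", "dIP", "s" -> "sensor")
    fs.foreach(f => println(s"${f.column} <- ${f.source}"))
  }
}
```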
This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.
Please see the documentation for the individual packages for more details on their use.
Scala Packages
These packages are useful in Scala code without involving Spark:
org.cert.netsa.data
This package, which is collected as the netsa-data library, provides types for working with various kinds of information:
- org.cert.netsa.data.net - types for working with network data
- org.cert.netsa.data.time - types for working with time data
- org.cert.netsa.data.unsigned - types for working with unsigned integral values
org.cert.netsa.io.ipfix
The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.
org.cert.netsa.io.silk
To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.
org.cert.netsa.util
The "junk drawer" of netsa-util so far provides only two features: first, a method for equipping Scala scala.collection.Iterators with exception handling; and second, a way to query the versions of NetSA libraries present in a JVM at runtime.
Spark Packages
These packages require the use of Apache Spark:
org.cert.netsa.mothra.datasources
Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls. The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.
org.cert.netsa.mothra.analysis
A grab-bag of analysis helper functions and example analyses.
org.cert.netsa.mothra.functions
This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.
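To illustrate the kind of network helper such Spark SQL functions typically wrap, here is a standalone IPv4-in-CIDR membership check. This is a hypothetical sketch, not the mothra-functions API; in Spark such a function would be wrapped with udf(...) or registered for SQL use:

```scala
// Illustration only: a network-data helper of the kind mothra-functions
// exposes as Spark SQL functions. `CidrDemo` and `inCidr` are hypothetical.
object CidrDemo {
  // Pack a dotted-quad IPv4 address into a 32-bit value held in a Long.
  private def ipv4ToLong(ip: String): Long =
    ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

  /** True if `ip` falls inside the CIDR block `cidr`, e.g. "10.0.0.0/8". */
  def inCidr(ip: String, cidr: String): Boolean = {
    val Array(base, bits) = cidr.split('/')
    val mask =
      if (bits.toInt == 0) 0L
      else (-1L << (32 - bits.toInt)) & 0xFFFFFFFFL
    (ipv4ToLong(ip) & mask) == (ipv4ToLong(base) & mask)
  }

  def main(args: Array[String]): Unit = {
    println(inCidr("10.1.2.3", "10.0.0.0/8"))    // prints "true"
    println(inCidr("192.168.0.1", "10.0.0.0/8")) // prints "false"
  }
}
```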