package flow
A data source as defined by the Spark Data Source API for reading SiLK records from SiLK data spools and from loose files.
You can use this data source by importing org.cert.netsa.mothra.datasources._ like this:

    import org.cert.netsa.mothra.datasources._

    val df1 = spark.read.silkFlow()                 // read from the default SiLK repository
    val df2 = spark.read.silkFlow("path/to/files")  // read from loose files
    val df3 = spark.read.silkFlow(..., repository="/path/to/repo")      // override the default repo
    val df4 = spark.read.silkFlow(..., configFile="/path/to/silk.conf") // use a specific non-default silk.conf file
The default SiLK data repository location is defined by the Java system property org.cert.netsa.mothra.datasources.silk.defaultRepository. The default configuration file is silk.conf under the default repository directory.
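For example, one way to supply this property is to set it in the driver JVM before the first read. This is a minimal sketch, and /data/silk-repo is a hypothetical path; in spark-submit deployments the same property can instead be passed through spark.driver.extraJavaOptions:

    import org.cert.netsa.mothra.datasources._

    // A sketch: point the datasource at a hypothetical repository path
    // by setting the system property before reading.
    System.setProperty(
      "org.cert.netsa.mothra.datasources.silk.defaultRepository",
      "/data/silk-repo")

    val df = spark.read.silkFlow()   // now reads from /data/silk-repo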
If you don't have a SiLK data repository or a silk.conf file, you can still work with loose SiLK data files; however, class, type, and sensor names will not be available. (Any numeric IDs in the input data will still be usable.)
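For example, a sketch of working with loose files when no repository is configured; the path is a placeholder, and the field selection uses the fields mechanism described next:

    import org.cert.netsa.mothra.datasources._

    // Without a silk.conf, name fields such as "sensor" and "type" are
    // unavailable, but numeric IDs like sensorId and flowTypeId still work.
    val loose = spark.read
      .fields("sTime", "sIP", "dIP", "sensorId", "flowTypeId")
      .silkFlow("path/to/loose/files")   // placeholder path

    loose.groupBy("sensorId").count().show()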
The SiLK flow datasource uses the fields mechanism from org.cert.netsa.mothra.datasources. You can use this mechanism with arbitrary sets of fields and field name mappings, as in these examples:

    import org.cert.netsa.mothra.datasources._

    spark.read.fields("sTime", "eTime", "sIP", "dIP").silkFlow(...)
    spark.read.fields("sTime", "sId" -> "sensorId").silkFlow(...)

    import org.cert.netsa.mothra.datasources.silk.flow.SilkFields
    spark.read.fields(SilkFields.default, "sId" -> "sensorId").silkFlow(...)
See SilkFields for details about the default set of fields.
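Putting the pieces together, a minimal end-to-end sketch, assuming a running SparkSession named spark and the default repository; the TCP filter and per-sensor aggregation are arbitrary illustrative choices:

    import org.cert.netsa.mothra.datasources._
    import org.apache.spark.sql.functions.col

    // Request a narrow set of columns, then filter and aggregate.
    val flows = spark.read
      .fields("sTime", "sIP", "dIP", "protocol", "sensor")
      .silkFlow()

    flows
      .filter(col("protocol") === 6)   // TCP only; "protocol" is a Short
      .groupBy("sensor")
      .count()
      .show()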
SiLK field names match the names used by the SiLK rwcut tool when possible. Some fields which have variants in rwcut, such as duration and dur+msec, all mean the same thing in the SiLK flow datasource, to the maximum available resolution. Also note that unsigned numeric fields are generally one size larger in order to accommodate values too large to be represented in their base signed type. The full set of fields (along with aliases and Spark types) is listed below:
"application"
: Int"attributes"
: Byte"bytes"
: Long"class"
: String"dIP"
: String"dPort"
: Int"duration"
,"dur"
,"dur+msec"
: Long (in milliseconds)"eTime"
,"eTime+msec"
: Timestamp"filename"
: String"flags"
: Byte"flowType"
: String"flowTypeId"
: Short"iCode"
: Short"iType"
: Short"in"
: Int"initialFlags"
: Byte"isIPv6"
: Boolean"memo"
: Short"nhIP"
: String"out"
: Int"packets"
,"pkts"
: Long"protocol"
: Short"sIP"
: String"sPort"
: Int"sTime"
,"sTime+msec"
: Timestamp"sensor"
: String"sensorId"
: Int"sessionFlags"
: Byte"type"
: String"filename"
: String
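As an illustration of these types in use, a sketch with arbitrary thresholds, assuming the default repository: "bytes" and "duration" are Longs, and "duration" is in milliseconds, so ordinary numeric comparisons apply:

    import org.cert.netsa.mothra.datasources._
    import org.apache.spark.sql.functions.col

    val df = spark.read
      .fields("sIP", "dIP", "bytes", "duration")
      .silkFlow()

    // Flows that moved more than 1 MB in under one minute.
    df.filter(col("bytes") > 1000000L && col("duration") < 60000L)
      .show()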
Type Members
- class DefaultSource extends RelationProvider
A Spark datasource for working with SiLK flow data. This is the entrypoint for Spark to call the datasource (see the sketch after the member lists below). See org.cert.netsa.mothra.datasources.silk.flow's documentation for details on how to use it as a user.
- trait GlobInfoSparkVerImpl extends AnyRef
Value Members
- object SilkFields
Useful collections of fields relevant to SiLK data sources.
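Because DefaultSource is a RelationProvider, Spark can also resolve this datasource through its generic format interface using the package name. A hedged sketch follows; the silkFlow() helpers above remain the documented entry point, and how paths and options map through this generic form depends on the datasource's option handling:

    // Spark resolves the format name to this package's DefaultSource class.
    val df = spark.read
      .format("org.cert.netsa.mothra.datasources.silk.flow")
      .load("path/to/files")   // placeholder path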
This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.
Please see the documentation for the individual packages for more details on their use.
Scala Packages
These packages are useful in Scala code without involving Spark:
- org.cert.netsa.data

  This package, which is collected as the netsa-data library, provides types for working with various kinds of information:

  - org.cert.netsa.data.net - types for working with network data
  - org.cert.netsa.data.time - types for working with time data
  - org.cert.netsa.data.unsigned - types for working with unsigned integral values

- org.cert.netsa.io.ipfix

  The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.

- org.cert.netsa.io.silk

  To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.

- org.cert.netsa.util

  The "junk drawer" of netsa-util so far provides only two features: first, a method for equipping Scala scala.collection.Iterators with exception handling; and second, a way to query the versions of NetSA libraries present in a JVM at runtime.

Spark Packages
These packages require the use of Apache Spark:
- org.cert.netsa.mothra.datasources

  Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls. The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.

- org.cert.netsa.mothra.analysis

  A grab-bag of analysis helper functions and example analyses.

- org.cert.netsa.mothra.functions

  This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.