implicit class CERTDataFrameReader extends AnyRef
Additional methods for Spark DataFrameReader to enable reading IPFIX and SiLK data, and to allow specifying fields for IPFIX and SiLK datasources.
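With this implicit class in scope, a plain Spark DataFrameReader gains the ipfix, silkFlow, and fields methods documented below. A minimal sketch of how it might be used (the import path for the implicit is assumed from the package layout described in this documentation, and the repository path is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.cert.netsa.mothra.datasources._  // assumed: brings CERTDataFrameReader into scope

val spark = SparkSession.builder().appName("example").getOrCreate()

// DataFrameReader is enriched, so IPFIX data can be loaded directly:
val df = spark.read.ipfix(repository = "/data/ipfix")  // hypothetical path
```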
Instance Constructors
- new CERTDataFrameReader(self: DataFrameReader)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def fields(fs: FieldsSpec*): DataFrameReader
Specifies the fields which will be used by the IPFIX or SiLK flow datasources. See the individual datasources for more details.
- See also
IPFIX datasource for IPFIX fields
SiLK flow datasource for SiLK flow fields
fields.FieldsSpec object for ways to package groups of fields
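A sketch of calling fields before loading, assuming the field names shown here are valid for the datasource and that plain strings are accepted as FieldsSpec values (see the fields.FieldsSpec documentation for the actual ways to package groups of fields):

```scala
// Hypothetical field names for illustration only; consult the IPFIX
// datasource documentation for the fields it actually exposes.
val flows = spark.read
  .fields("sourceIPAddress", "destinationIPAddress", "octetCount")
  .ipfix(repository = "/data/ipfix")  // hypothetical path
```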
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def ipfix(path: String = "", repository: String = "", s3Repository: String = "", s3Prefix: String = "", numSlices: Int = 0, targetSize: Long = 0L, debugMode: String = ""): DataFrame
Loads IPFIX data from files or from a repository in HDFS or S3.
NOTE: If debugMode is given, *no result data will be generated*.
- path
Hadoop path pattern for files to be read directly
- repository
Hadoop path for filesystem repository
- s3Repository
S3 bucket name for IPFIX repository
- s3Prefix
Prefix to S3 keys for repository items
- numSlices
Target parallelism
- targetSize
Target size of slice for parallelism in bytes
- debugMode
Mode and location for debugging (INTERNAL)
- See also
IPFIX datasource for details
org.cert.netsa.mothra.datasources.ipfix.Debug for information about debugMode
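Since every parameter has a default, callers typically supply only the source they want. A sketch of the two common cases (all paths and the bucket name below are hypothetical):

```scala
// Read IPFIX files directly via a Hadoop path pattern:
val fromFiles = spark.read.ipfix(path = "hdfs:///incoming/ipfix/*")

// Read from an S3 repository, tuning parallelism:
val fromS3 = spark.read.ipfix(
  s3Repository = "my-ipfix-bucket",  // hypothetical bucket
  s3Prefix     = "repo/",
  numSlices    = 64
)
```

Exactly one of path, repository, or s3Repository would normally be given per call; see the IPFIX datasource documentation for the authoritative semantics.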
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def silkFlow(path: String = "", repository: String = "", configFile: String = ""): DataFrame
Loads SiLK flow data from files or from a SiLK repository in HDFS.
- path
Hadoop path pattern for files to be read directly
- repository
Hadoop path for SiLK repository data rootdir
- configFile
Hadoop path for the silk.conf config file
- See also
SiLK flow datasource for details
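A sketch of loading from a SiLK repository, with its site configuration file supplied explicitly (both paths are hypothetical):

```scala
val silk = spark.read.silkFlow(
  repository = "hdfs:///data/silk",            // hypothetical rootdir
  configFile = "hdfs:///data/silk/silk.conf"   // hypothetical silk.conf location
)
```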
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.
Please see the documentation for the individual packages for more details on their use.
Scala Packages
These packages are useful in Scala code without involving Spark:
org.cert.netsa.data
This package, which is collected as the netsa-data library, provides types for working with various kinds of information:
- org.cert.netsa.data.net - types for working with network data
- org.cert.netsa.data.time - types for working with time data
- org.cert.netsa.data.unsigned - types for working with unsigned integral values
org.cert.netsa.io.ipfix
The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.
org.cert.netsa.io.silk
To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.
org.cert.netsa.util
The "junk drawer" of netsa-util so far provides only two features: first, a method for equipping Scala scala.collection.Iterators with exception handling; and second, a way to query the versions of NetSA libraries present in a JVM at runtime.
Spark Packages
These packages require the use of Apache Spark:
org.cert.netsa.mothra.datasources
Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls. The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.
org.cert.netsa.mothra.analysis
A grab-bag of analysis helper functions and example analyses.
org.cert.netsa.mothra.functions
This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.