Packages

  • package root

    This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.

    Please see the documentation for the individual packages for more details on their use.

    Scala Packages

    These packages are useful in Scala code without involving Spark:

    org.cert.netsa.data

    This package, which is collected as the netsa-data library, provides types for working with various kinds of information:

    org.cert.netsa.io.ipfix

    The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.

    org.cert.netsa.io.silk

    To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.

    org.cert.netsa.util

    The "junk drawer" of netsa-util so far provides only two features: a method for equipping Scala scala.collection.Iterators with exception handling, and a way to query the versions of NetSA libraries present in a JVM at runtime.
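    The exact netsa-util API is not shown here, but the idea of equipping an iterator with exception handling can be illustrated with a generic sketch (the tryMap helper below is hypothetical, not the library's method):

    ```scala
    import scala.util.Try

    // Illustrative sketch only: wrap each element production in Try so a
    // throwing transform does not abort the whole iteration; callers decide
    // what to do with each failure.
    def tryMap[A, B](it: Iterator[A])(f: A => B): Iterator[Try[B]] =
      it.map(a => Try(f(a)))

    val results = tryMap(Iterator("1", "x", "3"))(_.toInt).toList
    // two Successes and one Failure (for the unparsable "x")
    ```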

    Spark Packages

    These packages require the use of Apache Spark:

    org.cert.netsa.mothra.datasources

    Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls.

    The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.

    org.cert.netsa.mothra.analysis

    A grab-bag of analysis helper functions and example analyses.

    org.cert.netsa.mothra.functions

    This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.

    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package cert
    Definition Classes
    org
  • package netsa
    Definition Classes
    cert
  • package mothra
    Definition Classes
    netsa
  • package datasources

    This package contains the Mothra datasources, along with mechanisms for working with those datasources. The primary novel feature of these datasources is the fields mechanism.

    To use the IPFIX or SiLK data sources, you can use the following methods added by the implicit CERTDataFrameReader on DataFrameReader after importing from this package:

    import org.cert.netsa.mothra.datasources._
    val silkDF = spark.read.silkFlow()                                    // to read from the default SiLK repository
    val silkRepoDF = spark.read.silkFlow(repository="...")                // to read from an alternate SiLK repository
    val silkFilesDF = spark.read.silkFlow("/path/to/silk/files")          // to read from loose SiLK files
    val ipfixDF = spark.read.ipfix(repository="/path/to/mothra/data/dir") // for packed Mothra IPFIX data
    val ipfixS3DF = spark.read.ipfix(s3Repository="bucket-name")          // for packed Mothra IPFIX data from an S3 bucket
    val ipfixFilesDF = spark.read.ipfix("/path/to/ipfix/files")           // for loose IPFIX files

    (The additional methods are defined on the implicit class CERTDataFrameReader.)

    Using the fields method allows you to configure which SiLK or IPFIX fields you wish to retrieve. (This is particularly important for IPFIX data, as IPFIX files may contain many possible fields organized in various ways.)

    import org.cert.netsa.mothra.datasources._
    val silkDF = spark.read.fields("sIP", "dIP").silkFlow(...)
    val ipfixDF = spark.read.fields("sourceIPAddress", "destinationIPAddress").ipfix(...)

    Both of these dataframes will contain only the source and destination IP addresses from the specified data sources. You may also provide column names different from the source field names:

    val silkDF = spark.read.fields("server" -> "sIP", "client" -> "dIP").silkFlow(...)
    val ipfixDF = spark.read.fields("server" -> "sourceIPAddress", "client" -> "destinationIPAddress").ipfix(...)

    You may also mix the mapped and the default names in one call:

    val df = spark.read.fields("sIP", "dIP", "s" -> "sensor").silkFlow(...)
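    The mixing of bare names and "alias" -> "expression" pairs works because `"s" -> "sensor"` is just an ordinary Scala Tuple2. A hypothetical sketch (not the Mothra implementation) of how one vararg list can accept both forms:

    ```scala
    import scala.language.implicitConversions

    // Hypothetical model of a fields(...)-style API: a bare name aliases
    // itself, while a pair supplies an explicit column name.
    final case class FieldSpec(column: String, expr: String)
    object FieldSpec {
      implicit def fromName(name: String): FieldSpec = FieldSpec(name, name)
      implicit def fromPair(p: (String, String)): FieldSpec = FieldSpec(p._1, p._2)
    }

    def fields(specs: FieldSpec*): Seq[(String, String)] =
      specs.map(s => s.column -> s.expr)

    val specs = fields("sIP", "dIP", "s" -> "sensor")
    // Seq(("sIP","sIP"), ("dIP","dIP"), ("s","sensor"))
    ```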
    Definition Classes
    mothra
    See also

    IPFIX datasource

    SiLK flow datasource

  • package ipfix

    A data source as defined by the Spark Data Source API for reading IPFIX records from Mothra data spools and from loose files.

    You can use this by importing org.cert.netsa.mothra.datasources._ like this:

    import org.cert.netsa.mothra.datasources._
    val df1 = spark.read.ipfix("path/to/mothra/data/dir") // for packed Mothra IPFIX data
    val df2 = spark.read.ipfix("path/to/ipfix/files")     // for loose IPFIX files

    The IPFIX datasource uses the fields mechanism from org.cert.netsa.mothra.datasources. You can make use of this mechanism like these examples:

    import org.cert.netsa.mothra.datasources._
    val df1 = spark.read.fields(
      "startTime", "endTime", "sourceIPAddress", "destinationIPAddress"
    ).ipfix(...)
    
    val df2 = spark.read.fields(
      "startTime", "endTime", "TOS" -> "ipClassOfService"
    ).ipfix(...)

    with arbitrary sets of fields and field name mappings.

    Default Fields

    The default set of fields (defined in IPFIXFields.default) is:

    • "startTime" -> "func:startTime"
    • "endTime" -> "func:endTime"
    • "sourceIPAddress" -> "func:sourceIPAddress"
    • "sourcePort" -> "func:sourcePort"
    • "destinationIPAddress" -> "func:destinationIPAddress"
    • "destinationPort" -> "func:destinationPort"
    • "protocolIdentifier"
    • "observationDomainId"
    • "vlanId"
    • "reverseVlanId"
    • "silkAppLabel"
    • "packetCount" -> "packetTotalCount|packetDeltaCount"
    • "reversePacketCount" -> "reversePacketTotalCount|reversePacketDeltaCount"
    • "octetCount" -> "octetTotalCount|octetDeltaCount"
    • "reverseOctetCount" -> "reverseOctetTotalCount|reverseOctetDeltaCount"
    • "initialTCPFlags"
    • "reverseInitialTCPFlags"
    • "unionTCPFlags"
    • "reverseUnionTCPFlags"

    Some of these defaults are defined simply as IPFIX Information Elements. For example, "protocolIdentifier" and "vlanId" are exactly the Information Elements that are named. No "right-hand-side" is given for these definitions, because the name of the field is the same as the name of the Information Element.

    Others have simple expressions. For example, packetCount is defined as "packetTotalCount|packetDeltaCount". This expression means that the value should be taken from the packetTotalCount IE, or, if that is not set, from the packetDeltaCount IE. This allows the field to be used regardless of which Information Element contains the data.
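    The "first IE that is set" semantics of such a pipe expression can be modeled in plain Scala, with absent Information Elements represented as None (this is an illustrative sketch, not Mothra code):

    ```scala
    // Model of "packetTotalCount|packetDeltaCount": take the first
    // candidate that actually has a value.
    def firstDefined[T](candidates: Option[T]*): Option[T] =
      candidates.collectFirst { case Some(v) => v }

    val packetTotalCount: Option[Long] = None        // IE absent from record
    val packetDeltaCount: Option[Long] = Some(42L)   // IE present
    val packetCount = firstDefined(packetTotalCount, packetDeltaCount)
    // Some(42)
    ```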

    Some others are derived in more complex ways from basic IPFIX fields. For example, the startTime field is produced using "func:startTime", which runs the "gauntlet of time" to determine the start time for a flow by whatever means possible. Other time fields are similarly defined.

    Some of the "func:..." fields are actually quite simple. For example, "func:sourceIPAddress", practically speaking, is the same as "sourceIPv4Address|sourceIPv6Address". However, these fields are defined using the func: extension mechanism so that partitioning on them is possible. (This restriction may be lifted in a future Mothra version.)

    Field Types

    The mappings between IPFIX types and Spark types are:

    • octetArray → Array[Byte]
    • unsigned8 → Short
    • unsigned16 → Int
    • unsigned32 → Long
    • unsigned64 → Long
    • signed8 → Byte
    • signed16 → Short
    • signed32 → Int
    • signed64 → Long
    • float32 → Float
    • float64 → Double
    • boolean → Boolean
    • macAddress → String
    • string → String
    • dateTimeSeconds → Timestamp
    • dateTimeMilliseconds → Timestamp
    • dateTimeMicroseconds → Timestamp
    • dateTimeNanoseconds → Timestamp
    • ipv4Address → String
    • ipv6Address → String

    IPFIX's basicList, subTemplateList, and subTemplateMultiList data types are handled differently.
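    The widening in the table above follows from JVM integral types being signed: each unsigned IPFIX type maps to the next larger signed type that can hold its full range. (unsigned64 is the exception: Long cannot represent values above Long.MaxValue.) A self-contained check of the ranges:

    ```scala
    // Maximum values of the unsigned IPFIX types, checked against the
    // signed Scala types they map to.
    val maxUnsigned8  = 255            // > Byte.MaxValue (127), fits in Short
    val maxUnsigned16 = 65535          // > Short.MaxValue (32767), fits in Int
    val maxUnsigned32 = 4294967295L    // > Int.MaxValue, fits in Long

    assert(maxUnsigned8  > Byte.MaxValue  && maxUnsigned8  <= Short.MaxValue)
    assert(maxUnsigned16 > Short.MaxValue && maxUnsigned16 <= Int.MaxValue)
    assert(maxUnsigned32 > Int.MaxValue   && maxUnsigned32 <= Long.MaxValue)
    ```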

    Field Expressions

    As noted above, field expressions may contain simple IPFIX Information Element names, or collections of names separated by pipe characters to indicate taking the first matching choice. This language has a number of other capabilities which are documented for now in the IPFIX field parser object.

    Functional Fields

    A number of pre-defined "functional fields" are available. Some of these combine other information elements in ways that the expression language cannot (applying the so-called "gauntlet of time", for example). Others provide support for the Mothra repository partitioning system. And finally, a few are for debugging purposes and provide high-level overviews of IPFIX records or point to file locations on disk.

    Function fields are all defined and described in the org.cert.netsa.mothra.datasources.ipfix.fields.func package.

    Definition Classes
    datasources
  • package fields

    Most of these classes and traits relate to the definition of IPFIX fields as IPFIX record processing objects.

    The IPFIXFieldParsing object defines the parser used for IPFIX field expressions, and includes the documentation for that language.

    Other mechanisms, including implementations of the IPFIXField trait, provide the ability to define new "function" fields and register them into the Func registry. This is an experimental capability and is likely to be deprecated and then removed from public access in the future.

    Definition Classes
    ipfix
    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  • package func

    The objects in this package represent "function fields" usable in the IPFIX data source. These fields may each be accessed by the field expression "func:<fieldName>"; for example, "func:recordInfo" in a field expression produces a human-readable summary of the record, as described below.

    See also

    org.cert.netsa.mothra.datasources.ipfix for examples of field expressions

    IPFIXFieldParsing for details about the field path grammar

  • ArrayField
  • BasicListField
  • DeepTemplateField
  • Func
  • IPFIXField
  • IPFIXFieldParsing
  • IPFIXFieldSparkVerImpl
  • InfoElementField
  • MapField
  • MatchField
  • SimpleField
  • StructField
  • SubTemplateField
  • TimeGauntlet
  • UnionField

package fields

Most of these classes and traits relate to the definition of IPFIX fields as IPFIX record processing objects.

The IPFIXFieldParsing object defines the parser used for IPFIX field expressions, and includes the documentation for that language.

Other mechanisms, including implementations of the IPFIXField trait, provide the ability to define new "function" fields and register them into the Func registry. This is an experimental capability and is likely to be deprecated and then removed from public access in the future.

Note

This is an experimental interface and is likely to be removed or made private in a future version.


Package Members

  1. package func

    The objects in this package represent "function fields" usable in the IPFIX data source. These fields may each be accessed by the field expression "func:<fieldName>"; for example, "func:recordInfo" in a field expression produces a human-readable summary of the record, as described below.

    See also

    org.cert.netsa.mothra.datasources.ipfix for examples of field expressions

    IPFIXFieldParsing for details about the field path grammar

Type Members

  1. case class ArrayField(base: IPFIXField[Any]) extends IPFIXField[Array[AnyRef]] with LazyLogging with Product with Serializable

    A field which aggregates all results from a given field into a single array of results.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  2. case class BasicListField[+T](listElem: Option[InfoElement], contentElem: InfoElement) extends IPFIXField[T] with Product with Serializable

    A field whose results are the values of the specified content Information Element within basicList-typed Information Elements. The specific Information Element may be specified, or if None is used, any Information Element with a datatype of basicList is used.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  3. case class DeepTemplateField[T](base: IPFIXField[T]) extends IPFIXField[T] with Product with Serializable

    A field which finds the provided field in the current record, or in _any_ nested record (in Information Elements with subTemplateList or subTemplateMultiList datatypes) at any depth.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  4. trait IPFIXField[+T] extends Serializable

    Specification of a way to produce a value from an IPFIX record. Includes both ways to extract appropriate values from a record, and methods for determining the set of partitions to be searched for a provided filter expression.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  5. trait IPFIXFieldSparkVerImpl extends AnyRef
  6. case class InfoElementField[+T](ie: InfoElement) extends IPFIXField[T] with LazyLogging with Product with Serializable

    A field for a specified IPFIX Information Element. Returns all occurrences of that Information Element in the current record, but not in nested records, or basicList fields. (See BasicListField, DeepTemplateField and SubTemplateField for that.)

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  7. case class MapField[S, T](f: (S) => T, sqlType: DataType, base: IPFIXField[S]) extends IPFIXField[T] with LazyLogging with Product with Serializable

    A field which maps results from the given field through a function, producing results of the given SQL data type.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  8. case class MatchField[T](key: IPFIXField[Any], value: String, base: IPFIXField[T]) extends IPFIXField[T] with LazyLogging with Product with Serializable

    A field which filters its base field based on a parallel "key" field's value. (It checks only for equality, for string or integral types.)

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  9. class SimpleField[+T] extends IPFIXField[T]

    This is a wrapper to simplify writing gauntlets which need to override some behavior. Normally, you can just use IPFIXField("foo"), but if you also want to override methods, you can extend this class instead. (See some of the fields in the org.cert.netsa.mothra.datasources.ipfix.fields.func package for details.)

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  10. case class StructField(namedFields: Seq[(String, IPFIXField[Any])]) extends IPFIXField[Row] with LazyLogging with Product with Serializable

    A field which returns a record structure of its various arguments for later processing by Spark. Each piece of the struct must be given a name, which is its column name in Spark.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  11. case class SubTemplateField[+T](listElem: Option[InfoElement], templateName: Option[String], base: IPFIXField[T]) extends IPFIXField[T] with Product with Serializable

    A field which finds the given field in a nested record. An optional listElem may be specified, which limits examination to that specific list (subTemplateList or subTemplateMultiList) Information Element. An optional templateName may be specified, which limits examination to those subrecords whose template metadata gives that name.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  12. case class UnionField[+T](fs: IPFIXField[T]*) extends IPFIXField[T] with LazyLogging with Product with Serializable

    A field which returns all of the results of the fields given as its arguments. It's an error if the arguments don't all produce the same result type.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

Value Members

  1. object Func

    The registry of "function" fields. An entry of "startTime" here will be used for func:startTime in a field expression.

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

    See also

    the func package for the defined fields.

  2. object IPFIXField extends IPFIXFieldSparkVerImpl with Serializable

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  3. object IPFIXFieldParsing extends RegexParsers

    Parser for IPFIX field expressions.

    Note that whitespace is allowed anywhere and not significant except between quotes.

    fieldUnion ::=
      | field "|" fieldUnion
      | field

    fieldUnion is the top-level entrypoint. A fieldUnion is a set of one or more fields separated by pipes, allowing for the first match. fieldUnions can appear inside parentheses in sub-expressions.

    field ::=
      | "(" fieldUnion ")"
      | "array" "(" fieldUnion ")"
      | "match" "(" fieldUnion "," id "," fieldUnion ")"
      | "struct" "(" namedFieldList ")"
      | "**" "/" field
      | wildId ":" wildId "/" field
      | wildId "/" field
      | wildId "[" elemId "]"
      | funcId
      | elemId

    field is the place most stuff happens. Here are all of the different sorts of fields:

    Fields may be grouped by parentheses like (<field> [ | <field> ... ]), and field unions may appear inside the parentheses.

    To collect all values from the given field into an array, array(<fieldUnion>) may be used.

    match(<fieldUnion>, <value>, <fieldUnion>) takes the results of two field expressions in parallel. The first field is the "key field", the value is the "target value", and the second field is the "value field". If the key field's iterator's value ki at a given position i matches the target value, then the value field's iterator's value vi at that same position is included in the result of the match. (The value is parsed in the same way as an ID, and is taken as a literal value, a string or integral value depending on the type of the matched field.)
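    The positional filtering that match performs can be sketched with plain Scala collections (an illustrative model, not the Mothra implementation):

    ```scala
    // Model of match(key, target, value): keep value(i) only where
    // key(i) equals the target value.
    def matchField[K, V](keys: Seq[K], target: K, values: Seq[V]): Seq[V] =
      keys.zip(values).collect { case (k, v) if k == target => v }

    // Hypothetical data: keep ports only where the parallel label is "dns".
    val labels   = Seq("dns", "http", "dns")
    val ports    = Seq(53, 80, 5353)
    val dnsPorts = matchField(labels, "dns", ports)
    // Seq(53, 5353)
    ```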

    To produce a structured value, struct(<namedField>[, <namedField> ...]) takes multiple field expressions in parallel to produce a possibly nested record structure. All of the sub-fields' iterators are zipped together to produce an iterator of records.
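    The zipping that struct performs can likewise be modeled with ordinary iterators (an illustrative sketch; the field names are hypothetical):

    ```scala
    // Model of struct(sIP = ..., dIP = ...): sub-field iterators are
    // combined positionally into one iterator of named records.
    val sIP = Iterator("10.0.0.1", "10.0.0.2")
    val dIP = Iterator("192.0.2.1", "192.0.2.2")
    val rows = sIP.zip(dIP).map { case (s, d) => Map("sIP" -> s, "dIP" -> d) }.toList
    // two records, each holding one source/destination pair
    ```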

    **/<field> may be used to apply the given field in this record or a subrecord at any depth. This is useful for cases where you don't know where an Information Element might appear in a record.

    To access nested subTemplateList or subTemplateMultiList data, <listElemName> [: <templateName>] / <field> will find the given field in a subrecord within the given STL- or STML-typed Information Element, optionally restricted to subrecords under a template that has been given the provided name. Either <listElemName> or <templateName> can be * as a wildcard.

    func:<funcName> will use the registered function field with this name on the current record and produce the result.

    Finally, the most basic <elemName> will find the named information element in the current record.

    namedFieldList ::=
      | namedField "," namedFieldList
      | namedField

    A namedFieldList is a comma-separated list of named fields, as arguments to struct or match.

    namedField ::=
      | id "=" fieldUnion
      | fieldUnion

    A namedField gives a field a name, for example, in the fields of a struct(<namedField>[, <namedField ...]) call, to give names to the nested structure. If a name isn't given for the field, one will be generated from the field expression in an unspecified manner.

    id ::=
      | [not whitespace or any of '/:()\"']+
      | '"' ([not '\' or '"']+ | "\" [any of 'btnfr\"''])* '"'

    An ID can't have a number of special symbols in it, but you can put an escaped ID in quotes that can have anything at all.

    wildId ::= id | '*'

    Wildcard IDs (for subrecord IE names and subtemplate names) can be an ID or * for "any".

    funcId ::= "func:" id

    Calls to function fields are identified with "func:<id>". See the func package for details on what function fields are defined.

    elemId ::= id - "array"

    Individual element fields can be any ID other than "array" (unless it's in quotes).

  4. object IPFIXFieldSparkVerImpl
  5. object InfoElementField extends Serializable

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.

  6. object TimeGauntlet extends Serializable

    There are a large number of ways that time information may be encoded into IPFIX Information Elements. These functions provide methods for extracting the start time, end time, and duration of IPFIX records while running the "gauntlet of time" to find the most precise and convenient format available.

    The "extractX" functions will look in all available locations to extract the requested time information directly, considering different time resolutions and both absolute and delta times.

    In addition, the "computeX" functions first try to extract the information, and then if it is not available try to compute it from other sources.

    So, for example, extractEndTime will attempt to read the absolute time from several fields, or the relative time from some others. If none of those are available, it will return nothing.

    If you use computeEndTime, it will call extractEndTime to do all of that and then if that fails it will call extractStartTime and extractDurationNanos and add the duration to the start time to find the end time.
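    That extract-then-compute fallback can be sketched as follows (an illustrative model only; the Rec type and field names are hypothetical, not the Mothra API):

    ```scala
    import java.time.Instant

    // Hypothetical flow record with optional time information.
    final case class Rec(start: Option[Instant], end: Option[Instant], durNanos: Option[Long])

    // "extract" reads the value directly if present...
    def extractEndTime(r: Rec): Option[Instant] = r.end

    // ...while "compute" falls back to start time plus duration.
    def computeEndTime(r: Rec): Option[Instant] =
      extractEndTime(r).orElse(
        for { s <- r.start; d <- r.durNanos } yield s.plusNanos(d)
      )

    val r = Rec(Some(Instant.parse("2024-01-01T00:00:00Z")), None, Some(2000000000L))
    // computeEndTime(r) derives 2024-01-01T00:00:02Z from start + 2s duration
    ```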

    Note that not all records have a duration, or even a time at all.

    See also

    IANA registry for these Information Elements

  7. object UnionField extends Serializable

    Note

    This is an experimental interface and is likely to be removed or made private in a future version.
