Packages

  • package root

    This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers; others make those tools more useful on Spark data-processing systems.

    Please see the documentation for the individual packages for more details on their use.

    Scala Packages

    These packages are useful in Scala code without involving Spark:

    org.cert.netsa.data

    This package, which is collected as the netsa-data library, provides types for working with various kinds of information.

    org.cert.netsa.io.ipfix

    The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.

    org.cert.netsa.io.silk

    To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.

    org.cert.netsa.util

    The "junk drawer" of netsa-util so far provides only two features: First, a method for equipping Scala scala.collection.Iterators with exception handling. And second, a way to query the versions of NetSA libraries present in a JVM at runtime.

    Spark Packages

    These packages require the use of Apache Spark:

    org.cert.netsa.mothra.datasources

    Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls.

    The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.
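
    As a hedged sketch, reading flows in a Spark shell might look like the following (the ipfix and silkFlow method names and the paths are assumptions for illustration; see the datasources package documentation for the authoritative syntax):

    import org.cert.netsa.mothra.datasources._     // adds the reader methods to spark.read
    val ipfixDf = spark.read.ipfix("/data/ipfix")  // hypothetical IPFIX data path
    val silkDf = spark.read.silkFlow("/data/silk") // hypothetical SiLK data path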

    org.cert.netsa.mothra.analysis

    A grab-bag of analysis helper functions and example analyses.

    org.cert.netsa.mothra.functions

    This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.

  • package org
    • package cert
      • package netsa
        • package mothra
          • package analysis
            • package baselining
            • package ddos
            • package misc_silk
            • package network_profiling
            • package scanners
            • package spam_detection
            • package ssh_session_detection
            • package statistics
            • util

object util

A collection of analysis helper functions for working with network data in Spark.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val APPLICATION_LABEL_FIELD: String
  5. val DNS_RESOURCE_RECORD_TYPE_FIELD: String
  6. lazy val MOTHRA_COLLECTORS_DF: Option[DataFrame]
  7. val ORGANIZATION_LABEL_FIELD: String
  8. val TLS_CIPHER_SUITE_FIELD: String
  9. val app_label_name: Column
  10. def app_labels(labels: Any*): Column
  11. def append_collection_labels(df: DataFrame, attributes: String*): DataFrame

    Append one or more collection attribute labels to a dataframe based on the observationDomainId and vlanId fields in the data.

    df

    input dataframe

    Example:
    1. append_collection_labels(df, "enclave", "department", "organization")
  12. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  13. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  14. def collect_column(df: DataFrame, c: Column): Array[Any]

    An array containing the values of a column expression of a dataframe.

  15. def collect_column(df: DataFrame): Array[Any]

    An array containing the values from the first column of a dataframe.
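
    A minimal usage sketch for both overloads (the column name destinationTransportPort is an assumption for illustration):

    Example:
    1. import org.apache.spark.sql.functions.col
       val ports = collect_column(df, col("destinationTransportPort")) // values of one column expression
       val first = collect_column(df)                                  // values of the first column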

  16. def collection_labels(attribute: String, labels: String*): Column

    Given the name of a collection attribute and a set of matching labels, produce a filter expression that can be used to filter a DataFrame for flows matching the given labels.

    Example:
    1. df.filter(collection_labels("enclave", "IT1A", "DEV2B"))

      will find flows in either the IT1A or DEV2B enclaves.

  17. def collector_labels(labels: String*): Column

    Given a set of collector labels, returns a filter expression that matches records from any of those collectors.

    Example:
    1. df.filter(collector_labels("IT1", "DEV2"))
  18. def compare_field(field: String, condition: String, value: Any): Column

    Convenience method for building filter conditions using comparison operators.
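
    A hedged example, assuming the condition string takes operator tokens such as ">" (an assumption; consult the source for the accepted set), with a hypothetical field name:

    Example:
    1. df.filter(compare_field("packetTotalCount", ">", 100))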

  19. def count_by_collection_labels(df: DataFrame, attributes: String*): DataFrame

    Group data by one or more collection attributes and count the number of flows within each group.

    df

    input dataframe

    Example:
    1. count_by_collection_labels(df, "enclave", "department")
  20. def dateRange(from: LocalDate, to: LocalDate, step: Int = 1): Iterator[LocalDate]
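
    A usage sketch for dateRange, assuming step counts days (an assumption; the parameter is otherwise undocumented here):

    Example:
    1. import java.time.LocalDate
       dateRange(LocalDate.of(2018, 1, 1), LocalDate.of(2018, 1, 8)).foreach(println) // prints one date per day
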
  21. def department_labels(labels: String*): Column

    Given a set of department labels, returns a filter expression that matches records from any of those departments.

    Example:
    1. df.filter(department_labels("IT", "DEV"))
  22. val dns_rr_types: Column
  23. val dns_rr_types_udf: UserDefinedFunction
  24. val email_addr: UserDefinedFunction
  25. val email_addrs: UserDefinedFunction
  26. val email_display: UserDefinedFunction
  27. val email_displays: UserDefinedFunction
  28. val email_domain: UserDefinedFunction
  29. val email_domains: UserDefinedFunction
  30. val email_header_addrs: UserDefinedFunction
  31. val email_header_displays: UserDefinedFunction
  32. val email_header_domains: UserDefinedFunction
  33. val email_header_mailboxes: UserDefinedFunction
  34. val email_mailbox: UserDefinedFunction
  35. val email_mailboxes: UserDefinedFunction
  36. def enclave_labels(labels: String*): Column

    Given a set of enclave labels, returns a filter expression that matches records from any of those enclaves.

    Example:
    1. df.filter(enclave_labels("IT1A", "DEV2B"))
  37. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  38. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  39. def filter_array_field_list(df: DataFrame, field: String, list: Seq[Any]): DataFrame
  40. def filter_by_field(df: DataFrame, field: String, condition: String, value: Any): DataFrame

    Filter a DataFrame on the passed field, value and condition.

  41. def filter_by_times(df: DataFrame, start: String, end: String): DataFrame

    Return a DataFrame containing only flows starting at or after the passed start time and before or at the passed end time. Empty strings can be passed for the start or end times to avoid filtering on that field.
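
    A hedged example, assuming timestamp strings in the same yyyy-mm-dd hh:mm:ss form used by stime_range:

    Example:
    1. filter_by_times(df, "2018-01-01 00:00:00", "2018-01-02 00:00:00")
       filter_by_times(df, "", "2018-01-02 00:00:00") // empty start string: no lower bound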

  42. def filter_field_list(df: DataFrame, field: String, list: Seq[Any]): DataFrame

    This function filters and returns a DataFrame based on whether the value or values in the given field of the DataFrame are contained in the given list.

  43. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  44. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  45. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  46. def ip_is_external(a: Any): Column

    True if the input is a string representing an IP address not in the internal set. False if it is an IP address that is in the internal set. NULL if the value cannot be parsed as an IP address.

  47. def ip_is_internal(a: Any): Column

    True if the input is a string representing an IP address in the internal set. False if it is an IP address that is not in the internal set. NULL if the value cannot be parsed as an IP address.
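
    A usage sketch (the column name sourceIPv4Address is an assumption from common IPFIX schemas, and since the argument type is Any, passing a Column is assumed to be supported):

    Example:
    1. import org.apache.spark.sql.functions.col
       df.withColumn("srcInternal", ip_is_internal(col("sourceIPv4Address")))
       df.filter(ip_is_external(col("sourceIPv4Address")))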

  48. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  49. def load_csv_file(filepath: String): DataFrame

    Import data from a CSV file and return a DataFrame containing the imported data.

  50. def load_hive_table(tablename: String): DataFrame

    Import data from a Hive table and return a DataFrame containing the imported data.

  51. def load_ipset(infile: String): DataFrame

    Load a SiLK IPset file into a DataFrame of individual IP addresses.

  52. def load_ipset_blocks(infile: String): DataFrame

    Load a SiLK IPset file into a DataFrame of IP address blocks.

  53. def make_timestamp(t: Timestamp): Timestamp
  54. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  55. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  56. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  57. def organization_labels(labels: String*): Column

    Given a set of organization labels, returns a filter expression that matches records from any of those organizations.

    Example:
    1. df.filter(organization_labels("ORG1", "ORG2"))
  58. def save_csv_file(df: DataFrame, outfile: String): Unit

    Write data in an existing DataFrame to a CSV file. The output CSV file will have headers corresponding to the column names of the DataFrame.

  59. def save_hive_table(df: DataFrame, tablename: String): Unit

    Write data in an existing DataFrame to a Hive table.

  60. def save_ipset(outfile: String, df: DataFrame): Unit

    Save a DataFrame of IP addresses or blocks as a SiLK IPset file.
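
    A round-trip sketch for the IPset helpers (file paths are illustrative only):

    Example:
    1. val ips = load_ipset("internal.set")           // one IP address per row
       val blocks = load_ipset_blocks("internal.set") // address blocks instead of single addresses
       save_ipset("internal-copy.set", ips)           // note the (outfile, df) argument order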

  61. def sip_dip_direction(sip: Any, dip: Any): Column

    If both arguments are strings parsable as IP addresses, and if the first argument (sip) represents the initiator of a connection and the second argument (dip) represents the recipient of a connection, returns:

    "in" if the initiator is external and the recipient is internal
    "out" if the initiator is internal and the recipient is external
    "int2int" if both are internal
    "ext2ext" if both are external
    NULL if either value is not parsable as an IP address

  62. val ssl_rewrite_col: UserDefinedFunction
  63. def ssl_rewrite_df(df: DataFrame): DataFrame
  64. def stime_date(date: String): Column

    Given a string in the form of yyyy-mm-dd representing a date, returns a filter expression that can be used to filter a DataFrame for records with a start time on that date.

    Example:
    1. df.filter(stime_date("2018-01-01"))
  65. def stime_range(begin: String = null, end: String = null): Column

    Given string arguments representing the begin and end of a time range, returns a filter expression that can be used to filter a DataFrame for records with a start time within that range.

    Example:
    1. df.filter(stime_range("2018-01-01 12:00:00", "2018-01-01 13:00:00"))
  66. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  67. val tls_cipher_descs: Column
  68. val tls_cipher_descs_udf: UserDefinedFunction
  69. def toString(): String
    Definition Classes
    AnyRef → Any
  70. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  71. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  72. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  73. def withSensorIds[T](input: Dataset[T], sensorIds: Seq[Int]): Dataset[T]
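
    A hedged sketch, assuming from the name alone that this restricts a Dataset to records from the given sensor IDs (an assumption; the method is otherwise undocumented here):

    Example:
    1. withSensorIds(df, Seq(1, 2, 3))
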
  74. val yaf_ssl_object_type: UserDefinedFunction

Deprecated Value Members

  1. def make_timestamp(s: String): Timestamp
    Annotations
    @deprecated
    Deprecated

    (Since version org.cert.netsa.mothra.analysis 1.3.2) this method will be removed -- please use Timestamp.valueOf(s)
