object util
A collection of analysis helper functions for working with network data in Spark.
Value Members
- val APPLICATION_LABEL_FIELD: String
- val DNS_RESOURCE_RECORD_TYPE_FIELD: String
- lazy val MOTHRA_COLLECTORS_DF: Option[DataFrame]
- val ORGANIZATION_LABEL_FIELD: String
- val TLS_CIPHER_SUITE_FIELD: String
- val app_label_name: Column
- def app_labels(labels: Any*): Column
- def append_collection_labels(df: DataFrame, attributes: String*): DataFrame
Append one or more collection attribute labels to a dataframe based on the observationDomainId and vlanId fields in the data.
- df: input dataframe
Example: append_collection_labels(df, "enclave", "department", "organization")
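A fuller sketch of plausible usage (a sketch, assuming this object's members are in scope and that df is a flow DataFrame carrying the observationDomainId and vlanId fields; the attribute names follow the example above):
  // Label each flow with its enclave and department, then count flows per enclave.
  val labeled = append_collection_labels(df, "enclave", "department")
  labeled.groupBy("enclave").count().show()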
- def collect_column(df: DataFrame, c: Column): Array[Any]
An array containing the values of a column expression of a dataframe.
- def collect_column(df: DataFrame): Array[Any]
An array containing the values from the first column of a dataframe.
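For example (a sketch; sourceIPAddress is a hypothetical stand-in for a real field in df):
  import org.apache.spark.sql.functions.col
  // Values of a column expression of df.
  val ips: Array[Any] = collect_column(df, col("sourceIPAddress"))
  // Values from the first column of a one-column dataframe.
  val same: Array[Any] = collect_column(df.select("sourceIPAddress"))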
- def collection_labels(attribute: String, labels: String*): Column
Given the name of a collection attribute and a set of matching labels, produce a filter expression that can be used to filter a DataFrame for flows matching the given labels.
Example: df.filter(collection_labels("enclave", "IT1A", "DEV2B")) will find flows in either the IT1A or DEV2B enclaves.
- def collector_labels(labels: String*): Column
Given a set of collector labels, returns a set of filter expressions for filtering records from those collectors.
Example: df.filter(collector_labels("IT1", "DEV2"))
- def compare_field(field: String, condition: String, value: Any): Column
Convenience method for building filter conditions using comparison operators.
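For example (a sketch; the field name and the exact set of supported condition strings are assumptions, since they are not documented here):
  // Hypothetical: keep only TCP flows (IANA protocol number 6).
  df.filter(compare_field("protocolIdentifier", "==", 6))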
- def count_by_collection_labels(df: DataFrame, attributes: String*): DataFrame
Group data by one or more collection attributes and count the number of flows within each group.
- df: input dataframe
Example: count_by_collection_labels(df, "enclave", "department")
- def dateRange(from: LocalDate, to: LocalDate, step: Int = 1): Iterator[LocalDate]
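For example (a sketch; whether the to endpoint is inclusive is not stated here):
  import java.time.LocalDate
  // Walk the days of January 2018 one day at a time.
  dateRange(LocalDate.of(2018, 1, 1), LocalDate.of(2018, 1, 31)).foreach(println)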
- def department_labels(labels: String*): Column
Given a set of department labels, returns a set of filter expressions for filtering records from those departments.
Example: df.filter(department_labels("IT", "DEV"))
- val dns_rr_types: Column
- val dns_rr_types_udf: UserDefinedFunction
- val email_addr: UserDefinedFunction
- val email_addrs: UserDefinedFunction
- val email_display: UserDefinedFunction
- val email_displays: UserDefinedFunction
- val email_domain: UserDefinedFunction
- val email_domains: UserDefinedFunction
- val email_header_addrs: UserDefinedFunction
- val email_header_displays: UserDefinedFunction
- val email_header_domains: UserDefinedFunction
- val email_header_mailboxes: UserDefinedFunction
- val email_mailbox: UserDefinedFunction
- val email_mailboxes: UserDefinedFunction
- def enclave_labels(labels: String*): Column
Given a set of enclave labels, returns a set of filter expressions for filtering records from those enclaves.
Example: df.filter(enclave_labels("IT1A", "DEV2B"))
- def filter_array_field_list(df: DataFrame, field: String, list: Seq[Any]): DataFrame
- def filter_by_field(df: DataFrame, field: String, condition: String, value: Any): DataFrame
Filter a DataFrame on the passed field, value and condition.
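For example (a sketch; the field name and condition string are assumptions):
  // Hypothetical: keep only UDP flows (IANA protocol number 17).
  val udpOnly = filter_by_field(df, "protocolIdentifier", "==", 17)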
- def filter_by_times(df: DataFrame, start: String, end: String): DataFrame
Return a DataFrame containing only flows starting at or after the passed start time and before or at the passed end time. Empty strings can be passed for the start or end times to avoid filtering on that field.
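For example (a sketch; the expected timestamp format is an assumption based on the stime_range example below):
  // Flows in a one-hour window.
  val window = filter_by_times(df, "2018-01-01 12:00:00", "2018-01-01 13:00:00")
  // Empty end string: no upper bound on start time.
  val openEnded = filter_by_times(df, "2018-01-01 12:00:00", "")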
- def filter_field_list(df: DataFrame, field: String, list: Seq[Any]): DataFrame
Filter a DataFrame, keeping only rows where the value or values in the given field are contained in the given list.
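For example (a sketch; the field name is hypothetical):
  // Keep only flows to common web ports.
  val webFlows = filter_field_list(df, "destinationTransportPort", Seq(80, 443, 8080))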
- def ip_is_external(a: Any): Column
True if the input is a string representing an IP address not in the internal set. False if it is an IP address that is in the internal set. NULL if the value cannot be parsed as an IP address.
- def ip_is_internal(a: Any): Column
True if the input is a string representing an IP address in the internal set. False if it is an IP address that is not in the internal set. NULL if the value cannot be parsed as an IP address.
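For example (a sketch; how the internal set is configured is not described here, and the column names are hypothetical):
  import org.apache.spark.sql.functions.col
  // Flows arriving at internal addresses vs. leaving for external ones.
  val toInternal = df.filter(ip_is_internal(col("destinationIPAddress")))
  val toExternal = df.filter(ip_is_external(col("destinationIPAddress")))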
- def load_csv_file(filepath: String): DataFrame
Import data from a CSV file and return a DataFrame containing the imported data.
- def load_hive_table(tablename: String): DataFrame
Import data from a Hive table and return a DataFrame containing the imported data.
- def load_ipset(infile: String): DataFrame
Load a SiLK IPset file into a DataFrame of individual IP addresses.
- def load_ipset_blocks(infile: String): DataFrame
Load a SiLK IPset file into a DataFrame of IP address blocks.
- def make_timestamp(t: Timestamp): Timestamp
- def organization_labels(labels: String*): Column
Given a set of organization labels, returns a set of filter expressions for filtering records from those organizations.
Example: df.filter(organization_labels("ORG1", "ORG2"))
- def save_csv_file(df: DataFrame, outfile: String): Unit
Write data in an existing DataFrame to a CSV file. The output CSV file will have headers corresponding to the column names of the DataFrame.
- def save_hive_table(df: DataFrame, tablename: String): Unit
Write data in an existing DataFrame to a Hive table.
- def save_ipset(outfile: String, df: DataFrame): Unit
Save a DataFrame of IP addresses or blocks as a SiLK IPset file.
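A sketch tying the load and save helpers together (all file paths and table names are hypothetical):
  // SiLK IPset -> DataFrame of individual addresses, saved as CSV with headers.
  val ips = load_ipset("internal.set")
  save_csv_file(ips, "internal_ips.csv")
  // Round-trip through Hive.
  val flows = load_hive_table("flows_2018")
  save_hive_table(flows.limit(1000), "flows_2018_sample")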
- def sip_dip_direction(sip: Any, dip: Any): Column
If both arguments are strings parsable as IP addresses, and if the first argument (sip) represents the initiator of a connection and the second argument (dip) represents the recipient of a connection, returns:
- "in" if the initiator is external and the recipient is internal
- "out" if the initiator is internal and the recipient is external
- "int2int" if both are internal
- "ext2ext" if both are external
- NULL if either value is not parsable as an IP address
- val ssl_rewrite_col: UserDefinedFunction
- def ssl_rewrite_df(df: DataFrame): DataFrame
- def stime_date(date: String): Column
Given a string in the form of yyyy-mm-dd representing a date, returns a filter expression that can be used to filter a DataFrame for records with a start time on that date.
Example: df.filter(stime_date("2018-01-01"))
- def stime_range(begin: String = null, end: String = null): Column
Given string arguments representing the begin and end of a time range, returns a filter expression that can be used to filter a DataFrame for records with a start time within that range.
Example: df.filter(stime_range("2018-01-01 12:00:00", "2018-01-01 13:00:00"))
- val tls_cipher_descs: Column
- val tls_cipher_descs_udf: UserDefinedFunction
- def withSensorIds[T](input: Dataset[T], sensorIds: Seq[Int]): Dataset[T]
- val yaf_ssl_object_type: UserDefinedFunction
This is documentation for Mothra, a collection of Scala and Spark library functions for working with Internet-related data. Some modules contain APIs of general use to Scala programmers. Some modules make those tools more useful on Spark data-processing systems.
Please see the documentation for the individual packages for more details on their use.
Scala Packages
These packages are useful in Scala code without involving Spark:
org.cert.netsa.data
This package, which is collected as the netsa-data library, provides types for working with various kinds of information:
- org.cert.netsa.data.net - types for working with network data
- org.cert.netsa.data.time - types for working with time data
- org.cert.netsa.data.unsigned - types for working with unsigned integral values
org.cert.netsa.io.ipfix
The netsa-io-ipfix library provides tools for reading and writing IETF IPFIX data from various connections and files.
org.cert.netsa.io.silk
To read and write CERT NetSA SiLK file formats and configuration files, use the netsa-io-silk library.
org.cert.netsa.util
The "junk drawer" of netsa-util so far provides only two features: first, a method for equipping Scala scala.collection.Iterators with exception handling; and second, a way to query the versions of NetSA libraries present in a JVM at runtime.
Spark Packages
These packages require the use of Apache Spark:
org.cert.netsa.mothra.datasources
Spark datasources for CERT file types. This package contains utility features which add methods to Apache Spark DataFrameReader objects, allowing IPFIX and SiLK flows to be opened using simple spark.read... calls. The mothra-datasources library contains both IPFIX and SiLK functionality, while mothra-datasources-ipfix and mothra-datasources-silk contain only what's needed for the named datasource.
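A minimal sketch of what such a call might look like (an assumption based on the description above rather than a confirmed API; the import, reader method name, and path are illustrative):
  import org.cert.netsa.mothra.datasources._
  // Hypothetical: read IPFIX records into a DataFrame via the added reader method.
  val flows = spark.read.ipfix("hdfs:///data/ipfix/2018/01/01")
  flows.printSchema()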
org.cert.netsa.mothra.analysis
A grab-bag of analysis helper functions and example analyses.
org.cert.netsa.mothra.functions
This single Scala object provides Spark SQL functions for working with network data. It is the entirety of the mothra-functions library.