This handbook describes how to build the Analysis Pipeline, incorporate it into your existing SiLK packing infrastructure, and configure it to detect various types of traffic. The html contains the entire handbook, notably section 3 on the installation and configuration of Pipeline. This html page isn't perfectly formatted, but is in lieu of the incomplete documentation currently under construction below.
The manual page for pipeline, the Analysis Pipeline application.

Pipeline Overview

In order to support inspection of every SiLK flow record as the records are created, the NetSA group developed the Analysis Pipeline. The Analysis Pipeline supports many analyses, including:

Although the Analysis Pipeline application, pipeline, can simply be run from the command line, it is intended to be run as a daemon as part of the collection and packing process where it processes every SiLK flow record created by rwflowpack, just as the flow records are entering the SiLK data repository. (For information on installing pipeline, see section 3.)

There are three stages to the Analysis Pipeline:

  • Each incoming flow record is tested against each of the filters that the user has defined. These filters are similar to the rwfilter command line. The flow records that pass each filter are handed to each association interested in those particular flow records.
  • In the second stage, evaluations and statistics process the records: Evaluations compare internal state to a user defined threshold. Statistics compute state values and then export that state based on a user-defined interval of time.
  • The alerting stage checks the evaluations and statistics to see if there are any alerts to be sent. This alerting stage also checks with named lists that are configured to periodically send their entire contents as alerts.

To assist in entering data and sharing data among multiple filters, the Analysis Pipeline allows the administrator to create a list. A list can reference an existing SiLK IPset file, contain values entered directly into the configuration file, or be created by a mechanism inside pipeline itself.

Filters, evaluations, statistics, and lists are all built independently of each other, with each having a unique name. They are linked together using configuration keywords and their unique names.

Any number of evaluations and statistics can receive the records held by each filter. However, each evaluation or statistic can have only one filter providing flow records to it.

An additional concept in the Analysis Pipeline is an internal filter. Internal filters can be used to store intermediate successes based on flow records and filters. These "successes" are not major enough to yield individual alerts, but can shape future processing. Internal filters are used when the analysis is more complex than simply passing a filter and transferring data to evaluations, and they allow for multistage analysis: Flow record A met criteria, and future flow records will be combined with record A for more in-depth analysis.

Fields and Field Lists

All fields in a SiLK flow record can be used to filter data, along with some derived fields. Currently pipeline only supports IPv4 addresses. The field names in bold below are the proper syntax for referencing the fields in the configuration file

Fields in the list below can be combined into tuples, e.g. {SIP, DIP}, for more advanced analysis. These tuples are represented in the configuration file by listing the fields with spaces between them. When processed, they are sorted internally, so SIP DIP SPORT is the same as SPORT DIP SIP.

IP addresses and ports have directionality, source and destination. The keyword ANY can be used to indicate that the direction does not matter, and both values are to be tried (this can only be used when filtering). The ANY_* fields can go anywhere inside the field list. The only restrictions are that the ANY must immediately precede IP, PORT, IP_PAIR, or PORT_PAIR, and that there can be only one ANY in a field list. The available fields are:

ANY_IP Either the source address or the destination address
ANY_IP_PAIR Either the {SIP, DIP} tuple or the {DIP, SIP} tuple
ANY_PORT Either the source port or the destination port
ANY_PORT_PAIR Either the {SPORT, DPORT} tuple or the {DPORT, SPORT} tuple
APPLICATION The service port of the record as set by the flow generator if the generator supports it, or 0 otherwise. For example, this would be 80 if the flow generator recognizes the packets as being part of an HTTP session
ATTRIBUTES Any combination of the letters F, T, or C, where:
  • F indicates the flow generator saw additional packets in this flow following a packet with a FIN flag (excluding ACK packets)
  • T indicates the flow generator prematurely created a record for a long-running connection due to a timeout.
  • C indicates the flow generator created this flow as a continuation of long-running connection, where the previous flow for this connection met a timeout
BYTES The count of the number of bytes in the flow record
BYTES PER PACKET An integer division of the bytes field and the packets field. It is a 32-bit number. The value is 0 if there are no packets
CLASSNAME The class name assigned to the record. Classes are defined in the silk.conf file
DIP The destination IP address
DPORT The destination port
DURATION The duration of the flow record, in integer seconds. This is the difference between ETIME and STIME
ETIME The wall clock time when the flow generator closed the flow record
FLAGS The union of the TCP flags on every packet that comprises the flow record. The value can contain any of the letters F, S, R, P, A, U, E, and C. (To match records with either ACK or SYN|ACK set, use the IN_LIST operator.) The flags formatting used by SiLK can also be used to specify a set of flags values. For example, S/SA means to consider only SYN and ACK, and to match when, of those two, only SYN is set. The original way Pipeline accepted flags values, a raw specification of the flag combination, is still allowed.
FLOW RECORD This field references the entire flow record, and can only be used when checking the flow record against multiple filters using IN LIST (see below)
ICMPCODE The ICMP code. This test also adds a comparison that the protocol is 1.
ICMPTYPE The ICMP type. This test also adds a comparison that the protocol is 1.
INITFLAGS The TCP flags on the first packet of the flow record. See FLAGS.
INPUT The SNMP interface where the flow record entered the router. This is often 0 as SiLK does not normally store this value.
NHIP The next-hop IP of the flow record as set by the router. This is often 0.0.0.0 as SiLK does not normally store this value.
OUTPUT The SNMP interface where the flow record exited the router. This is often 0 as SiLK does not normally store this value.
PACKETS The count of the number of packets.
PMAP See pmap section for details
PROTOCOL The IP protocol. This is an integer, e.g. 6 is TCP
SENSOR The sensor name assigned to the record. Sensors are defined in the silk.conf file.
SESSIONFLAGS The union of the TCP flags on the second through final packets that comprise the flow record. See FLAGS
SIP The source IP address
SPORT The source port
STIME The wall clock time when the flow generator opened the flow record
TYPENAME The type name assigned to the record. Types are defined in the silk.conf file.
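
The S/SA notation described under FLAGS can be illustrated with a small Python sketch. It is not pipeline's parser: the function names are hypothetical, the bit values are the standard TCP header bits, and treating a specification without a mask as an exact match is an assumption made here.

```python
# Bit values follow the TCP header; the letters are those listed for FLAGS.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def parse_flags(letters: str) -> int:
    """Turn a string such as 'SA' into a bitmask."""
    value = 0
    for letter in letters.upper():
        value |= FLAG_BITS[letter]
    return value


def flags_match(record_flags: str, spec: str) -> bool:
    """Check a record's flags against a 'high/mask' spec such as 'S/SA'."""
    high, _, mask = spec.partition("/")
    high_bits = parse_flags(high)
    # Assumption: without a mask, every bit is examined (exact match).
    mask_bits = parse_flags(mask) if mask else 0xFF
    # Only the bits named in the mask are compared against the high set.
    return (parse_flags(record_flags) & mask_bits) == high_bits


print(flags_match("S", "S/SA"))   # SYN set, ACK clear
print(flags_match("SA", "S/SA"))  # ACK is also set
print(flags_match("SP", "S/SA"))  # PSH is outside the mask and is ignored
```

The first and third calls match because, among SYN and ACK, only SYN is set; the second does not because ACK is set as well.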

Pmap Support

Prefix Maps (pmaps) are part of the SiLK tool suite and can be made using rwpmapbuild. Their output can be used just like any other field in pipeline. It can make up part of a tuple, be used in FOREACH, and be used in filtering. One caveat about pmaps being used to make up a tuple in field lists, is that the pmap must be listed first in the list for proper parsing. However, when referencing pmap values in a typeable tuple, it must go at the end. PMAPs take either an IP address, or a PROTOCOL PORT pair as inputs.

Using a PMAP in Pipeline is a two stage process in the configuration file. The first step is to declare the pmap. This links a user-defined field name to a pmap file, with the name in quotes. This field name will be used in alerts to reference the field, and in the rest of the configuration file to reference the pmap.

The declaration line is not part of a FILTER or EVALUATION, so it is by itself, similar to the INCLUDE statements. The declaration line starts with the keyword PMAP, followed by a string for the name without spaces, and lastly, the filename in quotes.

PMAP userDefinedFieldName "pmapFilename"

Now that the PMAP is declared, the field name can be used throughout the file. Each time the field is used, the input to the pmap must be provided. This allows different inputs to be used throughout the file, without redeclaring the pmap.

userDefinedFieldName(inputFieldList)

For each type of pmap, there is a fixed list of inputFieldLists:

  • Pmaps for IP addresses:
    • SIP-Use the SIP as the key to the pmap
    • DIP-Use the DIP as the key to the pmap
    • ANY IP-Use the SIP from the record as the key, then use the DIP. This can be used with filtering to try both values in the comparison, and also in FOREACH to create a state bin for both results of the pmap.
  • Pmaps for Protocol Port pairs:
    • PROTOCOL SPORT-Use the PROTOCOL SPORT tuple as the key to the pmap
    • PROTOCOL DPORT-Use the PROTOCOL DPORT tuple as the key to the pmap
    • PROTOCOL ANY PORT-Use the PROTOCOL SPORT as the key, then use the PROTOCOL DPORT. This can be used with filtering to try both values in the comparison, and also in FOREACH to create a state bin for both results of the pmap.

Below is an example that declares a pmap, then filters based on the result of the pmap on the SIP, then counts records per pmap result on the DIP:

PMAP thePmapField "myPmapFile.pmap"

FILTER onPmap
    thePmapField(SIP) == theString
END FILTER

STATISTIC countRecords
    FILTER onPmap
    FOREACH thePmapField(DIP)
    RECORD COUNT
END STATISTIC
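
To make the data flow of this example concrete, here is a rough Python model of it. The prefix map contents, the label "internal" standing in for theString, and the longest-prefix lookup are all assumptions for illustration; a real pmap file is built with rwpmapbuild and read by pipeline itself.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix map: CIDR block -> label (stand-in for myPmapFile.pmap).
PMAP = {
    ipaddress.ip_network("0.0.0.0/0"): "external",
    ipaddress.ip_network("10.0.0.0/8"): "internal",
    ipaddress.ip_network("10.1.0.0/16"): "dmz",
}


def pmap_lookup(address: str) -> str:
    """Return the label of the most specific block containing the address."""
    ip = ipaddress.ip_address(address)
    best = max((net for net in PMAP if ip in net), key=lambda net: net.prefixlen)
    return PMAP[best]


flows = [
    {"SIP": "10.2.3.4", "DIP": "8.8.8.8"},
    {"SIP": "10.2.3.5", "DIP": "10.1.0.9"},
    {"SIP": "192.0.2.1", "DIP": "10.2.3.4"},
]

# FILTER onPmap: thePmapField(SIP) == internal
passed = [flow for flow in flows if pmap_lookup(flow["SIP"]) == "internal"]

# STATISTIC countRecords: FOREACH thePmapField(DIP), RECORD COUNT
counts = Counter(pmap_lookup(flow["DIP"]) for flow in passed)
print(dict(counts))
```

The same declared pmap is applied to two different inputs (SIP in the filter, DIP in the statistic), which is the point of supplying the input field each time the pmap field is used.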

Field Booleans

Field booleans are custom fields that consist of an existing field and a list of values. If the value for the field is in the value list, then the field boolean’s value is TRUE. These are defined similar to PMAPs, but use the keyword FIELD BOOLEAN. For example, to define a boolean named webPorts, to mean the source port is one of [80, 8080]:

    FIELD BOOLEAN sourceTransportPort webPorts IN [80, 8080]

Now, webPorts is a field that can be used anywhere in the configuration file that checks whether the sourceTransportPort is in [80, 8080].

If used in filtering, this is the same as saying: sourceTransportPort IN LIST [80, 8080].

However, if used as a part of FOREACH, the value TRUE or FALSE will be in the field list, to indicate whether the sourceTransportPort is 80 or 8080.

Another example could be a boolean to check whether the hour of the day, derived from a timestamp, is part of the work day. There could be a statistic constructed to report byte counts binned by whether the hour is in the workday, which is 8am to 5pm in this example.

    FIELD BOOLEAN HOUR_OF_DAY(flowStartSeconds) workday IN [8,9,10,11,12,13,14,15,16,17]
    STATISTIC workdayByteCounts
        FOREACH workday
        SUM octetTotalCount
    END STATISTIC

Timestamp Derived Fields

These derived fields pull out human readable values from timestamps. The values they pull are just integers, but in filters, pipeline can accept the words associated with those values, e.g. JANUARY is translated to 0, as is SUNDAY. These fields work with field types: DATETIME_SECONDS, DATETIME_MILLISECONDS, DATETIME_MICROSECONDS, DATETIME_NANOSECONDS. Each will be converted to the appropriate units for processing. The system’s timezone is used to calculate the HOUR value.

The field to be operated on is put in parentheses after the derived field name.

These fields can be used anywhere in a pipeline configuration file like any other field.

  • HOUR_OF_DAY(timestampFieldName) The integer value for the hour of the day, where midnight is 0 and 11pm is 23.
  • DAY_OF_WEEK(timestampFieldName) The integer value of the day of the week where SUNDAY is 0. The text names of the days in all capital letters are accepted by the configuration file parser as values for filtering.
  • DAY_OF_MONTH(timestampFieldName) The integer value of the day of the month, where the first day of the month is 1.
  • MONTH(timestampFieldName) The integer value of the month of the year where JANUARY is 0. The text names of the months in all capital letters are accepted by the configuration file parser as values for filtering.
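
The numbering conventions of these four derived fields can be sketched in Python as follows. The function name is hypothetical, and the sketch defaults to UTC so that its results are reproducible, whereas pipeline uses the system's timezone.

```python
from datetime import datetime, timezone


def derived_fields(epoch_seconds: int, tz=timezone.utc) -> dict:
    """Compute the four timestamp-derived values with the conventions above."""
    t = datetime.fromtimestamp(epoch_seconds, tz)
    return {
        "HOUR_OF_DAY": t.hour,                 # midnight is 0, 11pm is 23
        "DAY_OF_WEEK": (t.weekday() + 1) % 7,  # Python has Monday as 0; shift so SUNDAY is 0
        "DAY_OF_MONTH": t.day,                 # first day of the month is 1
        "MONTH": t.month - 1,                  # JANUARY is 0
    }


# The workday list [8, ..., 17] from the field boolean example.
WORKDAY_HOURS = set(range(8, 18))

fields = derived_fields(1704103200)  # 2024-01-01 10:00:00 UTC, a Monday
in_workday = fields["HOUR_OF_DAY"] in WORKDAY_HOURS
print(fields, in_workday)
```

The last two lines mirror what the workday field boolean does: the derived hour is computed first, and the boolean is simply membership in the value list.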

Other Derived Fields

The field to be operated on is put in parentheses after the derived field name.

These fields can be used anywhere in a pipeline configuration file like any other field.

  • FLOW_KEY_HASH A 32-bit integer that is the flow key hash from the flow record. No fields need to be specified as it is a standard calculation. Using this as a filter can be helpful in batch mode when trying to isolate a particular flow. The value(s) to filter with can be formatted in hexadecimal or decimal.

Filters

The Analysis Pipeline passes each flow record through each filter to determine whether the record should be passed on to an evaluation or statistic. There can be any number of filters, and each runs independently. As a result, each filter sees every flow record, and keeps its own list of flows that meet its criteria.

A filter block starts with the FILTER keyword followed by the name of the filter, and it ends with the END FILTER statement. The filter name must be unique across all filters. The filter name is referenced by evaluations, internal filters, and statistics.

Filters are initially marked internally as inactive, and become active when an evaluation or statistic references them.

Filters are composed of comparisons. In the filter block, each comparison appears on a line by itself. If all comparisons in a filter return a match or success, the flow record is sent to the evaluation(s) and/or statistic(s) that use the records from that filter.

If there are no comparisons in a filter, the filter reports success for every record.

Each comparison is made up of three elements: a field, an operator, and a compare value, for example BYTES > 40. A comparison is considered a match for a record if the expression created by replacing the field name with the field's value is true.
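
The matching rule described above (every comparison must succeed, and a filter with no comparisons passes every record) can be modeled in a few lines of Python. This is an illustrative sketch, not pipeline's implementation; the record layout and the passes helper are assumptions.

```python
import operator

# The six relational operators; IN_LIST and NOT_IN_LIST are left out of this sketch.
OPERATORS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def passes(record: dict, comparisons: list) -> bool:
    """A record passes only if every (field, operator, value) comparison is true."""
    return all(OPERATORS[op](record[field], value)
               for field, op, value in comparisons)


record = {"PROTOCOL": 6, "DPORT": 80, "BYTES": 1200}

big_web = [("PROTOCOL", "==", 6), ("DPORT", "==", 80), ("BYTES", ">", 40)]
huge_web = [("PROTOCOL", "==", 6), ("BYTES", ">", 5000)]

print(passes(record, big_web))   # all three comparisons hold
print(passes(record, huge_web))  # the BYTES comparison fails
print(passes(record, []))        # an empty filter passes every record
```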

Eight operators are supported. The operator determines the form that the compare value takes.

  • IN_LIST - Used to test whether a record's field is included in the given list. The compare value can be a list that was previously defined by an evaluation or internal filter, an IPSet filename, or defined in-line:
    • The name of a list that is filled by the outputs of an evaluation, or an internal filter. This is the only place in pipeline filters where tuples can be used. The tuple in the filter must entirely match the tuple used to fill the list.

      SIP DIP PROTO SPORT DPORT IN LIST createdListOfFiveTuples

    • The filename of the IPset file is given in quotation marks as the compare value. When pipeline is running as a daemon, the full path to the IPset file must be used. This can only be used with IP addresses.

      SIP IN LIST "/data/myIPSet.set"

    • The contents of the list can be entered directly into the configuration file. The elements are comma-separated, surrounded by square brackets, [ and ]. As an example, the following matches FTP, HTTP, and SSH traffic in the filter:

      DPORT IN_LIST [21, 22, 80]

    • Pipeline can take a number of formats for files with lists of values. The filename must be in double quotes.

      fieldList IN LIST "/path/to/watchlist.file"

      If the fieldList consists of one field and if it is of type IPV4_ADDRESS or IPV6_ADDRESS, the file MUST be a SiLK IPSet. A fieldList of just an IP cannot be any of the types described below.

      A file can be used to house both types of bracketed lists described above, both the single and double bracketed lists. This has to be formatted exactly as if it was typed directly into the config file. The format is such that a user should be able to copy and paste the contents of files in this format into the config file and vice versa. The single line (there cannot be any newline characters in the list) of the bracketed list must have a newline at the end.

      If the fieldList consists of a single field, a simple watchlist file can be used to hold the values. This format requires one value per line. The format of each value type is the same as if it was typed into the configuration file. Comments can be used in the file by setting the first character of the line to "#". The value in the field being compared against the watchlist must be an exact match to an entry in the file for the comparison to be true.

    • If there is a single field in the fieldList, and if that is an IP address, this bracketed list can contain IPSet files mixed with IP addresses that will all be combined for the filter:

      SIP IN LIST ["/data/firstIPset.set", 192.168.0.0/16, "/data/secondIPset.set"]

    • Bracketed lists can also be used to enter tuples of information directly into a filter. This is done using nested bracketed lists. One caveat is that this is the one case where the ordering of the fields in the field list matters. The fields must follow this ordering scheme: SIP,DIP,PROTOCOL,SPORT,DPORT,STIME,DURATION,TYPENAME,CLASSNAME, SENSOR,ENDTIME,INITFLAGS,RESTFLAGS,TCPFLAGS,TCP_STATE, APPLICATION,INPUT,OUTPUT,PACKETS,BYTES,NHIP,ICMPTYPE,ICMPCODE, ANY IP,ANY PORT,BYTES PER PACKET.

      An example is filtering for sip 1.1.1.1 with sport 80, and 2.2.2.2 with sport 443:
      FILTER sipSportPair
          SIP SPORT IN LIST [[1.1.1.1,80], [2.2.2.2,443]]
      END FILTER

    • The only way to use a logical OR with filters is to create a full filter describing each set of conditions you'd like to OR together. For such a filter, the field is FLOW RECORD.

      For example, to do TCP sport 80 OR UDP dport 23:
      FILTER tcp80
          SPORT == 80
          PROTOCOL == 6
      END FILTER
      FILTER udp23
          DPORT == 23
          PROTOCOL == 17
      END FILTER
      FILTER filterUsingTcp80OrUdp23
          FLOW RECORD IN LIST [tcp80,udp23]
      END FILTER

  • NOT_IN_LIST Same as IN_LIST, but succeeds if the value is not in the list.
  • == Succeeds when the value from the record is equal to the compare value. This also encompasses IPv4 subnets. For example, the following will succeed if either the source or destination IP address is in the 192.168.x.x subnet:

    ANY_IP == 192.168.0.0/16

  • != Succeeds when the value from the record is not equal to the compare value.
  • < Succeeds when the value from the record is strictly less than the compare value.
  • <= Succeeds when the value from the record is less than or equal to the compare value.
  • > Succeeds when the value from the record is strictly greater than the compare value.
  • >= Succeeds when the value from the record is greater than or equal to the compare value.

The compare value can reference another field on the flow record. For example, to check whether the source and destination port are the same, use: SPORT == DPORT
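
Two of the less obvious cases in the operator list, the ANY_IP subnet comparison and the logical OR built from FLOW RECORD IN LIST, can be sketched in Python as below. The helper names are hypothetical and the sketch only models the matching logic described above.

```python
import ipaddress


def any_ip_in(record: dict, cidr: str) -> bool:
    """ANY_IP == cidr: succeeds if either SIP or DIP falls inside the subnet."""
    network = ipaddress.ip_network(cidr)
    return any(ipaddress.ip_address(record[field]) in network
               for field in ("SIP", "DIP"))


def tcp80(record: dict) -> bool:
    """FILTER tcp80: both comparisons must hold."""
    return record["SPORT"] == 80 and record["PROTOCOL"] == 6


def udp23(record: dict) -> bool:
    """FILTER udp23: both comparisons must hold."""
    return record["DPORT"] == 23 and record["PROTOCOL"] == 17


def tcp80_or_udp23(record: dict) -> bool:
    """FLOW RECORD IN LIST [tcp80, udp23]: passing any listed filter is enough."""
    return any(flt(record) for flt in (tcp80, udp23))


web_reply = {"SIP": "192.168.4.2", "DIP": "198.51.100.9",
             "SPORT": 80, "DPORT": 51000, "PROTOCOL": 6}
telnet_udp = {"SIP": "198.51.100.9", "DIP": "203.0.113.5",
              "SPORT": 40000, "DPORT": 23, "PROTOCOL": 17}

print(any_ip_in(web_reply, "192.168.0.0/16"), tcp80_or_udp23(web_reply))
print(any_ip_in(telnet_udp, "192.168.0.0/16"), tcp80_or_udp23(telnet_udp))
```

Within each filter the comparisons are ANDed; the OR only appears at the level of whole filters, which is why a separate filter is needed for each alternative.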

Internal Filters

There are two places where named lists can be created and populated so they can be used by filters: Internal Filters and Output Lists (which are discussed in evaluation specifics).

In each case, a field list is used to store the tuple that describes the contents of the data in the list. A filter can use these lists if the tuple used in the filters perfectly matches the tuple used to make the list.

Internal Filter Description

An internal filter compares the incoming flow record against an existing filter, and if it passes, it takes some subset of fields from that record and places them into a named list. This list can be used in other filters. There can be any number of these lists.

Internal filters are different from output lists, because they put data into the list(s) immediately, so the contents of the list(s) can be used for records in the same flow file as the one that causes data to be put into the list(s). Output lists, populated by evaluations, are only filled, and thus take effect, for the subsequent flow files.

Internal filters are immediate reactions to encountering a notable flow record.

The fields to be pulled from the record and put into the list can be combined into any tuple. These include the ANY fields, and the output of Pmaps. The "WEB_REDIR" fields cannot be used here. Details on how to create an internal filter for specific use with the WEB_REDIRECTION or HIGH_PORT_CHECK primitives are discussed below.

Internal Filter Syntax

An internal filter is a combination of filters and lists, so both pieces need to be specified in the syntax. A key aspect of the internal filter declaration is to tell it which fields pulled from records that pass the filter, get put into which list. There can be more than one field-list combination per internal filter.

It is recommended that a timeout value be added to each statement which declares the length of time a value can be considered valid, but it is no longer required. To build a list from an internal filter without a timeout, leave the timeout portion of the configuration file blank.

Syntax

INTERNAL_FILTER name of this internal filter
    FILTER name of filter to use
    fieldList list name timeout
END INTERNAL FILTER

Example, given an existing filter to find records to or from watchlist

INTERNAL_FILTER watchlistInfo
    FILTER watchlistRecords
    SPORT DPORT watchlistPorts 1 HOUR
    SIP DIP SPORT DPORT PROTOCOL watchlistFiveTuples 1 DAY
END INTERNAL_FILTER

This internal filter pulls {SPORT,DPORT} tuples from flows that pass the filter watchlistRecords, and puts them into a list called watchlistPorts, and those values stay in the list for 1 hour. It also pulls the entire five tuple from those records and puts them into a list called watchlistFiveTuples, where they stay for 1 DAY.
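
The behavior of this example can be approximated in Python as follows. The TimedList class, the watchlist contents, and the choice to restart an entry's clock when it is inserted again are assumptions of this sketch, not documented pipeline behavior.

```python
class TimedList:
    """Named list whose entries expire a fixed number of seconds after insertion."""

    def __init__(self, fields, timeout_seconds):
        self.fields = fields
        self.timeout = timeout_seconds
        self.entries = {}  # tuple of field values -> time of last insertion

    def add(self, record, now):
        key = tuple(record[name] for name in self.fields)
        self.entries[key] = now  # assumption: re-insertion restarts the clock

    def contains(self, key, now):
        added = self.entries.get(key)
        return added is not None and now - added < self.timeout


WATCHLIST = {"203.0.113.7"}  # hypothetical watchlist address


def watchlist_records(record):
    """Stand-in for the existing filter watchlistRecords."""
    return record["SIP"] in WATCHLIST or record["DIP"] in WATCHLIST


watchlist_ports = TimedList(("SPORT", "DPORT"), 3600)
watchlist_five_tuples = TimedList(("SIP", "DIP", "SPORT", "DPORT", "PROTOCOL"), 86400)


def internal_filter(record, now):
    """Records passing the filter feed both lists immediately."""
    if watchlist_records(record):
        watchlist_ports.add(record, now)
        watchlist_five_tuples.add(record, now)


record = {"SIP": "203.0.113.7", "DIP": "10.0.0.5",
          "SPORT": 4444, "DPORT": 80, "PROTOCOL": 6}
internal_filter(record, now=0)

print(watchlist_ports.contains((4444, 80), now=1800))  # still within 1 hour
print(watchlist_ports.contains((4444, 80), now=7200))  # 1 hour timeout has passed
```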

WEB_REDIRECTION and HIGH_PORT_CHECK require the use of internal filters as they scan for flow records to compare against that can be in the same flow file. The field list for each of these lists is a keyword that, in addition to indicating the fields to be stored, tells pipeline how to store them. The keywords are WEB_REDIR_LIST and HIGH_PORT_LIST respectively.

Arithmetic Primitives

Available operators to compare state values with thresholds include: <, <=, >, >=, and !=.

RECORD COUNT: Count of flows seen by primitive.

    CHECK THRESHOLD
        RECORD COUNT oper threshold
    END CHECK

SUM: Sum of the values for the given field.

    CHECK THRESHOLD
        SUM field oper threshold
    END CHECK

AVERAGE: Average of the values for the given field.

    CHECK THRESHOLD
        AVERAGE field oper threshold
    END CHECK

DISTINCT: Count of distinct values for the given field. Field can be a field list to count distinct tuples.

    CHECK THRESHOLD
        DISTINCT field oper threshold
    END CHECK

PROPORTION: Proportion of flows seen with the given value seen for the given field.

    CHECK THRESHOLD
        PROPORTION field value oper threshold
    END CHECK
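
What each arithmetic primitive computes, and how its state is compared with a threshold, can be sketched in Python. The function names and sample flows are hypothetical, and expressing the proportion as a fraction between 0 and 1 is an assumption of this sketch.

```python
import operator

# The threshold operators listed at the start of this section.
OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
             ">=": operator.ge, "!=": operator.ne}


def record_count(flows):
    return len(flows)


def total(flows, field):
    return sum(flow[field] for flow in flows)


def average(flows, field):
    return total(flows, field) / len(flows) if flows else 0.0


def distinct(flows, *fields):
    """Count distinct values, or distinct tuples when several fields are given."""
    return len({tuple(flow[name] for name in fields) for flow in flows})


def proportion(flows, field, value):
    """Fraction of flows whose field equals value."""
    if not flows:
        return 0.0
    return sum(1 for flow in flows if flow[field] == value) / len(flows)


def check(state, oper, threshold):
    """Compare a primitive's aggregate state with the user-defined threshold."""
    return OPERATORS[oper](state, threshold)


flows = [
    {"SIP": "10.0.0.1", "DPORT": 80, "BYTES": 100},
    {"SIP": "10.0.0.1", "DPORT": 443, "BYTES": 300},
    {"SIP": "10.0.0.2", "DPORT": 80, "BYTES": 200},
    {"SIP": "10.0.0.3", "DPORT": 80, "BYTES": 400},
]

print(check(total(flows, "BYTES"), ">", 900))     # SUM BYTES > 900
print(check(distinct(flows, "SIP"), ">=", 5))     # DISTINCT SIP >= 5
print(proportion(flows, "DPORT", 80))             # share of flows to port 80
```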

Other Primitives

EVERYTHING PASSES: Alert on every flow.

    CHECK EVERYTHING PASSES
    END CHECK

BEACON: Finite State Beacon Detection.

    CHECK BEACON
        COUNT minCount
        CHECK TOLERANCE int PERCENT
        TIME WINDOW minIntervalTimeVal
    END CHECK

RATIO: Detect IP pairs with more outgoing than incoming traffic.

    CHECK RATIO
        OUTGOING integer1 TO integer2
        LIST name of list from beacon # optional
    END CHECK

ITERATIVE COMPARISON: This primitive has been removed for Version 4.5.

HIGH PORT CHECK: Look for passive traffic.

    CHECK HIGH_PORT_CHECK
        LIST listName
    END CHECK

WEB REDIRECTION: This primitive has been removed for Version 4.5.

SENSOR OUTAGE: Alert if a sensor stops sending flows.

    CHECK FILE_OUTAGE
        SENSOR_LIST [list of sensor names]
        TIME_WINDOW time units
    END CHECK

DIFFERENCE DISTRIBUTION: Output difference distribution (Statistic only).

    STATISTIC diffDistExample
        DIFF DIST field
    END STATISTIC

Evaluations and Statistics

Evaluations and statistics comprise the second stage of the Analysis Pipeline. Each evaluation and statistic specifies the name of a filter which feeds records to the evaluation or statistic. Specific values are pulled from those flow records, aggregate state is accumulated, and when certain criteria are met alerts are produced.

To calculate and aggregate state from the filtered flow records, pipeline uses a concept called a primitive.

Evaluations are based on a list of checks that have primitives embedded in them. The aggregate state of the primitive is compared to the user defined threshold value and alerts are generated.

Statistics use exactly one primitive to aggregate state. The statistic periodically exports all of the state as specified by a user-defined interval.

New to version 4.2, if a statistic is utilizing FOREACH, and the state for a particular unique value bin is empty, the value will not be included in an alert for the statistic. A statistic without FOREACH will output the state value no matter what.

An evaluation block begins with the keyword EVALUATION followed by the evaluation name. Its completion is indicated by END EVALUATION.

Similarly, a statistic block begins with the keyword STATISTIC and the statistic's name; the END STATISTIC statement closes the block.

The remainder of this section describes the settings that evaluations and statistics have in common, and the keywords they share. A description of primitives will hopefully make the details of evaluations and statistics easier to follow.

Each of the following commands (except ID) goes on its own line.

ID

Each evaluation and statistic must have a unique string identifier. It can have letters (upper and lower case) and numbers, but no spaces. It is placed immediately following the EVALUATION or STATISTIC declaration:

EVALUATION myUniqueEvaluationName
    ...
END EVALUATION

STATISTIC myUniqueStatisticName
    ...
END STATISTIC

Alert Type

The ALERT TYPE is an arbitrary, user-defined string. It can be used as a general category to help when grouping or sorting the alerts. If no alert type is specified, the default alert type for evaluations and statistics is Evaluation and Statistic, respectively. The value for the alert type does not affect pipeline processing.

Syntax:

    ALERT TYPE alert-type-string

Severity

Evaluations and statistics must be assigned a severity level which is included in the alerts they generate. The levels are represented by integers from 1 to 255. The severity has no meaning to the Analysis Pipeline; the value is simply recorded in the alert. The value for the severity does not affect pipeline processing. This field is required.

Syntax:

    SEVERITY integer

Filter Id

Evaluations and statistics (but not file evaluations) need to be attached to a filter, which provides them flow records to analyze. Each can have one and only one filter. The filter's name links the evaluation or statistic with the filter. As a result, the filter must be created prior to creating the evaluation or statistic.

Syntax:

    FILTER filter-name

"Binning" by distinct field: FOREACH

Evaluations and statistics can compute aggregate values across all flow records, or they can aggregate values separately for each distinct value of particular field(s) on the flow records, grouping or "binning" the flow records by the field(s). An example of this latter approach is computing something per distinct source address.

FOREACH is used to isolate a value (a malicious IP address), or a notable tuple (a suspicious port pair). The unique field value that caused an evaluation to alert will be included in any alerts. Using FOREACH in a statistic will cause the value for every unique field value to be sent out in the periodic update.

The default is not to separate the data for each distinct value. The field that is used as the key for the bins is referred to as the unique field, and is declared in the configuration file with the FOREACH command, followed by the field name:

    FOREACH field

Any of the fields can be combined into a tuple, with spaces between the individual field names. The more fields included in this list, the more memory the underlying primitives need to keep all of the required state data.

The ANY IP and ANY PORT constructs can be used here to build state (maybe a sum of bytes) for both ips or ports in the flow. The point of this is to build some state for an IP or PORT regardless of whether it's the source or destination, just that it appeared. When referencing the IP or PORT value to build an output list, use SIP or SPORT as the field to put in the list.

Pmaps can also be used to bin state. The state is binned by the output of the pmap. Pmaps can also be combined with other fields to build more complex tuples for binning state, such as pmap(SIP) PROTOCOL.

To keep state per source IP Address:

    FOREACH SIP

To keep state per port pair:

    FOREACH SPORT DPORT

To keep state for both ips:

    FOREACH ANY IP

As with filtering, the ordering of the fields in the tuple does not matter as they are sorted internally.
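
The binning that FOREACH performs can be modeled in Python, here with a sum of bytes as the state. The function name and sample flows are hypothetical; sorting the field names alphabetically merely stands in for pipeline's internal ordering, and counting a flow once when SIP equals DIP under ANY IP is an assumption.

```python
from collections import defaultdict


def sum_bytes_foreach(flows, fields):
    """Sum BYTES separately for each distinct value of the FOREACH field list."""
    bins = defaultdict(int)
    for flow in flows:
        if tuple(fields) == ("ANY IP",):
            # Both addresses get a bin; a set avoids double counting when
            # SIP equals DIP (an assumption of this sketch).
            keys = {(flow["SIP"],), (flow["DIP"],)}
        else:
            # Sorting the names stands in for pipeline's internal ordering,
            # so SPORT DPORT and DPORT SPORT produce the same bins.
            keys = {tuple(flow[name] for name in sorted(fields))}
        for key in keys:
            bins[key] += flow["BYTES"]
    return dict(bins)


flows = [
    {"SIP": "10.0.0.1", "DIP": "10.0.0.9", "SPORT": 1234, "DPORT": 80, "BYTES": 100},
    {"SIP": "10.0.0.9", "DIP": "10.0.0.1", "SPORT": 80, "DPORT": 1234, "BYTES": 500},
    {"SIP": "10.0.0.2", "DIP": "10.0.0.9", "SPORT": 1234, "DPORT": 80, "BYTES": 50},
]

print(sum_bytes_foreach(flows, ["SIP"]))             # FOREACH SIP
print(sum_bytes_foreach(flows, ["SPORT", "DPORT"]))  # FOREACH SPORT DPORT
print(sum_bytes_foreach(flows, ["ANY IP"]))          # FOREACH ANY IP
```

With ANY IP, each address accumulates bytes from flows in which it appears on either side, which is the "regardless of direction" state described above. The number of bins, and therefore memory use, grows with the number of distinct tuples.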

There are some limits on which fields can be used, as some evaluations require that a particular field be used, and some primitives do not support binning by a field.

File evaluations do not handle records, so the FOREACH statement is illegal.

Active Status

By default, evaluations and statistics are marked as active when they are defined. Specifying the INACTIVE statement in the evaluation or statistic block causes the evaluation or statistic to be created, but it is marked inactive, and it will not be used in processing records. For consistency, there is also an ACTIVE statement which is never really needed.

Syntax:

    INACTIVE

Evaluation Specifics

This section provides evaluation-specific details, building on the evaluation introduction and aggregate function description provided in the previous two sections.

Each evaluation block must contain one or more check blocks. The evaluation sends each flow record it receives to each check block where the records are aggregated and tests are run. If every check block test returns a true value, the evaluation produces an output entry which may become part of an alert.

Evaluations have many settings that can be configured, including the output and alerting stages, in addition to evaluation sanity checks.

Checks

In an evaluation, the check block begins with the CHECK statement which takes as a parameter the type of check. The block ends with the END CHECK statement. If the check requires any additional settings, those settings are put between the CHECK and END CHECK statements, which are laid out in the primitives section.

The FILE_OUTAGE check must be part of a FILE_EVALUATION block. All other checks must be part of an EVALUATION block.

Outputs

When an evaluation threshold is met, the evaluation creates an output entry. The output entry may become part of an alert, or it may be used to affect other settings in the pipeline.

Output Timeouts

All information contained in alerts is pulled from lists of output entries from evaluations. These output entries can be configured to time out both to conserve memory and to ensure that information contained in alerts is fresh enough to provide value. The different ways to configure the alerting stage are discussed below.

One way to configure alerting is to limit the number of times alerts can be sent in a time window. This is a place where the output timeout can have a major effect. If alerts are only sent once a day, but outputs time out after one hour, then only the outputs generated in the hour before alerting will be eligible to be included in alerts.

When FOREACH is not used, output entries are little more than flow records with attached threshold information. When FOREACH is used, they contain the unique field value that caused the evaluation to return true. Each time this unique field value triggers the evaluation, the timestamp for that value is reset and the timeout clock begins again.

Take as an example an evaluation that performs network profiling to identify servers. If the output timeout is set to 1 day, the list of output entries will contain all IP addresses that have acted like a server in the last day. As long as a given IP address keeps acting like a server, it remains in the output list and is available to be included in an alert or put in a named output list, as described in the output list section.

Syntax:

    OUTPUT TIMEOUT timeval

    OUTPUT TIMEOUT 1 DAY
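
The timeout behavior described above can be modeled with a short Python sketch. This is not pipeline's implementation: the class name OutputEntries and its methods are hypothetical, time is passed in explicitly as seconds to keep the example deterministic, and the exact boundary at which an entry expires is an assumption.

```python
class OutputEntries:
    """Toy model of an evaluation's output entries, keyed by FOREACH value."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # unique field value -> time it last triggered

    def trigger(self, value, now):
        # Each trigger resets the timestamp, restarting the timeout clock.
        self.last_seen[value] = now

    def expire(self, now):
        # Drop entries whose last trigger is older than the timeout
        # (strictly older is an assumption made for this sketch).
        expired = [v for v, t in self.last_seen.items()
                   if now - t > self.timeout]
        for v in expired:
            del self.last_seen[v]
        return expired

    def values(self):
        return sorted(self.last_seen)


DAY = 24 * 60 * 60
entries = OutputEntries(timeout_seconds=DAY)  # models OUTPUT TIMEOUT 1 DAY
entries.trigger("10.0.0.1", now=0)
entries.trigger("10.0.0.2", now=0)
entries.trigger("10.0.0.1", now=DAY // 2)     # still acting like a server
removed = entries.expire(now=DAY + 1)
```

In this model only 10.0.0.2 is removed, because the second trigger for 10.0.0.1 restarted its clock.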

Shared Output Lists

When FOREACH is used with an evaluation, any value in an output entry can be put into a named output list. If the unique field is a tuple made up of multiple fields, any subset of those fields can be put into a list. There can be any number of these lists. No timeout value is provided for each list, because the OUTPUT TIMEOUT value is used. When an output entry times out, the value, or the relevant subset of its tuple, is removed from all output lists that contain it.

These lists can be referenced by filters, or configured separately, as described in the list configuration section.

To create a list, a field list of what the output list will contain must be provided. A unique name for this list must be provided as well.

Syntax:

    OUTPUT LIST fieldList listName

If using FOREACH SIP DIP, each of the following lists can be created:

    OUTPUT LIST SIP listOfSips

    OUTPUT LIST DIP listOfDips

    OUTPUT LIST SIP DIP listOfIPPairs
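
To illustrate how one set of FOREACH SIP DIP output entries can feed several lists, here is a small Python sketch. The function project and the sample addresses are hypothetical; the sketch only shows which values each list would hold, not how pipeline stores or expires them.

```python
def project(entries, foreach_fields, list_fields):
    """Return the set of values a named output list would hold."""
    idx = [foreach_fields.index(f) for f in list_fields]
    return {tuple(e[i] for i in idx) for e in entries}


foreach_fields = ("SIP", "DIP")
# Hypothetical output entries: one (SIP, DIP) tuple per unique value.
entries = [("10.0.0.1", "192.0.2.7"), ("10.0.0.1", "192.0.2.9")]

list_of_sips = project(entries, foreach_fields, ("SIP",))
list_of_dips = project(entries, foreach_fields, ("DIP",))
list_of_ip_pairs = project(entries, foreach_fields, ("SIP", "DIP"))
```

Here listOfSips would hold a single address, while listOfDips and listOfIPPairs would each hold two values.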

Alert on Removal

If FOREACH is used, pipeline can be configured to send an alert when an output has timed out from the output entries list.

Syntax:

    ALERT ON REMOVAL

Clearing State

Once the evaluation’s state has hit the threshold and an output entry has been generated, you may desire to reset the current state of the evaluation. For example, if the evaluation alerts when a count of something gets to 1000, you might want to reset the count to start at 0 again. Using CLEAR ALWAYS can give a more accurate measure of timeliness, and is likely to be faster.

To set when state is cleared, put one of the following in the body of the evaluation:

    CLEAR ALWAYS

    CLEAR NEVER

This setting is required as of v4.3.1; before that, CLEAR NEVER was the default.

Too Many Outputs

Sanity checks can be put in place to turn off evaluations that are finding more outputs than expected, which can happen with a poorly designed evaluation or analysis. For example, an evaluation looking for web servers may be expected to find fewer than 100, so a sanity threshold of 1000 would indicate many unexpected results, and the evaluation should be shut down so that it does not take up too much memory or send a flood of alerts.

Evaluations that hit the threshold can be shut down permanently, or put to sleep for a specified period of time and then turned back on. If an evaluation is shut down temporarily, all state is cleared and memory is freed, and it restarts as if pipeline had just begun processing.

Syntax:

    SHUTDOWN MORE THAN integer OUTPUTS [FOR timeval]

The following examples shut down an evaluation if there are more than 1000 outputs. The first shuts it down permanently; the second shuts it down for 1 day and then starts over:

    SHUTDOWN MORE THAN 1000 OUTPUTS

    SHUTDOWN MORE THAN 1000 OUTPUTS FOR 1 DAY
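
The two forms of the sanity check can be modeled as follows. This Python sketch is illustrative only: the class SanityCheck and its methods are hypothetical, and time is passed in explicitly as seconds.

```python
class SanityCheck:
    """Toy model of SHUTDOWN MORE THAN n OUTPUTS [FOR timeval]."""

    def __init__(self, max_outputs, sleep_seconds=None):
        self.max_outputs = max_outputs
        self.sleep_seconds = sleep_seconds  # None models a permanent shutdown
        self.asleep_until = None
        self.shut_down = False

    def active(self, now):
        if self.shut_down:
            return False
        if self.asleep_until is not None:
            if now < self.asleep_until:
                return False
            self.asleep_until = None  # sleep period is over; wake up
        return True

    def check(self, outputs, now):
        """Trip the check if there are too many outputs; True if tripped."""
        if len(outputs) <= self.max_outputs:
            return False
        outputs.clear()  # all state is cleared on shutdown
        if self.sleep_seconds is None:
            self.shut_down = True
        else:
            self.asleep_until = now + self.sleep_seconds
        return True


DAY = 24 * 60 * 60
guard = SanityCheck(max_outputs=1000, sleep_seconds=DAY)
outputs = set(range(1001))            # one more output than allowed
tripped = guard.check(outputs, now=0)

forever = SanityCheck(max_outputs=1000)
forever.check(set(range(1001)), now=0)
```

In this model, guard becomes active again one day later with empty state, while forever never does.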

Alerting Settings

Alerting is the final stage of the Analysis Pipeline. When the evaluation stage is finished, and output entries are created, alerts can be sent. The contents of all alerts come from these output entries. These alerts provide information for a user to take action and/or monitor events. The alerting stage in pipeline can be configured with how often to send alerts and how much to include in the alerts.

Based on the configurations outlined below, the alerting stage first determines if it is permitted to send alerts, then it decides which output entries can be packaged up into alerts.

How often to send alerts

Just because there are output entries produced by an evaluation does not mean that alerts will be sent. An evaluation can be configured to only send a batch of alerts once an hour, or 2 batches per day. The first thing the alerting stage does is check when the last batch of alerts was sent, and determine whether sending a new batch meets the restrictions placed by the user in the configuration file.

If it determines that alerts can be sent, it builds an alert for each output entry, unless further restricted by the settings in the next section, which affect how much to alert.

Syntax:

    ALERT integer-count TIMES timeVal

This configuration option does not affect the number of alerts sent per time period; it affects the number of times batches of alerts can be sent per time period. That is why the configuration command says "alert N times per time period" rather than "send N alerts per time period". While the semantic difference is subtle, it has a great effect on what gets sent out.

To have pipeline send only 1 batch of alerts per hour, use:

    ALERT 1 TIMES 1 HOUR

To indicate that pipeline should alert every time there are output entries for alerts, use:

    ALERT ALWAYS
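
The difference between limiting batches and limiting alerts can be seen in a short Python sketch. The class BatchLimiter is hypothetical, and the sliding-window bookkeeping is an assumption; pipeline may track the time period differently.

```python
from collections import deque


class BatchLimiter:
    """Toy model of ALERT n TIMES timeval: limits batches, not alerts."""

    def __init__(self, max_batches, window_seconds):
        self.max_batches = max_batches
        self.window = window_seconds
        self.sent_at = deque()  # times at which batches were sent

    def try_send(self, pending, now):
        """Return the batch of alerts sent at `now` (empty if not allowed)."""
        # Forget batches that fall outside the time window.
        while self.sent_at and now - self.sent_at[0] >= self.window:
            self.sent_at.popleft()
        if not pending or len(self.sent_at) >= self.max_batches:
            return []
        self.sent_at.append(now)
        return list(pending)  # one alert per output entry


limiter = BatchLimiter(max_batches=1, window_seconds=3600)  # ALERT 1 TIMES 1 HOUR
first = limiter.try_send(["a", "b", "c"], now=0)   # one batch, three alerts
second = limiter.try_send(["d"], now=1800)         # blocked: batch already sent
third = limiter.try_send(["d", "e"], now=3600)     # allowed again
```

The first call sends three alerts even though the limit is 1, because the limit counts batches.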

How much to alert

The second alert setting determines how much information to send in each alert. You may wish to receive different amounts of data depending on the type of evaluation and how often it reports. Consider these examples:

  • An evaluation is generating a list of web servers and reporting that list once an hour. You want to get the complete list every hour (that is, in every alert).
  • A beacon detection evaluation reports each beacon as soon as it finds the beacon. For this evaluation, you only want to get the beacons found since the previous alert.
  • A particular evaluation produces a great deal of output. For this evaluation, you only want to receive the alerts generated in the most recently processed file.
  • An evaluation repeatedly finds the same outputs (maybe servers?), but what is notable is when a new one is found. You may only want to hear about each server one time, unless it stops acting like a server, then reestablishes itself.

The amount of data to send in an alert is relevant only when the OUTPUT TIMEOUT statement includes a non-zero timeout and multiple alerts are generated within that time window.

To specify how much to send in an alert, specify the ALERT keyword followed by one of the following:

  • EVERYTHING: Package all outputs in the output field list into the current alert.
  • SINCE LAST TIME: Package all outputs found since the last alert was sent into the current alert.
  • EACH ONLY ONCE: Include each unique value (set with FOREACH) in an alert one time only.

The default is SINCE LAST TIME. If using an EVERYTHING PASSES evaluation, be sure to use ALERT EVERYTHING to ensure flows from files that arrive with less than a second between them are included in alerts.
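
The three options can be compared with a small Python sketch. The function select_outputs and its arguments are hypothetical; in particular, the bookkeeping of which values have already been alerted on is an assumption made for illustration.

```python
def select_outputs(mode, outputs, last_alert_time, already_alerted):
    """Pick which output values go into the next batch of alerts.

    outputs maps each unique value to the time it was last found.
    """
    if mode == "EVERYTHING":
        chosen = outputs.keys()
    elif mode == "SINCE LAST TIME":
        chosen = (v for v, t in outputs.items() if t > last_alert_time)
    elif mode == "EACH ONLY ONCE":
        chosen = (v for v in outputs if v not in already_alerted)
    else:
        raise ValueError("unknown mode: " + mode)
    return sorted(chosen)


# Hypothetical state: the last batch went out at t=300; 10.0.0.2 was
# alerted on before and was found again at t=350.
outputs = {"10.0.0.1": 100, "10.0.0.2": 350, "10.0.0.3": 400}
already_alerted = {"10.0.0.1", "10.0.0.2"}

everything = select_outputs("EVERYTHING", outputs, 300, already_alerted)
since_last = select_outputs("SINCE LAST TIME", outputs, 300, already_alerted)
only_once = select_outputs("EACH ONLY ONCE", outputs, 300, already_alerted)
```

In this model, EVERYTHING returns all three addresses, SINCE LAST TIME returns the two found after t=300, and EACH ONLY ONCE returns only 10.0.0.3.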

The last option is to have an evaluation do its work but never send out alerts. If the goal of an evaluation is just to fill up a list so that other pieces of pipeline can use the results, individual alerts may not be necessary. Similarly, if the lists being filled are themselves configured to send alerts periodically, getting an individual alert for each entry may not be ideal. In these cases, instead of the options described above, use:

    DO NOT ALERT

Statistic Specifics

The Evals and Stats section introduced the Analysis Pipeline concept of a statistic and described the settings that statistics share with evaluations. A statistic receives flow records from a filter, computes an aggregate value, and periodically reports that value.

There are two time values that affect statistics: how often to report the statistics, and the length of the time-window used when computing the statistics. The following example reports the statistics every 10 minutes using the last 20 minutes of data to compute the statistic:

    UPDATE 10 MINUTES
    TIME_WINDOW 20 MINUTES

  • The UPDATE statement specifies the reporting interval; that is, how often to report the statistic. This statement is required in each statistics block.
  • The TIME_WINDOW statement specifies the rolling time frame over which to compute the statistic. When the time window is not specified or specifies a value smaller than the reporting interval, the time window is set to the reporting interval.
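
The interaction between the two time values can be sketched in Python. The function names and sample records are hypothetical, a SUM-style statistic is used as the example, and whether the window boundaries are inclusive is an assumption.

```python
def effective_window(update_seconds, time_window_seconds=None):
    """A missing or too-small time window falls back to the reporting interval."""
    if time_window_seconds is None or time_window_seconds < update_seconds:
        return update_seconds
    return time_window_seconds


def report_sum(records, report_time, window_seconds):
    """Sum the values of (timestamp, value) records inside the rolling window."""
    start = report_time - window_seconds
    return sum(v for t, v in records if start < t <= report_time)


MINUTE = 60
# Hypothetical (time, packets) records.
records = [(5 * MINUTE, 10), (15 * MINUTE, 20), (25 * MINUTE, 30)]

# Models UPDATE 10 MINUTES with TIME_WINDOW 20 MINUTES.
window = effective_window(10 * MINUTE, 20 * MINUTE)
report_at_20 = report_sum(records, 20 * MINUTE, window)
report_at_30 = report_sum(records, 30 * MINUTE, window)
```

Because the window is twice the reporting interval, the record at minute 15 contributes to both reports.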

Statistics support the aggregation functions from the primitives section. Unlike an evaluation, a statistic simply reports the function's value, and neither the CHECK statement nor a threshold value is used. Instead, the statistic lists the primitive and any parameters it requires.

Simple examples are:

  • Periodically report the number of records:
        RECORD_COUNT
  • Periodically report the sum of the packets:
        SUM PACKETS
  • Periodically report the average flow duration:
        AVERAGE DURATION
  • Periodically report the number of distinct destination ports seen:
        DISTINCT DPORT
  • Periodically report the proportion of records for each source port:
        PROPORTION SPORT

Statistics send alerts after the specified time period has elapsed. One exception to this is if Pipeline is processing a list of files using --named-files and there is only a single file in this list. In this case, a Statistic will send an alert for testing and summary purposes, even though technically no time has passed.

List Configuration

Named lists created by internal filters and evaluations can be given extra configuration such that they are responsible for sending updates and alerts independent of, or in lieu of, the mechanism that populates them. If there is a list configuration block, there does not need to be an evaluation block for the configuration file to be accepted. As long as something in pipeline generates alerts, it will run.

Lists created by internal filters have their own timeouts, so they are responsible for removing outdated elements on their own. Lists populated by evaluations rely on the evaluation to track the timing out of values and to tell the list to remove a value, so those lists know nothing of the timeouts. As a result, and for efficiency reasons, some of the alerting functionality described below is not available for lists created and maintained by internal filters. Where a feature cannot be used, this is stated explicitly.

This extra configuration is enclosed in a LIST CONFIGURATION block, similar to other pipeline mechanisms. The list to configure must already have been declared before the configuration block.

High Level Syntax:

    LIST CONFIGURATION existingListName
        options discussed below
    END LIST CONFIGURATION

Alert Triggers

Alerts sent due to a list configuration come from the lists, and have their own timestamps and state kept about their alerts. They are not subject to the alerting restrictions imposed on the evaluations that populate the list.

Periodic

The full contents of the list can be packaged into one alert, periodically:

    UPDATE timeval

This will send out the entire list every 12 hours:

    UPDATE 12 HOURS

Element Threshold

An alert can be sent when the number of elements in the list crosses a certain threshold. Even when the contents are important enough to be sent periodically, knowing that the count has gone above a threshold may be more time sensitive.

    ALERT MORE THAN elementThreshold ELEMENTS

This alert is sent only the first time the number of elements crosses the threshold. A reset threshold can also be given: if the number of elements drops below this value, pipeline is once again allowed to send an alert when the number of elements exceeds the alert threshold. No alert is sent upon going below the reset threshold, and the elements in the list are not reset by this either.

    ALERT MORE THAN elementThreshold ELEMENTS RESET AT resetThreshold

This example sends an alert if there are more than 10 elements in the list. No more alerts are sent unless the number of elements drops below 5, after which pipeline will alert if the number of elements goes above 10 again.

    ALERT MORE THAN 10 ELEMENTS RESET AT 5
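
The alert and reset thresholds behave like a simple latch, which the following Python sketch models. The class ElementThresholdAlert is hypothetical and only tracks the element count, not the list contents.

```python
class ElementThresholdAlert:
    """Toy model of ALERT MORE THAN n ELEMENTS [RESET AT m]."""

    def __init__(self, alert_threshold, reset_threshold=None):
        self.alert_threshold = alert_threshold
        self.reset_threshold = reset_threshold
        self.armed = True

    def observe(self, element_count):
        """Return True when an alert should be sent for this count."""
        if self.armed and element_count > self.alert_threshold:
            self.armed = False  # alert once, then stay quiet
            return True
        if (not self.armed and self.reset_threshold is not None
                and element_count < self.reset_threshold):
            self.armed = True   # re-arm silently; no alert on reset
        return False


rule = ElementThresholdAlert(alert_threshold=10, reset_threshold=5)
counts = [8, 11, 12, 7, 11, 4, 9, 11]
alerts = [rule.observe(c) for c in counts]
```

In this run only the first and last counts of 11 produce an alert; the middle 11 does not, because the count had not yet dropped below 5.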

Alert on Removal

Pipeline can send an alert any time a value is removed from the list.

    ALERT ON REMOVAL

Alerting on removal cannot be used by lists created by internal filters.

Other List Configuration Options

Seeding Lists with IPSets

Lists used to hold SIP, DIP, or NHIP can be given a set of initial values by providing an IPSet file. Only IP addresses can be used with seedfiles.

    SEED pathToIPSetFile

Overwrite IPSet File on Update

If a seedfile is provided, and the list is configured to send periodic updates, it can be configured to overwrite that seedfile with the current contents of the list. This allows that file to always have the most up-to-date values.

    OVERWRITE ON UPDATE

Element Threshold to Shutdown

As with evaluations, lists can be configured to shut down if they become filled with too many elements. This is provided as a sanity check to let the user know if the configuration has a flaw in the analysis. If the number of elements meets the shutdown threshold, an alert is sent, and the list is freed and disconnected from the mechanism that had been populating it.

    SHUTDOWN MORE THAN shutdownThreshold ELEMENTS

Severity

As with evaluations, a severity level can be provided to give context to alerts. It is not used during processing, but included in alerts sent from the lists.

    SEVERITY integerSeverity

List Bundles

Named lists and IPSet files can now be linked such that if an element is added to all of the lists in the bundle, Pipeline can send an alert and, if desired, add that element to another named list, which can be used in a LIST CONFIGURATION block described above.

The lists referenced in the list bundle must already have been created in the configuration file. All lists must be made up of the same fields. An IPSet file can be added to the bundle, provided that the field for the lists is SIP or DIP; its filename must be put in quotation marks.

High Level Syntax:

    LIST BUNDLE listBundleName
        existingListNameOrIpSetFilename
        existingListNameOrIpSetFilenameWithSameFields
        ...
        Other options
    END LIST BUNDLE

Named lists for bundle

Each list to be added to the bundle goes on its own line. The list must already have been created in the configuration file by an evaluation or internal filter. If the entry is an IPSet file, its filename must be in quotes.

Add element to another list

Once an element has been found to be in all of the lists in a bundle, it can be put in a new named list. This list can be used in LIST CONFIGURATION just like any other named list. No timeout is needed for this list, as the element will be removed from it if it is removed from a list in the bundle.

    OUTPUT LIST nameOfNewList

Severity

As with evaluations, a severity level must be provided to give context to alerts. It is not used during processing, but included in alerts sent from the lists.

    SEVERITY integerSeverity

Do Not Alert

As with evaluations, you can force the list bundle not to alert. For example, you may just want the values that meet the qualifications of the list bundle to be put into another named list (using OUTPUT LIST above) and get alerts of the contents that way. Just add DO NOT ALERT to the list of statements for the list bundle.

List Bundle example

Let's say an evaluation creates a list named myServers, an internal filter creates a list called interestingIPs, and there is an IPSet file named notableIPS.set. To include these lists in a bundle, and to put any IP that is in all lists into a new list named reallyImportantIPs, use the following:

    LIST BUNDLE myExampleBundle
        myServers
        interestingIPs
        "notableIPS.set"
        OUTPUT LIST reallyImportantIPs
        SEVERITY 4
    END LIST BUNDLE
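
The effect of this bundle can be modeled as a set intersection. The Python sketch below is illustrative only: the function bundle_members and the sample addresses are hypothetical, and the contents of notableIPS.set are stood in for by a plain Python set.

```python
def bundle_members(*lists):
    """Return the elements present in every list of the bundle."""
    sets = [set(members) for members in lists]
    return set.intersection(*sets) if sets else set()


my_servers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
interesting_ips = {"10.0.0.2", "10.0.0.3", "10.0.0.9"}
notable_ips = {"10.0.0.3", "10.0.0.4"}  # stands in for "notableIPS.set"

really_important_ips = bundle_members(my_servers, interesting_ips, notable_ips)

# If an address leaves any list in the bundle, it leaves the output list too.
my_servers.discard("10.0.0.3")
after_removal = bundle_members(my_servers, interesting_ips, notable_ips)
```

Only 10.0.0.3 appears in all three lists, so it is the only address that would be placed in reallyImportantIPs, and it drops out again once myServers no longer contains it.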