This version is an expansion of version 4.x. If you are only processing SiLK records, version 4.x is simpler.

The Analysis Pipeline was developed to support inspection of flow records as they are created, and it supports many kinds of analyses.

SiLK IPv4 records were the focus of versions 4.x and below. Version 5.0 expanded the data options to include SiLK IPv6 records as well as IPv4. Version 5.1 opens the door to a wider array of record types, the first of which are the records exported by YAF. YAF exports the core flow information used by SiLK and enhances it with deep packet inspection information. The Analysis Pipeline can also accept raw IPFIX records from any application. Any field used by YAF, or present in the IPFIX records, can be used by Pipeline. It can handle multiple sources, and multiple record types transmitted by each data source. If multiple sources or record types share a field or fields, Pipeline is able to recognize that and combine state for values of that field from each data source. When watchlisting on SIP, for example, any data record that has an SIP field can meet the criteria for that filter.
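For example, a minimal watchlist filter might look like the following sketch (the SIP field name and the IN_LIST operator follow the filter syntax described later in this documentation; the filter name and the IPset file are hypothetical):

    FILTER sipWatchlist
        # Matches any record, from any data source, whose SIP field
        # appears in the (hypothetical) watchlist.set IPset file.
        SIP IN_LIST "watchlist.set"
    END FILTER

Because the criterion is expressed on the shared SIP field rather than on a particular record type, SiLK, YAF, and raw IPFIX records can all satisfy this filter.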

The new data formats add an additional way to read data. As with earlier versions, Pipeline can accept a list of files on the command line and poll a directory for incoming files. It can also accept a socket connection for YAF and IPFIX records. If a socket or directory polling is used, Pipeline will run as a daemon unless --do-not-daemonize is specified as a command line option.
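As a sketch, a directory-polling invocation might look like the following. Apart from --do-not-daemonize, which is described above, the option names are assumptions based on common Pipeline installations, and the paths are hypothetical:

    pipeline --configuration-file=/usr/local/etc/pipeline.conf \
             --site-config-file=/data/silk.conf \
             --incoming-directory=/var/pipeline/incoming \
             --error-directory=/var/pipeline/error \
             --do-not-daemonize

Omitting --do-not-daemonize from this invocation would cause Pipeline to run as a daemon.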

Analyses are defined in configuration files that contain any combination and number of the available building blocks presented here. There are three stages to the Analysis Pipeline, which are defined using that configuration file (a short configuration sketch follows this list):

  1. Each incoming flow record is tested against each of the filters that the user has defined. These filters are similar to the rwfilter command line tool. The records that pass each filter are handed to each evaluation and statistic interested in those particular flow records. Filters are described here.
  2. In the second stage, evaluations and statistics process the records:
    • Evaluations compare internal state to a user defined threshold.
    • Statistics compute state values and export that state based on a user-defined interval of time.
    A description of evaluations and statistics can be found here.
  3. The alerting stage checks the evaluations and statistics to see if there are any alerts to be sent. This alerting stage also checks with named lists that are configured to periodically send their entire contents as alerts.
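As a sketch of how the three stages fit together in one configuration file (keywords such as FOREACH, CHECK THRESHOLD, TIME_WINDOW, and SEVERITY are illustrative of the syntax detailed later; the names and threshold values are hypothetical):

    FILTER highVolume
        # Stage 1: select TCP records.
        PROTOCOL == 6
    END FILTER

    EVALUATION highVolumeEval
        # Stage 2: compare per-SIP state against a user-defined threshold.
        FILTER highVolume
        FOREACH SIP
        CHECK THRESHOLD
            SUM BYTES > 100000000
            TIME_WINDOW 5 MINUTES
        END CHECK
        # Stage 3: alerting attributes for this evaluation.
        SEVERITY 3
    END EVALUATION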

To assist in entering data and sharing data among multiple filters, the Analysis Pipeline allows the administrator to create a list. A list can reference an existing SiLK IPset file, contain values entered directly into the configuration file, or be created by a mechanism inside Pipeline itself.
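For instance, the first two kinds of list might appear in filters as follows (a sketch; the bracketed inline-value form is an assumption about the list syntax detailed later, and the names, ports, and file are hypothetical):

    FILTER fromKnownBad
        # References an existing SiLK IPset file.
        SIP IN_LIST "known-bad.set"
    END FILTER

    FILTER watchedPorts
        # Values entered directly into the configuration file.
        DPORT IN_LIST [6667, 6668, 6669]
    END FILTER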

Filters, evaluations, statistics, and lists are all built independently of each other, with each having a unique name. They are linked together by referencing those unique names.

Any number of evaluations and statistics can receive the records held by each filter. However, each evaluation or statistic can have only one filter providing flow records to it. Filters delete their flows after each flow file, or group of flows from a socket, since the other stages of Pipeline take what they need from the flows during processing.
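A sketch of that fan-out, with one filter feeding two evaluations (the CHECK THRESHOLD keywords are illustrative as above; names and thresholds are hypothetical):

    FILTER dnsTraffic
        DPORT == 53
    END FILTER

    EVALUATION dnsByteVolume
        # One of several evaluations fed by dnsTraffic.
        FILTER dnsTraffic
        CHECK THRESHOLD
            SUM BYTES > 1000000
            TIME_WINDOW 10 MINUTES
        END CHECK
    END EVALUATION

    EVALUATION dnsRecordVolume
        # A second evaluation fed by the same single filter.
        FILTER dnsTraffic
        CHECK THRESHOLD
            RECORD COUNT > 5000
            TIME_WINDOW 10 MINUTES
        END CHECK
    END EVALUATION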

An additional concept in the Analysis Pipeline is the internal filter. Internal filters can be used to store intermediate "successes" based on flow records and filters. These "successes" are not significant enough to yield individual alerts, but they can shape future processing. Internal filters are used when the analysis is more complex than simply passing a filter and transferring data to evaluations, and they allow for multistage analysis: "Flow record A met criteria, and future flow records will be combined with record A for more in-depth analysis."
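A sketch of that multistage pattern (the internal filter body, naming a field, a list, and a timeout, follows the syntax detailed later; the names and the timeout are hypothetical):

    FILTER ftpControl
        DPORT == 21
    END FILTER

    INTERNAL_FILTER ftpServers
        # "Record A met criteria": remember the server address for a while.
        FILTER ftpControl
        DIP ftpServerList 30 MINUTES
    END INTERNAL_FILTER

    FILTER toFtpServers
        # Future flow records are combined with that stored state.
        DIP IN_LIST ftpServerList
    END FILTER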

Another use case for internal filters is when the user wants to build a list containing a particular field (or list of fields) from all of the records that meet the criteria of a certain filter. "Build a list of all IP addresses that send out web traffic from port 80."
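That example might be written as follows (a sketch; the names and the one-day timeout are hypothetical):

    FILTER outboundWeb
        SPORT == 80
    END FILTER

    INTERNAL_FILTER webServers
        # Collect the SIP of every record that passes outboundWeb.
        FILTER outboundWeb
        SIP webServerList 1 DAY
    END INTERNAL_FILTER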

Analytics for the Analysis Pipeline are specified in a configuration file, the details and syntax of which can be found here.