Primitives are what pipeline uses to calculate and aggregate state from the filtered flow records. They are the building blocks for evaluations and statistics. A statistic uses exactly one primitive and periodically exports its state at a user-defined interval. An evaluation pairs primitives with thresholds and sends an alert when the aggregate state of a primitive meets the threshold requirement. Each primitive in an evaluation is embedded in a check; an evaluation can contain multiple checks, whose results are "anded" together to decide whether the evaluation succeeded and an alert should be sent.

Each primitive is based on a field from a flow record from which it extracts a value to be aggregated. What the primitive does with this value is based on the type of primitive (outlined below).

The primitive's state can be aggregated over all of the records, or it can be divided into bins based on the value of a user-defined field in the flow records. A typical example is keeping track of something per source IP address. This feature helps to identify the IP addresses or ports involved in anomalous activity (it is mainly used for ports and IPs, but it works with all flow record fields). The field that is used to create the bins is referred to as the unique field, and it is declared in an evaluation or statistic in the configuration file using the FOREACH command followed by the field name.

When arithmetic primitives are used in an evaluation, a threshold is required. This threshold is compared against the aggregated state value either for the entire collection system, or for each bin created with the FOREACH field. When using a FOREACH field, if the field or field list is four bytes or less, a SiLK bag can be used to set dynamic thresholds based on the value of the FOREACH field. If a value of that field is not in the bag, that value and its state are ignored. To use a bag, replace the integer threshold with the bag's filename as a quoted string.

There is a time aspect that affects how data is aggregated. Each primitive can be assigned a time window that indicates how long data from each flow record is to be counted in the aggregate state before it is timed out and subtracted. This allows the query of "alert if the count gets to 100 in any 5 minute interval" to be successfully answered. The time window value is given in seconds, minutes, or hours. A window of "forever" can also be used, using the keyword FOREVER instead of declaring an integer number of seconds.

For each primitive, the syntax for embedding it in a check for an evaluation and in a statistic is listed. When used in evaluations, the arithmetic primitives (RECORD COUNT, SUM, AVERAGE, DISTINCT, and PROPORTION) are grouped as threshold checks. Each check starts with the keyword CHECK followed by the type of check, and it ends with the keywords END CHECK. Statistics have only one primitive, so they are simpler: the primitive does not need to be embedded in a check.

All of these primitives can be used to build evaluations, but only those specifically labeled can be used to build a statistic. Some primitives have specific requirements, such as being required to be the only one in an evaluation or statistic. These are laid out in each section, along with the memory consumption ramifications for each type. The number of bytes of state that each primitive keeps is listed. If the evaluation or statistic bins the state using FOREACH, that number of bytes is multiplied by the number of unique values seen to get the total memory consumption. If no FOREACH is used, there is only one state value and no multiplier.
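
As a rough illustration of the multiplication described above, the following Python sketch estimates state memory. The function name and the bin count are hypothetical, and the estimate covers only the per-state bytes listed for each primitive; it ignores any bookkeeping overhead pipeline itself may add.

```python
def estimate_state_bytes(bytes_per_state, unique_bins=None):
    """Estimate primitive state size; unique_bins=None means no FOREACH."""
    if unique_bins is None:
        return bytes_per_state            # a single state value, no multiplier
    return bytes_per_state * unique_bins  # one state value per FOREACH bin

# SUM keeps 8 bytes per state value; assume 50,000 distinct SIPs were seen.
print(estimate_state_bytes(8))            # no FOREACH
print(estimate_state_bytes(8, 50_000))    # FOREACH SIP
```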

Each primitive has certain requirements for information provided, or restrictions on what is allowed. For example, the SUM of SIPs is nonsensical and is not permitted. These will be outlined below.

There may be some aspects of the configuration file that are set automatically by choosing a certain primitive. These will be mentioned below with each primitive when they arise.

Notes on TIME WINDOW

For many primitives, the state is aggregated over a user-specified time window. This window indicates how long data from each flow record is to be counted in the aggregate state before that record's data is timed out and subtracted. This allows the query "Alert if the count gets to 100 in a 5-minute interval" to be successfully answered. The time window is specified with the TIME_WINDOW command followed by a list of number-time-unit pairs. The number may be an integer or a floating-point value. pipeline supports time units of MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. For most primitives, any fractional seconds value is ignored. An infinite time window can be specified by using the keyword FOREVER.

Examples:

TIME_WINDOW 6 MINUTES
TIME_WINDOW 4 MINUTES 120 SECONDS # also 6 minutes
TIME_WINDOW 0.1 HOUR # also 6 minutes
TIME_WINDOW 30 SECONDS
TIME_WINDOW FOREVER
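
To make the arithmetic behind these examples concrete, here is a minimal Python sketch that converts a TIME_WINDOW line into whole seconds. It is not pipeline's parser: the function name is hypothetical, accepting singular unit names is an assumption based on the "0.1 HOUR" example, and dropping fractional seconds by truncation is an assumption about how "ignored" is implemented.

```python
from decimal import Decimal

# Seconds per unit, keyed by the singular form of the unit name.
UNIT_SECONDS = {
    "MILLISECOND": Decimal("0.001"),
    "SECOND": Decimal(1),
    "MINUTE": Decimal(60),
    "HOUR": Decimal(3600),
    "DAY": Decimal(86400),
}

def time_window_seconds(spec):
    """Return the window length in whole seconds, or None for FOREVER."""
    tokens = spec.split("#")[0].split()   # drop any trailing comment
    if not tokens or tokens[0] != "TIME_WINDOW":
        raise ValueError("expected TIME_WINDOW")
    args = tokens[1:]
    if args == ["FOREVER"]:
        return None
    if not args or len(args) % 2 != 0:
        raise ValueError("expected number/time-unit pairs")
    total = Decimal(0)
    for number, unit in zip(args[0::2], args[1::2]):
        total += Decimal(number) * UNIT_SECONDS[unit.rstrip("S")]
    return int(total)                     # fractional seconds are dropped

print(time_window_seconds("TIME_WINDOW 4 MINUTES 120 SECONDS # also 6 minutes"))
print(time_window_seconds("TIME_WINDOW 0.1 HOUR"))
```

Decimal arithmetic is used so that a value such as 0.1 HOUR is not affected by binary floating-point rounding.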

Pipeline can base its evaluations on a sliding window, allowing things such as "alert if a SIP sends out more than 10000 bytes in any 5 minute period". That 5 minute period is a sliding time window.

The 5 minutes are measured against "network time", which is advanced based on the end times in the flows received. If a delay in the collection network causes flows to arrive at pipeline "late", the time window does not get skewed, because it relies on the flows themselves to advance the time.

In addition to adding the new flows to the state, evaluations remove expired state (older than the time window), ensuring unwanted, or old, data does not improperly affect the comparison to the threshold.
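
The sliding window behavior described above can be sketched in Python as follows. This is an illustration only, not pipeline's implementation: the class name is hypothetical, and the sketch assumes flows arrive in order of end time, which real traffic may not do.

```python
from collections import deque

class SlidingSum:
    """Sum of a field over a sliding window measured in network time."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.entries = deque()   # (end_time, value), oldest first
        self.total = 0

    def add(self, end_time, value):
        # Network time advances with the end times of the flows received.
        self.entries.append((end_time, value))
        self.total += value
        # Time out and subtract contributions older than the window.
        while self.entries and self.entries[0][0] <= end_time - self.window:
            _, old_value = self.entries.popleft()
            self.total -= old_value
        return self.total

# "Alert if a SIP sends out more than 10000 bytes in any 5 minute period."
state = SlidingSum(window_seconds=300)
for end_time, nbytes in [(0, 6000), (120, 3000), (290, 2000), (310, 500)]:
    total = state.add(end_time, nbytes)
    print(end_time, total, "ALERT" if total > 10000 else "")
```

In this run the third flow pushes the total over the threshold, and the fourth flow advances network time far enough that the first flow's bytes are subtracted.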

In an evaluation, the TIME_WINDOW command appears in a CHECK block and applies to that particular primitive. In a statistic, the TIME_WINDOW command is in the main body of the block.

Record Count

This primitive type counts the number of records that make it through the filter. It does not pull values from the records, so there is no need to specify a field in the configuration file.

This primitive uses 8 bytes for each state value kept.

Record count in a check

RECORD COUNT operator threshold

This example will send an alert if there are more than 100 records.

EVALUATION rcEval
    CHECK THRESHOLD
        RECORD COUNT > 100
    END CHECK
END EVALUATION

Record count in a statistic

Statistics do not have thresholds, and this primitive needs no field. This example will generate periodic alerts containing the number of records seen.

STATISTIC rcStat
    RECORD COUNT
END STATISTIC

Sum

This primitive pulls the value of the field specified in the configuration file from a record that passes the filter. These values are added together, and their sum is kept for evaluation. All check parameters are required for this check type.

The available fields for SUM are: BYTES, PACKETS, or DURATION.

This primitive uses 8 bytes for each state value kept.

Sum in a check

SUM field operator threshold

This example will generate an alert if the sum of BYTES is greater than 1000.

EVALUATION sumEval
    CHECK THRESHOLD
        SUM BYTES > 1000
    END CHECK
END EVALUATION

Sum in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the sum of the number of packets seen.

STATISTIC sumStat
    SUM PACKETS
END STATISTIC

Average

The AVERAGE primitive is a combination of the sum and record count primitives: it computes the sum of the named volume field and counts the number of records, such that it can compute an average volume per record.

The available fields for AVERAGE are BYTES, PACKETS, DURATION, or BYTES PER PACKET.

It uses 12 bytes for each state value kept.

Average in a check

AVERAGE field operator threshold

This example will generate an alert if the average of BYTES PER PACKET is less than 10.

EVALUATION avgEval
    CHECK THRESHOLD
        AVERAGE BYTES PER PACKET < 10
    END CHECK
END EVALUATION

Average in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the running average of the number of packets seen per flow.

STATISTIC avgStat
    AVERAGE PACKETS
END STATISTIC

Distinct

This primitive tallies the number of unique values of the specified field list that have passed the filter. All check parameters are required for this check type. An example of distinct is: "alert if there are 10 unique DIPs seen, regardless of how many times each DIP was contacted". This primitive can be used for statistics. Any number of fields can be combined to be counted in a field list, including the ANY fields, and pmap results (including pmaps using ANYs as keys).

The DISTINCT primitive is memory intensive as it keeps track of each distinct value seen and the time when that value was last seen (so that data can be properly aged). When paired with a FOREACH command, the primitive is even more expensive.
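
The reason for the memory cost can be seen in a minimal Python sketch of distinct counting with aging. This is an illustration under assumptions, not pipeline's data structure: the class name is hypothetical, and it assumes flows arrive in order of end time.

```python
class DistinctCounter:
    """Count distinct values seen within a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_seen = {}   # value -> end time when it was last seen

    def add(self, end_time, value):
        self.last_seen[value] = end_time   # refresh the timestamp
        cutoff = end_time - self.window
        # Age out values that have not been seen within the window.
        self.last_seen = {v: t for v, t in self.last_seen.items() if t > cutoff}
        return len(self.last_seen)

counter = DistinctCounter(window_seconds=300)
print(counter.add(0, "10.0.0.1"))
print(counter.add(10, "10.0.0.2"))
print(counter.add(20, "10.0.0.1"))    # a repeat contact is not recounted
print(counter.add(315, "10.0.0.3"))   # 10.0.0.2 has aged out by now
```

Every distinct value needs its own entry and timestamp, and with FOREACH a separate table like this one is needed for each bin.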

Distinct in a check

DISTINCT field operator threshold

This example will generate an alert if more than 50 DPORTs are seen.

EVALUATION distinctEval
    CHECK THRESHOLD
        DISTINCT DPORT > 50
    END CHECK
END EVALUATION

Distinct in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the number of different {SIP, DIP} tuples seen.

STATISTIC distinctStat
    DISTINCT SIP DIP
END STATISTIC

Proportion

This primitive takes a field and a value for that field. It calculates the percentage of the flows that have that value for the specified field. This field includes the ANY fields, and pmap results.

The option of when to clear the state is automatically set to NEVER for PROPORTION.

This primitive uses 16 bytes per state value kept.
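
As a sketch of the calculation, a proportion can be kept with two counters: one for records that have the target value and one for all records. The Python below is illustrative only; the class name is hypothetical and it is not a description of pipeline's internal layout.

```python
class Proportion:
    """Percentage of records whose field equals a target value."""

    def __init__(self, target):
        self.target = target
        self.matching = 0   # records with the target value
        self.total = 0      # all records seen

    def add(self, field_value):
        self.total += 1
        if field_value == self.target:
            self.matching += 1
        return 100.0 * self.matching / self.total

# Protocol numbers of four records; 17 is UDP and 6 is TCP.
udp = Proportion(target=17)
for protocol in [6, 17, 6, 6]:
    percent = udp.add(protocol)
print(percent)   # one UDP record out of four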

Proportion in a check

PROPORTION field fieldValue operator threshold PERCENT

This example will generate an alert if less than 33 percent of traffic is UDP.

EVALUATION propEval
    CHECK THRESHOLD
        PROPORTION PROTOCOL 17 < 33 PERCENT
    END CHECK
END EVALUATION

Proportion in a statistic

Statistics do not have thresholds, so this primitive just needs a field and a value for that field. This example will generate periodic alerts containing the percentage of traffic sent from source port 80.

STATISTIC propStat
    PROPORTION SPORT 80
END STATISTIC

Everything Passes

This primitive does not keep any state; it tells pipeline to simply output all flow records that pass the filter. This primitive is typically used in evaluations that alert on watchlists, because the watchlist check itself is done at the filter stage.

It must be the only check used in an evaluation and cannot use FOREACH.

Because there is no state kept, running an evaluation with an EVERYTHING_PASSES primitive has an insignificant effect on the memory usage.

It should be used with ALERT EVERYTHING. Two flow files containing flows to alert on can arrive within the same second; with the default of ALERT SINCE LAST TIME, flows from the second file would be marked with the same timestamp as "last time", so alerts for them would not be sent.

This primitive forces some Evaluation settings by default:

Everything passes in a check

There is no state to keep, so there is no additional information needed.

EVALUATION epEval
    CHECK EVERYTHING_PASSES
    END CHECK
END EVALUATION

Everything passes in a statistic

This primitive cannot be used in a statistic. To have pipeline periodically send out the number of flows that a filter identifies, use the RECORD COUNT primitive in a statistic.

Beacon

This primitive looks for beacons using SIP, DIP, DPORT, and PROTOCOL as the unique field. If flows for the same four-tuple show up with end times spaced at regular intervals that are longer than the user-specified time, the four-tuple and the record are put into an alert.

The user must provide three values: the minimum number of periodic flows that must be seen before an alert is generated, the minimum amount of time for the interval between flows, and the tolerance for a flow not showing up exactly one interval after the previous flow.

Do not enter anything for the FOREACH field; it is set for you. The primitive is automatically set to never clear state upon success.

Beacon finding is very costly simply due to the number of permutations of the SIP DIP DPORT PROTOCOL tuples, and state is needed for each one.

Beacon in a check

CHECK BEACON
    COUNT minCount CHECK_TOLERANCE integerPercent PERCENT
    TIME_WINDOW minimumIntervalTimeVal
END CHECK

This example will look for beacons defined by the following characteristics: at least 5 flows with the same {SIP, DIP, DPORT, PROTOCOL} arrive at a constant interval, plus or minus 5 percent, and that interval is at least 5 minutes.

EVALUATION beaconEval
    CHECK BEACON
        COUNT 5 CHECK_TOLERANCE 5 PERCENT
        TIME_WINDOW 5 MINUTES
    END CHECK
END EVALUATION
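
To show how the three parameters interact, here is one possible way to track periodic flows for a single four-tuple, written in Python. It is a hypothetical sketch and should not be read as pipeline's beacon algorithm: the class, its reset rules, and the sample end times are all assumptions made for illustration.

```python
class BeaconState:
    """Periodic-flow tracker for one {SIP, DIP, DPORT, PROTOCOL} tuple."""

    def __init__(self, min_count, min_interval, tolerance_percent):
        self.min_count = min_count
        self.min_interval = min_interval            # seconds
        self.tolerance = tolerance_percent / 100.0
        self.last_end = None
        self.interval = None
        self.count = 0

    def add(self, end_time):
        """Record a flow end time; return True once a beacon is suspected."""
        if self.last_end is None:
            self.last_end, self.count = end_time, 1
            return False
        gap = end_time - self.last_end
        self.last_end = end_time
        on_schedule = (self.interval is not None and
                       abs(gap - self.interval) <= self.tolerance * self.interval)
        if gap < self.min_interval:
            self.interval, self.count = None, 1     # too frequent, start over
        elif on_schedule:
            self.count += 1                          # still on schedule
        else:
            self.interval, self.count = gap, 2       # new candidate interval
        return self.count >= self.min_count

# COUNT 5, CHECK_TOLERANCE 5 PERCENT, minimum interval of 5 minutes.
state = BeaconState(min_count=5, min_interval=300, tolerance_percent=5)
for end_time in [0, 600, 1210, 1800, 2400]:
    alert = state.add(end_time)
print(alert)
```

One such state object would be needed for every four-tuple seen, which is where the memory cost of beacon finding comes from.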

Beacon in a statistic

The Beacon primitive cannot be used in a statistic.

Ratio

This primitive calculates the ratio of outgoing to incoming bytes. There are three options for grouping the bytes using the FOREACH field like other evaluations and statistics:

The direction of the traffic can be determined one of two ways:

The threshold must require at least that outgoing bytes exceed incoming bytes.

Ratio in a check

With the requirement that integer1 > integer2:

CHECK RATIO
    OUTGOING integer1 TO integer2
    LIST name of list from beacon # optional
END CHECK

The inside of the check can be reversed with equivalent results:

CHECK RATIO
    INCOMING integer2 TO integer1
    LIST name of list from beacon # optional
END CHECK

This example will generate an alert if the outgoing to incoming ratio is greater than 10 to 1, for a pair of IPs without using a beacon list.

EVALUATION ratioEval
    FOREACH SIP DIP
    CHECK RATIO
        OUTGOING 10 TO 1
    END CHECK
END EVALUATION

This example will generate an alert if the total bytes sent by an IP are more than 10 times the number of bytes it receives, regardless of which hosts it is communicating with.

EVALUATION ratioEval
    FOREACH ANY IP
    CHECK RATIO
        OUTGOING 10 TO 1
    END CHECK
END EVALUATION
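
The comparison behind "OUTGOING integer1 TO integer2" can be sketched in Python as a cross-multiplication, which avoids dividing by a zero incoming byte count. The function name is hypothetical, and treating any outgoing traffic with zero incoming bytes as exceeding the ratio is an assumption of this sketch, not documented pipeline behavior.

```python
def ratio_exceeded(outgoing_bytes, incoming_bytes, integer1, integer2):
    """True if outgoing:incoming is greater than integer1:integer2."""
    if integer1 <= integer2:
        raise ValueError("integer1 must be greater than integer2")
    # Cross-multiply so that no division is needed.
    return outgoing_bytes * integer2 > incoming_bytes * integer1

# OUTGOING 10 TO 1
print(ratio_exceeded(11000, 1000, 10, 1))   # 11:1 is greater than 10:1
print(ratio_exceeded(10000, 1000, 10, 1))   # exactly 10:1 is not
```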

Ratio in a statistic

This primitive cannot be used in a statistic.

Iterative Comparison

This primitive has been removed for version 4.5

High Port Check

Syntax:

CHECK HIGH_PORT_CHECK
    LIST list-name
END CHECK

The HIGH_PORT_CHECK detects passive data transfer on ephemeral ports. As an example, in passive FTP, the client contacts the server on TCP port 21, and this is the control channel. The server begins listening on an ephemeral (high) port that will be used for data transfer, and the client uses an ephemeral port to contact the server's ephemeral port. Sometimes there are multiple ephemeral connections. Finally, all the connections are closed. Since flows represent many packets, typically the flow representing the traffic on port 21 is not generated until the entire FTP session is ended. As a result, the flow record for port 21 arrives after the flow records for the passive transfers.

To detect passive FTP, pipeline uses an internal list of all high port to high port five-tuples. When pipeline sees the port 21 flow record, it determines whether the IPs on that record appear in a five-tuple in the high port list. If a match is found, the traffic between the high ports is considered part of the FTP session.
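
The matching step can be pictured with a small Python sketch. The record layout, function name, and addresses are hypothetical, and matching the IP pair in either direction is an assumption; the sketch only illustrates looking up the control flow's IPs in the list of high-port flows.

```python
def flows_in_session(control_flow, high_port_flows):
    """Return the high-port flows whose IP pair matches the control flow."""
    control_ips = {control_flow["sip"], control_flow["dip"]}
    return [flow for flow in high_port_flows
            if {flow["sip"], flow["dip"]} == control_ips]

# Hypothetical records: two high-port flows seen before the port 21 flow.
high_port_flows = [
    {"sip": "198.51.100.7", "dip": "192.0.2.10", "sport": 50312, "dport": 41022},
    {"sip": "198.51.100.8", "dip": "192.0.2.10", "sport": 50400, "dport": 41500},
]
control_flow = {"sip": "198.51.100.7", "dip": "192.0.2.10",
                "sport": 50300, "dport": 21}

matches = flows_in_session(control_flow, high_port_flows)
print(len(matches))   # only the first high-port flow shares both IPs
```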

When using a HIGH_PORT_CHECK check in an EVALUATION, there are several additional steps you must take:

  1. The FOREACH value must be set to the standard five tuple. The HIGH_PORT_CHECK check will set this value for you, and it will issue an error if you attempt to set it to any other value.
  2. The filter that feeds the evaluation should look for TCP traffic on port 21.
        FILTER ftp-control
            ANY_PORT == 21
            PROTOCOL == 6
        END FILTER
  3. A second filter to match traffic between ephemeral ports is created. For example:
        FILTER passive-ftp
            SPORT > 1024
            DPORT > 1024
            PROTOCOL == 6
        END FILTER
  4. You must create an INTERNAL_FILTER block (see Section 1.4). This block uses the filter created in the previous step, and it must specify a list over pairs of source and destination IP addresses. For example:
        INTERNAL_FILTER passive-ftp
            FILTER passive-ftp
            SIP DIP high-port-ips 90 SECONDS
        END INTERNAL_FILTER

    The list does not need to be created explicitly; the internal filter will create the list if it does not exist.
  5. In the CHECK block, specify the name of the list that is part of the INTERNAL_FILTER. For example:
        CHECK HIGH_PORT_CHECK
            LIST high-port-ips
        END CHECK

Putting that together in the EVALUATION block, you have:

    EVALUATION passive-ftp
        FILTER ftp-control
        INTERNAL_FILTER passive-ftp
        CHECK HIGH_PORT_CHECK
             LIST high-port-ips
        END CHECK
    END EVALUATION

The HIGH_PORT_CHECK check is set to always clear the state upon success. This check uses a large amount of memory as the internal list maintains state for each flow record between two ephemeral ports.

This primitive cannot be used in a statistic.

Web Redirection

This primitive has been removed for version 4.5

Sensor Outage

An evaluation may operate on an input file as a whole, as opposed to operating on every record. This type of evaluation is called a file evaluation. It begins with FILE_EVALUATION and the name of the file evaluation being created. It ends with END FILE_EVALUATION.

The FILE_OUTAGE check only works within a FILE_EVALUATION. It alerts if pipeline has not received an incoming flow file from the listed sensor(s) in a given period of time.

Syntax:

    CHECK FILE_OUTAGE
        SENSOR_LIST sensor-list
        TIME_WINDOW number time-unit
    END CHECK

The TIME_WINDOW specifies the maximum amount of time to wait for a new sensor file to appear before alerting. The number can be an integer or a floating-point value. Valid time units are MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. Fractional seconds are ignored. There is no default time window, and it must be specified.

The SENSOR_LIST statement names the sensors that you expect to generate a new flow file more often than the specified time window. This statement must appear in a FILE_OUTAGE check. There are three forms for the statement:

Example: Alert if any of the sensors S0, S1, or S2 does not produce a flow file within two hours:

    FILE_EVALUATION
        CHECK FILE_OUTAGE
             SENSOR_LIST [S0, S1, S2]
             TIME_WINDOW 2 HOURS
        END CHECK
    END FILE_EVALUATION

Example: Alert if any sensor does not produce flow files within four hours:

    CHECK FILE_OUTAGE
        SENSOR_LIST ALL_SENSORS
        TIME_WINDOW 4 HOURS
    END CHECK
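
The outage test itself amounts to comparing, for each listed sensor, the time since its last flow file against the time window. A minimal Python sketch follows; the function name, sensor names, and timestamps are hypothetical, and this is not pipeline's implementation.

```python
def overdue_sensors(last_file_time, now, window_seconds):
    """Return the sensors whose newest flow file is older than the window."""
    return sorted(sensor for sensor, seen in last_file_time.items()
                  if now - seen > window_seconds)

# Time, in seconds, at which each sensor's most recent file arrived.
last_file_time = {"S0": 0, "S1": 5000, "S2": 7000}

# TIME_WINDOW 2 HOURS is 7200 seconds.
print(overdue_sensors(last_file_time, now=7300, window_seconds=7200))
```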

Difference Distribution

This primitive tracks the difference between subsequent values for a specified field. It uses bins, the number of which is based on the length of the field, to keep track of the distribution of those differences. An 8-bit field has 17 bins, a 16-bit field has 33 bins, a 32-bit field has 65 bins, and a 64-bit field has 129 bins. The bins themselves are 16-bit numbers.

This primitive can only be used in Statistics. It can be used with any field, and can be combined with FOREACH.

The bin chosen to increment is relative to the middle of the array of bins. If there is no difference in the value, the middle bin is incremented. Otherwise, the bin number relative to the middle is calculated as: bin number = floor(log[base2] of the absolute difference) + 1. If the new value is smaller than the old, a "negative" bin offset is used, because decreases in the value also need to be tracked.

Bin Number     Difference Range
lower bins     Bigger negative differences
-4             -15 to -8
-3             -7 to -4
-2             -3 to -2
-1             -1
0              0
1              1
2              2 to 3
3              4 to 7
4              8 to 15
higher bins    Bigger positive differences
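
The bin calculation can be checked with a short Python sketch. The function names are hypothetical; the sketch relies on the fact that, for a positive integer n, n.bit_length() equals floor(log2(n)) + 1, which matches the ranges listed above.

```python
def diff_bin(old_value, new_value):
    """Bin offset, relative to the middle bin, for a change in value."""
    diff = new_value - old_value
    if diff == 0:
        return 0
    # For a positive integer n, n.bit_length() == floor(log2(n)) + 1.
    offset = abs(diff).bit_length()
    return offset if diff > 0 else -offset

def bin_count(field_bits):
    """Total bins for a field width: one per offset on each side, plus zero."""
    return 2 * field_bits + 1

print(diff_bin(80, 443))    # a difference of 363 falls in bin 9
print(diff_bin(443, 80))    # the same decrease falls in bin -9
print(bin_count(16))        # a 16-bit field such as DPORT
```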

DIFFERENCE DISTRIBUTION can only be used in a STATISTIC. This example will output the difference distribution of destination ports for each source IP address every hour.

    STATISTIC diffDistPorts
        DIFF DIST DPORT
        FOREACH SIP
        UPDATE 1 HOUR
    END STATISTIC