Primitives are what pipeline uses to calculate and aggregate state from the filtered flow records. They are the building blocks for evaluations and statistics. A statistic uses exactly one primitive and periodically exports its state at a user-defined interval. An evaluation pairs primitives with thresholds and sends an alert when the aggregate state of a primitive meets the threshold requirement. Each primitive in an evaluation is embedded in a check; an evaluation can contain multiple checks, whose results are ANDed together to decide whether the evaluation succeeded and an alert should be sent.

Each primitive is based on a field from a flow record from which it extracts a value to be aggregated. What the primitive does with this value is based on the type of primitive (outlined below).

The primitive's state can be aggregated over all of the records, or it can be divided into bins based on the value of a user-defined field in the flow records. A typical example is keeping track of something per source IP address. This feature helps to identify the IP addresses or ports involved in anomalous activity; it is mainly used with ports and IPs, but it works with all flow record fields. The field used to create the bins is referred to as the unique field, and it is declared in an evaluation or statistic in the configuration file using the FOREACH command followed by the field name.

When arithmetic primitives are used in an evaluation, a threshold is required. This threshold is compared against the aggregated state value, either for the entire collection system or for each bin created with the FOREACH field. When using a FOREACH field, if the field or field list is four bytes or less, a SiLK bag can be used to set dynamic thresholds based on the value of the FOREACH field. If a value of that field is not in the bag, that value and its state are ignored. To use a bag, replace the integer threshold with the quoted filename of the bag.

There is a time aspect that affects how data is aggregated. Each primitive can be assigned a time window that indicates how long data from each flow record is to be counted in the aggregate state before it is timed out and subtracted. This allows the query of "alert if the count gets to 100 in any 5 minute interval" to be successfully answered. The time window value is given in seconds, minutes, or hours. A window of "forever" can also be used, using the keyword FOREVER instead of declaring an integer number of seconds.

For each primitive, the syntax for embedding it in a check for an evaluation and in a statistic is listed. When used in evaluations, the arithmetic primitives (RECORD COUNT, SUM, AVERAGE, DISTINCT, and PROPORTION) are grouped as threshold checks. Each check starts with the keyword CHECK followed by the type of check, and it ends with the keywords END CHECK. Statistics have only one primitive, so they are simpler: the primitive does not need to be embedded in a check.

All of these primitives can be used to build evaluations, but only those specifically labeled can be used to build a statistic. Some primitives have specific requirements, such as being required to be the only one in an evaluation or statistic. These are laid out in each section, along with the memory consumption ramifications for each type. The number of bytes of state that each primitive keeps is listed. If the evaluation or statistic is binning the state using FOREACH, that number of bytes is multiplied by the number of unique values seen to get the total memory consumption. If no FOREACH is used, there is only one state value and no multiplier.
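
The multiplication described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is hypothetical, and the estimate ignores any bookkeeping overhead that pipeline itself adds per bin.

```python
def estimated_state_bytes(bytes_per_state, unique_values=None):
    """Estimate state memory: bytes per state value times the number of bins.

    unique_values is the number of distinct FOREACH values seen;
    None means no FOREACH is used, so there is a single state value.
    """
    bins = 1 if unique_values is None else unique_values
    return bytes_per_state * bins


# An 8-byte primitive with no FOREACH keeps one state value.
print(estimated_state_bytes(8))
# The same primitive binned by 50,000 distinct source IP addresses.
print(estimated_state_bytes(8, 50_000))
```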

Each primitive has certain requirements for information provided, or restrictions on what is allowed. For example, the SUM of SIPs is nonsensical and is not permitted. These will be outlined below.

There may be some aspects of the configuration file that are set automatically by choosing a certain primitive. These will be mentioned below with each primitive when they arise.

Notes on TIME WINDOW

For many primitives, the state is aggregated over a user-specified time window. This window indicates how long data from each flow record is to be counted in the aggregate state before that record's data is timed out and subtracted. This allows the query "Alert if the count gets to 100 in a 5 minute interval" to be successfully answered. The time window is specified with the TIME_WINDOW command followed by a list of number-time-unit pairs. The number may be an integer or a floating-point value. Pipeline supports time units of MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. For most primitives, any fractional seconds value is ignored. An infinite time window can be specified by using the keyword FOREVER.

This keyword is not required for primitives. Omitting this keyword will cause Pipeline to delete its state after each file is processed. Specifying a time window of 0 is the same as omitting it.

Examples:

TIME_WINDOW 6 MINUTES
TIME_WINDOW 4 MINUTES 120 SECONDS # also 6 minutes
TIME_WINDOW 0.1 HOUR # also 6 minutes
TIME_WINDOW 30 SECONDS
TIME_WINDOW FOREVER
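
To make the arithmetic behind these examples concrete, the following Python sketch adds up number-time-unit pairs into seconds. It is not pipeline's parser: the function name is hypothetical, and accepting a singular unit such as HOUR is an assumption based on the third example above.

```python
UNIT_SECONDS = {
    "MILLISECONDS": 0.001,
    "SECONDS": 1,
    "MINUTES": 60,
    "HOURS": 3600,
    "DAYS": 86400,
}


def window_seconds(spec):
    """Convert the text after TIME_WINDOW into a number of seconds."""
    tokens = spec.split()
    if tokens == ["FOREVER"]:
        return float("inf")
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("expected number-time-unit pairs")
    total = 0.0
    for number, unit in zip(tokens[0::2], tokens[1::2]):
        unit = unit.upper()
        if not unit.endswith("S"):
            unit += "S"  # assumption: singular units are accepted
        total += float(number) * UNIT_SECONDS[unit]
    return total


for spec in ("6 MINUTES", "4 MINUTES 120 SECONDS", "0.1 HOUR", "FOREVER"):
    print(spec, "->", window_seconds(spec))
```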

Pipeline can base its evaluations on a sliding window, allowing things such as "alert if a SIP sends out more than 10000 bytes in any 5 minute period". That 5 minute period is a sliding time window.

The 5 minutes are measured against "network time". The time is advanced based on the end times in the flows received. If a delay in the collection network causes flows to arrive at pipeline "late", the time window does not get skewed, because it relies on the flows themselves to advance the time.

In addition to adding the new flows to the state, evaluations remove expired state (older than the time window), ensuring unwanted, or old, data does not improperly affect the comparison to the threshold.
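
The sliding window can be pictured with the following Python sketch of a windowed sum. It is a simplified model, not pipeline's implementation: the class name is hypothetical, and treating a record as expired once its end time is a full window behind network time is an assumption.

```python
import heapq


class SlidingSum:
    """Sum of values whose flow end time lies within the time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.heap = []          # (end_time, value), oldest end time first
        self.total = 0
        self.network_time = 0   # latest flow end time seen so far

    def add(self, end_time, value):
        # Network time only moves forward, driven by flow end times.
        self.network_time = max(self.network_time, end_time)
        heapq.heappush(self.heap, (end_time, value))
        self.total += value
        # Subtract state that has aged out of the window.
        cutoff = self.network_time - self.window
        while self.heap and self.heap[0][0] <= cutoff:
            _, expired = heapq.heappop(self.heap)
            self.total -= expired
        return self.total


window = SlidingSum(300)  # 5 minutes
for end_time, byte_count in [(0, 4000), (100, 5000), (200, 2000), (350, 1000)]:
    total = window.add(end_time, byte_count)
    print(end_time, total, "ALERT" if total > 10000 else "")
```

In this run the third flow pushes the total over 10000 bytes, and the fourth flow advances network time far enough that the first flow's bytes are subtracted again.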

In an evaluation, the TIME_WINDOW command appears in a CHECK block and applies to that particular primitive. In a statistic, the TIME_WINDOW command is in the main body of the block.

Record Count

This primitive type counts the number of records that make it through the filter. It does not pull values from the records, so there is no need to specify a field in the configuration file.

This primitive uses 8 bytes for each state value kept.

Record count in a check

RECORD COUNT operator threshold

This example will send an alert if there are more than 100 records.

EVALUATION rcEval
    CHECK THRESHOLD
        RECORD COUNT > 100
    END CHECK
END EVALUATION

Record count in a statistic

Statistics do not have thresholds, and this primitive needs no field. This example will generate periodic alerts containing the number of records seen.

STATISTIC rcStat
    RECORD COUNT
END STATISTIC

Sum

This primitive pulls the value of the field specified in the configuration file from a record that passes the filter. These values are added together, and their sum is kept for evaluation. All check parameters are required for this check type.

The available fields for SUM are: BYTES, PACKETS, or DURATION.

This primitive uses 8 bytes for each state value kept.

Sum in a check

SUM field operator threshold

This example will generate an alert if the sum of BYTES is greater than 1000.

EVALUATION sumEval
    CHECK THRESHOLD
        SUM BYTES > 1000
    END CHECK
END EVALUATION

Sum in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the sum of the number of packets seen.

STATISTIC sumStat
    SUM PACKETS
END STATISTIC

Average

The AVERAGE primitive is a combination of the sum and record count primitives: it computes the sum of the named volume field and counts the number of records, such that it can compute an average volume per record.

The available fields for AVERAGE are BYTES, PACKETS, DURATION, or BYTES PER PACKET.

It uses 12 bytes for each state value kept.

Average in a check

AVERAGE field operator threshold

This example will generate an alert if the average of BYTES PER PACKET is less than 10.

EVALUATION avgEval
    CHECK THRESHOLD
        AVERAGE BYTES PER PACKET < 10
    END CHECK
END EVALUATION

Average in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the running average of the number of packets seen per flow.

STATISTIC avgStat
    AVERAGE PACKETS
END STATISTIC

Distinct

This primitive tallies the number of unique values of the specified field list that have passed the filter. All check parameters are required for this check type. An example of distinct is: "alert if there are 10 unique DIPs seen, regardless of how many times each DIP was contacted". This primitive can be used for statistics. Any number of fields can be combined to be counted in a field list, including the ANY fields, and pmap results (including pmaps using ANYs as keys).

The DISTINCT primitive is memory intensive as it keeps track of each distinct value seen and the time when that value was last seen (so that data can be properly aged). When paired with a FOREACH command, the primitive is even more expensive.
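
The memory cost follows from the bookkeeping involved, which can be sketched in Python. This is an illustration only: the class name is hypothetical, and the exact expiry rule is an assumption.

```python
class DistinctCounter:
    """Count distinct values seen within a time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_seen = {}  # value -> most recent flow end time

    def add(self, value, end_time):
        previous = self.last_seen.get(value, end_time)
        self.last_seen[value] = max(previous, end_time)
        return self.count(end_time)

    def count(self, now):
        # Age out values that were last seen a full window ago or more.
        cutoff = now - self.window
        for value in [v for v, t in self.last_seen.items() if t <= cutoff]:
            del self.last_seen[value]
        return len(self.last_seen)


ports = DistinctCounter(60)
for dport, end_time in [(80, 0), (443, 10), (80, 50), (22, 100)]:
    print(dport, end_time, ports.add(dport, end_time))
```

Every distinct value needs its own dictionary entry, and with FOREACH there would be one such dictionary per bin.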

Distinct in a check

DISTINCT field operator threshold

This example will generate an alert if more than 50 DPORTs are seen.

EVALUATION distinctEval
    CHECK THRESHOLD
        DISTINCT DPORT > 50
    END CHECK
END EVALUATION

Distinct in a statistic

Statistics do not have thresholds, so this primitive just needs a field. This example will generate periodic alerts containing the number of different {SIP, DIP} tuples seen.

STATISTIC distinctStat
    DISTINCT SIP DIP
END STATISTIC

Proportion

This primitive takes a field and a value for that field. It calculates the percentage of the flows that have that value for the specified field. This field includes the ANY fields, and pmap results.

The option of when to clear the state is automatically set to NEVER for PROPORTION.

This primitive uses 16 bytes per state value kept.
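
As a rough mental model, the percentage can be derived from two counters, one for matching flows and one for all flows. The Python sketch below illustrates this; the class name is hypothetical, and the two-counter layout is an assumption rather than a description of pipeline's internals.

```python
class Proportion:
    """Percentage of flows whose field equals a target value."""

    def __init__(self, target):
        self.target = target
        self.matching = 0  # flows with the target value
        self.total = 0     # all flows that passed the filter

    def add(self, value):
        self.total += 1
        if value == self.target:
            self.matching += 1

    def percent(self):
        if self.total == 0:
            return 0.0
        return 100.0 * self.matching / self.total


udp = Proportion(17)  # protocol 17 is UDP
for protocol in [17, 6, 6, 6]:
    udp.add(protocol)
print(udp.percent(), udp.percent() < 33)
```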

Proportion in a check

PROPORTION field fieldValue operator threshold PERCENT

This example will generate an alert if less than 33 percent of traffic is UDP.

EVALUATION propEval
    CHECK THRESHOLD
        PROPORTION PROTOCOL 17 < 33 PERCENT
    END CHECK
END EVALUATION

Proportion in a statistic

Statistics do not have thresholds, so this primitive just needs a field and a value for that field. This example will generate periodic alerts containing the percentage of traffic sent from source port 80.

STATISTIC propStat
    PROPORTION SPORT 80
END STATISTIC

Everything Passes

This primitive does not keep any state; it tells pipeline to simply output all flow records that pass the filter. It is typically used in evaluations that alert on watchlists, because the watchlist check itself is done at the filter stage.

It must be the only check used in an evaluation and cannot use FOREACH.

Because there is no state kept, running an evaluation with an EVERYTHING_PASSES primitive has an insignificant effect on the memory usage.

It should be used with ALERT EVERYTHING. Two flow files containing flows to alert on can arrive within the same second, and with the default of ALERT SINCE LAST TIME, the flows from the second file would be marked with the same timestamp as "last time", so alerts for them would not be sent.

This primitive forces some Evaluation settings by default:

Everything passes in a check

There is no state to keep, so there is no additional information needed.

EVALUATION epEval
    CHECK EVERYTHING_PASSES
    END CHECK
END EVALUATION

Everything passes in a statistic

This primitive cannot be used in a statistic. To have pipeline periodically send out the number of flows that a filter identifies, use the RECORD COUNT primitive in a statistic.

Beacon

This primitive looks for beacons using SIP, DIP, DPORT, and PROTOCOL as the unique field. If flows show up with end times spaced out at regular intervals longer than the user-specified time, the four-tuple and the record are put into an alert.

The user must provide three values: the minimum number of periodic flows that must be seen before an alert is generated, the minimum length of the interval between flows, and the tolerance for a flow not showing up exactly one interval after the previous flow.

Do not enter anything for the FOREACH field; it is set for you. The check is automatically set to never clear state upon success.

Beacon finding is very costly simply due to the number of permutations of the SIP DIP DPORT PROTOCOL tuples, and state is needed for each one.
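
To illustrate how the three user-supplied values interact, here is a minimal Python sketch of periodicity tracking for a single tuple key. It is not pipeline's algorithm: the class name, the handling of flows that arrive closer together than the minimum interval, and the way a broken run restarts are all assumptions made for the example.

```python
class BeaconTracker:
    """Flag keys whose flows recur at a near-constant interval."""

    def __init__(self, min_count, tolerance_percent, min_interval):
        self.min_count = min_count
        self.tolerance = tolerance_percent / 100.0
        self.min_interval = min_interval
        self.state = {}  # key -> (last_end_time, interval, count)

    def add(self, key, end_time):
        entry = self.state.get(key)
        if entry is None:
            self.state[key] = (end_time, None, 1)
            return False
        last_end, interval, count = entry
        gap = end_time - last_end
        if gap < self.min_interval:
            return False  # assumption: flows that are too close are ignored
        if interval is not None and abs(gap - interval) <= interval * self.tolerance:
            count += 1                # gap matches the established interval
        else:
            interval, count = gap, 2  # start a new run with this gap
        self.state[key] = (end_time, interval, count)
        return count >= self.min_count


tracker = BeaconTracker(min_count=5, tolerance_percent=5, min_interval=300)
key = ("10.0.0.5", "192.0.2.9", 443, 6)  # SIP, DIP, DPORT, PROTOCOL
for end_time in (0, 600, 1200, 1810, 2400):
    print(end_time, tracker.add(key, end_time))
```

One state entry is kept per tuple, which is why the number of distinct tuples drives the memory cost.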

Beacon in a check

CHECK BEACON
    COUNT minCount CHECK_TOLERANCE integerPercent PERCENT
    TIME_WINDOW minimumIntervalTimeVal
END CHECK

This example will look for beacons with the following characteristics: there are at least 5 flows with the same {SIP, DIP, DPORT, PROTOCOL} that arrive at a constant interval, plus or minus 5 percent, and that interval is at least 5 minutes.

EVALUATION beaconEval
    CHECK BEACON
        COUNT 5 CHECK_TOLERANCE 5 PERCENT
        TIME_WINDOW 5 MINUTES
    END CHECK
END EVALUATION

Beacon in a statistic

The Beacon primitive cannot be used in a statistic.

Ratio

This primitive calculates the ratio of outgoing to incoming bytes. There are three options for grouping the bytes using the FOREACH field like other evaluations and statistics:

The direction of the traffic can be determined one of two ways:

The threshold must at least be that outgoing > incoming.

Ratio in a check

With the requirement that integer1 > integer2:

CHECK RATIO
    OUTGOING integer1 TO integer2
    LIST name of list from beacon # optional
END CHECK

The inside of the check can be reversed with equivalent results:

CHECK RATIO
    INCOMING integer2 TO integer1
    LIST name of list from beacon # optional
END CHECK

This example will generate an alert if the outgoing to incoming ratio is greater than 10 to 1, for a pair of IPs without using a beacon list.

EVALUATION ratioEval
    FOREACH SIP DIP
    CHECK RATIO
        OUTGOING 10 TO 1
    END CHECK
END EVALUATION

This example will generate an alert if the total bytes sent by an IP is more than 10 times the number of bytes it receives, no matter who the traffic is to or from.

EVALUATION ratioEval
    FOREACH ANY IP
    CHECK RATIO
        OUTGOING 10 TO 1
    END CHECK
END EVALUATION

Ratio in a statistic

This primitive cannot be used in a statistic.

Iterative Comparison

This primitive has been removed in Version 5.3.

High Port Check

Syntax:

CHECK HIGH_PORT_CHECK
    LIST list-name
END CHECK

The HIGH_PORT_CHECK detects passive data transfer on ephemeral ports. As an example, in passive FTP, the client contacts the server on TCP port 21, and this is the control channel. The server begins listening on an ephemeral (high) port that will be used for data transfer, and the client uses an ephemeral port to contact the server's ephemeral port. Sometimes there are multiple ephemeral connections. Finally, all the connections are closed. Since flows represent many packets, typically the flow representing the traffic on port 21 is not generated until the entire FTP session is ended. As a result, the flow record for port 21 arrives after the flow records for the passive transfers.

To detect passive FTP, pipeline uses an internal list of all high port to high port five-tuples. When pipeline sees the port 21 flow record, it determines whether the IPs on that record appear in a five-tuple in the high port list. If a match is found, the traffic between the high ports is considered part of the FTP session.

When using a HIGH_PORT_CHECK check in an EVALUATION, there are several additional steps you must take:

  1. The FOREACH value must be set to the standard five tuple. The HIGH_PORT_CHECK check will set this value for you, and it will issue an error if you attempt to set it to any other value.
  2. The filter that feeds the evaluation should look for TCP traffic on port 21.
        FILTER ftp-control
            ANY_PORT == 21
            PROTOCOL == 6
        END FILTER
  3. You must create a second filter that matches traffic between ephemeral ports. For example:
        FILTER passive-ftp
            SPORT > 1024
            DPORT > 1024
            PROTOCOL == 6
        END FILTER
  4. You must create an INTERNAL_FILTER block (see Section 1.4). This block uses the filter created in the previous step, and it must specify a list over pairs of source and destination IP addresses. For example:
        INTERNAL_FILTER passive-ftp
            FILTER passive-ftp
            SIP DIP high-port-ips 90 SECONDS
        END INTERNAL_FILTER

    The list does not need to be created explicitly; the internal filter will create the list if it does not exist.
  5. In the CHECK block, specify the name of the list that is part of the INTERNAL_FILTER. For example:
        CHECK HIGH_PORT_CHECK
            LIST high-port-ips
        END CHECK

Putting that together in the EVALUATION block, you have:

    EVALUATION passive-ftp
        FILTER ftp-control
        INTERNAL_FILTER passive-ftp
        CHECK HIGH_PORT_CHECK
             LIST high-port-ips
        END CHECK
    END EVALUATION

The HIGH_PORT_CHECK check is set to always clear the state upon success. This check uses a large amount of memory as the internal list maintains state for each flow record between two ephemeral ports.

This primitive cannot be used in a statistic.

Web Redirection

This primitive has been removed in Version 5.3.

Sensor Outage

An evaluation may operate on an input file as a whole, as opposed to operating on every record. This type of evaluation is called a file evaluation. It begins with FILE_EVALUATION and the name of the file evaluation being created. It ends with END FILE_EVALUATION.

The FILE_OUTAGE check only works within a FILE_EVALUATION. It alerts if pipeline has not received an incoming flow file from the listed sensor(s) in a given period of time.

The FILE_OUTAGE check is only allowed if there is a SiLK data source.

Syntax:

    CHECK FILE_OUTAGE
        SENSOR_LIST sensor-list
        TIME_WINDOW number time-unit
    END CHECK

The TIME_WINDOW specifies the maximum amount of time to wait for a new sensor file to appear before alerting. The number can be an integer or a floating-point value. Valid time units are MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. Fractional seconds are ignored. There is no default time window, and it must be specified.

The SENSOR_LIST names the sensors that you expect will generate a new flow file more often than the specified time window. This statement must appear in a FILE_OUTAGE check. There are three forms for the statement:

Example: Alert if any of the sensors S0, S1, or S2 does not produce a flow file within two hours:

    FILE_EVALUATION
        CHECK FILE_OUTAGE
             SENSOR_LIST [S0, S1, S2]
             TIME_WINDOW 2 HOURS
        END CHECK
    END FILE_EVALUATION

Example: Alert if any sensor does not produce flow files within four hours:

    CHECK FILE_OUTAGE
        SENSOR_LIST ALL_SENSORS
        TIME_WINDOW 4 HOURS
    END CHECK

Difference Distribution

This primitive tracks the difference between subsequent values of a specified field. It uses bins, the number of which is based on the length of the field, to keep track of the distribution of those differences. An 8-bit field has 17 bins, a 16-bit field has 33 bins, a 32-bit field has 65 bins, and a 64-bit field has 129 bins. The bins themselves are 16-bit numbers.

This primitive can only be used in Statistics. It can be used with any field, and can be combined with FOREACH.

The bin chosen to increment is relative to the middle of the array of bins. If there is no difference in the value, the middle bin is incremented. Otherwise, the bin number relative to the middle is calculated as: bin number = floor(log2(difference)) + 1, so a difference of 1 falls in bin 1 and a difference of 2 or 3 falls in bin 2. If the new value is smaller than the old, a "negative" bin offset is used, because decreases in the value need to be tracked as well.

Bin Number     Difference Range
lower bins     Bigger negative differences
-4             -15 to -8
-3             -7 to -4
-2             -3 to -2
-1             -1
0              0
1              1
2              2 to 3
3              4 to 7
4              8 to 15
higher bins    Bigger positive differences
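
The bin selection can be checked with a short Python sketch. The function names are hypothetical, and the rule that an n-bit field has 2n + 1 bins is inferred from the bin counts listed for 8-, 16-, 32-, and 64-bit fields rather than stated by pipeline.

```python
def bin_offset(old_value, new_value):
    """Return the bin number relative to the middle bin."""
    difference = new_value - old_value
    if difference == 0:
        return 0
    # For a positive integer d, d.bit_length() equals floor(log2(d)) + 1.
    magnitude = abs(difference).bit_length()
    return magnitude if difference > 0 else -magnitude


def bin_count(field_bits):
    """Number of bins for a field of the given width (inferred rule)."""
    return 2 * field_bits + 1


def record_difference(bins, old_value, new_value):
    """Increment the bin for this pair of subsequent values."""
    middle = len(bins) // 2
    bins[middle + bin_offset(old_value, new_value)] += 1


bins = [0] * bin_count(16)  # for example, a 16-bit port field
ports = [80, 80, 81, 83, 68]
for old, new in zip(ports, ports[1:]):
    record_difference(bins, old, new)
    print(old, "->", new, "bin", bin_offset(old, new))
```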

DIFFERENCE DISTRIBUTION can only be used in a STATISTIC. This example will output the difference distribution of destination ports for each source IP address every hour.

    STATISTIC diffDistPorts
        DIFF DIST DPORT
        FOREACH SIP
        UPDATE 1 HOUR
    END STATISTIC

Fast Flux

This primitive sends an alert if it detects a fast flux network. It builds a connected graph of {ASN, domain name, IP address} tuples. If the connected graph has at least a certain number of ASNs, domain names, and IP addresses, based on user-defined thresholds, that graph is considered to be a fast flux network. When pipeline alerts, the alert can contain all of the elements of the fast flux network (verbose alerting, which is the default), or it can contain just the number of ASNs, domain names, and IP addresses (existence alerting).

Fast Flux works with IPv6 values as well. In that case, both the IP address field listed and the pmap referenced must be IPv6.

This evaluation can only run if there is a pmap providing IP to ASN conversions, along with domain name and resolved IP address fields in the data. The check block in the evaluation must specify the information element used for each field using the ASN (pmap file name), DNS (query name field), and IP_FIELD (resource record IP field) keywords. Each keyword also takes a threshold that is used to determine whether a connected graph is a fast flux network. The generally recommended (though not default) thresholds are 5, 5, 5.

Given the available data sources, there are two types of records that pipeline can receive that contain the right information. The easiest is a flat IPFIX record containing domain names and IP addresses (and whatever else) in the main part of the record. If this is the case, specifying the elements and their thresholds is sufficient.

If the YAF data source is used, an additional line is needed in the config file. Because pipeline needs to ensure that the IP and DNS values are pulled from the same resource record embedded in a YAF record, the keyword YAF_DNS_RECORD_TYPE must be used in addition to the other element specifiers. The two options for this are: DNS_A_RECORD for IPv4, and DNS_AAAA_RECORD for IPv6. Due to some required renaming of elements, when using fast flux with YAF records, the IP field used is either named: rrIPv4 or rrIPv6 instead of YAF’s true exporting of: sourceIPv4Address and sourceIPv6Address. This removes the confusion of referencing the sip and dip in the main part of the flow records.

Even though this primitive can’t be used with a FOREACH value, which usually prevents output lists, the IPs and/or DNS values in any discovered fast flux network can be put into named lists. This is done by adding the keyword TO after the element specification, followed by the name of the list. This list can be treated just like any other named list, despite the internal differences described below.

As time passes, these graphs continue to grow and use more memory. The individual values do not time out, so to prevent runaway memory usage, a node maximum must be provided. This is the count of unique values stored in the graph, not the number of {ASN, DNS, IP} tuples. It is specified with the NODE_MAXIMUM keyword followed by the maximum count. When the maximum is reached, all graphs are completely reset.

If the fast fluxing IP and / or DNS values are put into a list, they are not entirely removed when the NODE_MAXIMUM is reached. Each Fast Flux list has two lists underneath, one containing any active fast flux values, and one containing the values that were active the last time the node maximum was hit. This ensures that a node continually in an active fast flux network will always be in the list in spite of clearing state when NODE_MAXIMUM is hit.

Here is an example of fast flux with flat IPFIX records, with verbose alerting, and putting the DNS values into a list:

PMAP asn "pmaps/asnPmapForFastFlux.pmap"
...
EVALUATION ipfixFastFluxExample
...
    CHECK FAST FLUX
        IP_FIELD sourceIPv4Address 5
        ASN asn 5
        DNS dnsQName 5 TO myFastFluxDNSs
        NODE_MAXIMUM 250000
        VERBOSE ALERTS #not required, defaults to verbose
    END CHECK
...
END EVALUATION

Here is an example of fast flux with YAF records, with existence alerting, and putting the IP values into a list:

PMAP asn "pmaps/asnPmapForFastFlux.pmap"
...
EVALUATION yafFastFluxExample
...
    CHECK FAST FLUX
        YAF_DNS_RECORD_TYPE DNS_A_RECORD
        IP_FIELD rrIPv4 5 TO fastFluxYafIPs
        ASN asn 5
        DNS dnsQName 5
        NODE_MAXIMUM 250000
        EXISTENCE ALERTS
    END CHECK
...
END EVALUATION

Fast flux alerts can be very different depending on whether Snarf or the legacy alert files are used, and on whether the evaluation is using existence alerts or verbose alerts. When the fast flux evaluation gets to the alerting stage, there could be multiple networks to alert on.

When using existence alerts, Pipeline will only report the number of IPs, ASNs, and DNS names that make up the fast flux network, not the contents. In the legacy auxiliary alert log file, there will be a separate alert for each fast flux network discovered. The metric value in the alert will be a three-tuple of the IP, ASN, and DNS counts. When using Snarf, all discovered fast flux networks will be listed in a single alert. There will be three metric fields labelled pipeline.metric.value.ipcount, pipeline.metric.value.asncount, and pipeline.metric.value.dnscount. If there are multiple networks alerted on in this single alert, the counts will be listed as parallel arrays, with the first element of each array describing one network, the second elements describing the next, and so on.

When using verbose alerts, the contents of the fast flux networks are included in the alert. The differences between the alerting mechanisms are the same as in the existence case, with Snarf using three parallel arrays for values, and the legacy auxiliary alert file having a separate entry for each network. The number of values in a particular list is given in parentheses before the curly-bracketed list of values for parsing purposes.

Persistence

The PERSISTENCE primitive alerts if the tuple specified using FOREACH is present in the traffic for a specified number of consecutive time bins. Those bins can either be HOURS or DAYS. Unlike all other timestamps in Pipeline, 2 DAYS is not the same as 48 HOURS for this primitive. 2 DAYS creates two bins and the primitive checks to see whether traffic appears in each day's bin. 48 HOURS creates 48 bins and the primitive checks to see whether traffic appears in each hour's bin.

FOREACH must be used with this primitive.

This primitive can only be used in EVALUATIONs.

This does not alert as soon as it finds a tuple present for all of the consecutive time bins. This is the first primitive that sends a summary alert at the end of every time bin. If the primitive is tracking consecutive HOURS, an alert will be sent at the start of every new hour where there were tuples present in the previous number of specified time bins.
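
The bin-based presence tracking can be modeled with the Python sketch below. It is a simplified illustration, not pipeline's implementation: the class name is hypothetical, and aligning bins to multiples of 3600 or 86400 seconds is an assumption.

```python
from collections import defaultdict

BIN_SECONDS = {"HOURS": 3600, "DAYS": 86400}


class Persistence:
    """Track which time bins each FOREACH key appeared in."""

    def __init__(self, consecutive, unit):
        self.consecutive = consecutive
        self.bin_size = BIN_SECONDS[unit]
        self.seen = defaultdict(set)  # key -> set of bin indices

    def add(self, key, timestamp):
        self.seen[key].add(int(timestamp // self.bin_size))

    def persistent_keys(self, closing_bin):
        """Keys present in every one of the last N bins up to closing_bin."""
        first = closing_bin - self.consecutive + 1
        needed = set(range(first, closing_bin + 1))
        return sorted(k for k, bins in self.seen.items() if needed <= bins)


tracker = Persistence(3, "HOURS")
for hour in (0, 1, 2):
    tracker.add("10.0.0.1", hour * 3600 + 5)  # present in every hour
tracker.add("10.0.0.2", 0)
tracker.add("10.0.0.2", 2 * 3600)             # missing from hour 1
print(tracker.persistent_keys(closing_bin=2))
```

The summary at the end of a bin corresponds to calling persistent_keys once for the bin that just closed, rather than checking on every flow.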

The primitive also needs to be told where to get the time value for each flow, to know which time bin to record the presence for. There are two options for the source of this time field:

Syntax:

Alert if an IP address appears as a SIP in 12 consecutive hours using the end time of each flow for the time value.

FOREACH SIP
CHECK PERSISTENCE
    12 CONSECUTIVE HOURS
    USING ETIME
END CHECK

Alert if an SPORT, PROTOCOL tuple appears in 4 consecutive days using Pipeline time for the time value.

FOREACH SPORT PROTOCOL
CHECK PERSISTENCE
    4 CONSECUTIVE DAYS
    USING PIPELINE TIME
END CHECK