In order to support inspection of every SiLK flow record as the records are created, the NetSA group developed the Analysis Pipeline. The Analysis Pipeline supports many analyses, including:
Although the Analysis Pipeline application, pipeline, can simply be run from the command line, it is intended to be run as a daemon as part of the collection and packing process, where it processes every SiLK flow record created by rwflowpack just as the records enter the SiLK data repository. (For information on installing pipeline, see Section 3.)
There are three stages to the Analysis Pipeline: filtering, evaluations and statistics, and alerting.
To assist in entering data and sharing data among multiple filters, the Analysis Pipeline allows the administrator to create a list. A list can reference an existing SiLK IPset file, contain values entered directly into the configuration file, or be created by a mechanism inside pipeline itself.
Filters, evaluations, statistics, and lists are all built independently of each other, with each having a unique name. They are linked together using configuration keywords and their unique names.
Any number of evaluations and statistics can receive the records held by each filter. However, each evaluation or statistic can have only one filter providing flow records to it.
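As a sketch, the linkage looks like the following minimal configuration (the names and the threshold are illustrative; the keywords follow the syntax described in later sections):

FILTER watchlistTraffic
SIP IN LIST "/data/watchlist.set"
END FILTER

EVALUATION watchlistVolume
FILTER watchlistTraffic
SEVERITY 3
CHECK THRESHOLD
SUM BYTES > 1000000
END CHECK
CLEAR ALWAYS
OUTPUT TIMEOUT 1 DAY
END EVALUATION

The evaluation names the filter it consumes; the filter itself never names its consumers.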
An additional concept in the Analysis Pipeline is the internal filter. Internal filters can be used to store intermediate successes based on flow records and filters. These "successes" are not significant enough to yield individual alerts, but can shape future processing. Internal filters are used when the analysis is more complex than simply passing a filter and transferring data to evaluations, and they allow for multistage analysis: flow record A met criteria, and future flow records will be combined with record A for more in-depth analysis.
All fields in a SiLK flow record can be used to filter data, along with some derived fields. Currently pipeline only supports IPv4 addresses. The field names in bold below are the proper syntax for referencing the fields in the configuration file.
Fields in the list below can be combined into tuples, e.g. {SIP, DIP}, for more advanced analysis. These tuples are represented in the configuration file by listing the fields with spaces between them. When processed, they are sorted internally, so SIP DIP SPORT is the same as SPORT DIP SIP.
IP addresses and ports have directionality: source and destination. The keyword ANY can be used to indicate that the direction does not matter and that both values are to be tried (this can only be used when filtering). The ANY_* fields can go anywhere inside the field list; the only restrictions are that ANY must immediately precede IP, PORT, IP_PAIR, or PORT_PAIR, and that there can be only one ANY in a field list. The available fields are:
ANY_IP | Either the source address or the destination address |
ANY_IP_PAIR | Either the {SIP, DIP} tuple or the {DIP, SIP} tuple |
ANY_PORT | Either the source port or the destination port |
ANY_PORT_PAIR | Either the {SPORT, DPORT} tuple or the {DPORT, SPORT} tuple |
APPLICATION | The service port of the record as set by the flow generator, if the generator supports it, or 0 otherwise. For example, this would be 80 if the flow generator recognizes the packets as being part of an HTTP session |
ATTRIBUTES | Any combination of the letters F, T, or C where, per the SiLK attributes field: F indicates the generator saw additional packets after the FIN packet, T indicates the flow was terminated by the generator's active timeout, and C indicates the flow is a continuation of a flow that hit the active timeout |
BYTES | The count of the number of bytes in the flow record |
BYTES PER PACKET | An integer division of the bytes field by the packets field. It is a 32-bit number. The value is 0 if there are no packets |
CLASSNAME | The class name assigned to the record. Classes are defined in the silk.conf file |
DIP | The destination IP address |
DPORT | The destination port |
DURATION | The duration of the flow record, in integer seconds. This is the difference between ETIME and STIME |
ETIME | The wall clock time when the flow generator closed the flow record |
FLAGS | The union of the TCP flags on every packet that comprises the flow record. The value can contain any of the letters F, S, R, P, A, U, E, and C. (To match records with either ACK or SYN|ACK set, use the IN_LIST operator.) The flags formatting used by SiLK can also be used to specify a set of flag values: S/SA means that only SYN and ACK are examined, and of those, only SYN is set. The original way pipeline accepted flags values, the raw specification of a flags permutation, is still allowed. |
FLOW RECORD | This field references the entire flow record, and can only be used when checking the flow record against multiple filters using IN LIST (see below) |
ICMPCODE | The ICMP code. This test also adds a comparison that the protocol is 1. |
ICMPTYPE | The ICMP type. This test also adds a comparison that the protocol is 1. |
INITFLAGS | The TCP flags on the first packet of the flow record. See FLAGS. |
INPUT | The SNMP interface where the flow record entered the router. This is often 0 as SiLK does not normally store this value. |
NHIP | The next-hop IP of the flow record as set by the router. This is often 0.0.0.0 as SiLK does not normally store this value. |
OUTPUT | The SNMP interface where the flow record exited the router. This is often 0 as SiLK does not normally store this value. |
PACKETS | The count of the number of packets. |
PMAP | See pmap section for details |
PROTOCOL | The IP protocol. This is an integer, e.g. 6 is TCP |
SENSOR | The sensor name assigned to the record. Sensors are defined in the silk.conf file. |
SESSIONFLAGS | The union of the TCP flags on the second through final packets that comprise the flow record. See FLAGS |
SIP | The source IP address |
SPORT | The source port |
STIME | The wall clock time when the flow generator opened the flow record |
TYPENAME | The type name assigned to the record. Types are defined in the silk.conf file. |
Prefix maps (pmaps) are part of the SiLK tool suite and can be made using rwpmapbuild. Their output can be used just like any other field in pipeline: it can make up part of a tuple, be used in FOREACH, and be used in filtering. One caveat: when a pmap makes up part of a tuple in a field list, the pmap must be listed first in the list for proper parsing. However, when referencing pmap values in a typeable tuple, it must go at the end. Pmaps take either an IP address or a {PROTOCOL, PORT} pair as input.
Using a PMAP in Pipeline is a two stage process in the configuration file. The first step is to declare the pmap. This links a user-defined field name to a pmap file, with the name in quotes. This field name will be used in alerts to reference the field, and in the rest of the configuration file to reference the pmap.
The declaration line is not part of a FILTER or EVALUATION, so it stands by itself, similar to the INCLUDE statements. The declaration line starts with the keyword PMAP, followed by a string for the name without spaces, and lastly the filename in quotes.
PMAP userDefinedFieldName "pmapFilename"
Now that the PMAP is declared, the field name can be used throughout the file. Each time the field is used, the input to the pmap must be provided. This allows different inputs to be used throughout the file, without redeclaring the pmap.
userDefinedFieldName(inputFieldList)
For each type of pmap, there is a fixed list of inputFieldLists:
Below is an example that declares a pmap, then filters based on the result of the pmap on the SIP, then counts records per pmap result on the DIP.
PMAP thePmapField "myPmapFile.pmap"
FILTER onPmap
thePmapField(SIP) == theString
END FILTER
STATISTIC countRecords
FILTER onPmap
FOREACH thePmapField(DIP)
RECORD COUNT
END STATISTIC
Field booleans are custom fields that consist of an existing field and a list of values. If the value for the field is in the value list, then the field boolean’s value is TRUE. These are defined similar to PMAPs, but use the keyword FIELD BOOLEAN. For example, to define a boolean named webPorts, to mean the source port is one of [80, 8080]:
FIELD BOOLEAN sourceTransportPort webPorts IN [80, 8080]
Now, webPorts is a field that can be used anywhere in the configuration file that checks whether the sourceTransportPort is in [80, 8080].
If used in filtering, this is the same as saying: sourceTransportPort IN LIST [80, 8080].
However, if used as part of FOREACH, the value TRUE or FALSE will be in the field list, indicating whether the sourceTransportPort is in [80, 8080].
Another example could be a boolean to check whether the hour of the day, derived from a timestamp, is part of the work day. There could be a statistic constructed to report byte counts binned by whether the hour is in the workday, which is 8am to 5pm in this example.
FIELD BOOLEAN HOUR_OF_DAY(flowStartSeconds) workday IN [8,9,10,11,12,13,14,15,16,17]
STATISTIC workdayByteCounts
FOREACH workday
SUM octetTotalCount
END STATISTIC
These derived fields pull human-readable values out of timestamps. The values they pull are just integers, but in filters pipeline can accept the words associated with those values; e.g., JANUARY is translated to 0, as is SUNDAY. These fields work with the field types DATETIME_SECONDS, DATETIME_MILLISECONDS, DATETIME_MICROSECONDS, and DATETIME_NANOSECONDS. Each will be converted to the appropriate units for processing. The system's timezone is used to calculate the HOUR value.
The field to be operated on is put in parentheses after the derived field name.
These fields can be used anywhere in a pipeline configuration file like any other field.
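For example, a filter that keeps only records whose start hour is noon, using the HOUR_OF_DAY derived field shown earlier (the filter name is illustrative):

FILTER noonFlows
HOUR_OF_DAY(flowStartSeconds) == 12
END FILTER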
The Analysis Pipeline passes each flow record through each filter to determine whether the record should be passed on to an evaluation or statistic. There can be any number of filters, and each runs independently. As a result, each filter sees every flow record, and keeps its own list of flows that meet its criteria.
A filter block starts with the FILTER keyword followed by the name of the filter, and it ends with the END FILTER statement. The filter name must be unique across all filters. The filter name is referenced by evaluations, internal filters, and statistics.
Filters are initially marked internally as inactive, and become active when an evaluation or statistic references them.
Filters are composed of comparisons. In the filter block, each comparison appears on a line by itself. If all comparisons in a filter return a match or success, the flow record is sent to the evaluation(s) and/or statistic(s) that use the records from that filter.
If there are no comparisons in a filter, the filter reports success for every record.
Each comparison is made up of three elements: a field, an operator, and a compare value, for example BYTES > 40. A comparison is considered a match for a record if the expression created by replacing the field name with the field's value is true.
Eight operators are supported. The operator determines the form that the compare value takes.
The name of a list that is filled by the outputs of an evaluation or an internal filter. This is the only place in pipeline filters where tuples can be used. The tuple in the filter must entirely match the tuple used to fill the list.
SIP DIP PROTO SPORT DPORT IN LIST createdListOfFiveTuples
The filename of the IPset file is given in quotation marks as the compare value. When pipeline is running as a daemon, the full path to the IPset file must be used. This can only be used with IP addresses.
SIP IN LIST "/data/myIPSet.set"
DPORT IN_LIST [21, 22, 80]
fieldList IN LIST "/path/to/watchlist.file"
If the fieldList consists of one field and if it is of type IPV4_ADDRESS or IPV6_ADDRESS, the file MUST be a SiLK IPSet. A fieldList of just an IP cannot be any of the types described below.
A file can be used to house either type of bracketed list described above, both the single and double bracketed lists. The contents must be formatted exactly as if they were typed directly into the configuration file; a user should be able to copy and paste between files in this format and the configuration file in either direction. The bracketed list must be a single line (there cannot be any newline characters in the list) and must have a newline at the end.
If the fieldList consists of a single field, a simple watchlist file can be used to hold the values. This format requires one value per line. The format of each value type is the same as if it was typed into the configuration file. Comments can be used in the file by setting the first character of the line to "#". The value in the field being compared against the watchlist must be an exact match to an entry in the file for the comparison to be true.
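A sketch of such a single-field watchlist file for ports, and a comparison consuming it (the port values and filename are illustrative):

# ports associated with remote administration
22
3389
5900

DPORT IN LIST "/data/adminPorts.txt"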
If there is a single field in the fieldList, and if that is an IP address, this bracketed list can contain IPSet files mixed with IP addresses that will all be combined for the filter:
SIP IN LIST ["/data/firstIPset.set", 192.168.0.0/16, "/data/secondIPset.set"]
An example is filtering for SIP 1.1.1.1 with SPORT 80, and 2.2.2.2 with SPORT 443:
FILTER sipSportPair
SIP SPORT IN LIST [[1.1.1.1,80],
[2.2.2.2,443]]
END FILTER
For example, to do TCP sport 80 OR UDP dport 23:
FILTER tcp80
SPORT == 80
PROTOCOL == 6
END FILTER
FILTER udp23
DPORT == 23
PROTOCOL == 17
END FILTER
FILTER filterUsingTcp80OrUdp23
FLOW RECORD IN LIST [tcp80,udp23]
END FILTER
== Succeeds when the value from the record is equal to the compare value. This also encompasses IPv4 subnets. For example, the following will succeed if either the source or destination IP address is in the 192.168.x.x subnet:
ANY_IP == 192.168.0.0/16
The compare value can reference another field on the flow record. For example, to check whether the source and destination port are the same, use: SPORT == DPORT
There are two places where named lists can be created and populated so they can be used by filters: internal filters and output lists (the latter are discussed in the evaluation specifics section).
In each case, a field list is used to store the tuple that describes the contents of the data in the list. A filter can use these lists if the tuple used in the filter perfectly matches the tuple used to make the list.
An internal filter compares the incoming flow record against an existing filter, and if it passes, it takes some subset of fields from that record and places them into a named list. This list can be used in other filters. There can be any number of these lists.
Internal filters are different from output lists because they put data into their list(s) immediately, so the contents of the list(s) can be used for records in the same flow file as the one that caused data to be put into the list(s). Output lists, populated by evaluations, are only filled, and thus take effect, for subsequent flow files.
Internal filters are immediate reactions to encountering a notable flow record.
The fields to be pulled from the record and put into the list can be combined into any tuple. These include the ANY fields and the outputs of pmaps. The "WEB_REDIR" fields cannot be used here. Details on how to create an internal filter specifically for the WEB_REDIRECTION or HIGH_PORT_CHECK primitives are discussed below.
An internal filter is a combination of filters and lists, so both pieces need to be specified in the syntax. A key aspect of the internal filter declaration is telling it which fields, pulled from records that pass the filter, get put into which list. There can be more than one field-list combination per internal filter.
It is recommended that a timeout value be added to each statement, declaring the length of time a value can be considered valid; however, it is no longer required. To build a list from an internal filter without a timeout, leave the timeout portion of the configuration file blank.
Syntax
INTERNAL_FILTER nameOfThisInternalFilter
FILTER nameOfFilterToUse
fieldList listName timeout
END INTERNAL_FILTER
Example, given an existing filter that finds records to or from a watchlist:
INTERNAL_FILTER watchlistInfo
FILTER watchlistRecords
SPORT DPORT watchlistPorts 1 HOUR
SIP DIP SPORT DPORT PROTOCOL watchlistFiveTuples 1 DAY
END INTERNAL_FILTER
This internal filter pulls {SPORT, DPORT} tuples from flows that pass the filter watchlistRecords and puts them into a list called watchlistPorts, where the values stay for 1 hour. It also pulls the entire five-tuple from those records and puts them into a list called watchlistFiveTuples, where they stay for 1 day.
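A later filter can then consume one of these lists, remembering that its tuple must exactly match the tuple used to fill the list (the filter name is illustrative):

FILTER previouslySeenPortPairs
SPORT DPORT IN LIST watchlistPorts
END FILTER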
WEB_REDIRECTION and HIGH_PORT_CHECK require the use of internal filters, as they scan for flow records to compare against that can be in the same flow file. The field list for each of these lists is a keyword that, in addition to indicating the fields to be stored, tells pipeline how to store them. The keywords are WEB_REDIR_LIST and HIGH_PORT_LIST, respectively.
Available operators to compare state values with thresholds include: <, <=, >, >=, and !=.
RECORD COUNT
Syntax: CHECK THRESHOLD RECORD COUNT oper threshold END CHECK
Count of flows seen by the primitive.

SUM
Syntax: CHECK THRESHOLD SUM field oper threshold END CHECK
Sum of the values for the given field.

AVERAGE
Syntax: CHECK THRESHOLD AVERAGE field oper threshold END CHECK
Average of the values for the given field.

DISTINCT
Syntax: CHECK THRESHOLD DISTINCT field oper threshold END CHECK
Count of distinct values for the given field. The field can be a field list, to count distinct tuples.

PROPORTION
Syntax: CHECK THRESHOLD PROPORTION field value oper threshold END CHECK
Proportion of flows seen with the given value for the given field.

EVERYTHING PASSES
Syntax: CHECK EVERYTHING PASSES END CHECK
Alert on every flow.

BEACON
Syntax: CHECK BEACON COUNT minCount CHECK TOLERANCE int PERCENT TIME WINDOW minIntervalTimeVal END CHECK
Finite state beacon detection.

RATIO
Syntax: CHECK RATIO OUTGOING integer1 TO integer2 LIST nameOfListFromBeacon # optional END CHECK
Detect IP pairs with more outgoing than incoming traffic.

ITERATIVE COMPARISON
This primitive has been removed in Version 4.5.

HIGH PORT CHECK
Syntax: CHECK HIGH_PORT_CHECK LIST listName END CHECK
Look for passive traffic.

WEB REDIRECTION
This primitive has been removed in Version 4.5.

SENSOR OUTAGE
Syntax: CHECK FILE_OUTAGE SENSOR_LIST [list of sensor names] TIME_WINDOW time units END CHECK
Alert if a sensor stops sending flows.

DIFFERENCE DISTRIBUTION
Syntax: STATISTIC diffDistExample DIFF DIST field END STATISTIC
Output a difference distribution (statistic only).
Evaluations and statistics comprise the second stage of the Analysis Pipeline. Each evaluation and statistic specifies the name of a filter which feeds records to the evaluation or statistic. Specific values are pulled from those flow records, aggregate state is accumulated, and when certain criteria are met alerts are produced.
To calculate and aggregate state from the filtered flow records, pipeline uses a concept called a primitive.
Evaluations are based on a list of checks that have primitives embedded in them. The aggregate state of the primitive is compared to the user defined threshold value and alerts are generated.
Statistics use exactly one primitive to aggregate state. The statistic periodically exports all of the state as specified by a user-defined interval.
New in version 4.2: if a statistic utilizes FOREACH and the state for a particular unique value bin is empty, that value will not be included in an alert for the statistic. A statistic without FOREACH will output the state value regardless.
An evaluation block begins with the keyword EVALUATION followed by the evaluation name. Its completion is indicated by END EVALUATION.
Similarly, a statistic block begins with the keyword STATISTIC and the statistic's name; the END STATISTIC statement closes the block.
The remainder of this section describes the settings that evaluations and statistics have in common, and the keywords they share. A description of primitives will hopefully make the details of evaluations and statistics easier to follow.
Each of the following commands (except ID) go on their own line.
Each evaluation and statistic must have a unique string identifier. It can have letters (upper and lower case) and numbers, but no spaces. It is placed immediately following the EVALUATION or STATISTIC declaration:
EVALUATION myUniqueEvaluationName
...
END EVALUATION
STATISTIC myUniqueStatisticName
...
END STATISTIC
The ALERT TYPE is an arbitrary, user-defined string. It can be used as a general category to help when grouping or sorting the alerts. If no alert type is specified, the default alert type is Evaluation for evaluations and Statistic for statistics. The value for the alert type does not affect pipeline processing.
Syntax:
ALERT TYPE alert-type-string
Evaluations and statistics must be assigned a severity level which is included in the alerts they generate. The levels are represented by integers from 1 to 255. The severity has no meaning to the Analysis Pipeline; the value is simply recorded in the alert. The value for the severity does not affect pipeline processing. This field is required.
Syntax:
SEVERITY integer
Evaluations and statistics (but not file evaluations) need to be attached to a filter, which provides them flow records to analyze. Each can have one and only one filter. The filter's name links the evaluation or statistic with the filter. As a result, the filter must be created prior to creating the evaluation or statistic.
Syntax:
FILTER filter-name
Evaluations and statistics can compute aggregate values across all flow records, or they can aggregate values separately for each distinct value of particular field(s) on the flow records, grouping or "binning" the flow records by those field(s). An example of this latter approach is computing something per distinct source address.
FOREACH is used to isolate a value (e.g., a malicious IP address) or a notable tuple (e.g., a suspicious port pair). The unique field value that caused an evaluation to alert will be included in any alerts. Using FOREACH in a statistic causes the state for every unique field value to be sent out in the periodic update.
The default is not to separate the data for each distinct value. The field that is used as the key for the bins is referred to as the unique field, and is declared in the configuration file for the FOREACH command, followed by the field name:
FOREACH field
Any of the fields can be combined into a tuple, with spaces between the individual field names. The more fields included in this list, the more memory the underlying primitives need to keep all of the required state data.
The ANY_IP and ANY_PORT constructs can be used here to build state (say, a sum of bytes) for both IPs or ports in the flow. The point is to build state for an IP or port whenever it appears, regardless of whether it is the source or the destination. When referencing the IP or port value to build an output list, use SIP or SPORT as the field to put in the list.
Pmaps can also be used to bin state. The state is binned by the output of the pmap. Pmaps can also be combined with other fields to build more complex tuples for binning state, such as pmap(SIP) PROTOCOL
To keep state per source IP Address:
FOREACH SIP
To keep state per port pair:
FOREACH SPORT DPORT
To keep state for both ips:
FOREACH ANY_IP
As with filtering, the ordering of the fields in the tuple does not matter as they are sorted internally.
There are some limits on which fields can be used, as some evaluations require that a particular field be used, and some primitives do not support binning by a field.
File evaluations do not handle records, so the FOREACH statement is illegal.
By default, evaluations and statistics are marked as active when they are defined. Specifying the INACTIVE statement in the evaluation or statistic block causes the evaluation or statistic to be created, but it is marked inactive, and it will not be used in processing records. For consistency, there is also an ACTIVE statement which is never really needed.
Syntax:
INACTIVE
This section provides evaluation-specific details, building on the evaluation introduction and aggregate function description provided in the previous two sections.
Each evaluation block must contain one or more check blocks. The evaluation sends each flow record it receives to each check block where the records are aggregated and tests are run. If every check block test returns a true value, the evaluation produces an output entry which may become part of an alert.
Evaluations have many settings that can be configured, including the output and alerting stages, in addition to evaluation sanity checks.
In an evaluation, the check block begins with the CHECK statement which takes as a parameter the type of check. The block ends with the END CHECK statement. If the check requires any additional settings, those settings are put between the CHECK and END CHECK statements, which are laid out in the primitives section.
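As a sketch, an evaluation wrapping a single threshold check might look like the following, reusing the tcp80 filter from the filtering examples (the evaluation name, severity, and threshold are illustrative):

EVALUATION largeWebTransfers
FILTER tcp80
SEVERITY 5
CHECK THRESHOLD
SUM BYTES > 1000000
END CHECK
CLEAR ALWAYS
OUTPUT TIMEOUT 1 HOUR
END EVALUATION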
The FILE_OUTAGE check must be part of a FILE_EVALUATION block. All other checks must be part of a EVALUATION block.
When an evaluation threshold is met, the evaluation creates an output entry. The output entry may become part of an alert, or it may be used to affect other settings in the pipeline.
Output Timeouts
All information contained in alerts is pulled from lists of output entries from evaluations. These output entries can be configured to time out both to conserve memory and to ensure that information contained in alerts is fresh enough to provide value. The different ways to configure the alerting stage are discussed below.
One way to configure alerting is to limit the number of times alerts can be sent in a time window. This is a place where the output timeout can have a major effect. If alerts are only sent once a day, but outputs time out after one hour, then only the outputs generated in the hour before alerting will be eligible to be included in alerts.
When FOREACH is not used, output entries are little more than flow records with attached threshold information. When FOREACH is used, they contain the unique field value that caused the evaluation to return true. Each time this unique field value triggers the evaluation, the timestamp for that value is reset and the timeout clock begins again.
Take the example of an evaluation doing network profiling to identify servers. If the output timeout is set to 1 day, the list of output entries will contain all IP addresses that have acted like a server in the last day. As long as a given IP address is acting like a server, it remains in the output list and is available to be included in an alert, or put into a named output list as described in the output list section.
Syntax:
OUTPUT TIMEOUT timeval
OUTPUT TIMEOUT 1 DAY
Shared Output Lists
When FOREACH is used with an evaluation, any value in an output entry can be put into a named output list. If the unique field is a tuple made up of multiple fields, any subset of those fields can be put into a list. There can be any number of these lists. A timeout value is not provided for each list as the OUTPUT TIMEOUT value is used. When an output entry times out, the value, or subset of that tuple is removed from all output lists that contain it.
These lists can be referenced by filters, or configured separately, as described in the list configuration section.
To create a list, a field list of what the output list will contain must be provided. A unique name for this list must be provided as well.
Syntax:
OUTPUT LIST fieldList listName
If using FOREACH SIP DIP, each of the following lists can be created:
OUTPUT LIST SIP listOfSips
OUTPUT LIST DIP listOfDips
OUTPUT LIST SIP DIP listOfIPPairs
Alert on Removal
If FOREACH is used, pipeline can be configured to send an alert when an output entry has timed out of the output entries list.
Syntax:
ALERT ON REMOVAL
Clearing State
Once the evaluation’s state has hit the threshold and an output entry has been generated, you may desire to reset the current state of the evaluation. For example, if the evaluation alerts when a count of something gets to 1000, you might want to reset the count to start at 0 again. Using CLEAR ALWAYS can give a more accurate measure of timeliness, and is likely to be faster.
To set when to clear state, simply type one of the following in the body of the evaluation:
CLEAR ALWAYS
CLEAR NEVER
This field is required as of v4.3.1; previously, CLEAR NEVER was the default.
Too Many Outputs
There are sanity checks that can be put in place to turn off evaluations that are finding more outputs than expected, which could happen with a poorly designed evaluation or analysis. For example, an evaluation looking for web servers may be expected to find fewer than 100, so a sanity threshold of 1000 would indicate a large number of unexpected results, and the evaluation should be shut down so as not to take up too much memory or flood alerts.
Evaluations that hit the threshold can be shut down permanently, or put to sleep for a specified period of time and then turned back on. If an evaluation is shut down temporarily, all state is cleared and memory is freed, and it restarts as if pipeline had just begun processing.
Syntax:
SHUTDOWN MORE THAN integer OUTPUTS [FOR timeval]
Examples that shut down the evaluation if there are more than 1000 outputs. The first shuts it down forever; the second shuts it down for 1 day, after which it starts over:
SHUTDOWN MORE THAN 1000 OUTPUTS
SHUTDOWN MORE THAN 1000 OUTPUTS FOR 1 DAY
Alerting is the final stage of the Analysis Pipeline. When the evaluation stage is finished, and output entries are created, alerts can be sent. The contents of all alerts come from these output entries. These alerts provide information for a user to take action and/or monitor events. The alerting stage in pipeline can be configured with how often to send alerts and how much to include in the alerts.
Based on the configurations outlined below, the alerting stage first determines if it is permitted to send alerts, then it decides which output entries can be packaged up into alerts.
How often to send alerts
Just because there are output entries produced by an evaluation does not mean that alerts will be sent. An evaluation can be configured to send a batch of alerts only once an hour, or two batches per day. The first thing the alerting stage does is check when the last batch of alerts was sent, and determine whether sending a new batch meets the restrictions placed by the user in the configuration file.
If it determines that alerts can be sent, it builds an alert for each output entry, unless further restricted by the settings described below that affect how much to alert.
Syntax:
ALERT integer-count TIMES timeVal
This configuration option does not affect the number of alerts sent per time period; it affects the number of times batches of alerts can be sent per time period. That is why the configuration command says "alert N times per time period" rather than "send N alerts per time period". While the semantic difference is subtle, it has a great effect on what gets sent out.
To have pipeline send only 1 batch of alerts per hour, use:
ALERT 1 TIMES 1 HOUR
To indicate that pipeline should alert every time there are output entries for alerts, use:
ALERT ALWAYS
How much to alert
The second alert setting determines how much information to send in each alert. You may wish to receive different amounts of data depending on the type of evaluation and how often it reports. Consider these examples:
The amount of data to send in an alert is relevant only when the OUTPUT_TIMEOUT statement includes a non-zero timeout and multiple alerts are generated within that time window.
To specify how much to send in an alert, specify the ALERT keyword followed by one of the following:
The default is SINCE LAST TIME. If using an EVERYTHING PASSES evaluation, be sure to use ALERT EVERYTHING to ensure flows from files that arrive with less than a second between them are included in alerts.
The last option is to have an evaluation do its work but never send alerts. If the goal of an evaluation is just to fill a list so other pieces of pipeline can use the results, individual alerts may not be necessary. Another case is when the desired outcome of filling these lists is that the lists themselves send alerts periodically, making individual alerts for each entry undesirable. In these cases, instead of the options described above, use:
DO NOT ALERT
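For illustration, a hedged sketch of an evaluation block that fills a list without alerting. The evaluation and filter names here are hypothetical placeholders, and the middle line stands in for whatever CHECK and output-list statements the analysis actually requires:

EVALUATION watchlistFiller
    FILTER suspiciousTraffic
    checks and output-list statements for the analysis
    DO NOT ALERT
END EVALUATION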
The Evals and Stats section introduced the Analysis Pipeline concept of a statistic and described the settings that statistics share with evaluations. A statistic receives flow records from a filter, computes an aggregate value, and periodically reports that value.
There are two time values that affect statistics: how often to report the statistics, and the length of the time-window used when computing the statistics. The following example reports the statistics every 10 minutes using the last 20 minutes of data to compute the statistic:
UPDATE 10 MINUTES
TIME_WINDOW 20 MINUTES
Statistics support the aggregation functions from the primitives section. Unlike an evaluation, a statistic is simply reporting the function's value, and neither the CHECK statement nor a threshold value are used. Instead, the statistic lists the primitive and any parameters it requires.
Simple examples are:
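As an illustrative sketch, a statistic block that reports total bytes might look like the following. The statistic name and filter name are hypothetical placeholders, and SUM BYTES stands in for whichever primitive from the primitives section the analysis needs:

STATISTIC webByteCount
    FILTER webTraffic
    SUM BYTES
    UPDATE 10 MINUTES
    TIME_WINDOW 20 MINUTES
END STATISTIC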
Statistics send alerts after the specified time period has elapsed. One exception is when pipeline is processing a list of files using --named-files and there is only a single file in that list. In this case, a statistic will send an alert for testing and summary purposes, even though technically no time has passed.
Named lists created by internal filters and evaluations can be given extra configuration such that they are responsible for sending updates and alerts independently of, or in lieu of, the mechanism that populates them. If there is a list configuration block, there does not need to be an evaluation block for the configuration file to be accepted. As long as something in pipeline generates alerts, it will run.
Lists created by internal filters have their own timeouts, so they are responsible for removing outdated elements on their own. Lists populated by evaluations rely on the evaluation to track the timing out of values: the evaluation tells the list when to remove a value, so those lists know nothing of the timeouts. As a result, for efficiency reasons, some of the alerting functionality described below is not available for lists created and maintained by internal filters. It is explicitly stated which features cannot be used.
This extra configuration is enclosed in a LIST CONFIGURATION block, similar to other pipeline mechanisms. The list to be configured must already have been declared before the configuration block.
High Level Syntax:
LIST CONFIGURATION existingListName
    options discussed below
END LIST CONFIGURATION
Alerts sent due to a list configuration come from the lists, and have their own timestamps and state kept about their alerts. They are not subject to the alerting restrictions imposed on the evaluations that populate the list.
The full contents of the list can be packaged into one alert, periodically:
UPDATE timeval
This will send out the entire list every 12 hours:
UPDATE 12 HOURS
An alert can also be sent when the number of elements in the list meets a certain threshold. While the list's contents may be important, and can be configured to be sent periodically, knowing that the count has risen above a threshold can be more time sensitive.
ALERT MORE THAN elementThreshold ELEMENTS
This alert will only be sent the first time the number of elements crosses the threshold. A reset threshold can also be given: if the number of elements drops below that value, pipeline will once again be allowed to send an alert when the number of elements exceeds the alert threshold. No alert is sent upon going below the reset threshold, and the elements in the list are not reset by it either.
ALERT MORE THAN elementThreshold ELEMENTS RESET AT resetThreshold
This example will send an alert if there are more than 10 elements in the list. No more alerts will be sent unless the number of elements drops below 5, after which pipeline will alert again if the number of elements goes above 10.
ALERT MORE THAN 10 ELEMENTS RESET AT 5
Pipeline can send an alert any time a value is removed from the list.
ALERT ON REMOVAL
Alerting on removal cannot be used with lists created by internal filters.
Lists used to hold SIP, DIP, or NHIP values can be given a set of initial values by providing an IPset file. Only IP-based lists can use seed files.
SEED pathToIPSetFile
If a seed file is provided, and the list is configured to send periodic updates, the list can be configured to overwrite that seed file with its current contents on each update. This allows the file to always hold the most up-to-date values.
OVERWRITE ON UPDATE
As with evaluations, lists can be configured to shut down if they become filled with too many elements. This is provided as a sanity check to let the user know if the configuration has a flaw in the analysis. If the number of elements meets the shutdown threshold, an alert is sent, the list is freed, and the list is disconnected from the mechanism that had been populating it.
SHUTDOWN MORE THAN shutdownThreshold ELEMENTS
As with evaluations, a severity level can be provided to give context to alerts. It is not used during processing, but included in alerts sent from the lists.
SEVERITY integerSeverity
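Putting these options together, a hedged sketch of a complete list configuration block follows. The list name, threshold values, seed-file path, and severity are illustrative placeholders, not values from any real deployment:

LIST CONFIGURATION myWatchlist
    UPDATE 12 HOURS
    ALERT MORE THAN 10 ELEMENTS RESET AT 5
    SEED /path/to/watchlist.set
    OVERWRITE ON UPDATE
    SHUTDOWN MORE THAN 100000 ELEMENTS
    SEVERITY 3
END LIST CONFIGURATION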
Named lists and IPset files can be linked into a bundle such that if an element is added to all of the lists in the bundle, pipeline can send an alert and, if desired, add that element to another named list, which can then be used in a LIST CONFIGURATION block as described above.
The lists referenced in the list bundle must already have been created in the configuration file, and all lists must be made up of the same fields. An IPset file can be added to the bundle, provided that the lists' field is SIP or DIP; the file name must be put in quotation marks.
High Level Syntax:
LIST BUNDLE listBundleName
    lists in the bundle, one per line
END LIST BUNDLE
Each list to be added to the bundle goes on its own line. Each list must already have been created in the configuration file by an evaluation or internal filter. If an entry is an IPset file, it must be in quotes.
Once an element has been found in all of the lists in a bundle, it can be put into a new named list. This list can be used in a LIST CONFIGURATION block just like any other named list. No timeout is needed here, as the element will be removed from this list if it is removed from any list in the bundle.
OUTPUT LIST nameOfNewList
As with evaluations, a severity level must be provided to give context to alerts. It is not used during processing, but included in alerts sent from the lists.
SEVERITY integerSeverity
As with evaluations, you can force the list bundle not to alert; perhaps you just want the values that meet the bundle's qualifications to be put into another named list (using OUTPUT LIST above) and receive alerts on the contents that way. Just add DO NOT ALERT to the list of statements for the list bundle.
Let's say an evaluation creates a list named myServers, an internal filter creates a list called interestingIPs, and there is an IPset file named notableIPS.set. To include these lists in a bundle, and to put any IP that is in all of them into a new list named reallyImportantIPs, use the following:
LIST BUNDLE myExampleBundle
    myServers
    interestingIPs
    "notableIPS.set"
    OUTPUT LIST reallyImportantIPs
END LIST BUNDLE