YAF: Use Cases & Tutorials


Tutorials for installing YAF alongside other tools in the NetSA suite.

Configuring YAF with SiLK

How-to guide on configuring YAF with SiLK.

Indexing Packet Capture (PCAP) Files with YAF

How-to guide on using YAF to index large PCAP files. Basic flow analysis with SiLK and other tools is discussed.

Rolling Packet Capture (PCAP) Export with YAF

How-to guide on enabling rolling PCAP and metadata index generation in YAF. Analysis with SiLK and other tools is discussed.

Configuring YAF and Super Mediator

How-to guide on using YAF and super_mediator to collect DPI data and import that data into a MySQL database. SiLK flow collection is also described.

Configuring YAF and Pipeline

How-to guide on using YAF and super_mediator to feed Analysis Pipeline.

Use Cases for YAF and Super Mediator

Feeding both Analysis Pipeline and SiLK

Historically, YAF output was submitted directly to a storage system, such as SiLK's packing suite. Analysts requested a capability to duplicate flow data and convert the records into formats more useful for a variety of tools. In response, the IPFIX mediator super_mediator was introduced to provide multiple data streams in a variety of formats. One of the simplest cases is a two-pronged structure: YAF feeds super_mediator, and super_mediator in turn feeds both the storage system and an instance of Analysis Pipeline, which produces stream-oriented analysis.
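The two-pronged structure might be sketched in a super_mediator.conf along the following lines. Ports and block keywords are illustrative and should be checked against the super_mediator.conf documentation for your version:

```
# Illustrative super_mediator.conf sketch -- ports are examples, not defaults
COLLECTOR TCP
   PORT 18000
COLLECTOR END

# Prong 1: flow records forwarded to the SiLK packing suite (rwflowpack)
EXPORTER TCP
   PORT 18001
   HOST localhost
   FLOW_ONLY
EXPORTER END

# Prong 2: records forwarded to an Analysis Pipeline instance
EXPORTER TCP
   PORT 18002
   HOST localhost
EXPORTER END
```

YAF is then pointed at the collector port, and rwflowpack and Analysis Pipeline each listen on their respective exporter ports.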

Building DPI Information Archives

Some network information is transient: DNS resolutions change, sometimes frequently; SSL certificates are updated, or instantiated for new hosts; URLs are introduced and dropped, sometimes frequently. To interpret a variety of network behaviors, including attacks, analysts may need to understand the network information current at the time. YAF output can be filtered to extract DPI fields, which include this network information. As this information is exported from YAF, it can be recorded in a time-stamped archive, which can then be queried when it is needed in an analysis. YAF is run with --applabel, a --max-payload of at least 384 octets, and the dpacketplugin enabled. These parameters support the addition of DPI-specific information elements within the templates generated by YAF. Super Mediator can extract these information elements into a JSON or text file for import into an archive.
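A YAF invocation enabling this DPI export might look like the following sketch; the interface, destination, and plugin path are illustrative, and the exact plugin location depends on the installation:

```shell
# Illustrative: capture on eth0, export IPFIX to a local super_mediator.
# --applabel enables application labeling; --max-payload gives the
# labeler and the DPI plugin enough payload octets to inspect.
yaf --in eth0 --live pcap \
    --out localhost --ipfix tcp --ipfix-port 18000 \
    --applabel --max-payload 384 \
    --plugin-name=/usr/local/lib/yaf/dpacketplugin.la
```

super_mediator, configured with a TEXT or JSON exporter, then writes the DPI information elements out for archive import.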

One difficulty in such an import is that the extracted information elements may hold several values (for example, a DNS resource record may list several resolutions for a domain name). This multiplicity needs to be accounted for in the design of the archive and in the extraction process during import. Some of the information elements will be located in sub-templates within the IPFIX output of YAF, requiring the import process to navigate the resulting nested JSON structure. Finally, use of encrypted or tunneled protocols may result in gaps in the information stored in the archive.
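As a sketch of the navigation problem, a flattening step might iterate over the nested arrays so each DNS resolution becomes one archive row. The field names below are hypothetical placeholders, not the actual super_mediator JSON schema; inspect your own JSON output to determine the real structure:

```shell
# Hypothetical field names for illustration only -- replace them after
# examining real super_mediator JSON output. Emits one line per
# DNS answer, even when a record carries several resolutions.
jq -c '.flows[]? | .dns[]? | {name: .dnsQName, answer: .dnsRData}' \
    dpi-export.json >> dns-archive.jsonl
```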

Building both SiLK Repository and Database Populations

In searching for activity across a long span of time, SiLK's rapid turnaround is extremely useful, but its lack of DPI and other flow fields limits the precision with which it can characterize behaviors. One possibility is to use SiLK tools to isolate flow records that are potentially of interest, then to use fields in the corresponding complete flow records to eliminate extraneous hits among those records. YAF produces complete flow records. Super Mediator can both convert the records to SiLK format and produce a JSON form that can be more easily parsed by a database ingestion process. This allows an analyst to combine SiLK queries and database queries.
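The two-stage query might look like the following sketch. Dates, ports, and especially the database side (table and column names) are illustrative assumptions, not part of any tool's schema:

```shell
# Stage 1: fast first cut over a long time span using SiLK
rwfilter --start-date=2023/01/01 --end-date=2023/06/30 \
    --type=all --protocol=6 --dport=443 --pass=stdout \
  | rwcut --fields=sip,dip,stime --no-titles --delimited=, \
  > candidates.csv

# Stage 2: refine candidates using DPI fields stored in the database
# (database, table, and column names are hypothetical)
mysql flowdb -e "SELECT * FROM dpi_flows WHERE src_ip = '10.0.0.5'"
```

The first stage narrows millions of records to a small candidate set; the second eliminates extraneous hits using fields SiLK does not store.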

Exporting Flow Statistics and Flow Key Hashes to Databases

Flow records are often quite indicative of the traffic that they summarize, but rarely do they contain enough detailed characteristics to support definitive statements about that traffic. Additional information about the generated flow records supports more such statements. Analyses that profile inter-packet statistics, for example, can classify traffic as autonomic, interactive, or data-transfer connections. As YAF generates flow records, it can also collect and export statistics about the packets that fed into each record. These statistics include counts of categorized packets, averages of times and sizes, and standard deviations of similar characteristics. Combined with flow key hashes, these statistics allow the analyst to further categorize hosts and to spot outliers or questionable traffic involving those hosts. Super Mediator can capture these statistics elements and export them as JSON to support ingest into a database, which analysts can use to characterize hosts and traffic.
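Collecting these statistics might be enabled with a YAF invocation like the following sketch; the interface and export destination are illustrative:

```shell
# --flow-stats exports extra per-flow packet statistics (counts,
# averages, standard deviations) alongside each flow record
yaf --in eth0 --live pcap \
    --out localhost --ipfix tcp --ipfix-port 18000 \
    --flow-stats
```

super_mediator, listening on the export port, can then write the statistics elements as JSON for database ingest.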

Building Separate Repositories for Single-Packet Flows and Conventional Flows

Some network attacks, such as reverse-shell compromises, mimic conventional traffic. Analysts seeking to isolate such attacks need a thorough understanding of the traffic to key network resources, which may require digging into that traffic on a packet-by-packet basis. But modern networks may carry very high traffic volumes, and isolating specific interactions of interest may consume a lot of analyst effort. Using flow records to consolidate the packets belonging to a given interaction may help the analyst find specific interactions more efficiently, but it also conceals the inter-packet dynamics. A compromise is to generate non-traditional flow records, where each flow record describes a single packet. These flow records can be indexed and sorted using flow key hashes to group the single-packet flows associated with an interaction, and then the specific characteristics of each packet within the group can be examined and used for analysis. If further analysis is needed, captured packets can be matched to an interaction using YAF, and conventional packet-viewing tools can be applied. To enable this sort of analysis, the network packet traffic is processed by two instances of YAF, each feeding a separate flow repository: one instance generates multi-packet flow records, the other generates single-packet flow records.
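The dual-repository setup might be sketched as two parallel YAF invocations. Interfaces and ports are illustrative, and forcing single-packet flows via a zero idle timeout is an assumption that should be verified against the YAF documentation for your version:

```shell
# Instance 1: conventional multi-packet flow records
yaf --in eth0 --live pcap \
    --out localhost --ipfix tcp --ipfix-port 18000

# Instance 2: single-packet flow records. Idle timeout forced to 0 so
# each packet closes its own flow -- verify against your YAF version.
yaf --in eth0 --live pcap \
    --out localhost --ipfix tcp --ipfix-port 18001 \
    --idle-timeout 0
```

Each instance feeds its own collector, so the single-packet repository can be queried by flow key hash without disturbing the conventional repository.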