Overview

Summary

Orcus is a system for analyzing passively-collected DNS information. It includes a capability for analyzing all DNS information that has been seen (the “resource record database”), as well as a faster name-to-address mapping with daily resolution (the “name database”).

Orcus is designed to work with YAF to collect or process data, and it may store information in either a PostgreSQL or an Oracle database.

Resource Record Database

DNS traffic is composed of queries and responses in the form of packets. A single request packet contains a single query, but a response packet may contain many individual pieces of information, called “resource records”, in three different “sections”.

For example, when a system requests the address for a name like “google.com”, the “answer section” may contain many different IP addresses associated with that name, in order to load-balance accesses to google.com. In addition, a response for this query will also typically contain information about the name servers for the “google.com” domain in the “authority” section. And finally, it may also contain IP addresses for the name servers in the “additional” section.

Orcus does not group all of the resource records for a single packet together in the database. Instead, it works on a model where each resource record is treated as an independent statement about the world. A record in the RR database says either “host A said that the IP address for name B is C” or “host D was told that the IP address for name B is C”.

The general procedure is that when a DNS packet is handled, if that packet is a query, the query is recorded in the database. If the packet is a response with the NXDOMAIN error status, the query that resulted in the NXDOMAIN is recorded in the database. If the packet is a response which contains resource records, each individual resource record is recorded.
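The procedure above can be sketched in Python. This is only an illustration of the "one row per statement" model, not Orcus's code or schema: the Packet and ResourceRecord classes, the rows_for_packet function, and the row layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ResourceRecord:
    section: str  # "answer", "authority", or "additional"
    name: str
    rtype: str    # e.g. "A", "NS"
    value: str


@dataclass
class Packet:
    is_response: bool
    qname: str
    qtype: str
    rcode: str = "NOERROR"
    records: List[ResourceRecord] = field(default_factory=list)


def rows_for_packet(pkt: Packet) -> List[Tuple[str, ...]]:
    """Flatten one DNS packet into independent rows (hypothetical layout)."""
    if not pkt.is_response:
        # A query packet is recorded as the query itself.
        return [("query", pkt.qname, pkt.qtype)]
    if pkt.rcode == "NXDOMAIN":
        # An NXDOMAIN response is recorded as the query that caused it.
        return [("nxdomain", pkt.qname, pkt.qtype)]
    # Otherwise every resource record becomes its own row, whatever its section.
    return [("rr", rr.section, rr.name, rr.rtype, rr.value) for rr in pkt.records]


response = Packet(
    is_response=True,
    qname="example.com",
    qtype="A",
    records=[
        ResourceRecord("answer", "example.com", "A", "192.0.2.10"),
        ResourceRecord("answer", "example.com", "A", "192.0.2.11"),
        ResourceRecord("authority", "example.com", "NS", "ns1.example.com"),
        ResourceRecord("additional", "ns1.example.com", "A", "192.0.2.53"),
    ],
)
for row in rows_for_packet(response):
    print(row)
```

In this sketch the four records of the response yield four separate rows, and nothing ties them back to the packet they arrived in.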

It is these resource records that can be queried using the orquery tool.

There is a large variety of DNS record types. Orcus handles the following:

  • IPv4 Address (A) records.
  • IPv6 Address (AAAA) records.
  • Canonical Name (CNAME) records.
  • Mail Exchange (MX) records.
  • Name Server (NS) records.
  • Pointer (PTR) records.
  • Start of Authority (SOA) records.
  • Service (SRV) records.
  • Text (TXT) records.

Other records are not stored in the database or interpreted at all.
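As a small illustration, the handled types can be written as a lookup table keyed by the numeric TYPE code carried in each resource record. The codes are the standard IANA assignments. The SUPPORTED_TYPES table and the type_name function are hypothetical names and say nothing about how Orcus itself filters records.

```python
# Standard DNS TYPE codes for the record types listed above.
SUPPORTED_TYPES = {
    1: "A",
    2: "NS",
    5: "CNAME",
    6: "SOA",
    12: "PTR",
    15: "MX",
    16: "TXT",
    28: "AAAA",
    33: "SRV",
}


def type_name(type_code):
    """Return the mnemonic for a handled type, or None for any other type."""
    return SUPPORTED_TYPES.get(type_code)


print(type_name(28))  # a handled type
print(type_name(48))  # DNSKEY, which is not in the list above
```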

The DNS protocol has a variety of rules about which records are well-formed and which are not. Orcus loosens these rules wherever it can, so that malformed but parsable data is still collected. For example, even though raw DNS names are not allowed to contain certain characters, the protocol does allow names with those characters to be encoded. Orcus errs on the side of completeness and includes records for those names rather than discarding them.
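This document does not say which encoding Orcus uses for such names. As one common convention, the textual zone-file format in RFC 1035 (section 5.1) writes an unusual octet as a backslash followed by three decimal digits, and quotes a literal dot or backslash with a single backslash. The escape_label function below is a hypothetical sketch of that convention only.

```python
def escape_label(label: bytes) -> str:
    """Render one raw DNS label as text using RFC 1035 style escapes."""
    out = []
    for byte in label:
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch in "-_"):
            # Ordinary hostname characters are kept as they are.
            out.append(ch)
        elif ch in ".\\":
            # A dot or backslash inside a label is quoted as \X.
            out.append("\\" + ch)
        else:
            # Anything else is written as \DDD, the octet value in decimal.
            out.append("\\%03d" % byte)
    return "".join(out)


print(escape_label(b"bad name"))
```

A label containing a space, which is not a legal hostname character, still produces a storable string rather than being dropped.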

Finally, Orcus was originally designed to work in an environment where it monitors communications around a caching proxy server. As a result, it has a concept of “internal” and “external” traffic. Each entry in the resource record database stores a single address. For “internal” traffic, this address is the internal host that was communicating with the proxy server: the source of queries and the destination of responses. For “external” traffic, this address is the external host that was communicating with the proxy server: the destination of queries and the source of responses. In situations with multiple proxies, the identity of the proxy is expected to be encoded as the sensor.

Unfortunately, not all networks follow this sort of configuration, and as a result this “single address” is rather vague in other scenarios. The documentation for net-list and net-list-mode in the orcus.conf man page explains in detail how traffic is divided into internal and external traffic, how this can be configured, and the meaning of the address field in different situations.
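For the original proxy-centred case only, the choice of the stored address can be sketched as follows. The PROXY address and the recorded_address function are hypothetical. Real deployments are classified according to net-list and net-list-mode, which this sketch does not model.

```python
# Hypothetical address of the caching proxy server being monitored.
PROXY = "10.0.0.53"


def recorded_address(src, dst, is_query):
    """Return (side, address) for a packet exchanged with the proxy, else None."""
    # The client is whoever asked the question and the server is whoever answers:
    # queries travel client -> server, responses travel server -> client.
    client, server = (src, dst) if is_query else (dst, src)
    if server == PROXY:
        # An internal host talking to the proxy: keep the internal host.
        return ("internal", client)
    if client == PROXY:
        # The proxy talking to an outside server: keep the outside server.
        return ("external", server)
    # Traffic that does not involve the proxy is outside this simple model.
    return None


print(recorded_address("10.0.0.7", "10.0.0.53", True))
print(recorded_address("198.51.100.1", "10.0.0.53", False))
```

Traffic that bypasses the proxy returns None here, which is exactly the situation in which the meaning of the single address becomes ambiguous.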

We plan to enhance the system to track both source and destination addresses in a future version, which will remove some of this ambiguity.

Name Database

A large part of what we use DNS for involves mapping symbolic names to addresses and vice-versa. Working with the raw resource records to derive this information is slow and clunky, since each individual query and response is recorded at the time it happened. To satisfy the need to quickly query name-address mappings, the Orcus “name database” provides a daily indexed mapping for quick lookups.

The name database loses much of the detail available in the resource record database. It is no longer possible to determine who was involved in the DNS conversation, or even exactly when the conversation happened. Instead, each name-address pair is recorded once for each day during which it occurs.
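The effect of daily resolution can be shown with a short sketch. The daily_pairs function and the sample observations are hypothetical, and the sketch simply uses the calendar date of each timestamp; how Orcus draws day boundaries is not described here.

```python
from datetime import datetime


def daily_pairs(observations):
    """Collapse (timestamp, name, address) observations to one entry per day."""
    # A set removes repeats of the same pair within a day; sorting gives stable output.
    return sorted({(ts.date(), name, addr) for ts, name, addr in observations})


observations = [
    (datetime(2024, 1, 1, 9, 15), "example.com", "192.0.2.10"),
    (datetime(2024, 1, 1, 17, 40), "example.com", "192.0.2.10"),
    (datetime(2024, 1, 2, 8, 5), "example.com", "192.0.2.10"),
]
for entry in daily_pairs(observations):
    print(entry)
```

The two sightings on the first day collapse into a single entry, and both the time of day and the hosts involved are gone.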

This information is queried using the orlookup tool.

Managing Database Storage

The storage requirement of the resource record database can grow very large very quickly, as day-to-day operation of a network involves nearly constant DNS traffic. In a major installation, you should coordinate with your database administrator to come up with a storage management plan. In a smaller installation, there are a few more options.

If you are only interested in the information contained in the name database, you may turn off storage of the resource record database entirely using the keep-unique-only option in orcus.conf. This option works on both PostgreSQL and Oracle.

Both the PostgreSQL and Oracle schemas are able to remove resource record data older than a certain age, discarding data partitions from before this period. This may be configured by placing a single row in the orcus.config_options table, like this for PostgreSQL:

insert into orcus.config_options ( option_name, option_value ) values
  ( 'max_part_age', '30 days' );

or this for Oracle:

insert into orcus.config_options ( option_name, option_value ) values
  ( 'max_part_age', '30 00:00:00' );

The value for max_part_age should be given in the syntax allowed for the interval datatype in PostgreSQL, or the syntax allowed by the to_dsinterval function in Oracle.
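If the retention period is computed by a script, the two spellings can be generated from one value. The helper names below are hypothetical and are not part of Orcus. The sketch assumes a non-negative age in whole seconds.

```python
from datetime import timedelta


def _split(age):
    """Break a non-negative timedelta into days, hours, minutes, seconds."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    hours, rest = divmod(age.seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return age.days, hours, minutes, seconds


def pg_interval(age):
    """Format an age for the PostgreSQL interval datatype, e.g. '30 days'."""
    days, hours, minutes, seconds = _split(age)
    if age.seconds == 0:
        return "%d days" % days
    return "%d days %02d:%02d:%02d" % (days, hours, minutes, seconds)


def oracle_dsinterval(age):
    """Format an age in the 'D HH:MI:SS' form read by Oracle's to_dsinterval."""
    return "%d %02d:%02d:%02d" % _split(age)


age = timedelta(days=30)
print(pg_interval(age))
print(oracle_dsinterval(age))
```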

Collecting Data with YAF

To collect data passively using YAF, you can use the following command-line options to YAF:

--applabel --max-payload=4096 --udp-uniflow=53 \
--plugin-name=${plugin_dir}/dpacketplugin.la --plugin-opts=53

This will collect payloads up to 4096 bytes in length for traffic on port 53 (the DNS port), and parse the DNS data in those packets in a way that Orcus can process.

If you wish, you may additionally use the super_mediator tool to split this DNS information from other flow information, and even remove duplicate records during processing.