S3 Directory Structure

The Main UNX-OBP Bucket:

scripts/regenerate_protocol_baseline.py

<protocol>/baseline_traffic/ contains baseline files named:

<protocol>__<iso-date>__<uuid4>, e.g., smb__2022-03-08__1a584418-ca79-4147-9fc2-8cd2abcb6454

where filename elements are separated by two underscores.

Each line of a baseline file is a pipe-delimited record of the form:

stime|src-org|sensor-id|sip|dip|sport|dport|protocol|applabel|packets|bytes|duration|asn_info

where asn_info → netblk|asn|cc|rir|org
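
For illustration, here is a minimal shell sketch that splits one such record into named fields (the field order follows the format above; the sample record values are hypothetical):

record='2022-03-08T14:02:11.512|ABC|9050|90.5.0.203|20.36.163.72|47547|445|6|139|111|67270|331.688|20.36.0.0/14|8075|us|arin|EXAMPLE-ORG'
# 12 top-level fields plus the 5 asn_info subfields = 17 pipe-delimited values
IFS='|' read -r stime src_org sensor_id sip dip sport dport protocol applabel \
  packets bytes duration netblk asn cc rir org <<< "$record"
echo "${stime} ${sip}:${sport} -> ${dip}:${dport} applabel=${applabel} org=${org}"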

Queue Message Structures

Incoming SQS Queue

The incoming queue receives messages from NiFi. Each message is a JSON object with certain expected fields. The field names generally follow the conventions of the rwcut --fields options, with a few variations:

An additional required field, unx_obp_proto, must be present; it is added to each record by NiFi. Its value is the lowercase name of the protocol that the flow should represent, e.g., "unx_obp_proto": "smb".

The minimum required fields in any incoming message are given in the following example JSON object:

{
  "unx_obp_proto": "smb",
  "sTime": "2022-04-11T01:19:56.621",
  "sensor": 9050,
  "sIP": "90.5.0.203",
  "dIP": "20.36.163.72",
  "sPort": 47547,
  "dPort": 445,
  "protocol": 6,
  "application": 139,
  "bytes": 67270,
  "packets": 111,
  "duration": 331.688
}

Finally, the format of "sTime" must be standard ISO format. This can be configured in the properties of the relevant JsonRecordSetWriter service in NiFi. Specifically, you'll need to set the "Timestamp Format" property to "yyyy-MM-dd'T'HH:mm:ss.SSS" (without the double quotes). Note the lowercase yyyy: in Java date patterns, uppercase YYYY denotes the week-based year and can produce incorrect dates around year boundaries.
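
To sanity-check the message format outside of NiFi, you can post a single test message to the incoming queue with the AWS CLI (a sketch; replace the queue URL placeholder with your deployment's incoming queue URL):

aws sqs send-message \
  --queue-url "https://sqs.<aws_region>.amazonaws.com/<account_id>/<incoming_queue_name>" \
  --message-body '{"unx_obp_proto": "smb", "sTime": "2022-04-11T01:19:56.621", "sensor": 9050, "sIP": "90.5.0.203", "dIP": "20.36.163.72", "sPort": 47547, "dPort": 445, "protocol": 6, "application": 139, "bytes": 67270, "packets": 111, "duration": 331.688}'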

Outgoing SQS Queue

The outgoing queue messages are read by NiFi. Each message is a JSON object with certain expected fields. Required fields include:

stime|src-org|sensor-id|sip|dip|sport|dport|protocol|applabel|packets|bytes|duration|asn_info

where asn_info → netblk|asn|cc|rir|org
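
For illustration only, an outgoing message might look like the following (the exact JSON key names, and whether asn_info is nested or flattened, are assumptions based on the field list above; the values are hypothetical):

{
  "stime": "2022-04-11T01:19:56.621",
  "src-org": "ABC",
  "sensor-id": 9050,
  "sip": "90.5.0.203",
  "dip": "20.36.163.72",
  "sport": 47547,
  "dport": 445,
  "protocol": 6,
  "applabel": 139,
  "packets": 111,
  "bytes": 67270,
  "duration": 331.688,
  "asn_info": {
    "netblk": "20.36.0.0/14",
    "asn": 8075,
    "cc": "us",
    "rir": "arin",
    "org": "EXAMPLE-ORG"
  }
}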

Infrastructure Dependencies

Depending on your specific environment and requirements, you will likely need to modify the provided Terraform to suit your needs and/or perform a manual deployment of some UNX-OBP resources.

Networking

All components were deployed and tested in AWS within a single basic VPC with various VPC endpoints.

As every cloud network environment is different, exactly how you enable access to and between the components listed below is up to you; it is your responsibility to align that access with your environment's networking practices.

IAM

The permissions (API Actions) required for each component to interact with other resources are listed below. Terraform will create the necessary IAM roles/policies for you, but it is ultimately your responsibility to review/adjust/align them to your environment's IAM practices (e.g., nomenclature, level of granularity, etc.).

OpenSearch / Elasticsearch

During testing, an internal VPC-attached AWS-managed Elasticsearch service was used, with no domain access policy. Thus, there are no Elasticsearch permissions listed below; only VPC access and security groups were used to manage access from other cloud resources.

You likely have an existing Elasticsearch cluster somewhere. Your Elasticsearch deployment and access policies will be different, and the relevant components of the analytic will need to be adjusted for that. This might consist of:

  - adjusting the IAM policies attached to the IAM role(s) of certain Lambda functions,
  - adjusting the authentication values for the REST API requests in the function code,
  - configuring a proxy or the use of a certain Certificate Authority in the function code, and/or
  - removing the Elasticsearch Terraform resources and explicitly setting the Terraform es_domain variable and the ES_DOMAIN environment variable to your existing Elasticsearch domain endpoint.

NiFi

You likely have an existing NiFi cluster. Review the Component Table below to establish the necessary IAM policy for your cluster and other UNX-OBP resources.

DynamoDB

All DynamoDB tables are set up and tested using PAY_PER_REQUEST billing mode (on-demand capacity mode), so there is no read or write throttling. Changing this is NOT recommended, for two reasons:

  1. If provisioned capacity is not set correctly, throttling will likely slow or completely stall the state machine executions, causing them to fail. It will also greatly slow down the nightly baseline regeneration process.
  2. For this use case, the overall number of read/write requests is generally predictable and is accounted for in the overall cost estimates. You generally do not need to worry about significant swings in read/write requests, and therefore sudden increases in DynamoDB costs, because the number of requests is largely driven by the amount of per-protocol outbound traffic, which tends not to experience significant swings.
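
If you want to confirm the billing mode of a deployed table, a quick check with the AWS CLI (using the unx-obp-Org-Info table from Step 3 as an example):

aws dynamodb describe-table \
  --table-name unx-obp-Org-Info \
  --query 'Table.BillingModeSummary.BillingMode' \
  --region <aws_region>

This should return "PAY_PER_REQUEST".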

Lambda

All Lambda functions are written in Python and use the Python 3.9 runtime.

In addition to any required permissions listed below, all Lambda function IAM roles should include standard logging permissions:

    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }

For VPC-attached Lambda functions, the following standard permissions should be included as well:

    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": "*"
    }

Secrets Manager

This service holds the user IDs ("uid") and secret keys ("skey") used for the Censys.io and RiskIQ PassiveTotal API lookups. You will deploy these secrets manually in any case.
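
As a sketch, creating one of these secrets from the AWS CLI might look like the following (the secret name and JSON key layout are assumptions; use whatever names the relevant Lambda function code expects):

aws secretsmanager create-secret \
  --name unx-obp/censys \
  --secret-string '{"uid": "<censys-api-id>", "skey": "<censys-api-secret>"}'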

Initial Loads

1 - Deploy Infrastructure

Deploy the infrastructure/services using the provided Terraform.

2 - Prepare the Elasticsearch Cluster and Kibana

This step consists of uploading an index template, importing Kibana Saved Objects, and loading initial documents to certain indices.

Load the UNX-OBP Index Template

The index template essentially controls the field data type mappings and number of primary and replica shards created for each matching index prefix. This index template covers all indices used by the capability. Those include the following:

Due to the relatively low volume of documents that should be in any one index at any given time, there only needs to be a single primary shard for each index. Nonetheless, you may wish to adjust the number of replica shards according to the number of data nodes in your Elasticsearch cluster and/or to improve search performance. This index template currently uses the Elasticsearch defaults:

You can adjust these at the bottom of the file, if desired.

Use the Elasticsearch Domain Endpoint (ES_DOMAIN) from the Terraform output...

curl -XPUT ${ES_DOMAIN}/_template/template_unx-obp \
  --data @unx-obp-ecs-catchall-template.json \
  -H 'Content-Type: application/json'
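
To verify the template was registered (the template name matches the PUT above):

curl -s ${ES_DOMAIN}/_template/template_unx-obp?pretty | head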

Import Saved Kibana Objects

The provided file, unx-obp-kibana-saved-objects-export-<date>-<revision>.ndjson, contains exported Kibana Saved Objects including index patterns, saved searches, visualizations, and dashboards.

Import this file in the Kibana interface via Stack Management > Kibana Saved Objects > Import.

Perform Initial Load of ASN Info

The provided file, asn-info-initial-load.json, contains some long-lived ASN Info documents, the loading of which will also create/prime the unx-obp-asn-info index.

yesterday=$(date --date="1 day ago" +"%Y-%m-%d")
sed -i "s/YYYY-MM-DD/$yesterday/g" asn-info-initial-load.json
curl -XPOST ${ES_DOMAIN}/_bulk \
  --data-binary @asn-info-initial-load.json \
  -H 'Content-Type: application/json'

Confirm the documents loaded by using the Discover tab in Kibana and looking at the unx-obp-asn-info index over the last 24 hours. Change the time picker to the last 7 days if nothing shows up on the first attempt.
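
Alternatively, a quick document count from the command line (using the index name above):

curl -s ${ES_DOMAIN}/unx-obp-asn-info/_count?pretty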

Perform Initial Load of CSP Info

Assuming that the unx-obp-update-csp-info-lambda-function-scheduled function was set up correctly in Step 1, you can simply execute it manually via the AWS Management Console using an empty test event.

Any fatal errors should be evident from within the same interface. Once complete, confirm there are documents loaded by using the Discover tab in Kibana and looking at the unx-obp-csp-info index for the last 24 hours.
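
If you prefer the command line, the same function can be invoked with an empty event (a sketch; the function name is taken from this step and may differ in your deployment, and the --cli-binary-format flag assumes AWS CLI v2):

aws lambda invoke \
  --function-name unx-obp-update-csp-info-lambda-function-scheduled \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json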

3 - Load Org Info to DynamoDB

This step should be performed any time there is an update to the silk.conf.

The major prerequisite for this step is to have available the latest silk.conf with specially annotated sensor-descriptions.

Each sensor-description must be annotated with org_name and org_parent fields, in the following format:

sensor 101 ABC1 "org_name:ABC,org_parent:DAFT"
sensor 102 ABC2 "org_name:ABC,org_parent:DAFT"
sensor 199 XYZ1 "org_name:XYZ,org_parent:none"
...
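
As a quick sanity check that every sensor definition carries the annotation (a sketch, assuming one sensor definition per line):

grep -c '^sensor ' latest_annotated_silk.conf
grep -c 'org_name:' latest_annotated_silk.conf
# The two counts should match.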

With the latest annotated silk.conf available, use the provided Python script, silk-site2star.py, to parse the file and generate a file of JSON objects suitable for upload to DynamoDB.

python3 silk-site2star.py \
  --silk-conf=latest_annotated_silk.conf dynamodb > ddb-org-info-items.json

Now use the provided Python script, load_org_info.py, to load the previously generated items into DynamoDB. You will need to run this from a machine/shell that has AWS credentials with the appropriate permissions and the AWS SDK for Python, boto3, installed (pip3 install boto3).

See Terraform outputs for table_prefix and aws_region values.

python3 load_org_info.py \
  -r ddb-org-info-items.json -t <table_prefix> --region <aws_region>

Confirm items loaded by exploring the items in the unx-obp-Org-Info table via the DynamoDB page of the AWS Management Console.
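
Alternatively, a quick item count from the command line:

aws dynamodb scan \
  --table-name unx-obp-Org-Info \
  --select COUNT \
  --region <aws_region>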

4 - Load Allow List and Explicit Deny List Entries to DynamoDB

This step consists of running the provided Python script, load_list.py, with the provided allowlist.json and xdenylist.json files to convert entries to unique DynamoDB items for fast lookups. You will need to run this from a machine/shell that has AWS credentials with the appropriate permissions and the AWS SDK for Python, boto3, installed (pip3 install boto3).

See Terraform outputs for table_prefix and aws_region values.

python3 load_list.py -a -r allowlist.json -t <table_prefix> --region <aws_region>
python3 load_list.py -x -r xdenylist.json -t <table_prefix> --region <aws_region>

Confirm items loaded by exploring the items in each table via the DynamoDB page of the AWS Management Console.

5 - Prepare NiFi Cluster

Download the latest NetSA NiFi NAR file and place it into the NiFi lib directory (e.g., /opt/nifi/lib/); this may require a restart of the NiFi service.

Follow the NiFi instructions in Get Started.