CERT

Introduction

Network flow data inherently describes relationships and connections between Internet Protocol (IP) addresses, so it can be useful to model flows as directed graphs. In addition to giving analysts a way to visualize how addresses relate, this model lets them apply the methods of graph theory to network flow analysis.

Neo4j is one of the open-source graph databases currently available. It implements the Cypher query language, which supports robust graph analysis of the data in the database, and it provides a simple visualization interface in a web browser. Several more capable visualization tools can use Neo4j as the back-end database (see the Neo4j website for more information).

Neo4j provides a Java API (application programming interface), which was used to build a method for importing network flow information into Neo4j. The method consists of two applications: one that initializes a Neo4j database by creating indices and constraints, and one that takes output from the SiLK rwcut command and creates entities and relationships in the database.

The network flow graph uses IP addresses as entities (nodes) and has_<proto>_flow designations as relationships (edges), where <proto> is derived from the flow's protocol field and has one of these values:

  • AH
  • EGP
  • EIGRP
  • ENCAP
  • ESP
  • GGP
  • GRE
  • ICMP
  • ICMPv6
  • IGMP
  • IGP
  • IPv4
  • IPv6
  • IPv6-Frag
  • IPv6-Opts
  • IPv6-Route
  • IRTP
  • NoNxt
  • RDP
  • RSVP
  • SCTP
  • SWIPE
  • TCP
  • UDP
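
As a rough illustration of the naming scheme above, the following Python sketch maps the numeric protocol value of a flow record to a relationship name. The numbers are the standard IANA protocol assignments for the listed names. The function name relationship_type, the exact casing of the generated name, and the handling of unlisted protocols are assumptions made for this example; the importer's own lookup is not shown in this document.

```python
# IANA-assigned protocol numbers for the protocol names listed above.
PROTO_NAMES = {
    1: "ICMP", 2: "IGMP", 3: "GGP", 4: "IPv4", 6: "TCP", 8: "EGP", 9: "IGP",
    17: "UDP", 27: "RDP", 28: "IRTP", 41: "IPv6", 43: "IPv6-Route",
    44: "IPv6-Frag", 46: "RSVP", 47: "GRE", 50: "ESP", 51: "AH", 53: "SWIPE",
    58: "ICMPv6", 59: "NoNxt", 60: "IPv6-Opts", 88: "EIGRP", 98: "ENCAP",
    132: "SCTP",
}


def relationship_type(proto):
    """Return a has_<proto>_flow name, or None for an unlisted protocol.

    Returning None for unlisted protocols is an assumption of this sketch.
    """
    name = PROTO_NAMES.get(int(proto))
    return None if name is None else "has_%s_flow" % name
```

For example, relationship_type(6) yields has_TCP_flow. Names that contain a hyphen, such as has_IPv6-Frag_flow, would need to be quoted with backticks if used in a Cypher query.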

Each IP address entity has, at a minimum, an address property. If the input file includes a source or destination country code for the IP address, the entity also has a country property. Each relationship has, at a minimum, sport, dport, stime, and etime properties. A relationship may also carry one or more of these properties when the corresponding fields are included in the input file:

  • bytes
  • dur
  • flags
  • icmptypecode
  • packets
  • sensor
  • type
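
The entity and relationship model described above can be pictured with a small in-memory sketch. This is not the importer's code: the function name build_graph, the use of dictionaries keyed by input column title, and the tuple layout of an edge are assumptions chosen for illustration. The icmptypecode property is left out because this document does not describe how the iType and iCode columns are combined.

```python
# Optional relationship properties copied through when present in a record.
OPTIONAL_PROPS = ("bytes", "dur", "flags", "packets", "sensor", "type")


def build_graph(records):
    """Turn flow records (dicts keyed by column title) into nodes and edges."""
    nodes = {}  # address -> node properties (one node per distinct address)
    edges = []  # (source address, destination address, proto, properties)
    for rec in records:
        # Create or reuse the node for each endpoint; add country if given.
        for ip_key, cc_key in (("sIP", "scc"), ("dIP", "dcc")):
            node = nodes.setdefault(rec[ip_key], {"address": rec[ip_key]})
            if rec.get(cc_key):
                node["country"] = rec[cc_key]
        # Minimum relationship properties named in the text above.
        props = {"sport": rec["sPort"], "dport": rec["dPort"],
                 "stime": rec["sTime"], "etime": rec["eTime"]}
        for key in OPTIONAL_PROPS:
            if key in rec:
                props[key] = rec[key]
        edges.append((rec["sIP"], rec["dIP"], rec["proto"], props))
    return nodes, edges
```

Two flows between the same pair of addresses produce two nodes and two directed edges, which mirrors the way the uniqueness constraint on addresses keeps nodes from being duplicated while every flow still adds a relationship.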

The input file must have the following format:

    Column1_Title|Column2_Title|Column3_Title|…|ColumnN_Title
    Column1_Value|Column2_Value|Column3_Value|…|ColumnN_Value

where columns are pipe-delimited and include, at a minimum:

  • dIP
  • dPort
  • eTime
  • proto
  • sIP
  • sPort
  • sTime

The column titles must be the first row in the file, and only columns whose titles appear in this list are processed:

  • bytes
  • dcc
  • dIP
  • dPort
  • dur
  • eTime
  • flags
  • icmptypecode fields:
    • iType
    • iCode
  • packets
  • proto
  • sensor
  • scc
  • sIP
  • sPort
  • sTime
  • type
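
Before running an import, it can help to check a file's title row against the two lists above. The following Python sketch does that. The function name check_header is hypothetical, and two behaviors are assumptions rather than documented importer behavior: titles are compared case-sensitively after surrounding whitespace is stripped, and a trailing delimiter at the end of the row is tolerated.

```python
# Titles that must be present, and the full set of titles that are processed.
REQUIRED = {"dIP", "dPort", "eTime", "proto", "sIP", "sPort", "sTime"}
RECOGNIZED = REQUIRED | {"bytes", "dcc", "dur", "flags", "iType", "iCode",
                         "packets", "sensor", "scc", "type"}


def check_header(header_line):
    """Return (missing required titles, unrecognized titles) for a title row."""
    titles = [t.strip() for t in header_line.rstrip("\r\n").split("|")]
    if titles and titles[-1] == "":
        titles.pop()  # tolerate a trailing delimiter (assumption)
    missing = sorted(REQUIRED - set(titles))
    unrecognized = [t for t in titles if t not in RECOGNIZED]
    return missing, unrecognized
```

An empty first list means every required column is present. Any titles in the second list fall outside the recognized set and, per the description above, would not be processed.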

To produce this format, run the rwcut command with the desired fields and the --no-col option, which removes the column padding.

To include all the fields, use this command:

    rwcut --fields=1-12,18,19,21,25,26 --no-col > outputFileDestinedForNeo4j.txt

To include sIP, dIP, sPort, dPort, proto, sTime, and eTime (minimum fields only), use this command:

    rwcut --fields=1-5,9,11 --no-col > outputFileDestinedForNeo4j.txt

To include sIP, dIP, sPort, dPort, proto, sTime, eTime, scc, and dcc, use this command:

    rwcut --fields=1-5,9,11,18,19 --no-col > outputFileDestinedForNeo4j.txt

Prerequisites

Using the Java import files has these prerequisites:

  • a working install of Neo4j 2.1.2 or newer
  • Java SE Development Kit (JDK) 7 (Neo4j 2.1.2 does not support JDK 8.)
  • a copy of the neo4j_InitializeSchema_packaged.jar file
  • a copy of the neo4j_ImportFile_packaged.jar file
  • the ability to run command-line statements (e.g., Windows cmd.exe or Mac Terminal)
  • the path to the Neo4j database (If you don’t know this path, try running the database, and note the path presented at startup.)

Initialize the Neo4j Database Schema

To prevent duplicate nodes in the database, you must create uniqueness constraints. You can do this manually or by running the neo4j_InitializeSchema_packaged.jar file as described below. You need to do this only once per database, unless the database is deleted and recreated. If an error states that the index already exists, this process has already been completed, and the database is ready to import SiLK data using the neo4j_ImportFile_packaged.jar file.

  1. Note the path to the Neo4j database.
  2. Make sure there is at least one file in the Neo4j database directory. (If the directory is empty, add a blank text file.)
  3. Stop any running instance of the Neo4j database or any other application that is using the database. Otherwise an error will occur.
  4. Open a command-line program (e.g., Windows cmd.exe or Mac Terminal).
  5. Change the current directory in the command-line program to the location of the neo4j_InitializeSchema_packaged.jar file (using this command: cd <path to jar file>).
  6. Type java -jar neo4j_InitializeSchema_packaged.jar in the command-line program and press ENTER.
  7. In the file dialog box, navigate to the Neo4j database directory (using the path to the Neo4j database).
  8. Select any file that exists in the directory, such as neo4j.properties or the blank text file created in Step 2.
    Note: This is just to get the correct path for the database; it does not matter which file is selected.
  9. Click OKAY.
    The database schema has been initialized and is ready to import data. The database directory will contain at least an index folder, a schema folder, and a neo4j.properties file.
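
For the manual route mentioned at the start of this section, a uniqueness constraint on the address property can be written in the Cypher syntax used by Neo4j 2.x. The sketch below only builds the statement text. The label IP is a placeholder: the label that the packaged initializer actually assigns to address nodes is not stated in this document, and the initializer may create further indices that this statement does not cover.

```python
def unique_address_constraint(label="IP"):
    """Build a Neo4j 2.x Cypher uniqueness constraint on the address property.

    The default label "IP" is a hypothetical placeholder, not the importer's.
    """
    return "CREATE CONSTRAINT ON (n:%s) ASSERT n.address IS UNIQUE" % label


print(unique_address_constraint())
```

The printed statement would then be run against the database, for example from the Neo4j browser, after substituting the label that the import application expects.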

Import SiLK Data

Getting SiLK data into Neo4j involves two steps: (1) obtain and format the data and (2) import the formatted data using the neo4j_ImportFile_packaged.jar file.

To obtain and format the SiLK data of interest, do the following:

  1. Create an rwfilter command to extract the flow records that will eventually be imported to Neo4j. Limit the number of records as much as possible. Visualizations in the Neo4j browser can only handle 1,000 objects at a time, and even with other visualization tools, large numbers of nodes and relationships are distracting. Also, the import process will become prohibitively long when more than a few thousand records are being imported.
  2. Pipe the output of the rwfilter command(s) to an rwcut command, and be sure to include titles and remove the column padding with the --no-col option.
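
Because the number of records matters so much, it can be worth estimating the size of the resulting graph before importing. The following Python sketch counts the distinct addresses and the flow rows in a formatted file. It assumes that each distinct address becomes one node and each data row becomes one relationship; the function name estimate_graph_size and the constant BROWSER_OBJECT_LIMIT are hypothetical names introduced for this example.

```python
BROWSER_OBJECT_LIMIT = 1000  # object count quoted above for the Neo4j browser


def estimate_graph_size(lines):
    """Count distinct addresses and flow rows in pipe-delimited rwcut output."""
    rows = iter(lines)
    titles = [t.strip() for t in next(rows).split("|")]
    src, dst = titles.index("sIP"), titles.index("dIP")
    addresses, flows = set(), 0
    for line in rows:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) <= max(src, dst):
            continue  # skip blank or truncated lines
        addresses.update((fields[src], fields[dst]))
        flows += 1
    return len(addresses), flows


sample = [
    "sIP|dIP|sPort|dPort|proto|sTime|eTime|",
    "10.0.0.1|10.0.0.2|1234|80|6|2014/06/01T00:00:00|2014/06/01T00:00:05|",
    "10.0.0.2|10.0.0.1|80|1234|6|2014/06/01T00:00:01|2014/06/01T00:00:05|",
]
node_count, flow_count = estimate_graph_size(sample)
if node_count + flow_count > BROWSER_OBJECT_LIMIT:
    print("Consider a narrower rwfilter selection before importing.")
```

An open file handle can be passed in place of the sample list, since the function only iterates over lines.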

To import the SiLK data from the file created above, do the following:

  1. Note the path to the Neo4j database.
  2. Make sure there are files in the Neo4j database directory. (If the directory does not contain at least an index folder, a schema folder, and a neo4j.properties file, you must initialize the database using the steps described above.)
  3. Stop any running instance of the Neo4j database or any other application that is using the database. Otherwise an error will occur.
  4. Open a command-line program (e.g., Windows cmd.exe or Mac Terminal).
  5. Change the current directory in the command-line program to the location of the neo4j_ImportFile_packaged.jar file (using this command: cd <path to jar file>).
  6. Type java -jar neo4j_ImportFile_packaged.jar in the command-line program and press ENTER.
  7. In the file dialog box, navigate to the Neo4j database directory (using the path to the Neo4j database).
  8. Select any file that exists in the directory, such as neo4j.properties.
    Note: This is just to get the correct path for the database; it does not matter which file is selected.
  9. In the second file dialog box, navigate to the directory location of the file you want to import.
  10. Select the file to import.
    It might be several minutes before you see an “All Done!” message, indicating that the file has been imported.
  11. Click OKAY.
    The data from the file has been used to create nodes and relationships and can now be accessed through the Neo4j database interface or a visualization tool.