rwtuc is very useful for obfuscating data to protect privacy. One useful approach is to translate addresses into an unused domain. Three /8 CIDR blocks are easy to use:
The first two sometimes occur in network traffic (when private traffic is routed), but the last one will not be produced by the protocol stack on any of the common operating systems. It still sometimes occurs as a source address on the Internet, but this is crafted traffic.
There are three different ways to use these addresses. Subnet-preserving substitution translates subnets (at either the /16 or /24 level) into an obfuscated zone but leaves the host information unchanged to allow structural analysis. Subnet-obfuscating substitution uses an arbitrary but fixed substitution for each host. This allows tracking consistent behavior at the host level (including matching of incoming and outgoing flows) but makes it difficult to track network structure (including tracking of dynamically allocated hosts). Host-random substitution uses an arbitrary and varying substitution for each occurrence of a host. This offers the most privacy protection, but it also blocks tracking consistent behavior at either the host or the network-structure level.
Even though the data is obfuscated, anonymity cannot be fully guaranteed. If your recipient knows where the data originates, and something about that network (such as the addresses of common servers on that network), they can leverage that information to reduce or eliminate address obfuscation at the subnet-preserving or subnet-obfuscating levels. There are other methods (such as comparing traffic in the released data against traffic the recipients capture on their network) that may reduce the address obfuscation.
As an example, this tip protects three different networks, containing a total of 10 hosts:
For subnet-preserving substitution, construct a simple sed script (see the Unix manual on sed(1) for more information). This example assumes the script is called "priv.sed", and contains:
These commands simply substitute the network portion of the address at the /24 level into an obfuscated zone. Now we can use this sed script with rwtuc to change flow information:
This obfuscates both the IP address fields at the subnet level and the sensor field.
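The same network-portion substitution can also be expressed outside of sed. The following is a minimal Python sketch, not the script used in this tip: the networks in SUBNET_MAP are documentation addresses standing in for the protected networks, the 10.1.x.0/24 targets are arbitrarily chosen, and the function name preserve_subnet is hypothetical.

```python
import ipaddress

# Hypothetical mapping from protected /24 networks to substitute /24
# networks in an obfuscated zone. All values here are placeholders.
SUBNET_MAP = {
    ipaddress.ip_network("192.0.2.0/24"): ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("198.51.100.0/24"): ipaddress.ip_network("10.1.2.0/24"),
}


def preserve_subnet(addr: str) -> str:
    """Replace the /24 network portion of addr and keep the host portion."""
    ip = ipaddress.ip_address(addr)
    for real, fake in SUBNET_MAP.items():
        if ip in real:
            # Keep only the host bits, then graft them onto the substitute network.
            host_bits = int(ip) & int(real.hostmask)
            return str(ipaddress.ip_address(int(fake.network_address) | host_bits))
    # Addresses outside the protected networks pass through unchanged.
    return addr
```

With this mapping, preserve_subnet("192.0.2.17") yields "10.1.1.17", so hosts that shared a subnet before substitution still share one afterwards.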
For subnet-obfuscating substitution, construct a similar sed script that substitutes IP addresses, rather than just the network portion. This example assumes the script is called "priv2.sed" and contains the host addresses of interest and arbitrarily chosen substitutes:
Again, we can use this sed script with rwtuc to change flow information:
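When there are too many hosts to list by hand, the fixed per-host table can be generated instead of written out. The sketch below is one possible way to do that in Python and is not part of the original tip: the class name FixedHostMap, the choice of 10.0.0.0/8 as the obfuscated zone, and the seed parameter are all assumptions made for illustration.

```python
import ipaddress
import random


class FixedHostMap:
    """Assign each host an arbitrary substitute and reuse it on every lookup."""

    def __init__(self, zone="10.0.0.0/8", seed=None):
        self.zone = ipaddress.ip_network(zone)  # assumed obfuscated zone
        self.rng = random.Random(seed)
        self.table = {}    # real address -> substitute address
        self.used = set()  # offsets already handed out, to avoid collisions

    def substitute(self, addr: str) -> str:
        if addr not in self.table:
            # Draw offsets until an unused one is found, so two hosts
            # never share a substitute.
            while True:
                offset = self.rng.randrange(1, self.zone.num_addresses - 1)
                if offset not in self.used:
                    break
            self.used.add(offset)
            self.table[addr] = str(self.zone.network_address + offset)
        return self.table[addr]
```

Because substitutes are drawn independently of the real address, two hosts on the same real subnet will generally land in unrelated parts of the zone, which is what hides the network structure. As written, the sketch substitutes every address it is given; restricting it to the protected hosts would need a membership check before the lookup.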
For host-random substitution, sed is not a good solution, because a sed script applies the same replacement every time a pattern matches. A fairly simple Python script can implement this substitution. Let's assume that this script is called "hostsub.py" and contains content such as:
We can use this Python script to obfuscate addresses:
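To illustrate the per-occurrence behavior, here is a minimal sketch of the kind of line filter such a script might contain. It is not the hostsub.py from this tip. It assumes pipe-delimited text records with the source and destination addresses in the first two columns, uses 10.0.0.0/8 as an assumed obfuscated zone, and the names random_address and obfuscate_line are hypothetical.

```python
import ipaddress
import random

ZONE = ipaddress.ip_network("10.0.0.0/8")  # assumed obfuscated zone


def random_address(rng: random.Random) -> str:
    """Pick a fresh address from the zone; nothing is remembered between calls."""
    return str(ZONE.network_address + rng.randrange(1, ZONE.num_addresses - 1))


def obfuscate_line(line: str, protected: set, rng: random.Random,
                   ip_columns=(0, 1)) -> str:
    """Replace protected addresses in one pipe-delimited record."""
    fields = line.rstrip("\n").split("|")
    for col in ip_columns:
        # strip() tolerates space-padded columns; header titles and
        # unprotected addresses fall through untouched.
        if col < len(fields) and fields[col].strip() in protected:
            fields[col] = random_address(rng)
    return "|".join(fields)
```

Each call draws a new substitute, so the same host will usually appear under a different address in every record, and the incoming and outgoing flows of a host can no longer be matched.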
Similar methods (either fixed substitution or random substitution) can be used to obfuscate ports and protocols if needed. To obfuscate dates, one can preserve interval relationships by mapping the earliest date to a known date (Jan 1, 1970 is popular) and determining further dates by interval since the earliest date, or again use a random substitution. Obfuscation of volume information (number of packets, number of bytes, or duration of flow) is rarely needed, but again either a fixed substitution or random substitution may be applied if required.
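The interval-preserving date mapping can be sketched in a few lines of Python. This is an illustration only: the function name shift_times is hypothetical, and the sketch assumes the flow start times have already been parsed into timezone-aware datetime values.

```python
from datetime import datetime, timezone

# Known date that the earliest record is mapped to.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shift_times(times):
    """Map the earliest time to EPOCH and keep every interval unchanged."""
    earliest = min(times)
    return [EPOCH + (t - earliest) for t in times]
```

Two flows that started 30 seconds apart in the original data still start 30 seconds apart after the shift, so sequence and timing analysis remain possible while the real capture dates are hidden.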
The amount of obfuscation applied directly limits the utility of the data in analysis, so use care to minimize the obfuscation.