Using rwuniq for Top-10 Lists


As of SiLK 2.0, rwstats is greatly improved and accepts an arbitrary key, so in SiLK 2.0 and later you can often use rwstats directly instead of the approach described here.

rwstats is useful to get top-N lists for the things rwstats supports, but sometimes you want to get a top-N list for something else, such as top-20 sIP-sPort combinations or top-10 destination countries based on bytes.

To get the result, you use rwuniq to bin the data, the UNIX sort command to sort it, and finally the UNIX head command to limit the output.

Looking at each part of the answer in turn:

1. First you use rwuniq to bin the data:

    rwuniq --fields=<FIELDS-YOU-WANT>

That command will count flow records. To do a top-N list based on bytes or packets, specify the --bytes or --packets switches on the command line.

You should also give rwuniq the --no-titles switch since the titles will be lost anyway as you process the data further.

For our two examples, the rwuniq commands are:

    rwuniq --fields=sip,sport --no-titles
    rwuniq --fields=dcc --bytes --no-titles

2. Next, you must pipe the data into the UNIX sort command:

    sort -nr -t '|' -k XXX

The -k XXX switch tells sort which field (column) to use as the key. You typically specify a key one greater than the number of fields you specify to rwuniq, since that will be the column that contains the volume (byte, packet, or flow count) that you want to base the top-N list on.

In addition, you must tell sort to use numerical sorting (the -n switch). For a top-N list, you want the values sorted in reverse order (-r). The switch -t '|' tells sort that the pipe character ('|') is the field separator.

For the two examples, our commands are now:

    rwuniq --fields=sip,sport --no-titles | \
        sort -nr -t '|' -k 3
    rwuniq --fields=dcc --bytes --no-titles | \
        sort -nr -t '|' -k 2
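
As a rough illustration of the sort step, the sketch below runs the same sort on a few made-up rows laid out like rwuniq's pipe-delimited output. The addresses, ports, counts, and the title line are invented for illustration and are not real rwuniq output. The title row is included on purpose, to show what happens when --no-titles is omitted.

```shell
# Made-up sample rows in a pipe-delimited layout (not real rwuniq output).
# The first row imitates a title line that --no-titles would suppress.
sample_rows='            sIP|sPort|   Records|
      192.0.2.1|   80|        12|
      192.0.2.2|  443|        40|
      192.0.2.3|   53|         7|'

# Two key fields (sip,sport), so the flow count is in column 3.
sorted_rows=$(printf '%s\n' "$sample_rows" | sort -nr -t '|' -k 3)
printf '%s\n' "$sorted_rows"
```

Because sort -n treats the non-numeric title as 0, the reversed sort places the title row last, after every row with a real count. That is why the titles do not survive once the output is truncated.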

3. Finally, use the UNIX head command to limit the number of rows.

You simply tell head the number of rows to return; for our examples, those values are 20 and 10.

The final commands for our examples are:

    rwuniq --fields=sip,sport --no-titles | \
        sort -nr -t '|' -k 3 | \
            head -20
    rwuniq --fields=dcc --bytes --no-titles | \
        sort -nr -t '|' -k 2 | \
            head -10
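
If you build these pipelines often, the sort and head steps can be wrapped in a small shell function. The sketch below is one possible way to do that. The name top_n and its arguments are hypothetical and not part of SiLK, and it assumes rwuniq prints exactly one volume column after the key fields. The demo rows are made up.

```shell
# top_n FIELD_COUNT N   (hypothetical helper, not part of SiLK)
# Reads rwuniq-style rows on stdin. Assumes exactly one volume column
# follows the FIELD_COUNT key fields.
top_n() {
    key=$(( $1 + 1 ))      # volume column = number of key fields + 1
    sort -nr -t '|' -k "$key" | head -n "$2"
}

# Intended use with the two examples from the text (requires SiLK):
#   rwuniq --fields=sip,sport --no-titles | top_n 2 20
#   rwuniq --fields=dcc --bytes --no-titles | top_n 1 10

# Self-contained demo on made-up rows (not real rwuniq output):
top2=$(printf '%s\n' \
    '      192.0.2.1|   80|        12|' \
    '      192.0.2.2|  443|        40|' \
    '      192.0.2.3|   53|         7|' | top_n 2 2)
printf '%s\n' "$top2"
```

The function computes the sort key from the number of key fields, so the "one greater than the number of fields" rule does not have to be worked out by hand each time.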

An advantage of the rwuniq approach is that the output is not limited to the single volume column used as the sort key. You can have rwuniq output all volume columns and adjust the value you pass to 'sort -k' to sort on a particular one.

To get the top-5 sport/protocol combinations, based on packets, but with all volumes, use the following. Note that you must use '-k 4': Column one is sport, two is proto, three is bytes, four is packets, and five is flows.

    rwuniq --fields=sport,proto --no-titles --flows --bytes --packets | \
        sort -nr -t '|' -k 4 | \
            head -5

Some example results from that command:

       80|  6|          1247454303|   1414596|     90414|
      443|  6|           407921349|    522527|     18041|
       53| 17|            56057288|    472938|    161057|
    49601|  6|            79442872|    358944|        66|
       25|  6|            15178863|    220994|     20533|
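
To show how changing the 'sort -k' value selects a different volume, the sketch below re-sorts the five example rows above by bytes (column 3) and by flows (column 5). It only reorders rows that were already chosen by packets, so it illustrates the key selection rather than producing true top-5 lists for bytes or flows.

```shell
# The five example rows shown above: sport|proto|bytes|packets|flows|
rows='       80|  6|          1247454303|   1414596|     90414|
      443|  6|           407921349|    522527|     18041|
       53| 17|            56057288|    472938|    161057|
    49601|  6|            79442872|    358944|        66|
       25|  6|            15178863|    220994|     20533|'

by_bytes=$(printf '%s\n' "$rows" | sort -nr -t '|' -k 3)   # column 3 = bytes
by_flows=$(printf '%s\n' "$rows" | sort -nr -t '|' -k 5)   # column 5 = flows
printf 'By bytes:\n%s\nBy flows:\n%s\n' "$by_bytes" "$by_flows"
```

For a real top-N list on another volume, rerun the sort on the full rwuniq output, since the top rows by packets are not necessarily the top rows by bytes or flows.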

Of course, with the rwuniq approach you do lose the percentage and cumulative values you get from rwstats. To get those, you will need to write some code in your favorite scripting language.
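
As one possible starting point, the sketch below uses awk to append a percentage and a cumulative percentage to each row. The function name with_percent is hypothetical and not part of SiLK. It assumes the percentages should be relative to the total across all bins rather than just the top N, and the two-decimal output format is an arbitrary choice rather than a copy of the rwstats layout. The demo rows are made up.

```shell
# with_percent KEY_COLUMN N   (hypothetical helper, not part of SiLK)
# Reads rwuniq-style rows on stdin, sorts them on KEY_COLUMN, and prints
# the top N rows with two extra columns: percent and cumulative percent.
with_percent() {
    sort -nr -t '|' -k "$1" |
    awk -F '|' -v col="$1" -v n="$2" '
        # Keep every row and accumulate the total over ALL bins, so the
        # percentages are relative to the whole data set.
        { line[NR] = $0; val[NR] = $col + 0; total += val[NR] }
        END {
            if (total == 0) exit
            for (i = 1; i <= NR && i <= n; i++) {
                cum += val[i]
                printf "%s%.2f|%.2f|\n", line[i],
                       100 * val[i] / total, 100 * cum / total
            }
        }'
}

# Self-contained demo on made-up rows: sport|proto|count|
pct=$(printf '%s\n' '80|6|30|' '443|6|50|' '53|17|20|' | with_percent 3 2)
printf '%s\n' "$pct"
```

This version holds all rows in memory inside awk, which is a simplification that may not suit very large rwuniq outputs.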
