PySiLK - Silk in Python

This document describes the features of PySiLK, the SiLK Python extension. It documents the objects and methods that allow one to read, manipulate, and write SiLK Flow records, IPsets, Bags, and Prefix Maps (pmaps) from within Python. PySiLK may be used in a stand-alone Python script or as a plug-in from within the SiLK tools rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1). This document describes the objects and methods that PySiLK provides; the details of using those from within a plug-in are documented in the silkpython(3) manual page.

The SiLK Python extension provides the following functions:

silk.ipv6_enabled()

Return True if SiLK was compiled with IPv6 support, False otherwise.

silk.initial_tcpflags_enabled()

Return True if SiLK was compiled with support for initial TCP flags, False otherwise.

silk.init_country_codes([filename])

Initialize PySiLK's country code database. filename should be the path to a country code prefix map, as created by rwgeoip2ccmap(1). If filename is not supplied, SiLK will look first for the file specified by $SILK_COUNTRY_CODES, and then for a file named country_codes.pmap in $SILK_PATH/share/silk, $SILK_PATH/share, /usr/local/share/silk, and /usr/local/share. (The latter two assume that SiLK was installed in /usr/local.) Raises RuntimeError if loading the country code prefix map fails.
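The search order described above can be sketched in plain Python. This is only an illustration and does not call PySiLK: the helper name country_code_candidates is hypothetical, it lists candidate paths without checking that they exist, and skipping the $SILK_PATH entries when that variable is unset is an assumption.

```python
import posixpath

def country_code_candidates(environ):
    """List the paths searched when no filename is given, in order."""
    candidates = []
    # 1. The file named by $SILK_COUNTRY_CODES, when that variable is set.
    explicit = environ.get("SILK_COUNTRY_CODES")
    if explicit:
        candidates.append(explicit)
    # 2. country_codes.pmap under $SILK_PATH (assumed skipped when unset),
    #    then under the default /usr/local installation prefix.
    roots = []
    if environ.get("SILK_PATH"):
        roots.append(environ["SILK_PATH"])
    roots.append("/usr/local")
    for root in roots:
        for subdir in ("share/silk", "share"):
            candidates.append(
                posixpath.join(root, subdir, "country_codes.pmap"))
    return candidates

paths = country_code_candidates({"SILK_PATH": "/opt/silk"})
```

Passing a dictionary such as os.environ makes it easy to see which file would be tried first on a given system.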

silk.silk_version()

Return the version of SiLK linked with PySiLK, as a string.

The SiLK Python extension defines the following objects:

IPAddr

A representation of an IP Address.

IPv4Addr

A representation of an IPv4 Address.

IPv6Addr

A representation of an IPv6 Address.

IPWildcard

A representation of CIDR blocks or SiLK IP wildcard addresses.

IPSet

A representation of a SiLK IPset.

PrefixMap

A representation of a SiLK Prefix Map.

Bag

A representation of a SiLK Bag.

TCPFlags

A representation of TCP flags.

RWRec

A representation of a SiLK Flow record.

SilkFile

A representation of a channel for writing to or reading from SiLK Flow files.

FGlob

An iterable object that allows retrieval of filenames in a SiLK data store.

An IPAddr object represents an IPv4 or IPv6 address. These two types of addresses are represented by two subclasses of IPAddr: IPv4Addr and IPv6Addr.

class silk.IPAddr(address)

The constructor takes a string address, which must be a string representation of either an IPv4 or IPv6 address, or an IPAddr object. IPv6 addresses are only accepted if ipv6_enabled() returns True. The IPAddr object that the constructor returns will be either an IPv4Addr object or an IPv6Addr object.

For backwards compatibility, the IPAddr constructor will also accept an integer address, in which case it converts that integer to an IPv4Addr object. This behavior is deprecated. Use the IPv4Addr and IPv6Addr constructors instead.

Examples:

 >>> addr1 = IPAddr('192.160.1.1')
 >>> addr2 = IPAddr('2001:db8::1428:57ab')
 >>> addr3 = IPAddr('::ffff:12.34.56.78')
 >>> addr4 = IPAddr(addr1)
 >>> addr5 = IPAddr(addr2)
 >>> addr6 = IPAddr(0x10000000) # Deprecated

Supported operations and methods:

Inequality Operations

In all of the comparison operations below, whenever an IPv4 address is compared to an IPv6 address, the IPv4 address is converted to an IPv6 address before the comparison. This means that IPAddr("0.0.0.0") == IPAddr("::ffff:0.0.0.0").
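As a rough model of this rule, the sketch here uses Python's standard ipaddress module rather than PySiLK. The helper comparison_key is hypothetical; it maps every address to a 128-bit integer, placing IPv4 addresses in the IPv4-mapped block ::ffff:0:0/96, which is consistent with the equality stated above.

```python
import ipaddress

def comparison_key(text):
    """Map an address string to a 128-bit integer for comparison."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Place the IPv4 address in the IPv4-mapped block ::ffff:0:0/96.
        return (0xFFFF << 32) | int(addr)
    return int(addr)

same = comparison_key("0.0.0.0") == comparison_key("::ffff:0.0.0.0")
ordered = comparison_key("10.0.0.1") < comparison_key("2001:db8::1")
```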

addr1 == addr2

Return True if addr1 is equal to addr2; False otherwise.

addr1 != addr2

Return False if addr1 is equal to addr2; True otherwise.

addr1 < addr2

Return True if addr1 is less than addr2; False otherwise.

addr1 <= addr2

Return True if addr1 is less than or equal to addr2; False otherwise.

addr1 >= addr2

Return True if addr1 is greater than or equal to addr2; False otherwise.

addr1 > addr2

Return True if addr1 is greater than addr2; False otherwise.

addr.is_ipv6()

Return True if addr is an IPv6 address, False otherwise.

addr.isipv6()

(DEPRECATED) An alias for is_ipv6().

addr.to_ipv6()

Returns the address converted to an IPv6Addr.

addr.to_ipv4()

Returns the address converted to an IPv4Addr. If addr cannot be legally converted to IPv4, this method will return None.
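The two conversions can be modelled with the standard ipaddress module. This sketch does not use PySiLK, the function names simply mirror the methods, and it assumes that the only IPv6 addresses with a legal IPv4 form are the IPv4-mapped ones (::ffff:a.b.c.d).

```python
import ipaddress

def to_ipv6(text):
    """Return the address as an IPv6Address (IPv4 becomes IPv4-mapped)."""
    addr = ipaddress.ip_address(text)
    if addr.version == 6:
        return addr
    return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))

def to_ipv4(text):
    """Return the address as an IPv4Address, or None if not convertible."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return addr
    # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
    return addr.ipv4_mapped
```

Because the failure case returns None rather than raising, callers should test the result before using it.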

int(addr)

Return the integer representation of addr. For an IPv4 address, this is a 32-bit number. For an IPv6 address, this is a 128-bit number.

str(addr)

Return a human-readable representation of addr in its canonical form.

addr.padded()

Return a human-readable representation of addr which is fully padded with zeroes. With IPv4, it will return a string of the form "xxx.xxx.xxx.xxx". With IPv6, it will return a string of the form "xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx".
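The padded forms can be reproduced with the standard library. This is a sketch of the output format only, not the PySiLK implementation, and the free function padded is a stand-in for the method.

```python
import ipaddress

def padded(text):
    """Return the zero-padded text form of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Three decimal digits per octet: xxx.xxx.xxx.xxx
        return ".".join("%03d" % octet for octet in addr.packed)
    # Four hexadecimal digits per group, with no "::" compression.
    return addr.exploded
```

Padded strings sort lexically in the same order as the addresses sort numerically, which is convenient when producing text reports.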

addr.mask(mask)

Return a copy of addr masked by the IPAddr mask.

When both addresses are either IPv4 or IPv6, applying the mask is straightforward.

If addr is IPv6 but mask is IPv4, mask is converted to IPv6 and then the mask is applied. This may produce an unexpected result.

If addr is IPv4 and mask is IPv6, addr will remain an IPv4 address if masking mask with ::ffff:0000:0000 results in ::ffff:0000:0000 (namely, if bytes 10 and 11 of mask are 0xFFFF). Otherwise, addr is converted to an IPv6 address and the mask is performed in IPv6 space, which may produce an unexpected result.
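One reading of these mixed-family rules, written against the standard ipaddress module instead of PySiLK, is sketched here. The function mask is a stand-in for the method, and applying only the low 32 bits of an IPv6 mask when the result stays IPv4 is an assumption.

```python
import ipaddress

V4_MAPPED = 0xFFFF << 32  # integer value of ::ffff:0:0

def mask(addr_text, mask_text):
    """Mask one address by another, following the family rules above."""
    addr = ipaddress.ip_address(addr_text)
    msk = ipaddress.ip_address(mask_text)
    if addr.version == msk.version:
        return type(addr)(int(addr) & int(msk))
    if addr.version == 6:
        # IPv6 address, IPv4 mask: widen the mask to IPv6, then apply it.
        return ipaddress.IPv6Address(int(addr) & (V4_MAPPED | int(msk)))
    # IPv4 address, IPv6 mask.
    if (int(msk) & V4_MAPPED) == V4_MAPPED:
        # The mask keeps the ::ffff: marker, so the result stays IPv4
        # (assumption: only the low 32 bits of the mask are applied).
        return ipaddress.IPv4Address(int(addr) & int(msk) & 0xFFFFFFFF)
    # Otherwise widen the address and mask in IPv6 space.
    return ipaddress.IPv6Address((V4_MAPPED | int(addr)) & int(msk))
```

Masking 2001:db8::1 with the IPv4 mask 255.255.255.255 yields ::1 in this model, because the widened mask clears the upper 80 bits; this is the kind of surprising result the text warns about.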

addr.mask_prefix(prefix)

Return a copy of addr masked by the high prefix bits. All bits below the prefixth bit will be set to zero. The maximum value for prefix is 32 for an IPv4Addr, and 128 for an IPv6Addr.
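Prefix masking amounts to keeping the top prefix bits of the address integer. A minimal sketch with the standard ipaddress module follows; mask_prefix here is a stand-in function, and raising ValueError for an out-of-range prefix is an assumption rather than documented PySiLK behaviour.

```python
import ipaddress

def mask_prefix(text, prefix):
    """Keep the top `prefix` bits of the address and zero the rest."""
    addr = ipaddress.ip_address(text)
    width = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= prefix <= width:
        # Assumed error type; the text only states the maximum values.
        raise ValueError("prefix must be between 0 and %d" % width)
    keep = ((1 << prefix) - 1) << (width - prefix)
    return type(addr)(int(addr) & keep)
```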

addr.country_code()

Return the two character country code associated with addr. If no country code is associated with addr, return None. The country code association is initialized by the init_country_codes() function. If init_country_codes() is not called before calling this method, it will act as if init_country_codes() was called with no argument.

An IPv4Addr object represents an IPv4 address. IPv4Addr is a subclass of IPAddr, and supports all operations and methods that IPAddr supports.

class silk.IPv4Addr(address)

The constructor takes an address, which must be a string representation of an IPv4 address, an IPAddr object, or an integer. A string will be parsed as an IPv4 address. An IPv4Addr object will be copied. An IPv6Addr object will be converted to an IPv4 address, or a ValueError will be raised if the conversion is not possible. A 32-bit integer will be converted to an IPv4 address.

Examples:

 >>> addr1 = IPv4Addr('192.160.1.1')
 >>> addr2 = IPv4Addr(IPAddr('::ffff:12.34.56.78'))
 >>> addr3 = IPv4Addr(addr1)
 >>> addr4 = IPv4Addr(0x10000000)

Supported operations and methods:

addr.octets()

Return a tuple of the octets of addr.

An IPv6Addr object represents an IPv6 address. IPv6Addr is a subclass of IPAddr, and supports all operations and methods that IPAddr supports.

class silk.IPv6Addr(address)

The constructor takes an address, which must be a string representation of an IPv6 address, an IPAddr object, or an integer. A string will be parsed as an IPv6 address. An IPv6Addr object will be copied. An IPv4Addr object will be converted to an IPv6 address. A 128-bit integer will be converted to an IPv6 address.

Examples:

 >>> addr1 = IPv6Addr('2001:db8::1428:57ab')
 >>> addr2 = IPv6Addr(IPAddr('192.160.1.1'))
 >>> addr3 = IPv6Addr(addr1)
 >>> addr4 = IPv6Addr(0x100000000000000000000000)

An IPWildcard object represents a range or block of IP addresses. The IPWildcard object handles iteration over IP addresses with for x in wildcard.

class silk.IPWildcard(wildcard)

The constructor takes wildcard, a string representation of the wildcard address. The string wildcard can be an IP address, an IP address with CIDR notation, an integer, an integer with a CIDR designation, or an entry in SiLK wildcard notation. In SiLK wildcard notation, a wildcard is represented as an IP address in canonical form with each octet (IPv4) or hexadectet (IPv6) represented by one of the following: a value, a range of values, a comma-separated list of values and ranges, or the character 'x' used to represent the entire octet or hexadectet. IPv6 wildcard addresses are only accepted if silk.ipv6_enabled() returns True.
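To make the SiLK wildcard notation concrete, here is a small pure-Python expander for the IPv4 form only. It is an illustrative sketch, not PySiLK code: the function names are hypothetical, CIDR and integer inputs are not handled, and octet values are not range-checked.

```python
from itertools import product

def expand_octet(spec):
    """Expand one octet spec such as 'x', '7', '0-255', or '2,3,8-9'."""
    if spec.lower() == "x":
        return list(range(256))
    values = set()
    for part in spec.split(","):
        low, _, high = part.partition("-")
        # A bare value is treated as the range low-low.
        values.update(range(int(low), int(high or low) + 1))
    return sorted(values)

def expand_ipv4_wildcard(wildcard):
    """Yield every dotted-quad address matched by an IPv4 wildcard."""
    octets = wildcard.split(".")
    if len(octets) != 4:
        raise ValueError("expected four dot-separated octets")
    for combo in product(*(expand_octet(o) for o in octets)):
        yield ".".join(str(value) for value in combo)

addresses = list(expand_ipv4_wildcard("1.2,3.4,5.6,7"))
```

The wildcard '1.2,3.4,5.6,7' has one choice for the first octet and two for each of the others, so it matches eight addresses.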

Examples:

 >>> a = IPWildcard('1.2.3.0/24')
 >>> b = IPWildcard('ff80::/16')
 >>> c = IPWildcard('1.2.3.4')
 >>> d = IPWildcard('::FFFF:0102:0304')
 >>> e = IPWildcard('16909056')
 >>> f = IPWildcard('16909056/24')
 >>> g = IPWildcard('1.2.3.x')
 >>> h = IPWildcard('1:2:3:4:5:6:7:x')
 >>> i = IPWildcard('1.2,3.4,5.6,7')
 >>> j = IPWildcard('1.2.3.0-255')
 >>> k = IPWildcard('::2-4')
 >>> l = IPWildcard('1-2:3-4:5-6:7-8:9-a:b-c:d-e:0-ffff')

Supported operations and methods:

addr in wildcard

Return True if addr is in wildcard, False otherwise.

addr not in wildcard

Return False if addr is in wildcard, True otherwise.

string in wildcard

Return the result of IPAddr(string) in wildcard.

string not in wildcard

Return the result of IPAddr(string) not in wildcard.

wildcard.isipv6()

Return True if wildcard contains IPv6 addresses, False otherwise.

str(wildcard)

Return the string that was used to construct wildcard.

An IPSet object represents a set of IPv4 addresses, as produced by rwset(1) and rwsetbuild(1). IPSets do not yet support IPv6. The IPSet object handles iteration over IP addresses with for x in set, and iteration over CIDR blocks using for x in set.cidr_iter().

class silk.IPSet([ip_iterable])

The constructor creates an empty IPset. If an ip_iterable is supplied as an argument, each member of ip_iterable will be added to the IPset. The ip_iterable may be:

  • an IPv4Addr object representing an IPv4 address

  • the string representation of a valid IPv4 address

  • an IPWildcard object containing IPv4 address(es)

  • the string representation of an IPWildcard

  • an iterable of any combination of the above

  • another IPSet object

Other constructors, all class methods:

load(path)

Create an IPSet by reading a SiLK IPset file. path must be a valid location of an IPset.

Supported operations and methods:

The following operations and methods do not modify the IPSet:

set.cardinality()

Return the cardinality of set.

len(set)

Return the cardinality of set. This method will raise OverflowError if there are too many IPs in the set, that is, when the number of IPs in the set does not fit into Python's plain integer type. The cardinality() method will not raise this exception.

addr in set

Return True if addr is a member of set; False otherwise.

addr not in set

Return False if addr is a member of set; True otherwise.

set.copy()

Return a new IPSet with a copy of set.

set <= set2
set.issubset(ip_iterable)

Return True if every IP address in set is also in set2. Return False otherwise.

set >= set2
set.issuperset(ip_iterable)

Return True if every IP address in set2 is also in set. Return False otherwise.

set | set2
set.union(ip_iterable)

Return a new IPset containing the IP addresses in set and set2.

set & set2
set.intersection(ip_iterable)

Return a new IPset containing the IP addresses common to set and set2.

set - set2
set.difference(ip_iterable)

Return a new IPset containing the IP addresses in set but not in set2.

set ^ set2
set.symmetric_difference(ip_iterable)

Return a new IPset containing the IP addresses in either set or in set2 but not in both.

set.cidr_iter()

Return an iterator over the CIDR blocks in set. Each iteration returns a 2-tuple, the first element of which is the first IP address in the block, the second of which is the prefix length of the block. Can be used as for (addr, prefix) in set.cidr_iter().
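The shape of the values produced by cidr_iter() can be imitated with the standard ipaddress module. The sketch does not use PySiLK; the generator takes a plain list of address strings, returns addresses as strings rather than IPAddr objects, and relies on ipaddress.collapse_addresses to merge adjacent addresses into blocks.

```python
import ipaddress

def cidr_iter(addresses):
    """Yield (first_address, prefix_length) for each CIDR block."""
    hosts = [ipaddress.ip_network(a) for a in addresses]
    for block in ipaddress.collapse_addresses(hosts):
        yield (str(block.network_address), block.prefixlen)

# Six consecutive addresses collapse into a /30 followed by a /31.
blocks = list(cidr_iter(["10.0.0.%d" % i for i in range(6)]))
for (addr, prefix) in blocks:
    print("%s/%d" % (addr, prefix))
```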

set.save(filename)

Save the contents of set in the file filename.

The following operations and methods will modify the IPSet:

set.add(addr)

Add addr to set and return set. To add multiple IP addresses, use the update() method.

set.discard(addr)

Remove addr from set if addr is present; do nothing if it is not. Return set. To discard multiple IP addresses, use the difference_update() method.

set.remove(addr)

Similar to discard(), but raises KeyError if addr is not a member of set.

set.clear()

Remove all IP addresses from set and return set.

set |= set2
set.update(ip_iterable)

Add the IP addresses specified in set2 to set; the result is the union of set and set2.

set &= set2
set.intersection_update(ip_iterable)

Remove from set any IP address that does not appear in set2; the result is the intersection of set and set2.

set -= set2
set.difference_update(ip_iterable)

Remove from set any IP address found in set2; the result is the difference of set and set2.

set ^= set2
set.symmetric_difference_update(ip_iterable)

Update set, keeping the IP addresses found in set or in set2 but not in both.

An RWRec object represents a SiLK Flow record.

class silk.RWRec([rec],[field=value],...)

This constructor creates an empty RWRec object. If an RWRec rec is supplied, the constructor will create a copy of it. rec may also be a dictionary, such as that returned by the as_dict() method. Initial values for record fields can be supplied as field=value keyword arguments. Note that setting or accessing certain attributes on an RWRec causes silk.site.have_site_config() to be invoked; that function will call silk.site.init_site() with no argument if it has not yet been called.

Example:

 >>> recA = RWRec(input=10, output=20)
 >>> recB = RWRec(recA, output=30)
 >>> (recA.input, recA.output)
 (10, 20)
 >>> (recB.input, recB.output)
 (10, 30)

Instance attributes:

rec.application

The service port of the flow rec as set by the flow collector if the collector supports it, an integer. The default application value is 0.

rec.bytes

The count of the number of bytes in the flow rec, an integer. The default bytes value is 0.

rec.classname

(READ ONLY) The class name assigned to the flow rec, a string. Calls silk.site.have_site_config(). The default classname is ?. The classname cannot be modified on its own: to change the classname, you must also set the typename. See the rec.classtype attribute.

rec.classtype

A tuple of the classname and the typename of the flow rec. Calls silk.site.have_site_config().

rec.dip

The destination IP of the flow rec, an IPAddr object. The default dip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.

rec.dport

The destination port of the flow rec, an integer. The default dport value is 0.

rec.duration

The duration of the flow rec, a datetime.timedelta object. The default duration value is 0. Changing the rec.duration attribute will modify the rec.etime attribute such that (rec.etime - rec.stime) == the new rec.duration. See also rec.duration_secs.

rec.duration_secs

The duration of the flow rec in seconds, a float. The default duration_secs value is 0. Changing the rec.duration_secs attribute will modify the rec.etime attribute in the same way as changing rec.duration.

rec.etime

The end time of the flow rec, a datetime.datetime object. The default etime value is the UNIX epoch time, datetime.datetime(1970,1,1,0,0). Changing the rec.etime attribute modifies the flow record's duration. If the new duration is larger than RWRec supports, an OverflowError will be raised. See also rec.etime_epoch_secs.

rec.etime_epoch_secs

The end time of the flow rec as a number of seconds since the epoch time, a float. Epoch time is 1970-01-01 00:00:00. The default etime_epoch_secs value is 0. Changing the rec.etime_epoch_secs attribute modifies the flow record's duration. If the new duration is larger than RWRec supports, an OverflowError will be raised.

rec.initflags

The TCP flags on the first packet of the flow rec, a TCPFlags object. The default initflags value is None. The rec.initflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor.

rec.icmpcode

The ICMP code of the flow rec (only valid if rec.protocol is 1), an integer. The default icmpcode value is 0.

rec.icmptype

The ICMP type value of the flow rec (only valid if rec.protocol is 1), an integer. The default icmptype value is 0.

rec.input

The SNMP interface where the flow rec entered the router, an integer. The default input value is 0.

rec.nhip

The next-hop IP of the flow rec as set by the router, an IPAddr object. The default nhip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.

rec.output

The SNMP interface where the flow rec exited the router, an integer. The default output value is 0.

rec.packets

The packet count for the flow rec, an integer. The default packets value is 0.

rec.protocol

The IP protocol of the flow rec, an integer. The default protocol value is 0.

rec.restflags

The union of the flags of all but the first packet in the flow rec, a TCPFlags object. The default restflags value is None. The rec.restflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor.

rec.sensor

The name of the sensor where the flow rec was collected, a string. Calls silk.site.have_site_config(). The default sensor value is ?.

rec.sip

The source IP of the flow rec, an IPAddr object. The default sip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.

rec.sport

The source port of the flow rec, an integer. The default sport value is 0.

rec.stime

The start time of the flow rec, a datetime.datetime object. The default stime value is the UNIX epoch time, datetime.datetime(1970,1,1,0,0). Modifying the rec.stime attribute will modify the flow's end time so that rec.duration remains constant. See also rec.stime_epoch_secs.

rec.stime_epoch_secs

The start time of the flow rec as a number of seconds since the epoch time, a float. Epoch time is 1970-01-01 00:00:00. The default stime_epoch_secs value is 0. Changing the rec.stime_epoch_secs attribute will modify the flow's end time so that rec.duration remains constant.

rec.tcpflags

The union of the TCP flags of all packets in the flow rec, a TCPFlags object. The default tcpflags value is TCPFlags(' '). The rec.tcpflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor.

rec.timeout_killed

Whether the flow rec was closed early due to timeout by the collector, a boolean. The default timeout_killed value is None.

rec.timeout_started

Whether the flow rec is a continuation from a timed-out flow, a boolean. The default timeout_started value is None.

rec.typename

(READ ONLY) The type name of the flow rec, a string. Calls silk.site.have_site_config(). The default typename is '255'. The typename cannot be modified by itself. In order to modify the typename, you also need to modify the classname. See the rec.classtype attribute.
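The coupling between the stime, etime, and duration attributes described above can be modelled with a small class. FlowTimes is a hypothetical stand-in, not part of PySiLK: it assumes the record stores the start time and duration and derives the end time, and it does not model the OverflowError raised for over-long durations.

```python
from datetime import datetime, timedelta

class FlowTimes:
    """Hypothetical model: stores stime and duration, derives etime."""

    def __init__(self):
        self._stime = datetime(1970, 1, 1, 0, 0)  # UNIX epoch default
        self.duration = timedelta(0)

    @property
    def stime(self):
        return self._stime

    @stime.setter
    def stime(self, value):
        # Duration is left untouched, so etime moves with stime.
        self._stime = value

    @property
    def etime(self):
        return self._stime + self.duration

    @etime.setter
    def etime(self, value):
        # Changing the end time changes the duration instead.
        self.duration = value - self._stime

rec = FlowTimes()
rec.stime = datetime(2020, 1, 1, 12, 0)
rec.etime = datetime(2020, 1, 1, 12, 5)  # duration becomes 5 minutes
rec.stime = datetime(2020, 1, 2, 0, 0)   # etime shifts by the same amount
```

In this model the order of assignments matters: setting etime before stime gives a different duration than setting stime first.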

Supported operations and methods:

rec.is_web()

Return True if rec can be represented as a web record, False otherwise.

rec.as_dict()

Return a dictionary representing the contents of rec. Calls silk.site.have_site_config().

str(rec)

Return the string representation of rec.as_dict().

rec1 == rec2

Return True if rec1 is structurally equivalent to rec2.

rec1 != rec2

Return True if rec1 is not structurally equivalent to rec2.

A SilkFile object represents a channel for writing to or reading from SiLK Flow files. A SiLK file open for reading can be iterated over using for rec in file.

class silk.SilkFile(filename, mode, compression=DEFAULT, notes=[], invocations=[])

The constructor takes a filename, a mode, and a set of optional keyword parameters. The mode should be one of the following constant values:

READ

Open file for reading

WRITE

Open file for writing

APPEND

Open file for appending

The filename should be the path to the file to open. A few filenames are treated specially. The filename stdin maps to the standard input stream when the mode is READ. The filenames stdout and stderr map to the standard output and standard error streams respectively when the mode is WRITE. A filename consisting of a single hyphen (-) maps to the standard input if the mode is READ, and to the standard output if the mode is WRITE.
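The special-filename rules can be summarized as a lookup on the filename and mode. The sketch is a plain-Python illustration: READ and WRITE are string stand-ins for the PySiLK constants, resolve_stream is a hypothetical helper, and the APPEND mode is treated as having no special names because the text does not mention any.

```python
READ, WRITE = "READ", "WRITE"  # stand-ins for the PySiLK mode constants

def resolve_stream(filename, mode):
    """Return 'stdin', 'stdout', or 'stderr' for a special name, else None."""
    if mode == READ and filename in ("stdin", "-"):
        return "stdin"
    if mode == WRITE:
        if filename in ("stdout", "-"):
            return "stdout"
        if filename == "stderr":
            return "stderr"
    return None  # the filename is treated as an ordinary path
```

Note that the name stdin has no special meaning when writing in this model, and stdout has none when reading.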

The compression parameter can be one of the following constants:

DEFAULT

Use the default compression scheme compiled into SiLK.

NO_COMPRESSION

Use no compression.

ZLIB

Use zlib block compression (as used by gzip(1)).

LZO1X

Use lzo1x block compression.

If notes or invocations are set, they should be lists of strings. These add annotation and invocation headers to the file. These values can be viewed with the rwfileinfo(1) program.

Examples:

 >>> myinputfile = SilkFile('/path/to/file', READ)
 >>> myoutputfile = SilkFile('/path/to/file', WRITE,
                             compression=LZO1X,
                             notes=['My output file',
                                    'another annotation'])

Instance attributes:

file.name

The filename that was used to create file.

Instance methods:

file.read()

Return an RWRec representing the next record in the SilkFile file. If there are no records left in the file, return None.

file.write(rec)

Write the RWRec rec to the SilkFile file. Return None.

file.next()

A SilkFile object is its own iterator: iter(file) returns file. When the SilkFile is used as an iterator, the next() method is called repeatedly. This method returns the next record, or raises StopIteration once the end of the file is reached.

file.notes()

Return the list of annotation headers for the file as a list of strings.

file.invocations()

Return the list of invocation headers for the file as a list of strings.

file.close()

Close the file and return None.
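The relationship between read(), next(), and iteration described above follows the ordinary Python iterator protocol. The class here is a hypothetical stand-in that serves records from a list instead of a SiLK file; it defines __next__ for Python 3 and aliases next to it for the Python 2 spelling used in this document.

```python
class RecordReader:
    """Hypothetical stand-in for a file opened for reading."""

    def __init__(self, records):
        self._pending = list(records)

    def read(self):
        # Return the next record, or None when nothing is left.
        return self._pending.pop(0) if self._pending else None

    def __iter__(self):
        return self  # the object is its own iterator

    def __next__(self):
        rec = self.read()
        if rec is None:
            raise StopIteration
        return rec

    next = __next__  # Python 2 spelling of the same method

reader = RecordReader(["rec1", "rec2"])
collected = [rec for rec in reader]
```

Because the object is its own iterator, a second loop over the same reader yields nothing; the records have already been consumed.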

A PrefixMap object represents an immutable mapping from IPv4 addresses or protocol/port pairs to labels. PrefixMap objects are created from SiLK prefix map files as created by rwpmapbuild(1).

class silk.PrefixMap(filename)

The constructor creates a prefix map initialized from the filename. The PrefixMap object will be of one of the two subtypes of PrefixMap: an AddressPrefixMap or a ProtoPortPrefixMap.

Supported operations and methods:

pmap[key]

Return the string label associated with key in pmap. key must be of the correct type: either an IPv4Addr if pmap is an AddressPrefixMap, or a 2-tuple of integers (protocol, port), if pmap is a ProtoPortPrefixMap. The method raises TypeError when the type of the key is incorrect.

pmap.get(key[,default])

Return the string label associated with key in pmap. Return the value default if key is not in pmap, or if key is of the wrong type or value to be a key for pmap. The default value of default is None.

pmap.values()

Return a tuple of the labels defined by the PrefixMap pmap.

pmap.iterranges()

Return an iterator that will iterate over ranges of contiguous values with the same label. The return values of the iterator will be the 3-tuple (start, end, label), where start is the first element of the range, end is the last element of the range, and label is the label for that range.
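The grouping performed by iterranges() can be illustrated on an ordinary dictionary of integer keys. This is a sketch of the idea only: a real prefix map covers the whole key space and stores ranges directly, whereas this hypothetical function starts from individual keys.

```python
def iterranges(labels):
    """Yield (start, end, label) for runs of consecutive keys."""
    start = end = label = None
    for key in sorted(labels):
        if start is not None and key == end + 1 and labels[key] == label:
            end = key  # extend the current run
            continue
        if start is not None:
            yield (start, end, label)  # close the previous run
        start = end = key
        label = labels[key]
    if start is not None:
        yield (start, end, label)

labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 7: "a"}
ranges = list(iterranges(labels))
```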

A Bag object is a representation of a multiset. Each key represents a potential element in the set, and the key's value represents the number of times that key is in the set.

class silk.Bag([mapping][,key_type=IPv4Addr])

The constructor creates a bag of type key_type. The default key_type is IPv4Addr. Objects of class key_type must be constructable from an integer, and possess an __int__() method which retrieves that integer from the object.

If mapping is included, the bag is initialized from that mapping. Valid mappings are:

  • a Bag

  • a key/value dictionary

  • an iterable of key/value pairs

Other constructors, all class methods:

Bag.ipaddr(mapping)

Creates a Bag using IPv4Addr as the key_type (IP address bag). Equivalent to Bag(mapping, key_type=IPv4Addr).

Bag.integer(mapping)

Creates a Bag using long as the key_type (integer bag). Equivalent to Bag(mapping, key_type=long).

Bag.load(path[, key_type=IPv4Addr])

Creates a Bag by reading a SiLK bag file. path must be a valid location of a bag. key_type is used as in the Bag constructor. key_type defaults to IPv4Addr.

Bag.load_ipaddr(path)

Creates an IP address bag from a SiLK bag file. Equivalent to Bag.load(path, key_type=IPv4Addr).

Bag.load_integer(path)

Creates an integer bag from a SiLK bag file. Equivalent to Bag.load(path, key_type=long).

Constants:

BAG_COUNTER_MAX

This constant contains the maximum possible value for Bag counters.

Supported operations and methods:

Bags contain the following attribute:

key_type

The class which represents the type of keys in this bag. Objects of this class must be constructable from an integer, and possess an __int__() method which retrieves that integer from the object.

The following operations and methods do not modify the Bag:

bag.copy()

Return a new Bag which is a copy of bag.

bag[key]

Return the number of elements key in bag.

bag[key:key2]

Return a new Bag which contains only the elements in the key range [key, key2).

bag[ipset]

Return a new Bag which contains only elements that are also contained in ipset. This is only valid for IP address bags.

bag[ipwildcard]

Return a new Bag which contains only elements that are also contained in ipwildcard. This is only valid for IP address bags.

key in bag

Return True if bag[key] is non-zero, False otherwise.

bag.get(key[, default=None])

Return bag[key] if key is in bag, otherwise return default. default defaults to None.

bag.items()

Return a list of (key, value) pairs for all keys in bag with non-zero values. This list is guaranteed to be sorted in int(key) order.

bag.iteritems()

Return an iterator over (key, value) pairs for all keys in bag with non-zero values. This iterator is guaranteed to iterate over items in int(key) order.

bag.keys()

Return a list of keys for all keys in bag with non-zero values. This list is guaranteed to be sorted in int(key) order.

bag.iterkeys()

Return an iterator over keys for all keys in bag with non-zero values. This iterator is guaranteed to iterate over keys in int(key) order.

bag.values()

Return a list of values for all keys in bag with non-zero values. This list is guaranteed to be sorted in int(key) order.

bag.itervalues()

Return an iterator over values for all keys in bag with non-zero values. This iterator is guaranteed to iterate over values in int(key) order.

bag.group_iterator(bag2)

Return an iterator over keys and values of a pair of Bags. For each key which is in either bag or bag2, this iterator will return a (key, value, value2) triple, where value is bag.get(key), and value2 is bag2.get(key). This iterator is guaranteed to iterate over triples in int(key) order.

bag + bag2

Add two bags together. Return a new Bag for which newbag[key] = bag[key] + bag2[key] for all keys in bag and bag2. Will raise an OverflowError if the resulting value for a key is greater than 2^64 - 1.

bag - bag2

Subtract two bags. Return a new Bag for which newbag[key] = bag[key] - bag2[key] for all keys in bag and bag2, as long as the resulting value for that key would be non-negative. If the resulting value for a key would be negative, the value of that key will be zero.

bag.min(bag2)

Return a new Bag for which newbag[key] = min(bag[key], bag2[key]) for all keys in bag and bag2.

bag.max(bag2)

Return a new Bag for which newbag[key] = max(bag[key], bag2[key]) for all keys in bag and bag2.

bag.div(bag2)

Divide two bags. Return a new Bag for which newbag[key] = bag[key] / bag2[key], rounded to the nearest integer, for all keys in bag and bag2, as long as bag2[key] is non-zero. When bag2[key] is zero, newbag[key] = 0.

bag * integer
integer * bag

Multiply a bag by a scalar. Return a new Bag for which newbag[key] = bag[key] * integer for all keys in bag.

bag.intersect(set_like)

Return a new Bag which contains bag[key] for each key where key in set_like is true.

bag.complement_intersect(set_like)

Return a new Bag which contains bag[key] for each key where key in set_like is not true.

bag.ipset()

Return an IPSet consisting of the set of IP address key values from bag with positive values. This only works if bag is an IP address bag.

bag.inversion()

Return a new integer Bag for which all values from bag are inserted as key elements. Hence, if two keys in bag have a value of 5, newbag[5] will be equal to two.

bag == bag2

Return True if the contents of bag are equivalent to the contents of bag2, False otherwise.

bag != bag2

Return False if the contents of bag are equivalent to the contents of bag2, True otherwise.

bag.save(filename)

Save the contents of bag in the file filename.
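The arithmetic described for bag + bag2, bag - bag2, bag.div(bag2), and bag.inversion() can be modelled on plain dictionaries that map keys to non-zero counts. The functions here are hypothetical and do not use PySiLK. Rounding exact halves upward in bag_div is an assumption, since the text only says "nearest integer".

```python
COUNTER_MAX = 2**64 - 1  # largest value mentioned for bag + bag2

def bag_add(a, b):
    """Key-wise sum; raise OverflowError beyond COUNTER_MAX."""
    out = {}
    for key in set(a) | set(b):
        total = a.get(key, 0) + b.get(key, 0)
        if total > COUNTER_MAX:
            raise OverflowError("counter overflow for key %r" % (key,))
        out[key] = total
    return out

def bag_sub(a, b):
    """Key-wise difference; negative results become zero (dropped)."""
    out = {}
    for key, count in a.items():
        diff = count - b.get(key, 0)
        if diff > 0:
            out[key] = diff
    return out

def bag_div(a, b):
    """Key-wise rounded quotient; a zero divisor gives a zero count."""
    out = {}
    for key, count in a.items():
        divisor = b.get(key, 0)
        if divisor:
            # Integer round-to-nearest (halves round up: an assumption).
            quotient = (2 * count + divisor) // (2 * divisor)
            if quotient:
                out[key] = quotient
    return out

def bag_inversion(a):
    """Count how many keys share each value."""
    out = {}
    for count in a.values():
        if count:
            out[count] = out.get(count, 0) + 1
    return out
```

Keys whose count would be zero are simply omitted from the result, which matches the convention that bag[key] is zero for absent keys.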

The following operations and methods will modify the Bag:

bag.clear()

Empty bag, such that bag[key] is zero for all keys.

bag[key] = value

Set the number of key in bag to value.

del bag[key]

Remove key from bag, such that bag[key] is zero.

bag.update(mapping)

Modify bag so that, for each key in mapping, the value for that key in bag is set to the mapping's value.

Valid mappings are:

  • a Bag

  • a key/value dictionary

  • an iterable of key/value pairs

bag.add(key[, key2[, ...]])

Add each key to bag. This is the same as incrementing the value for each key by one.

bag.add(iterable)

Add each key in iterable to bag. This is the same as incrementing the value for each key by one.

bag.remove(key[, key2[, ...]])

Remove one of each key from bag. This is the same as decrementing the value for each key by one.

bag.remove(iterable)

Remove one of each key in iterable from bag, essentially decrementing the value for each key by one.

bag.incr(key, value = 1)

Increment the number of key in bag by value. value defaults to one.

bag.decr(key, value = 1)

Decrement the number of key in bag by value. value defaults to one.

bag += bag2

Equivalent to bag = bag + bag2, unless an OverflowError is raised, in which case bag is no longer necessarily valid. When an error is not raised, this operation takes less memory than bag = bag + bag2.

bag -= bag2

Equivalent to bag = bag - bag2. This operation takes less memory than bag = bag - bag2.

bag *= integer

Equivalent to bag = bag * integer, unless an OverflowError is raised, in which case bag is no longer necessarily valid. When an error is not raised, this operation takes less memory than bag = bag * integer.

bag.constrain_values(min = None, max = None)

Remove key from bag if that key's value is less than min, or greater than max. At least one of min or max must be specified.

bag.constrain_keys(min = None, max = None)

Remove key from bag if that key is less than min, or greater than max. At least one of min or max must be specified.
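The effect of constrain_values() can be shown on a plain dictionary; constrain_keys() works the same way but compares the keys instead of the counts. This is a sketch rather than PySiLK code: the parameters are named minimum and maximum to avoid shadowing Python built-ins, and raising ValueError when neither bound is given is an assumption.

```python
def constrain_values(bag, minimum=None, maximum=None):
    """Drop keys whose count lies outside [minimum, maximum], in place."""
    if minimum is None and maximum is None:
        # Assumed error type; the text only says one bound is required.
        raise ValueError("at least one of minimum or maximum is required")
    for key in list(bag):  # copy the keys so deletion is safe
        count = bag[key]
        too_small = minimum is not None and count < minimum
        too_large = maximum is not None and count > maximum
        if too_small or too_large:
            del bag[key]

counts = {1: 1, 2: 5, 3: 10}
constrain_values(counts, minimum=2, maximum=9)
```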

A TCPFlags object represents the eight bits of flags from a TCP session.

class silk.TCPFlags(value)

The constructor takes either a TCPFlags value, a string, or an integer. If a TCPFlags value, it returns a copy of that value. If an integer, the integer should represent the 8-bit representation of the flags. If a string, the string should consist of a concatenation of zero or more of the characters F, S, R, P, A, U, E, and C (upper or lower case), representing the FIN, SYN, RST, PSH, ACK, URG, ECE, and CWR flags. Spaces in the string are ignored.
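The correspondence between flag letters and the 8-bit integer form can be sketched in a few lines. The bit values are those of the TCP header flags byte; the helper names are hypothetical, and the letter order used by flags_to_string is a choice made for this sketch that may differ from what str() returns in PySiLK.

```python
# Bit positions of the flags in the TCP header flags byte.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

def flags_from_string(text):
    """Convert a string such as 'SA' to its 8-bit integer value."""
    value = 0
    for char in text.upper():
        if char == " ":
            continue  # spaces are ignored
        value |= FLAG_BITS[char]  # KeyError on an unknown letter
    return value

def flags_to_string(value):
    """Convert an 8-bit integer back to flag letters."""
    return "".join(c for c in "FSRPAUEC" if value & FLAG_BITS[c])
```

Under this mapping the integer 5 corresponds to FIN and RST, and the string 'SA' corresponds to 0x12.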

Examples:

 >>> a = TCPFlags('SA')
 >>> b = TCPFlags(5)

Instance attributes (read-only):

flags.FIN

True if the FIN flag is set on flags, False otherwise

flags.SYN

True if the SYN flag is set on flags, False otherwise

flags.RST

True if the RST flag is set on flags, False otherwise

flags.PSH

True if the PSH flag is set on flags, False otherwise

flags.ACK

True if the ACK flag is set on flags, False otherwise

flags.URG

True if the URG flag is set on flags, False otherwise

flags.ECE

True if the ECE flag is set on flags, False otherwise

flags.CWR

True if the CWR flag is set on flags, False otherwise

Supported operations and methods:

~flags

Return the bitwise inversion (not) of flags.

flags1 & flags2

Return the bitwise intersection (and) of the flags from flags1 and flags2.

flags1 | flags2

Return the bitwise union (or) of the flags from flags1 and flags2.

flags1 ^ flags2

Return the bitwise exclusive disjunction (xor) of the flags from flags1 and flags2.

int(flags)

Return the integer value of the flags set in flags.

str(flags)

Return a string representation of the flags set in flags.

flags.padded()

Return a string representation of the flags set in flags. This representation will be padded with spaces such that flags will line up if printed above each other.

flags

When used in a setting that expects a boolean, return True if any flag value is set in flags. Return False otherwise.

flags.matches(flagmask)

Given flagmask, a string of the form high_flags/mask_flags, return True if the flags of flags match high_flags after being masked with mask_flags; False otherwise. Given a flagmask without the slash (/), return True if flags matches high_flags, as if mask_flags contained all flags.
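
The high_flags/mask_flags test can be modeled on plain integers. The sketch below is not PySiLK code: matches_model is a hypothetical name, the bit values assume the standard TCP header layout, and the comparison (flags & mask) == high is one reading of the description above.

```python
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

def _to_int(text):
    """Combine the bits of every flag letter in text, ignoring spaces."""
    value = 0
    for ch in text.upper().replace(" ", ""):
        value |= FLAG_BITS[ch]
    return value

def matches_model(flags_value, flagmask):
    """Model of flags.matches(flagmask) on 8-bit integers."""
    high_text, slash, mask_text = flagmask.partition("/")
    high = _to_int(high_text)
    # Without a slash, behave as if the mask contained all eight flags.
    mask = _to_int(mask_text) if slash else 0xFF
    return (flags_value & mask) == high

syn = _to_int("S")
syn_ack = _to_int("SA")
syn_psh = _to_int("SP")

initial_syn = matches_model(syn, "S/SA")       # SYN set, ACK clear
syn_ack_hit = matches_model(syn_ack, "S/SA")   # ACK is set, so no match
psh_ignored = matches_model(syn_psh, "S/SA")   # PSH is outside the mask
exact_only = matches_model(syn_psh, "S")       # no slash: all flags compared
```

In this model, flags outside the mask are ignored, which is why a SYN+PSH value still matches S/SA but does not match the bare string S.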

Constants:

The following constants are defined:

FIN

A TCPFlags value with only the FIN flag set

SYN

A TCPFlags value with only the SYN flag set

RST

A TCPFlags value with only the RST flag set

PSH

A TCPFlags value with only the PSH flag set

ACK

A TCPFlags value with only the ACK flag set

URG

A TCPFlags value with only the URG flag set

ECE

A TCPFlags value with only the ECE flag set

CWR

A TCPFlags value with only the CWR flag set

An FGlob object is an iterable object which iterates over filenames from a SiLK data store. It does this internally by calling the rwfglob(1) program. The FGlob object assumes that the rwfglob program is in the PATH, and will raise an exception when used if not.

class silk.FGlob(classname=None, type=None, sensors=None, start_date=None, end_date=None, data_rootdir=None, site_config_file=None)

Although all arguments have defaults, at least one of classname, type, sensors, or start_date must be specified. The arguments are:

classname

if given, should be a string representing the class name. If not given, the default is taken from the site configuration file, silk.conf(5).

type

if given, can be a string representing a type name or a comma-separated list of type names, or a list of strings representing type names. If not given, the default is taken from the site configuration file, silk.conf.

sensors

if given, should be a string representing a comma-separated list of sensor names or IDs, an integer representing a sensor ID, or a list of strings or integers representing sensor names or IDs. If not given, defaults to all sensors.

start_date

if given, should be either a string in the format YYYY/MM/DD[:HH], a date object, a datetime object (which will be used to the precision of one hour), or a time object (which is used for the given hour on the current date). If not given, defaults to the start of the current day.

end_date

if given, should be either a string in the format YYYY/MM/DD[:HH], a date object, a datetime object (which will be used to the precision of one hour), or a time object (which is used for the given hour on the current date). If not given, defaults to start_date. end_date cannot be used without a start_date.

data_rootdir

if given, should be a string representing the directory in which to find the packed SiLK data files. If not given, defaults to the value in the SILK_DATA_ROOTDIR environment variable or the compiled-in default.

site_config_file

if given, should be a string representing the path of the site configuration file, silk.conf. If not given, defaults to the value in the SILK_CONFIG_FILE environment variable or $SILK_DATA_ROOTDIR/silk.conf.

An FGlob object can be used as a standard iterator. For example:

 for filename in FGlob(classname="all", start_date="2005/09/22"):
     for rec in SilkFile(filename):
         ...
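
When the dates come from other Python code, it can help to see how the YYYY/MM/DD[:HH] string form relates to the standard datetime types. The following is a minimal sketch using only the standard library; to_silk_date is a hypothetical helper, not part of PySiLK, and it simply mirrors the one-hour precision described for start_date and end_date.

```python
import datetime

def to_silk_date(value):
    """Format a date as YYYY/MM/DD, or a datetime as YYYY/MM/DD:HH."""
    # datetime is a subclass of date, so it must be tested first.
    if isinstance(value, datetime.datetime):
        return value.strftime("%Y/%m/%d:%H")  # minutes and seconds are dropped
    if isinstance(value, datetime.date):
        return value.strftime("%Y/%m/%d")
    raise TypeError("expected a date or datetime object")

start = to_silk_date(datetime.datetime(2005, 9, 22, 14, 37))
end = to_silk_date(datetime.date(2005, 9, 23))
```

The resulting strings have the form that the start_date and end_date arguments accept; because FGlob also accepts date and datetime objects directly, such a conversion is only needed when a string is wanted for logging or for building an equivalent rwfglob command line.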

The silk.site module contains functions that load the SiLK site file, and query information from that file.

silk.site.init_site([filename])

Use the given filename as the name of the SiLK site configuration file (see silk.conf(5)). If filename is omitted, the value specified in the environment variable SILK_CONFIG_FILE will be used as the name of the configuration file. If SILK_CONFIG_FILE is not set, the module looks for a file named silk.conf in the following directories: the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK; the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/.

This function should not generally be called explicitly unless one wishes to use a non-default site configuration file.

The init_site() function can only be called once. Subsequent invocations will raise a RuntimeError exception. Some methods and RWRec members require information from the silk.conf file, and when these methods are called or members accessed, the silk.site.have_site_config() method is invoked. That method will call init_site() with no argument if it has not yet been called. The list of functions, methods, and attributes includes: silk.site.sensors(), silk.site.classtypes(), silk.site.classes(), rwrec.as_dict(), rwrec.classname, rwrec.typename, rwrec.classtype, and rwrec.sensor.
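
The search order that init_site() follows when no filename is given can be written out as a small pure-Python model. This sketch does not call PySiLK: site_config_candidates is a hypothetical helper, the compiled-in data root is passed as a parameter because its value depends on how SiLK was built, and the paths used below are made-up examples.

```python
import posixpath

def site_config_candidates(environ, compiled_rootdir=None):
    """List the silk.conf paths that would be considered, in order."""
    # When SILK_CONFIG_FILE is set, its value names the file directly.
    if environ.get("SILK_CONFIG_FILE"):
        return [environ["SILK_CONFIG_FILE"]]
    directories = []
    if environ.get("SILK_DATA_ROOTDIR"):
        directories.append(environ["SILK_DATA_ROOTDIR"])
    if compiled_rootdir:
        directories.append(compiled_rootdir)
    if environ.get("SILK_PATH"):
        silk_path = environ["SILK_PATH"]
        directories.append(posixpath.join(silk_path, "share", "silk"))
        directories.append(posixpath.join(silk_path, "share"))
    return [posixpath.join(d, "silk.conf") for d in directories]

# Example values only; none of these paths come from a real installation.
candidates = site_config_candidates(
    {"SILK_DATA_ROOTDIR": "/data", "SILK_PATH": "/opt/silk"},
    compiled_rootdir="/var/silk")
explicit = site_config_candidates({"SILK_CONFIG_FILE": "/etc/silk.conf"})
```

Passing a real os.environ mapping to the helper gives a quick way to see which locations are in play before relying on the implicit call made by have_site_config().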

silk.site.have_site_config()

Return True if the module was able to locate the SiLK configuration file, False otherwise. Implicitly calls init_site() with no argument if it has not yet been called.

silk.site.sensors()

Return a tuple of valid sensor names. Calls have_site_config().

silk.site.classes()

Return a tuple of valid class names. Calls have_site_config().

silk.site.types(cls)

Return a tuple of valid type names for class cls. Calls have_site_config().

silk.site.classtypes()

Return a tuple of valid (class name, type name) tuples. Calls have_site_config().

silk.site.default_types(cls)

Return a tuple of default types associated with class cls. Calls have_site_config().

silk.site.class_sensors(cls)

Return a tuple of sensors that are in class cls.

silk.site.sensor_classes(sensor)

Return a tuple of classes that are associated with sensor.

silk.site.sensor_id(sensor)

Return the numeric sensor ID associated with the string sensor.

silk.site.sensor_from_id(id)

Return the sensor name associated with id.

silk.plugin is a module to support using PySiLK code as a plug-in to the rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1) applications. The module defines the following methods, which are described in the silkpython(3) manual page:

silk.plugin.register_switch(switch_name, handler=handler, [arg=needs_arg], [help=help_string])

Define the command line switch --switch_name that can be used by the PySiLK plug-in.

silk.plugin.register_filter(filter, [finalize=finalize], [initialize=initialize])

Register the callback function filter that can be used by rwfilter to specify whether the flow record passes or fails.

silk.plugin.register_field(field_name, [add_rec_to_bin=add_rec_to_bin,] [bin_compare=bin_compare,] [bin_bytes=bin_bytes,] [bin_merge=bin_merge,] [bin_to_text=bin_to_text,] [column_width=column_width,] [description=description,] [initial_value=initial_value,] [initialize=initialize,] [rec_to_bin=rec_to_bin,] [rec_to_text=rec_to_text])

Define the new key field or aggregate value field named field_name. Key fields can be used in rwcut, rwgroup, rwsort, rwstats, and rwuniq. Aggregate value fields can be used in rwstats and rwuniq. Creating a field requires specifying one or more callback functions---the functions required depend on the application(s) where the field will be used. To simplify field creation for common field types, the remaining functions can be used instead.

silk.plugin.register_int_field(field_name, int_function, min, max, [width])

Create the key field field_name whose value is an unsigned integer.

silk.plugin.register_ipv4_field(field_name, ipv4_function, [width])

Create the key field field_name whose value is an IPv4 address.

silk.plugin.register_ip_field(field_name, ip_function, [width])

Create the key field field_name whose value is an IPv4 or IPv6 address.

silk.plugin.register_enum_field(field_name, enum_function, width, [ordering])

Create the key field field_name whose value is a Python object (often a string).

silk.plugin.register_int_sum_aggregator(agg_value_name, int_function, [max_sum], [width])

Create the aggregate value field agg_value_name that maintains a running sum as an unsigned integer.

silk.plugin.register_int_max_aggregator(agg_value_name, int_function, [max_max], [width])

Create the aggregate value field agg_value_name that maintains the maximum unsigned integer value.

silk.plugin.register_int_min_aggregator(agg_value_name, int_function, [max_min], [width])

Create the aggregate value field agg_value_name that maintains the minimum unsigned integer value.
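
As a rough illustration of what a filter callback might look like, the sketch below defines one and exercises it with stand-in records so that it runs without SiLK installed. It assumes that the callback receives one flow record and returns a true value when the record passes; silkpython(3) has the authoritative details. The function name web_response is hypothetical, and the registration call appears only in a comment because it is meaningful only inside a plug-in loaded by rwfilter.

```python
from types import SimpleNamespace

def web_response(rec):
    """Pass TCP records whose source port is 80 or 8080."""
    return rec.protocol == 6 and rec.sport in (80, 8080)

# Inside a plug-in file loaded by rwfilter, the callback would be
# registered with a call such as:
#     silk.plugin.register_filter(web_response)

# Stand-in records (not RWRec objects) for trying the callback by hand.
http_rec = SimpleNamespace(protocol=6, sport=80)
dns_rec = SimpleNamespace(protocol=17, sport=53)
results = [web_response(http_rec), web_response(dns_rec)]
```

Keeping the test logic in an ordinary function like this makes it possible to check the logic on simple objects before running it against real flow data.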

The following is an example using the PySiLK bindings. The code is meant to show some standard PySiLK techniques, but is not otherwise meant to be useful. Explanations for the code can be found in-line in the comments.

 #!/usr/bin/env python
 # Import the PySiLK bindings
 from silk import *
 # Import sys for the command line arguments.
 import sys
 # Main function
 def main():
     if len(sys.argv) != 3:
         print("Usage: %s infile outset" % sys.argv[0])
         sys.exit(1)
     # Open a SiLK file for reading
     infile = SilkFile(sys.argv[1], READ)
     # Create an empty IPset
     destset = IPSet()
     # Loop over the records in the file
     for rec in infile:
         # Do comparisons based on rwrec field value
         if (rec.protocol == 6 and rec.sport in [80, 8080] and
                 rec.packets > 3 and rec.bytes > 120):
             # Add the dest IP of the record to the IPset
             destset.add(rec.dip)
     # Save the IPset for future use
     try:
         destset.save(sys.argv[2])
     except Exception:
         sys.exit("Unable to write to %s" % sys.argv[2])
     # Count the items in the set
     count = 0
     for addr in destset:
         count = count + 1
     print("%d addresses" % count)
     # Another way to do the same
     print("%d addresses" % len(destset))
     # Print the IP blocks in the set
     for base_prefix in destset.cidr_iter():
         print("%s/%d" % base_prefix)
 # Call the main() function when this program is started
 if __name__ == '__main__':
     main()

The following environment variables affect the tools in the SiLK tool suite.

SILK_CONFIG_FILE

This environment variable contains the location of the site configuration file, silk.conf. This variable will be used by silk.site.init_site() if no argument is passed to that method.

SILK_DATA_ROOTDIR

This variable gives the root of the directory tree where the data store of SiLK Flow files is maintained, overriding the location that is compiled into the tools. This variable will be used by the FGlob constructor unless an explicit data_rootdir value is specified. In addition, silk.site.init_site() may search for the site configuration file, silk.conf, in this directory.

SILK_COUNTRY_CODES

This environment variable gives the location of the country code mapping file that the silk.init_country_codes() function will use when no name is given to that function. The value of this environment variable may be a complete path or a file relative to the SILK_PATH.

SILK_PATH

This environment variable gives the root of the directory tree where the tools are installed. As part of its search for the SiLK site configuration file, the silk.site.init_site() method checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share.

PYTHONPATH

This is the search path that Python uses to find modules and extensions. The SiLK Python extension described in this document may be installed outside Python's installation tree; for example, in SiLK's installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so Python can find the SiLK extension.

PATH

This is the standard search path for executable programs. The FGlob constructor will invoke the rwfglob program; the directory containing rwfglob should be included in the PATH.
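
Before running a PySiLK script, it may be useful to confirm that PYTHONPATH and PATH are set so that both the extension and rwfglob can be found. The following is a minimal sketch using only the Python 3 standard library (importlib.util.find_spec and shutil.which); the function name check_pysilk_environment is hypothetical. It only locates the module and the program and does not import or run either.

```python
import importlib.util
import shutil

def check_pysilk_environment():
    """Report whether the silk module and rwfglob can be located."""
    return {
        # find_spec returns None when the module is not on the search path.
        "silk_importable": importlib.util.find_spec("silk") is not None,
        # which returns None when no such executable is found on PATH.
        "rwfglob_on_path": shutil.which("rwfglob") is not None,
    }

status = check_pysilk_environment()
for name, found in sorted(status.items()):
    print("%s: %s" % (name, "yes" if found else "no"))
```

A "no" for silk_importable suggests adjusting PYTHONPATH, and a "no" for rwfglob_on_path suggests adding the directory containing rwfglob to PATH before using FGlob.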

silkpython(3), rwfglob(1), rwfileinfo(1), rwfilter(1), rwcut(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsort(1), rwstats(1), rwuniq(1), silk.conf(5), silk(7), python(1), http://docs.python.org/