Requirements
============

    The SiLK Flow Analysis Portal requires the following additional
    software to operate.  Newer versions of many of these packages
    should work just fine.

      * SiLK 1.1+
      * Python 2.4+
      * RAVE 1.9.12+
      * Python Imaging Library (PIL) 1.1.6+ (with PNG and freetype2 support)
      * numpy 1.0.2+
      * matplotlib 0.90.0-0.91.x [1]
      * PostgreSQL 8.2.4+
      * psycopg2 2.0.6+ [2]
      * Apache httpd 2.2+ [3]
      * mod_python 3.3.1+ [3]

    (Later in this document, this list is broken down to show which
    additional software is required by which components of the system.)

    Notes:

    [1] matplotlib versions greater than 0.91 are not currently
        supported due to incompatible changes.

    [2] Version 2.0.8 of psycopg2 is not recommended, since it
        contains a bug which may crash the portal. Note that in order
        for psycopg2 to work, you may need to make sure your
        postgresql lib directory is in the dynamic library load path.

    [3] We noticed a problem with the latest versions of Apache
        (2.2.10) and mod_python (3.3.1) as of this writing.  The error
        appears as a compile error in function '_conn_read' with a
        request for member 'next' in something not a structure.  See
        http://www.nabble.com/-Fwd:-connobject.c-broke-with-apr-1.3.2--td18200222.html
        for a description of the problem and a fix.
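
    The version constraints in notes [1] and [2] can be checked
    mechanically.  The following is a minimal sketch, not part of the
    portal or of install.py; the function names are hypothetical, and
    it is written for a current Python 3 interpreter.  It only
    compares version strings that you pass in.

```python
import re

# Illustrative only (Python 3); these helpers are not part of the portal.

def parse_version(text):
    """Turn '0.91.4' into (0, 91, 4), padding short versions with zeros."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def matplotlib_supported(version):
    """True for 0.90.0 through 0.91.x, the range listed above."""
    return (0, 90, 0) <= parse_version(version) < (0, 92)

def psycopg2_acceptable(version):
    """True for 2.0.6 or later, excluding 2.0.8 (see note [2])."""
    parsed = parse_version(version)
    return parsed >= (2, 0, 6) and parsed[:3] != (2, 0, 8)
```

    For example, matplotlib_supported("0.91.4") is true, while
    psycopg2_acceptable("2.0.8") is false.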

RPM Notes
---------

    If you want to install the required packages via RPM, we've tested
    the following packages, which are available for Fedora 9:

        python-2.5.1-26.fc9.i386
        python-imaging-1.1.6-9.fc9.i386
        numpy-1.2.0-1.fc9.i386
        python-matplotlib-0.91.4-1.fc9.i386
        postgresql-8.3.5-1.fc9.i386
        postgresql-server-8.3.5-1.fc9.i386
        python-psycopg2-2.0.7-1.fc9.i386
        httpd-2.2.9-1.fc9.i386
        mod_python-3.3.1-7.i386

    In order to activate PostgreSQL, use the following commands:

        # Activate init.d entry for postgres
        chkconfig postgresql on
        # Initialize a postgres database storage area
        service postgresql initdb
        # Start postgres (without rebooting)
        service postgresql start

    Note that you may need to do some significant postgres
    administration (mucking about with
    /var/lib/pgsql/data/pg_hba.conf) to configure security settings in
    a workable way when using the RPM version.


Quick Installation
==================

    (See below for a variant Quick Installation with RPMs on RH5.)

    $ python install.py install --prefix=${prefix}

    This will create several subdirectories under ${prefix}.
    ${prefix}/htdocs will be served by your web server.
    ${prefix}/analyses will be served by RAVE.

    ${prefix}/cache, ${prefix}/log, and ${prefix}/state will be used
    to contain information generated by RAVE, logs for RAVE, and state
    information for the portal-jobs script.  These directories should
    be writable by the user who will be running raved (by default, the
    user "rave" is assumed, although this can be changed later in
    raved.init.)  It is recommended that you create a user named
    "rave" if you haven't already, and chown these directories to user
    rave.
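
    One way to confirm the permissions described above is a small
    check run as the user who will run raved.  This is only a sketch
    under the assumption that the three directories sit directly
    under ${prefix}; the function name is hypothetical and the script
    is not shipped with the portal.

```python
import os

# Illustrative only (Python 3); not part of the portal distribution.

def unwritable_dirs(prefix, names=("cache", "log", "state")):
    """Return the runtime directories under prefix that are missing
    or not writable by the user running this check."""
    problems = []
    for name in names:
        path = os.path.join(prefix, name)
        # A directory needs both write and search permission to be usable.
        if not (os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)):
            problems.append(path)
    return problems
```

    An empty list means all three directories exist and are writable
    by the current user.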

    Note that if you have multiple versions of Python installed on
    your machine and the default version is older than 2.4, you may
    need to use "python2.4" or "python24" instead of just "python".

    In ${prefix}/etc, you will find a number of template files which
    have had as much information as possible placed into them.  Some
    of these files should be installed in other portions of the
    system.

      * portal.conf.sample should be installed as /etc/portal.conf.  NOTE:
        There are a number of items in this file that absolutely
        should be changed for your site.  Specifically, site_name must
        be changed.  The sensors and sensor_groups entries should be
        changed to match your locally installed sensor names.  You
        must also add one or more user names to the list of admins.
        By default the user "admin" is an admin.  You should
        definitely change the "portdb" database entry to reflect your
        local PostgreSQL database configuration.

      * portal-httpd.conf should be loaded into your Apache HTTP
        server.  On some systems, this may be done by placing the file
        into an /etc/httpd/conf.d directory.  On other systems, you
        might choose to add an Include directive to include this file
        from the portal install directory.  You should also add your
        own user authentication configuration to this file.

      * raved.init is an init script that should be run at startup
        time in whatever way is appropriate for your system.
        start-raved can also be used to run it by hand instead of as a
        service.  This script assumes you will be running raved as the
        user named "rave".  You will want to edit the script if you
        plan to run as a different user.  Make sure that whatever user
        the process runs as has 

      * portal-jobs is a Python script that should be run
        periodically by an entry in your crontab.  We recommend
        setting this to run every five minutes or so.  The script
        will run any periodic tasks required by installed portal
        modules, at whatever frequency those portal modules require.
        (So your machine won't in fact be doing a lot of work every
        five minutes.)

        NOTE: If you run with the standard modules installed, make
        sure to install a country_codes.pmap file for SiLK country
        code support.  Also be sure to install IP set files as
        described in modules/watchlists/etc/README.

    All of the other configuration files in this directory are read
    directly by the portal system and do not need to be installed
    elsewhere.
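
    To illustrate why running portal-jobs every five minutes is cheap,
    here is a sketch of the general pattern of a frequently invoked
    script that runs each task only at that task's own interval.
    This is NOT the actual portal-jobs implementation, which is not
    described in this document; the function, the task names, and the
    use of timestamps in seconds are all assumptions.

```python
# Illustrative only (Python 3); the real portal-jobs may work differently.

def due_tasks(intervals, last_run, now):
    """Return the names of tasks whose interval has elapsed.

    intervals: dict mapping task name to its period in seconds
    last_run:  dict mapping task name to the time it last ran
    now:       the current time in seconds
    """
    due = []
    for name, interval in sorted(intervals.items()):
        previous = last_run.get(name)
        # A task that has never run is always due.
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

    With an hourly and a daily task that both ran recently, most
    five-minute invocations return an empty list and do no work.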

    Before running any analyses, you should load schemas into your
    database.  In order to configure your database, make sure that
    postgresql is running and that you have created a database and
    user for the portal to connect to that database.  We recommend
    using a PostgreSQL database named "portal" and a user named
    "rave".  Run the following command to install or upgrade any
    schemas:

    $ python etc/schema-tool create all

    This command assumes that you're running on a local database named
    portal as the user rave.  If you need more information to help you
    configure PostgreSQL, please refer to the PostgreSQL
    documentation.  (Specifically, you may need to install the plpgsql
    language on the portal database.)

    If you receive an error saying that 'language "plpgsql" does not
    exist', you can use the command "createlang plpgsql portal" to add
    that language to the portal database, then re-run schema-tool.

    Also make sure to take a look at the PERFORMANCE_TUNING document
    for the port_database module for details on certain system
    parameters that may need to be tuned for proper operation of the
    system.

    Access control is now handled through the existence of files in
    the auth subdirectory of the portal.  By default, no files are
    created, and this means that only the user "admin" or any other
    users listed in the portal.conf file will be able to access any
    data (and they will be able to access all data.)  See below for
    details on files in the auth directory.

    Make sure to activate some authentication mechanism in Apache, for
    example basic user authentication.  Alternatively, you can turn on
    the "static_authen" module (see the portal-httpd.conf file's
    comments) to allow all users access as "anyone".  (The user
    "anyone" will still need to be given access to individual modules
    and sensors via files in the auth directory.)

    Make sure to restart your Apache HTTP server and start RAVE, and
    the system should be operational.


Advanced Installation
=====================

    The install.py script in this directory handles installation of
    the system.  The following flags and optional arguments are
    allowed by install.py:

    You can give this script a --prefix argument to specify where the
    portal should be installed.  You may optionally give it an
    --htdocs-prefix argument to specify a different location from the
    default for the HTML document directory to be installed.  You may
    also use --root to specify an overall replacement root directory,
    for certain specialized installations.

    The --debug switch will display more verbose information while
    processing.

    By using --force-install, you may tell the system to ignore the
    results of checking for required software and install anyway.

    In addition to the configuration parameters listed above, you must
    also give one or more commands to install.py.  The following
    commands are implemented:

      * "check" runs automated tests to determine whether software
        that the portal requires is installed and behaving correctly.
        Check will run automatically if you ask for an install.

      * "check-analysis" and "check-web" run checks for the portions
        of the software that will be used on analysis or web host
        machines.  See details below regarding "split installation".

      * "install" installs the software underneath the chosen
        directory prefix.  If --htdocs-prefix is given, HTML documents
        will be installed at a different location (perhaps under the
        document root of your web server.)

      * "install-analysis" and "install-web" separately install the
        components required for an analysis or web host machine,
        respectively.  See details below regarding "split
        installation".

      * "link" sets up symlink trees for all of the installation areas
        in the source area, which is an appropriate style of
        installation for development purposes.  See doc/developing.txt
        for more details.

      * "clean" removes the symlink trees produced by "link".

    If you make changes to the installation, you should make them
    *only* under the ${prefix}/modules/ subdirectory, specifically in
    modules you create yourself.  If you make changes in
    ${prefix}/analyses/ or other install locations, then the next time
    the "install" command is run (for an upgrade, for example), those
    changes will be destroyed.

    If you make changes to the configuration files under
    ${prefix}/etc/, however, these changes are guaranteed not to be
    overwritten by a later installation of the portal.  As a result,
    you should be sure to read through the upgrade notes in any future
    version of the portal software in order to be sure you add
    appropriate configuration information for new features.


Authentication
==============

    By default, the NetSA portal does not handle authentication
    itself, but leaves this to another agency (such as Apache's basic
    authentication support.)  The provided sample httpd configuration
    files assume that you will add this authentication yourself later.
    In fact, the system will not work if you do not add some form of
    authentication.

    If you want to use HTTP basic auth, you will need to add the
    following lines to the "/portal/" and "/rave/" sections of your
    Apache httpd configuration:

        AuthType Basic
        AuthName "NetSA Portal"
        AuthUserFile /path/to/httpd/password/file
        require valid-user

    You can then configure the access that individual users should
    have by using the authorization configuration files described
    below.

    If you wish to use the system without per-user authentication (for
    example, if you only want to allow access to certain IP addresses,
    but don't wish to give access to individual users), you can use
    the provided portal.static_authen mod_python module.  This will
    cause all users to be treated as the user "anyone".  The following
    changes to httpd configuration are required in the "/portal/" and
    "/rave/" sections:

        Replace:
            PythonAccessHandler portal.access
        With:
            PythonAccessHandler portal.static_authen portal.access

    Note that you *do not* need any AuthType or require directives if
    you use this feature.

    Also note that all administration of the authorization will have
    to be done via editing the authorization config files if you
    choose this route, since no web user will be an admin.


Authorization
=============

    The files under ${prefix}/auth contain configuration for which
    users are authorized to view what information.  (Authentication is
    handled by Apache.)  The following files may be created (all of
    which allow comments on lines beginning with '#'):

users.txt
---------

    Example:

        user1:
        user2: group1, group2
        user3: group2

    Each line in this file is of the form "user: group, group, ...",
    and means that the listed user belongs to the listed groups.  Any
    user that is included in this file is considered a member of the
    "anyone" group.  A user may be listed with no groups, and is then
    only a member of "anyone".

    Special note: Users may not be added to the "admin" group via this
    file.  The only way for a user to become an admin is for that user
    to be listed in portal.conf.
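
    The line format above is simple enough to read with a few lines
    of Python.  The sketch below is not the portal's own parser; the
    function names are hypothetical, and two behaviors are
    assumptions: repeated entries for the same name are merged, and
    an "admin" group listed in the file is silently ignored.

```python
# Illustrative only (Python 3); not the portal's own parser.

def parse_auth_lines(lines):
    """Parse 'name: item, item, ...' lines into a dict of lists,
    skipping blank lines and lines beginning with '#'."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(":")
        items = [item.strip() for item in rest.split(",") if item.strip()]
        entries.setdefault(name.strip(), []).extend(items)
    return entries

def user_groups(entries):
    """Map each listed user to a set of groups, always including
    'anyone' and never including 'admin'."""
    result = {}
    for user, groups in entries.items():
        allowed = set(group for group in groups if group != "admin")
        allowed.add("anyone")
        result[user] = allowed
    return result
```

    Applied to the example file above, user1 ends up only in
    "anyone", while user2 is in "group1", "group2", and "anyone".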

groups.txt
----------

    Example:

        group1: group2
        group2: group3, group4

    Each line in this file is of the form "subgroup: super1, super2,
    ...", and means that the given group is a subgroup of the super-
    groups listed.  That is, if "A: B, C" is listed, then all members
    of group A are also members of groups B and C.

    Special note: Users may not be added to the "admin" group via this
    file.  The only way for a user to become an admin is for that user
    to be listed in portal.conf.
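
    Read literally, the subgroup rule chains: in the example, members
    of group1 are members of group2, and therefore also of group3 and
    group4.  The sketch below computes that expansion.  It is an
    illustration only; the function name is hypothetical, and the
    assumption that the portal follows chains of any length (and
    tolerates cycles) is ours.

```python
# Illustrative only (Python 3); not the portal's own implementation.

def expand_groups(direct, supergroups):
    """Return every group reachable from the directly listed groups.

    direct:      iterable of groups a user is listed in
    supergroups: dict mapping a group to the list of its super-groups
    """
    seen = set()
    pending = list(direct)
    while pending:
        group = pending.pop()
        if group in seen:
            continue  # already handled; also guards against cycles
        seen.add(group)
        pending.extend(supergroups.get(group, ()))
    return seen
```

    With the example file, a user listed only in group1 is expanded
    to group1, group2, group3, and group4.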

sensors.txt
-----------

    Example:

        S0: u1, g1, g2, ...
        S1: u2, u3, g3, ...
        default: u7

    Each line in this file is of the form "sensor: user, group, group,
    ...", and means that the given sensor is accessible to the given
    users and groups.  If a sensor is not listed in this file but is
    in portal.conf's sensors variable, only admins may view it.  If a
    sensor is listed here but not in the sensors variable, it will not
    appear for anyone.

    If an entry exists for a sensor named "default", all sensors that
    are listed in portal.conf but not listed individually in this file
    will use this default entry.

    Note that there is a special sensor "*SUM*".  Certain analyses
    will produce summary information across all active sensors for
    users who have access to this special meta-sensor.

    Special note: Admins always have access to all sensors.
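
    The rules above can be summarized in a short function.  This is a
    sketch rather than the portal's code: the function and parameter
    names are hypothetical, user and group names are assumed to share
    one namespace, and the "*SUM*" meta-sensor is not treated
    specially.

```python
# Illustrative only (Python 3); not the portal's own implementation.

def visible_sensors(configured, acl, principals, is_admin=False):
    """Return the configured sensors that a user may view.

    configured: sensor names from the sensors variable in portal.conf
    acl:        dict mapping sensor name (or 'default') to allowed names
    principals: set holding the user's name and all of the user's groups
    """
    if is_admin:
        return list(configured)  # admins always see every sensor
    visible = []
    for sensor in configured:
        # Sensors without their own entry fall back to 'default', if any.
        allowed = acl.get(sensor, acl.get("default", ()))
        if principals.intersection(allowed):
            visible.append(sensor)
    return visible
```

    Because only configured sensors are examined, a sensor that
    appears in sensors.txt but not in portal.conf is never returned.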

modules.txt
-----------

    Example:

        mod1: u1, g1
        mod2: g2
        mod3: anyone
        mod4: g4, u4

    Each line in this file has the form "module: user, group, group, ...",
    and means that the given module is accessible to the given users and
    groups.

    If an entry for a module named "default" exists, it defines the default
    permissions for modules which are installed and enabled, but which
    have no permissions listed in this file.

    If permissions are granted to a module that does not exist, or a
    module that has been disabled using the "disabled_modules"
    variable in portal.conf, the non-existing or disabled module will
    not be accessible.

    Special note: Admins always have access to all existing
    non-disabled modules.


Split Installation
==================

    In order to maintain a greater separation between potentially
    sensitive flow data and web users, some people prefer to run the
    analysis software (which requires direct access to flow data) and
    the web server software (which does not) on separate machines.

    In order to install in this sort of environment, you should use
    the install-analysis command on the analysis host, and the
    install-web command on the web server host.  Make sure that you
    keep /etc/portal.conf the same across both machines, and that RAVE
    is installed on both machines.

    There are two separate portal-httpd.conf files for the split
    installation.  portal-httpd-split-web.conf contains the
    configuration needed for the web server host, while
    portal-httpd-split-analysis.conf contains the configuration for
    the analysis host.

    Also make sure that the portal-httpd.conf file on the web host is
    configured to look for the RAVE service in the correct location.
    (The portal.proxy.rave-service URL should point at the analysis
    host, and the appropriate port on that host should allow access
    from the web host.)

    You may also wish to change the URL_BASE in raved.init and
    start-raved to use https instead of http in a split configuration,
    so that this information does not transit wires in the clear.


Individual Feature Requirements
===============================

    Core system (including admin module):
        Apache httpd 2.2+
        mod_python 3.3.1+
        Python 2.4+
        RAVE 1.9.12+

    ip_info module:
        PostgreSQL 8.2.4+
        psycopg2 2.0.6+
        SiLK 1.1+

    network_map module:
        Python Imaging Library (PIL) 1.1.6+
        SiLK 1.1+

    port_database module:
        matplotlib 0.90.0-0.91.x
        numpy 1.0.2+
        PostgreSQL 8.2.4+
        psycopg2 2.0.6+
        SiLK 1.1+ (with country_codes.pmap installed)

    top_ports module:
        PostgreSQL 8.2.4+
        psycopg2 2.0.6+
        SiLK 1.1+ (with country_codes.pmap installed)

    watchlists module:
        matplotlib 0.90.0-0.91.x
        numpy 1.0.2+
        Python Imaging Library (PIL) 1.1.6+
        SiLK 1.1+
        
