Installing GUS from Scratch

September 15, 2003

This GUS documentation section details the steps in installing and initializing GUS on a newly created Oracle database. The database may be empty. It is important to keep in mind that the GUS distribution and its contents are under development -- occasionally a bug will slip through testing but not slip through your installation undetected.

The instructions outlined here are general. Experiences to date with these instructions have found no system related issues. A number of the steps outlined here and additional details appear at the GUS Platform Web Site for setup notes and schema setup.

1.0   Creating the Oracle Database

The GUS database may be created in Oracle using the dbca utitlity. Select the data warehouse option. For other parameters the defaults will work, but some parameters can clearly be changed for the better, such as percent memory utilization. A report by Jessica Kissinger discusses database configuration.

1.1   Creating Required GUS Users in Oracle

Assume now that a database has been created in Oracle, it is empty without tables and users, and has the name gus. Two GUS accounts must be created before going further in the installation. One account requires read/write access to the database, and the other has only read access. SQL to establish the accounts gusrw and gusdevreadonly is
        a. CREATE USER GUSRW IDENTIFIED BY password DEFAULT TABLESPACE users QUOTA 100M ON users;
      b. CREATE USER GUSdevReadOnly IDENTIFIED BY password DEFAULT TABLESPACE users QUOTA 100M ON users;
where password should differ for the two accounts.

The accounts require privileges as provided by the following SQL commands in the user initialization script. These commands grant privileges to the GUS accounts. The commands may be placed in a script (say, userinit.sql) and executed at the SQLPLUS prompt as @userinit.sql. This script is not part of the GUS distribution.

1.2 Creating GUS TableSpaces

In the next step tablespaces are created for the GUS namespaces CORE, SRES, DoTS, TESS and RAD3, along with index tables for each. This can be accomplished with the SQL commands in the table initialization script. These commands create the relational tables for the GUS namespaces. This script is not part of the GUS distribution.


2.0 Installing the GUS Distribution

This section gives instructions to install the GUS distribution from the CVS repository distribution into gus, the Oracle database created above. GUS installation includes the GUS schema, users, and the CBIL distribution, also distributed with GUS. The GUS installation consists of a source part and an executable part. The source part is rooted in the directory called $PROJECT_HOME. Installation is performed from $PROJECT_HOME, with "executable GUS" going to $GUS_HOME. The installation is interrupted at a couple of places to update parameter files in $GUS_HOME, as discussed below in building GUS. First, there are some additional details to attend to.

2.1 Setting up your shell environment

The GUS distribution can be installed in any account, but it can be handy to dedicate an account for GUS installation and management. For example, a GUS development account and a GUS production account could be useful. Each of account has its own Oracle database and Linux user account. Key locations are pointed to by shell environment variables. This snippet from from a .tcshrc script initializes the environment variables used by GUS. Note the requirements for Java and Ant. For further details about the use of these systems see the GUS Platform Web Site.

2.2 Obtaining and installing the GUS distribution

The GUS distribution and required CBIL project distribution can be downloaded from the Sanger Institute. The complete distributions can be obtained as three tarballs:

Note that the names of the tarballs can differ slightly from the generic titles used here. See the GUS Platform Site for further details about tarball locations.

After setting up one's shell environment, one can readily unpack the tar files by executing:

  • cd ${PROJECT_HOME}; zcat gus.tar.gz   |   tar xvpf -
  • cd ${PROJECT_HOME}; zcat install.tar.gz   |   tar xvpf -
  • cd ${PROJECT_HOME}; zcat cbil.tar.gz   |   tar xvpf -

2.3 GUS properties file

The .gus.properties file defines database information in your environment. Pointed to by the environment variable $GUS_CONFIG_FILE, .gus.properties stores information pertaining to the two Oracle databases: GUS read/write and GUS readonly. The contents of an operational .gus.properties file (minus the actual passwords) is given below.
        databaseLogin=GUSrw
databasePassword=mydogsname
readOnlyDatabaseLogin=GUSdevReadOnly
readOnlyDatabasePassword=mycatsname
coreSchemaName=Core
userName=dba
group=dba
project=Database administration
dbiDsn=dbi:Oracle:host=amrit.cs.uchicago.edu;sid=gus3
One can establish the GUS properties from the sample with the GUS distribution:
        cp $PROJECT_HOME/install/gus.properties.sample $GUS_CONFIG_FILE

The GUS properties file can be configured by copying the sample file and editing the contents to indicate the location of Perl on your system:
        cp $PROJECT_HOME/install/config/install.prop.sample $GUS_HOME/config/install.prop

2.4 Building GUS

Now one can instruct the build system to install both itself (the "install" project) and also to install the principal GUS project. There is a small workaround required though. The script $PROJECT_HOME/GUS/build.xml references database tables prior to installation of the Oracle tables, but the script that builds the tables scripts is {\tt build.xml}. So, we make a two pass install of GUS by first removing the database accesses in build.xml. These accesses occur in two places as of this writing in the following build.xml code fragments (line numbers are given from the release as of this writing) for GUS object generation:

    116      ⟨exec executable="generateGusObjects"
    117             failonerror="true"⟩
    118         ⟨arg value="--javaOrPerl=java"/⟩
    119       ⟨/exec⟩

    192       ⟨exec executable="generateGusObjects"
    193             failonerror="true"⟩
    194         ⟨arg value="--javaOrPerl=perl"/⟩
    195       ⟨exec⟩

Here is the build process with the work around. Note that three calls to build.xml are currently required.

  1. # remove GUS object generation from $PROJECT_HOME/GUS/build.xml
  2. shell> build GUS install -append
  3. shell> cp $GUS_HOME/config/schema.prop.sample $GUS_HOME/config/schema.prop
  4. # edit properties in schema.prop such as the SID and passwords for namespaces
  5. shell> build GUS install -append
  6. shell> cd $GUS_HOME/schema/oracle; create-db.sh |& tee create-db.sh.out
  7. # restore GUS object generation in $PROJECT_HOME/GUS/build.xml
  8. shell> build GUS install -append
where shell> is the shell command line prompt, and lines beginning with "#" describe a procedure to occur at that step.

2.5 Initializing GUS Database Contents

The process of moving data into and out of GUS uses plugins. Plugins are Perl modules registered with the GUS database by storing the plugin's version number and checksum, and other information. Plugins are always executed by the program ga: ga must be registered with the database before you can use GUS. However, before ga can be registered and GUS populated, there are several tables that require values.

2.6 Registering GUS namespace users

The .gus.properties file defines for GUS username, group and project entries. Whatever the definition of these entries in the properties file, they must be in the Core.UserInfo, Core.GroupInfo, and Core.ProjectInfo tables. During installation, GUS creates default values for these properties, so that one can simply define the user, group and project in .gus.properties to be the same as the values established by the GUS installation, namely:
  • userName=dba
  • group=dba
  • project=Database administration.
After registering ga and SubmitRow as described below for ga registration and submitrow registration, one can use the plugins to make entries in the Core.UserInfo, Core.GroupInfo, and Core.ProjectInfo tables for users, groups and projects at one's site; alternatively, one can use sqlplus to directly access the tables in the database. The latter is to be avoided if possible since plugins conform to protocols for registering algorithms that modify GUS database tables.

2.7 Populating the Machine Table

The Core.Machine table requires an entry that describes the machine where GUS is running. The following line inserted into the Core.Machine table will do the trick:
insert into core.machine values(0, 'unknown', NULL, 1, 0, SYSDATE, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1);
The attributes corresponding to the values for Core.Machine can be gleaned from the GUS schema browser.

2.8 Registering ga

The first step is to establish the plugin manager's property file. This file just contains the location of md5sum (for calculating checksums of plugins) on your system. The following will establish this:

  1. shell> cp $GUS_HOME/config/GUS-PluginMgr.prop.sample $GUS_HOME/config/GUS-PluginMgr.prop
  2. shell> # edit file to point to location of md5sum
ga can now be registered with the command ga +meta --commit.

2.9 Registering the SubmitRow plugin

Now that GUS is initialized, we will perform a partial check of the database by registering the broadly useful plugin SubmitRow. We use ga for this as follows

   ga +create GUS::Common::Plugin::SubmitRow
       ... check output and if looks okay, commit.
   ga +create GUS::Common::Plugin::SubmitRow --commit
The two stage approach for running ga allows one to check the output for problems without committing the changes to the database. GUS basics are initialized at this point. This is a good point to look at some of the tables created in the installation and initialization process. The next section tells how to populate other tables with information fundamental to GUS, for example the GO ontology, sequence types, and NCBIs taxonomy.