Example Integration - Simple Database

When processing a batch of trace files, it is likely that the readings will not all share the same values for every Experiment File field. As described above, Pregap supports using an external database for storing such information. In the following sections we outline methods for providing a very simple text-based database.

The simplest of these cases is where all the readings to be processed with Pregap are from the same project (let's call this project TEST), sequenced using the same vectors, and all have the same expected vector insert size. The only information that differs between readings is the reading name, the primer type (whether a forward or reverse reading, and whether using the universal primer or a custom oligo), the vector primer site (which depends on the primer type), and the template name. This information corresponds to the ID, PR, SP and TN Experiment File line types respectively. As the ID field is read from the initial Experiment File created, we only need to provide database hooks for the other line types. The remaining necessary line types can be defined as constants.

We add the following to our `.pregaprc' file.

#-----------------------------------------------------------------------------
# Our database hooks

# Constant things
CN=TEST
SF=m13mp18.vec
SV=m13mp18
SC=6249
SI=1400..2000
CF=lawrist7.seq
CV=lawrist7
ST=2

# Read from database file
ST_com='pregap_lookup db-short $ID 1'
PR_com='pregap_lookup db-short $ID 2'
SP_com='pregap_lookup db-short $ID 3'
TN_com='pregap_lookup db-short $ID 4'

# Constants evaluated as commands at pregap startup
OP=`whoami`
DT=`date`

The constant line types are defined simply using, for example, CN=TEST (to define the gap database name). Following these are the _com commands. These use a small program (supplied with the package distribution) named pregap_lookup. As used here, it is given the name of a plain text file followed by two further arguments. The program searches the file for a line whose first word matches the first of these arguments ($ID in this case, which is the reading name). It then prints the nth subsequent word on that line, where n is the second of these arguments. This means we can store our primer, strand and template information in a simple text file, named `db-short' in our example, as follows.

#ID        ST  PR  SP TN
a11bc.f1   2   1   41 a11bc
a11bc.r1   2   2  -24 a11bc
a11bc.f2   2   3   41 a11bc
a11bc.r2   2   4  -24 a11bc
a22bc.s1   1   1   41 a22bc
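
The lookup described above can be sketched in a few lines of shell. This is only an illustration of the behaviour as documented here, not the source of the supplied pregap_lookup program; the function name lookup_field and the use of awk are assumptions made for the example.

```shell
# Hypothetical stand-in for pregap_lookup (the real program may differ).
# Usage: lookup_field FILE KEY N
# Finds the first line of FILE whose first word equals KEY and prints
# the Nth word after it.
lookup_field() {
    awk -v key="$2" -v n="$3" '
        $1 == key { print $(n + 1); exit }
    ' "$1"
}
```

With the `db-short' file shown above, `lookup_field db-short a11bc.r1 3' would select the SP column for reading a11bc.r1, that is -24. The header line beginning with #ID is only ever matched if a reading is literally named #ID, so it is harmless in practice.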

Note that the PR field can now hold both the old PR and DR Experiment File line types. See section Experiment File.

Finally, some line types (chiefly DT for date and OP for operator) are not known at the time of creating a `.pregaprc' file, but are still constant for all readings. These have been defined last in the `.pregaprc' file. Note the use of backquotes rather than the forward quotes used for the _com definitions, so that the command is run once when Pregap starts. For example, DT=`date` sets the DT field for all readings to be the output of the date command.
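
The difference between the two quoting styles can be seen in a small shell sketch. It assumes that a _com string is evaluated once per reading with $ID set, roughly as eval would do; how Pregap actually runs these commands is not shown here, and the TN_com value and the loop are hypothetical stand-ins for illustration.

```shell
# Backquotes: the command runs immediately, when the assignment is read,
# so every reading sees the same value.
DT=`date`

# Forward quotes: the text is stored unevaluated; $ID is not expanded yet.
# (Hypothetical command, standing in for a pregap_lookup call.)
TN_com='echo template-for-$ID'

# Assumed per-reading step: evaluate the stored command with ID set.
for ID in a11bc.f1 a11bc.r1; do
    TN=`eval "$TN_com"`
    echo "$ID DT=$DT TN=$TN"
done
```

Here DT is fixed once, while TN is recomputed for each reading name.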


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/pregap_8.html