When processing a batch of trace files, it is likely that the readings will not all share the same values for every Experiment File field. As described above, Pregap supports using an external database for storing such information. In the following sections we outline methods for providing a very simple text-based database.
The simplest of these cases is where all the readings to be processed with
Pregap are from the same project (let's call this project TEST),
sequenced using the same vectors, and all have the same expected vector insert
size. The only differing information for each reading is its primer type
(whether a forward or reverse reading, and whether using the universal primer
or from a custom oligo), vector primer site (which depends on the primer
type), the reading name, and the template name. This information corresponds
to the ID, PR, SP and TN Experiment File line
types. As the ID field is read from the initial Experiment File created,
we only need to provide database hooks for the other line types. The other
necessary line types can be defined as constants.
We add the following to our `.pregaprc' file.
#-----------------------------------------------------------------------------
# Our database hooks

# Constant things
CN=TEST
SF=m13mp18.vec
SV=m13mp18
SC=6249
SI=1400..2000
CF=lawrist7.seq
CV=lawrist7
ST=2

# Read from database file
ST_com='pregap_lookup db-short $ID 1'
PR_com='pregap_lookup db-short $ID 2'
SP_com='pregap_lookup db-short $ID 3'
TN_com='pregap_lookup db-short $ID 4'

# Constants evaluated as commands at pregap startup
OP=`whoami`
DT=`date`
The constant line types are defined simply using, for example,
CN=TEST (to define the gap database name). Following these are
the _com commands. These use a small program (supplied with the
package distribution) named pregap_lookup that takes the name of a plain
text file followed by two further arguments. The program searches the file
for lines whose first word matches the first of these arguments ($ID in
this case, which is the reading name). It then prints the nth subsequent
word on that line, where n is the second of these arguments. This means we can
store our primer, strand and template information in a simple text file,
named `db-short' in our example, as follows.
#ID       ST  PR  SP   TN
a11bc.f1  2   1   41   a11bc
a11bc.r1  2   2   -24  a11bc
a11bc.f2  2   3   41   a11bc
a11bc.r2  2   4   -24  a11bc
a22bc.s1  1   1   41   a22bc
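The lookup described above can be illustrated with a minimal Python sketch. This is not the source of pregap_lookup, only an approximation of the behaviour as described in the text. The function name `lookup`, the `DB_SHORT` string, the choice to stop at the first matching line, and returning None when nothing matches are all assumptions made for illustration.

```python
def lookup(db_text, reading_id, n):
    """Return the nth word after the first word on the line whose first
    word equals reading_id, or None if there is no such line or word.

    Assumption: the first matching line wins. The '#ID ...' header needs
    no special handling because it never equals a reading name.
    """
    for line in db_text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == reading_id:
            if 1 <= n < len(words):
                return words[n]
            return None
    return None


# Contents of the example `db-short' file, held in a string for the sketch.
DB_SHORT = """\
#ID ST PR SP TN
a11bc.f1 2 1 41 a11bc
a11bc.r1 2 2 -24 a11bc
a11bc.f2 2 3 41 a11bc
a11bc.r2 2 4 -24 a11bc
a22bc.s1 1 1 41 a22bc
"""
```

With this data, `lookup(DB_SHORT, "a11bc.r1", 3)` gives the SP value for reading a11bc.r1, mirroring what SP_com would request for that reading.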
Note that the PR field can now hold both the old PR and
DR Experiment File line types. See section Experiment File.
Finally, some line types (chiefly DT for date and OP for
operator) are not known at the time of creating a `.pregaprc' file,
but are still constant for all readings. These have been defined last in
the `.pregaprc' file. Note the use of backquotes rather than ordinary
single quotes: the backquoted command is run and its output becomes the
value. For example, DT=`date` sets the DT field for all
readings to be the output of the date command.
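The difference between a value evaluated once at startup and a value looked up per reading can be sketched in Python as follows. This is only an illustration of the idea, not how Pregap itself is implemented. The helper names `run_once`, `startup_constants` and `fields_for` are hypothetical.

```python
import subprocess


def run_once(command):
    """Run a shell command and return its output with whitespace trimmed,
    much as a backquoted command such as `date` yields a single value."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def startup_constants(commands):
    """Evaluate each command a single time, before any reading is processed."""
    return {line_type: run_once(cmd) for line_type, cmd in commands.items()}


def fields_for(reading, constants):
    """Combine the shared constants with the per-reading ID.

    The constants are copied, not re-evaluated, so every reading
    receives the same values.
    """
    fields = dict(constants)
    fields["ID"] = reading
    return fields
```

For instance, calling `startup_constants({"OP": "whoami", "DT": "date"})` once and then passing the result to `fields_for` for each reading would give every reading the same OP and DT values, while ID differs.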