

Before assembly into a Gap4 database, the raw data from sequencing instruments need to pass through several steps, such as sequencing vector removal, quality evaluation, and conversion of data formats. A list of the more common steps can be seen in the figure on the following page. Each of these steps is performed by a small individual program. Pregap4 is used to pass a batch of readings through these steps automatically.

The philosophy of Pregap4 is that the processing should be broken down into a series of simple, separate tasks (termed "modules"). Each module is typically managed by a dedicated program. The goal of Pregap4 is to wrap all of these modules into a single easy-to-use environment, whilst maintaining the flexibility to select and extend the processing modules.


The picture above shows Pregap4 providing a list of the currently available modules to choose from. Those with a tick next to them have been chosen for use on the current data set. This is not the complete list. Further modules may be added and existing modules may be used more than once (such as when screening against two separate blast databases).
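
The idea of an ordered list of modules, each one enabled or disabled and some used more than once with different parameters, can be illustrated with a short Python sketch. This is only an illustration of the concept, not Pregap4's own implementation: the `ModuleEntry` class, the `screen` and `quality_clip` functions, and the database names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, str]


@dataclass
class ModuleEntry:
    """One row in a (hypothetical) module list: a step plus its settings."""
    name: str
    run: Callable[[Reading, Dict[str, str]], Reading]
    enabled: bool = True  # corresponds to the tick next to the module
    params: Dict[str, str] = field(default_factory=dict)


def screen(reading: Reading, params: Dict[str, str]) -> Reading:
    """Hypothetical screening step: records which database was searched."""
    updated = dict(reading)
    searched = (reading.get("screened", "") + " " + params["database"]).strip()
    updated["screened"] = searched
    return updated


def quality_clip(reading: Reading, params: Dict[str, str]) -> Reading:
    """Hypothetical quality step: marks the reading as clipped."""
    updated = dict(reading)
    updated["clipped"] = "yes"
    return updated


def run_pipeline(modules: List[ModuleEntry], reading: Reading) -> Reading:
    """Pass one reading through every enabled module, in list order."""
    for entry in modules:
        if entry.enabled:
            reading = entry.run(reading, entry.params)
    return reading


# The same module appears twice, once per (made-up) screening database.
pipeline = [
    ModuleEntry("quality_clip", quality_clip, enabled=False),
    ModuleEntry("screen", screen, params={"database": "vectors"}),
    ModuleEntry("screen", screen, params={"database": "ecoli"}),
]

result = run_pipeline(pipeline, {"name": "read1"})
```

In this sketch the disabled module is skipped, and the reading ends up screened against both databases in the order listed.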

In general, on start up the user first needs to select the files to process. This is done using the "Set Files to Process" command (from the File menu). Alternatively, the files can be specified on the command line when starting Pregap4. "Configure Modules" then allows the currently available modules to be enabled or disabled, and their parameters to be edited.


Once all modules have been configured (so that none have "edit" listed next to their name) Pregap4 may start processing. This is done by pressing "Run" or by selecting "Run" from the File menu. If there are any problems, go back to the configure step, adjust the parameters, and try again. When you are happy that Pregap4 has a setup you wish to keep for future use, select "Save All Parameters (in all modules)" from the Modules menu. This makes Pregap4 store all the module parameters in a configuration file, which will be read by subsequent runs of Pregap4.

To run Pregap4 in non-interactive mode use "pregap4 -nowin". This will not bring up a graphical interface and will attempt to "Run" automatically. Hence it is necessary both to specify the files to process on the command line and to have previously configured Pregap4 (typically by using the GUI).
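
When batch runs are launched from a script, the non-interactive invocation can be wrapped as in the following Python sketch. It assumes that the files to process are given as plain arguments after "-nowin"; the text above says only that files can be specified on the command line, so check the exact syntax against your installation. The function names and the example file names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_pregap4_command(files: Sequence[str],
                          executable: str = "pregap4") -> List[str]:
    """Build the argument list for a non-interactive Pregap4 run.

    Assumption: files are passed as plain arguments after -nowin.
    """
    if not files:
        # With no GUI there is no other way to choose the files.
        raise ValueError("non-interactive mode needs at least one file")
    return [executable, "-nowin", *files]


def run_batch(files: Sequence[str]) -> int:
    """Run Pregap4 without a GUI and return its exit status."""
    command = build_pregap4_command(files)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} was not found on PATH")
    # Pregap4 must already have been configured, typically via the GUI.
    return subprocess.run(command, check=False).returncode


# Example argument list (hypothetical trace file names); nothing is run here.
example = build_pregap4_command(["read1.scf", "read2.scf"])
```

Building the argument list separately from running it makes it easy to log or inspect the command before a large batch is started.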

Gap4 can make use of information about your readings which may not be contained within the raw data files, such as the sequencing chemistry and whether a reading is forward or reverse. Gap4 will work without this information, but with reduced functionality. For instance, knowing which forward and reverse readings belong together allows for checking the validity of an assembly and for automatic ordering of contigs.

Pregap4 can be used to add this data to each reading's Experiment File, and via that into Gap4. Sometimes such information is encoded in a local naming convention for sequences. Pregap4 can be configured to extract information from your own naming convention. Alternatively, a simple "flat file" text database may be used to associate information with each sequence. Ultimately it is possible to configure Pregap4 to obtain this information from any data source, although it may require some work.
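
As an illustration of what extracting information from a naming convention involves, the following Python sketch parses reading names and groups forward and reverse readings by template. The convention used here, "<template>.<f or r><attempt number>" (for example "xb54a3.f1"), is entirely hypothetical, and the code is not how Pregap4 itself is configured; it only shows the kind of rule a local convention would need to express.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical convention: template name, a dot, f/r for direction, attempt.
NAME_PATTERN = re.compile(
    r"^(?P<template>[^.]+)\.(?P<strand>[fr])(?P<attempt>\d+)$"
)


def parse_reading_name(name: str) -> Optional[Dict[str, str]]:
    """Return template, direction and attempt, or None if the name is odd."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    direction = "forward" if match.group("strand") == "f" else "reverse"
    return {
        "template": match.group("template"),
        "direction": direction,
        "attempt": match.group("attempt"),
    }


def group_by_template(names: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Collect forward and reverse readings that share a template."""
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"forward": [], "reverse": []}
    )
    for name in names:
        info = parse_reading_name(name)
        if info is not None:  # names that do not fit the convention are skipped
            groups[info["template"]][info["direction"]].append(name)
    return dict(groups)


pairs = group_by_template(["xb54a3.f1", "xb54a3.r1", "xb60b2.f1", "junk"])
```

A template with entries in both lists is a forward and reverse pair, which is the information that allows assembly validity checks and contig ordering.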

This page is maintained by James Bonfield. Last generated on 2 February 1999.