
Introduction

Gap4 is a genome assembly program (Bonfield, J.K., Smith, K.F. and Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. 24, 4992-4999 (1995)). It is used for both large and small projects. By default the program creates a database of sufficient size to store approximately 8000 readings, but this can be extended. The program contains all the tools that would be expected from an assembly program, plus many unique features and an easy-to-use interface.

Gap can handle data produced by a variety of sequencing instruments, including the ABI 373A, ABI 377, Pharmacia A.L.F. and LiCor. It can also handle data entered using digitisers or typed in by hand. Trace files in proprietary formats, such as those of ABI, are usually converted to SCF files (see section SCF introduction). By analysing the traces we also calculate base accuracy values (Bonfield, J.K. and Staden, R. The application of numerical estimates of base calling accuracy to DNA sequencing projects. Nucleic Acids Research 23, 1406-1410 (1995)), which are then stored in the SCF files. All the preassembly steps, including those already mentioned plus quality clipping and the removal of sequencing vector and cloning (cosmid) vector, are controlled by the program Pregap4 (see section Pregap4 introduction). During this processing the readings are stored in Experiment files (see section Experiment files).

Experiment file format is similar to that of EMBL sequence entries in that each record starts with a two-letter identifier, but we have invented new records specific to sequencing experiments. One of Pregap4's tasks is to augment the experiment files to include data about the vectors, primers and templates used in the production of each reading; if necessary, it can extract this information from external databases. Some of the information is needed by Pregap4 and some by gap.
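The record layout described above can be illustrated with a short Python sketch. It assumes only what the text states, that each record begins with a two-letter identifier, and additionally assumes that the identifier is upper case and separated from its value by white space. The function name parse_records and the sample record contents are hypothetical and used purely for illustration; see section Experiment files for the real record types.

```python
from collections import defaultdict


def parse_records(text):
    """Group record values by their leading two-letter identifier.

    Assumption: a record line is two upper-case letters, then white
    space, then the value. Lines that do not fit are skipped.
    """
    records = defaultdict(list)
    for line in text.splitlines():
        ident = line[:2]
        if len(ident) < 2 or not (ident.isalpha() and ident.isupper()):
            continue  # blank line or not a record line
        if len(line) > 2 and not line[2].isspace():
            continue  # identifier must be followed by white space
        records[ident].append(line[2:].strip())
    return dict(records)


# Hypothetical sample: identifiers and values are illustrative only.
sample = """\
ID   reading1
LN   reading1.scf
LT   SCF
"""

records = parse_records(sample)
print(records["LN"])
```

Keeping a list of values per identifier allows the same record type to appear more than once in a file.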

NOTE that in order to get the most from gap it is essential to supply it, via the experiment files, with all the information it needs. (We are aware of programs from other groups that perform similar tasks to Pregap4 but create incomplete or incorrect experiment files, which lessens the usefulness of gap and can increase the time taken to complete projects.)

Gap reads the reading data stored in experiment files and stores it in its own database. The only other files required during a project are trace files from sequencing instruments, but these are not copied into the database. The experiment file for a reading should contain the name of the trace file from which it was derived; this name is copied into the database so that gap4 can read the trace whenever it is required.

The final result from a sequencing project is a consensus sequence, and gap can write this in experiment file format, FASTA format or Staden format. The whole database and all the trace files are also useful for future reference, as they allow any queries about the accuracy of the sequence to be answered quickly.
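As an illustration of one of the output formats mentioned above, the following Python sketch writes a sequence in FASTA layout: a header line starting with ">" followed by the sequence broken into fixed-width lines. It is not gap's own code; the function name to_fasta, the contig name and the line width of 60 are assumptions chosen for the example.

```python
def to_fasta(name, sequence, width=60):
    """Return a FASTA entry: '>name' then the sequence in fixed-width lines."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical consensus: 80 bases, so it wraps onto two sequence lines.
entry = to_fasta("contig1", "ACGT" * 20)
print(entry, end="")
```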

Readings are entered into the gap4 database using the assembly algorithms (see section Assembly Introduction). In general these algorithms will build the largest contigs they can by finding overlaps between the readings. However, some joins between contigs, perhaps the more doubtful ones, may be missed; these can be revealed, checked and made using Find Internal Joins (see section Find Internal Joins) and Join Contigs (see section Editor joining). Other relationships between contigs can be revealed by analysing the positions and orientations of readings derived from the same template, using Show Templates (see section Template Display). Readings can be checked and edited using the Contig Editor (see section Editor introduction).
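The idea of building contigs by finding overlaps between readings can be sketched in a few lines of Python. This is a deliberately naive illustration that looks for an exact match between the end of one reading and the start of another; it is not the algorithm gap4 uses, which must cope with sequencing errors and both strand orientations. The function names and the min_len parameter are hypothetical.

```python
def overlap_length(left, right, min_len=3):
    """Length of the longest suffix of `left` that exactly equals a
    prefix of `right`, or 0 if no overlap of at least `min_len` exists."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def join_readings(left, right, min_len=3):
    """Merge two readings across their exact overlap, or return None."""
    n = overlap_length(left, right, min_len)
    if n == 0:
        return None
    return left + right[n:]


# Hypothetical readings sharing the four bases "TGCA".
merged = join_readings("ACGTTGCA", "TGCAGGT")
print(merged)
```

Raising min_len makes chance overlaps less likely to be accepted, at the cost of missing short genuine ones; this is the kind of trade-off that leaves some doubtful joins to be found and checked separately.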

The main window for gap contains File, Edit, View, Options, Experiments, Lists and Assembly menus. The File menu includes database opening and copying functions and consensus calculation options. The Edit menu contains options that alter the contents of the database such as Edit Contig (see section Editor introduction), Join Contigs (see section Editor joining), Break Contig (see section Break Contig), Disassemble Readings (see section Break Contig), Double Strand (see section Double Strand), and Doctor Database (see section Doctor database).

The View menu contains Contig Selector (see section Contig Selector), Results Manager (see section Results Manager), Find Internal Joins (see section Find Internal Joins), Find Read Pairs (see section Find Read Pairs), Find Repeats (see section Find repeats), Check Assembly (see section Check Assembly), Find Oligos (see section Find Oligos), Show Templates (see section Template Display), Show Relationships (see section Show Relationships), Restriction Enzyme Map (see section Restriction Enzyme Search), Stop Codon Map (see section Stop Codon Map), Quality Plot (see section Quality Plot) and Order Contigs (see section Ordering Contigs).

The Options menu (see section Options Menu) contains Configure Cutoffs, Set Maxseq, and Set Fonts.

The Experiments menu contains options to analyse the contigs and to suggest experimental solutions to problems, including Suggest Long Readings (see section Suggest Long Readings), Suggest Primers (see section Suggest Primers), Compressions and Stops (see section Compressions and Stops) and Suggest Probes (see section Suggest Probes).

The Lists menu contains a set of options for creating and editing lists for use in other parts of the program (see section Lists Introduction), including Minimal Coverage (see section Lists Minimum Coverage) and Unattached Readings (see section Lists Unattached Readings).

The Assembly menu contains various assembly modes including Normal Shotgun Assembly (see section Normal Shotgun Assembly), Directed Assembly (see section Directed Assembly), Screen Only (see section Assembly Screen Only), Enter Pre-assembled data (see section Assemble Pre), Assembly Independently (see section Assembly Independently), Cap2 Assembly (see section Assembly CAP2) and FAKII Assembly (see section Assembly FAKII).

The main window (shown below) contains an Output window for textual results and an Error window for error messages.

[picture]

Other displays used by the program include (shown below) the Contig Selector,

[picture]

(shown below) the Contig Comparator,

[picture]

(shown below) the Template Display,

[picture]

(shown below) the Restriction Enzyme Map,

[picture]

(shown below) the Stop Codon Map,

[picture]

(shown below) the Contig Editor,

[picture]

(shown below) the Contig Editor displaying quality grey scales, disagreements and edits,

[picture]

and (shown below) the Contig Joining Editor.

[picture]

Only one copy each of the Contig Selector and Contig Comparator can be shown, but any number of the other types of display can be used simultaneously, even on the same contig. For example, it is possible to have several contig editors running on the same contig.


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_1.html