
Records

Note that the assembly program gap4 (see section Gap4 introduction) cannot operate to full effect unless it is given all the necessary data. For example, gap4 contains many functions that analyse the positions and relative orientations of readings from the same template in order to check the correctness of the assembly and to determine the contig order. However, if the records that name templates, give their estimated lengths, and define the primers used to obtain readings from them are missing, none of these analyses can be performed reliably. One way to ensure that all the necessary fields are present is to use the script PREGAP (see section Pregap4 introduction).
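As a rough illustration of this point, the following Python sketch checks whether an experiment file carries the records that the template analyses depend on. It rests on two assumptions that this section does not state: that each record line begins with a two-letter record type followed by whitespace and then the value, and that TN, SI and PR (see the list below) are the records meant by template name, estimated length and primer. The function names are hypothetical and are not part of gap4 or PREGAP.

```python
import re

# Assumption: these three record types carry the template information
# that the paragraph above refers to (names taken from the record list).
TEMPLATE_RECORDS = {
    "TN": "Template Name",
    "SI": "Sequencing vector Insertion length",
    "PR": "PRimer type",
}

# Assumption: a record line starts with a two-letter upper-case type,
# followed by whitespace or the end of the line.
RECORD_LINE = re.compile(r"([A-Z]{2})(\s|$)")


def record_types(lines):
    """Return the set of two-letter record types found in the lines."""
    found = set()
    for line in lines:
        match = RECORD_LINE.match(line)
        if match:  # lines that do not start with a record type are skipped
            found.add(match.group(1))
    return found


def missing_template_records(lines):
    """Return, in sorted order, the template records absent from the lines."""
    present = record_types(lines)
    return sorted(code for code in TEMPLATE_RECORDS if code not in present)


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    example = ["ID   reading1", "TN   template1", "PR   1"]
    print(missing_template_records(example))
```

A check of this kind only reports which records are absent; it does not validate their values.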

In the descriptions below, records marked * are those read into the database during normal assembly; those marked ** are extra items required when entering pre-assembled data; those marked *** are read from SCF files (see section SCF introduction) after the experiment file has been read to obtain the SCF file name; and the record marked **** is an extra item required for Directed Assembly.

The order of records in the file is not important. They are listed here in alphabetical order with, where possible, the origin of their names. Several are redundant, and no group is likely to make use of them all. Others can be added in the future: initially they might be of local use only, but if their use becomes wider they can be added to the standard set. Standard EMBL records such as FT are assumed to be included.

AC
ACcession number
AP
Assembly Position ****
AQ
Average Quality for bases 100..200
AV
Accuracy Values for externally assembled data **, ***
BC
Base Calling software
CC
Comment line
CF
Cloning vector sequence File
CH
Special CHemistry
CL
Cloning vector Left end
CN
Clone Name
CR
Cloning vector Right end
CS
Cloning vector Sequence present in sequence *
CV
Cloning Vector type
DR
Direction of Read
DT
DaTe of experiment
EN
Entry Name
EX
EXperimental notes
FM
sequencing vector Fragmentation Method
ID
IDentifier *
LE
was Library Entry, but now identifies a well in a microtitre dish
LI
was subclone LIbrary, but now identifies a microtitre dish
LN
Local format trace file Name *
LT
Local format trace file Type *
MC
MaChine on which experiment ran
MN
Machine generated trace file Name
MT
Machine generated trace file Type
ON
Original base Numbers (positions) **
OP
OPerator
PC
Position in Contig **
PD
Primer data (the sequence of a primer)
PN
Primer Name
PR
PRimer type *
PS
Processing Status
QL
poor Quality sequence present at Left (5') end *
QR
poor Quality sequence present at Right (3') end *
SC
Sequencing vector Cloning site
SE
SEnse (i.e. whether complemented) **
SF
Sequencing vector sequence File
SI
Sequencing vector Insertion length *
SL
Sequencing vector sequence present at Left (5') end *
SP
Sequencing vector Primer site (relative to cloning site)
SQ
SeQuence *
SR
Sequencing vector sequence present at Right (3') end *
SS
Screening Sequence
ST
STrands *
SV
Sequencing Vector type *
TC
Contig Tag *
TG
Gel reading Tag *
TN
Template Name *
WT
Wild type trace
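
The asterisk markers in the list above can be captured in a small lookup table. The Python sketch below transcribes them from the list; the mode names and the function are hypothetical, and treating the ** and **** records as additions to the normal assembly set is one reading of "extra items", not something the list spells out.

```python
# Records marked * : read into the database during normal assembly.
NORMAL_ASSEMBLY = {
    "CS", "ID", "LN", "LT", "PR", "QL", "QR", "SI",
    "SL", "SQ", "SR", "ST", "SV", "TC", "TG", "TN",
}

# Records marked ** : extra items required for pre-assembled data.
PREASSEMBLED_EXTRA = {"AV", "ON", "PC", "SE"}

# Records marked *** : read from SCF files.
FROM_SCF = {"AV"}

# Record marked **** : extra item required for Directed Assembly.
DIRECTED_ASSEMBLY_EXTRA = {"AP"}


def records_for(mode):
    """Return the record types relevant to a (hypothetical) mode name.

    Assumption: the "extra" records are needed in addition to the
    records read during normal assembly.
    """
    if mode == "normal":
        return set(NORMAL_ASSEMBLY)
    if mode == "preassembled":
        return NORMAL_ASSEMBLY | PREASSEMBLED_EXTRA
    if mode == "directed":
        return NORMAL_ASSEMBLY | DIRECTED_ASSEMBLY_EXTRA
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("normal", "preassembled", "directed"):
        print(mode, sorted(records_for(mode)))
```

Records that carry no marker, such as CC or OP, appear in none of these sets.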

This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_12.html