Vector_clip compares batches of readings against vector sequences. Sequence readings are stored in experiment file format (see section Experiment File) and, for the majority of projects, each experiment file should contain the data required by vector_clip: the file names of the vectors to screen against and, for the sequencing vector, the positions of the cloning and primer sites. See section Vector_Primer file format for an alternative method of defining vector data. The program pregap4 (see section Pregap4) contains modules for creating experiment files from trace files, and for adding data about the vectors used. When vector_clip runs, it adds records to each reading's experiment file to denote the start and end of any segments that are found to match the vectors.
For conventional sequencing projects there are two types of vector for which readings will need to be screened: the sequencing vector and, for cases where, say, whole cosmids or BACs have been shotgunned, the cloning vector. These two screening tasks are different. When screening for the sequencing vector we may expect to find data to exclude, both from the primer region and, when the insert is short, from the other side of the cloning site. It is also a wise precaution to check for rearrangements of the sequencing vector. When screening out a cloning vector such as a cosmid, we may find that the 5' end, the 3' end, or the whole of the sequence is vector. In addition, for the cloning vector search we need to compare both strands of the sequence.
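The need to compare both strands in the cloning vector search can be illustrated with a minimal Python sketch. It is not vector_clip's algorithm: exact substring matching stands in for the program's real comparison, and the function names (reverse_complement, find_on_both_strands) are hypothetical names chosen for this illustration.

```python
# Illustrative only: exact substring search is a stand-in for
# vector_clip's real comparison method, which is not reproduced here.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(reading, vector_segment):
    """Return (strand, start, end) for every exact match of vector_segment.

    Positions are 0-based and half-open. The "+" strand is the segment as
    given; the "-" strand is its reverse complement.
    """
    hits = []
    probes = (("+", vector_segment), ("-", reverse_complement(vector_segment)))
    for strand, probe in probes:
        start = reading.find(probe)
        while start != -1:
            hits.append((strand, start, start + len(probe)))
            start = reading.find(probe, start + 1)
    return hits
```

A match reported at the start of the reading corresponds to vector at the 5' end, one at the end of the reading to vector at the 3' end, and a match covering the whole reading to a sequence that is entirely vector.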
In order to filter out readings that contain the sequences of contaminant DNA such as E. coli, a separate program, screen_seq, should be used (see section Screening for known possible contaminant sequences).
A further type of search is required for a new method that is being developed at MRC HGMP, Hinxton, UK. This new method (M. Starkey, personal communication) is an application of a technology described as "molecular indexing" (Unrau, P. and Deugau, K.V. (1994) Non-cloning amplification of specific DNA fragments from whole genomic DNA digests using DNA indexers. Gene 145, 163-169). It produces sequences that have a primer at their 3' ends; these primers need to be found and removed.
Vector_clip is usually run as part of the pregap4 process (see section Pregap4) and will usually be called three times: first to locate and mark the sequencing vector, next to check for vector rearrangements, and finally to locate and mark cosmid vector segments.
Vector_clip operates on batches of readings using files of file names: one input file and two output files - one for the names of the readings that pass and one for those that fail. The program also modifies the reading files.
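The file-of-file-names convention described above can be sketched in Python as follows. This is only an illustration of the bookkeeping, not of vector_clip itself: the function name split_file_of_file_names and the passes callback are hypothetical, and the assumed layout is one reading name per line.

```python
from pathlib import Path


def split_file_of_file_names(input_fofn, passed_fofn, failed_fofn, passes):
    """Split the reading names in input_fofn into two output files.

    input_fofn is assumed to hold one reading file name per line.
    passes is a caller-supplied function (hypothetical here) that
    returns True when a reading should go to the "passed" list.
    """
    text = Path(input_fofn).read_text()
    names = [line.strip() for line in text.splitlines() if line.strip()]

    passed = [name for name in names if passes(name)]
    failed = [name for name in names if not passes(name)]

    # Each output is itself a file of file names, one name per line.
    Path(passed_fofn).write_text("".join(name + "\n" for name in passed))
    Path(failed_fofn).write_text("".join(name + "\n" for name in failed))
    return passed, failed
```

Because both outputs are again files of file names, the list of passed readings can serve directly as the input to a later processing step.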
In earlier versions of vector_clip all the information needed about the vector for each reading (i.e. its name, its location on disk, and the cloning and primer sites used) was expected to be stored in the reading's experiment file (see section Experiment File). As is explained in the next paragraph, the newest version also offers an alternative method for providing data about sequencing vectors. For notes on defining the cloning and primer sites, see section Defining Cloning and Primer Sites for Vector_Clip.
Primarily designed for people doing simple sequencing projects, or who are processing sets of readings that are not going to be assembled, the new method uses files containing not the complete vector sequences but the segments of sequence between the primers and the cloning site. These files are termed "vector_primer" files (see section Vector_Primer file format). The vector_primer mode of vector_clip uses these files to discover which vectors and primers were used for each reading, and then to search for the corresponding sequences. Those doing sequence assembly should continue to put the vector data in their experiment files, as some of it can be used by gap4.
If setting up these programs seems a little daunting, it is important to realise that the majority of users need not concern themselves with the details of vector_clip and the creation of experiment files for their readings. For those who do, these configuration operations are performed only once per project, and are made relatively easy by the use of pregap4.