Vector sequences should be stored in simple text files with up to 80 characters of data per line. Sequencing vectors are vectors, such as M13, that are used to produce templates for sequencing. All other vectors, such as cosmid vectors, that are used to purify and grow the DNA before it is subcloned into sequencing vectors are termed "cloning vectors". The files containing the cloning vector sequences used by vector_clip must be arranged so that the cloning site follows the last base in the file. For example (where X is the cloning site):
    start of file
    acatacatacatatata
    acatagatagatacaga
    . . .
    cagatataX
    end of file

Cloning Vector File Base Ordering
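A minimal Python sketch of preparing a cloning vector file in this layout. It assumes the whole circular sequence is already held as one string and that you know the 1-based position of the base immediately 5' of the cloning site; `rotate_to_cloning_site` and `format_lines` are hypothetical helpers, not part of vector_clip.

```python
def rotate_to_cloning_site(seq, site):
    """Rotate circular `seq` so the base at 1-based position `site` is the
    last base, i.e. the cloning site (just 3' of that base) follows the
    final base of the file."""
    seq = seq.strip()
    if not 1 <= site <= len(seq):
        raise ValueError("site must be a 1-based position within seq")
    return seq[site:] + seq[:site]


def format_lines(seq, width=80):
    """Split a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width)) + "\n"


vector = "ttttaaaacccc"  # toy circular sequence; cloning site lies after base 8
print(format_lines(rotate_to_cloning_site(vector, 8)), end="")  # -> ccccttttaaaa
```

Rotating rather than editing the sequence keeps the circular molecule unchanged; only the point at which the file starts moves.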
For vector_clip to search readings for segments of sequencing vector, it needs to know the positions of the cloning site and the primers. Specifically, each reading's experiment file should contain SC and SP records, and also a primer type (PR) record. The following section explains the numbering system used, with an example for m13mp18, and then describes how to use nip4 (see section Introduction) to work out the values for other vector, cloning site and primer combinations.
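As an illustration only, the sketch below checks that an experiment file carries the three records vector_clip needs. It assumes each record is a line starting with a two-letter code followed by whitespace and a single integer; `read_vector_records` is a hypothetical helper, and the sample values are illustrative (the PR value is a placeholder, not a documented code).

```python
REQUIRED_CODES = ("SC", "SP", "PR")


def read_vector_records(exp_text):
    """Return the SC, SP and PR values from one experiment file's text.

    Assumption for this sketch: each record is '<code> <integer>' on one line.
    Raises ValueError if any of the three records is missing.
    """
    records = {}
    for line in exp_text.splitlines():
        parts = line.split(None, 1)  # two-letter code, then the value
        if len(parts) == 2 and parts[0] in REQUIRED_CODES:
            records[parts[0]] = int(parts[1].strip())
    missing = [code for code in REQUIRED_CODES if code not in records]
    if missing:
        raise ValueError("experiment file lacks record(s): " + ", ".join(missing))
    return records


example = "ID   hypothetical_reading\nSC   6249\nSP   -24\nPR   2\n"
print(read_vector_records(example))  # -> {'SC': 6249, 'SP': -24, 'PR': 2}
```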
The position of the cloning site depends on the ordering of the bases in the particular vector sequence file being used: because the sequences are circular, the file may start at any base and still give the same circular sequence. Vector_clip must therefore be told the position of the cloning site and then, relative to that, the position of the first base that will be included in the reading, i.e. the relative position of the first base 3' of the primer.
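The sketch below shows why a relative position is independent of where the file happens to start. It uses the convention shown in the M13MP18 figure below (+1 is the first base right of the cloning site, -1 the first base left of it, and there is no 0) and wraps round the circular sequence. The function names and the toy sequences are hypothetical.

```python
def relative_to_file_position(rel, left_of_site, length):
    """Convert a relative position to a 1-based position in a circular file.

    `left_of_site` is the file position of base -1 (immediately left of the
    cloning site); `length` is the length of the circular vector.
    """
    if rel == 0:
        raise ValueError("there is no relative position 0")
    pos = left_of_site + rel + (1 if rel < 0 else 0)
    return (pos - 1) % length + 1  # wrap round the circular sequence


def base_at(seq, rel, left_of_site):
    """Return the base at relative position `rel` of circular sequence `seq`."""
    return seq[relative_to_file_position(rel, left_of_site, len(seq)) - 1]


# Two files holding the same toy circular vector, starting at different bases.
file_a = "aaccggtt"  # base -1 is at file position 4
file_b = "cggttaac"  # same circle rotated; base -1 is now at file position 1
for rel in (-3, -2, -1, 1, 2, 3):
    assert base_at(file_a, rel, 4) == base_at(file_b, rel, 1)
```

Only `left_of_site` changes between the two files; every relative position still names the same base, which is why vector_clip needs the cloning site position for the particular file in use.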
Below we use EMBL entry M13MP18 as an example. The figure includes a double-stranded listing of 120 bases of m13mp18 around the SmaI site at 6249, together with some of the restriction sites. Between the restriction sites and the sequence we have added lines to explain the numbering used by vector_clip. The numbers below the row of "+" symbols show positive positions (to the right of the SmaI cloning site), and the numbers below the "-" symbols show negative positions (to the left of the cloning site). Below these lines we show the sequences of the 16mer reverse primer "r(-21)", which is at relative position -24, and the 17mer forward primer "f(-20)", which is at relative position 41. There is no position 0: -1 is the base immediately left of the cloning site (6248) and +1 the base immediately right of it (6249), so the first base 3' of r(-21) is 6225 and that of f(-20) is 6289.
The positions of SmaI site and forward and reverse primers for M13MP18

Restriction sites shown above the first block: EcoRI, TaqI, SacI, XmaI, HpaII, AsuC2I, SmaI, BamHI, MboI, Sau3AI, XhoII, PspN4I, XbaI.

                            ----20--------10--------+++++++++10+
                r(-21)      432109876543210987654321123456789012
            aacagctatgaccatg
    acacaggaaacagctatgaccatgattacgaattcgagctcggtacccggggatcctcta
          6210      6220      6230      6240      6250      6260
    tgtgtcctttgtcgatactggtactaatgcttaagctcgagccatgggcccctaggagat

Restriction sites shown above the second block: HinfI, SalI, AccI, SdaI, BspMI, BbuI, CfrI, Hsp92II, BshI, PaeI, HaeIII, SphI, PalI, Cac8I, Bse1I, MaeII, HindIII, BseNI, TaiI, AluI, BsrI, TscI, MwoI, TspRI, Tsp45I.

    +++++++20++++++++30++++++++40
    34567890123456789012345678901   f(-20)
                                 tgaccggcagcaaaatg
    gagtcgacctgcaggcatgcaagcttggcactggccgtcgttttacaacgtcgtgactgg
          6270      6280      6290      6300      6310      6320
    ctcagctggacgtccgtacgttcgaaccgtgaccggcagcaaaatgttgcagcactgacc
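As a check on the numbering, this sketch recomputes the two primer positions from the sequence in the figure. It assumes the reverse primer is extended rightwards along the top strand and the forward primer leftwards, and that each relative position is that of the first base 3' of the primer; the helper names are hypothetical and are not part of vector_clip or nip4.

```python
# M13MP18 top strand, positions 6201-6320, copied from the figure above.
TOP = ("acacaggaaacagctatgaccatgattacgaattcgagctcggtacccggggatcctcta"
       "gagtcgacctgcaggcatgcaagcttggcactggccgtcgttttacaacgtcgtgactgg")
FIRST = 6201         # EMBL coordinate of TOP[0]
LEFT_OF_SITE = 6248  # base -1, immediately left of the SmaI cloning site

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def to_relative(abs_pos):
    """EMBL coordinate -> vector_clip-style relative position (no 0)."""
    d = abs_pos - LEFT_OF_SITE
    return d if d > 0 else d - 1


def reverse_primer_position(primer):
    """Primer (5'->3') whose sequence appears in TOP; extended rightwards."""
    start = TOP.index(primer)                        # 0-based start in TOP
    return to_relative(FIRST + start + len(primer))  # first base 3' of primer


def forward_primer_position(primer):
    """Primer (5'->3') that anneals to TOP; extended leftwards."""
    start = TOP.index(reverse_complement(primer))  # leftmost annealed base
    return to_relative(FIRST + start - 1)          # first base 3' of primer


r21 = "aacagctatgaccatg"
f20 = "tgaccggcagcaaaatg"[::-1]  # figure prints it 3'->5', matching the bottom strand
print(reverse_primer_position(r21), forward_primer_position(f20))  # -> -24 41
```

The same approach would apply to another vector and primer pair once the sequence around its cloning site and the position of base -1 are known.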