
Vector_clip

NAME

vector_clip -- finds and marks vector segments in sequence readings

SYNOPSIS

vector_clip -[schr] [-w word_length (4)] [-n num_diags (7)] [-d diagonal_score (0.35)] [-l minimum_match (20/70%)] [-m minimum_5'_position] [-t] [-p passed_fofn] [-f failed_fofn] input_fofn

DESCRIPTION

vector_clip finds and marks vector segments in sequence readings stored in experiment file format. For sequencing vectors it can be used to find the 5' primer and, for short inserts, the sequence to the 3' side of the cloning site. It can also be used to find 3' primer sequences. A further option performs a final check for any vector rearrangements that the more specific searches around the cloning site could miss. For cloning vectors it searches both orientations of the sequence and marks any segments found. The vector sequences must be stored as simple text files. The experiment file for each reading must contain the vector information that vector_clip requires.
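Searching both orientations amounts to running the same comparison against the reading and against its reverse complement. The Python below is a minimal sketch of that idea only: the function names are hypothetical, and the exact-substring probe is a stand-in for vector_clip's real scoring, which this page does not describe.

```python
# Hypothetical helpers illustrating a two-strand search; not vector_clip's code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (only A, C, G, T and N are translated)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_in_both_orientations(reading, vector, probe_len=20):
    """Look for an exact probe taken from the start of the vector on both strands.

    Returns a list of (strand, start) pairs, where start is a 0-based
    position in the reading. Exact matching stands in for real scoring.
    """
    probe = vector[:probe_len]
    hits = []
    forward = reading.find(probe)
    if forward != -1:
        hits.append(("+", forward))
    reverse = reading.find(reverse_complement(probe))
    if reverse != -1:
        hits.append(("-", reverse))
    return hits
```

In practice the same scoring routine would simply be called twice, once per orientation, and the higher-scoring strand reported.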

The program processes batches of readings using files of file names (fofns): one for input and two for output. The input file lists the names of all the readings to process, one name per line. One output file receives the names of all the readings that pass the screening and the other receives the names of those that fail.
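A minimal Python sketch of this batch flow, assuming one name per line and a caller-supplied `screen` function. `screen`, `read_fofn`, `write_fofn` and `screen_batch` are hypothetical names; `screen` stands in for the actual vector searches and experiment-file updates.

```python
from pathlib import Path


def read_fofn(path):
    """Return the reading names in a file of file names, one per line, skipping blanks."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def write_fofn(path, names):
    """Write one name per line."""
    Path(path).write_text("".join(name + "\n" for name in names))


def screen_batch(input_fofn, pass_fofn, fail_fofn, screen):
    """Split the readings named in input_fofn into pass and fail lists.

    `screen` is a hypothetical callable taking a reading name and returning
    True if the reading passes; vector_clip's real checks are not modelled.
    """
    passed, failed = [], []
    for name in read_fofn(input_fofn):
        (passed if screen(name) else failed).append(name)
    write_fofn(pass_fofn, passed)
    write_fofn(fail_fofn, failed)
    return passed, failed
```

Because the fail list is itself a fofn, it can be fed straight back in as the input of a later run.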

OPTIONS

-s
Select to search for sequencing vector at the 5' and 3' ends of the readings.
-c
Select to search for cloning vector.
-h
Select to search for 3' primer sequence. When searching for a 3' primer, the primer sequence must be stored in the reading's experiment file as a PD record. See section Experiment File.
-r
Select to search for sequencing vector rearrangements.
-w word_length
Select the word length for sequencing and cloning vector searches. The search finds matches of at least this length and adds them to the score for the diagonal on which they lie.
-n num_diags
Select the number of diagonals whose scores are summed around the diagonal with the highest score. For example, a value of 7 means that the scores of the 3 diagonals on either side of the highest-scoring one are added to it, making a total of 7.
-d diagonal_score
Select the cutoff score for the diagonal summing algorithm. The scores are normalised so that the highest possible for any diagonal is 1.0. This cutoff is used by the cloning vector search and the search for sequencing vector at the 3' side of the cloning site.
-l minimum_match
The value set by this option is used as a cutoff by two distinct searches! When searching for vector rearrangements it defines the minimum number of consecutive identical bases that are treated as a match. For example, -l15 means that if 15 consecutive bases match the vector, the reading is reported as containing a vector rearrangement. When searching for 5' or 3' primer sequences the value defines the percentage match to use as a cutoff. If the best alignment between the primer sequence and a segment of the reading reaches this value, that segment is marked as primer sequence.
-m minimum_5'_position
This value is used as a default 5' primer position. It is used to mark the reading if no 5' primer match is found.
-t
This option produces a more verbose output about matches found, but does not alter the experiment files of the readings searched. It is for testing.
-p file
Vector_clip outputs two files of filenames: one for the readings that pass the screening and one for those that fail. This option is for naming the pass file.
-f file
Vector_clip outputs two files of filenames: one for the readings that pass the screening and one for those that fail. This option is for naming the fail file.
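The -w, -n and -d options together describe a diagonal-summing search. The Python below is one plausible reading of those descriptions, not vector_clip's source: it indexes every word of length w in the vector, counts word hits per diagonal (the offset between vector and reading positions), sums the num_diags diagonals centred on the best one and divides by the number of words in the reading, so that any single diagonal scores at most 1.0. The normalisation detail and every function name are assumptions.

```python
from collections import defaultdict


def word_index(seq, w):
    """Map each word of length w to the list of positions where it occurs."""
    index = defaultdict(list)
    for j in range(len(seq) - w + 1):
        index[seq[j:j + w]].append(j)
    return index


def diagonal_scores(reading, vector, w=4):
    """Count word hits per diagonal, where diagonal = vector_pos - reading_pos."""
    index = word_index(vector, w)
    scores = defaultdict(int)
    for i in range(len(reading) - w + 1):
        for j in index.get(reading[i:i + w], ()):
            scores[j - i] += 1
    return scores


def best_band(reading, vector, w=4, num_diags=7):
    """Return (best diagonal, summed band score divided by the reading's word count)."""
    scores = diagonal_scores(reading, vector, w)
    n_words = len(reading) - w + 1
    if not scores or n_words <= 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    half = num_diags // 2  # 7 -> 3 diagonals on either side of the best one
    band = sum(scores.get(d, 0) for d in range(best - half, best + half + 1))
    return best, band / n_words


def contains_vector(reading, vector, w=4, num_diags=7, cutoff=0.35):
    """Apply the -d style cutoff to the summed, normalised band score."""
    _, score = best_band(reading, vector, w, num_diags)
    return score >= cutoff
```

Summing neighbouring diagonals lets a match survive small insertions or deletions in the reading, which shift the hits onto adjacent diagonals. In repetitive sequence the summed band can exceed 1.0 even though each diagonal on its own cannot.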

EXAMPLES

Screen for sequencing vector using a word length of 4, summing 7 diagonals, with a diagonal cutoff score of 0.4, a default 5' primer position of 30 and a 5' primer cutoff of 75%. The readings to process are named in files.in; the names of those that pass are written to files.pass and the names of those that fail to files.fail.

vector_clip -s -w4 -n7 -d0.4 -m30 -l75 -pfiles.pass -f files.fail files.in

Screen for cosmid vector using a word length of 4, summing 7 diagonals, with a diagonal cutoff score of 0.4. The readings to process are named in files.in; the names of those that pass are written to files.pass and the names of those that fail to files.fail.

vector_clip -c -w4 -n7 -d0.4 -pfiles.pass -f files.fail files.in

Screen for 3' primer using a cutoff of 75%. The readings to process are named in files.in; the names of those that pass are written to files.pass and the names of those that fail to files.fail.

vector_clip -h -l75 -pfiles.pass -f files.fail files.in
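The -l percentage cutoff is applied to the best alignment of the primer against the reading. As a simplification, the Python below slides the primer along the reading without gaps and reports the best percent identity. vector_clip may well use a gapped alignment, and the function names are hypothetical.

```python
def best_primer_match(reading, primer):
    """Ungapped sliding comparison; return (start, percent identity) of the best window."""
    reading, primer = reading.upper(), primer.upper()
    length = len(primer)
    best_start, best_pct = None, 0.0
    for start in range(len(reading) - length + 1):
        same = sum(a == b for a, b in zip(primer, reading[start:start + length]))
        pct = 100.0 * same / length
        if pct > best_pct:  # keep the first window with the highest identity
            best_start, best_pct = start, pct
    return best_start, best_pct


def find_primer(reading, primer, cutoff_pct=70.0):
    """Return the half-open (start, end) of the primer segment, or None below the cutoff."""
    start, pct = best_primer_match(reading, primer)
    if start is not None and pct >= cutoff_pct:
        return start, start + len(primer)
    return None
```

With -l75, a 20-base primer would need at least 15 identical positions in some window to be marked.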

Screen for sequencing vector rearrangements using a cutoff of 20 consecutive matching bases. The readings to process are named in files.in; the names of those that pass are written to files.pass and the names of those that fail to files.fail.

vector_clip -r -l20 -pfiles.pass -f files.fail files.in
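For rearrangements, -l is a run length rather than a percentage. The Python below is a sketch that assumes the test is the longest exact common substring between the reading and the vector (forward strand only), computed with the standard dynamic-programming recurrence. The function names are hypothetical.

```python
def longest_common_run(reading, vector):
    """Return (length, reading start) of the longest exact common substring."""
    best_len, best_end = 0, 0
    prev = [0] * (len(vector) + 1)
    for i in range(1, len(reading) + 1):
        cur = [0] * (len(vector) + 1)
        for j in range(1, len(vector) + 1):
            if reading[i - 1] == vector[j - 1]:
                # Extend the run ending at the previous base of both sequences.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, best_end - best_len


def possible_rearrangement(reading, vector, min_match=20):
    """Flag the reading if some run of min_match consecutive bases matches the vector."""
    length, _ = longest_common_run(reading.upper(), vector.upper())
    return length >= min_match
```

The table-based approach takes time proportional to the product of the two lengths, which is acceptable for single readings against a plasmid-sized vector.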

NOTES

The following error messages can be generated.

  1. Error: could not open experiment file
  2. Error: no sequence in experiment file
  3. Error: sequence too short
  4. Error: missing vector file name
  5. Error: missing cloning site
  6. Error: missing primer site
  7. Error: could not open vector file
  8. Error: could not write to experiment file
  9. Error: could not read vector file
  10. Error: missing primer sequence
  11. Warning: sequence now too short (no message)
  12. Warning: sequence entirely cloning vector (no message)
  13. Warning: possible vector rearrangement (no message)

SL, SR, CL, CR, CS and PS records are written to the experiment files.
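To inspect what vector_clip wrote, a script only needs to pick out these record types. The Python below assumes the usual experiment-file layout, in which each line starts with a two-letter record code followed by spaces and the value; the helper name is hypothetical and the record meanings are not interpreted.

```python
# Record types that vector_clip writes, as listed above.
CLIP_RECORDS = {"SL", "SR", "CL", "CR", "CS", "PS"}


def clip_records(exp_text):
    """Collect values of the vector_clip record types from experiment-file text.

    Returns {code: [value, ...]}, preserving file order within each code.
    Sequence lines (which start with spaces) are ignored automatically.
    """
    found = {}
    for line in exp_text.splitlines():
        code, value = line[:2], line[2:].strip()
        if code in CLIP_RECORDS:
            found.setdefault(code, []).append(value)
    return found
```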

SEE ALSO

See section Experiment File. For notes on defining the cloning and primer sites, see section Defining the Positions of Cloning and Primer Sites for Vector_Clip.

See section scf(4).


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_12.html