
Clip

NAME

clip -- an Experiment File sequence clipper

SYNOPSIS

clip [-v] [-s start_offset] [-m minimum_extent] [-M maximum_extent] [-w r_length_1] [-u r_unknown_1] [-W r_length_2] [-U r_unknown_2] [-l l_length_1] [-y l_unknown_1] [-L l_length_2] [-Y l_unknown_2] file ...

DESCRIPTION

Clip is a simple program to decide how much of the 3' end of a sequence, stored as an Experiment File, should be clipped off and ignored during assembly. The decision is made by counting the number of unknown bases (e.g. - or N) found within windows slid left to right along the sequence.

The file arguments, of which there can be several, are processed one at a time. Each argument is assumed to be a valid Experiment File. The sequence is read from the Experiment File SQ identifier; clipping is performed; and QL and QR identifiers are appended to the file.

The right clip position is calculated by sliding a window of length r_length_1 to the right along the sequence, starting from base start_offset. We stop once the window contains at least r_unknown_1 unknown bases. At this stage two choices are available: either place the clip at the start position of the first window, or proceed from the current position plus half of r_length_1 using a second window. In the latter case we perform a similar operation to the first window, except using the r_length_2 and r_unknown_2 parameters, and then set the clip to the start position of this second window.
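
As an illustration only, the right clip calculation described above might be sketched in Python as follows. This is not the clip source code: the function names, the 0-based positions, the choice of '-', 'N' and 'n' as unknown bases, the use of full-length windows only, and returning the sequence length when no window reaches the threshold are all assumptions made for the sketch.

```python
# Assumption: these characters count as unknown bases.
UNKNOWN = frozenset("-Nn")


def first_bad_window(seq, start, length, threshold):
    """Slide a window of `length` bases rightwards from `start`.

    Returns the 0-based start of the first window holding at least
    `threshold` unknown bases, or len(seq) if no window qualifies.
    """
    last_start = len(seq) - length  # only full-length windows are tested
    for pos in range(start, last_start + 1):
        window = seq[pos:pos + length]
        if sum(base in UNKNOWN for base in window) >= threshold:
            return pos
    return len(seq)


def right_clip(seq, start_offset=70, len1=100, unk1=5, len2=0, unk2=0):
    """Hypothetical right clip: one window, or two when len2 > 0."""
    pos = first_bad_window(seq, start_offset, len1, unk1)
    if len2 <= 0 or pos >= len(seq):
        # Single-window mode, or nothing bad enough was found.
        return pos
    # Second window: restart half a first-window length further on.
    return first_bad_window(seq, pos + len1 // 2, len2, unk2)
```

For instance, with a 100 base sequence whose bases 50 to 59 are all N, calling right_clip(seq, 0, 10, 5) stops at the first 10 base window that contains five Ns, and passing len2=4, unk2=2 refines that position with the second window.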

The left clip position is calculated by sliding a window to the left starting from base start_offset. The algorithm used is identical to that for the right clip position except that the l_length_1, l_unknown_1, l_length_2 and l_unknown_2 parameters are used.

To use only one window in each direction (the default), set the second window length to 0 using -W 0 for the right clip and -L 0 for the left clip.

The default arguments are "-s 70 -m 0 -M 999999 -w 100 -u 5 -W 0 -U 0 -l 20 -y 3 -L 0 -Y 0".

OPTIONS

-v
Enable verbose output. This outputs information on which files are currently being clipped.
-s offset
Force the first window to start the calculations from position offset in the sequence. This can be useful to avoid poor data at the 5' end of a sequence.
-m extent
If the clip algorithm returns a QL clip value of less than extent bases into the sequence then use extent as the QL value.
-M extent
If the clip algorithm returns a QR clip value of more than extent bases into the sequence then use extent as the QR value.
-w length
Set the length for the first rightwards window to length.
-u unknown
Stop sliding the first rightwards window when the number of unknown bases within the current window is greater than or equal to unknown.
-W length
Set the length for the second rightwards window to length. Setting this value to zero prevents the second window calculations from being performed.
-U unknown
Stop sliding the second rightwards window when the number of unknown bases within the current window is greater than or equal to unknown.
-l length
Set the length for the first leftwards window to length.
-y unknown
Stop sliding the first leftwards window when the number of unknown bases within the current window is greater than or equal to unknown.
-L length
Set the length for the second leftwards window to length. Setting this value to zero prevents the second window calculations from being performed.
-Y unknown
Stop sliding the second leftwards window when the number of unknown bases within the current window is greater than or equal to unknown.
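
The effect of -m and -M can be pictured as a simple clamp applied after the window calculations. A minimal Python sketch, assuming QL and QR are plain integer positions; the function name apply_extents is hypothetical and not part of clip:

```python
def apply_extents(ql, qr, min_extent=0, max_extent=999999):
    """Clamp clip positions the way -m and -M are described.

    A QL below min_extent is raised to min_extent; a QR above
    max_extent is lowered to max_extent. Defaults mirror the
    documented defaults of -m 0 and -M 999999.
    """
    ql = max(ql, min_extent)
    qr = min(qr, max_extent)
    return ql, qr
```

With -m 20 and -M 350, a computed pair (5, 400) would become (20, 350), while (30, 300) would be left unchanged.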

EXAMPLE

To clip a batch of sequences listed in the `fofn' file, with a minimum left clip value of 20 bases, use:

clip -m 20 `cat fofn`
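
For scripts that prefer not to rely on shell backquote expansion, roughly the same invocation can be assembled in Python. This is a sketch, assuming clip is on the PATH and that fofn holds whitespace-separated file names; the helper names build_clip_command and run_clip are made up for this example.

```python
import subprocess


def build_clip_command(fofn_text, min_extent):
    """Build the argument list equivalent to: clip -m min_extent `cat fofn`."""
    # The shell splits the output of `cat fofn` on whitespace;
    # str.split() with no arguments does the same.
    files = fofn_text.split()
    return ["clip", "-m", str(min_extent)] + files


def run_clip(fofn_path, min_extent=20):
    """Read the file of file names and run clip on every entry."""
    with open(fofn_path) as handle:
        command = build_clip_command(handle.read(), min_extent)
    # check=True raises CalledProcessError if clip exits non-zero.
    subprocess.run(command, check=True)
```

Calling run_clip("fofn") would then correspond to the shell command shown above.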

SEE ALSO

See section ExperimentFile(4). See section trace_clip.


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_1.html