
Find Internal Joins Dialogue

[picture]

The contigs to use in the search can be defined as "all contigs", a list of contigs in a file ("file"), or a list of contigs in a list ("list"). If "file" or "list" is selected, the browse button is activated and gives access to file or list browsers. Two types of search can be selected: "Probe all against all" compares all the defined contigs against one another, whereas "Probe with single contig" compares one contig against all the contigs in the list. If the latter option is selected, the Contig identifier panel in the dialogue box is no longer greyed out. Both senses of the sequences are compared.

The search algorithm first finds matching words of length "Word length", and then the diagonals that contain the highest proportion of hits. The parts of the contigs to which these diagonals correspond are then sent to an alignment routine. To be reported, each potential overlap must produce an alignment that reaches the ends of the contigs, is at least "Minimum overlap" long, and has less than "Maximum percent mismatch" differences.
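The first phase described above (word matching followed by ranking of diagonals) can be illustrated with a minimal Python sketch. It is not the program's own code: the function name best_diagonals, the exact-match word index, and the definition of "proportion" as hits divided by the number of word positions that fit on the diagonal are all assumptions made for illustration.

```python
from collections import defaultdict


def best_diagonals(seq_a, seq_b, word_length):
    """Rank diagonals of the seq_a x seq_b comparison by proportion of word hits.

    Hypothetical helper: a diagonal is identified by d = i - j, where i and j
    are the start positions of a matching word in seq_a and seq_b.
    Returns a list of (diagonal, proportion) pairs, best first.
    """
    # Index every word of seq_a by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - word_length + 1):
        index[seq_a[i:i + word_length]].append(i)

    # Look up every word of seq_b and count hits per diagonal.
    hits = defaultdict(int)
    for j in range(len(seq_b) - word_length + 1):
        for i in index.get(seq_b[j:j + word_length], ()):
            hits[i - j] += 1

    ranked = []
    for d, count in hits.items():
        # Number of cells on diagonal d inside the comparison matrix.
        if d >= 0:
            diag_len = min(len(seq_a) - d, len(seq_b))
        else:
            diag_len = min(len(seq_a), len(seq_b) + d)
        # Assumed normalisation: word start positions available on the diagonal.
        positions = diag_len - word_length + 1
        ranked.append((d, count / positions))

    # Highest proportion first; ties broken by diagonal number for stability.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # The end of the first sequence overlaps the start of the second.
    print(best_diagonals("GGGGGACGTTCA", "ACGTTCATTTTT", word_length=4))
```

In a real search the regions around the top-ranked diagonals would then be passed to an alignment routine, and the resulting alignment checked against "Minimum overlap" and "Maximum percent mismatch"; that second phase is not shown here.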

The default is to "Use hidden data", which means that, where possible, the contigs are extended using the poor-quality data from the readings near their ends. To ensure that this additional data is not so poor that matches will be missed, the program uses the following algorithm. It slides a window of size "Window size for good data scan" along the hidden data for each reading and stops if it finds a window that contains more than "Max dashes in scan window" non-ACGT characters. The data that extends the contig the furthest is added to its consensus sequence. If the user toggles off the use of hidden data, the "Window size for good data scan" and "Max number of dashes in scan window" dialogues will be greyed out.
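The window scan over hidden data can be sketched as follows. This is an illustration under stated assumptions, not the program's implementation: the function name usable_hidden_length is hypothetical, and the text does not say exactly where the usable data ends once a bad window is found, so the sketch assumes it ends at the start of that window.

```python
def usable_hidden_length(hidden, window_size, max_dashes):
    """Return how many leading characters of `hidden` pass the quality scan.

    hidden      -- hidden data of one reading, ordered outwards from the contig end
    window_size -- corresponds to "Window size for good data scan"
    max_dashes  -- corresponds to "Max dashes in scan window"
    """
    # If the hidden data is shorter than the window, scan it as a single window.
    last_start = max(len(hidden) - window_size, 0)
    for start in range(last_start + 1):
        window = hidden[start:start + window_size].upper()
        bad = sum(1 for base in window if base not in "ACGT")
        if bad > max_dashes:
            # Assumption: keep only the data before the offending window.
            return start
    return len(hidden)


if __name__ == "__main__":
    print(usable_hidden_length("ACGTACGT-----ACGT", window_size=5, max_dashes=2))
```

The scan would be run once per reading near a contig end, and the reading whose usable data extends the contig the furthest would supply the extension.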

If users elect not to "Use standard consensus" they can either "Mark active tags" or "Mask active tags", in which case the "Select tags" button will be activated. Clicking on this button will bring up a check box dialogue to enable the user to select the tag types they wish to activate. Masking the active tags means that all segments covered by tags that are "active" will not be used in the first phase of the matching algorithm, but will be used in the second phase. That is, matches will not be initiated within these segments, but if they extend into them the alignment will be performed in the normal way. A typical use of this mode is to avoid finding matches in segments covered by tags of type ALUS (i.e. segments thought to be Alu sequence) or REPT (i.e. segments that are known to be repeated elsewhere in the data) (see section Tag types). "Marking" is of less use: matches will be found in marked segments during the first phase of searching, but in the alignment shown in the Output Window, marked segments will be shown in lower case.
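The difference between marking and masking can be made concrete with a small Python sketch. The names mark_segments and seed_starts are hypothetical, segments are given as half-open (start, end) pairs, and the rule that a word is excluded as soon as it overlaps a masked segment at all is an assumption; the text only states that matches are not initiated within masked segments.

```python
def mark_segments(consensus, segments):
    """Marking: show tagged segments in lower case, leave the rest upper case."""
    chars = list(consensus.upper())
    for start, end in segments:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def seed_starts(seq_length, word_length, masked):
    """Masking: word start positions that may initiate a match in phase one.

    Assumption: a word is skipped if any of its bases lies in a masked segment.
    Masked bases are still available to the later alignment phase.
    """
    covered = [False] * seq_length
    for start, end in masked:
        for i in range(max(start, 0), min(end, seq_length)):
            covered[i] = True
    return [
        i
        for i in range(seq_length - word_length + 1)
        if not any(covered[i:i + word_length])
    ]


if __name__ == "__main__":
    tagged = [(2, 5)]  # e.g. a segment covered by an active REPT tag
    print(mark_segments("ACGTACGT", tagged))
    print(seed_starts(8, 3, tagged))
```

Marking changes only how the segment is displayed, whereas masking removes candidate word positions from the first phase while leaving the sequence itself intact for alignment.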

If your total consensus sequence length (including a 20 character header for each contig that is used internally by the program) plus any hidden data at the ends of contigs is greater than the current value of a parameter called maxseq, Find Internal Joins will produce an error message advising you to increase maxseq. Maxseq can be set on the command line (see section Command line arguments) or by using the options menu (see section Options Menu).
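The size check described above is simple arithmetic and can be sketched in Python. The 20-character header per contig comes from the text; the function names and the example lengths are hypothetical.

```python
HEADER_LENGTH = 20  # per-contig header used internally, as stated in the text


def required_length(consensus_lengths, hidden_lengths=()):
    """Total space needed: consensus, one header per contig, plus hidden data."""
    return (
        sum(consensus_lengths)
        + HEADER_LENGTH * len(consensus_lengths)
        + sum(hidden_lengths)
    )


def exceeds_maxseq(consensus_lengths, hidden_lengths, maxseq):
    """True if the search would need a larger maxseq than the current value."""
    return required_length(consensus_lengths, hidden_lengths) > maxseq


if __name__ == "__main__":
    # Two contigs of 1000 and 2500 bases with 50 and 30 bases of hidden data.
    print(required_length([1000, 2500], [50, 30]))
    print(exceeds_maxseq([1000, 2500], [50, 30], maxseq=3600))
```

With these example figures the requirement is 3500 + 2 x 20 + 80 = 3620 characters, so a maxseq of 3600 would be too small.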


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_98.html