After using some assembly methods, especially those that use the full extent of the sequences, you may notice that some of the alignments are poor. This is not necessarily an error in the assembly system: when sequences disagree considerably there may simply be no good alignment. However, it can cause problems for the consensus algorithms, because the column of bases above a particular consensus base may not in fact contain the correct bases. The bases themselves may well be correct, or have correct error rates, but they also need to be in the correct columns.
The difference clipping system computes the current most likely consensus sequence and then compares each reading against it. It then identifies regions at the ends of each reading where there are enough differences to indicate possibly misaligned bases.
To identify the clip points for each reading we first find a good matching
segment near the middle of the reading. We then step base by base from this
point to the left, totalling a score as we go, with +1 for a sequence to
consensus match and -2 for a mismatch. We define the left clip point to be
the position with the highest accumulated score. Similarly, we step base by
base to the right to define the right clip point. The clip points are
adjusted only if these new clips are more stringent than the original ones.
The portions of readings which have been clipped are then tagged using the
DIFF tag type. If you wish to see which segments have been clipped, use the
contig editor search tool.
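The scoring walk described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's actual implementation: the names best_extent and difference_clip are hypothetical, the reading and consensus are assumed to be already aligned column for column as equal-length strings, the seed position inside the good matching segment is passed in rather than searched for, clip points are taken to be the inclusive indices of the first and last retained base, and ties in score are resolved in favour of the position nearest the seed.

```python
MATCH_SCORE = 1      # +1 for a sequence to consensus match
MISMATCH_SCORE = -2  # -2 for a mismatch


def best_extent(read, consensus, seed, step):
    """Walk from `seed` in direction `step` (-1 = left, +1 = right) and
    return the index at which the accumulated score is highest."""
    score = 0
    best_score = 0
    best_pos = seed
    pos = seed + step
    while 0 <= pos < len(read):
        if read[pos] == consensus[pos]:
            score += MATCH_SCORE
        else:
            score += MISMATCH_SCORE
        # Strict '>' keeps the position nearest the seed on ties (assumption).
        if score > best_score:
            best_score = score
            best_pos = pos
        pos += step
    return best_pos


def difference_clip(read, consensus, seed, old_left, old_right):
    """Return (left, right) clip points, adopting the new ones only
    where they are more stringent than the original clips."""
    new_left = best_extent(read, consensus, seed, -1)
    new_right = best_extent(read, consensus, seed, +1)
    return max(old_left, new_left), min(old_right, new_right)


consensus = "ACGTACGTACGTACGT"
read      = "TTGTACGTACGTACAA"   # two mismatches at each end
print(difference_clip(read, consensus, seed=8, old_left=0, old_right=15))
```

Because a mismatch costs twice what a match gains, an isolated disagreement is absorbed when it is followed by enough matching bases, whereas a run of disagreements near the end of a reading pulls the accumulated score down and leaves the clip point inside the well-aligned region.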
After clipping, the algorithm identifies any holes (breaks in the contigs) that may have been created and fills them again by extending the sequence(s) with the fewest expected errors.
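The hole detection and filling step can be illustrated with a similar sketch. Several things here are assumptions made for the example rather than details taken from the program: the names find_holes and pick_extension are hypothetical, each reading is represented only by its inclusive (left, right) extent in contig columns after clipping, and the "expected errors" of an extension are taken to be the sum of the per-base error probabilities of the clipped bases that would have to be restored.

```python
def find_holes(extents):
    """Return inclusive (start, end) column ranges lying between
    readings that no clipped reading covers any longer."""
    holes = []
    covered_to = None
    for left, right in sorted(extents):
        if covered_to is not None and left > covered_to + 1:
            holes.append((covered_to + 1, left - 1))
        covered_to = right if covered_to is None else max(covered_to, right)
    return holes


def pick_extension(candidates):
    """Given {reading name: per-base error probabilities of the bases
    needed to fill a hole}, return the reading with the fewest
    expected errors (assumed here to be the sum of probabilities)."""
    return min(candidates, key=lambda name: sum(candidates[name]))


# Clipped extents of three readings; columns 46..54 are left uncovered.
print(find_holes([(0, 40), (55, 90), (30, 45)]))

# Two readings could be extended back over the hole.
print(pick_extension({
    "read_a": [0.01, 0.02, 0.30],
    "read_b": [0.05, 0.05, 0.05],
}))
```

In this example read_b would be chosen: although none of its bases is especially reliable, its total of expected errors is lower than that of read_a, whose extension includes one very doubtful base.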