
List Confidence

If the probabilistic consensus algorithm is used, it is possible to give the expected number of errors in a particular consensus sequence. This figure is obtained by summing the error rates of the individual bases.

Each confidence value has a known error rate, given by the formula 10^(-confidence / 10.0). We also know how often each confidence value occurs in the consensus sequence, and hence the expected number of errors for each confidence value. Working on the assumption that we are likely to check and fix the consensus bases with the lowest confidence values first, this allows us to report the cumulative number of errors that we would fix by checking every consensus base with a confidence value at or below a particular threshold.
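
As a minimal sketch of the arithmetic just described (this is not gap4's own code), the following Python converts a confidence value to its error rate and multiplies it by the number of bases holding that value. The function names error_rate and expected_errors are hypothetical, chosen only for illustration.

```python
def error_rate(confidence):
    """Error probability implied by a confidence value: 10^(-confidence / 10.0)."""
    return 10.0 ** (-confidence / 10.0)


def expected_errors(confidence, frequency):
    """Expected number of wrong bases among `frequency` bases that share one confidence value."""
    return frequency * error_rate(confidence)


# A confidence of 10 corresponds to a 1 in 10 chance of error,
# so 80 such bases are expected to contain about 8 errors.
print(round(expected_errors(10, 80), 2))
```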

The List Confidence option, in the View menu, provides this ability. The dialogue simply allows selection of one or more contigs. Pressing OK then produces a table similar to the following:

Sequence length = 164068 bases.
Expected errors =  168.80 bases (1/971 error rate).

Value   Frequencies     Expected  Cumulative    Cumulative      Cumulative
                        errors    frequencies   errors          error rate
--------------------------------------------------------------------------
  0          0             0.00         0          0.00         1/971
  1          1             0.79         1          0.79         1/976
  2          0             0.00         1          0.79         1/976
  3          3             1.50         4          2.30         1/985
  4         30            11.94        34         14.24         1/1061
  5          2             0.63        36         14.87         1/1065
  6        263            66.06       299         80.94         1/1867
  7        151            30.13       450        111.06         1/2841
  8        164            25.99       614        137.06         1/5168
  9         96            12.09       710        149.14         1/8344
 10         80             8.00       790        157.14         1/14069

The above table tells us that we have 164068 bases in our consensus sequences with an expected 169 errors (giving us an average error rate of one in 971). Next it lists each confidence value along with the frequency of this value and the expected number of errors. For any particular confidence value the cumulative columns tell us how many bases in the sequence have the same or lower confidence and how many errors are expected in those bases. The final column gives the new expected error rate that we would have if all of these bases were checked and all of the errors in them fixed.
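
The cumulative columns can be rebuilt from the frequencies alone. The sketch below is an illustration under stated assumptions, not the program's implementation: the frequencies, sequence length and total expected errors are copied from the table above, and the function name cumulative_rows is hypothetical. Because the total of 168.80 is itself a rounded figure, the final column may differ slightly from the printed table.

```python
SEQUENCE_LENGTH = 164068
# Total taken from the report header; it also covers confidence values above 10.
TOTAL_EXPECTED_ERRORS = 168.80

# Frequencies for confidence values 0..10, copied from the table above.
FREQUENCIES = [0, 1, 0, 3, 30, 2, 263, 151, 164, 96, 80]


def cumulative_rows(frequencies, total_errors, length):
    """Yield (value, freq, errors, cum_freq, cum_errors, one_in) per confidence value."""
    cum_freq = 0
    cum_errors = 0.0
    for value, freq in enumerate(frequencies):
        errors = freq * 10.0 ** (-value / 10.0)
        cum_freq += freq
        cum_errors += errors
        # Errors left if every base at or below this value were checked and fixed.
        remaining = total_errors - cum_errors
        yield value, freq, errors, cum_freq, cum_errors, length / remaining


for value, freq, errors, cum_freq, cum_errors, one_in in cumulative_rows(
        FREQUENCIES, TOTAL_EXPECTED_ERRORS, SEQUENCE_LENGTH):
    print(f"{value:3d} {freq:10d} {errors:16.2f} {cum_freq:9d} "
          f"{cum_errors:13.2f}         1/{one_in:.0f}")
```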

In the above table we see that there are 790 bases with confidence values of 10 or less. We expect there to be 157 errors in those 790 bases. As we expect there to be about 169 errors in total, this implies that manually checking those 790 bases would leave only 12 undetected errors. Given that the sequence length is 164068 bases, this means an average error rate of 1 in 14069. Note that this error rate could be achieved by checking only 0.48% of the total number of consensus bases. In this particular example, editing the same sequence with a 100% consensus cutoff using either of the frequency-based consensus methods would require checking 25165 bases (15.34%), although the overall error rate would be better.
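
The figures quoted in this paragraph can be checked with a few lines of Python. This is only a worked arithmetic sketch; all of the numbers are taken from the table and text above, and the variable names are illustrative.

```python
SEQUENCE_LENGTH = 164068
TOTAL_EXPECTED_ERRORS = 168.80

checked_bases = 790      # bases with a confidence value of 10 or less
errors_fixed = 157.14    # cumulative expected errors at confidence value 10
cutoff_bases = 25165     # bases to check with a 100% consensus cutoff

# Expected errors that remain after checking the low-confidence bases.
remaining_errors = TOTAL_EXPECTED_ERRORS - errors_fixed

# Share of the consensus that has to be inspected in each approach.
percent_checked = 100.0 * checked_bases / SEQUENCE_LENGTH
percent_cutoff = 100.0 * cutoff_bases / SEQUENCE_LENGTH

print(f"remaining expected errors: {remaining_errors:.2f}")
print(f"checked by confidence:     {percent_checked:.2f}%")
print(f"checked by 100% cutoff:    {percent_cutoff:.2f}%")
```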


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_122.html