
Introduction to mutation detection

The figures shown are taken from: James K Bonfield, Cristina Rada and Rodger Staden, "Automated detection of point mutations using fluorescent sequence trace subtraction", Nucleic Acids Res. 26, 3404-3409, 1998, and are copyright Oxford University Press. The text below gives an outline of the paper as an introduction to the use of the methods. At present the methods are suitable only for automatic detection of point mutations, although the visual trace checking available in the gap4 contig editor can be used to examine possible heterozygote readings.

Mutations are detected and identified by sequencing using a fluorescence based sequencing instrument and comparing the resulting traces and base-calls with those of a reference sequence. The task of the comparison method is to distinguish base differences that are real mutations from those that are due to base-calling errors.

A program called trace_diff can automatically detect the mutations, and special types of trace display in our sequence assembly program gap4 greatly simplify the task of visually checking the results and data. trace_diff aligns the traces of each new sequence with the reference trace and then analyses their differences. Bases with trace differences above a user-defined threshold are tagged as mutations. The traces of the new sequence and the reference sequence can be viewed, together with their differences, from within the gap4 editor.


The basic idea is illustrated in Figure 1, which shows the traces as displayed by the gap4 contig editor. The top trace is from a reference sequence (actually it is a "consensus trace" calculated by combining the data from a set of reference sequences), the middle one is from an individual reading, and the bottom one is the difference between the other two. In general the difference trace contains very few features; here, however, three base changes are shown with their associated peaks in the difference plot. The left-hand mutation (C-T) at position 179 causes a pair of strong peaks in opposite directions with a small context effect. The next mutation (T-G) at 184 has strong opposite peaks and a strong context effect peak. The third mutation has quite strong opposite peaks and a single context effect peak. The crucial point is that context effects have peaks in only one direction whereas mutations have strong peaks in both directions, and this is what trace_diff searches for (and labels ready for viewing in gap4).
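The distinction between two-directional mutation peaks and one-directional context effects can be illustrated with a minimal Python sketch. It is not the trace_diff implementation: it assumes the traces are already aligned and reduced to one value per position for each of the four channels, and it assumes that a "strong" peak is one exceeding n standard deviations of the whole difference trace. The function names and the synthetic data are hypothetical.

```python
from statistics import pstdev

CHANNELS = "ACGT"


def difference_trace(reference, sample):
    """Subtract the reference from the sample, channel by channel."""
    return {
        ch: [s - r for s, r in zip(sample[ch], reference[ch])]
        for ch in CHANNELS
    }


def find_candidate_mutations(reference, sample, n=4.0):
    """Return (position, lost_base, gained_base) for two-directional peaks.

    Assumption: a peak is "strong" when it exceeds n standard deviations
    of the whole difference trace; trace_diff may define this differently.
    """
    diff = difference_trace(reference, sample)
    sigma = pstdev([v for ch in CHANNELS for v in diff[ch]])
    if sigma == 0:
        return []  # identical traces: nothing to report
    limit = n * sigma

    hits = []
    for pos in range(len(diff["A"])):
        column = {ch: diff[ch][pos] for ch in CHANNELS}
        gained = max(column, key=column.get)  # most positive difference
        lost = min(column, key=column.get)    # most negative difference
        # A mutation needs strong peaks in BOTH directions; a context
        # effect has a peak in only one direction and is skipped.
        if column[gained] > limit and column[lost] < -limit:
            hits.append((pos, lost, gained))
    return hits


def flat_trace(length, peaks):
    """Build a synthetic trace: zero everywhere except the given peaks."""
    trace = {ch: [0.0] * length for ch in CHANNELS}
    for pos, ch, height in peaks:
        trace[ch][pos] = height
    return trace


# Synthetic data: a C-to-T change at position 50 and a one-sided
# context effect (the G peak is merely taller) at position 120.
reference = flat_trace(200, [(50, "C", 100.0), (120, "G", 100.0)])
sample = flat_trace(200, [(50, "T", 100.0), (120, "G", 200.0)])
candidates = find_candidate_mutations(reference, sample)
print(candidates)
```

In this synthetic case only position 50 is reported, because the difference at position 120 has a peak in one direction only. Real traces have peaks spanning several samples, so a practical version would need to group neighbouring positions into a single call.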

Any number of sequences can be processed in a single run, and for each prospective mutant sequence the operation is generally performed in two steps. First, trace_diff is used to align and compare the mutant and reference sequences and traces and to locate possible mutations. Second, the sequence is assembled into a gap4 database, from where users can visually check the differences between the reference and mutant traces.

trace_diff is being used to study somatic hypermutation in immunoglobulin genes. In Table 1, column A, we show the results of applying trace_diff to three sets of readings (214 in total) determined as part of the somatic hypermutation study. Sequencing was performed using fluorescent dye terminators in an ABI377 sequencer. The results from trace_diff were compared with those obtained from scanning the complete traces by eye after the readings had been assembled into a gap4 database. The test data consisted of 108497 bases called using the standard ABI software. After the readings had been aligned with their consensus sequences they contained 1232 differences, of which 392 were bases called as unknown (N) and a further 166 were padding characters introduced during alignment. Visual inspection showed that there were 353 real mutations; with the threshold n = 4.0, trace_diff missed 36 of them and found 28 false positives. The false positives tended to be at the two ends of the readings, where the data were less reliable, and the false negatives were almost entirely due to the "weak G after A" problem found in the chemistry used.

Table 1
                               A      B      C
Number of readings            214     65     65
Number of bases            108497  43741  36400
Average analysed length       503    673    560
Base differences             1232    274    162
Real mutations                353    165    132
trace_diff false positives     28     15      0
trace_diff false negatives     36      5      0

We have recently tried the new ABI BigDye terminators and found a marked improvement in the sequences obtained: their lengths were increased and the weak G problem was almost non-existent. The results from one batch of data are shown in Table 1, columns B and C. Column B contains results from sequences that were loosely clipped for quality, giving an average analysed length per reading of 673 bases; column C contains the results when the readings were clipped more severely, leaving only high quality data with an average analysed length of 560 bases. As can be seen, there are far fewer base-calling errors or uncertainties for both ranges. For the extended range set trace_diff missed 5 mutations and found 15 false positives, and for the narrower range it missed no mutations and gave no false positives.
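The counts in Table 1 can be turned into detection rates with a short Python sketch. The terms "sensitivity" (the fraction of real mutations that were tagged) and "precision" (the fraction of tags that were real mutations) are not used in the text above and are introduced here only for illustration; the function name is likewise hypothetical.

```python
# Counts copied from Table 1 (columns A, B and C).
TABLE_1 = {
    "A": {"real": 353, "false_pos": 28, "false_neg": 36},
    "B": {"real": 165, "false_pos": 15, "false_neg": 5},
    "C": {"real": 132, "false_pos": 0, "false_neg": 0},
}


def detection_rates(real, false_pos, false_neg):
    """Return (sensitivity, precision) for one column of Table 1."""
    found = real - false_neg    # real mutations that were tagged
    tagged = found + false_pos  # everything that was tagged
    return found / real, found / tagged


rates = {name: detection_rates(**counts) for name, counts in TABLE_1.items()}
for name, (sensitivity, precision) in rates.items():
    print(f"{name}: sensitivity {sensitivity:.3f}, precision {precision:.3f}")
```

For column A this gives 317 of 353 real mutations tagged (about 0.898) and 317 of 345 tags correct (about 0.919); for column B the figures are 160 of 165 (about 0.970) and 160 of 175 (about 0.914).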


The gap4 program is mainly used for large-scale sequencing projects, but in Figure 2 we see its contig editor showing modes of operation useful for mutation detection. Along the top are a series of menus and buttons, one of which, "Next problem", will automatically move the editing cursor to the position of the next potential mutation that has been tagged by trace_diff. The traces and their differences can be scrolled in register with the editing cursor, so the traces for the tagged bases can quickly be inspected. Individual readings are numbered, named and written left to right. Accuracy estimates or confidence values for each base are shown using grey scales: the darker the background, the poorer the data. Mutation tags are shown in dark green (for example, three are visible on the top sequence). Changes to the original base calls or accuracy estimates are also indicated by the background colour: deletions are shown in red, base changes in pink, padding characters in light green and modified confidence values in blue.


The gap4 template display can provide an overview of all the mutations in a set of readings. Figure 3 shows readings as red arrows and tags as small coloured rectangles. In this example tags automatically generated by trace_diff are shown in green, false negatives have been manually edited to red and false positives to yellow. Tags are shown both on the individual readings and on the scale at the bottom. This display can also be used to immediately identify polymorphic residues in population studies.

We have demonstrated the reliability of the automatic mutation detection for dye terminators and, for a more limited dataset, the new BigDye terminators. Given the wide choice of instruments and protocols in use, it is not possible for us to cover them all. Nevertheless we believe that those using the programs will quickly be able to establish threshold values for trace_diff appropriate to the sequencing method of their choice. The choice of threshold value also depends on the type of project being undertaken: for some work an error rate similar to that obtained for our test data would be acceptable and no visual checking within gap4 would be required, but for other projects the threshold would need to be set low enough to give a high chance of finding all possible mutations, and visual inspection using the tag search routine would be essential in order to rule out the false positives.
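The trade-off involved in choosing a threshold can be sketched in Python as follows. The scores below are invented purely for illustration and are not taken from the paper or from trace_diff output; the sketch assumes that each candidate site has a single difference score and that a site is tagged when its score is above the threshold. The function name is hypothetical.

```python
def count_errors(scored_sites, threshold):
    """Return (false_positives, false_negatives) at a given threshold.

    scored_sites is a list of (score, is_real_mutation) pairs, where
    is_real_mutation is the verdict from visual inspection.
    """
    false_pos = sum(
        1 for score, real in scored_sites if score > threshold and not real
    )
    false_neg = sum(
        1 for score, real in scored_sites if score <= threshold and real
    )
    return false_pos, false_neg


# Invented scores, for illustration only.
sites = [
    (9.1, True), (6.4, True), (4.6, True), (3.2, True),
    (4.3, False), (2.8, False), (1.5, False),
]

results = {t: count_errors(sites, t) for t in (2.0, 4.0, 6.0)}
for threshold, (false_pos, false_neg) in results.items():
    print(f"n = {threshold}: {false_pos} false positives, "
          f"{false_neg} false negatives")
```

With these invented scores a low threshold misses nothing but tags two spurious sites, whereas a high threshold tags nothing spurious but misses two real mutations. Running the same count on a visually checked set of one's own readings is one way to establish a threshold for a given sequencing method.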

This page is maintained by James Bonfield. Last generated on 2 February 1999.