The figures shown are taken from James K Bonfield, Cristina Rada and Rodger Staden, "Automated detection of point mutations using fluorescent sequence trace subtraction", Nucleic Acids Res. 26, 3404-3409, 1998, and are copyright Oxford University Press. The text below outlines the paper as an introduction to the use of the methods. At present the methods are only suitable for automatic detection of point mutations, although the visual trace checking available in the gap4 contig editor can be used to examine possible heterozygote readings.
Mutations are detected and identified by sequencing using a fluorescence-based sequencing instrument and comparing the resulting traces and base-calls with those of a reference sequence. The task of the comparison method is to distinguish base differences that are real mutations from those that are due to base-calling errors.
A program called trace_diff can automatically detect the mutations, and special types of trace display in our sequence assembly program gap4 greatly simplify the task of visually checking the results and data. trace_diff aligns the new traces with the reference trace and then analyses their differences. Bases with trace differences above a user-defined threshold are tagged as mutations. The traces of the new sequence and the reference sequence can be viewed, together with their differences, from within the gap4 editor.
The basic idea is illustrated in Figure 1, which shows the traces as
displayed by the gap4 contig editor. The top trace is from a reference
sequence (actually it is a "consensus trace" calculated by combining
the data from a set of reference sequences), the middle one is from an
individual reading, and the bottom one the difference between the
other two. In general the difference trace contains very few features;
however, three base changes are shown with their associated peaks in
the difference plot. The left-hand mutation (C-T) at position 179
causes a pair of strong peaks in opposite directions with a small
context effect. The next mutation (T-G) at 184 has strong opposite
peaks and a strong context effect peak. The third mutation has quite
strong opposite peaks and a single context effect peak. The crucial
point is that context effects have peaks in only one direction but
mutations have strong peaks in both directions, and this is what
trace_diff searches for (and labels ready for viewing in gap4).
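
The two-direction rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the trace_diff implementation: it assumes the two traces are already aligned sample by sample, it uses a plain absolute threshold, and the names `difference_trace`, `candidate_mutations` and the toy signal values are hypothetical.

```python
CHANNELS = "ACGT"


def difference_trace(reference, sample):
    """Subtract the reference from the sample, channel by channel.

    Both arguments map each of A, C, G, T to a list of signal values.
    The traces are assumed to be aligned already (an assumption of this sketch).
    """
    return {
        ch: [s - r for s, r in zip(sample[ch], reference[ch])]
        for ch in CHANNELS
    }


def candidate_mutations(reference, sample, threshold):
    """Return (index, lost_base, gained_base) where peaks exceed the threshold
    in both directions at the same sample point."""
    diff = difference_trace(reference, sample)
    hits = []
    for i in range(len(diff["A"])):
        gained = max(CHANNELS, key=lambda ch: diff[ch][i])  # largest positive difference
        lost = min(CHANNELS, key=lambda ch: diff[ch][i])    # largest negative difference
        if diff[gained][i] > threshold and -diff[lost][i] > threshold:
            hits.append((i, lost, gained))
    return hits


# Toy data (made up): a C peak replaced by a T peak at index 2, plus a
# one-directional drop in G at index 4 that mimics a context effect.
reference = {"A": [0, 0, 0, 0, 0], "C": [0, 50, 100, 50, 0],
             "G": [0, 0, 0, 0, 100], "T": [0, 0, 0, 0, 0]}
sample = {"A": [0, 0, 0, 0, 0], "C": [0, 0, 5, 0, 0],
          "G": [0, 0, 0, 0, 60], "T": [0, 45, 90, 45, 0]}

print(candidate_mutations(reference, sample, 60))
```

With a threshold of 60 only index 2 is reported, as a C to T change; the drop in G at index 4 has no opposing peak and is ignored. A practical version would also need to merge neighbouring sample points that belong to the same peak.
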
Any number of sequences can be processed in a single run and for each
individual prospective mutant sequence the operation is generally
performed in two steps. First, trace_diff is used to align and compare
the mutant and reference sequences and traces and to locate possible
mutations. Second, the sequence is assembled into a gap4 database
from where users can visually check the differences between the
reference and mutant traces.
trace_diff is being used to study somatic hypermutation in
immunoglobulin genes. In Table 1, column A, we show the results of
applying trace_diff to 3 sets of readings (214 in total) determined as
part of the somatic hypermutation study. Sequencing was performed
using fluorescent dye terminators in an ABI377 sequencer. The results
from trace_diff were compared with those obtained from scanning the
complete traces by eye after the readings had been assembled into a
gap4 database. The test data consisted of 108497 bases called using
the standard ABI software. After the readings had been aligned with
their consensus sequences they contained 1232 differences, of which
392 were bases called as unknown (N), and a further 166 were padding
characters introduced during alignment. Visual inspection showed that
there were 353 real mutations, and with the threshold n = 4.0
trace_diff missed 36 of them and found 28 false positives. The false
positives tended to be at the two ends of the readings where the data
were less reliable and the false negatives were almost entirely due to
the weak G after A problem that is found in the chemistry used.
We have recently tried the new ABI BigDye terminators and found a
marked improvement in the sequences obtained: their lengths were
increased and the weak G problem was almost non-existent. The results
from one batch of data are shown in Table 1, columns B and C. Column B
contains results from sequences that were loosely clipped for quality,
giving an average analysis length per reading of 673 bases; and column
C has the results obtained when the readings were clipped more severely, to
leave only high quality data with an average analysed length of 560
bases. As can be seen there are far fewer base calling errors or
uncertainties for both ranges. For the extended range set trace_diff
missed 5 mutations and found 15 false positives, and for the narrower
range it missed no mutations and gave no false positives.
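
The counts quoted above can be turned into rates with a short calculation. The sketch below uses only the figures from Table 1; the function name `detection_rates` and the choice of sensitivity and precision as summary measures are ours, not part of the paper.

```python
# Figures copied from Table 1: (real mutations, false positives, false negatives)
TABLE_1 = {
    "A": (353, 28, 36),
    "B": (165, 15, 5),
    "C": (132, 0, 0),
}


def detection_rates(real, false_pos, false_neg):
    """Return (sensitivity, precision) for one column of Table 1."""
    found = real - false_neg                  # real mutations that were tagged
    sensitivity = found / real                # fraction of real mutations found
    precision = found / (found + false_pos)   # fraction of tags that were real
    return sensitivity, precision


for column, counts in TABLE_1.items():
    sens, prec = detection_rates(*counts)
    print(f"{column}: sensitivity {sens:.1%}, precision {prec:.1%}")
```

For column A this gives a sensitivity of about 89.8% (317 of 353 mutations found) and a precision of about 91.9% (317 of 345 tags correct); for column B the figures are about 97.0% and 91.4%, and for column C both are 100%.
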
The gap4 program is mainly used for large-scale sequencing projects
but in Figure 2 we see its contig editor showing modes of operation
useful for mutation detection. Along the top is a series of menus and
buttons, one of which, "Next problem", will automatically move the
editing cursor to the position of the next potential mutation that has
been tagged by trace_diff. The traces and their differences can be
scrolled in register with the editing cursor and so the traces for the
tagged bases can quickly be inspected. Individual readings are
numbered, named and written left to right. Accuracy estimates or
confidence values for each base are shown using grey scales: the
darker the background the poorer the data. Mutation tags are shown in
dark green (for example there are 3 visible on the top
sequence). Changes to the original base calls or accuracy estimates
are also colour coded by changes to the background colour: deletions
are shown in red, base changes in pink, padding characters in light
green and modified confidence values in blue.
The gap4 template display can provide an overview of all the mutations
in a set of readings. Figure 3 shows readings as red arrows and tags
as small coloured rectangles. In this example tags automatically
generated by trace_diff are shown in green, false negatives have been
manually edited to red and false positives to yellow. Tags are shown
both on the individual readings and on the scale at the bottom. This
display can also be used to immediately identify polymorphic residues
in population studies.
We have demonstrated the reliability of the automatic mutation
detection for dye terminators and, for a more limited dataset, the new
BigDye terminators. Given the wide choice of instruments and protocols
in use it is not possible for us to cover them all. Nevertheless we
believe that those using the programs will quickly be able to
establish suitable threshold values for trace_diff appropriate to the
sequencing method of their choice. Obviously the choice of threshold
value also depends on the type of project being undertaken: for some
work an error rate similar to that obtained for our test data would be
acceptable and no visual checking within gap4 would be required, but
for other projects the threshold would need to be set low enough to
give a high chance of finding all possible mutations, and visual
inspection using the tag search routine would be essential in order to rule out
the false positives.
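
One way to establish a threshold for a new sequencing method is to score a small, visually checked set of sites and count the errors at several candidate thresholds. The sketch below illustrates that trade-off only: the scores are made up, the function `sweep_thresholds` is hypothetical, and it assumes each site has a single difference score that is tagged when it lies above the threshold.

```python
def sweep_thresholds(scored_sites, thresholds):
    """Count false positives and false negatives at each threshold.

    scored_sites is a list of (score, is_real_mutation) pairs taken from a
    visually checked calibration set.
    """
    rows = []
    for t in thresholds:
        false_pos = sum(1 for score, real in scored_sites if score > t and not real)
        false_neg = sum(1 for score, real in scored_sites if score <= t and real)
        rows.append((t, false_pos, false_neg))
    return rows


# Made-up scores for illustration only; these are not data from the paper.
checked = [(9.1, True), (6.4, True), (4.6, True), (3.2, True),
           (5.0, False), (2.8, False), (1.5, False)]

for t, fp, fn in sweep_thresholds(checked, [3.0, 4.0, 5.5]):
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

On this toy set the lowest threshold misses nothing but tags one spurious site, while the highest removes the spurious tag at the cost of two missed mutations, which mirrors the choice between unattended use and low-threshold runs followed by visual checking.
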

Table 1

                                 A        B        C
Number of readings             214       65       65
Number of bases             108497    43741    36400
Average analysed length        503      673      560
Base differences              1232      274      162
Real mutations                 353      165      132
trace_diff false positives      28       15        0
trace_diff false negatives      36        5        0

This page is maintained by
James Bonfield.
Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/mutations_1.html