
Searching for point mutations using trace_diff

Trace_diff is a program for the automatic detection of point mutations. It has a command line interface and is normally run from a script such as pregap (see section Pregap4 introduction) to operate on a batch of sequences. Usually the file to be scanned for mutations is an experiment file, which contains the name of its own trace file and the name of a file containing the reference trace.

The reading's trace is aligned with that of the reference, and bases that differ and have significant trace differences are tagged as possible mutations. Trace_diff calculates the mean and standard deviation of the difference trace, and the "significance" of a trace difference is expressed in standard deviation (sd) units. The user specifies the threshold using the -n option.

When a possible mutation is found, trace_diff writes a MUTN tag to the reading's experiment file. The tag text contains an NC-IUB code (see below) to define the change, e.g. Y = C to T change and y = T to C change, and a numerical value giving the corresponding peak height in sd units.
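The significance test described above can be illustrated with a short Python sketch. This is not trace_diff's own code: the function name, the use of the population standard deviation, the use of the absolute deviation from the mean, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def significant_positions(diff_trace, num_sd=4.0):
    """Return (index, score) pairs for samples lying at least num_sd
    standard deviations from the mean of the difference trace.

    Illustrative only: trace_diff's real peak detection may differ.
    """
    mu = mean(diff_trace)
    sd = pstdev(diff_trace)  # assumption: population standard deviation
    if sd == 0:
        return []  # a perfectly flat trace has no significant peaks
    hits = []
    for index, value in enumerate(diff_trace):
        score = abs(value - mu) / sd  # peak height in sd units
        if score >= num_sd:
            hits.append((index, score))
    return hits


# Hypothetical difference trace: low-level noise with one large excursion.
diff = [0, 1, -1, 0, 2, -2, 1, 0, -1, 40,
        0, 1, -1, 0, 1, -1, 0, 2, -2, 0]

for index, score in significant_positions(diff, num_sd=4.0):
    print(f"sample {index}: {score:.2f} sd")
```

Lowering num_sd makes the test more sensitive, which mirrors the effect of passing a smaller value to -n.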

              NC-IUB SYMBOLS USED BY TRACE_DIFF
 
        A,C,G,T
        r        (A,G)        'puRine'
        y        (T,C)        'pYrimidine'
        w        (A,T)        'Weak'
        s        (C,G)        'Strong'
        m        (A,C)        'aMino'
        k        (G,T)        'Keto'

        R        (G,A)        'puRine'
        Y        (C,T)        'pYrimidine'
        W        (T,A)        'Weak'
        S        (G,C)        'Strong'
        M        (C,A)        'aMino'
        K        (T,G)        'Keto'
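As a reading aid, the table can be written as a lookup from a base change to its symbol. The Python sketch below assumes that every pair in the table reads "from, to", which generalises the two examples given in the text (Y = C to T, y = T to C); the names MUTATION_CODES and mutation_code are hypothetical and are not part of trace_diff.

```python
# (original base, new base) -> symbol, copied from the table above.
# Assumption: each pair is ordered "from, to", as in Y = C to T.
MUTATION_CODES = {
    ("A", "G"): "r", ("T", "C"): "y", ("A", "T"): "w",
    ("C", "G"): "s", ("A", "C"): "m", ("G", "T"): "k",
    ("G", "A"): "R", ("C", "T"): "Y", ("T", "A"): "W",
    ("G", "C"): "S", ("C", "A"): "M", ("T", "G"): "K",
}


def mutation_code(original, new):
    """Return the symbol for a base change, or None if the pair is not
    one of the twelve changes listed in the table."""
    return MUTATION_CODES.get((original.upper(), new.upper()))


print(mutation_code("C", "T"))
print(mutation_code("T", "C"))
```

The twelve entries cover every ordered pair of distinct bases, so upper and lower case together distinguish the direction of the change.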

The options are described below.

trace_diff [-v scf_version] [-p scf_precision] [-n num_sd] [-b band_width] [-s position] [-e position] [-o file] [-S] [-c] [-a] mutant_file [wild_type_file]

-v scf_version
Specifies the version of the SCF created when using -o. Valid values are 2 and 3. Defaults to 3.
-p scf_precision
Specifies the precision (in bits) of the trace samples stored in the SCF file. Valid values are 8 and 16. Defaults to 16.
-n num_sd
Specifies the threshold, in standard deviation units, at which peaks in the difference trace are to be considered as potential mutations. This is the value most likely to be changed by the user. The default is 4.
-b band_width
Specifies the width of the band along the diagonal of the sequence alignment matrix that is checked when aligning the sequences. Roughly speaking, this is equivalent to the expected difference in the number of pads needed in each sequence, including end gaps. To force a full alignment, specify band_width as the sequence length or greater. The default is 30.
-s position
-e position
Specifies the start and end positions within the mutant sequence in which to check for mutations. Note that looking for mutations in really poor quality data may have detrimental effects on the detection in good data. The default range is from 50 to 300.
-o file
Specifies the name of an SCF file in which to save the difference trace. This option is optional and has no default.
-S
Silent mode: do not output to stdout information on mutations found. If an experiment file has been used, the mutations will still be written to the experiment file as tags.
-c
Specifies that the range checked (-s to -e) should be clipped, if necessary, by the QL and QR line types in the experiment file. Hence the start position is the maximum of the QL and -s values, whilst the end position is the minimum of the QR and -e values.
-a
Specifies that all mutations found by analysing the difference trace will be output and/or tagged regardless of whether the base calls in the mutant and wild type sequences differ.
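The clipping rule described for -c amounts to a maximum and a minimum. The Python sketch below is illustrative only: the function name is hypothetical, the QL and QR values are invented, and how trace_diff behaves when a QL or QR line is absent is not stated here, so the sketch simply leaves that end of the range unchanged.

```python
def clipped_range(start, end, ql=None, qr=None):
    """Combine the -s/-e range with the QL/QR clip points.

    start, end: the values given to -s and -e.
    ql, qr: the QL and QR values from the experiment file, or None
            (assumption: a missing clip point leaves that end as is).
    """
    if ql is not None:
        start = max(start, ql)  # start is the maximum of QL and -s
    if qr is not None:
        end = min(end, qr)      # end is the minimum of QR and -e
    return start, end


# Hypothetical clip points QL=62 and QR=431 with -s 45 -e 500.
print(clipped_range(45, 500, ql=62, qr=431))  # prints (62, 431)
```

If the quality clip points lie outside the -s to -e range, the requested range is used unchanged, for example clipped_range(45, 500, ql=20, qr=650) gives (45, 500).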

Examples

1. Get the usage.

arran [53]% trace_diff
trace_diff v1.06
Copyright (c) MRC Laboratory of Molecular Biology, 1997. All rights reserved.
Usage: trace_diff [options] file [wildtype-file]
Where options are:
    [-b band width (def. 30)] [-p precision (8|16)]
    [-v version (2|3)]        [-n num_sd (def. 4.000000)]
    [-s start (def. 50)]      [-e end (def. 300)]
    [-o output file]          [-S] [-c] [-a]

2. Run trace_diff on the file hs0091 using the quality clip points
(the QL and QR values) or 45 and 500, whichever gives the narrower
range, and a threshold of 2.9.

arran [54]% trace_diff  -n2.9 -s45 -e 500 -c hs0091
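Since trace_diff is normally run over a batch of sequences, the same command line can be assembled for several files from a script. The Python sketch below only builds and prints the argument lists; it does not run trace_diff. The helper name and the second file name (hs0092) are hypothetical, and writing each option and its value as separate arguments is an assumption based on the "-e 500" form used in the example above.

```python
import shlex


def trace_diff_command(mutant_file, wild_type_file=None, num_sd=None,
                       start=None, end=None, clip=False, silent=False):
    """Build an argument list following the synopsis given earlier.

    Only options documented on this page are used; any option left
    unset is omitted so that trace_diff's own default applies.
    """
    args = ["trace_diff"]
    if num_sd is not None:
        args += ["-n", str(num_sd)]
    if start is not None:
        args += ["-s", str(start)]
    if end is not None:
        args += ["-e", str(end)]
    if silent:
        args.append("-S")
    if clip:
        args.append("-c")
    args.append(mutant_file)
    if wild_type_file is not None:
        args.append(wild_type_file)
    return args


# hs0091 comes from the example above; hs0092 is a made-up second file.
for name in ["hs0091", "hs0092"]:
    command = trace_diff_command(name, num_sd=2.9, start=45, end=500, clip=True)
    print(shlex.join(command))
```

Each list could then be passed to subprocess.run on a system where trace_diff is installed; in practice pregap already provides this kind of batch handling.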


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/mutations_2.html