The codon usage method (Staden, R. and McLachlan, A.D. (1982) Codon preference and its use in identifying protein coding regions in long DNA sequences. Nucl. Acids Res. 10, 141-156) scans along a sequence and measures how closely the codon composition of each reading frame matches an expected set of codons. The results for each reading frame are plotted in the graphics window, with frame 1 in the top panel, frame 2 in the middle panel and frame 3 in the bottom panel. Frame 1 is the frame of the first base in the active region. At each position along the sequence the program also plots a single dot for the reading frame with the highest score. These dots appear at the midpoints of the three panels and will form a continuous line if one reading frame is consistently the highest scoring.
The figure below shows a nip plot window containing the results of the codon usage method on a sequence from E. coli. Also visible are the cross hairs. Their x position is shown in sequence base numbers in the left hand box above the plot, and their y coordinate, expressed using the score values of the gene search, is shown in the right hand box. Each line in the window has its own colour and can be dragged and dropped to a new location to reorganise the plot. The cursor in the plot can be used to control the position of the cursor in the sequence display.
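As a rough illustration of how the dot for the highest scoring frame could be chosen at each position, the sketch below compares three per-frame score lists. The function name `best_frames` and the input layout are assumptions made for this example; they are not taken from nip4.

```python
def best_frames(scores_by_frame):
    """Return, for each position, the frame (1, 2 or 3) with the highest score.

    scores_by_frame is a hypothetical layout: {1: [...], 2: [...], 3: [...]},
    one score per scan position in each frame. Ties go to the lowest frame.
    """
    n = min(len(scores) for scores in scores_by_frame.values())
    return [
        max((1, 2, 3), key=lambda frame: scores_by_frame[frame][i])
        for i in range(n)
    ]


# Frame 2 wins at the first and last positions, frame 3 in between.
winners = best_frames({1: [1, 5, 2], 2: [3, 4, 9], 3: [0, 6, 1]})
```

A run of identical values in the result corresponds to the continuous line of dots described above.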
As can be seen in the dialogue below, the user can define the size of
the scan window in codons (the window length must be odd), the name of
the file containing the codon usage table, and the region of the
sequence to be analysed. The longer the window, the smoother the plots,
but the more difficult it is to find the ends of the coding
segments. The stronger the codon preference in the codon table, the
higher the discrimination between coding and non-coding regions (assuming
the sequence being analysed has the same preferences as those of the
table). Note that the amino acid composition represented in the
table will also influence the results.
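The effect of the window length can be explored with a minimal sketch of a windowed scan over the three reading frames. This is not the nip4 calculation: the per-codon `weights` (for example, log relative frequencies derived from a codon table), the function name `frame_scores` and the simple summed score are all assumptions chosen for illustration.

```python
def frame_scores(seq, weights, window_codons):
    """Sum hypothetical per-codon weights over a sliding window in each frame.

    Returns {frame: [(centre_base, score), ...]}, where centre_base is the
    1-based position of the first base of the middle codon of the window.
    """
    if window_codons % 2 == 0:
        raise ValueError("window length must be odd")
    seq = seq.lower()
    half = window_codons // 2
    results = {}
    for frame in (1, 2, 3):
        start = frame - 1
        # Complete codons only, read in this frame.
        codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
        values = [weights.get(codon, 0.0) for codon in codons]
        scores = []
        for mid in range(half, len(codons) - half):
            window = values[mid - half:mid + half + 1]
            scores.append((start + 3 * mid + 1, sum(window)))
        results[frame] = scores
    return results


scan = frame_scores("atgatgatgatgatg", {"atg": 1.0}, window_codons=3)
```

An odd window gives every score a well defined middle codon, which is why the sketch rejects even lengths. A larger `window_codons` averages over more codons, so the curve is smoother but responds more slowly at the ends of a coding segment.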
The codon tables produced by nip4 have a different format to those
produced by the old nip program. Nip4 can read nip codon tables but
nip cannot read those written by nip4. A typical table calculated by
nip4 is shown below.
At present the nip4 codon usage gene search calculation is not
identical to that of nip. Nip also allows various normalisations of
the codon usage table, but as yet they are not implemented in
nip4. These include "Normalise to no amino acid bias" and "Normalise
to average amino acid composition". It is expected that these
operations will be performed by small additional programs and not by
functions internal to nip4. However, nip4, like nip, does produce
values that are independent of the frequency of stop codons: their
value in the calculations is set to the mean of all the other codons
in the table.
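The treatment of stop codons described above can be sketched as follows. The function name `neutralise_stops` and the dictionary layout are assumptions for this example, and the three stop codons are those of the standard genetic code; how nip4 stores its table internally is not stated here.

```python
STOP_CODONS = ("taa", "tag", "tga")


def neutralise_stops(counts):
    """Return a copy of counts with each stop codon set to the mean of the rest.

    counts is a hypothetical {codon: value} mapping in lower case.
    """
    others = [value for codon, value in counts.items()
              if codon not in STOP_CODONS]
    mean = sum(others) / len(others)
    adjusted = dict(counts)
    for stop in STOP_CODONS:
        adjusted[stop] = mean
    return adjusted


adjusted = neutralise_stops({"atg": 8, "ctg": 19, "aaa": 3, "taa": 0})
```

Because every stop codon receives the same average value, the number of stop codons observed in the sequences used to build the table no longer affects the scores.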
Sequence atpase.dat_r
Range from 3171 to 4000
===============================================
F ttt   0  S tct   6  Y tat   2  C tgt   3
F ttc   3  S tcc   8  Y tac   6  C tgc   0
L tta   0  S tca   0  * taa   0  * tga   0
L ttg   1  S tcg   0  * tag   0  W tgg   0
===============================================
L ctt   1  P cct   0  H cat   0  R cgt  12
L ctc   1  P ccc   0  H cac   4  R cgc   5
L cta   1  P cca   2  Q caa   2  R cga   0
L ctg  19  P ccg   7  Q cag  12  R cgg   0
===============================================
I att   5  T act   3  N aat   2  S agt   2
I atc  22  T acc   6  N aac   7  S agc   1
I ata   0  T aca   1  K aaa   8  R aga   0
M atg   8  T acg   0  K aag   2  R agg   0
===============================================
V gtt  14  A gct  12  D gat   7  G ggt  16
V gtc   1  A gcc   4  D gac   9  G ggc  11
V gta   7  A gca   8  E gaa  14  G gga   0
V gtg   4  A gcg   5  E gag   2  G ggg   0
===============================================
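A table laid out like the one above can be read with a few lines of code. The sketch below assumes whitespace-separated (amino acid, codon, count) triples exactly as printed here; the real nip4 file may use fixed columns or carry extra fields, and the function name `parse_codon_table` is hypothetical.

```python
import re

CODON = re.compile(r"^[acgt]{3}$")


def parse_codon_table(text):
    """Return {codon: count} from rows of (amino acid, codon, count) triples.

    Header and separator lines are skipped because they do not split
    into well formed triples.
    """
    counts = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or len(tokens) % 3 != 0:
            continue
        triples = [tokens[i:i + 3] for i in range(0, len(tokens), 3)]
        if all(CODON.match(codon) and n.isdigit() for _, codon, n in triples):
            for _, codon, n in triples:
                counts[codon] = int(n)
    return counts


example = """\
Sequence atpase.dat_r
Range from 3171 to 4000
===============================================
F ttt 0 S tct 6 Y tat 2 C tgt 3
F ttc 3 S tcc 8 Y tac 6 C tgc 0
"""
table = parse_codon_table(example)
```

Splitting on whitespace means the parser does not depend on how the columns are aligned.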
This page is maintained by
James Bonfield.
Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/nip4_23.html