BackTranslate back-translates an amino acid sequence into a nucleotide sequence. The output helps you identify regions with fewer ambiguities that might be candidates for synthetic probes.

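BackTranslate back-translates a protein sequence into either the most probable or the most ambiguous nucleic acid sequence. The most probable sequence is built from a codon frequency table; the most ambiguous sequence is built from a translation table. The output file can be used as input to other Wisconsin Package(TM) programs.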
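The sketch below illustrates a most probable back-translation, assuming it simply takes the highest-frequency codon for each residue (which also maximizes the product of the codon frequencies). The frequencies are copied from the ecohigh.cod excerpt in the sample output; the function name `most_probable` is hypothetical and is not part of the Wisconsin Package.

```python
# Codon frequencies for the residues in the example, copied from the
# ecohigh.cod excerpt shown in the sample output (RNA alphabet, as in the table).
CODON_FREQ = {
    "Ser": {"UCC": 0.37, "UCU": 0.34, "AGC": 0.20, "UCG": 0.04, "AGU": 0.03, "UCA": 0.02},
    "Phe": {"UUC": 0.76, "UUU": 0.24},
    "Gln": {"CAG": 0.86, "CAA": 0.14},
    "Pro": {"CCG": 0.77, "CCA": 0.15, "CCU": 0.08, "CCC": 0.00},
    "Trp": {"UGG": 1.00},
}


def most_probable(residues, codon_freq):
    """Hypothetical helper: join the highest-frequency codon for each residue."""
    return "".join(max(codon_freq[aa], key=codon_freq[aa].get) for aa in residues)


protein = ["Ser", "Phe", "Ser", "Gln", "Pro", "Trp"]
seq = most_probable(protein, CODON_FREQ)
print(seq, len(seq))  # UCCUUCUCCCAGCCGUGG 18
```

Choosing each codon independently is enough here because the product of frequencies is largest when every factor is largest; ties are broken by table order in this sketch.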

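If you choose one of the menu options that includes a table of back-translations (a or b at the prompt, -MENu=A or -MENu=B on the command line), the program also writes the codons for each amino acid in order of their preference in the codon frequency table. Below each codon list is a number between 0 and 1,000: the product of the frequencies of the preferred codons for the four amino acids beginning at that position, multiplied by 1,000. The higher the number, the more likely it is that the 12 nucleotides (four codons) beginning at that position consist of preferred codons. In the sample output, for example, the 89 below the first serine comes from 0.37 x 0.76 x 0.37 x 0.86 x 1,000 (about 89.48) for Ser-Phe-Ser-Gln, and the last three positions, which have fewer than four amino acids remaining, show 0.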
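The window score described above can be reproduced with a short sketch. It assumes the score is truncated to an integer and that positions with fewer than `window` residues remaining score 0, which matches the sample output but is not stated as a rule; the name `window_scores` is hypothetical.

```python
import math

# Frequency of the most preferred codon for each residue, taken from the
# ecohigh.cod excerpt shown in the sample output.
TOP_FREQ = {"Ser": 0.37, "Phe": 0.76, "Gln": 0.86, "Pro": 0.77, "Trp": 1.00}


def window_scores(residues, top_freq, window=4):
    """Hypothetical helper: for each position, multiply the preferred-codon
    frequencies of the `window` residues starting there, scale by 1000 and
    truncate. Positions with fewer than `window` residues left score 0."""
    scores = []
    for i in range(len(residues)):
        chunk = residues[i:i + window]
        if len(chunk) < window:
            scores.append(0)
            continue
        scores.append(int(math.prod(top_freq[aa] for aa in chunk) * 1000))
    return scores


print(window_scores(["Ser", "Phe", "Ser", "Gln", "Pro", "Trp"], TOP_FREQ))
# [89, 186, 245, 0, 0, 0]
```

The `window` argument plays the role of -WINdow; the default of 4 corresponds to the 12-nucleotide stretch discussed above.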

To make a back-translation of the ilvI protein showing all possible back-translations from amino acids one to six, using codon frequencies from the file ecohigh.cod, you would do the following:

% backtranslate

  BACKTRANSLATE what sequence ?  ilvhiaa.pep

                 Begin (* 1 *) ?
                 End (* 956 *) ?  6

  Would you like to see:

      a) table of back-translations and most probable sequence
      b) table of back-translations and most ambiguous sequence
      c) most probable sequence only
      d) most ambiguous sequence only

  Please choose one (* b *):

  Use what codon frequency file (* GenRunData:ecohigh.cod *) ?

  What should I call the output file (* ilvhiaa.seq *) ?

Here is part of the output file:

 BACKTRANSLATE of: : ilvhiaa.pep  check: 2165  from: 1  to: 6

E Coli. ilvI - ilvH (peptide)

 Using codon frequencies from: /package/share/9.0/gcgcore/data/rundata/ecohigh.cod
 CheckFile: 9032

Codon usage for enteric bacterial (highly expressed) genes 7/19/83

    Ser        Phe        Ser        Gln        Pro        Trp

  UCC 0.37   UUC 0.76   UCC 0.37   CAG 0.86   CCG 0.77   UGG 1.00
  UCU 0.34   UUU 0.24   UCU 0.34   CAA 0.14   CCA 0.15
  AGC 0.20              AGC 0.20              CCU 0.08
  UCG 0.04              UCG 0.04              CCC 0.00
  AGU 0.03              AGU 0.03
  UCA 0.02              UCA 0.02
  89         186        245        0          0          0

ilvhiaa.seq  Length: 18  September 30, 1998 17:08  Type: N  Check: 2929  ..

BackTranslate accepts a single protein sequence and a single codon frequency table as input. See the documentation for the CodonFrequency program for information about how to create or modify a codon frequency file. If BackTranslate rejects your protein sequence, see Appendix VI for information on how to change or set the type of a sequence.

Prime selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.

CodonFrequency tabulates codon usage from sequences or existing codon frequency tables. Composition counts trinucleotides from any set of sequences. The mapping programs can be run with -ALL to identify all potential restriction sites in back-translated sequences. If you run the mapping programs with -SILent, they identify potential restriction sites that can be created without changing the translation of the nucleic acid sequence.

BackTranslate does not check that your codon frequency table and your translation table agree. The most ambiguous back-translated sequence comes from the translation table, while the most probable back-translated sequence and the table of codon choices come from the codon frequency table. If the two files assume different genetic codes, these outputs can be inconsistent with each other.
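Because the program skips this check, you may want to compare the two tables yourself. The sketch below is a hypothetical helper, not a Wisconsin Package feature; it uses a small excerpt of the standard genetic code and a made-up codon frequency table in which AGA is filed under serine, as it would be in a table built for the invertebrate mitochondrial code.

```python
# Excerpt of the standard genetic code (RNA alphabet). A real check would
# need all 64 codons of whichever code your translation table uses.
STANDARD_CODE = {
    "UCC": "Ser", "UCU": "Ser", "AGC": "Ser", "UCG": "Ser", "AGU": "Ser", "UCA": "Ser",
    "UUC": "Phe", "UUU": "Phe", "CAG": "Gln", "CAA": "Gln", "UGG": "Trp",
    "AGA": "Arg",
}


def find_disagreements(codon_freq, code):
    """Hypothetical helper: list (codon, table_aa, code_aa) for every codon the
    frequency table assigns to a different amino acid than the genetic code."""
    problems = []
    for aa, codons in codon_freq.items():
        for codon in codons:
            translated = code.get(codon)  # None if the codon is missing
            if translated != aa:
                problems.append((codon, aa, translated))
    return problems


# Made-up frequencies, for illustration only.
toy_table = {"Ser": {"UCC": 0.6, "AGA": 0.4}, "Phe": {"UUC": 1.0}}
print(find_disagreements(toy_table, STANDARD_CODE))  # [('AGA', 'Ser', 'Arg')]
```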

You should realize that the most ambiguous back-translation represents each codon with three IUB codes (see Appendix III), one for each position. A string of per-position codes can represent a set of codons exactly only when the set contains every combination of the bases allowed at each position. That is not the case for the stop codons or for the amino acids with six synonymous codons. For instance, serine should back-translate into the codons TCT, TCC, TCA, TCG, AGT, or AGC. This set can be represented precisely only by the two codons TCN and AGY taken together. The codon shown by BackTranslate for serine is WSX, where X stands for any base, so it expands to 2 x 2 x 4 = 16 codons, only six of which encode serine.
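The limitation can be checked directly. This sketch collapses a set of codons into one IUB code per position and counts how many codons the resulting string expands to. It writes N for "any base" where BackTranslate displays X, and the name `ambiguous_codon` is hypothetical.

```python
from itertools import product

# IUB nucleotide codes keyed by the set of bases each one represents.
IUB = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("CG"): "S", frozenset("AT"): "W", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def ambiguous_codon(codons):
    """Hypothetical helper: return (IUB string, codons it expands to,
    how many of those are in the original set)."""
    positions = [frozenset(c[i] for c in codons) for i in range(3)]
    code = "".join(IUB[p] for p in positions)
    expanded = {"".join(t) for t in product(*positions)}
    return code, len(expanded), len(expanded & set(codons))


serine = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]
print(ambiguous_codon(serine))  # ('WSN', 16, 6)
```

For comparison, leucine (TTA, TTG, CTT, CTC, CTA, CTG) collapses to YTN, which expands to 8 codons of which 6 are correct, and the stop codons TAA, TAG, and TGA collapse to TRR, which also includes TGG (Trp).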

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose the parts of a parameter that you may omit. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

Minimal Syntax: % backtranslate [-INfile1=]ilvhiaa.pep -Default

Prompted Parameters:

-BEGin=1 -END=6         sets the range of interest
-MENu=A                 menu for the type of output, where:
                          A=table of all back-translations and most
                            probable sequence
                          B=table of all back-translations and most
                            ambiguous sequence
                          C=most probable sequence only
                          D=most ambiguous sequence only

[-INfile2=]ecohigh.cod    specifies the codon frequency table
[-OUTfile=]ilvhiaa.seq    names the output file

Local Data Files:

-TRANSlate=translate.txt  defines most ambiguous representation for
                            each codon family

Optional Parameters:

-WINdow=4                 shows probability of the preferred codons for
                            next 4 amino acids occurring together by chance

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

BackTranslate uses translate.txt to create the most ambiguous back-translation of your protein sequence. If the standard translation table does not apply to your sequence, you can provide an alternate file named translate.txt in your current working directory or use -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.
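The lookup rules above can be sketched as a small resolver. The text does not say which wins when both a local file and a command-line name are present, so this sketch assumes the explicit command-line name takes precedence; `resolve_data_file` and its arguments are hypothetical.

```python
import os


def resolve_data_file(name, public_dir, override=None):
    """Hypothetical helper. Assumed precedence: an explicit command-line file
    (e.g. -TRANSlate=mycode.txt), then a same-named file in the current
    working directory, then the copy in the public data directory."""
    if override is not None:
        return override
    if os.path.exists(name):
        return name
    return os.path.join(public_dir, name)


# Placeholder public directory, for illustration only.
print(resolve_data_file("translate.txt", "/path/to/public/data", override="mycode.txt"))
# mycode.txt
```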

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-BEGin=1

sets the beginning position of the range to back-translate in the input sequence.

-END=6

sets the ending position of the range to back-translate in the input sequence.

-MENu=A

indicates the type of output to create. -MENu=A produces a table of possible codons plus the most probable nucleic acid sequence; -MENu=B produces the table plus the most ambiguous sequence; -MENu=C produces the most probable sequence only; and -MENu=D produces the most ambiguous sequence only.

[-INfile2=]ecohigh.cod

selects the codon frequency table used to construct the most probable sequence and the table of codon choices. Alternative codon frequency tables let you bias the results toward the codon usage of E. coli, human, Drosophila, maize, yeast, and other organisms or classes of organisms.

-TRANSlate=translate.txt

BackTranslate normally builds the most ambiguous back-translation from the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-WINdow=4

Below each codon list in the table, BackTranslate normally displays the probability that the preferred codons for the four amino acids beginning at that position occur together, based on your codon frequency table. Use this parameter to set the number of amino acids in this window to a number other than four.

Printed: December 9, 1998 16:29 (1162)

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group