Sequence Typing

As you work with the Wisconsin Package, you will find that some programs accept only nucleotide sequences while others accept only proteins. Many programs allow both nucleotide and protein sequences as input but perform their analysis differently depending on the input sequence type.

You can determine the type of a sequence by looking at the sequence file. Sequences in GCG format contain a dividing line between an optional text heading and the sequence data. Consider the following example of a typical dividing line:

Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474 ..
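If you want to inspect a dividing line from a script rather than by eye, a minimal Python sketch might look like the following. It assumes that the fields appear as Label: value pairs and that the dividing line ends with two periods, as in the example above. The function name parse_dividing_line is illustrative and is not part of the Wisconsin Package.

```python
import re

# Assumption: fields look like "Label: value", as in the example line.
FIELD = re.compile(r"(Length|Type|Checksum):\s*(\S+)")


def parse_dividing_line(line):
    """Return a dict of the labelled fields found on a dividing line."""
    # Assumption: a dividing line is recognized by its trailing "..".
    if not line.rstrip().endswith(".."):
        raise ValueError("not a dividing line: missing trailing '..'")
    return dict(FIELD.findall(line))


example = "Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474 .."
fields = parse_dividing_line(example)

# fields.get("Type") is None when the line carries no Type: field.
print(fields.get("Type"))
```

For the example line this prints N. A line without a Type: field yields a dictionary with no "Type" key, so the lookup returns None.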

The sequence type should appear on the dividing line as either Type: N for nucleotide or Type: P for protein. (See "Types of Sequence Files" in Chapter 2, Using Sequence Files and Databases of the User's Guide for a complete description of sequence file formats.) Sequences created before version 7.0 of the Wisconsin Package (April 1991) do not have this Type: field on the dividing line. If the dividing line doesn't contain a Type: field, the Wisconsin Package infers the sequence type from the characters in the sequence. This inference is not always correct, because many one-letter symbols (A, C, G, and T, for example) are used in both nucleotide and protein sequences.
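To see why inferring the type from the characters alone can go wrong, consider the rough Python sketch below. It is not the rule the Wisconsin Package uses, which is not described here. The function guess_type, the letter set, and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
def guess_type(sequence, threshold=0.85):
    """Guess 'N' or 'P' from the letters in a sequence.

    Hypothetical heuristic: call the sequence nucleotide when at least
    `threshold` of its letters are common nucleotide symbols.
    """
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    # Count letters that are common nucleotide symbols.
    core = sum(ch in "ACGTUN" for ch in letters)
    return "N" if core / len(letters) >= threshold else "P"


print(guess_type("GATTACAGATTACA"))  # all letters are nucleotide symbols
print(guess_type("MKVLAAGIVGLLLA"))  # mostly letters outside that set
print(guess_type("GATTACA"))         # also a valid peptide (G, A, T, C)
```

The third call returns N even if GATTACA was meant as a short peptide, because glycine, alanine, threonine, and cysteine share their one-letter codes with nucleotides. An explicit Type: field on the dividing line avoids this ambiguity.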

In earlier versions of the Wisconsin Package, you could ensure that programs used the correct sequence type by specifying it on the command line when you ran a program. Starting with version 8.0, however, the sequence type is an inherent part of the sequence and cannot be changed from the command line.

If the Type: field of any sequence is incorrect or missing, you can correct it with the Reformat program. Type

% reformat /NUCleotide filename

or

% reformat /PROtein filename

For more information, see the Reformat documentation in the Program Manual. ("Specifying Sequence Type" in Chapter 2, Using Sequence Files and Databases of the User's Guide also details how to change the sequence type.)
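If you need to correct several files, you might generate the Reformat command lines with a small script and review them before running anything. The Python sketch below only builds and prints the commands, using the two qualifiers shown above. The function name reformat_command and the file names are hypothetical.

```python
# Qualifiers exactly as they appear in the commands above.
QUALIFIERS = {"N": "/NUCleotide", "P": "/PROtein"}


def reformat_command(filename, seq_type):
    """Return the Reformat command line that sets the given sequence type."""
    try:
        qualifier = QUALIFIERS[seq_type]
    except KeyError:
        raise ValueError("seq_type must be 'N' or 'P'") from None
    return f"reformat {qualifier} {filename}"


# Hypothetical file names paired with the type each file should have.
to_fix = [("gamma.seq", "N"), ("myprotein.pep", "P")]

for name, seq_type in to_fix:
    # Print for review only; nothing is executed.
    print("% " + reformat_command(name, seq_type))
```

Printing the commands instead of executing them lets you confirm the intended type for each file first.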

Printed: December 9, 1998 16:22 (1162)


Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.
