SIMPLIFY

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
INPUT FILES
RELATED PROGRAMS
SIMPLIFICATION FILE
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

Simplify reduces the number of distinct symbols in a sequence by replacing each group of equivalent symbols with a single representative symbol. Such a simplification lets you, for instance, treat all hydrophobic amino acids as equivalent.

DESCRIPTION

Scientists searching for the basic design features of protein sequences believe that some amino acids may be functionally similar enough to substitute for one another without radically changing the function of the protein. It can therefore be useful to treat such amino acids as equivalent in peptide sequence comparisons. The simplifications below are from Dr. Miguel A. Jimenez-Montano, who worked with Dr. Hugo Martinez at the University of California, San Francisco, and is now at the Universidad de las Americas-Puebla (Mexico). You can define your own simplification by modifying the local data file simplify.txt. The default simplifications in the public data file are:


           A  =  P,A,G,S,T    (neutral, weakly hydrophobic)
           D  =  Q,N,E,D,B,Z  (hydrophilic, acid, amide)
           H  =  H,K,R        (hydrophilic, basic)
           I  =  L,I,V,M      (hydrophobic)
           F  =  F,Y,W        (hydrophobic, aromatic)
           C  =  C            (cross-link forming)
           All other characters are unchanged.
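As an illustration only, the default equivalences above can be expressed as a Python lookup table and applied with `str.translate`. The names `DEFAULT_GROUPS`, `build_table`, and `simplify` are hypothetical, and the sketch assumes uppercase one-letter codes (lowercase letters would pass through unchanged); it is not how the Simplify program itself is implemented.

```python
# Hypothetical sketch: the default equivalences as a lookup table.
# Each key is the representative symbol; its value lists the symbols it replaces.
DEFAULT_GROUPS = {
    "A": "PAGST",   # neutral, weakly hydrophobic
    "D": "QNEDBZ",  # hydrophilic
    "H": "HKR",     # hydrophilic, basic
    "I": "LIVM",    # hydrophobic
    "F": "FYW",     # hydrophobic, aromatic
    "C": "C",       # cross-link forming
}


def build_table(groups: dict[str, str]) -> dict[int, str]:
    """Map each member symbol (as a code point) to its group's representative."""
    return {ord(symbol): rep for rep, members in groups.items() for symbol in members}


def simplify(seq: str, groups: dict[str, str] = DEFAULT_GROUPS) -> str:
    """Replace grouped symbols; characters in no group pass through unchanged."""
    return seq.translate(build_table(groups))


print(simplify("MKTAYIAKQR"))  # IHAAFIAHDH
```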

The simplify.txt file in the public data directory is appropriate only for peptide sequences. To simplify nucleic acid sequences, you must create your own simplify.txt file that defines nucleotide equivalences.
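A minimal sketch, assuming you want purine/pyrimidine equivalences, of composing your own nucleotide simplify.txt in the same layout as the default file (a header, a line ending in `..`, then one row per group). The grouping R = A,G and Y = C,T,U and the function name `format_simplification` are illustrative assumptions; compare the printed text with the default file before relying on it.

```python
# Hypothetical sketch: compose a nucleotide simplification file laid out like
# the default simplify.txt shown in this manual.
NUCLEOTIDE_GROUPS = {
    "R": "AG",   # purines (assumed grouping)
    "Y": "CTU",  # pyrimidines (assumed grouping)
}


def format_simplification(groups: dict[str, str], description: str) -> str:
    """Header text, a line ending in '..', then one 'REP MEMBERS' row per group."""
    lines = ["!!SIMPLIFY 1.0", description, "", "..", ""]
    lines += [f"{rep} {members}" for rep, members in groups.items()]
    return "\n".join(lines) + "\n"


text = format_simplification(
    NUCLEOTIDE_GROUPS, "Purine/pyrimidine simplification for nucleotide sequences."
)
print(text)  # save this text as simplify.txt in your working directory
```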

EXAMPLE

Here is a session using Simplify to simplify residues 18 through 243 of gzeinaa.pep:


% simplify

  SIMPLIFY what sequence(s) ?  gzeinaa.pep

               Begin (* 1 *) ?  18
             End (*   285 *) ?  243

  What should I call the output file (* gzeinaa.sim *) ?

%
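The Begin and End prompts select a range of interest. A hypothetical Python sketch, assuming 1-based inclusive positions as the defaults (* 1 *) and (* 285 *) suggest, shows how many residues the session above selects. The function `select_range` is illustrative, and whether the output file holds only that range is likewise an assumption here.

```python
def select_range(seq: str, begin: int, end: int) -> str:
    """Return residues begin..end, counting from 1 and including both ends."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"range {begin}..{end} is outside 1..{len(seq)}")
    return seq[begin - 1:end]  # Python slices are 0-based and end-exclusive


sequence = "X" * 285  # placeholder for the 285 residues of gzeinaa.pep
region = select_range(sequence, 18, 243)
print(len(region))  # 226
```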

INPUT FILES

Simplify accepts a single sequence or multiple sequences as input. You can specify multiple sequences in several ways: with a list file, for example @project.list; with an MSF or RSF file, for example project.msf{*}; or with a sequence specification containing an asterisk (*) wildcard, for example GenEMBL:*. How Simplify behaves depends on whether your input sequences are protein or nucleotide. Programs determine the type of a sequence from the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequences are not the correct type, see Appendix VI for information on how to change or set the type of a sequence.
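A minimal sketch of the type check described above, assuming the last heading line carries the type as `Type: N` or `Type: P`. The function `sequence_type` and the sample heading lines (including the `Check:` value) are hypothetical.

```python
import re
from typing import List, Optional

# Matches "Type: N" or "Type: P" anywhere in a heading line.
TYPE_PATTERN = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(heading_lines: List[str]) -> Optional[str]:
    """Return 'P' (protein), 'N' (nucleotide), or None if no type is found."""
    if not heading_lines:
        return None
    match = TYPE_PATTERN.search(heading_lines[-1])  # only the last heading line
    return match.group(1) if match else None


heading = [
    "Hypothetical example heading",
    "  gzeinaa.pep  Length: 285  Type: P  Check: 1234  ..",
]
print(sequence_type(heading))  # P
```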

RELATED PROGRAMS

CompTable writes a symbol comparison table (scoring matrix) from the equivalences in a simplification file such as simplify.txt. You can assign the match and mismatch values.
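To illustrate the idea behind such a table, here is a hypothetical Python sketch that scores two symbols as a match when they share an equivalence group and as a mismatch otherwise. The function name and the match/mismatch defaults are assumptions, and the result is a plain dictionary, not CompTable's output file format.

```python
from itertools import product

# Default peptide equivalence groups: representative -> members.
GROUPS = {"A": "PAGST", "D": "QNEDBZ", "H": "HKR",
          "I": "LIVM", "F": "FYW", "C": "C"}


def comparison_table(groups: dict[str, str], match: float = 1.0,
                     mismatch: float = 0.0) -> dict[tuple[str, str], float]:
    """Score every pair of grouped symbols by whether they share a group."""
    group_of = {s: rep for rep, members in groups.items() for s in members}
    symbols = sorted(group_of)
    return {
        (a, b): match if group_of[a] == group_of[b] else mismatch
        for a, b in product(symbols, repeat=2)
    }


table = comparison_table(GROUPS, match=1.0, mismatch=-1.0)
print(table[("L", "I")], table[("L", "K")])  # 1.0 -1.0
```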

SIMPLIFICATION FILE

You can use Fetch to copy simplify.txt into your working directory and then edit it to suit your needs. Here is the default version:


!!SIMPLIFY 1.0
A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
peptide sequences.  The first line below means "for all of the P, A, G,
S, or T characters in the sequence, substitute A." The program COMPTABLE
can construct a symbol comparison table with the equivalences from this
file.

10/7/84 ..

A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C
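A hypothetical sketch of reading a file in this layout, following the rule stated in the header above ("for all of the P, A, G, S, or T characters in the sequence, substitute A"): the first letter in each row replaces every letter after it. It assumes the header ends at the first line ending in `..` (as `10/7/84 ..` does); the function name `parse_simplification` and the shortened sample header are illustrative.

```python
def parse_simplification(text: str) -> dict[str, str]:
    """Map each listed symbol to the first letter of its row."""
    mapping: dict[str, str] = {}
    in_rows = False
    for line in text.splitlines():
        if not in_rows:
            in_rows = line.rstrip().endswith("..")  # assumed end of the header
            continue
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rep = fields[0]
        for symbol in "".join(fields[1:]):
            mapping[symbol] = rep
    return mapping


SAMPLE = """!!SIMPLIFY 1.0
Default peptide simplification (header text shortened here).

10/7/84 ..

A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C
"""

mapping = parse_simplification(SAMPLE)
print(mapping["V"], mapping["Z"], mapping["W"])  # I D F
```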

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
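The abbreviation rule above can be sketched as a small matcher. This is a hypothetical illustration, assuming matching is case-insensitive and that any letters you type beyond the capitalized ones must continue the full parameter name; the function `matches` is not part of the Wisconsin Package.

```python
def matches(spec: str, typed: str) -> bool:
    """True if `typed` (e.g. '-beg=18') selects the parameter `spec` (e.g. '-BEGin')."""
    name = spec.lstrip("-")
    required = 0
    while required < len(name) and name[required].isupper():
        required += 1  # count the leading capitalized letters
    word = typed.lstrip("-").split("=", 1)[0].lower()  # drop any '=value'
    return len(word) >= required and name.lower().startswith(word)


print(matches("-BEGin", "-beg=18"))  # True
print(matches("-BEGin", "-be=18"))   # False
```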


Minimal Syntax: % simplify [-INfile=]ggamma.pep -Default

Prompted Parameters: (for a single sequence)

-BEGin=1 -END=444          sets the range of interest
[-OUTfile=]ggamma.sim      names the output file

Local Data Files:

-DATa=simplify.txt         specifies a file of equivalences

Optional Parameters:

-EXTension=.sim            sets the default output file name extension
-LIStfile[=simplify.list]  writes a list file of output sequence names
-NOMONitor                 suppresses the screen trace

The default simplification is as follows:

           A  =  P,A,G,S,T    (neutral, weakly hydrophobic)
           D  =  Q,N,E,D,B,Z  (hydrophilic, acid, amide)
           H  =  H,K,R        (hydrophilic, basic)
           I  =  L,I,V,M      (hydrophobic)
           F  =  F,Y,W        (hydrophobic, aromatic)
           C  =  C            (cross-link forming)
           All other characters are unchanged.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either (1) have a data file with exactly the same name in your current working directory, or (2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information, see Chapter 4, Using Data Files in the User's Guide.
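The lookup described above can be sketched as follows. The function `resolve_data_file` and the `/gcgdata` public directory are hypothetical (the real public data directory depends on your installation), and the sketch assumes that a file named on the command line takes precedence over a same-named file in the working directory, which the text above does not state explicitly.

```python
from pathlib import Path
from typing import Optional


def resolve_data_file(name: str, public_dir: Path,
                      command_line: Optional[str] = None,
                      cwd: Optional[Path] = None) -> Path:
    """Pick the data file to read, under the precedence assumed above."""
    if command_line:                   # a file named with -DATa1=... (assumed to win)
        return Path(command_line)
    local = (cwd or Path.cwd()) / name
    if local.is_file():                # same name in the working directory
        return local
    return public_dir / name           # otherwise, the public copy


print(resolve_data_file("simplify.txt", Path("/gcgdata"), command_line="my.txt"))  # my.txt
```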

Simplify reads the equivalences from the file simplify.txt. In each equivalence row, the first letter is the symbol that replaces every letter listed after it in that row.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-EXTension=.sim

sets the default output file name extension.
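Judging from the examples in this manual (gzeinaa.pep becomes gzeinaa.sim, ggamma.pep becomes ggamma.sim), the default output name keeps the input's base name and replaces its extension with the -EXTension value. A hypothetical sketch of that rule, covering plain file names only (not database specifications such as GenEMBL:*):

```python
from pathlib import Path


def default_output_name(input_name: str, extension: str = ".sim") -> str:
    """Swap the input file's extension for the -EXTension value."""
    return str(Path(input_name).with_suffix(extension))


print(default_output_name("gzeinaa.pep"))          # gzeinaa.sim
print(default_output_name("ggamma.pep", ".simp"))  # ggamma.simp
```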

-LIStfile=simplify.list

writes a list file with the names of the output sequence files. This list file is suitable for input to other Wisconsin Package programs that support list files (see Chapter 2, Using Sequence Files and Databases in the User's Guide). If you don't specify a file name, Simplify names the list file simplify.list.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: December 9, 1998 16:29 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com