FROMSTADEN

[ Program Manual | User's Guide | Data Files | Databases ]

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

FromStaden changes a sequence from Staden format into GCG format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the Program Manual.

DESCRIPTION

Any sequence created with Staden programs can be converted with FromStaden into a format suitable for use with Wisconsin Package(TM) programs. All of the compatible ambiguity codes are converted. If more than one contig is present in the Staden file, all of the contigs are concatenated into a single sequence, and the contig markers from the Staden sequences are retained in the heading of the GCG output file. In the example below, the 11th to the 31st positions of the sequence contain all of the Staden ambiguity codes; you can see how they are converted in Appendix III or by comparing the example input file with the example output. If the sequence is a protein, no conversion is made.

The command % seqformat Staden sets a global switch to make Wisconsin Package programs accept sequences in Staden format without running FromStaden. (See "Using Global Switches" in Chapter 3, Using Programs of the User's Guide.) Use the FromStaden program only to convert sequences that you wish to keep in GCG format.

EXAMPLE

Here is a session using FromStaden to convert the Staden file origin.sdn into a GCG-format file:


% fromstaden

  FROMSTADEN of what Staden sequence file ?  origin.sdn

  What should I call the output file (* origin.seq *) ?

       origin.seq    202 bp

%

OUTPUT

Here is the complete content of the output file origin.seq:


 FROMSTADEN of: origin.sdn  check: 8676  from: 1  to: 202

 <---ORIGIN.001----->

origin.seq  Length: 202  September 30, 1998 12:49  Type: N  Check: 8676  ..

       1  ATGGATCCTA RYWSMKHBVD N.ctagmkws .CTGAGGAGG AGATTCACTT

      51  GTTTAGAGGC TGGGAGTGGT GGCTCACGCC TGTAATCCCA GAATTTTGGG

     101  AGGCCAAGGC AGGCAGATCA CCTGAGGTCA AGAGTTCAAG ACCAACCTGG

     151  CCAACATGGT GAAATCCCAT CTCTACAAAA ATACAAAAAT TAGACAGGCA

     201  TG
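
The character translation visible in this example can be sketched in Python. This is only an illustration: the mapping below is read off positions 22 to 31 of the example (compare the input file with the output above), the names STADEN_TO_GCG and translate_staden_codes are hypothetical, and Appendix III remains the authoritative table.

```python
# Mapping read off the example session only; Appendix III is authoritative.
# Staden symbols that do not appear in the example are NOT covered here.
STADEN_TO_GCG = str.maketrans({
    "1": "c", "2": "t", "3": "a", "4": "g",
    "5": "m", "6": "k", "7": "w", "8": "s",
    "-": ".", "*": ".",
})


def translate_staden_codes(seq: str) -> str:
    """Translate the Staden-specific symbols seen in the example.

    Characters without an entry in the mapping are passed through unchanged.
    """
    return seq.translate(STADEN_TO_GCG)


print(translate_staden_codes("RYWSMKHBVDN-12345678*"))
```

In the example, the letters R through N pass through unchanged, the digits 1 to 8 become lowercase letters, and both "-" and "*" become ".".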

INPUT FILES

FromStaden accepts a single sequence file created with the Staden Package of sequence analysis programs as input. If multiple contigs are present in the Staden file, all of the contigs are concatenated into a single sequence in the output. Here is the input file used for the example above:


<---ORIGIN.001----->ATGGATCCTARYWSMKHBVDN-12345678*CTGAGGAGG
AGATTCACTTGTTTAGAGGCTGGGAGTGGTGGCTCACGCCTGTAATCCCAGAATTTTGGG
AGGCCAAGGCAGGCAGATCACCTGAGGTCAAGAGTTCAAGACCAACCTGGCCAACATGGT
GAAATCCCATCTCTACAAAAATACAAAAATTAGACAGGCATG
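
As a rough illustration of how such a file is laid out, the Python sketch below pulls out the contig markers and joins the remaining lines into one sequence. It is not FromStaden's implementation: the marker pattern is an assumption generalized from the single marker "<---ORIGIN.001----->" shown above, and the names CONTIG_MARKER and read_staden are hypothetical.

```python
import re

# Assumed marker shape: "<", dashes, a name, dashes, ">".
# Generalized from the one marker in the example; real files may differ.
CONTIG_MARKER = re.compile(r"<-+([^->][^>]*?)-+>")


def read_staden(text: str) -> tuple[list[str], str]:
    """Return (contig names, concatenated sequence) for Staden-format text."""
    names = CONTIG_MARKER.findall(text)
    # Remove the markers, then remove all whitespace so that the lines
    # of every contig join into a single sequence.
    sequence = "".join(CONTIG_MARKER.sub("", text).split())
    return names, sequence


example = """<---ORIGIN.001----->ATGGATCCTARYWSMKHBVDN-12345678*CTGAGGAGG
AGATTCACTTGTTTAGAGGCTGGGAGTGGTGGCTCACGCCTGTAATCCCAGAATTTTGGG
AGGCCAAGGCAGGCAGATCACCTGAGGTCAAGAGTTCAAGACCAACCTGGCCAACATGGT
GAAATCCCATCTCTACAAAAATACAAAAATTAGACAGGCATG
"""
names, sequence = read_staden(example)
print(names, len(sequence))
```

For the example file, the sequence length found this way agrees with the 202 bp reported in the session above.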

When FromStaden writes GCG sequence files, it assigns the sequence type based on the composition of the sequence characters. This method is not foolproof. To ensure that the output files are written with the correct sequence type, use -PROtein or -NUCleotide on the command line when running FromStaden.

If FromStaden is run interactively, you can watch the program monitor to see whether each sequence is assigned the correct type. As each new file is written, its name and the number of bases (bp) or amino acids (aa) appear on the screen. If the wrong abbreviation appears (for example, bp appears for a protein sequence), the sequence file was assigned the wrong type. The sequence type also appears in the sequence file. Look on the last line of the text heading just above the sequence itself for Type: N or Type: P.

If the sequence type was incorrectly assigned, see Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

The following programs convert sequences between other formats and GCG format: FromEMBL, FromGenBank, FromIG, FromPIR, FromStaden, FromFastA, ToIG, ToPIR, ToStaden and ToFastA.

DataSet creates a GCG data library from any set of sequences in GCG format. GCGToBLAST creates a database that can be searched by the BLAST program from any set of sequences in GCG format.

RESTRICTIONS

Staden nucleotide ambiguity codes are not all strictly comparable to IUB-IUPAC ambiguity codes (see Appendix III). If contigs are present in the Staden file, all of the contigs are concatenated into a single sequence. If this is not what you want, put the contigs into separate files with a text editor and run FromStaden on each of them individually. The contig comments cannot be more than 130 characters long.
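
If editing by hand is impractical, the split can be sketched in Python. This is a minimal sketch under stated assumptions: the marker pattern is generalized from "<---ORIGIN.001----->", the 130-character limit is applied to the whole marker text (the manual does not say exactly what counts as the contig comment), and the names split_contigs and MAX_COMMENT_LENGTH are hypothetical.

```python
import re

# Assumed marker shape, generalized from "<---ORIGIN.001----->".
CONTIG_MARKER = re.compile(r"<-+[^->][^>]*?-+>")
# Limit taken from the restriction above; applying it to the full
# marker text is an assumption.
MAX_COMMENT_LENGTH = 130


def split_contigs(text: str) -> list[tuple[str, str]]:
    """Return one (marker, sequence) pair per contig in Staden-format text."""
    markers = CONTIG_MARKER.findall(text)
    # split() yields the text between markers; anything before the
    # first marker is ignored.
    pieces = CONTIG_MARKER.split(text)[1:]
    contigs = []
    for marker, piece in zip(markers, pieces):
        if len(marker) > MAX_COMMENT_LENGTH:
            raise ValueError(f"contig comment too long: {len(marker)} characters")
        contigs.append((marker, "".join(piece.split())))
    return contigs


two_contigs = "<---A.001----->ACGT\nAC<---A.002----->GGTT\n"
for marker, seq in split_contigs(two_contigs):
    print(marker, seq)
```

Each pair can then be written to its own file (marker followed by sequence) and passed to FromStaden individually.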

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % fromstaden [-INfile1=]origin.sdn -Default

Prompted Parameters:

[-OUTfile1=]origin.seq  names the output file

Local Data Files: None

Optional Parameters:

-PROtein                insists that the input sequence is a protein
-NUCleotide             insists that the input sequence is a nucleic acid
-NOMONitor              suppresses the screen trace for each output sequence

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-PROtein and -NUCleotide

set the program to expect either protein or nucleic acid sequences, respectively. Normally, FromStaden determines whether an input sequence is protein or nucleic acid by looking at its composition. If the first 300 alphanumeric characters in a sequence are composed entirely of Staden nucleotide codes (see Appendix III), it is reformatted as a nucleic acid sequence in GCG format; otherwise it is reformatted as a protein sequence. Using these command-line parameters, you can insist that your sequences are proteins (-PROtein) or nucleic acids (-NUCleotide).
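
The composition rule described above can be sketched in Python. This is an illustration, not the program's code: the code set below contains only the letters and digits that appear in the nucleotide example earlier in this document (Appendix III holds the definitive list), treating lowercase letters like uppercase is an assumption, and the names STADEN_NUCLEOTIDE_CODES and guess_type are hypothetical.

```python
# Assumed code set, taken from the characters in the nucleotide example.
# Appendix III holds the definitive list of Staden nucleotide codes.
STADEN_NUCLEOTIDE_CODES = set("ACGTRYWSMKHBVDN12345678")


def guess_type(sequence: str, window: int = 300) -> str:
    """Return "N" if the first `window` alphanumeric characters are all
    nucleotide codes, otherwise "P".

    Non-alphanumeric characters such as "-" and "*" are skipped.
    """
    alnum = [c.upper() for c in sequence if c.isalnum()][:window]
    return "N" if all(c in STADEN_NUCLEOTIDE_CODES for c in alnum) else "P"


print(guess_type("ATGGATCCTARYWSMKHBVDN-12345678*CTGAGGAGG"))
print(guess_type("MKTAYIAKQRQISFVKSHFSRQ"))
# Only the first 300 characters are inspected, so later residues are not seen.
print(guess_type("ACGT" * 75 + "QQQ"))
```

The last call shows why the guess can be wrong: a sequence whose first 300 characters happen to use only nucleotide codes is typed as nucleic acid regardless of what follows, which is the case -PROtein and -NUCleotide are meant to handle.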

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: December 9, 1998 16:27 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com