FROMIG

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

FromIG reformats one or more sequences from IntelliGenetics format into individual files in GCG format.

DESCRIPTION

Use FromIG when you want to move sequences being used or assembled with IntelliGenetics software into a format suitable for use with programs in the Wisconsin Package(TM). Since IG software maintains many sequences in one file, FromIG must write many output files, one for each sequence in the IG file. Each output file is named according to the identifier word just above the sequence data in the IG file. All the documentation from the IntelliGenetics input file is preserved in the GCG output files. If an IG sequence is circular, the GCG sequence file says (circular sequence) just above the dividing line.

EXAMPLE

Here is a session using FromIG to convert the IG-format sequences in urchin.nih into separate files in GCG format:


% fromig

  FROMIG of what IntelliGenetics sequence file(s) ?  urchin.nih

  surphist1   788 bp
  surphist2   188 bp
  surphist3   159 bp

  ///////////////////

  surshist2   682 bp

  Finished FROMIG with 12 files written.
  6418 bases were reformatted.

%

OUTPUT

Here is part of the first output file, surphist1, from the example above:


 FROMIG of: urchin.nih

 definition  sea urchin(p.mil.) histone genes; h4 gene. 788bp
 locus       surphist1       788 bp                  updated   11/01/82
 segment     1 of 9

 //////////////////////////////////////////////////////////////////////

 composition:   180 a   158 c   198 g   129 t   123 n
 total:         788 nucleotides.

 (circular sequence)
Surphist1.  Length: 788  September 29, 1998 17:59  Type: N  Check: 6642  ..

       1  CAACATATTA GAGGAAGGGA GAGAGAGAGA GAGAGAGAGA GAGAGAGAGA

      51  GGGGGGGGGG GAGGGAGAAT TGCCCAAAAC ACTGTAAATG TAGCGTTAAT

    ////////////////////////////////////////////////////////////

     701  NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNGGCCGAAC ACTGTACGGC

     751  TTCGGCGGCT AAGTGAAGCA GACTTGGCTA GAATAACG

INPUT FILES

FromIG accepts one or more files containing sequences in IG format as input. You can specify multiple input files as a file of file names, for example @igseqs.list, or by using a file specification with an asterisk (*) wildcard, for example ig*.seq. Each input file may contain one or more sequences. Here is part of the input file used for the example above (the number 2 appears at the end of circular sequences in IG format):


; DEFINITION  SEA URCHIN(P.MIL.) HISTONE GENES; H4 GENE. 788BP
; LOCUS       SURPHIST1       788 BP              UPDATED   11/01/82
; SEGMENT     1 OF 9

////////////////////////////////////////////////////////////

; Composition:   180 A   158 C   198 G   129 T   123 N
; Total:         788 nucleotides.
SURPHIST1
CAACATATTAGAGGAAGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGGGGGGGG
GAGGGAGAATTGCCCAAAACACTGTAAATGTAGCGTTAATGAACTTTTCATCTCATCGAC

////////////////////////////////////////////////////////////

NNNNNNNNNNNNGGCCGAACACTGTACGGCTTCGGCGGCTAAGTGAAGCAGACTTGGCTA
GAATAACG2
; DEFINITION  SEA URCHIN(P.MIL.) HISTONE GENES; PARTIAL SPACER. 188BP
; LOCUS       SURPHIST2       188 BP               UPDATED   11/01/82

/////////////////////////////////////////////////////////////////////
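The layout of the excerpt above can be read with a few lines of code. The following is a minimal sketch in Python, not the program's own implementation. It assumes that documentation lines begin with a semicolon, that the first other line is the identifier, and that sequence data runs until a line ending in a terminator digit. The text above confirms only the digit 2 for circular sequences; treating 1 as the terminator for linear sequences is an assumption about IG format. The function name parse_ig and the sample entries are hypothetical.

```python
from typing import Dict, Iterator, List, Optional


def parse_ig(text: str) -> Iterator[Dict[str, object]]:
    """Yield one record per terminated sequence entry in IG-format text."""
    comments: List[str] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if name is None:
            if line.startswith(";"):
                comments.append(line[1:].strip())  # documentation line
            else:
                name = line                        # identifier word
            continue
        # Sequence data: a trailing 1 (assumed linear) or 2 (circular)
        # ends the entry. Entries with no terminator are ignored here.
        end = line[-1] if line[-1] in "12" else ""
        chunks.append(line[:-1] if end else line)
        if end:
            yield {
                "name": name,
                "comments": comments,
                "sequence": "".join(chunks),
                "circular": end == "2",
            }
            comments, name, chunks = [], None, []


# Hypothetical two-entry input, made up for illustration.
sample = """\
; LOCUS       DEMO1       10 BP
DEMO1
ACGTAC
GTAC2
; LOCUS       DEMO2       4 BP
DEMO2
GGCC1
"""

for rec in parse_ig(sample):
    shape = "circular" if rec["circular"] else "linear"
    print(rec["name"], len(rec["sequence"]), shape)
```

The sketch assumes every entry has an identifier line; the case of a missing identifier is not handled.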

When FromIG writes GCG sequence files, it assigns the sequence type based on the composition of the sequence characters. This method is not foolproof, so to ensure that the output files are written with the correct sequence type, use -PROtein or -NUCleotide on the command line when running FromIG.

If FromIG is run interactively, you can watch the program monitor to see if the sequences are assigned the correct type. As each new file is written, its name and the number of bases (bp) or amino acids (aa) appear on the screen. If the wrong abbreviation appears (for example, bp appears for a protein sequence), the sequence file was assigned the wrong type. The sequence type also appears in the sequence file. Look on the last line of the text heading just above the sequence itself for Type: N or Type: P.
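When many files have been written, the same check can be scripted. The sketch below, in Python, pulls the Length, Type, and Check fields out of the last heading line (the one ending in two periods). The regular expression is inferred from the single heading line shown in the OUTPUT section and may not cover every variant the package writes; read_gcg_header is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from the example heading line; an assumption, not a
# specification of the GCG file format.
HEADER_RE = re.compile(
    r"Length:\s*(\d+).*?Type:\s*([NP]).*?Check:\s*(\d+)\s+\.\.\s*$"
)


def read_gcg_header(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Return length, type and checksum from a heading line, or None."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "length": int(match.group(1)),
        "type": match.group(2),   # "N" for nucleotide, "P" for protein
        "check": int(match.group(3)),
    }


header = ("Surphist1.  Length: 788  September 29, 1998 17:59  "
          "Type: N  Check: 6642  ..")
print(read_gcg_header(header))
```

A file whose heading reports Type: N when you expected a protein is a candidate for retyping.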

If the sequence type was incorrectly assigned, see Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

The following programs convert sequences between other formats and GCG format: FromEMBL, FromGenBank, FromIG, FromPIR, FromStaden, FromFastA, ToIG, ToPIR, ToStaden and ToFastA.

DataSet creates a GCG data library from any set of sequences in GCG format. GCGToBLAST creates a database that can be searched by the BLAST program from any set of sequences in GCG format.

CONSIDERATIONS

As of IntelliGenetics Release 5.3, IntelliGenetics programs use only IUB-IUPAC nucleotide ambiguity codes. Prior to Release 5.3, IntelliGenetics programs used the Stanford ambiguity codes. The GCG program FromIG assumes that sequence files in IntelliGenetics format contain only IUB-IUPAC sequence symbols and will not perform any symbol conversion.

If there is no identifier above the sequence entry in the IntelliGenetics file, the sequence is written into a file called scratch.fromig.
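The naming rule can be summarized in a short Python sketch. It is an illustration of the behavior described in this document, not the program's code. Lowercasing the identifier is an inference from the example session, where the entry SURPHIST1 is written to a file named surphist1; output_filename is a hypothetical name.

```python
from typing import Optional


def output_filename(identifier: Optional[str]) -> str:
    """Pick an output file name for one IG entry."""
    if identifier is None or not identifier.strip():
        # No identifier above the sequence: use the fallback name.
        return "scratch.fromig"
    # Use the identifier word; lowercasing is assumed from the example.
    return identifier.split()[0].lower()


print(output_filename("SURPHIST1"))
print(output_filename(None))
```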

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % fromig [-INfile=]urchin.nih -Default

Prompted Parameters: None

Local Data Files: None

Optional Parameters:

-PROtein                 insists that the input sequences are proteins
-NUCleotide              insists that the input sequences are nucleic acids
-LIStfile[=fromig.list]  writes a list file of output sequence names
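If you drive FromIG from a script, the parameters above can be assembled programmatically. The Python sketch below only builds the argument list, using the parameter spellings from this summary; it does not run the program and has not been checked against an installed Wisconsin Package. The function name fromig_command and its keyword arguments are hypothetical.

```python
from typing import List, Optional


def fromig_command(infile: str,
                   seq_type: Optional[str] = None,
                   listfile: Optional[str] = None) -> List[str]:
    """Build a non-interactive fromig command line as a list of arguments."""
    args = ["fromig", f"-INfile={infile}"]
    if seq_type == "protein":
        args.append("-PROtein")
    elif seq_type == "nucleotide":
        args.append("-NUCleotide")
    elif seq_type is not None:
        raise ValueError("seq_type must be 'protein', 'nucleotide' or None")
    if listfile is not None:
        args.append(f"-LIStfile={listfile}")
    args.append("-Default")  # suppress prompts, as in the minimal syntax
    return args


print(" ".join(fromig_command("urchin.nih", "nucleotide", "fromig.list")))
```

The resulting list could be handed to subprocess.run on a system where the package is installed.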

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-PROtein and -NUCleotide

set the program to expect protein or nucleic acid sequences, respectively. Normally, FromIG determines whether an input sequence is protein or nucleic acid by looking at its composition. If the first 300 alphabetic characters in a sequence are composed entirely of IUB-IUPAC nucleotide codes (see Appendix III), it is reformatted as a nucleic acid sequence in GCG format; otherwise it is reformatted as a protein sequence. Using these command-line parameters, you can insist that your sequences are proteins (-PROtein) or nucleic acids (-NUCleotide).
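The rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not FromIG's code. The set of nucleotide codes is the standard IUB-IUPAC alphabet including U; whether the table in Appendix III matches it exactly is an assumption. The name guess_type is hypothetical.

```python
# Standard IUB-IUPAC nucleotide codes (assumed to match Appendix III).
IUPAC_NUCLEOTIDE = set("ACGTURYMKSWBDHVN")


def guess_type(sequence: str, window: int = 300) -> str:
    """Return "N" if the first `window` letters are all nucleotide codes,
    otherwise "P"."""
    letters = [c.upper() for c in sequence if c.isalpha()][:window]
    if all(c in IUPAC_NUCLEOTIDE for c in letters):
        return "N"
    return "P"


print(guess_type("CAACATATTAGAGGAAGGGA"))   # nucleotide sequence
print(guess_type("MKVLAAGIVGLLLA"))         # contains L, not a nucleotide code
print(guess_type("ACDGHKMNRSTVWY"))         # peptide made only of shared letters
```

The third call shows why the method can fail: every letter in that peptide is also a valid nucleotide ambiguity code, so the sketch reports N. This is the situation in which -PROtein is needed.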

-LIStfile=fromig.list

writes a list file with the names of the output sequence files. This list file is suitable for input to other Wisconsin Package programs that support list files (see Chapter 2, Using Sequence Files and Databases in the User's Guide). If you don't specify a file name, FromIG names the file fromig.list.

Printed: December 9, 1998 16:27 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com