GELENTER

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
RELATED PROGRAMS
RESTRICTIONS
CONSIDERATIONS
SUGGESTIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

GelEnter adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files.

DESCRIPTION

See the Fragment Assembly System (FAS) Introduction for an overview of working with the programs within the FAS to assemble sequences in a sequencing project.

GelEnter enters fragment sequences into the Fragment Assembly System. GelEnter accepts sequence data from the following: 1) your terminal keyboard; 2) a digitizer; or 3) existing sequence files using -ENTER.

GelEnter and SeqEd are essentially the same program. However, unlike SeqEd, GelEnter writes its output file to the fragment assembly project database rather than your local directory. For a complete description of GelEnter commands, see SeqEd.

EXAMPLE

Here is a session using GelEnter to enter previously existing sequence files into the project database created in the example session for GelStart:


% gelenter -ENTER=mu*.seq

 Entering mu*.seq into the database.
 "mu10"  361 nucleotides
 "mu18"  42 nucleotides

 /////////////////

 "mu6"  296 nucleotides
 "mu9"  39 nucleotides

%

RELATED PROGRAMS

GelStart begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project.

GelEnter adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files.

GelMerge aligns the sequences in a fragment assembly project into assemblies called contigs. You can view and edit these assemblies in GelAssemble.

GelAssemble is a multiple sequence editor for viewing and editing contigs assembled by GelMerge.

GelView displays the structure of the contigs in a fragment assembly project.

GelDisassemble breaks up the contigs in a fragment assembly project into single fragments.

GelEnter is SeqEd customized to run in the fragment assembly environment.

RESTRICTIONS

A contig may not contain more than 1,650 fragments and may not be longer than 200,000 bases. No single fragment may be longer than 2,500 bases.
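These limits can be checked before any data is entered. The following Python sketch is not part of GelEnter; the constant and function names are hypothetical, and only the numeric limits are taken from the text above.

```python
# Limits quoted from the RESTRICTIONS section; the names are illustrative only.
MAX_FRAGMENTS_PER_CONTIG = 1650
MAX_CONTIG_LENGTH = 200_000
MAX_FRAGMENT_LENGTH = 2500


def fragment_ok(sequence: str) -> bool:
    """Return True if a single fragment is within the length limit."""
    return len(sequence) <= MAX_FRAGMENT_LENGTH


def contig_ok(fragment_count: int, contig_length: int) -> bool:
    """Return True if a contig respects both the fragment-count and length limits."""
    return (fragment_count <= MAX_FRAGMENTS_PER_CONTIG
            and contig_length <= MAX_CONTIG_LENGTH)
```

A pre-check of this kind could be run over a directory of sequence files before they are passed to -ENTER.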

CONSIDERATIONS

GelEnter doesn't allow you to enter two different fragments with the same name. This protects you from overwriting existing fragments in the database by accidentally reusing a name. If two files share the same file name but have different file extensions, GelEnter considers them to have the same name. (See Chapter 1, Getting Started in the User's Guide for more information on naming files.)
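To spot such clashes before running GelEnter, you could group file names by the part before the extension. The Python sketch below is an illustration only: find_name_clashes is a hypothetical helper, and it assumes that the fragment name is the file name with its final extension removed and that names are compared exactly as written.

```python
from collections import defaultdict
from pathlib import PurePath


def find_name_clashes(filenames):
    """Group file names by their name without the final extension.

    Returns a dict that maps each clashing name to the files sharing it.
    Assumption: the fragment name is the file name minus its last extension.
    """
    by_name = defaultdict(list)
    for filename in filenames:
        by_name[PurePath(filename).stem].append(filename)
    # Keep only the names that more than one file maps to.
    return {name: files for name, files in by_name.items() if len(files) > 1}
```

For example, passing mu10.seq, mu10.gel, and mu18.seq reports that mu10.seq and mu10.gel would both be entered under the name mu10.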

The heading documentation that appears above the sequence and the sequence comments are not preserved when fragments are entered into the fragment assembly database.

Sequence Symbols

GelEnter accepts any valid GCG sequence character (see Appendix III). GelMerge and GelAssemble recognize all IUB nucleotide ambiguity codes (see Appendix III) and the period (.) and tilde (~) as gap symbols for the generation of consensus sequences. All other sequence characters are treated as non-nucleotide symbols in GelMerge and GelAssemble.
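The three classes of symbols described above can be sketched in Python as follows. This is an illustration, not GCG code: it assumes that Appendix III lists the standard IUB nucleotide codes shown in the set below and that upper and lower case are treated alike.

```python
# Standard IUB nucleotide codes (assumed to match Appendix III).
IUB_NUCLEOTIDE_CODES = set("ACGTURYMKSWBDHVN")
# Gap symbols named in the text above.
GAP_SYMBOLS = {".", "~"}


def classify_symbol(symbol: str) -> str:
    """Classify one sequence character as 'gap', 'nucleotide', or 'non-nucleotide'."""
    if symbol in GAP_SYMBOLS:
        return "gap"
    # Case-insensitive comparison is an assumption of this sketch.
    if symbol.upper() in IUB_NUCLEOTIDE_CODES:
        return "nucleotide"
    return "non-nucleotide"
```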

SUGGESTIONS

If you are repeatedly entering an over-cloned fragment or cloning vector into the database, you can have GelEnter highlight the redundant fragment each time you begin to enter it. To do this, edit the redundant fragment with GelAssemble. Extract the sequence using GelAssemble's : SEQOUT command and then run GelStart with -VECtor, using the file specification of the extracted sequence. Similarly, GelStart's -SITes parameter causes GelEnter to highlight restriction sites that you have specified. This can help you to detect instances of rejoined sticky ends. GelEnter highlights vector sequences and restriction sites only when you are entering sequence data from your terminal keyboard or a digitizer; it does not highlight them in existing sequence files entered using -ENTER.

Do not use GelEnter's : Write command to enter a sequence in the database that you wish to continue editing. Once the sequence is entered into the database, any further editing cannot be saved because you are not permitted to reenter a fragment already in the database. If you wish to modify a fragment that is already in the database, use GelAssemble. If you wish to replace a fragment in the database, first delete that fragment from the database using GelAssemble's : ERASE command, and then reenter the new version using GelEnter.

Because GelEnter cannot re-edit a fragment that has already been entered, it is advisable to enter as much of the fragment sequence as possible without interruption. If you exit GelEnter in the middle of entering a sequence, you can either enter the remaining sequence as a different fragment or use GelAssemble to enter the remaining sequence. GelAssemble can modify existing sequences only from the keyboard.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % gelenter [-INfile1=]mu*.seq

Prompted Parameters: None

Local Data Files:

set.keys (must be in your current working directory to be used)

Optional Parameters:

-ENTER=mu*.seq         enters existing files into the database
  -STAden              enters existing Staden format files into the
                         database
  -FASTA               enters existing FASTA format files into the
                         database
-SINGlecommand         automatically returns to screen mode after each command
-PERFect               sets find to search for perfect symbol matches
-VECtors=gb:synpbr322  highlights sequences from pBR322
-SITes=gaattc          highlights GAATTC patterns
-LANes=G,A,T,C         sets lane order for digitizer
-MINOverlap=10         sets minimum overlap length for Reload command
-PCTOverlap=95         sets stringency for the Reload command
-TOLerance=0.4         sets tolerance for digitizing ambiguity (0 to 1),
                         with 1 being the most tolerant
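The capitalization convention used in this summary can be illustrated with a short Python sketch. It is not how the package parses its command line; minimal_abbreviation and matches are hypothetical helpers, and the sketch assumes that any spelling from the capitalized prefix up to the full name is accepted regardless of case.

```python
def minimal_abbreviation(parameter: str) -> str:
    """Return the leading capitalized part of a name such as '-MINOverlap'."""
    name = parameter.lstrip("-")
    prefix = ""
    for char in name:
        if not char.isupper():
            break
        prefix += char
    return "-" + prefix


def matches(typed: str, parameter: str) -> bool:
    """Return True if `typed` is an acceptable spelling of `parameter`.

    Assumption: the typed text must contain at least the capitalized prefix
    and must itself be a prefix of the full name, ignoring case.
    """
    typed_name = typed.lstrip("-").upper()
    full_name = parameter.lstrip("-").upper()
    required = minimal_abbreviation(parameter).lstrip("-")
    return typed_name.startswith(required) and full_name.startswith(typed_name)
```

Under these assumptions, -MINOverlap could be typed as -mino or -minoverlap, but not as -min.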

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Customizing Your Keyboard With SetKeys

You can use the program SetKeys to create a set.keys file that tells the SeqEd, GelEnter, LineUp, GelAssemble, and SeqLab sequence editors how to interpret the letters you type at the terminal. When entering gel readings, it is useful to have the symbols for G, A, T, and C under the fingers of one hand in the same positions as the lanes in your gel. SeqEd, GelEnter, LineUp, GelAssemble, and the SeqLab sequence editor automatically read the file set.keys if it is present in your local directory. If set.keys is absent, or if the sequence type is set to Protein (in SeqEd and LineUp only), the terminal keys retain their conventional meanings.

If you have a set.keys file in your directory, SeqEd, GelEnter, LineUp, and GelAssemble respond only to the keys that it redefines. You can edit the file set.keys with a text editor if some of the keys you want to use are not in it. Any keys not mentioned in set.keys appear to be dead in these sequence editors. In the SeqLab sequence editor, keys that are not redefined retain their normal meanings.

Several keys are vital for the control of SeqEd, LineUp, GelEnter, and GelAssemble; this means you are not allowed to redefine the keys for /, [, ], {, }, (, ), :, ,, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, <Ctrl>R, <Ctrl>D, <Ctrl>H, <Return>, and <Ctrl>E.
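The effect of a key map on typed input can be modeled with a small Python sketch. It does not read or write the actual set.keys file, whose format is not described here; the dictionary stands in for the redefinitions, the function names are hypothetical, and the sketch assumes that the reserved control keys continue to work when set.keys is present.

```python
# Keys the text says may not be redefined. Control keys are written as
# escapes: <Ctrl>R, <Ctrl>D, <Ctrl>H, <Return>, <Ctrl>E.
RESERVED_KEYS = set("/[]{}():,0123456789") | {"\x12", "\x04", "\x08", "\r", "\x05"}


def validate_key_map(key_map):
    """Return the reserved keys that a proposed key map tries to redefine."""
    return sorted(key for key in key_map if key in RESERVED_KEYS)


def interpret_key(key, key_map):
    """Translate one typed key; None stands for a 'dead' key.

    Assumption: reserved keys keep their meaning, and every other key
    responds only if the key map redefines it.
    """
    if key in RESERVED_KEYS:
        return key
    return key_map.get(key)
```

With a map such as {"j": "G", "k": "A", "l": "T", ";": "C"}, typing j yields G, while a key that is not in the map, such as q, yields nothing.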

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-ENTER=filename

enters an already existing sequence file into the database. You can enter several sequences into the database at once by specifying a list file or sequence specification using an asterisk (*) wildcard. (See Chapter 2, Using Sequence Files and Databases in the User's Guide for help in naming a group of sequences.)

-STAden

allows input sequence files to be in Staden format instead of GCG format. You use this parameter only with -ENTER=filename.

-FASTA

allows input sequence files to be in FastA format instead of GCG format. You use this parameter only with -ENTER=filename.

-SINGlecommand

sets GelEnter to return automatically to Screen Mode after every command in Command Mode.

-PERFect

makes pattern searches use perfect symbol matches. Normally if you type /GARC in Screen Mode, the patterns GAAC or GAGC could be found. If you use -PERFect, /GARC would only find the pattern GARC. This also makes GelEnter treat sequences as linear and not find patterns that start at the end and continue into the beginning of the sequence.
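The difference between the two search modes can be sketched in Python. This is not the search routine GelEnter uses; it assumes that, in the default mode, two symbols match when the sets of bases they stand for share at least one base, and that the default mode treats the sequence as circular. The names IUB, symbols_match, and find_pattern are illustrative.

```python
# Bases represented by each IUB nucleotide code.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbols_match(pattern_symbol, sequence_symbol, perfect):
    """Compare two symbols literally (perfect) or through their base sets."""
    if perfect:
        return pattern_symbol == sequence_symbol
    pattern_bases = set(IUB.get(pattern_symbol, pattern_symbol))
    sequence_bases = set(IUB.get(sequence_symbol, sequence_symbol))
    return bool(pattern_bases & sequence_bases)


def find_pattern(sequence, pattern, perfect=False):
    """Return the 0-based start positions where the pattern is found."""
    sequence, pattern = sequence.upper(), pattern.upper()
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return []
    # A linear (perfect) search stops where the pattern no longer fits;
    # a circular search tries every start and wraps past the end.
    last_start = n - m if perfect else n - 1
    hits = []
    for start in range(last_start + 1):
        if all(symbols_match(pattern[i], sequence[(start + i) % n], perfect)
               for i in range(m)):
            hits.append(start)
    return hits
```

In this sketch, searching TTGAACTT for GARC finds position 2 in the default mode and nothing with perfect=True, and searching ACTTTG for GAC finds the wrapped match at position 5 only in the default mode.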

-VECtors=GB:SynpBR322

tells GelEnter which cloning vector or vectors are of interest to you. When you are entering a sequence from your terminal keyboard or a digitizer, GelEnter checks the sequence against the vectors of interest to make sure you are not entering a vector sequence. If GelEnter finds that you are entering vector sequence, the terminal bell rings and the vector sequence characters are highlighted with reverse video.

Normally, you specify in GelStart the cloning vectors you want to highlight. Specifying them in GelEnter allows you to override those choices.

-SITes=GAATTC,genetic

tells GelEnter to highlight enzyme recognition sites that interest you when you are entering a sequence from your terminal keyboard or a digitizer.

Normally, you specify in GelStart the enzyme recognition sites you want to highlight. Specifying them in GelEnter allows you to override those choices.

The following parameters affect entering sequences in GelEnter using a digitizer. For a complete description of digitizer use and commands, see SeqEd.

-LANes=G,A,T,C

establishes the default left-to-right order of gel lanes. The default may be overridden when you issue a DIGitizer command in Command Mode.

-MINOverlap=10

sets the minimum overlap length regarded as meaningful by the RELoad command. GelEnter ignores matches shorter than this, even if they are perfect. However, you are always free to end a reload with the ACCept command.

-PCTOverlap=95

sets the minimum percentage of matching bases regarded as meaningful by the RELoad command. In Reload Mode, when the overlap is long enough and good enough, the terminal bell rings to alert you. Again, you have complete freedom to reject or ACCept GelEnter's opinion.
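Taken together, -MINOverlap and -PCTOverlap act as a two-part threshold. The Python sketch below illustrates that reading; it is not the RELoad implementation. The function name is hypothetical, the way matching bases are counted is left to the caller, and treating an overlap that is exactly at either threshold as meaningful is an assumption.

```python
def overlap_is_meaningful(matching_bases, overlap_length,
                          min_overlap=10, pct_overlap=95.0):
    """Return True if an overlap is both long enough and good enough.

    min_overlap and pct_overlap default to the values shown for
    -MINOverlap and -PCTOverlap.
    """
    # Overlaps shorter than the minimum are ignored, even if perfect.
    if overlap_length < min_overlap:
        return False
    percent_matching = 100.0 * matching_bases / overlap_length
    return percent_matching >= pct_overlap
```

For example, a perfect 9-base overlap is rejected for being too short, while a 20-base overlap with 19 matching bases (95 percent) passes.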

-TOLerance=0.4

sets the tolerance for digitizing. When digitizing, the program must determine which base lane the sonic pen has touched. Since the gel lane may bend, the program must have some tolerance for deviation. The tolerance value determines how great this deviation can be before you must redefine your lanes. A tolerance of 0 is the least tolerant setting and the slightest deviation would require you to redefine your lanes. A tolerance of 1.0 is the most tolerant setting such that any deviation is accepted. Based on our limited experience, you should not use a tolerance value less than 0.25 or greater than 0.6. The default value (0.4) was chosen because it has seldom made an incorrect assignment and does not require you to redefine the lanes too frequently. The algorithm employed is that of Staden (Nucl. Acids Res., 14; 217 (1986)).

Printed: December 9, 1998 16:25 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com