GELMERGE

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
RELATED PROGRAMS
RESTRICTIONS
ALGORITHM
CONSIDERATIONS
SUGGESTIONS
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENTS
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

GelMerge aligns the sequences in a fragment assembly project into assemblies called contigs. You can view and edit these assemblies in GelAssemble.

DESCRIPTION

See the Fragment Assembly System (FAS) Introduction for an overview of working with the programs within the FAS to assemble sequences in a sequencing project.

GelMerge takes unassembled fragment sequences in your sequencing project database and creates complete assemblies called contigs. Each contig is a multiple alignment of contiguous, overlapping sequences in the project database. These contigs are saved in your project database, where you can review and edit them with GelAssemble. As you add sequences that connect separate contigs to the project database, GelMerge aligns the separate contigs into larger assemblies. Ultimately, all fragments may be assembled into a single contig that represents the underlying genomic sequence from which all the fragments were derived.

You can use GelMerge to assemble the fragments in your sequencing project database into aligned contigs in several ways. You may choose to: 1) preserve the relationships among all the fragments in existing contigs as you align those contigs into larger assemblies; 2) reassemble the entire project from the original gel readings, disregarding all sequence edits and assemblies you've previously created; or 3) reassemble the entire project from the individual edited fragments, retaining all edits you have made (base insertions, deletions, and substitutions) but disregarding all of the assemblies you've previously created.

In GelStart, you can specify cloning vectors used to isolate the sequenced fragments. In GelMerge, you can identify and optionally remove vector sequences from the working copies of the fragment sequences in your project database; the original sequences entered into the project database are unaffected.

EXAMPLE

Here is a session using GelMerge to assemble contigs from the fragments in "myproject", which was created in the example sessions of GelStart and GelEnter.


% GelMerge -MINIdentity=13

 What word size (* 7 *) ?

 What fraction of the words in an overlap must match (* 0.80 *) ?

 What is the minimum overlap length (* 14 *) ?

   Reading ............

 Comparing ............

  Aligning .........

   Writing ...

          Input Contigs:         12
         Output Contigs:          3

               CPU time:      02.29 (seconds)

%

OUTPUT

GelMerge modifies your sequencing project database. The contigs assembled by GelMerge are saved into your project database where you can review and modify them using the other Fragment Assembly programs.

FAS never modifies the files in the project archival directory. This allows you to always recover the original gel readings.

RELATED PROGRAMS

GelStart begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project.

GelEnter adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files.

GelMerge aligns the sequences in a fragment assembly project into assemblies called contigs. You can view and edit these assemblies in GelAssemble.

GelAssemble is a multiple sequence editor for viewing and editing contigs assembled by GelMerge.

GelView displays the structure of the contigs in a fragment assembly project.

GelDisassemble breaks up the contigs in a fragment assembly project into single fragments.

RESTRICTIONS

The total length of all fragment sequences entered into GelMerge cannot exceed 380,000 bases. If you want to assemble your sequencing project with GelMerge, the total length of all consensus sequences in the project database is therefore limited to 380,000. Contigs longer than 100,000 bases cannot be assembled with any other contigs. However, two contigs that are each shorter than 100,000 bases can be assembled into a contig that is longer than 100,000 bases.

You can enter a maximum of 1650 fragments into GelMerge. No fragment entered into GelMerge can be longer than 2500 bases. In creating alignments, no fragment can have more than 100 gap insertions in a single session with GelMerge.
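The size limits above can be checked before you start a session. The following Python sketch is only an illustration: the function name check_limits and the idea of passing a list of fragment lengths are assumptions, not part of GelMerge, and the sketch covers only the three limits on fragment count, fragment length, and total length.

```python
# Limits quoted in the RESTRICTIONS topic.
MAX_TOTAL_BASES = 380_000
MAX_FRAGMENTS = 1650
MAX_FRAGMENT_LENGTH = 2500


def check_limits(fragment_lengths):
    """Return a list of messages, one per limit the project exceeds.

    `fragment_lengths` is a hypothetical input: the length in bases of
    each fragment you intend to enter into GelMerge.
    """
    problems = []
    if len(fragment_lengths) > MAX_FRAGMENTS:
        problems.append(f"{len(fragment_lengths)} fragments exceeds {MAX_FRAGMENTS}")
    too_long = [n for n in fragment_lengths if n > MAX_FRAGMENT_LENGTH]
    if too_long:
        problems.append(f"{len(too_long)} fragment(s) longer than {MAX_FRAGMENT_LENGTH} bases")
    total = sum(fragment_lengths)
    if total > MAX_TOTAL_BASES:
        problems.append(f"total length {total} exceeds {MAX_TOTAL_BASES} bases")
    return problems


# A project of 300 fragments of 400 bases each is within all three limits.
print(check_limits([400] * 300))
```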

In an overlap, the part of each contig with fragments that at least partially span the overlap can be no longer than 5,000 bases. This limit is set high enough that you should never encounter it in normal sessions with GelMerge, regardless of the size of the sequencing project.

The surface of comparison (see the CONSIDERATIONS topic for an explanation) is limited to 500,000. Again, this limit is set high enough that you should never encounter it in normal sessions with GelMerge.

ALGORITHM

General Method

Each fragment in the sequencing project database is part of a contig. A contig contains either a group of aligned sequences associated together or a single, unaligned fragment called either a contig-of-one or a single-fragment contig. GelMerge finds the two contigs with the longest overlap and then aligns them to assemble a single contig. The program then finds the next two contigs with the longest overlap (possibly involving the just-assembled contig) and aligns them to assemble a single contig. GelMerge repeats this process of overlap determination and contig assembly until there are no remaining overlaps among the contigs in the project database. The result of GelMerge may either be a single contig (if all contigs overlap to form a contiguous assembly) or several contigs (if none of the remaining contigs share significant overlap). As you add new fragments to the sequencing project database, they may connect separate contigs to form larger assemblies.
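The repeated "find the longest overlap, then merge" cycle described above can be sketched as a greedy loop. The Python below is a toy model under strong assumptions: each contig is a plain string, an overlap is an exact match between the end of one string and the start of another, and merging is simple concatenation. GelMerge itself uses approximate and then precise alignments, so the function names suffix_prefix_overlap and greedy_merge are hypothetical and the sketch shows only the control flow.

```python
from itertools import permutations


def suffix_prefix_overlap(left, right, min_overlap):
    """Length of the longest exact match between the end of `left`
    and the start of `right`, or 0 if shorter than `min_overlap`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_merge(contigs, min_overlap=14):
    """Repeatedly merge the pair with the longest overlap until none remain."""
    contigs = list(contigs)
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, j in permutations(range(len(contigs)), 2):
            length = suffix_prefix_overlap(contigs[i], contigs[j], min_overlap)
            if length > best_length:
                best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # no remaining overlaps: several contigs are left
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)  # the new contig competes in the next round
    return contigs


# Toy data with a short minimum overlap so the example stays readable.
print(greedy_merge(["ACGTAC", "TACGGA", "TTTTTT"], min_overlap=3))
```

In this toy run the first two strings share a three-base overlap and are merged, while the third string overlaps nothing and remains a separate contig.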

GelMerge allows you to modify several parameters that affect the speed and sensitivity of the overlap determination. Additionally, you can modify parameters affecting the alignment process in contig assembly and the speed and sensitivity of vector recognition. Most of these modifiable parameters are discussed in the remaining algorithm discussion.

Finding Overlaps

The following discussion uses the concept of a diagonal, which is a register of comparison between two sequences. For any two positions in the two sequences, you can calculate their diagonal as the position in the second sequence minus the position in the first sequence. For instance, if a block of 5 sequential identities begins at position 412 in the second sequence and position 1 in the first sequence, then the block is located on diagonal 411 (412 - 1 = 411, 413 - 2 = 411, ..., 416 - 5 = 411). The comparison surface includes all possible diagonals in a comparison of two sequences.
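The diagonal arithmetic in the example above can be checked with a few lines of Python. This is only an illustration; the function name diagonal is hypothetical and is not part of GelMerge.

```python
def diagonal(pos_first, pos_second):
    """Diagonal = position in the second sequence minus position in the first."""
    return pos_second - pos_first


# A block of 5 identities starting at position 1 in the first sequence
# and position 412 in the second sequence.
block = [(1 + k, 412 + k) for k in range(5)]
diagonals = {diagonal(first, second) for first, second in block}
print(diagonals)  # every position in the block lies on diagonal 411
```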

In finding overlaps among the contigs, GelMerge represents each contig by its consensus sequence. GelMerge uses a modification of the approximate alignment procedure of Wilbur and Lipman (SIAM J. Appl. Math. 44; 557-567 (1984)) to determine the amount of overlap between any two consensus sequences. This procedure creates alignments from short blocks of contiguous sequence identities. The program places gaps between blocks, but not within blocks. You can set the minimum length for each short block of identities in response to the What word size program prompt (default length is 7).
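To make the idea of short blocks of identities concrete, the following Python sketch finds every word of a given size that two sequences share and groups the matches by diagonal. It is a simplified illustration of word-based comparison, not the modified Wilbur and Lipman procedure that GelMerge uses; the function name word_hits_by_diagonal is hypothetical, and positions are 1-based to match the text.

```python
from collections import defaultdict


def word_hits_by_diagonal(first, second, word_size=7):
    """Map each diagonal to the 1-based start positions (in `first`)
    of words that occur in both sequences on that diagonal."""
    # Index every word of the second sequence by its start position.
    positions = defaultdict(list)
    for i in range(len(second) - word_size + 1):
        positions[second[i:i + word_size]].append(i + 1)

    # Look up every word of the first sequence in that index.
    hits = defaultdict(list)
    for i in range(len(first) - word_size + 1):
        for j in positions.get(first[i:i + word_size], []):
            hits[j - (i + 1)].append(i + 1)
    return dict(hits)


# "GATTACA" starts at position 3 in the first sequence and position 5
# in the second, so the single hit lies on diagonal 5 - 3 = 2.
print(word_hits_by_diagonal("CCGATTACA", "TTTTGATTACAGG"))
```

Diagonals that collect many hits are the registers of comparison where an approximate alignment is most likely to be found.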

In GelMerge, each alignment must contain at least one long block of contiguous sequence identities. You can set the minimum length for each long block with -MINIdentity (default length is 14). The requirement for at least one long block of identities allows the program to exclude trivial overlaps from consideration. This requirement also limits the extent of gapping permitted in the approximate alignment. The alignment cannot get out of phase by more than 10 registers of comparison from diagonals containing long blocks of sequence identities. (Use -MAXGap to adjust this default of 10 registers of comparison.) This effectively prevents the alignment from wandering too far away from registers of comparison with significant sequence identity.

GelMerge determines all of the possible distinct alignments between the two contig consensus sequences being compared. Each alignment corresponds to a different overlap between the same two contig consensus sequences. An alignment does not necessarily extend over the entire length of the corresponding overlap. For example, the length of the alignment in the following figure is the length from B to C while the length of the overlap is the length from A to D.

After finding all of the distinct alignments between a pair of contigs, GelMerge counts the number of exact nucleotide matches in the best alignment (alignment with the most nucleotide matches). By default, if this number of identities is at least 80% of the length of the corresponding overlap, the program saves the position and length of this overlap. (You can change this default in response to the What fraction of the words in an overlap must match program prompt.) If this overlap does not meet the identity criterion, the next best alignment is checked. For example, in the figure below, although the alignment for overlap A has the greatest number of sequence identities, it does not meet the default identity criterion. The alignment for overlap B, containing fewer sequence identities, does meet the default identity criterion. In this case, GelMerge selects B as the overlap between these two contigs.
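The selection rule described above can be sketched in Python as follows. The function name select_overlap, the tuple layout, and the match counts in the example are all hypothetical; the sketch only restates the rule that candidate alignments are tried in order of decreasing matches until one meets the identity criterion.

```python
def select_overlap(candidates, stringency=0.80):
    """Pick the overlap whose alignment has the most matches and still
    meets the identity criterion.

    `candidates` is a list of (name, matches, overlap_length) tuples.
    Returns the chosen name, or None if the two contigs do not overlap.
    """
    by_matches = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, matches, overlap_length in by_matches:
        if matches >= stringency * overlap_length:
            return name
    return None


# Invented numbers: A has more matches but only 75% identity over its
# overlap, while B has fewer matches but 90% identity.
candidates = [("A", 150, 200), ("B", 90, 100)]
print(select_overlap(candidates))
```

With these invented numbers the sketch selects B, mirroring the situation described for overlaps A and B.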

If no alignment of two contig consensus sequences meets the identity criterion, then the two contigs do not overlap. After GelMerge finds overlaps among all the contigs, it precisely aligns the two contigs with the longest overlap to assemble a single contig. The approximate alignments used to determine overlaps are not used to create final contig assemblies. More precise alignments, involving all of the sequences in both contigs, and without gap size limitations, are used to create the actual assemblies from overlapping contigs.

Aligning Contigs

Once GelMerge determines the pair of contigs with the longest overlap (see above), it aligns them using the method of Needleman and Wunsch (J. Mol. Biol. 48; 443-453 (1970)). This method, originally used to align individual sequences, has been extended for use with contigs of aligned sequences. For a pairwise alignment of individual sequences, the comparison score between any two sequence symbols is found in a scoring matrix (see the LOCAL DATA FILES topic for more information). For a pairwise alignment of contigs of aligned sequences, the comparison score between any two positions in those contigs is the arithmetic average of the scores for all possible symbol comparisons at those positions. When the program inserts gaps into a contig to produce an alignment, they are inserted at the same position in all of the sequences of the contig.
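The averaged comparison score for two contig positions can be illustrated with a short Python sketch. The match and mismatch values below are placeholders chosen for the example and are not taken from gelmergedna.cmp; the function names pair_score and column_score are likewise hypothetical. Gap symbols and ambiguity codes are not handled here.

```python
from itertools import product


def pair_score(a, b, match=1.0, mismatch=-1.0):
    """Score for comparing two symbols (placeholder values, not the
    contents of the real scoring matrix)."""
    return match if a == b else mismatch


def column_score(column_a, column_b):
    """Arithmetic average of the scores for all possible symbol
    comparisons between one position in each contig."""
    scores = [pair_score(a, b) for a, b in product(column_a, column_b)]
    return sum(scores) / len(scores)


# One contig has three fragments reading A, A, G at a position; the
# other has two fragments reading A, A. Six comparisons are averaged:
# four matches and two mismatches.
print(column_score("AAG", "AA"))
```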

Recognizing (and Optionally Removing) Vector Sequences

GelMerge searches for vector sequences in single-fragment contigs using a two-step approach. First, GelMerge finds approximate alignments between vector and contig sequences using a modification of the method of Wilbur and Lipman (described above in "Finding Overlaps"). You can fine tune the vector searching by adjusting some parameters of the approximate alignment procedure. For vector searching, the minimum length for each short block of sequence identities is the same as the length used to find overlaps among the contigs; you can set the minimum length in response to the What word size program prompt (default length is 7). The minimum length of each long block of sequence identities in vector searching can be set with -VECTORMINIdentity (default length of 12). The alignment cannot get out of phase by more than 5 registers of comparison from diagonals containing long blocks of sequence identities. (This default of 5 registers of comparison can be adjusted with -VECTORMAXGap.)

Each of the approximate alignments indicates the position of possible vector sequences in the contig. In the second step of vector recognition, GelMerge refines these alignments using the method of Smith and Waterman (Advances in Applied Mathematics 2; 482-489 (1981)). By default, if the aligned portions of the contig and vector sequences share greater than 80% sequence identity, the contig bases in the alignment become candidates for excision. (You can adjust this value using -VECTORSTringency.) Additionally, by default, the vector sequences must begin within 12 bases of either end of the contig in order to be excised. (This value is the same as the minimum length for each long block; you can adjust it with -VECTORMINIdentity.)

CONSIDERATIONS

Sequence Symbols

GelEnter accepts any valid GCG sequence character (see Appendix III). GelMerge and GelAssemble recognize all IUB nucleotide ambiguity codes (see Appendix III) and the period (.) and tilde (~) as gap symbols for the generation of consensus sequences. All other sequence characters are treated as non-nucleotide symbols in GelMerge and GelAssemble.

Consensus Sequences

Each contig in a sequencing project database is associated with a contig consensus sequence. GelMerge creates a new consensus sequence for each assembled contig and stores it in the project database. In a contig consensus sequence, the consensus symbol at any position in the contig is the most common symbol among all of the contig fragments at that position. GelMerge treats IUB nucleotide ambiguity symbols (described in Appendix III) in the fragment sequences as weighted representations of their constituent bases in order to generate a consensus. For example, an R represents half A and half G. If there is no absolute plurality at a position (that is, two or more unambiguous bases are tied), then those bases tied for plurality are used to generate an IUB nucleotide ambiguity symbol for the consensus. If the most common symbol at a position is a gap character (. or ~), then the consensus contains a gap character at that position. (The automatic consensus generation function in GelAssemble follows these same rules.)
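The consensus rules above can be sketched for a single contig position in Python. Several details here are assumptions that the text does not settle: a gap wins only when it strictly outnumbers every weighted base, the period is always returned as the gap symbol, and non-nucleotide symbols are ignored. The names IUB, CODE_FOR, and consensus_symbol are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Standard IUB nucleotide ambiguity codes and their constituent bases.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUB.items()}


def consensus_symbol(column):
    """Consensus symbol for one position, given the fragment symbols there."""
    weights = Counter()
    gaps = 0
    for symbol in column.upper():
        if symbol in ".~":
            gaps += 1
        elif symbol in IUB:
            # An ambiguity code is split evenly among its bases (R = 1/2 A + 1/2 G).
            share = Fraction(1, len(IUB[symbol]))
            for base in IUB[symbol]:
                weights[base] += share
    # Assumption: a gap wins only if it strictly outnumbers every base.
    if gaps and (not weights or gaps > max(weights.values())):
        return "."
    if not weights:
        raise ValueError("column has no nucleotide or gap symbols")
    top = max(weights.values())
    tied = frozenset(base for base, weight in weights.items() if weight == top)
    return CODE_FOR[tied]  # one base, or the ambiguity code for the tied bases


print(consensus_symbol("AAG"), consensus_symbol("AG"), consensus_symbol("RA"))
```

Exact fractions are used instead of floating-point numbers so that ties between weighted bases are detected reliably.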

You can edit the consensus sequence associated with each contig in the database with GelAssemble. GelMerge does not use these consensus sequences to find overlaps, but rather determines a new consensus sequence for each contig as described below in "Sequence Simplification". Therefore, any edits you've made to any consensus sequences in GelAssemble are ignored in subsequent sessions with GelMerge.

Sequence Simplification

In finding overlaps among the contigs, GelMerge represents each contig by a simplified consensus sequence, containing only the symbols G, A, T, and C. The consensus symbol at any position in the contig is simply the most common unambiguous sequence symbol among all of the fragments at that position. As previously described, GelMerge treats IUB nucleotide ambiguity symbols in the fragment sequences as weighted representations of their constituent bases in order to generate a consensus. If there is no absolute plurality at a position in the consensus, meaning that two or more unambiguous bases are tied, then the consensus symbol chosen is the one that is present at the highest frequency in GenBank. The relative frequency of bases in GenBank is A>T>G>C. For example, if T and C were tied, the consensus symbol chosen for that position would be T. Gap characters (. and ~) are ignored so they are never found in the simplified consensus sequence.
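The simplification rule, including the GenBank tie-break, can be sketched in Python as follows. The names GENBANK_ORDER, simplified_symbol, and simplified_consensus are hypothetical, and the handling of a position that contains only gap characters (the position is dropped) is an assumption based on the statement that gaps never appear in the simplified consensus.

```python
from fractions import Fraction

# Standard IUB nucleotide ambiguity codes and their constituent bases.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
GENBANK_ORDER = "ATGC"  # relative base frequency in GenBank: A > T > G > C


def simplified_symbol(column):
    """Most common unambiguous base at one position, or None if the
    position holds no nucleotide symbols (for example, only gaps)."""
    weights = dict.fromkeys(GENBANK_ORDER, Fraction(0))
    for symbol in column.upper():
        for base in IUB.get(symbol, ""):  # gaps and other symbols are ignored
            weights[base] += Fraction(1, len(IUB[symbol]))
    if not any(weights.values()):
        return None
    # max() returns the first maximal item, so scanning the bases in
    # GenBank order resolves ties in favor of the more frequent base.
    return max(GENBANK_ORDER, key=weights.__getitem__)


def simplified_consensus(columns):
    """Simplified consensus for a contig given its positions in order."""
    symbols = (simplified_symbol(column) for column in columns)
    return "".join(s for s in symbols if s is not None)


print(simplified_symbol("TC"))  # T and C are tied, so T is chosen
print(simplified_consensus(["AAG", "TC", "..", "CCG"]))
```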

Similarly, in the first step of vector recognition, GelMerge represents each vector and fragment sequence as a simplified sequence containing only the symbols G, A, T, and C. The simplification of ambiguous bases is achieved as described above.

Simplified sequences are used only in the preliminary steps of both contig assembly and vector recognition. Once overlaps are found, the final alignment of two overlapping contigs into a single assembly uses all of the sequence information from all of the fragments in both contigs. In vector recognition, the final alignments, which are both listed in the Report file and used to determine vector sequence excisions, are created using the original vector and fragment sequences.

Surface of Comparison

GelMerge performs a series of pairwise alignments between clusters of fragments (contigs) to create the final contigs. Normally, each pairwise alignment requires enough computer memory for a surface of comparison proportional to the product of the lengths of the two contigs being aligned. However, since GelMerge limits the alignment to the region of overlap between the contigs, a much smaller surface of comparison is required. The amount of memory allocated for the surface of comparison in GelMerge should be sufficient for the automated assembly of almost all sequencing projects.

Memory Requirement

As shipped with the Wisconsin Package(TM), GelMerge requires access to 18 MB of virtual memory.

SUGGESTIONS

Creating Assemblies

By default, GelMerge assembles existing contigs together. As you add new sequences to your project database, you may want to reassemble the entire project "from scratch." With -ARChive, GelMerge reassembles the entire project from the original gel readings, ignoring any sequence edits you have made and contig relationships that you have previously created. This is equivalent to first using GelDisassemble with -ARChive and then using GelMerge; however, using GelMerge with -ARChive as a single step is much faster. With -WORKing, GelMerge assembles the entire project from the individual edited fragments, retaining all edits you've made (base insertions, deletions, and substitutions) but disregarding all of the assemblies you've created. -WORKing removes all gap characters from the edited sequences before they are reassembled.

GelMerge recognizes overlaps based upon the fraction of sequence identity across the entire overlap (see the ALGORITHM topic for more information). Because of the requirement for similarity across the entire overlap, GelMerge may not recognize overlaps between cDNAs and the corresponding genomic sequences if the genomic sequences contain introns. Furthermore, GelMerge may not recognize overlaps between overlapping genomic sequences if they are flanked by vector sequences that are not removed using -EXCise.

Using the default program parameters, GelMerge may fail to recognize weak overlaps among fragments whose sequences are poorly or ambiguously determined. Initially, you should consider using the default parameters with GelMerge to quickly assemble contigs from the fragments with strong overlaps. If you believe that GelMerge has missed some existing overlaps among the contigs in your sequencing project, you may then consider modifying some of these parameters in subsequent sessions with the program.

For example, you might first reduce the minimum fraction of required matching words in an overlap with -STRIngency or in response to the program prompt. Reducing this requirement won't greatly increase the time required for GelMerge to complete. However, greatly reducing this requirement may result in incorrect assemblies.

If you believe that some overlaps may not contain even one block of 14 contiguous sequence identities, you might reduce this requirement with -MINIdentity. Reducing this requirement may significantly increase the time required for GelMerge to complete.

If you believe that fragment sequences are so poorly determined that overlaps may not consist mostly of runs of at least 7 contiguous identical bases, you might reduce the word size in response to the What word size program prompt. Reducing the word size may greatly increase the time required for GelMerge to complete and is not recommended.

Removing Vector Sequences

GelMerge only checks for vector sequences in unaligned, single-fragment contigs. While it recognizes and reports vector sequences located at any position within the fragment, GelMerge can automatically remove only vector sequences found near the ends of a fragment. By default, GelMerge removes vector sequences from a fragment if they begin within 12 bases from either end of the fragment. For instance, let's say the alignment between fragment and vector begins at position 5 and ends at position 40 in the fragment. With -EXCise GelMerge would remove the first 40 bases in the fragment.
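The excision rule in the example above can be sketched in Python. The text does not spell out exactly how "within 12 bases" is counted or what happens when a match lies near both ends, so the choices below are assumptions: a match is near an end if it starts at or before position 12 counted from that end, and a match near both ends removes the whole fragment. The function name excise_vector is hypothetical.

```python
def excise_vector(fragment, start, end, max_distance=12):
    """Remove a vector match from a fragment if it lies near an end.

    `start` and `end` are the 1-based, inclusive positions of the
    vector alignment within the fragment. The fragment is returned
    unchanged if the match is not near either end.
    """
    near_left = start <= max_distance
    near_right = len(fragment) - end + 1 <= max_distance
    if near_left and near_right:
        return ""  # assumption: the whole fragment is treated as vector
    if near_left:
        return fragment[end:]  # drop everything through the match
    if near_right:
        return fragment[:start - 1]  # drop the match and everything after it
    return fragment


fragment = "ACGT" * 25  # a 100-base placeholder fragment
trimmed = excise_vector(fragment, 5, 40)
print(len(trimmed))  # the first 40 bases are removed, leaving 60
```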

If you've specified more than one vector sequence in GelStart, GelMerge finds matches between all vector sequences and the project fragments. However, GelMerge excises only those vector sequences found near the ends of the fragments. For instance, if vectorA sequences lie at the end of a fragment and vectorB sequences lie farther inside the fragment, beyond the vectorA sequences, you could remove only vectorA sequences from the fragment in a single session with GelMerge. To remove all vector sequences, you must first excise the vectorA sequences using -EXCise and -NOMERge. Then, in the next session with GelMerge, you would find vectorB sequences that you could excise near the end of the fragment.

You can preview which vector sequences would be removed from single-fragment contigs in your project database without actually removing those sequences or aligning the contigs. If you use -REPortfile and -NOMERge, GelMerge generates a file of matches between the vector and fragment sequences without making any modifications to your sequencing project database. The excisions that GelMerge would have made if you had used -EXCise are clearly marked above the appropriate alignments in the Report file.

After previewing the GelMerge decisions for vector sequence removal, you could alter the parameters for vector removal if you disagree with those decisions. For instance, the Report file may indicate that GelMerge would remove a 10 base sequence at the beginning of a fragment. If you don't want GelMerge to remove these bases, you can choose to increase the minimum run of contiguous identities that must be found at least once in a match with -VECTORMINIdentity (see the ALGORITHM topic above). Alternatively, instead of allowing GelMerge to automatically remove vector sequences from your project fragments, you could use the alignments listed in the Report file as a guide to make manual excisions in GelAssemble.

If a project fragment contains only vector sequences, GelMerge writes a zero length contig into the project database after removing all bases from the contig. While this zero length fragment provides a record of the original fragment, you may eventually want to remove it from your sequencing project database. You can do this in GelAssemble with the : ERASE command (see the GelAssemble entry in the Program Manual). GelMerge notifies you if your project contains any zero length contigs in the screen summary displayed at the end of the program.

Execution Speed and the Batch Queue

GelMerge uses an algorithm in which computation time is approximately proportional to the number of contigs multiplied by the total length of all contigs. For example, using the default program parameters, it takes a DEC 5000/300 about one minute of CPU time to assemble a project containing 300 fragments with an average length of 400 bases (119,994 total bases) into a single contig of length 21,139. As another example, it takes about 25 seconds of CPU time to assemble a project containing 200 fragments with an average length of 300 bases (60,018 total bases) into five contigs. In a subsequent session with GelMerge, by reducing the required fraction of word matches in an overlap from 0.8 to 0.7, it takes about 7 seconds of CPU time to assemble the five contigs into a single contig of length 10,716.

You may want to consider running GelMerge in the batch queue for large sequencing projects. You can specify that this program run at a later time in the batch queue by using -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide. Very large assemblies may exceed the CPU limit set by some systems.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % gelmerge -Default

Prompted Parameters:

-WORdsize=7           sets word size for overlap determination
-STRIngency=0.8       sets minimum fraction of matching words in overlap
-MINOverlap=14        sets minimum length of overlap

Local Data Files:

-MATRix1=gelmergedna.cmp       assigns the scoring matrix for contig assembly
-MATRix2=gelmergelocaldna.cmp  assigns the scoring matrix for vector recognition

Optional Parameters:

-MINIdentity=14         sets minimum run of identical bases found at least
                          once in an overlap between two contigs
-MAXGap=10              sets maximum gap size for overlap determination
-GAPweight=8            sets gap creation penalty in contig assembly
-LENgthweight=2         sets gap extension penalty in contig assembly
-ARChive                creates contigs from the original gel readings
-WORKing                creates contigs from individual working
                          fragments (with gaps removed)
-REPortfile[=Filename]  writes report of recognized vector sequences
-EXCise                 removes vector sequences from single-fragment
                          contigs
-VECTORSTringency=0.8   sets minimum fraction of matches in vector recognition
-VECTORMINIdentity=12   sets minimum run of identical bases found at least
                          once in a match between vector and fragment
-VECTORMAXGap=5         sets maximum gap size in first step of vector
                          recognition
-VECTORGAPweight=30     sets gap creation penalty in vector recognition
-VECTORLENgthweight=3   sets gap extension penalty in vector recognition
-NOMERge                suppresses contig assembly
-NOMONitor              suppresses screen trace of program progress
-NOSUMmary              suppresses screen summary at the end of the
                          program
-BATch                  submits program to the batch queue

ACKNOWLEDGEMENTS

GelMerge was written by Irv Edelman.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.

GelMerge uses the scoring matrix file gelmergedna.cmp to align overlapping contigs into a single assembly. GelMerge uses the scoring matrix file gelmergelocaldna.cmp to align fragment and vector sequences as the final step in vector sequence recognition. Neither scoring matrix contains values for the comparison of IUB nucleotide ambiguity symbols, but nucleotide ambiguity symbols are completely supported in these GelMerge alignments. Any ambiguity symbol is converted into a weighted representation of its constituent bases within GelMerge. For instance, GelMerge treats an R as half A and half G for the purpose of alignment.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-WORdsize=7

sets the word size for overlap determination. This is the minimum contiguous length of matching nucleotides that is needed to extend an overlap between two sequences.

-STRIngency=0.8

sets the minimum fraction of matching words in an overlap to be considered for assembly into a contig.

-MINOverlap=14

sets the minimum length of an overlap to be considered for assembly into a contig.

-MATRix=mymatrix.cmp

allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData.

For more information see the Local Scoring Matrices section.

Use -MATRix1 to specify a different scoring matrix for contig assembly and -MATRix2 to specify a different scoring matrix for vector recognition.

-MINIdentity=14

sets the minimum size for the long block of contiguous sequence identities that must be found at least once in an overlap between two contig consensus sequences (see the ALGORITHM topic for more information).

-MAXGap=10

In overlap determination, sets the maximum amount by which an approximate alignment can deviate from registers of comparison containing long blocks of sequence identities (see the ALGORITHM topic for more information).

-GAPweight=8

sets the gap creation penalty for the alignment of contigs during the assembly.

-LENgthweight=2

sets the gap extension penalty for the alignment of contigs during the assembly.

-ARChive

reassembles the entire project from the original gel readings. All current contig relationships and edits to fragments are lost.

-WORKing

reassembles the entire project from the individual, edited fragments. GelMerge saves all your edits to the fragment sequences, but all current contig relationships are lost. Before reassembly, the program removes all gap characters from the project sequences.

-REPortfile=myproject.report

writes an output file with sequence alignments of matching regions between vector and fragment sequences. The output file uses the project name as the file name and .report as the file name extension unless you set it to something else.

-EXCise

removes vector sequences from the ends of unaligned project fragments (single-fragment contigs). This parameter also writes an output file with sequence alignments of matching regions between vector and fragment sequences unless you suppress it by also specifying -NOREPortfile. The output file uses the project name as the file name and .report as the file name extension unless you set it to something else with -REPortfile.

-VECTORSTringency=0.8

sets the minimum fraction of sequence identity in a match between fragment and vector sequences that is required for vector sequences to be excised from the fragment.

-VECTORMINIdentity=12

sets the minimum size for the long block of contiguous sequence identities that must be found at least once in a match between fragment and vector sequences (see the ALGORITHM topic for more information).

-VECTORMAXGap=5

In the first step of vector recognition, sets the maximum amount by which an approximate alignment can deviate from registers of comparison containing long blocks of sequence identities (see the ALGORITHM topic for more information).

-VECTORGAPweight=30

sets the gap creation penalty for the alignment of vector and contig sequences during the second step of vector recognition (see the ALGORITHM topic for more information).

-VECTORLENgthweight=3

sets the gap extension penalty for the alignment of vector and contig sequences during the second step of vector recognition (see the ALGORITHM topic for more information).

-NOMERge

suppresses contig assembly.

-MONitor

shows the progress of GelMerge on your screen. Use this parameter to see this same monitor in the log file for a batch process. If the monitor slows down the program because your terminal is connected to a slow modem, suppress it by including -NOMONitor.

-SUMmary

writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

Printed: December 9, 1998 16:26 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com