Fragment Assembly System

Introduction

The GCG Fragment Assembly System (FAS) is a series of programs that help you assemble the overlapping fragment sequences from a sequencing project. Specifically, the FAS enables you to: 1) store fragment sequences; 2) recognize overlapping sequences and create aligned assemblies, called contigs; and 3) display and edit the contigs.

The sequence data for a sequencing project are maintained and manipulated in a sequencing project database. Using the six programs of the FAS, you create and maintain a separate project database for each sequencing project. You create project databases with GelStart, enter fragment sequences with GelEnter, assemble contigs with GelMerge, and edit assembled contigs with GelAssemble. As you enter new sequence data with GelEnter and run GelMerge and GelAssemble repeatedly, your project evolves from a collection of short sequence fragments into a final contig that represents the entire underlying genomic sequence from which all fragments in the project were derived. At any time, you can create a schematic display of the current state of your project database with GelView. If you want to break up the contigs in a sequencing project, GelDisassemble recreates the database as single, unassembled fragments.

GelStart

Use GelStart to create a new project database for each sequencing project. For each new project, GelStart creates a new directory, named after the project, as a subdirectory of your current working directory. For example, if you create a new sequencing project named myproject, a new directory named myproject appears in your current working directory.

You must also run GelStart each time you want to begin work on an existing project to lock in the appropriate sequencing project database. Once a database is locked in, all Fragment Assembly programs work on the data in the corresponding sequencing project. The FAS remains locked in to the same project database until you either run GelStart again to lock in a different project or log off from the computer.

GelEnter

After locking in a sequencing project database with GelStart, use GelEnter to enter fragment sequences into the project database. GelEnter is a sequence editor that accepts sequence data from: 1) a terminal keyboard; 2) a digitizer; or 3) existing sequence files. You can enter new sequences at any time; they do not all have to be entered when you first create the project. Once you enter sequences into a project database, you can no longer edit them with GelEnter. You can edit the sequences later with GelAssemble.

GelMerge

After entering sequences into a project database, use GelMerge to assemble contigs of aligned sequences from the overlapping fragments in the sequencing project. GelMerge automatically recognizes overlaps among all of the sequences in a project database and creates aligned assemblies, called contigs, from the overlapping sequences. These contigs are stored in the project database. As you add new sequences that connect separate contigs to the project database, GelMerge aligns the contigs into larger assemblies. GelMerge can also automatically remove vector sequences from the individual fragment sequences.
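The idea of recognizing an overlap between two fragments can be illustrated with a minimal Python sketch. It is not the method implemented in GelMerge, which this manual does not describe. It finds only exact matches between the end of one fragment and the start of another, and it ignores mismatches, gaps, reverse-complement orientation, and vector removal. The function names and the min_len parameter are hypothetical.

```python
from typing import Optional


def suffix_prefix_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `left` that exactly
    matches a prefix of `right`, or 0 if none reaches `min_len`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink from there.
    for size in range(max_len, min_len - 1, -1):
        if size > 0 and left[-size:] == right[:size]:
            return size
    return 0


def merge_pair(left: str, right: str, min_len: int = 3) -> Optional[str]:
    """Join two fragments over their exact overlap (illustrative only).

    Returns None when the fragments do not overlap by at least `min_len`.
    """
    size = suffix_prefix_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep `left` whole and append the part of `right` past the overlap.
    return left + right[size:]


if __name__ == "__main__":
    print(merge_pair("ACGTTGCA", "TGCAGGTC"))
```

Two fragments that share no sufficient overlap return None here, which corresponds to the case where they remain in separate contigs.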

GelAssemble

After assembling contigs with GelMerge, use the contig editor, GelAssemble, to review and modify the alignments. After you choose a contig for review, GelAssemble lets you edit the individual sequences in that contig to resolve inconsistencies. GelAssemble creates a consensus sequence that uses the IUB nucleotide ambiguity codes (see Appendix III of this manual). You can modify a sequence and change the alignment in the same way you edit text with a text editor. Although GelMerge assembles and aligns contigs automatically, you can also assemble contigs manually using GelAssemble. For example, you could manually assemble separate contigs that do not share sufficient overlap for GelMerge to assemble automatically. You can also separate fragments from a contig if you believe they should not be included.

Once you are satisfied with a contig, you can store it in the sequencing project database.
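The IUB ambiguity codes used in a consensus sequence can be illustrated with a short Python sketch. It assumes that the consensus symbol for a column is simply the IUB code for the set of bases found in that column, and that any character other than A, C, G, or T (such as a gap) is ignored. The manual does not say how GelAssemble actually derives its consensus, so treat this only as an illustration of the codes. The names IUB_CODES, GAP, consensus_symbol, and consensus are hypothetical.

```python
from typing import Sequence

# Standard IUB nucleotide ambiguity codes, keyed by the sorted set of bases.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "AC": "M", "GT": "K", "CG": "S", "AT": "W",
    "ACT": "H", "CGT": "B", "ACG": "V", "AGT": "D",
    "ACGT": "N",
}

GAP = "."  # assumption: symbol emitted when a column holds no bases at all


def consensus_symbol(column: str) -> str:
    """Return the IUB code covering every A, C, G, or T in one column."""
    bases = "".join(sorted(set(column.upper()) & set("ACGT")))
    return IUB_CODES.get(bases, GAP)


def consensus(rows: Sequence[str]) -> str:
    """Build a consensus from aligned rows of equal length."""
    return "".join(consensus_symbol("".join(col)) for col in zip(*rows))


if __name__ == "__main__":
    aligned = ["ACGT.A", "ACAT.A", "ACGTTA"]
    print(consensus(aligned))
```

In the third column of the example the fragments disagree between G and A, so the sketch reports the ambiguity code R. This is the kind of inconsistency you would review and resolve in the editor.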

GelView

GelView displays bar diagrams that show the overlaps among the fragments in each contig, providing a schematic view of the whole sequencing project.
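A bar diagram of this kind can be approximated in a few lines of Python. The sketch below does not reproduce the actual GelView display. It assumes that each fragment is described by a name, a zero-based start position in the contig, and a length, and the function name bar_diagram is hypothetical.

```python
from typing import List, Tuple


def bar_diagram(fragments: List[Tuple[str, int, int]]) -> str:
    """Draw one text bar per fragment.

    Each fragment is (name, start, length), where start is the assumed
    zero-based offset of the fragment within the contig.
    """
    if not fragments:
        return ""
    # Pad names to a common width so the bars line up.
    width = max(len(name) for name, _, _ in fragments)
    lines = []
    for name, start, length in fragments:
        bar = " " * start + "=" * length
        lines.append(name.ljust(width) + " " + bar)
    return "\n".join(lines)


if __name__ == "__main__":
    print(bar_diagram([("frag1", 0, 12), ("frag2", 8, 10), ("frag3", 15, 9)]))
```

Bars that share columns correspond to fragments that overlap in the contig.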

GelDisassemble

GelDisassemble breaks up the contigs in a sequencing project, thus recreating the database as a collection of single fragments.

Structure of a Fragment Assembly Database

You do not have to understand the structure of a fragment assembly database to successfully use the FAS. All of the programs access and manipulate the project database in a manner that is transparent to you. This description of the database is, therefore, just for those who want to know more about the Fragment Assembly System.

The data in the FAS are stored as text files in a group of subdirectories. This makes the database vulnerable to corruption, since you can edit, delete, and rename any of the files in the database with UNIX commands. Use only GelMerge and GelAssemble to modify these files. Do not manipulate any file in the database with a text editor!

A fragment assembly database consists of a command directory, with the same name as your project, and four data subdirectories below it: archive, working, consensus, and relation. The FAS stores the data for each fragment in separate files in each of these subdirectories. A newly entered fragment becomes a new contig before it is assembled, and is represented by a new file in each of these four subdirectories.

- The archive directory stores the original fragment sequences that you entered into the database with GelEnter. The FAS never modifies the files in this directory.

- The working directory contains the same fragment sequences as the archive, but with all of the gap insertions and edits that were made to assemble the fragments into contigs.

- The consensus directory has a consensus sequence file for each contig in the project database. Each contig is named after the left-most fragment in the alignment. Newly entered fragments and other unassembled fragments are also considered contigs and have a consensus sequence in the consensus directory. Because they do not yet align with any other contig, they are called contigs-of-one or single-fragment contigs.

- The relation directory contains a file for each contig that lists the orientation, position, and length of each fragment in the contig.

In addition to these subdirectories, the command directory also contains a copy of each cloning vector specified in GelStart as well as command initializing files for GelEnter, GelMerge, and GelAssemble.
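The layout described above can be checked without touching any file in the database. The following Python sketch only tests whether the four data subdirectories named in this section exist below a project directory. It reads nothing from the files and changes nothing. The function name missing_subdirs and the example path myproject are hypothetical.

```python
from pathlib import Path
from typing import List

# Data subdirectories named in this section of the manual.
DATA_SUBDIRS = ("archive", "working", "consensus", "relation")


def missing_subdirs(project_dir: str) -> List[str]:
    """Return the data subdirectories that are absent from a project.

    This is a read-only check: it never opens, edits, renames, or
    deletes anything inside the project database.
    """
    root = Path(project_dir)
    return [name for name in DATA_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("myproject")  # hypothetical project name
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("All four data subdirectories are present.")
```

An empty result means only that the expected directories are present. It says nothing about whether the files inside them are consistent.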

Acknowledgement

Dr. Roger Staden's pioneering work remains the basis of all work on fragment assembly. GelAssemble comes from the MSE editor written by Dr. William Gilbert. Irv Edelman developed the method of fragment assembly implemented in GelMerge.

We are very grateful to those of you who have taken the time to learn the system and give us useful suggestions. We appreciate your time and hope that implementing your suggestions expresses our gratitude.

Printed: December 9, 1998 16:25 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com