BREAKUP*


Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
INPUT FILES
RELATED PROGRAMS
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION


BreakUp reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Wisconsin Package programs.

DESCRIPTION


By default, this program converts a user sequence longer than 350,000 bases into a set of sequences, none longer than 110,000 bases. It breaks the input sequence at 100,000-base boundaries and extends each output file by up to 10,000 bases of overlap with the next segment. You can change these sizes with -SEGmentsize and -OVErlap.
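The segment arithmetic can be sketched in Python as follows. This is a hypothetical illustration, not BreakUp's source: the names plan_segments and output_names are invented, and the rule that stops as soon as a segment reaches the end of the sequence is an assumption, since this manual does not say how a very short final remainder is handled. The output names follow the pattern seen in the example session (lengthy_0.seq, lengthy_1.seq, ...).

```python
from typing import List, Tuple


def plan_segments(length: int, segment: int = 100_000,
                  overlap: int = 10_000) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) ranges for each output piece.

    Hypothetical sketch: a piece starts every `segment` bases and extends
    `overlap` bases further, clipped at the end of the sequence.
    """
    if overlap < 0 or segment <= overlap:
        raise ValueError("segment size must exceed a non-negative overlap")
    pieces = []
    start = 0
    while True:
        end = min(start + segment + overlap, length)
        pieces.append((start, end))
        if end == length:  # assumption: stop once the sequence end is reached
            break
        start += segment
    return pieces


def output_names(stem: str, count: int, ext: str = ".seq") -> List[str]:
    # Naming pattern inferred from the example session: lengthy_0.seq, ...
    return [f"{stem}_{i}{ext}" for i in range(count)]


if __name__ == "__main__":
    pieces = plan_segments(600_000)
    for name, (start, end) in zip(output_names("lengthy", len(pieces)), pieces):
        print(f"{name}  length: {end - start} bp")
```

For a 600,000-base sequence with the default sizes, this sketch yields five 110,000-base pieces followed by one 100,000-base piece, matching the lengths in the example session.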

EXAMPLE


Here is a session using BreakUp to convert the user sequence lengthy.seq, of length 600,000 bases, to a set of six output sequence files, each with no more than 110,000 bases.


% breakup

 BREAKUP what file(s) ?  lengthy.seq
        lengthy_0.seq  length: 110000 bp
        lengthy_1.seq  length: 110000 bp
        lengthy_2.seq  length: 110000 bp
        lengthy_3.seq  length: 110000 bp
        lengthy_4.seq  length: 110000 bp
        lengthy_5.seq  length: 100000 bp

%

INPUT FILES


BreakUp accepts a single sequence or multiple sequences as input. You can specify multiple sequences in several ways: with a list file, for example @project.list; with an MSF or RSF file, for example project.msf{*}; or with a sequence specification containing an asterisk (*) wildcard, for example GenEMBL:*.

How BreakUp processes your input depends on whether your sequence(s) are protein or nucleotide. Programs determine the type of a sequence from the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, see Appendix VI for information on how to change or set the type of a sequence.
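The type check described above can be sketched as follows. This is an illustrative Python sketch only: sequence_type is a hypothetical name, and it assumes that the last line of the heading is the first line ending in two periods (..), the dividing line used in GCG single-sequence files.

```python
import re
from typing import Optional


def sequence_type(text: str) -> Optional[str]:
    """Return "N", "P", or None from the heading of a GCG-format sequence file.

    Assumption: the last heading line is the first line ending in "..",
    and the type appears on it as "Type: N" or "Type: P".
    """
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            match = re.search(r"\bType:\s*([NP])\b", line)
            return match.group(1) if match else None
    return None  # no dividing line found, so the type cannot be determined
```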

RELATED PROGRAMS


Replace, CompressText, OneCase, ShiftOver, DeTab, ChopUp, LPrint, and ListFile are the Wisconsin Package file utility programs.

CONSIDERATIONS


Sequence files prepared with a text editor or brought to your computer from other sources may contain lines longer than 511 characters. Such files must be converted with ChopUp before BreakUp can read them.
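A quick way to find out whether a file needs ChopUp is to look for overlong lines. The Python below is a hypothetical helper (overlong_lines is an invented name), and it assumes the 511-character limit counts characters excluding the line terminator; it only reports offending lines and does not reproduce what ChopUp does.

```python
from typing import List

MAX_LINE = 511  # limit stated in this manual


def overlong_lines(text: str, limit: int = MAX_LINE) -> List[int]:
    """Return 1-based numbers of lines longer than `limit` characters.

    Assumption: the line terminator is not counted toward the limit.
    A file with any such lines would need ChopUp before BreakUp reads it.
    """
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]
```

For example, overlong_lines(open("lengthy.seq").read()) returns an empty list when no conversion is needed.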

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
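The abbreviation rule just described can be sketched as a prefix match. This Python is a hypothetical illustration, not GCG's command-line parser: the function matches is invented, and treating any longer prefix of the full parameter name as acceptable is an assumption.

```python
import string


def matches(token: str, name: str) -> bool:
    """Decide whether a command-line token selects the parameter `name`.

    `name` is written as in the summary, e.g. "NOMONitor": its leading
    capital letters are the minimum abbreviation. Any "=value" is ignored,
    and the comparison is case-insensitive.
    """
    word = token.lstrip("-").split("=", 1)[0].lower()
    required = len(name) - len(name.lstrip(string.ascii_uppercase))
    return len(word) >= required and name.lower().startswith(word)
```

Under this sketch, -nomon and -NOMONITOR both select NOMONitor, while -nomo is too short; -bla=2 selects BLAnklines but not BLOcksize.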


Minimal Syntax: % breakup [-INfile=]breakup.txt -Default

Prompted Parameters:  None

Local Data Files:   None

Optional Parameters:

-NOMONitor                suppresses the screen trace showing each file
-LINesize=50              sets number of characters per line
-BLOcksize=10             sets number of characters per block
-BLAnklines=1             puts blank lines between the sequence lines
-SEGmentsize=100000       sets number of nonoverlapping bases per segment
-OVErlap=10000            sets number of overlapping bases per segment
-NONUMbering              suppresses numbering
-NOCOMments               suppresses comments
-PROtein                  insists that the sequences are reformatted as
                            protein sequences
-NUCleotide               insists that the sequences are reformatted as
                            nucleic acid sequences
[-OUTfile=]newseqname     lets you name the output file
-EXTension=.seq           defines a file name extension

LOCAL DATA FILES


None.

PARAMETER REFERENCE


You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-LINesize=50

lets you set the number of sequence characters per line to any number between 1 and 120.

-BLOcksize=10

lets you set the number of sequence characters in each block to any number between 1 and the line size.

-BLAnklines=1

sets the number of blank lines (zero or more) left between the sequence lines.
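The effect of -LINesize, -BLOcksize, -BLAnklines, and -NONUMbering on the output layout can be sketched as below. This Python is a hypothetical approximation: format_sequence is an invented name, and the 8-column right-justified position number followed by two spaces is an assumed layout, not GCG's exact column widths. Only the range checks come from this manual.

```python
from typing import List


def format_sequence(seq: str, linesize: int = 50, blocksize: int = 10,
                    blanklines: int = 1, numbering: bool = True) -> List[str]:
    """Lay out `seq` in lines of `linesize` characters split into blocks.

    Assumed layout: an 8-column right-justified start position, two spaces,
    then space-separated blocks; `blanklines` empty lines follow each line.
    """
    if not 1 <= linesize <= 120:
        raise ValueError("linesize must be between 1 and 120")
    if not 1 <= blocksize <= linesize:
        raise ValueError("blocksize must be between 1 and linesize")
    if blanklines < 0:
        raise ValueError("blanklines must be zero or more")
    out: List[str] = []
    for start in range(0, len(seq), linesize):
        chunk = seq[start:start + linesize]
        blocks = " ".join(chunk[i:i + blocksize]
                          for i in range(0, len(chunk), blocksize))
        prefix = f"{start + 1:>8}  " if numbering else ""  # -NONUMbering drops it
        out.append(prefix + blocks)
        out.extend([""] * blanklines)
    return out
```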

-SEGmentsize=100000

lets you set the number of non-overlapping sequence characters in each output file to any number greater than the overlap and less than 350,000.

-OVErlap=10000

lets you set the number of overlapping sequence characters in each output file to any number from 0 up to, but not including, the segment size. The sum of the segment size and the overlap must also be less than 350,000.
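The constraints on -SEGmentsize and -OVErlap can be checked as follows. This is a hypothetical Python sketch (check_sizes is an invented name), and it assumes an overlap of exactly 0 is allowed. Because the overlap is never negative, requiring the sum to stay below 350,000 also keeps the segment size itself below 350,000.

```python
LIMIT = 350_000  # upper bound stated in this manual


def check_sizes(segment: int, overlap: int) -> None:
    """Raise ValueError if -SEGmentsize/-OVErlap values break the documented rules."""
    # Assumption: an overlap of 0 is permitted; it must be smaller than the segment.
    if not 0 <= overlap < segment:
        raise ValueError("overlap must be at least 0 and smaller than the segment size")
    if segment + overlap >= LIMIT:
        raise ValueError("segment size plus overlap must be less than 350,000")
```

The defaults (100,000 and 10,000) pass; a 300,000-base segment with a 50,000-base overlap is rejected because the sum reaches the limit.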

-NONUMbering

suppresses the numbering next to each sequence line.

-NOCOMments

suppresses any comments that may have been in the input sequence file.

-PROtein

sets the sequence type to protein.

-NUCleotide

sets the sequence type to nucleotide.

-OUTfile=newseqname

selects an output filename other than the name of the input file.

-EXTension=.seq

selects a filename extension other than the input filename extension.

Printed: December 9, 1998 16:27 (1162)



Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com