NOOVERLAP

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
CONSIDERATIONS
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENT
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

NoOverlap identifies the regions in a group of nucleotide sequences that share no common subsequences with any other sequence in the group.

DESCRIPTION

This program determines whether a group of nucleotide sequences contains regions that share no common subsequences with the other sequences in the group. Witkiewicz, Bolander, and Edwards assert that hybridization probes specific enough to detect individual members of a gene family can be prepared if a region 100 bases or longer can be found that does not have a perfect match of nine or more bases with any other member of the family (BioTechniques 14(3): 458-463). NoOverlap is designed to find out whether such regions occur in a group of sequences.

To use NoOverlap, you name a group of related sequences in which you want to find regions that do not share any 9-mer with any other sequence in the group. The resulting output is a list of the sequences that have such regions and the coordinates of the regions where no common 9-mers occur.
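
The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NoOverlap's actual implementation: the function name shared_free_regions and its arguments are hypothetical, only the top strand is compared, and a region is taken to be a maximal run of bases not covered by any word that also occurs in another sequence of the group.

```python
def shared_free_regions(seqs, word_size=9, min_length=100):
    """Return {name: [(start, end), ...]} using 1-based, inclusive coordinates.

    seqs maps a sequence name to its bases (hypothetical input format).
    """
    # Collect the set of words (k-mers) present in each sequence.
    words = {
        name: {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
        for name, seq in seqs.items()
    }
    result = {}
    for name, seq in seqs.items():
        # Words that occur in any OTHER sequence of the group.
        others = set().union(*(w for n, w in words.items() if n != name))
        # Mark every base that lies inside a word shared with another sequence.
        covered = [False] * len(seq)
        for i in range(len(seq) - word_size + 1):
            if seq[i:i + word_size] in others:
                for j in range(i, i + word_size):
                    covered[j] = True
        # Report maximal uncovered runs that are long enough.
        regions = []
        start = None
        for pos, hit in enumerate(covered + [True]):  # sentinel closes the last run
            if not hit and start is None:
                start = pos
            elif hit and start is not None:
                if pos - start >= min_length:
                    regions.append((start + 1, pos))
                start = None
        result[name] = regions
    return result
```

The default values of word_size and min_length mirror the program's prompted defaults of 9 and 100.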

EXAMPLE

Here is a session using NoOverlap to find all of the regions of length 100 or greater that contain no common 9-mers in the sequences named in the file of sequence names inhibit.list.


% nooverlap

 (Double-stranded) NOOVERLAP among what sequences ?  @inhibit.list

 What is the word size (* 9 *) ?

 What minimum region length with no 9-mers (* 100 *) ?

 What should I call the output file (* nooverlap.dat *) ?

   Reading ..
 Comparing ..

 NOOVERLAP complete!

             Sequences:       2
          Total Length:   1,844
        Common  9-mers:      22
 Regions of no overlap:       7

%

OUTPUT

NoOverlap writes an output file that lists, for every sequence, all of the non-overlapping regions that meet your requirements for word size and length. Here is the output file from this session:


 (Double-stranded) NOOVERLAP of: @inhibit.list  October 8, 1998 10:57

 Window: 9  Minimum No-hit region: 100  Sequences: 2

Sequence   Ranges     ..

X03124
               1-116
             422-583
             593-772

J05593
               1-195
             275-402
             493-599
             691-790
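
If you want to process the ranges in another program, a file laid out like the one above could be read with a short Python sketch such as the following. The function name parse_nooverlap is hypothetical, and the parser assumes only the layout visible in this sample: a header that ends at the line finishing with "..", sequence names starting in the first column, and indented start-end ranges. Files that differ from this sample may need adjustments.

```python
import re

# An indented line of the form "   422-583".
RANGE = re.compile(r"^\s+(\d+)-(\d+)\s*$")

def parse_nooverlap(text):
    """Return {sequence_name: [(start, end), ...]} from output-file text."""
    regions = {}
    current = None
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip the header; it ends at the line that finishes with "..".
            in_body = line.rstrip().endswith("..")
            continue
        if not line.strip():
            continue
        m = RANGE.match(line)
        if m and current is not None:
            regions[current].append((int(m.group(1)), int(m.group(2))))
        elif not line[0].isspace():
            # A line starting in the first column names a new sequence.
            current = line.strip()
            regions[current] = []
    return regions

SAMPLE = """\
 (Double-stranded) NOOVERLAP of: @inhibit.list  October 8, 1998 10:57

 Window: 9  Minimum No-hit region: 100  Sequences: 2

Sequence   Ranges     ..

X03124
               1-116
             422-583
             593-772

J05593
               1-195
             275-402
             493-599
             691-790
"""

parsed = parse_nooverlap(SAMPLE)
```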

INPUT FILES

NoOverlap accepts multiple (two or more) nucleotide sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. If NoOverlap rejects your nucleotide sequence, see Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

Compare compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.

RESTRICTIONS

NoOverlap works only with nucleotide sequences. The total of all sequence lengths cannot be greater than 350,000 bases.

CONSIDERATIONS

If your setting for the minimum region length without an n-mer is greater than the length of the longest sequence in the set of sequences you search, NoOverlap adjusts it downwards to the length of the longest sequence in the group.
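
As a minimal sketch of this adjustment (the function name effective_min_length is hypothetical and is not part of NoOverlap), the effective minimum is the smaller of your setting and the longest sequence length:

```python
def effective_min_length(requested, sequence_lengths):
    """Clamp the requested minimum region length to the longest sequence."""
    longest = max(sequence_lengths)
    return min(requested, longest)
```

For example, a requested minimum of 100 with sequences of 60 and 80 bases would be lowered to 80.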

NoOverlap converts each ambiguity code to a single, unambiguous base before it compares sequences, so different ambiguity codes do not necessarily match one another. Two ambiguity codes match only if they have been converted to the same unambiguous base.

RNA and DNA are treated the same way; that is, T is equivalent to U.
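
The effect of these two rules can be illustrated with a short Python sketch. The mapping below is an assumption made purely for illustration: this document does not say which unambiguous base NoOverlap substitutes for each ambiguity code, so the sketch arbitrarily takes the first base of each code's standard IUPAC set. The function name normalize is likewise hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def normalize(seq):
    """Return seq in upper case with U as T and ambiguity codes collapsed.

    ASSUMPTION: each ambiguity code becomes the first base of its IUPAC set;
    the real substitution used by NoOverlap is not documented here.
    """
    out = []
    for base in seq.upper():
        if base == "U":
            base = "T"  # RNA and DNA are treated the same way
        out.append(IUPAC.get(base, base)[0])
    return "".join(out)
```

Under this illustrative mapping, R and M both become A and would therefore match, while R and Y become A and C and would not.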

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % nooverlap [-INfile1=]@inhibit.list -Default

Prompted Parameters:

-WORdsize=9                 sets length of words that must not occur
-MINlength=100              sets minimum size of region with no common words
[-OUTfile=]nooverlap.dat    names the output file

Local Data Files: None

Optional Parameters:

-ONEstrand   searches only the top strand of your sequences
-NOMONitor   suppresses the screen trace: "Reading ..."
-NOSUMmary   suppresses the summary at the end of the program
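
The capitalization convention used in this summary (the capitalized letters of a parameter name are the ones you must type) can be illustrated with a small Python sketch. The helper names required_prefix and matches are hypothetical. The sketch also assumes that matching ignores case and that any extra letters you type must continue the full parameter name; neither assumption is stated in this document, and parameter values such as =9 are not handled.

```python
def required_prefix(name):
    """Return the leading capitalized letters of a parameter name.

    For example, "-WORdsize" gives "WOR".
    """
    body = name.lstrip("-")
    n = 0
    while n < len(body) and body[n].isupper():
        n += 1
    return body[:n]

def matches(name, typed):
    """Check whether typed is an acceptable abbreviation of name.

    ASSUMPTIONS: comparison is case-insensitive, and letters typed beyond
    the required prefix must spell out the rest of the full name.
    """
    full = name.lstrip("-").lower()
    got = typed.lstrip("-").lower()
    need = required_prefix(name).lower()
    return got.startswith(need) and full.startswith(got)
```

Under these assumptions, -wor and -wordsize would both select -WORdsize, while -wo would be too short.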

ACKNOWLEDGEMENT

NoOverlap was written by John Devereux in collaboration with Dr. Halina Witkiewicz at the Mayo Clinic.

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-WORdsize=9

sets the word size.

-MINlength=100

sets the minimum length of the region that must contain no word matches among the sequences in the specified list.

-ONEstrand

searches only for regions in the top strand of each of your sequences.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary is normally displayed at the end of a program that is run interactively; you can suppress it with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.
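
The difference between the default double-stranded search and -ONEstrand can be pictured with the following Python sketch. How NoOverlap actually handles the second strand is not described in this document; the sketch simply assumes that a double-stranded comparison also considers the words of the reverse complement. The names reverse_complement and strand_words are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def strand_words(seq, word_size=9, one_strand=False):
    """Return the set of words found on the top strand, or on both strands.

    ASSUMPTION: a double-stranded search adds the reverse-complement words.
    """
    strands = [seq] if one_strand else [seq, reverse_complement(seq)]
    return {
        s[i:i + word_size]
        for s in strands
        for i in range(len(s) - word_size + 1)
    }
```

With one_strand=True the set holds only top-strand words, which corresponds to the behavior described for -ONEstrand.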

Printed: December 9, 1998 16:23 (1162)



Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group, Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com