
Local alignment

The local alignment routine is based around the program sim.c by Huang and Miller, which is an implementation of the Smith-Waterman algorithm (Huang, X.Q. & Miller, W. A Time-Efficient, Linear-Space Local Similarity Algorithm. Advances in Applied Mathematics 12, 337-357 (1991)).

SIM finds the k best non-intersecting alignments between two sequences, or within a single sequence, using dynamic programming techniques. The alignments are reported in order of decreasing similarity score and share no aligned pairs. SIM requires space proportional to the sum of the input sequence lengths and the output alignment lengths, so it accommodates 100,000-base sequences on a workstation. Both sequences must be of the same type, i.e. both DNA or both protein.
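As an illustration of the kind of dynamic programming involved, the following is a minimal sketch of a local alignment score calculation with separate gap opening and gap extension penalties. It is not the sim.c code: it uses the textbook quadratic-space recurrences rather than SIM's linear-space method, it returns only the single best score rather than k alignments, and the function name and the convention that a gap of length k costs gap_open + k * gap_extend are assumptions made for this example.

```python
def local_alignment_score(s, t, score, gap_open, gap_extend):
    """Best local alignment score between s and t (illustrative only).

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(s), len(t)
    # H[i][j]: best score of a local alignment ending at s[i-1], t[j-1]
    # E[i][j]: best score ending with a gap in s (t[j-1] aligned to a pad)
    # F[i][j]: best score ending with a gap in t (s[i-1] aligned to a pad)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend,
                          H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend,
                          H[i - 1][j] - gap_open - gap_extend)
            diag = H[i - 1][j - 1] + score(s[i - 1], t[j - 1])
            # The 0 lets an alignment start afresh at any position,
            # which is what makes the alignment local.
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def simple_score(a, b):
    """Hypothetical scoring function: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


print(local_alignment_score("ttacgtt", "ggacggg", simple_score, 6, 0.2))
```

With the gap penalties used here (6 and 0.2), short gaps are rarely worth opening, so the best local alignment of the two short test strings is the ungapped common stretch "acg".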

[picture]

A dialogue box (shown above) requests the horizontal and vertical sequences and the ranges over which they are to be aligned (see section Selecting a sequence). Either a fixed number of alignments can be requested or, alternatively, all alignments scoring above a given value. If the sequences are DNA, the scores for a matching aligned pair, a transition and a transversion must be provided. These values are used to generate a score matrix. For protein sequences, the score matrix can be changed from the "Options" menu (see section Changing the score matrix). Both DNA and protein sequences require the penalty for opening a gap and the penalty for gap extension.
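The way three scores can generate a DNA score matrix may be sketched as follows. This is an illustrative sketch only: the function name and the dictionary representation are assumptions, and the handling of ambiguity codes in the real program is not covered. A transition is a purine-to-purine (a, g) or pyrimidine-to-pyrimidine (c, t) substitution; any other substitution is a transversion.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def dna_score_matrix(match, transition, transversion):
    """Hypothetical helper: map each (base, base) pair to a score."""
    matrix = {}
    for x in "acgt":
        for y in "acgt":
            if x == y:
                matrix[(x, y)] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                # Both bases are purines, or both are pyrimidines.
                matrix[(x, y)] = transition
            else:
                matrix[(x, y)] = transversion
    return matrix


# Example values: match 1, transition -1, transversion -1.
scores = dna_score_matrix(match=1, transition=-1, transversion=-1)
print(scores[("a", "a")], scores[("a", "g")], scores[("a", "c")])
```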

The alignments are displayed in the output window along with the percentage mismatch (see below) and on the sip plot as a series of lines, each line corresponding to a single alignment. The line represents the positions of the bases in the alignment. Stretches of pads will appear as straight horizontal or vertical regions depending on whether the pads were in the vertical or horizontal sequence respectively.


 Percentage mismatch  35.7
               438       448       458       468       478       488
               h caggcctgtgaggaccagcagtgctgtcctgagatgggcggctggtctggctgggggccc
                 :::::::::::   :::: ::  ::: ::       :: : :::: :   :::::: :::
               m caggcctgtgacacccagaagacctgccccacacatggggcctgggcatcctggggcccc
               451       461       471       481       491       501

               498       508       518
               h tgggagccttgctctgtcacctgc
                 :::   ::  :::: :   :::::
               m tggagcccccgctcaggatcctgc
               511       521       531
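The percentage mismatch figure can be reproduced with a simple column count. The sketch below assumes that the value is the number of alignment columns in which the two rows differ, divided by the total number of columns; this is an assumption about how the figure is derived, not a description of the sip source code, and the function name is hypothetical. Applied to the two rows of the example above, it gives 30 differing columns out of 84.

```python
def percentage_mismatch(row_a, row_b):
    """Percentage of alignment columns in which the two rows differ.

    A pad aligned against a base counts as a mismatch because the two
    characters differ (assumed behaviour).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        return 0.0
    mismatches = sum(1 for a, b in zip(row_a, row_b) if a != b)
    return 100.0 * mismatches / len(row_a)


# The two rows of the example alignment, with the line breaks removed.
h = ("caggcctgtgaggaccagcagtgctgtcctgagatgggcggctggtctggctgggggccc"
     "tgggagccttgctctgtcacctgc")
m = ("caggcctgtgacacccagaagacctgccccacacatggggcctgggcatcctggggcccc"
     "tggagcccccgctcaggatcctgc")
print(round(percentage_mismatch(h, m), 1))
```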

Further operations available for local alignments are:

Information
This command gives a brief description of the sequences used in the comparison and the input parameters used.

horizontal PERSONAL: h from 1 to 1553
vertical PERSONAL: m from 1 to 1358
number of alignments 3 
score for match 1
score for transition -1
score for transversion -1
penalty for starting gap 6
penalty for each residue in gap 0.2

Configure
This option allows the line width and colour of the matches to be altered. A colour browser is displayed from which the desired line width or colour can be configured (see section Colour Selector). Pressing OK will update the sip plot.
Display sequences
Selecting this command invokes the sequence display (see section Sequence display). Moving the cursor in the sequence display will move the cursors of the same sequence in any sip plot (see section Cursors). To force the sequence display to show the nearest match, use the "nearest match" button in the sequence display.
Hide
This option removes the points from the sip plot but retains the information in memory.
Reveal
This option will redisplay previously hidden points in the sip plot.
Remove
This command removes all the information regarding this particular invocation of Local alignment, and access to this data is lost.

This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/sip_13.html