first previous next last contents

Find matching words

The find matching words routine finds runs of identical characters in the sequence. Its main value is speed, being 100's of times faster than the find similar spans function. It is of course not very sensitive but is useful for long DNA sequences.

[picture]

The dialogue allows the horizontal and vertical sequences and their ranges to be selected ( see section Selecting a sequence). The word length is the minimum number of consecutive matching characters. All runs of identical characters that are at least as long as the word length will produce a line on the sip plot of length proportional to the actual word length ( see section Sip plot).

If the expected number of matches is greater than a maximum value (currently set by default to be 50000), a dialogue box will be invoked (see below). This dialogue box displays the expected number of matches and the maximum number of matches allowed. This operation will be aborted and if you really wish to display that number of matches, then change the maximum number of matches allowed ( see section Changing the maximum number of matches).

[picture]

Further operations available for find matching words are:

Information
This command gives a brief description of the sequences used in the comparison, the input parameters used and the number of hits found.

horizontal EMBL: hsproperd
vertical EMBL: mmproper
word length 8 
Number of matches 140

Results
A detailed listing of all the matching words is obtained in the output window. The horizontal (h) and vertical (v) positions of the beginning of the match are listed along with the length of the match and the match itself.

Positions        162 h          4 v and length         14
ttcacccagtatga
Positions        225 h         67 v and length         18
gaagactgctgtctcaac
Positions        509 h        118 v and length          8
ctctgtca
Positions        276 h        118 v and length          9
ctctgtcag
Positions        288 h        130 v and length          8
tgcaggtc
Positions        626 h        131 v and length          8
gcaggtct
Positions       1208 h        144 v and length          8
atggtcag

Tabulate scores
This option lists scores, probabilities, and their expected and observed numbers of matches.

score    8 probability 2.06e-05 expected           43 observed 140
score    9 probability 5.35e-06 expected           11 observed 67
score   10 probability 1.39e-06 expected            3 observed 45
score   11 probability 3.60e-07 expected            1 observed 35
score   12 probability 9.35e-08 expected            0 observed 22
score   13 probability 2.43e-08 expected            0 observed 18
score   14 probability 6.30e-09 expected            0 observed 17
score   15 probability 1.63e-09 expected            0 observed 11
score   16 probability 4.24e-10 expected            0 observed 9
score   17 probability 1.10e-10 expected            0 observed 9
score   18 probability 2.86e-11 expected            0 observed 8
score   19 probability 7.42e-12 expected            0 observed 6
score   20 probability 1.93e-12 expected            0 observed 5
score   21 probability 5.00e-13 expected            0 observed 3
score   22 probability 1.30e-13 expected            0 observed 2
score   23 probability 3.37e-14 expected            0 observed 2
score   24 probability 8.74e-15 expected            0 observed 2

Configure
This option allows the line width and colour of the matches to be altered.See section Colour Selector. A colour browser is displayed from which the desired line width or colour can be configured. Pressing OK will update the sip plot.
Display sequences
Selecting this command invokes the sequence display ( see section Sequence display). Moving the cursor in the sequence display will move the cursors of the same sequence in any sip plot ( see section Cursors). To force the sequence display to show the nearest match, use the "nearest match" button in the sequence display plot.
Hide
This option removes the points from the sip plot but retains the information in memory.
Reveal
This option will redisplay previously hidden points in the sip plot.
Remove
This command removes all the information regarding this particular invocation of Find matching words, and access to this data is lost.

first previous next last contents
This page is maintained by James Bonfield. Last generated on 2 Febuary 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/sip_10.html