
Probabilities and expected numbers of matches

To help users assess the significance of the matches found between sequences, and to suggest reasonable ranges of cutoff scores for the comparison functions, sip calculates the probabilities of scores and the expected numbers of matches. Two calculations are used (Staden R, Methods for calculating the probabilities of finding patterns in sequences. CABIOS 5, 89-96 (1989)). One gives the probability of finding particular scores for the Find similar spans algorithm (see section Find similar spans), and the other gives the probability of finding matching words of a given length (see section Find matching words). In both cases the probability depends on the composition of the two sequences and the cutoff score; for the Find similar spans algorithm it also depends on the score matrix. The probability is the chance of finding the given score in infinitely long random sequences of the same composition as the pair being compared. The expected number of matches for any score is calculated by multiplying its probability by the product of the lengths of the two sequences. Note that no correction is made for the case of comparing a sequence against itself.
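The final step above (probability multiplied by the product of the two sequence lengths) can be sketched in a few lines of Python. This is only an illustration and is not sip's code: the function names are hypothetical, and the word probability here uses a simplified independence assumption (each position matches with probability equal to the sum over characters of the product of their frequencies in the two sequences), which is not necessarily the method described in the cited paper.

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each character in seq (hypothetical helper)."""
    counts = Counter(seq)
    total = len(seq)
    return {ch: n / total for ch, n in counts.items()}


def word_match_probability(seq1, seq2, word_length):
    """Probability that a word of word_length matches exactly at one pair of
    positions, ASSUMING independent positions drawn from each sequence's
    composition. A simplification for illustration, not sip's calculation."""
    comp1 = composition(seq1)
    comp2 = composition(seq2)
    # Chance that a single pair of characters is identical.
    p_single = sum(comp1[ch] * comp2.get(ch, 0.0) for ch in comp1)
    return p_single ** word_length


def expected_matches(probability, len1, len2):
    """Expected number of matches: probability times the product of the
    two sequence lengths, as stated in the text."""
    return probability * len1 * len2


if __name__ == "__main__":
    a = "ACGT" * 25   # length 100, uniform composition
    b = "ACGT" * 50   # length 200, uniform composition
    p = word_match_probability(a, b, 4)
    print(p, expected_matches(p, len(a), len(b)))
```

With uniform base composition the single-position match probability is 0.25, so the result is easy to check by hand. As in the text, no correction is applied when a sequence is compared against itself.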


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/sip_18.html