Sip is a program for comparing and aligning pairs of nucleic acid or protein sequences. The program, which is based around "dot matrix" plots, includes several methods of finding matches between pairs of sequences and ways of displaying them. The methods vary in speed and sensitivity. The two-dimensional plots created by the comparison methods can be zoomed, and cursors or crosshairs can be used to locate the positions of individual matches. The alignments between segments of the sequences can be examined using a display that enables the sequences to be slid past one another. The sliding can be linked to the position of the cursors, or the sequences can be positioned automatically to show the alignment of the nearest match.
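To make the idea of a "dot matrix" plot concrete, the following is a minimal sketch in Python, not Sip's own code. It assumes a simple identity score over a fixed-length window; the names dot_matrix, render, span and min_score are hypothetical and chosen only for this illustration. A point is recorded at (i, j) when the window starting at position i in the first sequence and position j in the second contains at least min_score identical characters.

```python
def dot_matrix(seq_a, seq_b, span=3, min_score=3):
    """Return (i, j) window start positions whose identity score reaches min_score.

    Illustrative only: scores each window by counting identical characters,
    which is an assumption and not necessarily how Sip scores matches.
    """
    matches = []
    for i in range(len(seq_a) - span + 1):
        for j in range(len(seq_b) - span + 1):
            score = sum(1 for k in range(span) if seq_a[i + k] == seq_b[j + k])
            if score >= min_score:
                matches.append((i, j))
    return matches


def render(seq_a, seq_b, matches):
    """Draw the matches as text: seq_a runs across, seq_b runs down."""
    points = set(matches)
    rows = []
    for j in range(len(seq_b)):
        rows.append("".join("*" if (i, j) in points else "." for i in range(len(seq_a))))
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTTCGT"
    print(render(a, b, dot_matrix(a, b, span=3, min_score=3)))
```

Runs of points along a diagonal correspond to segments that align without gaps. Raising span and min_score reduces background noise at the cost of sensitivity, which is the kind of speed and sensitivity trade-off mentioned above.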
To help assess the statistical significance of comparisons, the program can calculate tables of expected and observed score frequencies for each score level (see section Probabilities and expected numbers of matches). Two alignment algorithms are included: a dynamic programming algorithm for finding optimal alignments and a Smith-Waterman local alignment algorithm. DNA sequences can be complemented and translated so that they can be compared at the protein level. This includes the facility to superimpose the results from the three reading frames, which enables frameshift errors to be spotted.
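As an illustration of what complementing and translating a DNA sequence involves, here is a short Python sketch. It is not Sip's implementation: it assumes the standard genetic code and unambiguous upper-case bases, and the function names reverse_complement, translate and three_frames are hypothetical. Codons containing any other character are translated as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    seq = "ATGGCCTAA"
    print(three_frames(seq))
    print(three_frames(reverse_complement(seq)))
```

Comparing each of the three translations of one sequence with each of the three translations of another gives nine protein-level comparisons. When a diagonal of matches jumps from one frame's comparison to another's, an insertion or deletion that shifts the reading frame is a likely cause.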
The main window of Sip has the menus "File", "View", "Options", "Sequences", "Comparison" and "Help" at the top of the window. Beneath these are two text windows: the "Output window", which displays any text output from the program, and the "Error window", which displays all error messages.
The "File" menu contains the commands to extract sequences from either a sequence library or a personal file (see section Reading in sequences). It also contains the Sequence manager option (see section Sequence manager) and the option to send a sequence to another program (see section Inter-program communication).
The "View" menu contains the option to use the Results manager (see section Result manager).
The "Options" menu allows the user to change the score matrices for DNA or protein sequences (see section Changing the score matrix) and to change the maximum number of matches and the default number of matches that are plotted (see section Changing the maximum number of matches). It also contains a selectable function to hide or display duplicate matches when comparing a sequence against itself (see section Hide duplicate matches). The fonts and colours can also be changed from this menu.
The "Sequences" menu contains operations which may be performed on a sequence. Some of these operations will produce a new sequence. All these operations are also available from a pop-up menu in the sequence manager (see section Sequence manager).
The "Comparison" menu contains the sequence comparison functions, including "Find similar spans" (see section Find similar spans), "Find matching words" (see section Find matching words), "Find best diagonals" (see section Find best diagonals) and "Align sequences" (see section Align sequences). All of these functions display their results as points or lines in a two-dimensional plot called a "sip plot" (see section Sip plot). The picture below shows the results after performing a "Find similar spans" comparison between the three reading frames of two DNA sequences, producing nine plots in all.