CENTER for MOLECULAR MEDICINE & GENETICS

Wayne State University School of Medicine


MBG 8680: Computer Applications in Molecular Genetics

[MBG 8680 Home Page]


MULTIPLE SEQUENCE ANALYSIS


In this section, we will use GCG to align several related protein 
sequences, then we will use the alignment for phylogenetic analysis.

START SEQLAB

Start the X Windows program on your PC.
Telnet to genetics, then set DISPLAY: setenv DISPLAY ip.address:0

Change into your mbg8680/gcg folder: cd mbg8680/gcg
Type gcg, then type seqlab &
Make sure your working directory is set to mbg8680/gcg
(see Introduction to GCG for a review if needed).

OPEN A DIFFERENT LIST FILE

Click File, Open List, hsp70a.list, OK.

The hsp70a.list file contains several heat shock-related 
proteins from various species, which we will use for the
following demonstrations.

PILEUP, THE MULTIPLE SEQUENCE ALIGNMENT PROGRAM

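We will use the GCG program PileUp to align the heat
shock proteins.  PileUp first compares every possible pair 
of sequences to score their similarity and clusters the 
sequences by those scores.  It then aligns the two most 
similar sequences and progressively adds the next most 
similar sequence or cluster until all of the sequences 
are in the alignment.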
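
The clustering step that decides the order of alignment can be
sketched in Python.  This is a simplified illustration, not PileUp's
code: the toy sequences, the position-by-position identity score, and
the function names are all assumptions made for the example.  A real
program scores each pair by aligning it first.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of matching positions in two equal-length toy sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def join_order(seqs):
    """Return the order in which sequences/clusters are merged,
    always joining the pair with the highest average similarity."""
    clusters = {name: [name] for name in seqs}  # cluster label -> members
    # Score every possible pair of sequences once.
    sim = {frozenset(pair): identity(seqs[pair[0]], seqs[pair[1]])
           for pair in combinations(seqs, 2)}

    def average_similarity(c1, c2):
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    order = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        order.append((c1, c2))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
    return order

# Hypothetical toy sequences (not taken from hsp70a.list).
toy = {
    "A": "MKVLAAGIVG",
    "B": "MKVLAAGIVA",
    "C": "MKILSAGLVG",
    "D": "MRILSAGLVA",
}
merge_order = join_order(toy)
print(merge_order)
```

The merge order plays the role of the dendrogram: the most similar
pair is joined first, and the last join links the most distant groups.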

Select all the entries in the hsp70a.list file.
Click Functions, Multiple Comparison, PileUp.
Note the options, then click Run.
Click Windows, Job Manager, Open Output Mgr

Display the .msf multiple sequence file.
Display the .figure dendrogram (plot of sequence similarity).
Add the .msf file to the Main List.
Close the Output and Job Manager Windows.
Note the .msf file added to the list, then save the list.

EXAMINE THE ALIGNMENT IN THE MSF FILE

Load the .msf file into the Editor.  You can edit the alignment 
manually to improve it.  When finished, return to the Main List.

Run PlotSimilarity on the .msf file to show a plot of the
overall similarity of the 5 protein sequences along the alignment.
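
The idea behind a similarity plot can be sketched in Python: score
each alignment column, then average the scores in a sliding window.
This is only an illustration under stated assumptions.  The toy
alignment, the pair-identity column score, and the window size are
made up for the example and are not PlotSimilarity's actual scoring.

```python
from itertools import combinations

GAPS = ".~"  # gap characters as written in an MSF file

def column_scores(alignment):
    """Per column: fraction of sequence pairs with identical residues."""
    pairs = list(combinations(range(len(alignment)), 2))
    scores = []
    for col in zip(*alignment):
        same = sum(col[i] == col[j] and col[i] not in GAPS
                   for i, j in pairs)
        scores.append(same / len(pairs))
    return scores

def windowed(scores, window=3):
    """Average the column scores over a sliding window."""
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Hypothetical toy alignment of three short sequences.
toy_alignment = ["MKVLAG", "MKILAG", "MRVLSG"]
scores = column_scores(toy_alignment)
smoothed = windowed(scores, window=3)
print(smoothed)
```

Plotting the smoothed values against alignment position gives a curve
whose peaks mark the best-conserved regions.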

Use Pretty to calculate the consensus sequence, and display the
matching amino acids in uppercase characters.
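
A consensus display of this kind can be sketched in Python.  The
plurality rule, the min_votes threshold, and the toy alignment are
assumptions for illustration; Pretty has its own options for how the
consensus is decided.

```python
from collections import Counter

GAPS = ".~"  # gap characters as written in an MSF file

def consensus(alignment, min_votes=2):
    """Most common residue per column, or '-' if none reaches min_votes."""
    out = []
    for col in zip(*alignment):
        residues = [c.upper() for c in col if c not in GAPS]
        if not residues:
            out.append("-")
            continue
        residue, votes = Counter(residues).most_common(1)[0]
        out.append(residue if votes >= min_votes else "-")
    return "".join(out)

def mark_matches(seq, cons):
    """Uppercase residues that match the consensus, lowercase the rest."""
    return "".join(c.upper() if c.upper() == k else c.lower()
                   for c, k in zip(seq, cons))

# Hypothetical toy alignment of three short sequences.
toy_alignment = ["MKVLAG", "MKILAG", "MRVLSG"]
cons = consensus(toy_alignment)
marked = [mark_matches(seq, cons) for seq in toy_alignment]
print(cons)
print("\n".join(marked))
```

In the output, lowercase letters pick out the positions where a
sequence departs from the consensus.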

PHYLOGENETIC (EVOLUTIONARY) ANALYSIS

Run Distances on the .msf file.  Note the three output files.
Display the .distances file (table of relatedness).
Display the .figure file (phylogram of relatedness).
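
A table of relatedness can be sketched in Python as a matrix of
pairwise distances.  This sketch uses the simplest measure, the
uncorrected fraction of differing positions; that choice, the sequence
names, and the toy alignment are assumptions for illustration and do
not reproduce the Distances program's own methods or output format.

```python
GAPS = ".~"  # gap characters as written in an MSF file

def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap."""
    compared = [(x, y) for x, y in zip(a, b)
                if x not in GAPS and y not in GAPS]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in compared) / len(compared)

def distance_matrix(alignment):
    """Nested dict of pairwise distances keyed by sequence name."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names}
            for a in names}

# Hypothetical names and toy aligned sequences.
toy_alignment = {"hspA": "MKVLAG", "hspB": "MKILAG", "hspC": "MRVLSG"}
matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, [round(d, 2) for d in row.values()])
```

A tree-building method then turns such a matrix into the phylogram.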

Note the negative branch in the phylogram.  It appears because 
the sequences are not similar near their ends.  To run the 
analysis properly, leave out the unrelated ends by first 
selecting only the interior region of the sequences.  In fact, 
the analysis can be repeated on several different interior 
segments in order to compare the results.
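
The effect of leaving out the ends can be sketched in Python by
cutting the same column range from every aligned sequence before
measuring distance.  The toy sequences, the 1-based begin/end
positions, and the helper names are assumptions for illustration.

```python
def segment(alignment, begin, end):
    """Keep alignment columns begin..end (1-based, inclusive)."""
    return {name: seq[begin - 1:end] for name, seq in alignment.items()}

def fraction_different(a, b):
    """Uncorrected fraction of differing positions (no gaps in this toy)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical pair: identical interior, unrelated ends.
toy_alignment = {
    "seq1": "AAWMKVLAGTTP",
    "seq2": "GHCMKVLAGRRS",
}
interior = segment(toy_alignment, 4, 9)

full_distance = fraction_different(toy_alignment["seq1"],
                                   toy_alignment["seq2"])
interior_distance = fraction_different(interior["seq1"],
                                       interior["seq2"])
print(full_distance, interior_distance)
```

Repeating the measurement with different begin and end values shows
how much the result depends on the segment chosen.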


Send comments to: dwomble@genetics.wayne.edu

Copyright © 2003, David D. Womble.