HMMEREMIT*

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
ALGORITHM
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENT
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

HmmerEmit generates sequences that match a profile hidden Markov model.

DESCRIPTION

HmmerEmit provides a GCG interface to the hmmemit program of Dr. Sean Eddy's HMMER package. It allows you to access most of hmmemit's parameters from the GCG command line.

HmmerEmit randomly generates sequences that match a given profile HMM. The sequences are written to a single RSF file, either as is or with gaps inserted as necessary so that they are aligned with each other. HmmerEmit can also create a single majority-rule consensus sequence for the profile HMM.
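To align the generated sequences, gaps have to be added wherever one sequence passes through a delete state or through more insert states than another. The Python sketch below shows one way to do this. It assumes each generated sequence is available as a hypothetical trace of (state, node, residue) tuples, with "M", "I" and "D" marking match, insert and delete states. It uses "-" as the gap character; both the trace format and the gap character are illustrative assumptions, not HMMER's internal representation.

```python
from typing import List, Tuple

# Hypothetical trace format: (state, node index, residue); residue is "" for deletes.
Trace = List[Tuple[str, int, str]]


def align_traces(traces: List[Trace], n_nodes: int) -> List[str]:
    """Pad each emitted sequence so that match columns line up across sequences."""
    # Longest insertion seen after each node (node 0 = before the first match state).
    max_ins = [0] * (n_nodes + 1)
    for trace in traces:
        counts = [0] * (n_nodes + 1)
        for state, node, _ in trace:
            if state == "I":
                counts[node] += 1
        max_ins = [max(a, b) for a, b in zip(max_ins, counts)]

    rows = []
    for trace in traces:
        match_col = ["-"] * (n_nodes + 1)  # index 0 is unused; deletes keep the gap
        inserts: List[List[str]] = [[] for _ in range(n_nodes + 1)]
        for state, node, residue in trace:
            if state == "M":
                match_col[node] = residue
            elif state == "I":
                inserts[node].append(residue)
        parts = []
        for k in range(n_nodes + 1):
            if k > 0:
                parts.append(match_col[k])
            ins = "".join(inserts[k])
            # Pad shorter insertions so the next match column stays aligned.
            parts.append(ins + "-" * (max_ins[k] - len(ins)))
        rows.append("".join(parts))
    return rows
```

For example, a sequence that passes straight through three match states and one that inserts two residues after node 1 and skips node 2 come out as "A--CD" and "AWY-E".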

EXAMPLE

Here is a session using HmmerEmit to generate a set of sequences from the HMM created in the HmmerBuild session:


% hmmeremit

 HMMEREMIT sequences representative of what profile HMM ?  hsp70.hmm_g

 What should the output file be called (* hmmeremit.rsf *)?

 Creating temp file for input to hmmemit.

 Calling hmmemit to perform analysis ...

hmmemit - generate sequences from a profile HMM
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:         /usr/users/share/smith/hsp70.hmm_g
Number of seqs:   10
Random seed:      966634453
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

%

OUTPUT

Here is some of the output file:


!!RICH_SEQUENCE 1.0
Output generated by GCG HMMER
 ..
{
name  seq1
type    PROTEIN
checksum    8442
creation-date  9/13/1999 17: 5:55
strand  1
sequence
  SDYNSISSNGIGIDLGTTTSVVASMPDDRIQIIANDSGNRFTPNGVAFTEKETLIGMYSK
  DQSACMSKNTLFDAKRLIARKFNDDRVQPDSKCWPLVVIGRVKKPQIQVKVLGFAKEFAP
  PEMSSMIVTKMKEQAESFLNEVVGAALITIPAYFNDSQRTATIDAGTIAGLNVLRLVAEP
  TAAAAAYRLDKATRIKERNYLIFDLGGGTHDIELMTINRDGTFEVRSTSFDVHLGGEDFD
  RRLLPALCYEFKRKKADKDLETDTAAMKRLAYAAETAKVELSSSTSVCIELDALFKGGQG
  YKKVYPRVCRAKFESLNQDLFQRTLDPMEKALRDAEVDKAQVAKLVLVGGSTRLPMVLRL
  VNEFFNGKEQRKVINSDEAVSLPAAVQGGLLEGLAGDKDFLLLDVDPVAMGMMIQGGVMN
  ALEKRNSVVPCKKAQVFTTTSGNRTSMLVEASDGERDSCANELLGRFTLKGIMPAPAGVP
  QFVPQFDVEALGILYVSSKKKNAGKINKKTIANIKALSTVKIEKMIEEAIRYKTADEKNS
  KRIVGAEHLEHYAYNLKQTIKDMKEKLTENDYSIVESQIERIETLLDRNNDSIFKLKAQM
  DEAVIMPVIKLYKFRTLDAANSKDNQLDGSSARAGTVWNASSPDSNECLNPGGDKSGPMI
  EDID
}
{
name  seq2
type    PROTEIN
checksum    5720
creation-date  9/13/1999 17: 5:55
strand  1
sequence
  GTSGEKVGIDLLTTSSMKHSNSAVSKYVPRRVEMLENAQGNRSTPSLLIYTDEEALGDRI
  SIGAAAKNQVALNDSNTVFAAGRLMGRKFSDLSVQRDLKFFPANMQPDQQEGVKANIEVP
  FKEEPGKFGPKEISGEILSAMNEVAETYWGKEHVVGEAVVTVPTYVGDAQKQVGKDAGRL
  AELNVDRLIEEPTACCLAYGLDKSKKQPLHFDLGGGAFDVTLLHLDVNVFEVKATSGQTE
  LGATGFDNEMIPSHVHYVKRKNAHIDTGDNERSLQRIRTEDDQAKRDLASPGQTSIRVES
  VTEGVAFHSEFSIKITRAKYETLFLNSIAPVEQPPLRIASMSKQQMDEVMLVGGSTGIPK
  VQEDIVQYFSGKDPHKMIIPDEAVAYGAAIQAYILKGEEISRTKDILLDDIAPKSIGIST
  RQGVMTYYIAANLCISTEKSSLFSTVADNQPSVQNVSQDGEMQMVKDNHSLGTFGIGGIN
  PAGAMIEVTFGIQANPVLDMAYNDTNTGSTGNITIDNDRGSMSEMEIDDMVEEAVEFADD
  QEEARHQTAKTRLTEKYAQQTGTGVVENSKERMSGTDKKKIEDNIEGVTNALQVNDTPTQ
  KEEFRDAHATLENSVTKAFEKMHQGARSGKTISRAASYNDKRVVGVTEEINAVEWNAPNI
  TPIRSGGQGESVE
}

//////////////////////////////////////////////////////////////

{
name  seq10
type    PROTEIN
checksum    1351
creation-date  9/13/1999 17: 5:55
strand  1
sequence
  EEKAIGIQLGVAYACMGAYKAGTVVILPENHQGNRITPSYIAFTDDELLVGAAAKTQVAS
  NPQNTHFDARQLIGRKFKDKEVQAEIHHLQFKINSAGREEAKSNYQYKGSLFSAEQLSSL
  VVAKMKENAHAYLGKTVIQAVRTVPAYFNDLQRQTKYDAGLIAALNVLRFINEPTMAIAY
  GTDKVGGQSEETVILIYDDGEGKMEVSIKGSYENKAQGETHLGGDNYSDEMDQVLKQEFP
  TKNKGDDINSNAYASRRLRDAAERSERQLSSAPMNLIEIASFDDEADGPRLDTEGSRGTR
  AKLETLNRELFAGTLSPVNRALRDAKLDKAQITDVVKVGGSLRIAKVIKCVKDFSNGREF
  NKSINPEENVALGAPVQGAVKGQTVANNVADILLLDVTPLSLGLRTEGDVMTVMIFKNTT
  VPTKKDWTFSTYVRNQGMVYLKVFEGERTAQDQDLLGRWELCGIPPEKLKPRGFPQFEVR
  FDFAANGILDVKAGEKGTNKANRITITMYKGACKKKEIQKEVAEADSFQEGDSKENQLKG
  ATNNAESLSDYNLRNTCKDTAISTGVTEAKKTEAITDRCQPFLALVTELALREQYDTAFR
  ELEGVCKPSVFNRYCFHQQGTEAGRPKSHCTGSEKMEVPCDEKGPGLMSQKLRVKGTSSN
  PSTREVD
}

INPUT FILES

HmmerEmit requires as its only input a profile HMM file.

RELATED PROGRAMS

PileUp creates a multiple sequence alignment from a group of related sequences. LineUp is a multiple sequence editor used to create multiple sequence alignments. Pretty displays multiple sequence alignments.

ProfileMake makes a profile from a multiple sequence alignment. ProfileSearch uses the profile to search a database for sequences with similarity to the group of aligned sequences. ProfileSegments displays optimal alignments between each sequence in the ProfileSearch output list and the group of aligned sequences (represented by the profile consensus). ProfileGap makes optimal alignments between one or more sequences and a group of aligned sequences represented as a profile. ProfileScan finds structural and sequence motifs in protein sequences, using predetermined parameters to determine significance.

HmmerBuild makes a profile hidden Markov model from a multiple sequence alignment. HmmerAlign aligns one or more sequences to a profile HMM. HmmerPfam searches a database of profile HMMs with a sequence query in order to identify known domains within the sequence. HmmerSearch uses a profile HMM as a query to search a sequence database for sequences similar to the original aligned sequences. HmmerCalibrate calibrates a hidden Markov model so that database searches using it as a query will be more sensitive. HmmerIndex creates a binary GSI ("generic sequence index") for a database of profile HMMs. HmmerFetch retrieves a profile hidden Markov model by name from an indexed database of profile HMMs. HmmerEmit randomly generates sequences that match a profile HMM. HmmerConvert converts between different profile HMM file formats and from profile HMM to GCG profile file format.

MEME finds conserved motifs in a group of unaligned sequences and saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.

RESTRICTIONS

Unknown.

ALGORITHM

See the Profile HMM Analysis Essay for an introduction to profile hidden Markov models and the terminology associated with them.

To generate sequences that match the model, HmmerEmit uses a random number generator that is initialized by a seed value. By default, this seed value is derived from the system clock of your computer, so each program run will use a different seed and thus generate a somewhat different set of sequences. If you want HmmerEmit to generate the same set of sequences each time, you can set the seed by means of the -SEED parameter.
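The effect of the seed can be illustrated with a small Python sketch. It is a deliberately simplified toy model, not HmmerEmit's code: it has only match states, each given as a dictionary of residue emission probabilities, with no insert or delete states. The helper names resolve_seed and emit_sequences are hypothetical. As described above, an explicit seed makes the output reproducible, while omitting it falls back to a clock-derived seed.

```python
import random
import time
from typing import Dict, List, Optional, Sequence


def resolve_seed(seed: Optional[int] = None) -> int:
    """Use an explicit positive seed if given, otherwise derive one from the system clock."""
    if seed is None:
        return int(time.time())
    if seed <= 0:
        raise ValueError("seed must be a positive integer")
    return seed


def emit_sequences(match_emissions: Sequence[Dict[str, float]], n: int, seed: int) -> List[str]:
    """Sample n sequences from a toy model with one emitting match state per node."""
    rng = random.Random(seed)  # private generator, so results depend only on the seed
    sequences = []
    for _ in range(n):
        residues = []
        for probs in match_emissions:
            symbols = list(probs)
            weights = [probs[s] for s in symbols]
            residues.append(rng.choices(symbols, weights=weights, k=1)[0])
        sequences.append("".join(residues))
    return sequences


toy_model = [{"A": 0.8, "G": 0.2}, {"C": 0.5, "T": 0.5}, {"G": 1.0}]
run1 = emit_sequences(toy_model, 3, resolve_seed(13794))
run2 = emit_sequences(toy_model, 3, resolve_seed(13794))
print(run1 == run2)  # the same seed reproduces the same sequences
```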

When generating a single consensus sequence, HmmerEmit looks at the state probabilities at each node of the model. If the probability of the match state at a node is greater than or equal to 0.5, the most likely residue at that node (the one with the highest emission probability) is written to the consensus sequence. If the probability of the insert state at that node is greater than or equal to 0.5, one or more X characters are written to the consensus; the number of X characters depends on the expected length of the insertion at that node.
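The consensus rule can be sketched in Python as follows. The Node structure and its fields are hypothetical summaries of a model node, not HMMER's data structures. The uppercase/lowercase cutoffs (0.5 for protein, 0.9 for DNA) are taken from the -CONsensus description in the Parameter Reference. Computing the insert length as 1 / (1 - p_insert_extend), the mean of a geometric run of insert residues, is an assumption about how "expected length" is derived.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical per-node summary of a profile HMM (not HMMER's own structure)."""
    match_emissions: Dict[str, float]  # residue -> emission probability at the match state
    p_match: float                     # probability that the match state is used
    p_insert: float                    # probability that the insert state is used
    p_insert_extend: float             # insert self-transition probability (assumed geometric)


def majority_consensus(nodes: List[Node], protein: bool = True) -> str:
    # Case cutoff from the -CONsensus description: 0.5 for protein, 0.9 for DNA.
    upper_cutoff = 0.5 if protein else 0.9
    out: List[str] = []
    for node in nodes:
        if node.p_match >= 0.5:
            # Most likely residue = highest emission probability at this match state.
            residue, prob = max(node.match_emissions.items(), key=lambda item: item[1])
            out.append(residue.upper() if prob >= upper_cutoff else residue.lower())
        if node.p_insert >= 0.5:
            # Assumed: mean insert length of a geometric run is 1 / (1 - p_extend).
            expected_len = 1.0 / (1.0 - node.p_insert_extend)
            out.append("X" * max(1, round(expected_len)))
    return "".join(out)
```

Here a node whose match state is used less than half the time contributes nothing, and an insert state used at least half the time contributes at least one X.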

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
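The abbreviation rule (you must type at least the capitalized letters of a parameter name) can be illustrated with the Python sketch below. This is not GCG's command-line parser. It assumes that matching is case-insensitive and that any extra letters you type must continue the full parameter name. The function names are hypothetical.

```python
from itertools import takewhile


def required_prefix(spec: str) -> str:
    """Leading capital letters of a parameter name, e.g. 'MULTIAlign' -> 'MULTIA'."""
    return "".join(takewhile(str.isupper, spec))


def matches(token: str, spec: str) -> bool:
    """True if a typed token such as '-multia' or '-seqnum=5' abbreviates spec."""
    word = token.lstrip("-").split("=", 1)[0].lower()  # drop dashes and any '=value'
    return len(word) >= len(required_prefix(spec)) and spec.lower().startswith(word)


print(matches("-multia", "MULTIAlign"))  # enough capitalized letters typed
print(matches("-multi", "MULTIAlign"))   # one letter short of the required 'MULTIA'
```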


Minimal Syntax: % hmmeremit [-INfile1=]hsp70.hmm_g -Default

Prompted Parameters:

[-OUTfile=]hsp70.rsf names the output file

Local Data Files: None

Optional Parameters:

-MULTIAlign     writes the generated sequences so that they are in alignment
-CONsensus      creates a single majority-rule consensus sequence
-SEQNUM=10      generates 10 sequences that match the model
-SEED=13794     sets the seed for the random number generator to 13794
-NOMONitor      doesn't display information about analysis parameters used

ACKNOWLEDGEMENT

The programs comprising the HMMER package are designed and implemented by Dr. Sean Eddy of the Washington University School of Medicine, St. Louis, Missouri. The GCG front-end programs were written by Christiane van Schlun in collaboration with Dr. Eddy.

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

The short expression in parentheses after some of the parameters below is the name of the corresponding parameter in the native HMMER package. Some of the GCG parameters are identical to the original HMMER parameters; others have been changed to fit GCG's conventions.

-MULTIAlign (-a)

writes the generated sequences to the RSF file so that they are in alignment with each other. (Gaps are inserted into the sequences as needed to accomplish this.) This option cannot be used together with -CONsensus.

-CONsensus (-c)

creates a single majority-rule consensus sequence. Highly conserved residues (probability greater than or equal to 90 percent for DNA and greater than or equal to 50 percent for protein) are shown in uppercase; less conserved residues are shown in lowercase. If insert states are used in more than 50 percent of generated sequences, they may become part of the majority-rule consensus; if so, the insert-generated residues are shown as X. This option cannot be used together with -MULTIAlign. If this option is used, -SEQNUM is ignored.

-SEQNUM=10 (-n 10)

generates 10 sequences (10 is the default). This option is ignored if -CONsensus is also specified.

-SEED=13794 (--seed 13794)

sets the seed for the random number generator to 13794 (must be a positive integer) instead of using the system clock to create the seed.

-NOMONitor

suppresses the display of the program's progress on the screen.
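The parameter rules above can be summarized by translating GCG options into native hmmemit options. The Python sketch below uses the native names given in parentheses (-a, -c, -n, --seed) and enforces the constraints stated in this section. It assumes, as in HMMER 2's usage line, that the HMM file is the final positional argument. The function name is hypothetical, and this is not how HmmerEmit itself builds its call. -NOMONitor is not forwarded because no native equivalent is listed for it.

```python
from typing import List, Optional


def build_hmmemit_args(hmm_file: str, multialign: bool = False, consensus: bool = False,
                       seqnum: int = 10, seed: Optional[int] = None) -> List[str]:
    """Map GCG-style HmmerEmit options to native hmmemit arguments (illustrative only)."""
    if multialign and consensus:
        raise ValueError("-MULTIAlign and -CONsensus cannot be used together")
    if seed is not None and seed <= 0:
        raise ValueError("-SEED must be a positive integer")
    args: List[str] = []
    if multialign:
        args.append("-a")
    if consensus:
        args.append("-c")            # -SEQNUM is ignored in consensus mode
    else:
        args += ["-n", str(seqnum)]
    if seed is not None:
        args += ["--seed", str(seed)]
    args.append(hmm_file)            # assumed: HMM file is the last positional argument
    return args


print(build_hmmemit_args("hsp70.hmm_g", seed=13794))
```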

Printed: February 5, 2001 11:38 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2001 Genetics Computer Group Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com