Implementation:Pyro ppl Pyro ProfileHMM

From Leeroopedia


Property | Value
Implementation Type | Pattern Doc
Source File | examples/contrib/mue/ProfileHMM.py
Module | pyro.contrib.mue
Pyro Features | pyro.contrib.mue.models.ProfileHMM, pyro.contrib.mue.dataloaders.BiosequenceDataset, SVI, MultiStepLR scheduler
References | Durbin et al. (1998), "Biological sequence analysis"; Weinstein & Marks (2021)

Overview

This file provides a training script for the Profile HMM, a standard probabilistic model of biological sequence families. In MuE terms, the Profile HMM places a constant (delta function) distribution on the latent (consensus) sequence and applies a MuE observation distribution on top of it, which makes it the special case of the FactorMuE model with no latent factors.

Unlike the FactorMuE, the Profile HMM does not learn a latent representation. Instead, it directly models:

  • Consensus sequence positions with emission probabilities
  • Insertion states allowing extra characters between consensus positions
  • Deletion states allowing consensus positions to be skipped

The model handles variable-length sequences without requiring a pre-computed multiple sequence alignment, learning the alignment implicitly through the MuE observation distribution.
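To make the three kinds of state concrete, the following toy sampler generates variable-length sequences from a fixed consensus. It is only an illustrative sketch: the function name, the single shared insert_prob and delete_prob, and the uniform insertion alphabet are assumptions made here, whereas a real Profile HMM has position-specific parameters and sums over all alignment paths during inference rather than sampling one.

```python
import random


def sample_sequence(consensus, insert_prob, delete_prob, alphabet, rng):
    """Emit one sequence from a toy profile with insertions and deletions."""
    out = []
    for residue in consensus:
        # Insertion state: emit extra characters before this consensus position.
        # (insert_prob must be below 1, otherwise this loop never ends.)
        while rng.random() < insert_prob:
            out.append(rng.choice(alphabet))
        # Deletion state: skip this consensus position entirely.
        if rng.random() < delete_prob:
            continue
        # Match state: emit the consensus residue. A real model would sample
        # from a per-position emission distribution instead.
        out.append(residue)
    return "".join(out)


rng = random.Random(0)
consensus = "ACGTAC"
samples = [sample_sequence(consensus, 0.1, 0.1, "ACGT", rng) for _ in range(5)]
for s in samples:
    print(len(s), s)
```

Because insertions lengthen a sequence and deletions shorten it, the samples differ in length even though they share one consensus, which is why no pre-computed alignment is needed to relate them.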

Code Reference

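import json

import torch
from pyro.contrib.mue.dataloaders import BiosequenceDataset
from pyro.contrib.mue.models import ProfileHMM
from pyro.optim import MultiStepLR
from torch.optim import Adam


def main(args):
    # Abridged excerpt: keep the data on the GPU only when --cuda is set.
    device = torch.device("cuda" if args.cuda else "cpu")
    dataset = BiosequenceDataset(args.file, "fasta", args.alphabet,
                                  include_stop=args.include_stop, device=device)

    # Default the latent (consensus) length to 1.1x the longest sequence.
    latent_seq_length = args.latent_seq_length
    if latent_seq_length is None:
        latent_seq_length = int(dataset.max_length * 1.1)

    model = ProfileHMM(
        latent_seq_length, dataset.alphabet_length,
        prior_scale=args.prior_scale,
        indel_prior_bias=args.indel_prior_bias,
        cuda=args.cuda,
    )

    # Adam with a multi-step learning-rate schedule; milestones arrive as a JSON list.
    scheduler = MultiStepLR({"optimizer": Adam, "optim_args": {"lr": args.learning_rate},
                              "milestones": json.loads(args.milestones)})
    n_epochs = args.n_epochs  # number of epochs, set with -e
    losses = model.fit_svi(dataset, n_epochs, args.batch_size, scheduler, args.jit)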
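The scheduler configuration above parses args.milestones from a JSON string and hands it to a multi-step schedule. As a rough sketch of what that schedule does, the pure-Python helper below reproduces the multi-step rule (multiply the learning rate by gamma at each milestone) without depending on PyTorch. The helper name lr_at_epoch and the example milestone string are hypothetical, gamma=0.1 is PyTorch's MultiStepLR default rather than a value taken from the script, and the sketch assumes the scheduler is stepped once per epoch.

```python
import json


def lr_at_epoch(base_lr, milestones, epoch, gamma=0.1):
    """Learning rate under a multi-step schedule: decay by gamma at each milestone."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Hypothetical milestone string, in the JSON-list form that json.loads expects.
milestones = json.loads("[5, 10]")
schedule = [lr_at_epoch(0.01, milestones, epoch) for epoch in range(15)]
print(schedule[0], schedule[5], schedule[10])
```

An empty list ("[]") leaves the learning rate constant for the whole run, which is a reasonable starting point before tuning a decay schedule.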

I/O Contract

Parameter | Type | Description
-f / --file | str | Input FASTA file path
-a / --alphabet | str | Alphabet type: "amino-acid", "dna", or a custom alphabet
-M / --latent-seq-length | int | Latent (consensus) sequence length (default: 1.1x the maximum sequence length in the dataset)
--prior-scale | float | Prior scale for all parameters (default: 1.0)
--indel-prior-bias | float | Indel prior bias (default: 10.0)
--split | float | Fraction of the dataset held out for testing (default: 0.2)
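The parameters in the table can be mirrored with a small argparse parser. This is a minimal sketch built only from the table, not the script's actual parser: the help strings are paraphrased, the "amino-acid" default for --alphabet and the file name seqs.fasta are assumptions, and resolve_latent_length is a hypothetical helper that applies the 1.1x fallback described for -M.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Profile HMM options (sketch)")
    parser.add_argument("-f", "--file", type=str, default=None,
                        help="Input FASTA file path")
    # The default here is an assumption; the table does not state one.
    parser.add_argument("-a", "--alphabet", type=str, default="amino-acid",
                        help='"amino-acid", "dna", or a custom alphabet')
    parser.add_argument("-M", "--latent-seq-length", type=int, default=None,
                        help="Latent (consensus) sequence length")
    parser.add_argument("--prior-scale", type=float, default=1.0,
                        help="Prior scale for all parameters")
    parser.add_argument("--indel-prior-bias", type=float, default=10.0,
                        help="Indel prior bias")
    parser.add_argument("--split", type=float, default=0.2,
                        help="Fraction of the dataset held out for testing")
    return parser


def resolve_latent_length(requested, max_length):
    """Fall back to 1.1x the longest sequence when -M is not given."""
    return int(max_length * 1.1) if requested is None else requested


# "seqs.fasta" is a placeholder name; the file is never opened in this sketch.
args = build_parser().parse_args(["-f", "seqs.fasta", "-a", "dna"])
print(args.prior_scale, args.indel_prior_bias, args.split)
print(resolve_latent_length(args.latent_seq_length, max_length=100))
```

Leaving -M unset keeps the consensus slightly longer than the longest observed sequence, so there is room for match positions even when every sequence carries some insertions.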

Output:

  • Training and test log-probability and perplexity
  • Loss curve plot, insertion/deletion probability plots
  • Saved parameter store and evaluation results
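The reported perplexity is derived from the log-probability. The sketch below uses the common per-residue definition, the exponential of the negative average log-probability per residue. Treat it as an assumption: the function name is hypothetical, and the script's exact normalization (for example, whether a stop symbol counts toward the length) is not shown on this page.

```python
import math


def per_residue_perplexity(total_log_prob, total_length):
    """Exponential of the negative average log-probability per residue."""
    return math.exp(-total_log_prob / total_length)


# A model that assigns probability 1/4 to each of 10 DNA residues
# has a total log-probability of 10 * log(1/4).
uniform_log_prob = 10 * math.log(1.0 / 4.0)
print(per_residue_perplexity(uniform_log_prob, 10))
```

Under this definition a perplexity near the alphabet size means the model is doing no better than uniform guessing, while lower values indicate that it has learned structure in the sequence family.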

Usage Examples

# Train ProfileHMM on protein data
# python ProfileHMM.py -f ve6_full.fasta -b 10 -M 174 --indel-prior-bias 10. \
#     -e 15 -lr 0.01 --jit --cuda

# Quick test with generated data
# python ProfileHMM.py --test --small -e 5
