
Implementation:Pytorch Serve MMFHandler

From Leeroopedia

Overview

MMFHandler is a TorchServe custom handler for MMF (Multimodal Framework) activity recognition models. It extends BaseHandler to support multimodal inference over video, audio, and text inputs. At startup it loads an MMF model together with its configuration and processors, and reads the activity label mapping from a CSV file. The handler uses omegaconf for configuration management, pandas for label loading, and the MMF framework for model and data processing.

Field | Value
Implementation Name | MMFHandler
Type | Custom Handler
Workflow | Multimodal_Inference
Domains | Model_Serving, Multimodal_AI
Knowledge Sources | Pytorch_Serve
Last Updated | 2026-02-13 18:52 GMT

Description

The MMFHandler class demonstrates how to serve a multimodal model (specifically the MMFTransformer for activity recognition) through TorchServe. It overrides all four stages of the BaseHandler pipeline (initialize, preprocess, inference, and postprocess) to handle the requirements of multimodal data processing.

Key Responsibilities

  • Model Loading: Loads an MMF model using the MMF framework's model registry, along with OmegaConf-based configuration for processors and data pipeline
  • Label Mapping: Reads activity labels from a CSV file using pandas, mapping numeric predictions to human-readable activity names
  • Multimodal Preprocessing: Processes video frames, audio waveforms, and text descriptions through MMF's processor pipeline, assembling them into a SampleList
  • Inference: Runs the MMFTransformer model on the assembled SampleList
  • Postprocessing: Maps model output indices to activity label strings
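
The responsibilities above map onto the handler lifecycle roughly as sketched below. This is a dependency-free illustration, not the MMF handler itself: `SketchHandler`, its stub logic, and the dict used as `context` are hypothetical stand-ins. Only the method names (`initialize`, `preprocess`, `inference`, `postprocess`, `handle`) mirror the TorchServe BaseHandler interface.

```python
class SketchHandler:
    """Hypothetical stand-in that mirrors the BaseHandler method names."""

    def initialize(self, context):
        # Real handler: load the MMF model, config, processors, and CSV labels.
        # Here the context is assumed to be a plain dict carrying the labels.
        self.activity_labels = context["labels"]
        self.initialized = True

    def preprocess(self, data):
        # Real handler: run MMF processors and assemble a SampleList.
        # Stub: reduce each request to the length of its text field.
        return [len(row["text"]) for row in data]

    def inference(self, data):
        # Real handler: forward pass through the model.
        # Stub: derive a label index from the text length.
        return [n % len(self.activity_labels) for n in data]

    def postprocess(self, data):
        # Map each predicted index to its human-readable label.
        return [self.activity_labels[i] for i in data]

    def handle(self, data, context):
        # The three request-time stages run in this order.
        return self.postprocess(self.inference(self.preprocess(data)))


handler = SketchHandler()
handler.initialize({"labels": ["cooking", "dancing"]})
result = handler.handle([{"text": "abc"}], None)
print(result)
```

The returned list holds one entry per request in the batch, which is the shape TorchServe expects from a handler.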

Dependencies

Dependency | Purpose
mmf | MMF framework for multimodal model loading and processing
omegaconf | Configuration management for MMF model and processor configs
pandas | Loading activity labels from CSV
torch | PyTorch tensor operations
ts.torch_handler.base_handler | Parent class providing the handler lifecycle

Code Reference

Source Location

File | Lines | Repository
examples/MMF-activity-recognition/handler.py | L34-147 | pytorch/serve

Signature

from ts.torch_handler.base_handler import BaseHandler


class MMFHandler(BaseHandler):
    """
    TorchServe handler for MMF multimodal activity recognition.

    Processes video, audio, and text inputs through MMF processors,
    runs inference with MMFTransformer, and maps outputs to activity labels.
    """

    def initialize(self, context):
        """
        Load MMF model, configuration, processors, and activity labels.

        Sets up:
            - self.model: MMFTransformer loaded from checkpoint
            - self.config: OmegaConf config for processors
            - self.processors: Dict of MMF data processors (video, audio, text)
            - self.activity_labels: List of activity label strings from CSV

        Args:
            context: TorchServe context with model_dir, manifest, etc.
        """
        ...

    def preprocess(self, data):
        """
        Process multimodal input data into an MMF SampleList.

        Accepts video, audio, and text data from the request body.
        Each modality is processed through its corresponding MMF processor.
        Results are assembled into a SampleList for model consumption.

        Args:
            data (list): List of request dicts containing multimodal input.

        Returns:
            SampleList: MMF SampleList with processed video, audio, text tensors.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run MMFTransformer forward pass on the SampleList.

        Args:
            data (SampleList): Preprocessed multimodal data.

        Returns:
            dict: Model output dict containing logits and predictions.
        """
        ...

    def postprocess(self, data):
        """
        Map model predictions to activity label strings.

        Args:
            data (dict): Model output dict with prediction indices.

        Returns:
            list: List of predicted activity label strings.
        """
        ...

I/O Contract

Method | Input | Output | Notes
initialize(context) | TorchServe context with model artifacts | None (sets self.model, self.config, self.processors, self.activity_labels) | Loads MMF checkpoint, OmegaConf config, CSV labels
preprocess(data) | List of request dicts with video/audio/text data | SampleList with processed tensors | Uses MMF processors for each modality
inference(data) | SampleList | Model output dict with logits | Runs MMFTransformer forward pass
postprocess(data) | Model output dict | List of activity label strings | Maps indices to CSV-loaded labels
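
The last row of the contract (indices to labels) can be illustrated with a small sketch. It assumes single-label classification, where the highest-scoring class wins, and uses plain Python lists in place of tensors. The function name `indices_to_labels` and the sample scores are hypothetical, and the actual handler may select predictions differently (for example, by thresholding scores for multi-label output).

```python
def indices_to_labels(logits_batch, activity_labels):
    """Pick the highest-scoring class per sample and return its label."""
    predictions = []
    for logits in logits_batch:
        # Index of the largest score in this sample (argmax).
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(activity_labels[best])
    return predictions


# Hypothetical labels and scores, for illustration only.
labels = ["playing basketball", "cooking", "dancing"]
batch_logits = [[0.1, 2.3, -0.5], [1.7, 0.2, 0.9]]
predicted = indices_to_labels(batch_logits, labels)
print(predicted)
```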

Input Data Format

Field | Type | Description
video | bytes | Raw video data (frames)
audio | bytes | Raw audio waveform data
text | string | Text description or transcript
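
A request dict with these three fields could be unpacked as in the sketch below. The helper `parse_request` is hypothetical and only follows the field names in the table above. It also assumes that form fields may arrive as `bytes` or `bytearray`, so the text field is decoded as UTF-8 when needed.

```python
def parse_request(row):
    """Extract the three modalities from one request dict."""
    missing = [key for key in ("video", "audio", "text") if key not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    text = row["text"]
    # Text may arrive as raw bytes; decode it to a Python string.
    if isinstance(text, (bytes, bytearray)):
        text = text.decode("utf-8")

    return {
        "video": bytes(row["video"]),  # normalize bytearray to bytes
        "audio": bytes(row["audio"]),
        "text": text,
    }


parsed = parse_request(
    {"video": bytearray(b"\x00\x01"), "audio": b"\x02", "text": b"a person cooks"}
)
print(parsed["text"])
```

Failing early on a missing modality keeps malformed requests from reaching the modality-specific processors.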

Usage Examples

Example 1: Packaging the handler into a MAR

# Package the MMF handler with model artifacts
torch-model-archiver --model-name mmf_activity \
  --version 1.0 \
  --handler examples/MMF-activity-recognition/handler.py \
  --extra-files "config.yaml,activity_labels.csv,checkpoint.pth" \
  --export-path model_store

Example 2: Sending a multimodal inference request

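import requests

# Send all three modalities as multipart form fields.
# Field names follow the Input Data Format table; file names and the text are
# placeholders. Adjust the keys to whatever the handler's preprocess() reads.
with open("video.mp4", "rb") as video_file, open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8080/predictions/mmf_activity",
        files={"video": video_file, "audio": audio_file},
        data={"text": "a person dribbles a ball"},
    )
response.raise_for_status()
print(response.json())
# Illustrative output: ["playing basketball"]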

Example 3: Handler initialization flow

# During initialize(), the handler:
# 1. Calls super().initialize(context) for base setup
# 2. Loads OmegaConf config from extra files
# 3. Instantiates MMF processors for video, audio, text
# 4. Reads activity_labels.csv with pandas
# 5. Loads MMFTransformer model from checkpoint

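import pandas as pd
from omegaconf import OmegaConf

# Step 2: load the configuration (file name as packaged in Example 1)
config = OmegaConf.load("config.yaml")

# Step 4: activity labels loaded as (the "label" column name is assumed here):
labels_df = pd.read_csv("activity_labels.csv")
activity_labels = labels_df["label"].tolist()
# e.g., ["playing basketball", "cooking", "dancing", ...]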
