Implementation:Pytorch Serve MMFHandler

Overview

MMFHandler is a TorchServe custom handler for MMF (Multimodal Framework) activity recognition models. It extends BaseHandler to support multimodal inference over video, audio, and text inputs, loading an MMF model with its associated configuration, processors, and activity label mappings from a CSV file. The handler uses omegaconf for configuration management, pandas for label loading, and the MMF framework for model and data processing.

Field	Value
Implementation Name	MMFHandler
Type	Custom Handler
Workflow	Multimodal_Inference
Domains	Model_Serving, Multimodal_AI
Knowledge Sources	Pytorch_Serve
Last Updated	2026-02-13 18:52 GMT

Description

The MMFHandler class demonstrates how to serve a multimodal model (specifically the MMFTransformer for activity recognition) through TorchServe. It overrides all four stages of the BaseHandler pipeline to handle the unique requirements of multimodal data processing.
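The stage ordering can be shown with a dependency-free sketch. In TorchServe, `BaseHandler.handle()` runs `preprocess`, `inference`, and `postprocess` in that order, and `initialize` runs once at model load. `MiniBaseHandler` and `ToyMultimodalHandler` below are hypothetical stand-ins, not TorchServe classes. They only mirror that call order, and the toy subclass overrides all four stages the way MMFHandler does. The stage logic is a placeholder.

```python
class MiniBaseHandler:
    """Hypothetical stand-in that mirrors the stage order of a TorchServe handler."""

    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # TorchServe calls this once when the model is loaded.
        self.initialized = True

    def preprocess(self, data):
        return data

    def inference(self, data, *args, **kwargs):
        return data

    def postprocess(self, data):
        return data

    def handle(self, data, context):
        # Each stage consumes the previous stage's output.
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)


class ToyMultimodalHandler(MiniBaseHandler):
    """Overrides all four stages, as MMFHandler does, with toy logic."""

    def initialize(self, context):
        self.labels = ["cooking", "dancing"]  # hypothetical labels
        super().initialize(context)

    def preprocess(self, data):
        # One feature per request: the length of its text field.
        return [len(request.get("text", "")) for request in data]

    def inference(self, data, *args, **kwargs):
        # Toy "model": even length -> class 0, odd length -> class 1.
        return [length % 2 for length in data]

    def postprocess(self, data):
        # Map class indices to label strings, one per request.
        return [self.labels[index] for index in data]


handler = ToyMultimodalHandler()
handler.initialize(context=None)
print(handler.handle([{"text": "stir"}, {"text": "spin!"}], context=None))
# ['cooking', 'dancing']
```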

Key Responsibilities

Model Loading: Loads an MMF model using the MMF framework's model registry, along with OmegaConf-based configuration for processors and data pipeline
Label Mapping: Reads activity labels from a CSV file using pandas, mapping numeric predictions to human-readable activity names
Multimodal Preprocessing: Processes video frames, audio waveforms, and text descriptions through MMF's processor pipeline, assembling them into a SampleList
Inference: Runs the MMFTransformer model on the assembled SampleList
Postprocessing: Maps model output indices to activity label strings
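The following is a minimal sketch of the label-mapping and postprocessing responsibilities. It uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies. Several details are assumptions for illustration: the single-column CSV layout, the `label` header, and argmax decoding of per-class scores. The actual handler's CSV schema and decoding rule may differ.

```python
import csv
import io

# Hypothetical CSV contents: one header row, then one activity label per row.
CSV_TEXT = "label\nplaying basketball\ncooking\ndancing\n"


def load_labels(csv_file):
    """Return label strings in file order, so row i maps to class index i."""
    reader = csv.DictReader(csv_file)
    return [row["label"] for row in reader]


def scores_to_labels(batch_scores, labels):
    """Map each per-class score vector to the label with the highest score."""
    predictions = []
    for scores in batch_scores:
        # Index of the largest score (argmax) selects the label.
        best = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(labels[best])
    return predictions


labels = load_labels(io.StringIO(CSV_TEXT))
print(scores_to_labels([[2.1, 0.3, -1.0], [0.0, 0.2, 1.5]], labels))
# ['playing basketball', 'dancing']
```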

Dependencies

Dependency	Purpose
`mmf`	MMF framework for multimodal model loading and processing
`omegaconf`	Configuration management for MMF model and processor configs
`pandas`	Loading activity labels from CSV
`torch`	PyTorch tensor operations
`ts.torch_handler.base_handler`	Parent class providing the handler lifecycle

Code Reference

Source Location

File	Lines	Repository
`examples/MMF-activity-recognition/handler.py`	L34-147	pytorch/serve

Signature

from ts.torch_handler.base_handler import BaseHandler


class MMFHandler(BaseHandler):
    """
    TorchServe handler for MMF multimodal activity recognition.

    Processes video, audio, and text inputs through MMF processors,
    runs inference with MMFTransformer, and maps outputs to activity labels.
    """

    def initialize(self, context):
        """
        Load MMF model, configuration, processors, and activity labels.

        Sets up:
            - self.model: MMFTransformer loaded from checkpoint
            - self.config: OmegaConf config for processors
            - self.processors: Dict of MMF data processors (video, audio, text)
            - self.activity_labels: List of activity label strings from CSV

        Args:
            context: TorchServe context with model_dir, manifest, etc.
        """
        ...

    def preprocess(self, data):
        """
        Process multimodal input data into an MMF SampleList.

        Accepts video, audio, and text data from the request body.
        Each modality is processed through its corresponding MMF processor.
        Results are assembled into a SampleList for model consumption.

        Args:
            data (list): List of request dicts containing multimodal input.

        Returns:
            SampleList: MMF SampleList with processed video, audio, text tensors.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run MMFTransformer forward pass on the SampleList.

        Args:
            data (SampleList): Preprocessed multimodal data.

        Returns:
            dict: Model output dict containing logits and predictions.
        """
        ...

    def postprocess(self, data):
        """
        Map model predictions to activity label strings.

        Args:
            data (dict): Model output dict with prediction indices.

        Returns:
            list: List of predicted activity label strings.
        """
        ...

I/O Contract

Method	Input	Output	Notes
`initialize(context)`	TorchServe context with model artifacts	None (sets `self.model`, `self.config`, `self.processors`, `self.activity_labels`)	Loads MMF checkpoint, OmegaConf config, CSV labels
`preprocess(data)`	List of request dicts with video/audio/text data	`SampleList` with processed tensors	Uses MMF processors for each modality
`inference(data)`	`SampleList`	Model output dict with logits	Runs `MMFTransformer` forward pass
`postprocess(data)`	Model output dict	List of activity label strings	Maps indices to CSV-loaded labels
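TorchServe may batch several requests into one `data` list and matches responses to requests by position. For that reason, the handler should return exactly one output per request. The sketch below chains the three per-request stages and enforces that invariant. `run_pipeline` and the `toy_*` stages are hypothetical: a plain dict of per-modality lists stands in for an MMF `SampleList`, and a fake scoring rule stands in for `MMFTransformer`.

```python
def run_pipeline(requests, preprocess, inference, postprocess):
    """Chain the three per-request stages and enforce the one-response-per-request rule."""
    batch = preprocess(requests)
    outputs = inference(batch)
    responses = postprocess(outputs)
    # Responses are matched to requests by position, so lengths must agree.
    if len(responses) != len(requests):
        raise ValueError(f"expected {len(requests)} responses, got {len(responses)}")
    return responses


def toy_preprocess(requests):
    # A dict of per-modality lists plays the role of a SampleList here.
    return {
        "text": [r.get("text", "") for r in requests],
        "video": [r.get("video", b"") for r in requests],
    }


def toy_inference(batch):
    # Fake "logits": text length and video byte count for each sample.
    return {"logits": [[len(t), len(v)] for t, v in zip(batch["text"], batch["video"])]}


def toy_postprocess(output, labels=("text-heavy", "video-heavy")):
    # Pick the label whose fake logit is larger (ties go to the first label).
    return [labels[0] if row[0] >= row[1] else labels[1] for row in output["logits"]]


reqs = [
    {"text": "a person cooks", "video": b"\x00\x01"},
    {"text": "", "video": b"\x00" * 8},
]
print(run_pipeline(reqs, toy_preprocess, toy_inference, toy_postprocess))
# ['text-heavy', 'video-heavy']
```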

Input Data Format

Field	Type	Description
`video`	bytes	Raw video data (frames)
`audio`	bytes	Raw audio waveform data
`text`	string	Text description or transcript
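Each element of the TorchServe request list is a dict. For a multipart form upload, each form field name typically becomes a key, and its value usually arrives as `bytes` or `bytearray` rather than `str`. The sketch below pulls the three fields listed in the table above and decodes `text` to a Python string. `parse_request` and `_to_bytes` are hypothetical helpers. Treating missing fields as empty is also an assumption; the real handler may instead reject incomplete requests.

```python
def _to_bytes(value):
    """Normalize a request field to bytes (it may arrive as bytes, bytearray, or str)."""
    if value is None:
        return b""
    if isinstance(value, str):
        return value.encode("utf-8")
    return bytes(value)


def parse_request(request):
    """Extract the video, audio, and text fields from one request dict."""
    video = _to_bytes(request.get("video"))
    audio = _to_bytes(request.get("audio"))
    # Text usually arrives as raw bytes in a form upload; decode it to str.
    text = _to_bytes(request.get("text")).decode("utf-8")
    return {"video": video, "audio": audio, "text": text}


batch = [
    {"video": bytearray(b"\x00\x00\x01"), "text": b"a person opens a door"},
    {"audio": b"\x10\x20", "text": "already a string"},
]
parsed = [parse_request(r) for r in batch]
print(parsed[0]["text"], len(parsed[0]["video"]), len(parsed[1]["audio"]))
# a person opens a door 3 2
```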

Usage Examples

Example 1: Packaging the handler into a MAR

# Package the MMF handler, checkpoint, config, and label CSV into a .mar archive
mkdir -p model_store
torch-model-archiver --model-name mmf_activity \
  --version 1.0 \
  --serialized-file checkpoint.pth \
  --handler examples/MMF-activity-recognition/handler.py \
  --extra-files "config.yaml,activity_labels.csv" \
  --export-path model_store

Example 2: Sending a multimodal inference request

import requests

# Send a video clip for activity recognition
# (the form field name must match the key the handler reads)
with open("video.mp4", "rb") as video_file:
    response = requests.post(
        "http://localhost:8080/predictions/mmf_activity",
        files={"data": video_file},
    )
response.raise_for_status()
print(response.json())
# Example output (illustrative): ["playing basketball"]

Example 3: Handler initialization flow

# During initialize(), the handler:
# 1. Calls super().initialize(context) for base setup
# 2. Loads OmegaConf config from extra files
# 3. Instantiates MMF processors for video, audio, text
# 4. Reads activity_labels.csv with pandas
# 5. Loads MMFTransformer model from checkpoint

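import pandas as pd
from omegaconf import OmegaConf

# Processor and model settings come from the packaged config file
config = OmegaConf.load("config.yaml")

# Activity labels: row order defines the index-to-label mapping
# (the "label" column name is illustrative; match your CSV header)
labels_df = pd.read_csv("activity_labels.csv")
activity_labels = labels_df["label"].tolist()
# e.g., ["playing basketball", "cooking", "dancing", ...]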

Related Pages

Principle:Pytorch_Serve_Multimodal_Inference - Multimodal inference principle this handler implements
Implementation:Pytorch_Serve_BaseHandler - Parent class providing the handler lifecycle pattern
Implementation:Pytorch_Serve_Generate_Model_Archive - Packages this handler into a .mar archive
Implementation:Pytorch_Serve_Service_Predict - Service layer that invokes handle() on this handler
