Implementation:Pytorch Serve DeepSpeed MII Handler

From Leeroopedia
Revision as of 13:45, 16 February 2026 by Admin (auto-imported from implementations/Pytorch_Serve_DeepSpeed_MII_Handler.md)

Overview

DeepSpeedMIIHandler is a TorchServe handler that integrates Microsoft's DeepSpeed Model Implementations for Inference (MII) library for high-throughput model serving. It extends BaseHandler and ABC, loads the model from a ZIP archive, reads the model parameters from setup_config.json, and runs inference through the MII pipeline.

Field | Value
Page Type | Implementation
Implementation Type | API Doc
Domains | Large_Model_Inference, DeepSpeed
Knowledge Sources | Pytorch_Serve
Workflow | LLM_Deployment
Last Updated | 2026-02-13 18:52 GMT

Description

The DeepSpeedMIIHandler class bridges TorchServe with Microsoft's DeepSpeed MII library for optimized large model inference. It reads model configuration from a setup_config.json file within the model archive, initializes the MII inference pipeline, and handles the full request lifecycle from text prompt extraction through inference to output conversion.

Key Responsibilities

  • Model Loading: Extracts a ZIP archive, reads setup_config.json for model parameters, and initializes the MII pipeline
  • Text Processing: Extracts text prompts from incoming request data
  • MII Inference: Runs the MII pipeline for optimized large model inference
  • Output Conversion: Converts generated images or text to numpy arrays for serialization
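
The model-loading responsibility can be sketched with the standard library alone. This is a minimal illustration, not the handler's actual code: the function name load_model_assets, the archive name model.zip, and the model output folder are all hypothetical, and the MII initialization step is deliberately left out.

```python
import json
import zipfile
from pathlib import Path


def load_model_assets(model_dir, archive_name="model.zip"):
    """Extract the model ZIP and read setup_config.json from model_dir.

    archive_name and the "model" output folder are hypothetical names
    chosen for this sketch.
    """
    model_dir = Path(model_dir)
    extract_dir = model_dir / "model"

    # Step 1: unpack the zipped model files into a subfolder.
    with zipfile.ZipFile(model_dir / archive_name) as archive:
        archive.extractall(extract_dir)

    # Step 2: read the model parameters shipped with the archive.
    with open(model_dir / "setup_config.json", encoding="utf-8") as f:
        setup_config = json.load(f)

    # Step 3 (not shown): hand extract_dir and setup_config to MII.
    return extract_dir, setup_config
```

In a TorchServe worker, model_dir would typically come from context.system_properties.get("model_dir").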

Dependencies

Library | Usage
mii | Microsoft DeepSpeed Model Implementations for Inference
numpy | Output array conversion
abc.ABC | Abstract base class mixin

Code Reference

Source Location

File | Lines | Repository
examples/large_models/deepspeed_mii/DeepSpeed_mii_handler.py | L1-118 | pytorch/serve

Key Class

class DeepSpeedMIIHandler(BaseHandler, ABC):
    """
    DeepSpeed MII inference handler.
    Lines 17-119.

    Extends BaseHandler and ABC for DeepSpeed MII pipeline integration.
    """

    def initialize(self, context):
        """
        Load model from ZIP archive and initialize MII pipeline.

        1. Extracts the model ZIP archive
        2. Reads setup_config.json for model parameters
        3. Initializes the MII inference pipeline

        Parameters:
            context: TorchServe context with system_properties and manifest.
        """
        ...

    def preprocess(self, data):
        """
        Extract text prompts from request data.

        Parses incoming request bodies to retrieve text prompt
        strings for the MII pipeline.

        Parameters:
            data (list): List of request input dicts.

        Returns:
            list: Extracted text prompt strings.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run the MII pipeline for inference.

        Passes preprocessed prompts through the DeepSpeed MII
        pipeline for optimized inference execution.

        Parameters:
            data (list): Text prompts from preprocess.

        Returns:
            Pipeline output (images or text depending on model type).
        """
        ...

    def postprocess(self, data):
        """
        Convert pipeline output to numpy arrays.

        Parameters:
            data: Output from the MII pipeline.

        Returns:
            list: Numpy array representations of the output.
        """
        ...

Import

from ts.torch_handler.base_handler import BaseHandler
from abc import ABC
import mii
import numpy as np

I/O Contract

Method | Input | Output | Notes
initialize(context) | Context with system_properties, manifest | None (sets self.pipeline) | Reads setup_config.json, initializes MII pipeline
preprocess(data) | list of request dicts with text prompts | list of prompt strings | Extracts text from request body
inference(data) | list of prompt strings | MII pipeline output | Runs DeepSpeed MII optimized inference
postprocess(data) | MII pipeline output | list of numpy arrays | Converts output for serialization
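
To make the contract concrete, the sketch below chains the four methods in the order TorchServe's BaseHandler.handle calls them. It runs without TorchServe or MII: StubPipeline, SketchHandler, and run_request are hypothetical names invented for this illustration, and the stub simply echoes each prompt instead of running a model.

```python
class StubPipeline:
    """Hypothetical stand-in for the MII pipeline; echoes prompts back."""

    def __call__(self, prompts):
        return [f"generated for: {prompt}" for prompt in prompts]


class SketchHandler:
    """Mirrors the four-method contract without TorchServe or MII."""

    def initialize(self, context):
        # The real handler would read setup_config.json and build MII here.
        self.pipeline = StubPipeline()

    def preprocess(self, data):
        # One prompt string per request dict in the batch.
        return [request["body"] for request in data]

    def inference(self, data):
        return self.pipeline(data)

    def postprocess(self, data):
        # Return one response entry per request in the batch.
        return list(data)


def run_request(handler, data):
    """Chain the stages: preprocess, then inference, then postprocess."""
    prompts = handler.preprocess(data)
    results = handler.inference(prompts)
    return handler.postprocess(results)
```

The length of the returned list matches the batch size, which is what TorchServe requires of a handler's response.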

setup_config.json Structure

Field | Type | Description
model_name | str | Name or path of the model to load
task | str | Inference task type (e.g., text-generation, text-to-image)
Additional fields | varies | Model-specific MII configuration parameters
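
One way to consume this structure is to separate the two named fields from the model-specific extras before passing them on. The sketch below assumes that model_name and task are mandatory, which this page does not state; split_setup_config and REQUIRED_FIELDS are hypothetical names.

```python
# Assumption for this sketch: both named fields are required.
REQUIRED_FIELDS = ("model_name", "task")


def split_setup_config(config):
    """Return (core, extra): the named fields and the remaining MII options."""
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"setup_config.json is missing: {', '.join(missing)}")

    core = {name: config[name] for name in REQUIRED_FIELDS}
    extra = {key: value for key, value in config.items() if key not in REQUIRED_FIELDS}
    return core, extra
```

Failing early on a missing field surfaces a packaging mistake at worker startup rather than on the first inference request.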

Usage Examples

Example 1: Handler Initialization

# TorchServe constructs the handler and supplies `context` at worker startup;
# the handler then reads setup_config.json from the model archive.
handler = DeepSpeedMIIHandler()
handler.initialize(context)
# The MII pipeline is now ready for inference.

Example 2: Inference Request

# Request with text prompt
data = [{"body": "Generate an image of a mountain landscape"}]

prompts = handler.preprocess(data)
results = handler.inference(prompts)
output = handler.postprocess(results)
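
The prompt-extraction step in the example above might look like the following sketch. It assumes the common TorchServe convention that each request dict carries its payload under a "data" or "body" key, possibly as raw bytes; extract_prompts is a hypothetical helper, not a method of the handler.

```python
def extract_prompts(requests):
    """Pull one text prompt out of each request dict in the batch."""
    prompts = []
    for request in requests:
        # Prefer "data", then fall back to "body" (assumed convention).
        payload = request.get("data")
        if payload is None:
            payload = request.get("body")

        # HTTP bodies often arrive as raw bytes; decode them to text.
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")

        prompts.append(payload)
    return prompts
```

Decoding bytes up front means the later stages only ever see str prompts.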

Example 3: setup_config.json Example

{
    "model_name": "stabilityai/stable-diffusion-2-1",
    "task": "text-to-image"
}
