Implementation:Pytorch Serve DeepSpeed MII Handler

Overview

DeepSpeedMIIHandler is a TorchServe handler that integrates Microsoft DeepSpeed Model Implementations for Inference (MII) for high-throughput model serving. It extends BaseHandler and ABC, loads the model from a ZIP archive using parameters read from setup_config.json, and runs inference through the MII pipeline.

Field	Value
Page Type	Implementation
Implementation Type	API Doc
Domains	Large_Model_Inference, DeepSpeed
Knowledge Sources	Pytorch_Serve
Workflow	LLM_Deployment
Last Updated	2026-02-13 18:52 GMT

Description

The DeepSpeedMIIHandler class bridges TorchServe with Microsoft's DeepSpeed MII library for optimized large model inference. It reads model configuration from a setup_config.json file within the model archive, initializes the MII inference pipeline, and handles the full request lifecycle from text prompt extraction through inference to output conversion.

Key Responsibilities

Model Loading: Extracts a ZIP archive, reads setup_config.json for model parameters, and initializes the MII pipeline
Text Processing: Extracts text prompts from incoming request data
MII Inference: Runs the MII pipeline for optimized large model inference
Output Conversion: Converts generated images or text to numpy arrays for serialization
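
The Model Loading step can be sketched with the standard library alone. This is a minimal illustration, not the handler's actual code: the helper names `load_model_assets` and `make_demo_model_dir`, the archive name `model.zip`, the `model` extraction subdirectory, and the placement of setup_config.json directly in the model directory are all assumptions made for the example. The MII pipeline initialization itself is omitted.

```python
import json
import os
import tempfile
import zipfile


def load_model_assets(model_dir, archive_name="model.zip"):
    """Extract the model ZIP and return (extract_dir, setup_config).

    Hypothetical helper: archive_name and the "model" subdirectory are
    assumptions, not values confirmed by the handler source.
    """
    extract_dir = os.path.join(model_dir, "model")
    with zipfile.ZipFile(os.path.join(model_dir, archive_name)) as zf:
        zf.extractall(extract_dir)

    config_path = os.path.join(model_dir, "setup_config.json")
    with open(config_path, encoding="utf-8") as f:
        setup_config = json.load(f)
    return extract_dir, setup_config


def make_demo_model_dir():
    """Build a throwaway model directory so the sketch runs anywhere."""
    model_dir = tempfile.mkdtemp()
    with zipfile.ZipFile(os.path.join(model_dir, "model.zip"), "w") as zf:
        zf.writestr("weights.bin", b"placeholder")  # stands in for real weights
    config = {
        "model_name": "stabilityai/stable-diffusion-2-1",
        "task": "text-to-image",
    }
    with open(os.path.join(model_dir, "setup_config.json"), "w", encoding="utf-8") as f:
        json.dump(config, f)
    return model_dir


demo_dir = make_demo_model_dir()
extract_dir, setup_config = load_model_assets(demo_dir)
print(sorted(os.listdir(extract_dir)), setup_config["task"])
```

In a real deployment the model directory would come from the TorchServe context rather than a temporary directory.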

Dependencies

Library	Usage
`mii`	Microsoft DeepSpeed Model Implementations for Inference
`numpy`	Output array conversion
`abc.ABC`	Abstract base class mixin

Code Reference

Source Location

File	Lines	Repository
`examples/large_models/deepspeed_mii/DeepSpeed_mii_handler.py`	L1-118	pytorch/serve

Key Class

class DeepSpeedMIIHandler(BaseHandler, ABC):
    """
    DeepSpeed MII inference handler.
    Lines 17-119.

    Extends BaseHandler and ABC for DeepSpeed MII pipeline integration.
    """

    def initialize(self, context):
        """
        Load model from ZIP archive and initialize MII pipeline.

        1. Extracts the model ZIP archive
        2. Reads setup_config.json for model parameters
        3. Initializes the MII inference pipeline

        Parameters:
            context: TorchServe context with system_properties and manifest.
        """
        ...

    def preprocess(self, data):
        """
        Extract text prompts from request data.

        Parses incoming request bodies to retrieve text prompt
        strings for the MII pipeline.

        Parameters:
            data (list): List of request input dicts.

        Returns:
            list: Extracted text prompt strings.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run the MII pipeline for inference.

        Passes preprocessed prompts through the DeepSpeed MII
        pipeline for optimized inference execution.

        Parameters:
            data (list): Text prompts from preprocess.

        Returns:
            Pipeline output (images or text depending on model type).
        """
        ...

    def postprocess(self, data):
        """
        Convert pipeline output to numpy arrays.

        Parameters:
            data: Output from the MII pipeline.

        Returns:
            list: Numpy array representations of the output.
        """
        ...

Import

from abc import ABC

import mii
import numpy as np
from ts.torch_handler.base_handler import BaseHandler

I/O Contract

Method	Input	Output	Notes
`initialize(context)`	Context with `system_properties`, `manifest`	None (sets `self.pipeline`)	Reads setup_config.json, initializes MII pipeline
`preprocess(data)`	list of request dicts with text prompts	list of prompt strings	Extracts text from request body
`inference(data)`	list of prompt strings	MII pipeline output	Runs DeepSpeed MII optimized inference
`postprocess(data)`	MII pipeline output	list of numpy arrays	Converts output for serialization
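
The contract in the table can be exercised end to end with stand-ins. The sketch below is an assumption-laden illustration: `EchoPipeline`, `StubHandler`, and `run_request` are hypothetical names, the pipeline is a fake that only tags each prompt, and postprocess returns plain strings rather than numpy arrays. It shows the call order and the one-response-per-request rule that TorchServe applies to batched handlers.

```python
class EchoPipeline:
    """Stand-in for the MII pipeline; it only tags each prompt."""

    def __call__(self, prompts):
        return [f"generated:{p}" for p in prompts]


class StubHandler:
    """Mimics the four-method shape of the handler without TorchServe or MII."""

    def __init__(self):
        self.pipeline = None

    def initialize(self, context):
        # The real handler reads setup_config.json here; the stub ignores context.
        self.pipeline = EchoPipeline()

    def preprocess(self, data):
        return [row["body"] for row in data]

    def inference(self, data, *args, **kwargs):
        return self.pipeline(data)

    def postprocess(self, data):
        return list(data)


def run_request(handler, batch):
    """Chain the three per-request stages in order and check the batch size."""
    prompts = handler.preprocess(batch)
    raw = handler.inference(prompts)
    responses = handler.postprocess(raw)
    if len(responses) != len(batch):
        raise ValueError("postprocess must return one response per request")
    return responses


handler = StubHandler()
handler.initialize(context=None)
responses = run_request(
    handler,
    [{"body": "a mountain landscape"}, {"body": "a river"}],
)
print(responses)
```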

setup_config.json Structure

Field	Type	Description
`model_name`	str	Name or path of the model to load
`task`	str	Inference task type (e.g., text-generation, text-to-image)
Additional fields	varies	Model-specific MII configuration parameters
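
One way to consume this structure is to check the two named fields and pass everything else through as model-specific options. The function below is a sketch under that assumption; `split_setup_config` is a hypothetical helper, and the `dtype` key in the demo is an invented placeholder for an "additional field", not a documented parameter.

```python
def split_setup_config(config):
    """Return (model_name, task, extras) after basic validation.

    Hypothetical helper: the handler's own validation, if any, may differ.
    """
    required = ("model_name", "task")
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"setup_config.json is missing: {', '.join(missing)}")

    for key in required:
        if not isinstance(config[key], str) or not config[key]:
            raise TypeError(f"{key} must be a non-empty string")

    # Everything that is not a named field is treated as model-specific.
    extras = {k: v for k, v in config.items() if k not in required}
    return config["model_name"], config["task"], extras


model_name, task, extras = split_setup_config(
    {
        "model_name": "stabilityai/stable-diffusion-2-1",
        "task": "text-to-image",
        "dtype": "fp16",  # placeholder extra field for illustration only
    }
)
print(model_name, task, extras)
```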

Usage Examples

Example 1: Handler Initialization

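# TorchServe normally constructs the handler and supplies `context`
# (with system_properties and manifest); the calls are shown here only
# to illustrate the order of operations.
handler = DeepSpeedMIIHandler()
handler.initialize(context)  # reads setup_config.json from the model archive
# MII pipeline is now ready for inference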

Example 2: Inference Request

# Request with text prompt
data = [{"body": "Generate an image of a mountain landscape"}]

prompts = handler.preprocess(data)
results = handler.inference(prompts)
output = handler.postprocess(results)
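
Request bodies often arrive as bytes rather than str, and TorchServe example handlers commonly look under both a "data" and a "body" key. The sketch below shows that extraction pattern in isolation; `extract_prompts` is a hypothetical function, and whether DeepSpeedMIIHandler.preprocess follows exactly this logic is an assumption.

```python
def extract_prompts(requests):
    """Pull one text prompt out of each request dict.

    Checks "data" first, then "body", and decodes bytes as UTF-8.
    """
    prompts = []
    for row in requests:
        payload = row.get("data")
        if payload is None:
            payload = row.get("body")
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")
        prompts.append(payload)
    return prompts


batch = [
    {"body": b"Generate an image of a mountain landscape"},
    {"data": "Generate an image of a river"},
]
prompts = extract_prompts(batch)
print(prompts)
```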

Example 3: setup_config.json Example

{
    "model_name": "stabilityai/stable-diffusion-2-1",
    "task": "text-to-image"
}

Related Pages

Principle:Pytorch_Serve_DeepSpeed_Inference - The DeepSpeed inference optimization pattern this handler implements
Implementation:Pytorch_Serve_BaseHandler - Base handler class extended by DeepSpeedMIIHandler
Implementation:Pytorch_Serve_BaseDeepSpeedHandler - Alternative DeepSpeed handler base class
Environment:Pytorch_Serve_Python_PyTorch_Runtime - Core Python/PyTorch runtime
Environment:Pytorch_Serve_CUDA_GPU_Environment - NVIDIA GPU with CUDA for DeepSpeed inference
