Overview
DeepSpeedMIIHandler, defined in DeepSpeed_mii_handler.py, is a TorchServe handler that integrates Microsoft DeepSpeed Model Implementations for Inference (MII) for high-throughput model serving. It subclasses BaseHandler and ABC, loads the model from a ZIP archive configured via setup_config.json, and runs inference through the MII pipeline.
Description
The DeepSpeedMIIHandler class bridges TorchServe with Microsoft's DeepSpeed MII library for optimized large model inference. It reads model configuration from a setup_config.json file within the model archive, initializes the MII inference pipeline, and handles the full request lifecycle from text prompt extraction through inference to output conversion.
Key Responsibilities
- Model Loading: Extracts a ZIP archive, reads setup_config.json for model parameters, and initializes the MII pipeline
- Text Processing: Extracts text prompts from incoming request data
- MII Inference: Runs the MII pipeline for optimized large model inference
- Output Conversion: Converts generated images or text to numpy arrays for serialization
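The model-loading responsibility can be sketched with the standard library alone. This is a minimal illustration, not the handler's actual code: the archive name `model.zip`, the `model` extraction subdirectory, the helper name `load_setup_config`, and the assumption that setup_config.json sits next to the ZIP in the model directory are all hypothetical.

```python
import json
import os
import zipfile


def load_setup_config(model_dir, archive_name="model.zip"):
    """Extract the model ZIP and return (config dict, extraction directory).

    Assumptions (not confirmed by the source): the archive is called
    `model.zip`, it is unpacked into `<model_dir>/model`, and
    setup_config.json lives directly in `model_dir`.
    """
    extract_dir = os.path.join(model_dir, "model")
    with zipfile.ZipFile(os.path.join(model_dir, archive_name)) as archive:
        archive.extractall(extract_dir)

    config_path = os.path.join(model_dir, "setup_config.json")
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    return config, extract_dir
```

In a real handler the model directory would come from the TorchServe context, and the returned config would feed the MII pipeline initialization.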
Dependencies
| Library | Usage |
|---|---|
| mii | Microsoft DeepSpeed Model Implementations for Inference |
| numpy | Output array conversion |
| abc.ABC | Abstract base class mixin |
Code Reference
Source Location
| File | Lines | Repository |
|---|---|---|
| examples/large_models/deepspeed_mii/DeepSpeed_mii_handler.py | L1-118 | pytorch/serve |
Key Class
class DeepSpeedMIIHandler(BaseHandler, ABC):
    """
    DeepSpeed MII inference handler.
    Lines 17-119.
    Extends BaseHandler and ABC for DeepSpeed MII pipeline integration.
    """

    def initialize(self, context):
        """
        Load model from ZIP archive and initialize MII pipeline.

        1. Extracts the model ZIP archive
        2. Reads setup_config.json for model parameters
        3. Initializes the MII inference pipeline

        Parameters:
            context: TorchServe context with system_properties and manifest.
        """
        ...

    def preprocess(self, data):
        """
        Extract text prompts from request data.

        Parses incoming request bodies to retrieve text prompt
        strings for the MII pipeline.

        Parameters:
            data (list): List of request input dicts.

        Returns:
            list: Extracted text prompt strings.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run the MII pipeline for inference.

        Passes preprocessed prompts through the DeepSpeed MII
        pipeline for optimized inference execution.

        Parameters:
            data (list): Text prompts from preprocess.

        Returns:
            Pipeline output (images or text depending on model type).
        """
        ...

    def postprocess(self, data):
        """
        Convert pipeline output to numpy arrays.

        Parameters:
            data: Output from the MII pipeline.

        Returns:
            list: Numpy array representations of the output.
        """
        ...
Import
from ts.torch_handler.base_handler import BaseHandler
from abc import ABC
import mii
import numpy as np
I/O Contract
| Method | Input | Output | Notes |
|---|---|---|---|
| initialize(context) | Context with system_properties, manifest | None (sets self.pipeline) | Reads setup_config.json, initializes MII pipeline |
| preprocess(data) | list of request dicts with text prompts | list of prompt strings | Extracts text from request body |
| inference(data) | list of prompt strings | MII pipeline output | Runs DeepSpeed MII optimized inference |
| postprocess(data) | MII pipeline output | list of numpy arrays | Converts output for serialization |
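The contract in this table can be exercised without MII or TorchServe by swapping in a stand-in pipeline. The sketch below is illustrative only: `EchoPipeline` and `LifecycleSketch` are hypothetical names, `initialize` takes the pipeline directly instead of a TorchServe context, and `postprocess` passes strings through rather than building numpy arrays.

```python
class EchoPipeline:
    """Hypothetical stand-in for the MII pipeline: echoes each prompt."""

    def __call__(self, prompts):
        return [f"output for: {prompt}" for prompt in prompts]


class LifecycleSketch:
    """Mirrors the method chain from the table, with simplified bodies."""

    def initialize(self, pipeline):
        # The real handler builds the pipeline from setup_config.json;
        # here it is injected so the flow can run anywhere.
        self.pipeline = pipeline

    def preprocess(self, requests):
        return [request["body"] for request in requests]

    def inference(self, prompts):
        return self.pipeline(prompts)

    def postprocess(self, outputs):
        # The real handler converts outputs to numpy arrays at this step.
        return list(outputs)

    def handle(self, requests):
        prompts = self.preprocess(requests)
        outputs = self.inference(prompts)
        responses = self.postprocess(outputs)
        # TorchServe expects one response per request in the batch.
        if len(responses) != len(requests):
            raise ValueError("expected one response per request")
        return responses
```

Injecting the pipeline keeps each stage testable in isolation, which is useful because loading a large model in unit tests is rarely practical.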
setup_config.json Structure
| Field | Type | Description |
|---|---|---|
| model_name | str | Name or path of the model to load |
| task | str | Inference task type (e.g., text-generation, text-to-image) |
| Additional fields | varies | Model-specific MII configuration parameters |
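A configuration with this shape could be checked before the pipeline is built. The following is a sketch under assumptions: the helper `parse_setup_config` is hypothetical, and treating both `model_name` and `task` as mandatory is an assumption, since the source does not say which fields are required.

```python
import json

# Assumed to be mandatory for illustration; the source does not confirm this.
REQUIRED_FIELDS = ("model_name", "task")


def parse_setup_config(text):
    """Parse setup_config.json text into (model_name, task, extra_fields)."""
    config = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError(
            "setup_config.json is missing: " + ", ".join(missing)
        )
    # Everything else is kept as model-specific configuration.
    extra = {k: v for k, v in config.items() if k not in REQUIRED_FIELDS}
    return config["model_name"], config["task"], extra
```

Failing early with a clear message tends to be easier to debug than an error raised deep inside model loading.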
Usage Examples
Example 1: Handler Initialization
# TorchServe constructs `context` and calls initialize() when a worker starts.
# The handler reads setup_config.json from the model archive.
handler = DeepSpeedMIIHandler()
handler.initialize(context)
# MII pipeline is now ready for inference
Example 2: Inference Request
# Request with text prompt
data = [{"body": "Generate an image of a mountain landscape"}]
prompts = handler.preprocess(data)
results = handler.inference(prompts)
output = handler.postprocess(results)
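TorchServe typically delivers each request payload under a `data` or `body` key, and the payload often arrives as raw bytes rather than a string. A prompt-extraction step that tolerates both cases might look like the sketch below; `extract_prompt` and `extract_prompts` are hypothetical helpers, not functions from the handler.

```python
def extract_prompt(request):
    """Return the text prompt from one request dict, decoding bytes if needed."""
    payload = request.get("data")
    if payload is None:
        payload = request.get("body")
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    return payload


def extract_prompts(requests):
    """Apply extract_prompt to every request in the batch."""
    return [extract_prompt(request) for request in requests]
```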
Example 3: setup_config.json Example
{
    "model_name": "stabilityai/stable-diffusion-2-1",
    "task": "text-to-image"
}