
Implementation:Deepspeedai DeepSpeed HybridEngine Init

From Leeroopedia


Overview

A concrete tool in the DeepSpeed library for creating a Hybrid Engine for RLHF training, with switching between inference and training modes.

Description

DeepSpeedHybridEngine is created by deepspeed.initialize() when config.hybrid_engine.enabled=True. It inherits from DeepSpeedEngine and adds inference containers, LoRA fusion capability, and mode switching methods. The __init__ method creates inference containers by matching the model architecture to known replace policies via create_inference_module().
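
The inference/training switch can be pictured with a minimal sketch. The class below is illustrative only, not DeepSpeed's actual implementation; the real engine fuses or unfuses LoRA weights and swaps parameters into inference containers when the mode changes:

```python
# Hypothetical sketch of the mode-switching pattern; the class name and
# attribute are illustrative, not DeepSpeed internals.
class HybridEngineSketch:
    def __init__(self):
        self.inference_mode = False  # engine starts in training mode

    def eval(self):
        # Real engine: fuse LoRA weights and populate inference
        # containers before generation.
        self.inference_mode = True
        return self

    def train(self):
        # Real engine: unfuse LoRA and restore the training-time
        # parameter layout.
        self.inference_mode = False
        return self
```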

During initialization, the Hybrid Engine performs the following steps beyond the base DeepSpeedEngine.__init__:

  • Synchronizes the random number generator state across all GPUs to ensure consistent generation.
  • Detects whether ZeRO Stage 3 is active (self.Z3_enabled) and whether all layers should be gathered at once before generation (self.gather_all_layers, derived from pin_parameters).
  • Calls create_inference_module(), which populates inference policies from known architectures, collects all layer parameters, and creates inference containers for each matched transformer layer.
  • Sets up tensor parallelism groups if inference_tp_size > 1.
  • Initializes the WorkspaceOp for managing inference cache memory.
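
The engine-selection step in deepspeed/__init__.py can be approximated as follows. `select_engine_class` is a hypothetical helper that mirrors the documented behavior (the hybrid engine is chosen only when hybrid_engine.enabled is true), not the library's actual code:

```python
# Illustrative sketch of the selection logic in deepspeed.initialize();
# the helper name is hypothetical.
def select_engine_class(config: dict) -> str:
    """Return which engine class deepspeed.initialize() would construct."""
    hybrid = config.get("hybrid_engine", {})
    if hybrid.get("enabled", False):
        return "DeepSpeedHybridEngine"
    return "DeepSpeedEngine"
```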

The HybridEngineConfig controls the behavior with the following fields:

Field                     Type  Default  Description
enabled                   bool  False    Whether to use the Hybrid Engine
max_out_tokens            int   512      Maximum output tokens for generation
inference_tp_size         int   1        Tensor parallelism degree for inference
release_inference_cache   bool  False    Release inference cache between generations
pin_parameters            bool  True     Gather all ZeRO-3 parameters at once before generation
tp_gather_partition_size  int   8        Number of layers to gather per partition with TP
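
The defaults above can be restated as a Python dataclass for reference. The field names and values come from the table; the class itself is a sketch, not DeepSpeed's actual config object:

```python
from dataclasses import dataclass

# Sketch mirroring the HybridEngineConfig fields and defaults listed above;
# the class name is illustrative.
@dataclass
class HybridEngineConfigSketch:
    enabled: bool = False
    max_out_tokens: int = 512
    inference_tp_size: int = 1
    release_inference_cache: bool = False
    pin_parameters: bool = True
    tp_gather_partition_size: int = 8
```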

Code Reference

Property    Value
Repository  https://github.com/deepspeedai/DeepSpeed
File        deepspeed/runtime/hybrid_engine.py (L34-64, __init__); deepspeed/__init__.py (L201-212, selection logic); deepspeed/runtime/config.py (L491-503, HybridEngineConfig)
Class       DeepSpeedHybridEngine(DeepSpeedEngine)
Import      Created automatically by deepspeed.initialize()

I/O Contract

Inputs

Name              Type             Required  Description
model             torch.nn.Module  Yes       Actor model (pretrained or SFT checkpoint)
config            dict or str      Yes       DeepSpeed config with hybrid_engine.enabled=true
optimizer         Optimizer        No        Custom optimizer
model_parameters  iterable         No        Parameters to optimize
mpu               object           No        Model parallelism unit for existing TP setups

Outputs

Name          Type                   Description
engine        DeepSpeedHybridEngine  Engine with inference containers, LoRA support, and mode switching
optimizer     Optimizer              Wrapped optimizer instance
dataloader    DataLoader             DataLoader if training_data was provided, otherwise None
lr_scheduler  LRScheduler            Learning rate scheduler if configured, otherwise None

Usage Example

import deepspeed

rlhf_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {
        "enabled": True,
        "max_out_tokens": 512,
        "inference_tp_size": 1,
        "pin_parameters": True
    },
    "bf16": {"enabled": True}
}
engine, _, _, _ = deepspeed.initialize(
    model=actor_model,
    config=rlhf_config
)
# engine is now a DeepSpeedHybridEngine
# with inference containers ready for generation

Knowledge Sources

Last updated: 2026-02-09 00:00 GMT
