Implementation:Deepspeedai DeepSpeed HybridEngine Init
Overview
The DeepSpeed library's mechanism for creating a Hybrid Engine for RLHF training, with switching between inference and training modes.
Description
DeepSpeedHybridEngine is created by deepspeed.initialize() when config.hybrid_engine.enabled=True. It inherits from DeepSpeedEngine and adds inference containers, LoRA fusion capability, and mode switching methods. The __init__ method creates inference containers by matching the model architecture to known replace policies via create_inference_module().
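The policy-matching idea behind create_inference_module() can be sketched in plain Python. This is an illustrative stand-in, not DeepSpeed's actual code: the class names DecoderLayer, InferenceContainer, and the POLICIES registry are hypothetical, standing in for DeepSpeed's replace policies and inference containers.

```python
class DecoderLayer:
    """Hypothetical stand-in for a known architecture's transformer layer."""

class InferenceContainer:
    """Hypothetical stand-in for an inference container wrapping one layer."""
    def __init__(self, layer):
        self.layer = layer

# Registry mapping layer class names to container factories (a "policy").
POLICIES = {"DecoderLayer": InferenceContainer}

def create_inference_containers(layers):
    """Create one container per layer whose class matches a known policy."""
    return [
        POLICIES[type(layer).__name__](layer)
        for layer in layers
        if type(layer).__name__ in POLICIES
    ]

containers = create_inference_containers([DecoderLayer() for _ in range(4)])
print(len(containers))  # 4
```

Layers whose class does not match any registered policy are simply left alone, which is why only known architectures gain inference containers.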
During initialization, the Hybrid Engine performs the following steps beyond the base DeepSpeedEngine.__init__:
- Synchronizes the random number generator state across all GPUs to ensure consistent generation.
- Detects whether ZeRO Stage 3 is active (self.Z3_enabled) and whether parameters should be pinned (self.gather_all_layers).
- Calls create_inference_module(), which populates inference policies from known architectures, collects all layer parameters, and creates an inference container for each matched transformer layer.
- Sets up tensor parallelism groups if inference_tp_size > 1.
- Initializes the WorkspaceOp for managing inference cache memory.
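The two flags in the second step can be derived from the config dict as sketched below. resolve_init_flags is a hypothetical helper, not part of DeepSpeed's API; it only illustrates that Z3_enabled follows from the ZeRO stage and gather_all_layers from hybrid_engine.pin_parameters.

```python
def resolve_init_flags(ds_config):
    """Derive the two init-time flags from a DeepSpeed config dict (sketch)."""
    zero_stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    hybrid = ds_config.get("hybrid_engine", {})
    z3_enabled = zero_stage == 3          # ZeRO Stage 3 active?
    gather_all_layers = bool(hybrid.get("pin_parameters", True))
    return z3_enabled, gather_all_layers

print(resolve_init_flags({
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {"enabled": True, "pin_parameters": True},
}))  # (True, True)
```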
The HybridEngineConfig controls the behavior with the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | False | Whether to use the Hybrid Engine |
| max_out_tokens | int | 512 | Maximum output tokens for generation |
| inference_tp_size | int | 1 | Tensor parallelism degree for inference |
| release_inference_cache | bool | False | Release inference cache between generations |
| pin_parameters | bool | True | Gather all ZeRO-3 parameters at once before generation |
| tp_gather_partition_size | int | 8 | Number of layers to gather per partition with TP |
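The defaults in the table above can be mirrored as a plain dataclass. This is only a sketch: the real HybridEngineConfig in deepspeed/runtime/config.py is a validated config object, and HybridEngineConfigSketch is a hypothetical stand-in that reproduces the field names and defaults.

```python
from dataclasses import dataclass

@dataclass
class HybridEngineConfigSketch:
    """Sketch mirroring the HybridEngineConfig fields and defaults."""
    enabled: bool = False
    max_out_tokens: int = 512
    inference_tp_size: int = 1
    release_inference_cache: bool = False
    pin_parameters: bool = True
    tp_gather_partition_size: int = 8

# Unspecified fields fall back to the defaults from the table.
cfg = HybridEngineConfigSketch(enabled=True, max_out_tokens=1024)
print(cfg.inference_tp_size)  # 1
```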
Code Reference
| Property | Value |
|---|---|
| Repository | https://github.com/deepspeedai/DeepSpeed |
| File | deepspeed/runtime/hybrid_engine.py (L34-64, __init__); deepspeed/__init__.py (L201-212, selection logic); deepspeed/runtime/config.py (L491-503, HybridEngineConfig) |
| Class | DeepSpeedHybridEngine(DeepSpeedEngine) |
| Import | Created automatically by deepspeed.initialize() |
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | torch.nn.Module | Yes | Actor model (pretrained or SFT checkpoint) |
| config | dict or str | Yes | DeepSpeed config with hybrid_engine.enabled=true |
| optimizer | Optimizer | No | Custom optimizer |
| model_parameters | iterable | No | Parameters to optimize |
| mpu | object | No | Model parallelism unit for existing TP setups |
Outputs
| Name | Type | Description |
|---|---|---|
| engine | DeepSpeedHybridEngine | Engine with inference containers, LoRA support, and mode switching |
| optimizer | Optimizer | Wrapped optimizer instance |
| dataloader | DataLoader | DataLoader if training_data was provided, otherwise None |
| lr_scheduler | LRScheduler | Learning rate scheduler if configured, otherwise None |
Usage Example
```python
import deepspeed

rlhf_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {
        "enabled": True,
        "max_out_tokens": 512,
        "inference_tp_size": 1,
        "pin_parameters": True
    },
    "bf16": {"enabled": True}
}
engine, _, _, _ = deepspeed.initialize(
    model=actor_model,
    config=rlhf_config
)
# engine is now a DeepSpeedHybridEngine
# with inference containers ready for generation
```
Related Pages
- Principle:Deepspeedai_DeepSpeed_Hybrid_Engine_Init
- Environment:Deepspeedai_DeepSpeed_CUDA_GPU_Environment
Last updated: 2026-02-09 00:00 GMT