Implementation:Deepspeedai DeepSpeed HybridEngine Init
Overview
The DeepSpeed library's mechanism for creating a Hybrid Engine for RLHF training, with switching between inference and training modes.
Description
DeepSpeedHybridEngine is created by deepspeed.initialize() when config.hybrid_engine.enabled=True. It inherits from DeepSpeedEngine and adds inference containers, LoRA fusion capability, and mode switching methods. The __init__ method creates inference containers by matching the model architecture to known replace policies via create_inference_module().
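The policy-matching idea behind create_inference_module() can be sketched in plain Python. This is an illustrative stand-in, not DeepSpeed's actual code: the class names DecoderLayer, InferenceContainer, and the POLICIES registry are hypothetical, standing in for DeepSpeed's replace policies and inference containers.

```python
class DecoderLayer:
    """Hypothetical stand-in for a known architecture's transformer layer."""

class InferenceContainer:
    """Hypothetical stand-in for an inference container wrapping one layer."""
    def __init__(self, layer):
        self.layer = layer

# Registry mapping layer class names to container factories (a "policy").
POLICIES = {"DecoderLayer": InferenceContainer}

def create_inference_containers(layers):
    """Create one container per layer whose class matches a known policy."""
    return [
        POLICIES[type(layer).__name__](layer)
        for layer in layers
        if type(layer).__name__ in POLICIES
    ]

containers = create_inference_containers([DecoderLayer() for _ in range(4)])
print(len(containers))  # 4
```

Layers whose class does not match any registered policy are simply left alone, which is why only known architectures gain inference containers.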
During initialization, the Hybrid Engine performs the following steps beyond the base DeepSpeedEngine.__init__:
- Synchronizes the random number generator state across all GPUs to ensure consistent generation.
- Detects whether ZeRO Stage 3 is active (self.Z3_enabled) and whether parameters should be pinned (self.gather_all_layers).
- Calls create_inference_module(), which populates inference policies from known architectures, collects all layer parameters, and creates an inference container for each matched transformer layer.
- Sets up tensor parallelism groups if inference_tp_size > 1.
- Initializes the WorkspaceOp for managing inference cache memory.
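The two flags in the second step can be derived from the config dict as sketched below. resolve_init_flags is a hypothetical helper, not part of DeepSpeed's API; it only illustrates that Z3_enabled follows from the ZeRO stage and gather_all_layers from hybrid_engine.pin_parameters.

```python
def resolve_init_flags(ds_config):
    """Derive the two init-time flags from a DeepSpeed config dict (sketch)."""
    zero_stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    hybrid = ds_config.get("hybrid_engine", {})
    z3_enabled = zero_stage == 3          # ZeRO Stage 3 active?
    gather_all_layers = bool(hybrid.get("pin_parameters", True))
    return z3_enabled, gather_all_layers

print(resolve_init_flags({
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {"enabled": True, "pin_parameters": True},
}))  # (True, True)
```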
The HybridEngineConfig controls the behavior with the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | False | Whether to use the Hybrid Engine |
| max_out_tokens | int | 512 | Maximum output tokens for generation |
| inference_tp_size | int | 1 | Tensor parallelism degree for inference |
| release_inference_cache | bool | False | Release inference cache between generations |
| pin_parameters | bool | True | Gather all ZeRO-3 parameters at once before generation |
| tp_gather_partition_size | int | 8 | Number of layers to gather per partition with TP |
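The defaults in the table above can be mirrored as a plain dataclass. This is only a sketch: the real HybridEngineConfig in deepspeed/runtime/config.py is a validated config object, and HybridEngineConfigSketch is a hypothetical stand-in that reproduces the field names and defaults.

```python
from dataclasses import dataclass

@dataclass
class HybridEngineConfigSketch:
    """Sketch mirroring the HybridEngineConfig fields and defaults."""
    enabled: bool = False
    max_out_tokens: int = 512
    inference_tp_size: int = 1
    release_inference_cache: bool = False
    pin_parameters: bool = True
    tp_gather_partition_size: int = 8

# Unspecified fields fall back to the defaults from the table.
cfg = HybridEngineConfigSketch(enabled=True, max_out_tokens=1024)
print(cfg.inference_tp_size)  # 1
```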
Code Reference
| Property | Value |
|---|---|
| Repository | https://github.com/deepspeedai/DeepSpeed |
| File | deepspeed/runtime/hybrid_engine.py (L34-64, __init__); deepspeed/__init__.py (L201-212, selection logic); deepspeed/runtime/config.py (L491-503, HybridEngineConfig) |
| Class | DeepSpeedHybridEngine(DeepSpeedEngine) |
| Import | Created automatically by deepspeed.initialize() |
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | torch.nn.Module | Yes | Actor model (pretrained or SFT checkpoint) |
| config | dict or str | Yes | DeepSpeed config with hybrid_engine.enabled=true |
| optimizer | Optimizer | No | Custom optimizer |
| model_parameters | iterable | No | Parameters to optimize |
| mpu | object | No | Model parallelism unit for existing TP setups |
Outputs
| Name | Type | Description |
|---|---|---|
| engine | DeepSpeedHybridEngine | Engine with inference containers, LoRA support, and mode switching |
| optimizer | Optimizer | Wrapped optimizer instance |
| dataloader | DataLoader | DataLoader if training_data was provided, otherwise None |
| lr_scheduler | LRScheduler | Learning rate scheduler if configured, otherwise None |
Usage Example
```python
import deepspeed

rlhf_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {
        "enabled": True,
        "max_out_tokens": 512,
        "inference_tp_size": 1,
        "pin_parameters": True
    },
    "bf16": {"enabled": True}
}
engine, _, _, _ = deepspeed.initialize(
    model=actor_model,
    config=rlhf_config
)
# engine is now a DeepSpeedHybridEngine
# with inference containers ready for generation
```
Related Pages
- Principle:Deepspeedai_DeepSpeed_Hybrid_Engine_Init
- Environment:Deepspeedai_DeepSpeed_CUDA_GPU_Environment
Last updated: 2026-02-09 00:00 GMT