Implementation:Deepspeedai DeepSpeed DeepSpeedEngine Save For SP
Overview
A concrete tool, provided by the DeepSpeed library, for saving sequence-parallel model checkpoints and configuring an evaluation-time bypass of sequence parallelism.
Description
DeepSpeedEngine.save_checkpoint() saves the model through the normal DeepSpeed path, because sequence parallelism (SP) partitions activations and leaves the weights unpartitioned. SP therefore adds no sharding of its own to the checkpoint: the saved model weights need no SP-specific merging or conversion before they are loaded for inference or further training on a different GPU configuration. The disable_in_eval flag on UlyssesSPAttentionHF bypasses the SP all-to-all communication during evaluation, which simplifies inference.
The save process follows the standard DeepSpeed checkpoint flow:
- Validates the checkpoint tag across all ranks for consistency
- Creates the checkpoint directory structure
- Saves model state dict (complete weights, not SP-partitioned)
- Saves optimizer state (partitioned by ZeRO stage across DP ranks)
- Saves ZeRO checkpoint files if applicable
- Writes a latest file pointing to the most recent checkpoint tag
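The first and last steps of this flow can be sketched without a GPU. The sketch below is illustrative and is not DeepSpeed's code: tags_consistent, write_latest, and read_latest are hypothetical helpers, and the list of per-rank tags stands in for a collective comparison across ranks. The only detail taken from DeepSpeed is that the pointer file is named latest and stores the tag as text.

```python
import hashlib
import os
import tempfile


def tags_consistent(tags_per_rank):
    """Return True if every rank proposed the same checkpoint tag.

    Hypothetical stand-in for a cross-rank check: each rank hashes its tag,
    and all hashes must match.
    """
    digests = {hashlib.sha1(tag.encode()).hexdigest() for tag in tags_per_rank}
    return len(digests) == 1


def write_latest(save_dir, tag):
    """Create <save_dir>/<tag>/ and record the tag in <save_dir>/latest."""
    os.makedirs(os.path.join(save_dir, tag), exist_ok=True)
    with open(os.path.join(save_dir, "latest"), "w") as f:
        f.write(tag)


def read_latest(save_dir):
    """Return the tag stored in <save_dir>/latest."""
    with open(os.path.join(save_dir, "latest")) as f:
        return f.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as save_dir:
        tag = "global_step100"
        # Simulate four ranks that all agree on the tag.
        if tags_consistent([tag, tag, tag, tag]):
            write_latest(save_dir, tag)
        print(read_latest(save_dir))
```

Reading the latest file back is how a later load can find the most recent tag when none is given explicitly.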
Because SP only partitions activations and not weights, the model state dict saved by any rank within a DP group is identical and complete. This means no special checkpoint merging or conversion is needed for the model portion.
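A toy illustration of why the model portion needs no merging, assuming an even split of the sequence across SP ranks. shard_sequence and the two-entry weights dict are hypothetical stand-ins, not DeepSpeed objects.

```python
def shard_sequence(tokens, sp_world_size):
    """Split one sequence into equal contiguous shards, one per SP rank."""
    if len(tokens) % sp_world_size != 0:
        raise ValueError("sequence length must be divisible by sp_world_size")
    shard_len = len(tokens) // sp_world_size
    return [tokens[r * shard_len:(r + 1) * shard_len] for r in range(sp_world_size)]


# Every SP rank holds the same, complete weights (toy values).
weights = {"linear.weight": [0.1, 0.2, 0.3], "linear.bias": [0.0]}
sp_world_size = 4
per_rank_weights = [dict(weights) for _ in range(sp_world_size)]

# Only the input sequence (and hence the activations) is split across ranks.
tokens = list(range(8))
per_rank_tokens = shard_sequence(tokens, sp_world_size)

# Any single rank's weights already form the full model state dict.
assert all(w == weights for w in per_rank_weights)
# The sequence shards, by contrast, must be concatenated to recover the input.
assert sum(per_rank_tokens, []) == tokens
```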
The disable_in_eval behavior is configured during UlyssesSPAttentionHF setup, either via register_with_transformers(disable_in_eval=True) or by direct construction. At forward time, the check at lines L255-257 of ulysses_sp.py determines whether to bypass SP:
if not module.training and self.disable_in_eval:
    return self.attn(module, query, key, value, attention_mask, *args, **kwargs)
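The effect of this check can be illustrated with a small mock. Everything here is a simplified stand-in: ToyModule, ToySPAttention, core_attn, and the all_to_all_calls counter are hypothetical and only mimic the control flow quoted above; the real class exchanges tensors between ranks rather than incrementing a counter.

```python
class ToyModule:
    """Stands in for an attention module with a PyTorch-style training flag."""

    def __init__(self):
        self.training = True


class ToySPAttention:
    """Mimics the eval-time bypass and counts how often the SP path is taken."""

    def __init__(self, attn, disable_in_eval=False):
        self.attn = attn
        self.disable_in_eval = disable_in_eval
        self.all_to_all_calls = 0

    def forward(self, module, query, key, value, attention_mask=None):
        # Same condition as the quoted check: bypass SP only in eval mode.
        if not module.training and self.disable_in_eval:
            return self.attn(module, query, key, value, attention_mask)
        # SP path: the real implementation communicates across ranks here.
        self.all_to_all_calls += 1
        return self.attn(module, query, key, value, attention_mask)


def core_attn(module, query, key, value, attention_mask):
    """Placeholder core attention; returns value unchanged."""
    return value


module = ToyModule()
sp_attn = ToySPAttention(core_attn, disable_in_eval=True)

sp_attn.forward(module, "q", "k", "v")  # training mode: SP path is taken
module.training = False
sp_attn.forward(module, "q", "k", "v")  # eval mode: SP path is bypassed
print(sp_attn.all_to_all_calls)
```

With disable_in_eval left at False in this mock, the second call would take the SP path as well.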
Code Reference
- Repository: https://github.com/deepspeedai/DeepSpeed
- File: deepspeed/runtime/engine.py (L3695-3789, save_checkpoint), deepspeed/runtime/sequence_parallel/ulysses_sp.py (L255-258, disable_in_eval logic)
save_checkpoint Signature
def save_checkpoint(
    self,
    save_dir: str,
    tag: str = None,
    client_state: dict = {},
    save_latest: bool = True,
    exclude_frozen_parameters: bool = False,
) -> bool
Import
# Accessed via the DeepSpeed engine object
engine.save_checkpoint(save_dir, tag)
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| save_dir | str | Yes | Directory path for saving checkpoint files |
| tag | str | No | Unique identifier for the checkpoint; defaults to global_step{N} |
| client_state | dict | No | Additional training state to save (e.g., epoch, custom metrics) |
| save_latest | bool | No | Whether to write a latest file pointing to this checkpoint (default: True) |
| exclude_frozen_parameters | bool | No | Whether to exclude frozen parameters from the saved state (default: False) |
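A minimal sketch of how these parameters might be combined, assuming engine is an initialized DeepSpeed engine. save_with_state is a hypothetical helper; the tag format mirrors the global_step{N} default named in the table, and a fresh dict is built for client_state on every call.

```python
def save_with_state(engine, save_dir, step, epoch):
    """Save a checkpoint tagged by step and attach extra training state."""
    tag = f"global_step{step}"
    client_state = {"epoch": epoch, "step": step}  # fresh dict per call
    engine.save_checkpoint(
        save_dir,
        tag=tag,
        client_state=client_state,
        save_latest=True,
        exclude_frozen_parameters=False,
    )
    return tag
```

On resume, engine.load_checkpoint(save_dir, tag) returns the checkpoint path together with the saved client_state dict, so values such as the epoch can be restored from it.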
Outputs
| Output | Type | Description |
|---|---|---|
| success | bool | Returns True on successful save |
| Checkpoint files | files on disk | Complete model weights (no SP-specific conversion needed), optimizer state, and metadata |
Evaluation Bypass Configuration
| Parameter | Where Set | Description |
|---|---|---|
| disable_in_eval | UlyssesSPAttentionHF.__init__ or register_with_transformers() | When True, SP all-to-all communication is skipped during model.eval() |
Usage Example
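import torch
from transformers import AutoModelForCausalLM
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# During training: save checkpoint (weights are complete, not sequence-partitioned)
engine.save_checkpoint("sp_checkpoints/", tag="step_10000")

# For single-GPU inference: the tag directory holds DeepSpeed-format files, not a
# HuggingFace model directory, so from_pretrained() cannot read it directly.
# Assuming a ZeRO checkpoint, consolidate it into one fp32 state dict first.
state_dict = get_fp32_state_dict_from_zero_checkpoint("sp_checkpoints/", tag="step_10000")
# Assumption: base_model_name is the name or path of the model that was trained
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model.load_state_dict(state_dict)
# No SP-specific conversion needed

# Or evaluate with SP bypassed during training
# (requires disable_in_eval=True during register_with_transformers)
engine.eval()  # Sets module.training = False; SP all-to-all is skipped
with torch.no_grad():
    outputs = engine(eval_batch)

# Resume training
engine.train()  # SP all-to-all resumes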
Knowledge Sources
- https://github.com/deepspeedai/DeepSpeed
- https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/
Last updated: 2026-02-09 00:00 GMT