Implementation:Microsoft DeepSpeedExamples Fast Torch Serialization
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Checkpointing |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A patched copy of the PyTorch 2.6.0 serialization module that adds FastPersist optimizations to speed up model checkpoint writes through DeepNVMe.
Description
serialization_fast_v2.6.0.py is a modified copy of PyTorch's torch.serialization module that enables DeepSpeed FastPersist integration for high-throughput NVMe and GDS (GPU Direct Storage) writes during model checkpointing. The module provides the standard save() and load() functions used by PyTorch for tensor serialization, along with supporting utilities for endianness control, CRC32 options, memory-mapped loading, and safe globals management.
The key difference from the original PyTorch serialization is in the storage-writing path of save(). When the file object exposes a save_torch_storage_object_list method (which indicates a FastFileWriter handle), the module collects all storage objects and writes them in a single batched call instead of writing each storage individually. Batching lets the DeepNVMe backend issue direct NVMe writes at high throughput; this page cites a speedup of 25x or more over standard filesystem writes. Otherwise, each storage is written individually, as in stock PyTorch.
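The dispatch described above can be sketched without PyTorch. This is a minimal illustration, not the module's actual code: `write_storages` and `BatchingWriter` are hypothetical names, storages are modeled as plain `bytes`, and the single-argument form of `save_torch_storage_object_list` is an assumption, since the real method's parameters are not shown on this page.

```python
import io
from typing import Any, List


def write_storages(f: Any, storages: List[bytes]) -> str:
    """Write raw storage buffers to f, batching when f supports it.

    Returns the name of the path taken, for illustration only.
    """
    if hasattr(f, "save_torch_storage_object_list"):
        # Fast path: hand the whole list to the writer in one call.
        # The argument list used here is an assumption.
        f.save_torch_storage_object_list(storages)
        return "batched"
    # Fallback: one write per storage, as in stock serialization.
    for buf in storages:
        f.write(buf)
    return "per-storage"


class BatchingWriter:
    """Hypothetical stand-in for a FastFileWriter-style handle."""

    def __init__(self) -> None:
        self.calls = 0
        self.data = bytearray()

    def save_torch_storage_object_list(self, storages: List[bytes]) -> None:
        # A real writer would issue one optimized NVMe write here.
        self.calls += 1
        for buf in storages:
            self.data += buf


fast_handle = BatchingWriter()
fast_path = write_storages(fast_handle, [b"ab", b"cd"])

plain_handle = io.BytesIO()
plain_path = write_storages(plain_handle, [b"ab", b"cd"])
```

Both handles end up with the same bytes; only the number of write calls differs, which is where a batching backend can gain throughput.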
The module also provides thread-local state management via _SerializationLocal for map_location propagation, skip_data support for metadata-only saves, and fake tensor materialization. It supports both the legacy pickle-based format and the modern zipfile-based serialization format introduced in PyTorch 1.6.
Usage
This module is used as a drop-in replacement for torch.serialization when FastPersist checkpointing is enabled. It is transparently swapped in by the DeepNVMe model checkpoint infrastructure to accelerate checkpoint saves without requiring changes to user code that calls torch.save().
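One way such a transparent swap can work is to replace a function attribute temporarily and restore it afterwards. The sketch below is an assumption about the general technique, not a description of how the DeepNVMe infrastructure actually performs the swap; `swapped_attr`, `fake_torch`, and `fast_save` are hypothetical names, and a `SimpleNamespace` stands in for the torch module so the example runs without PyTorch.

```python
import contextlib
import types
from typing import Any, Callable, Iterator


@contextlib.contextmanager
def swapped_attr(owner: Any, name: str, replacement: Callable[..., Any]) -> Iterator[None]:
    """Temporarily replace owner.<name>, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        # Restore even if the body raised.
        setattr(owner, name, original)


# Stand-in for the torch module (hypothetical, for illustration only).
fake_torch = types.SimpleNamespace(save=lambda obj, f: "stock")


def fast_save(obj: Any, f: Any) -> str:
    """Stand-in for the patched save()."""
    return "fast"


with swapped_attr(fake_torch, "save", fast_save):
    inside = fake_torch.save({}, "checkpoint.pt")   # user code is unchanged
outside = fake_torch.save({}, "checkpoint.pt")
```

The calling code keeps using the same `save` name in both cases, which is what makes the replacement transparent to user code.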
Code Reference
Source Location
- Repository: Microsoft_DeepSpeedExamples
- File: deepnvme/model_checkpoint/torch/serialization_fast_v2.6.0.py
- Lines: 1-1979
Signature
def save(
obj: object,
f: FILE_LIKE,
pickle_module: Any = pickle,
pickle_protocol: int = DEFAULT_PROTOCOL,
_use_new_zipfile_serialization: bool = True,
_disable_byteorder_record: bool = False,
) -> None:
...
def load(
f: FILE_LIKE,
map_location: MAP_LOCATION = None,
pickle_module: Any = None,
*,
weights_only: Optional[bool] = None,
mmap: Optional[bool] = None,
**pickle_load_args: Any,
) -> Any:
...
Import
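# Assumes the module is exposed under an importable underscore name. The file on
# disk is serialization_fast_v2.6.0.py, and a filename containing dots cannot be
# loaded with a plain import statement.
from deepnvme.model_checkpoint.torch.serialization_fast_v2_6_0 import save, load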
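If the file is only available under its dotted name, it can be loaded by path with `importlib`. The sketch below shows that technique on a throwaway file; `load_module_from_path` and the `demo_v2.6.0.py` file are hypothetical, and nothing here is taken from the repository itself.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: str, module_name: str) -> ModuleType:
    """Load a Python source file by path and register it under module_name."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Demonstrate with a temporary file whose name contains dots.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_v2.6.0.py"
    demo_path.write_text("VALUE = 42\n")
    demo = load_module_from_path(str(demo_path), "demo_v2_6_0")
```

Applied to this module, the call would take the path from the Source Location section and an underscore name of your choosing, for example `load_module_from_path("deepnvme/model_checkpoint/torch/serialization_fast_v2.6.0.py", "serialization_fast_v2_6_0")`, assuming the working directory is the repository root.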
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| obj | object | Yes | The Python object to serialize (typically a model state_dict or tensor) |
| f | FILE_LIKE | Yes | File path (str/PathLike) or file-like object with write capability; may be a FastFileWriter for NVMe acceleration |
| pickle_module | Any | No | Module used for pickling metadata (default: pickle) |
| pickle_protocol | int | No | Protocol version for pickle (default: 2) |
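| _use_new_zipfile_serialization | bool | No | Use zipfile-based format (default: True) |
| _disable_byteorder_record | bool | No | Skip writing the byte-order record into the archive (default: False) |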
Outputs
| Name | Type | Description |
|---|---|---|
| (save) | None | Writes serialized object to the specified file |
| (load) | Any | The deserialized Python object (typically dict, Tensor, or Module state) |
Usage Examples
Saving a Model Checkpoint with FastPersist
import torch
from deepnvme.model_checkpoint.torch import serialization_fast_v2_6_0 as fast_serial
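# model is assumed to be an existing torch.nn.Module
model_state = model.state_dict()
# A plain path is accepted just as in torch.save(); the batched FastPersist path
# applies only when f is a handle exposing save_torch_storage_object_list
# (for example a FastFileWriter).
fast_serial.save(model_state, "/mnt/nvme/checkpoint.pt")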
# Loading remains standard
state_dict = fast_serial.load("/mnt/nvme/checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)