Implementation:Deepspeedai DeepSpeed Py AIO
| Knowledge Sources | |
|---|---|
| Domains | Async_IO, NVMe_Offload |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Simplified Python-callable functions for one-off, synchronously completing asynchronous I/O operations on PyTorch tensors, without handle management.
Description
This module provides standalone C++ functions for performing one-off asynchronous read and write operations on PyTorch tensors. Unlike the handle-based interface, these functions (deepspeed_py_aio_read and deepspeed_py_aio_write) create temporary AIO contexts for each operation, making them suitable for simple use cases where creating and managing an I/O handle is unnecessary overhead. The functions support configurable block sizes, queue depths, submission modes, and optional validation.
These are convenience wrappers around the core AIO operations that handle all setup and teardown automatically. They measure and report timing statistics for both the AIO operation itself and the complete function call including validation. The implementation supports both sequential and overlapped event modes.
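The per-call lifecycle described above (set up an I/O context, perform the blocking transfer, tear down, report timing) can be illustrated with a stdlib-only Python sketch. `one_off_write` is a hypothetical stand-in, not the DeepSpeed API, and uses ordinary `os.pwrite` rather than libaio; it only mirrors the shape of the wrapper: setup and teardown inside the call, block-granular I/O, a timing report, and a 0/-1 return convention.

```python
import os
import time

def one_off_write(data: bytes, filename: str, block_size: int) -> int:
    """Illustrative stand-in for deepspeed_py_aio_write: per-call setup,
    blocking block-by-block I/O, teardown, and a timing report.
    (The real implementation builds a temporary libaio context instead.)"""
    start = time.perf_counter()
    fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)  # setup
    try:
        for offset in range(0, len(data), block_size):
            os.pwrite(fd, data[offset:offset + block_size], offset)
    finally:
        os.close(fd)  # teardown happens inside the same call
    elapsed = time.perf_counter() - start
    print(f"write time (usec): {elapsed * 1e6:.0f}")  # illustrative stats
    return 0  # 0 on success, mirroring the C++ return convention

ret = one_off_write(b"x" * 4096, "/tmp/aio_demo.bin", 1024)
```

Because the context lives and dies inside the call, repeated transfers pay the setup cost each time, which is exactly why the handle-based interface exists for hot paths.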
Usage
Use these functions for simple checkpoint loading/saving scenarios where you need to perform a single read or write operation without the overhead of creating an I/O handle. For repeated operations or when you need asynchronous execution with manual wait control, use the handle-based interface instead.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/aio/py_lib/deepspeed_py_aio.cpp
Signature
int deepspeed_py_aio_write(const torch::Tensor& buffer,
                           const char* filename,
                           const int block_size,
                           const int queue_depth,
                           const bool single_submit,
                           const bool overlap_events,
                           const bool validate);
int deepspeed_py_aio_read(torch::Tensor& buffer,
                          const char* filename,
                          const int block_size,
                          const int queue_depth,
                          const bool single_submit,
                          const bool overlap_events,
                          const bool validate);
Import
#include "deepspeed_py_aio.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| buffer | torch::Tensor | Yes | PyTorch tensor containing data to write or buffer for reading |
| filename | const char* | Yes | Path to file for I/O operation |
| block_size | int | Yes | Size of each I/O block in bytes |
| queue_depth | int | Yes | Maximum number of concurrent I/O operations |
| single_submit | bool | Yes | Submit iocbs individually (true) or as batch (false) |
| overlap_events | bool | Yes | Overlap submission and completion for better performance |
| validate | bool | Yes | Validate I/O correctness by comparing with regular read |
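To build intuition for how block_size and queue_depth interact, the following hypothetical helper (not part of the module) computes how many block-sized requests a transfer decomposes into, and how many submission rounds are needed when at most queue_depth requests are in flight at once.

```python
import math

def io_plan(num_bytes: int, block_size: int, queue_depth: int):
    """Hypothetical helper: number of block-sized I/O requests for a
    transfer, and the number of submission rounds at the given depth."""
    num_blocks = math.ceil(num_bytes / block_size)
    rounds = math.ceil(num_blocks / queue_depth)
    return num_blocks, rounds

# A 4 MiB tensor with 1 MiB blocks and queue depth 128:
print(io_plan(4 * 1024 * 1024, 1024 * 1024, 128))  # (4, 1)
```

Small transfers may not even fill the queue once; large block sizes reduce request counts but can leave queue slots idle.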
Outputs
| Name | Type | Description |
|---|---|---|
| return_code | int | 0 on success, -1 on error |
| buffer | torch::Tensor | Populated with file data (for read operations) |
| timing_stats | stdout | Prints elapsed time for AIO and total function call |
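Conceptually, `validate=True` amounts to re-reading the file through ordinary buffered I/O and comparing it with the source buffer. The sketch below uses hypothetical names (`validate_write`) and plain Python file I/O to stand in for the C++ validation path.

```python
def validate_write(filename: str, expected: bytes) -> bool:
    """Sketch of the validate=True semantics: re-read the file via
    regular (non-AIO) I/O and compare against the original buffer."""
    with open(filename, "rb") as f:
        return f.read() == expected

payload = b"checkpoint-bytes" * 256
path = "/tmp/aio_validate_demo.bin"
with open(path, "wb") as f:  # stands in for the AIO write under test
    f.write(payload)
print(validate_write(path, payload))  # True
```

The extra read makes validation relatively expensive, which is why the timing statistics distinguish the AIO operation itself from the complete call including validation.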
Usage Examples
import torch
from deepspeed.ops.aio import aio_write, aio_read

# Write tensor to NVMe
tensor = torch.randn(1024, 1024).cuda()
tensor_cpu = tensor.cpu()
aio_write(tensor_cpu,
          "/nvme/checkpoint.pt",
          block_size=1024 * 1024,  # 1 MiB blocks
          queue_depth=128,
          single_submit=False,
          overlap_events=True,
          validate=False)

# Read tensor from NVMe
buffer = torch.empty_like(tensor_cpu)
aio_read(buffer,
         "/nvme/checkpoint.pt",
         block_size=1024 * 1024,
         queue_depth=128,
         single_submit=False,
         overlap_events=True,
         validate=True)  # verify correctness against a regular read

# Equivalent write with all arguments passed positionally
aio_write(tensor_cpu, "/nvme/state.pt", 128 * 1024, 32, False, False, False)
// C++ usage
auto tensor = torch::randn({1024, 1024});
const char* filename = "/nvme/checkpoint.pt";

// Write operation
int ret = deepspeed_py_aio_write(tensor, filename,
                                 1024 * 1024,  // 1 MiB block size
                                 128,          // queue depth
                                 false,        // batch submit
                                 true,         // overlap events
                                 false);       // no validation

// Read operation
auto buffer = torch::empty_like(tensor);
ret = deepspeed_py_aio_read(buffer, filename,
                            1024 * 1024, 128, false, true, true);