
Implementation:Deepspeedai DeepSpeed Py AIO

From Leeroopedia


Knowledge Sources
Domains Async_IO, NVMe_Offload
Last Updated 2026-02-09 00:00 GMT

Overview

Simplified Python-callable functions for performing a single, blocking read or write on a PyTorch tensor via asynchronous I/O, with no handle management required.

Description

This module provides standalone C++ functions for performing one-off asynchronous read and write operations on PyTorch tensors. Unlike the handle-based interface, these functions (deepspeed_py_aio_read and deepspeed_py_aio_write) create temporary AIO contexts for each operation, making them suitable for simple use cases where creating and managing an I/O handle is unnecessary overhead. The functions support configurable block sizes, queue depths, submission modes, and optional validation.

These are convenience wrappers around the core AIO operations that handle all setup and teardown automatically. They measure and report timing statistics for both the AIO operation itself and the complete function call including validation. The implementation supports both sequential and overlapped event modes.
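Conceptually, each call divides the file into block_size chunks and issues them as I/O requests (up to queue_depth in flight) before tearing the temporary context down. The stdlib-only Python sketch below emulates the blocked read path for illustration; the real implementation submits the blocks as asynchronous libaio requests rather than reading them sequentially, and the function name here is hypothetical.

```python
import os
import tempfile

def blocked_read(path, block_size):
    """Read a file in fixed-size blocks, as the AIO path conceptually does.

    Illustrative emulation only: the actual C++ code keeps up to
    queue_depth of these block requests in flight asynchronously.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, block_size):
            chunk = os.pread(fd, block_size, offset)  # one "I/O request"
            buf[offset:offset + len(chunk)] = chunk
    finally:
        os.close(fd)
    return bytes(buf)

# Round-trip check against an ordinary buffered read
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of data
    path = f.name
assert blocked_read(path, 128 * 1024) == open(path, "rb").read()
os.unlink(path)
```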

Usage

Use these functions for simple checkpoint loading/saving scenarios where you need to perform a single read or write operation without the overhead of creating an I/O handle. For repeated operations or when you need asynchronous execution with manual wait control, use the handle-based interface instead.
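The trade-off is context setup cost: the one-off functions pay it on every call, while a handle amortizes it across many operations. The sketch below models that contrast with plain-Python stand-ins; FakeAioContext, one_shot_read, and AioHandle are hypothetical names for illustration, not DeepSpeed's actual Python API.

```python
import os
import tempfile

class FakeAioContext:
    """Stand-in for the per-call libaio context setup/teardown."""
    setups = 0

    def __init__(self, block_size, queue_depth):
        FakeAioContext.setups += 1
        self.block_size = block_size
        self.queue_depth = queue_depth

def one_shot_read(path, block_size=1 << 20, queue_depth=128):
    """deepspeed_py_aio_read-style: a fresh context is built per call."""
    FakeAioContext(block_size, queue_depth)  # created and discarded each call
    with open(path, "rb") as f:
        return f.read()

class AioHandle:
    """Handle-style interface: one context reused across many operations."""
    def __init__(self, block_size=1 << 20, queue_depth=128):
        self.ctx = FakeAioContext(block_size, queue_depth)

    def read(self, path):
        with open(path, "rb") as f:
            return f.read()

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"weights")
    path = f.name

for _ in range(3):
    one_shot_read(path)   # three calls -> three context setups
handle = AioHandle()      # one setup...
for _ in range(3):
    handle.read(path)     # ...reused for all three reads
assert FakeAioContext.setups == 4
os.unlink(path)
```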

Code Reference

Source Location

Signature

int deepspeed_py_aio_write(const torch::Tensor& buffer,
                           const char* filename,
                           const int block_size,
                           const int queue_depth,
                           const bool single_submit,
                           const bool overlap_events,
                           const bool validate);

int deepspeed_py_aio_read(torch::Tensor& buffer,
                          const char* filename,
                          const int block_size,
                          const int queue_depth,
                          const bool single_submit,
                          const bool overlap_events,
                          const bool validate);

Import

#include "deepspeed_py_aio.h"

I/O Contract

Inputs

Name | Type | Required | Description
buffer | torch::Tensor | Yes | Tensor holding the data to write, or the destination buffer for a read
filename | const char* | Yes | Path to the file for the I/O operation
block_size | int | Yes | Size of each I/O block in bytes
queue_depth | int | Yes | Maximum number of concurrent in-flight I/O requests
single_submit | bool | Yes | Submit iocbs individually (true) or as a single batch (false)
overlap_events | bool | Yes | Overlap request submission with completion processing for better performance
validate | bool | Yes | Verify I/O correctness by comparing against a regular read
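For a given file, block_size fixes how many requests are generated and queue_depth caps how many can be in flight at once. The back-of-envelope arithmetic below illustrates the relationship; the function names are illustrative and this is a counting model, not measured behavior.

```python
import math

def io_request_count(file_bytes, block_size):
    """Number of block-sized requests needed to cover the file."""
    return math.ceil(file_bytes / block_size)

def submission_waves(file_bytes, block_size, queue_depth):
    """Lower bound on submission rounds if at most queue_depth
    requests are in flight at a time."""
    return math.ceil(io_request_count(file_bytes, block_size) / queue_depth)

# A 1 GiB file with 1 MiB blocks needs 1024 requests;
# with queue_depth=128 that is at least 8 submission waves.
assert io_request_count(1 << 30, 1 << 20) == 1024
assert submission_waves(1 << 30, 1 << 20, 128) == 8
```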

Outputs

Name | Type | Description
return_code | int | 0 on success, -1 on error
buffer | torch::Tensor | Populated with file data (read operations only)
timing_stats | stdout | Elapsed time printed for the AIO operation and for the complete call
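The printed timing statistics are most useful when converted to throughput. The sketch below shows that derivation with an ordinary buffered write; timed_write is a hypothetical helper, and the number it reports reflects page-cache-backed file I/O, not the NVMe AIO path itself.

```python
import os
import tempfile
import time

def timed_write(data, path):
    """Write data, fsync, and report (elapsed seconds, GB/s), mirroring
    the timing statistics the C++ functions print to stdout."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # include device flush in the measurement
    elapsed = time.perf_counter() - start
    gb_per_s = len(data) / elapsed / 1e9
    return elapsed, gb_per_s

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
elapsed, rate = timed_write(os.urandom(4 << 20), path)  # 4 MiB payload
print(f"elapsed {elapsed:.4f}s, {rate:.2f} GB/s")
os.unlink(path)
```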

Usage Examples

import torch
from deepspeed.ops.aio import aio_write, aio_read

# Write tensor to NVMe
tensor = torch.randn(1024, 1024).cuda()
tensor_cpu = tensor.cpu()
aio_write(tensor_cpu,
          "/nvme/checkpoint.pt",
          block_size=1024*1024,      # 1MB blocks
          queue_depth=128,
          single_submit=False,
          overlap_events=True,
          validate=False)

# Read tensor from NVMe
buffer = torch.empty_like(tensor_cpu)
aio_read(buffer,
         "/nvme/checkpoint.pt",
         block_size=1024*1024,
         queue_depth=128,
         single_submit=False,
         overlap_events=True,
         validate=True)  # Verify correctness

# One-off write with explicit positional arguments
aio_write(tensor_cpu, "/nvme/state.pt", 128*1024, 32, False, False, False)

// C++ usage
auto tensor = torch::randn({1024, 1024});
const char* filename = "/nvme/checkpoint.pt";

// Write operation
int ret = deepspeed_py_aio_write(tensor, filename,
                                 1024*1024,  // 1MB block size
                                 128,        // queue depth
                                 false,      // batch submit
                                 true,       // overlap events
                                 false);     // no validation

// Read operation
auto buffer = torch::empty_like(tensor);
ret = deepspeed_py_aio_read(buffer, filename,
                            1024*1024, 128, false, true, true);
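When validate is true, the result of the AIO read is cross-checked against an ordinary read of the same file. A minimal Python sketch of that check (illustrative only; validate_read is a hypothetical name, not the actual C++ validation routine):

```python
import os
import tempfile

def validate_read(path, aio_result):
    """Compare AIO-read bytes against a plain buffered read,
    as the validate=True flag conceptually does."""
    with open(path, "rb") as f:
        reference = f.read()
    return aio_result == reference

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"checkpoint-bytes")
    path = f.name
assert validate_read(path, b"checkpoint-bytes")   # matching data passes
assert not validate_read(path, b"corrupted")      # mismatch is detected
os.unlink(path)
```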
