
Implementation:Deepspeedai DeepSpeed Py AIO

From Leeroopedia


Knowledge Sources
Domains Async_IO, NVMe_Offload
Last Updated 2026-02-09 00:00 GMT

Overview

Simplified Python-callable functions for performing a single, blocking read or write on a PyTorch tensor via asynchronous I/O, with no handle management required.

Description

This module provides standalone C++ functions for performing one-off asynchronous read and write operations on PyTorch tensors. Unlike the handle-based interface, these functions (deepspeed_py_aio_read and deepspeed_py_aio_write) create temporary AIO contexts for each operation, making them suitable for simple use cases where creating and managing an I/O handle is unnecessary overhead. The functions support configurable block sizes, queue depths, submission modes, and optional validation.

These are convenience wrappers around the core AIO operations that handle all setup and teardown automatically. They measure and report timing statistics for both the AIO operation itself and the complete function call including validation. The implementation supports both sequential and overlapped event modes.
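Conceptually, each call divides the file into block_size chunks and issues them as I/O requests (up to queue_depth in flight) before tearing the temporary context down. The stdlib-only Python sketch below emulates the blocked read path for illustration; the real implementation submits the blocks as asynchronous libaio requests rather than reading them sequentially, and the function name here is hypothetical.

```python
import os
import tempfile

def blocked_read(path, block_size):
    """Read a file in fixed-size blocks, as the AIO path conceptually does.

    Illustrative emulation only: the actual C++ code keeps up to
    queue_depth of these block requests in flight asynchronously.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, block_size):
            chunk = os.pread(fd, block_size, offset)  # one "I/O request"
            buf[offset:offset + len(chunk)] = chunk
    finally:
        os.close(fd)
    return bytes(buf)

# Round-trip check against an ordinary buffered read
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of data
    path = f.name
assert blocked_read(path, 128 * 1024) == open(path, "rb").read()
os.unlink(path)
```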

Usage

Use these functions for simple checkpoint loading/saving scenarios where you need to perform a single read or write operation without the overhead of creating an I/O handle. For repeated operations or when you need asynchronous execution with manual wait control, use the handle-based interface instead.
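The trade-off is context setup cost: the one-off functions pay it on every call, while a handle amortizes it across many operations. The sketch below models that contrast with plain-Python stand-ins; FakeAioContext, one_shot_read, and AioHandle are hypothetical names for illustration, not DeepSpeed's actual Python API.

```python
import os
import tempfile

class FakeAioContext:
    """Stand-in for the per-call libaio context setup/teardown."""
    setups = 0

    def __init__(self, block_size, queue_depth):
        FakeAioContext.setups += 1
        self.block_size = block_size
        self.queue_depth = queue_depth

def one_shot_read(path, block_size=1 << 20, queue_depth=128):
    """deepspeed_py_aio_read-style: a fresh context is built per call."""
    FakeAioContext(block_size, queue_depth)  # created and discarded each call
    with open(path, "rb") as f:
        return f.read()

class AioHandle:
    """Handle-style interface: one context reused across many operations."""
    def __init__(self, block_size=1 << 20, queue_depth=128):
        self.ctx = FakeAioContext(block_size, queue_depth)

    def read(self, path):
        with open(path, "rb") as f:
            return f.read()

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"weights")
    path = f.name

for _ in range(3):
    one_shot_read(path)   # three calls -> three context setups
handle = AioHandle()      # one setup...
for _ in range(3):
    handle.read(path)     # ...reused for all three reads
assert FakeAioContext.setups == 4
os.unlink(path)
```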

Code Reference

Source Location

Signature

int deepspeed_py_aio_write(const torch::Tensor& buffer,
                           const char* filename,
                           const int block_size,
                           const int queue_depth,
                           const bool single_submit,
                           const bool overlap_events,
                           const bool validate);

int deepspeed_py_aio_read(torch::Tensor& buffer,
                          const char* filename,
                          const int block_size,
                          const int queue_depth,
                          const bool single_submit,
                          const bool overlap_events,
                          const bool validate);

Import

#include "deepspeed_py_aio.h"

I/O Contract

Inputs

Name | Type | Required | Description
buffer | torch::Tensor | Yes | Tensor holding the data to write, or the destination buffer for a read
filename | const char* | Yes | Path to the file for the I/O operation
block_size | int | Yes | Size of each I/O block in bytes
queue_depth | int | Yes | Maximum number of concurrent in-flight I/O requests
single_submit | bool | Yes | Submit iocbs individually (true) or as a single batch (false)
overlap_events | bool | Yes | Overlap request submission with completion processing for better performance
validate | bool | Yes | Verify I/O correctness by comparing against a regular read
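For a given file, block_size fixes how many requests are generated and queue_depth caps how many can be in flight at once. The back-of-envelope arithmetic below illustrates the relationship; the function names are illustrative and this is a counting model, not measured behavior.

```python
import math

def io_request_count(file_bytes, block_size):
    """Number of block-sized requests needed to cover the file."""
    return math.ceil(file_bytes / block_size)

def submission_waves(file_bytes, block_size, queue_depth):
    """Lower bound on submission rounds if at most queue_depth
    requests are in flight at a time."""
    return math.ceil(io_request_count(file_bytes, block_size) / queue_depth)

# A 1 GiB file with 1 MiB blocks needs 1024 requests;
# with queue_depth=128 that is at least 8 submission waves.
assert io_request_count(1 << 30, 1 << 20) == 1024
assert submission_waves(1 << 30, 1 << 20, 128) == 8
```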

Outputs

Name | Type | Description
return_code | int | 0 on success, -1 on error
buffer | torch::Tensor | Populated with file data (read operations only)
timing_stats | stdout | Elapsed time printed for the AIO operation and for the complete call
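The printed timing statistics are most useful when converted to throughput. The sketch below shows that derivation with an ordinary buffered write; timed_write is a hypothetical helper, and the number it reports reflects page-cache-backed file I/O, not the NVMe AIO path itself.

```python
import os
import tempfile
import time

def timed_write(data, path):
    """Write data, fsync, and report (elapsed seconds, GB/s), mirroring
    the timing statistics the C++ functions print to stdout."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # include device flush in the measurement
    elapsed = time.perf_counter() - start
    gb_per_s = len(data) / elapsed / 1e9
    return elapsed, gb_per_s

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
elapsed, rate = timed_write(os.urandom(4 << 20), path)  # 4 MiB payload
print(f"elapsed {elapsed:.4f}s, {rate:.2f} GB/s")
os.unlink(path)
```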

Usage Examples

import torch
from deepspeed.ops.aio import aio_write, aio_read

# Write tensor to NVMe
tensor = torch.randn(1024, 1024).cuda()
tensor_cpu = tensor.cpu()
aio_write(tensor_cpu,
          "/nvme/checkpoint.pt",
          block_size=1024*1024,      # 1MB blocks
          queue_depth=128,
          single_submit=False,
          overlap_events=True,
          validate=False)

# Read tensor from NVMe
buffer = torch.empty_like(tensor_cpu)
aio_read(buffer,
         "/nvme/checkpoint.pt",
         block_size=1024*1024,
         queue_depth=128,
         single_submit=False,
         overlap_events=True,
         validate=True)  # Verify correctness

# One-off write with explicit positional arguments
aio_write(tensor_cpu, "/nvme/state.pt", 128*1024, 32, False, False, False)

// C++ usage
auto tensor = torch::randn({1024, 1024});
const char* filename = "/nvme/checkpoint.pt";

// Write operation
int ret = deepspeed_py_aio_write(tensor, filename,
                                 1024*1024,  // 1MB block size
                                 128,        // queue depth
                                 false,      // batch submit
                                 true,       // overlap events
                                 false);     // no validation

// Read operation
auto buffer = torch::empty_like(tensor);
ret = deepspeed_py_aio_read(buffer, filename,
                            1024*1024, 128, false, true, true);
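When validate is true, the result of the AIO read is cross-checked against an ordinary read of the same file. A minimal Python sketch of that check (illustrative only; validate_read is a hypothetical name, not the actual C++ validation routine):

```python
import os
import tempfile

def validate_read(path, aio_result):
    """Compare AIO-read bytes against a plain buffered read,
    as the validate=True flag conceptually does."""
    with open(path, "rb") as f:
        reference = f.read()
    return aio_result == reference

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"checkpoint-bytes")
    path = f.name
assert validate_read(path, b"checkpoint-bytes")   # matching data passes
assert not validate_read(path, b"corrupted")      # mismatch is detected
os.unlink(path)
```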
