Implementation:FMInference FlexLLMGen DeepSpeed AIO Common

From Leeroopedia


Knowledge Sources
Domains: Async IO, NVMe Storage, Systems Programming
Last Updated: 2026-02-09 12:00 GMT

Overview

C++ common layer implementing Linux AIO-based asynchronous read and write operations for swapping optimizer tensors to and from NVMe storage devices.

Description

This file provides the core asynchronous I/O implementation used by DeepSpeed's NVMe offloading subsystem. It implements two operation modes. Sequential I/O (do_aio_operation_sequential) submits I/O blocks in queue-depth-sized batches and waits for each batch to complete before submitting the next. Overlapped I/O (do_aio_operation_overlap) keeps a pipeline of submitted requests and reaps completions as they arrive, so submission and completion overlap.
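The batching behaviour of sequential mode can be illustrated with a small model that performs no I/O at all. This is a minimal sketch under stated assumptions: the helper names num_blocks and plan_sequential_batches are hypothetical and do not appear in the source file, and the model only counts how many block-sized requests would go into each batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the source file): number of block-sized
// requests needed to cover a transfer of num_bytes.
inline std::int64_t num_blocks(std::int64_t num_bytes, std::int64_t block_size)
{
    assert(num_bytes >= 0 && block_size > 0);
    return (num_bytes + block_size - 1) / block_size;  // ceiling division
}

// Hypothetical model of sequential mode: each entry is the number of blocks
// submitted in one batch. A batch never exceeds queue_depth, and the next
// batch is planned only after the previous one is accounted for.
inline std::vector<std::int64_t> plan_sequential_batches(std::int64_t num_bytes,
                                                         std::int64_t block_size,
                                                         std::int64_t queue_depth)
{
    assert(queue_depth > 0);
    std::vector<std::int64_t> batches;
    std::int64_t remaining = num_blocks(num_bytes, block_size);
    while (remaining > 0) {
        const std::int64_t n = std::min(remaining, queue_depth);
        batches.push_back(n);
        remaining -= n;
    }
    return batches;
}
```

For a transfer of 40961 bytes with a 4096-byte block size and a queue depth of 4, the model yields 11 blocks split into batches of 4, 4, and 3. In overlapped mode the queue would instead be topped up as individual completions arrive, so there is no fixed batch boundary to model this way.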

The implementation uses the Linux AIO (libaio) interface directly through io_submit and io_getevents, and opens files with O_DIRECT to bypass the kernel page cache for more predictable I/O performance. It supports both single-submit mode (one iocb per io_submit call) and block-submit mode (multiple iocbs in a single call). Performance statistics are collected and reported: submit latency, completion latency, end-to-end time, and transfer rate in GB/s.
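O_DIRECT generally requires that the user buffer address, the file offset, and the transfer length be aligned to the device's logical block size. The sketch below shows one way a caller might allocate such a buffer with posix_memalign. The 4096-byte alignment and the helper names (round_up, is_aligned, alloc_direct_io_buffer) are assumptions for illustration; the source does not state how its buffers are allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Assumed alignment; the real requirement depends on the device and filesystem.
constexpr std::size_t kAssumedAlignment = 4096;

// Round a byte count up to the next multiple of `alignment` (a power of two).
inline std::size_t round_up(std::size_t num_bytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (num_bytes + alignment - 1) & ~(alignment - 1);
}

// True when the pointer's address is a multiple of `alignment`.
inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper: allocate a buffer whose address and size are both
// multiples of `alignment`. Returns nullptr on failure; release with free().
inline void* alloc_direct_io_buffer(std::size_t num_bytes,
                                    std::size_t alignment = kAssumedAlignment)
{
    void* ptr = nullptr;
    if (posix_memalign(&ptr, alignment, round_up(num_bytes, alignment)) != 0) {
        return nullptr;
    }
    return ptr;
}
```

A request for 5000 bytes is padded to 8192 bytes here, so the final block of a transfer can still be issued as a full aligned request.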

Utility functions for file operations (open_file, regular_read) and validation (validate_aio_operation) are also provided to support testing and correctness verification.
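The idea behind this kind of validation can be sketched as: re-read the file through an ordinary buffered read and compare the bytes with the buffer used for the asynchronous transfer. The function below is a hypothetical stand-in written with the standard library only; it is not the body of validate_aio_operation, whose internals are not shown on this page.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a validation helper: returns true when the file
// holds exactly num_bytes bytes and they equal the contents of `buffer`.
inline bool file_matches_buffer(const char* filename,
                                const void* buffer,
                                long long num_bytes)
{
    assert(filename != nullptr && buffer != nullptr && num_bytes >= 0);

    std::ifstream in(filename, std::ios::binary);
    if (!in) { return false; }  // file could not be opened

    // Plain buffered read of the whole file, independent of the AIO path.
    const std::vector<char> contents((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());

    if (static_cast<long long>(contents.size()) != num_bytes) { return false; }
    if (num_bytes == 0) { return true; }  // nothing to compare
    return std::memcmp(contents.data(), buffer, contents.size()) == 0;
}
```

Reading through a separate, non-AIO path is the point of the check: a bug in the asynchronous code cannot hide itself by affecting both sides of the comparison.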

Usage

This module is a vendored dependency within the FlexLLMGen benchmark infrastructure. It provides the low-level I/O primitives that enable DeepSpeed's NVMe tensor swapping, which FlexLLMGen benchmarks against as a baseline for its own offloading implementation.

Code Reference

Source Location

Signature

// Sequential AIO operation: submits I/O in batches, waits for each batch
void do_aio_operation_sequential(const bool read_op,
                                 std::unique_ptr<aio_context>& aio_ctxt,
                                 std::unique_ptr<io_xfer_ctxt>& xfer_ctxt,
                                 deepspeed_aio_config_t* config,
                                 deepspeed_aio_perf_t* perf);

// Overlapped AIO operation: pipelines submissions and completions
void do_aio_operation_overlap(const bool read_op,
                              std::unique_ptr<aio_context>& aio_ctxt,
                              std::unique_ptr<io_xfer_ctxt>& xfer_ctxt,
                              deepspeed_aio_config_t* config,
                              deepspeed_aio_perf_t* perf);

// File utilities
int open_file(const char* filename, const bool read_op);
int regular_read(const char* filename, std::vector<char>& buffer);
bool validate_aio_operation(const bool read_op, const char* filename,
                            void* aio_buffer, const long long int num_bytes);
void report_file_error(const char* filename, const std::string file_op,
                       const int error_code);

Import

#include "deepspeed_aio_common.h"

I/O Contract

Inputs

Name | Type | Required | Description
read_op | bool | Yes | True for read operations, false for write operations
aio_ctxt | unique_ptr<aio_context> | Yes | AIO context holding the Linux io_context_t, iocb array, io_events, block size, and queue depth
xfer_ctxt | unique_ptr<io_xfer_ctxt> | Yes | Transfer context specifying the memory buffer, file descriptor, and number of bytes
config | deepspeed_aio_config_t* | Yes | Configuration with single_submit flag and other AIO parameters
perf | deepspeed_aio_perf_t* | No | Optional performance output structure for latency and throughput metrics

Outputs

Name | Type | Description
perf->_submit | deepspeed_aio_latency_t | Min/max/avg submit latency in microseconds
perf->_complete | deepspeed_aio_latency_t | Min/max/avg completion latency in microseconds
perf->_e2e_usec | double | End-to-end operation time in microseconds
perf->_e2e_rate_GB | double | End-to-end transfer rate in GB/s
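The metrics in the table above can be derived from raw timings roughly as follows. This is a sketch only: latency_summary, summarize_latencies, and transfer_rate_gb_per_sec are hypothetical names, and the rate assumes decimal gigabytes (1e9 bytes), which this page does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical mirror of a min/max/avg latency record, in microseconds.
struct latency_summary {
    double min_usec;
    double max_usec;
    double avg_usec;
};

// Reduce per-call latency samples (microseconds) to min, max, and mean.
inline latency_summary summarize_latencies(const std::vector<double>& samples_usec)
{
    assert(!samples_usec.empty());
    const double lo = *std::min_element(samples_usec.begin(), samples_usec.end());
    const double hi = *std::max_element(samples_usec.begin(), samples_usec.end());
    const double sum = std::accumulate(samples_usec.begin(), samples_usec.end(), 0.0);
    return {lo, hi, sum / static_cast<double>(samples_usec.size())};
}

// End-to-end rate in GB/s, ASSUMING 1 GB = 1e9 bytes.
// bytes / usec is MB/s in decimal units; dividing by 1e3 gives GB/s.
inline double transfer_rate_gb_per_sec(long long num_bytes, double e2e_usec)
{
    assert(num_bytes >= 0 && e2e_usec > 0.0);
    return static_cast<double>(num_bytes) / e2e_usec / 1e3;
}
```

Under these assumptions, moving 2,000,000,000 bytes in 1,000,000 microseconds corresponds to 2 GB/s.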

Usage Examples

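// Example: sequential read from NVMe.
// Assumes the caller has already defined block_size, queue_depth, buffer and
// num_bytes, and obtained fd from open_file(filename, /*read_op=*/true).
// Constructor arguments are shown as on this page; verify their order against
// the vendored headers before use.
#include <cstdio>
#include <memory>
#include "deepspeed_aio_common.h"

auto aio_ctxt = std::make_unique<aio_context>(block_size, queue_depth);
auto xfer_ctxt = std::make_unique<io_xfer_ctxt>(fd, buffer, num_bytes);
deepspeed_aio_config_t config;
config._single_submit = false;  // block-submit mode: multiple iocbs per io_submit call
deepspeed_aio_perf_t perf;

do_aio_operation_sequential(/*read_op=*/true, aio_ctxt, xfer_ctxt, &config, &perf);

// _e2e_usec is the end-to-end time of the whole operation, not a per-request latency.
std::printf("Transfer rate: %.2f GB/s, end-to-end time: %.2f usec\n",
            perf._e2e_rate_GB, perf._e2e_usec);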
