Implementation:FMInference FlexLLMGen DeepSpeed AIO Common
| Knowledge Sources | |
|---|---|
| Domains | Async IO, NVMe Storage, Systems Programming |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
C++ common layer implementing Linux AIO-based asynchronous read and write operations for swapping optimizer tensors to and from NVMe storage devices.
Description
This file provides the core asynchronous I/O implementation used by DeepSpeed's NVMe offloading subsystem. It implements two operation modes. Sequential I/O (do_aio_operation_sequential) submits I/O blocks in batches of at most queue-depth requests and waits for each batch to complete before submitting the next. Overlapped I/O (do_aio_operation_overlap) keeps a pipeline of submitted requests and reaps completions as they arrive, so new submissions overlap with outstanding completions.
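The batching arithmetic behind the sequential mode can be sketched without any AIO calls. The snippet below is an illustration only: `BatchPlan` and `plan_sequential_batches` are hypothetical names that do not appear in the DeepSpeed source, and the sketch assumes the transfer is split into fixed-size blocks, with a shorter final block when the byte count is not a multiple of the block size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One batch of I/O blocks submitted together (hypothetical type, for illustration).
struct BatchPlan {
    std::int64_t first_block;  // index of the first block in this batch
    std::int64_t num_blocks;   // number of requests in this batch
    std::int64_t file_offset;  // byte offset at which the batch starts
    std::int64_t num_bytes;    // bytes covered by the batch (the last batch may be short)
};

// Split a transfer into batches of at most queue_depth blocks each.
std::vector<BatchPlan> plan_sequential_batches(std::int64_t total_bytes,
                                               std::int64_t block_size,
                                               std::int64_t queue_depth)
{
    assert(total_bytes >= 0 && block_size > 0 && queue_depth > 0);
    // Ceiling division: a partial trailing block still needs its own request.
    const std::int64_t num_blocks = (total_bytes + block_size - 1) / block_size;
    const std::int64_t max_batch_bytes = block_size * queue_depth;

    std::vector<BatchPlan> plans;
    for (std::int64_t first = 0; first < num_blocks; first += queue_depth) {
        const std::int64_t offset = first * block_size;
        plans.push_back({first,
                         std::min(queue_depth, num_blocks - first),
                         offset,
                         std::min(max_batch_bytes, total_bytes - offset)});
    }
    return plans;
}
```

For example, `plan_sequential_batches(10000, 4096, 2)` yields two batches: blocks 0 and 1 covering 8192 bytes, then block 2 covering the remaining 1808 bytes. In sequential mode each such batch would be submitted and fully completed before the next one starts. Overlapped mode divides the transfer into the same blocks but, as described above, refills the queue as completions arrive instead of draining it between batches.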
The implementation uses the Linux AIO (libaio) interface directly through io_submit and io_getevents, and opens files with O_DIRECT to bypass the kernel page cache for predictable I/O performance. It supports both single-submit mode (one iocb per io_submit call) and block-submit mode (multiple iocbs in a single call). When a perf structure is supplied, the functions record submit latency, completion latency, end-to-end time, and transfer rate in GB/s.
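O_DIRECT transfers generally require the user buffer, file offset, and transfer length to be aligned to the device's logical block size. The sketch below shows one way a caller might prepare such a buffer. The helper names are hypothetical and not part of this module, and the 4096-byte alignment in the usage note is an assumption (the real requirement depends on the device and filesystem).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round n up to the next multiple of alignment (alignment must be a power of two).
std::size_t round_up(std::size_t n, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

// Allocate a buffer whose address and size are both multiples of alignment.
// Returns nullptr on failure; release the buffer with free().
void* alloc_direct_io_buffer(std::size_t num_bytes, std::size_t alignment)
{
    void* ptr = nullptr;
    if (posix_memalign(&ptr, alignment, round_up(num_bytes, alignment)) != 0) {
        return nullptr;
    }
    return ptr;
}

// Check that a pointer satisfies the given alignment.
bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

A caller could obtain a buffer with `alloc_direct_io_buffer(num_bytes, 4096)` and pass it as the memory buffer of the transfer context. An unaligned buffer typically causes O_DIRECT reads and writes to fail with EINVAL.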
Utility functions for file operations (open_file, regular_read) and validation (validate_aio_operation) are also provided to support testing and correctness verification.
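The idea behind validation can be sketched as: read the file again through the ordinary buffered path and compare the result byte for byte with the buffer used for AIO. The code below is a simplified stand-in under that assumption, not the module's implementation. `read_whole_file` and `buffer_matches_file` are hypothetical names, and the sketch uses std::ifstream where the module has its own regular_read helper.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file through the ordinary buffered path (no AIO, no O_DIRECT).
bool read_whole_file(const std::string& path, std::vector<char>& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) { return false; }
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// True when the file holds exactly num_bytes bytes and they equal aio_buffer.
bool buffer_matches_file(const std::string& path,
                         const void* aio_buffer,
                         std::size_t num_bytes)
{
    assert(aio_buffer != nullptr || num_bytes == 0);
    std::vector<char> expected;
    if (!read_whole_file(path, expected)) { return false; }
    if (expected.size() != num_bytes) { return false; }
    return num_bytes == 0 || std::memcmp(expected.data(), aio_buffer, num_bytes) == 0;
}
```

The same check works in both directions: after an AIO read, the buffer should equal what a regular read returns, and after an AIO write, a regular read of the file should return what was in the buffer.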
Usage
This module is a vendored dependency within the FlexLLMGen benchmark infrastructure. It provides the low-level I/O primitives behind DeepSpeed's NVMe tensor swapping, which FlexLLMGen uses as a baseline when benchmarking its own offloading implementation.
Code Reference
Source Location
- Repository: FMInference_FlexLLMGen
- File: benchmark/third_party/DeepSpeed/csrc/aio/common/deepspeed_aio_common.cpp
- Lines: 1-333
Signature
// Sequential AIO operation: submits I/O in batches, waits for each batch
void do_aio_operation_sequential(const bool read_op,
                                 std::unique_ptr<aio_context>& aio_ctxt,
                                 std::unique_ptr<io_xfer_ctxt>& xfer_ctxt,
                                 deepspeed_aio_config_t* config,
                                 deepspeed_aio_perf_t* perf);

// Overlapped AIO operation: pipelines submissions and completions
void do_aio_operation_overlap(const bool read_op,
                              std::unique_ptr<aio_context>& aio_ctxt,
                              std::unique_ptr<io_xfer_ctxt>& xfer_ctxt,
                              deepspeed_aio_config_t* config,
                              deepspeed_aio_perf_t* perf);

// File utilities
int open_file(const char* filename, const bool read_op);
int regular_read(const char* filename, std::vector<char>& buffer);
bool validate_aio_operation(const bool read_op,
                            const char* filename,
                            void* aio_buffer,
                            const long long int num_bytes);
void report_file_error(const char* filename,
                       const std::string file_op,
                       const int error_code);
Import
#include "deepspeed_aio_common.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| read_op | bool | Yes | True for read operations, false for write operations |
| aio_ctxt | unique_ptr<aio_context> | Yes | AIO context holding the Linux io_context_t, iocb array, io_events, block size, and queue depth |
| xfer_ctxt | unique_ptr<io_xfer_ctxt> | Yes | Transfer context specifying the memory buffer, file descriptor, and number of bytes |
| config | deepspeed_aio_config_t* | Yes | Configuration holding the _single_submit flag and other AIO parameters |
| perf | deepspeed_aio_perf_t* | No | Optional output structure for latency and throughput metrics; pass nullptr to skip collecting them |
Outputs
| Name | Type | Description |
|---|---|---|
| perf->_submit | deepspeed_aio_latency_t | Min/max/avg submit latency in microseconds |
| perf->_complete | deepspeed_aio_latency_t | Min/max/avg completion latency in microseconds |
| perf->_e2e_usec | double | End-to-end operation time in microseconds |
| perf->_e2e_rate_GB | double | End-to-end transfer rate in GB/s |
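How per-call timings could be reduced to the min/max/avg figures and the GB/s rate listed above is sketched below. This is an assumption-laden illustration, not the module's code: `LatencySummary`, `summarize_latencies`, and `transfer_rate_GB` are hypothetical names, timings are taken to be in seconds, and GB is taken to mean 10^9 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Min/max/avg of a set of timings, expressed in microseconds (hypothetical type).
struct LatencySummary {
    double min_usec;
    double max_usec;
    double avg_usec;
};

// Summarize per-call timings given in seconds.
LatencySummary summarize_latencies(const std::vector<double>& samples_sec)
{
    assert(!samples_sec.empty());
    const auto extremes = std::minmax_element(samples_sec.begin(), samples_sec.end());
    const double total = std::accumulate(samples_sec.begin(), samples_sec.end(), 0.0);
    return {*extremes.first * 1e6,
            *extremes.second * 1e6,
            total / samples_sec.size() * 1e6};
}

// End-to-end rate in GB/s, assuming decimal gigabytes (1e9 bytes).
double transfer_rate_GB(long long num_bytes, double elapsed_sec)
{
    assert(elapsed_sec > 0.0);
    return num_bytes / elapsed_sec / 1e9;
}
```

With timings of 0.25 s, 0.5 s, and 0.75 s, the summary is 250000, 750000, and 500000 microseconds for min, max, and avg. Transferring 2,000,000,000 bytes in 2 seconds gives a rate of 1.0 GB/s.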
Usage Examples
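// Example: sequential read from NVMe.
// Assumes filename, block_size, queue_depth, num_bytes, and an O_DIRECT-aligned buffer are defined by the caller.
const int fd = open_file(filename, /*read_op=*/true);
auto aio_ctxt = std::make_unique<aio_context>(block_size, queue_depth);
// Argument order (fd, file offset, byte count, buffer) follows upstream DeepSpeed's io_xfer_ctxt; verify against the vendored header.
auto xfer_ctxt = std::make_unique<io_xfer_ctxt>(fd, /*file_offset=*/0, num_bytes, buffer);
// In upstream DeepSpeed the config fields are const and set through the constructor; verify against the vendored header.
deepspeed_aio_config_t config(block_size, queue_depth, /*single_submit=*/false, /*overlap_events=*/false, /*lock_memory=*/false);
deepspeed_aio_perf_t perf;
do_aio_operation_sequential(/*read_op=*/true, aio_ctxt, xfer_ctxt, &config, &perf);
printf("Transfer rate: %.2f GB/s, end-to-end time: %.2f usec\n",
       perf._e2e_rate_GB, perf._e2e_usec);
close(fd);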