Implementation: deepspeedai/DeepSpeed AIO Bench Perf Sweep
| Knowledge Sources | |
|---|---|
| Domains | Async_IO, NVMe_Offload |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Performance benchmarking tool that sweeps through multiple configuration parameters to characterize DeepSpeed AIO library performance.
Description
This Python script provides a comprehensive performance evaluation framework for the DeepSpeed asynchronous I/O library. It performs parameter sweeps across multiple dimensions including block size, queue depth, sequential vs. overlapped request modes, single vs. batch submission, and intra-operation parallelism levels. The tool generates performance logs for both read and write operations, optionally flushes the page cache between runs for accurate read measurements, and supports both CPU and GPU tensor transfers as well as GPUDirect Storage (GDS) operations.
The sweep creates a Cartesian product of all parameter combinations, runs each configuration through the test harness, and logs results to organized directories. It's designed to help users identify optimal configuration parameters for their specific hardware and workload characteristics.
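The Cartesian-product expansion described above can be sketched as follows. This is an illustrative re-implementation, not the script's actual code, and the flag names in the example dictionary are placeholders:

```python
import itertools


def get_sweep_cmd_lines(sweep_config_dict):
    # Expand {flag: [values...]} into the Cartesian product of flag settings.
    # Boolean values toggle the presence of the flag itself; other values
    # become "--flag value" pairs.
    def flag_variants(flag, values):
        variants = []
        for v in values:
            if isinstance(v, bool):
                variants.append([f"--{flag}"] if v else [])
            else:
                variants.append([f"--{flag}", str(v)])
        return variants

    per_flag = [flag_variants(f, vals) for f, vals in sweep_config_dict.items()]
    # Flatten each combination of per-flag variants into one argument list.
    return [sum(combo, []) for combo in itertools.product(*per_flag)]


cmds = get_sweep_cmd_lines({
    "block_size": ["256K", "1M"],
    "queue_depth": [32, 64],
    "single_submit": [True, False],
})
print(len(cmds))  # 2 * 2 * 2 = 8 combinations
```

Each resulting argument list is one benchmark configuration; the harness runs them all and logs each result separately.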
Usage
Use this tool to benchmark and tune DeepSpeed AIO performance on your NVMe hardware before deploying large-scale training jobs. Run parameter sweeps to find optimal block sizes, queue depths, and parallelism settings for your specific storage devices and model checkpointing patterns.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/aio/py_test/aio_bench_perf_sweep.py
Signature
class SweepConfig:
    def __init__(self, args):
        # Configures sweep parameters from command-line arguments

def get_sweep_cmd_lines(sweep_config_dict):
    # Generates command-line combinations for the parameter sweep

def run_read_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes read performance sweep

def run_write_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes write performance sweep
Import
from ds_aio_job import Job, run_job
from perf_sweep_utils import READ_OP_DESC, WRITE_OP_DESC, BENCH_LOG_DIR
from deepspeed.ops.op_builder import AsyncIOBuilder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| nvme_dir | list[str] | Yes | List of NVMe mount points for testing |
| sweep_config | str | No | JSON file with custom sweep parameters |
| no_read | bool | No | Disable read performance measurements |
| no_write | bool | No | Disable write performance measurements |
| io_size | str | No | Size of I/O operations (default: 400M) |
| gpu | bool | No | Test GPU-to-NVMe tensor transfers |
| gds | bool | No | Use GPUDirect Storage operator |
| no_sudo | bool | No | Run without page cache flushing |
| log_dir | str | No | Output directory for logs |
| loops | int | No | Number of repetitions per configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| log_files | text files | Performance results organized by operation and configuration |
| read_logs | directory | Contains read operation benchmarks |
| write_logs | directory | Contains write operation benchmarks |
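Once the sweep finishes, the per-configuration logs can be post-processed to find the best-performing settings. The sketch below is a hedged example: the `GB/sec` line format and `.txt` extension are assumptions about the log contents, so adapt the regex and glob pattern to what your sweep actually writes:

```python
import pathlib
import re


def best_config(log_dir):
    # Scan each per-configuration log file and return (throughput, filename)
    # for the highest reported bandwidth. Assumes each log contains a line
    # ending in "<number> GB/sec" -- adjust the pattern for real log output.
    best = (0.0, None)
    for log in pathlib.Path(log_dir).glob("*.txt"):
        for line in log.read_text().splitlines():
            m = re.search(r"([\d.]+)\s*GB/sec", line)
            if m:
                best = max(best, (float(m.group(1)), log.name))
    return best
```

Because log file names encode the configuration (block size, queue depth, etc.), the winning filename identifies the parameter combination to adopt.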
Usage Examples
# Basic performance sweep on two NVMe devices
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 /mnt/nvme1
# Custom sweep with GPU tensors and larger I/O
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--gpu \
--io_size 1G \
--loops 5
# Sweep with custom configuration JSON
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--sweep_config custom_sweep.json
# GPUDirect Storage sweep
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--gds \
--gpu
# Write-only sweep without sudo (no cache flushing)
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--no_read \
--no_sudo
# Example custom sweep configuration JSON:
# {
# "block_size": ["128K", "256K", "1M"],
# "queue_depth": [32, 64, 128, 256],
# "sequential_requests": [true, false],
# "single_submit": [false],
# "io_parallel": [1, 2, 4, 8]
# }
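Since the sweep takes the Cartesian product of every parameter list, the custom JSON above multiplies out quickly. A quick check of the configuration count (using the example JSON from this section) can be done before launching a long-running sweep:

```python
import json
import math

# The example sweep configuration from this section.
sweep_json = """
{
  "block_size": ["128K", "256K", "1M"],
  "queue_depth": [32, 64, 128, 256],
  "sequential_requests": [true, false],
  "single_submit": [false],
  "io_parallel": [1, 2, 4, 8]
}
"""

params = json.loads(sweep_json)
# Total configurations = product of the number of values per parameter.
num_configs = math.prod(len(v) for v in params.values())
print(num_configs)  # 3 * 4 * 2 * 1 * 4 = 96 configurations per loop
```

Multiply by `--loops` to estimate total runs; trimming a parameter list is the fastest way to shorten a sweep.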