Implementation:Deepspeedai DeepSpeed AIO Bench Perf Sweep


Knowledge Sources
Domains Async_IO, NVMe_Offload
Last Updated 2026-02-09 00:00 GMT

Overview

A performance benchmarking tool that sweeps multiple configuration parameters to characterize the performance of the DeepSpeed asynchronous I/O (AIO) library.

Description

This Python script evaluates the performance of the DeepSpeed asynchronous I/O (AIO) library. It sweeps across several dimensions, including block size, queue depth, sequential versus overlapped request modes, single versus batch submission, and the level of intra-operation parallelism. The tool writes performance logs for both read and write operations, can optionally flush the page cache between runs so that read measurements reflect the storage device rather than cached data, and supports CPU and GPU tensor transfers as well as GPUDirect Storage (GDS) operations.
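The page cache flush mentioned above can be sketched as follows. This is a minimal illustration, not the script's own code: the helper names `cache_flush_commands` and `flush_page_cache` are hypothetical, and it assumes a Linux host where `sync` followed by writing `1` to `/proc/sys/vm/drop_caches` (as root) drops clean page cache entries.

```python
import shlex
import subprocess

# Assumption: Linux host. `sync` writes dirty pages back first, then writing
# "1" to drop_caches (root only) frees the clean page cache.
SYNC_CMD = ["sync"]
DROP_CACHES_CMD = ["sudo", "bash", "-c", "echo 1 > /proc/sys/vm/drop_caches"]


def cache_flush_commands(allow_sudo):
    """Return the commands to run before a read benchmark (hypothetical helper)."""
    if not allow_sudo:
        # Without root access the cache cannot be dropped, so nothing is run.
        return []
    return [SYNC_CMD, DROP_CACHES_CMD]


def flush_page_cache(allow_sudo, dry_run=True):
    """Print the flush commands, or execute them when dry_run is False."""
    commands = cache_flush_commands(allow_sudo)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


# Dry run: only prints the commands, nothing is executed.
flush_page_cache(allow_sudo=True, dry_run=True)
```

Skipping the flush, as in the no-sudo case, means repeated reads of the same file may be served from memory, so read throughput can look higher than the device can sustain.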

The sweep creates a Cartesian product of all parameter combinations, runs each configuration through the test harness, and logs results to organized directories. It's designed to help users identify optimal configuration parameters for their specific hardware and workload characteristics.
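To illustrate the Cartesian product described above, here is a minimal sketch using `itertools.product`. The function name `expand_sweep` and the way values are turned into flags (booleans become a bare flag when true and nothing when false) are assumptions for illustration, not necessarily how the script formats its command lines.

```python
import itertools


def expand_sweep(sweep_config):
    """Return one list of CLI-style tokens per parameter combination."""
    per_key_options = []
    for key, values in sweep_config.items():
        options = []
        for value in values:
            if isinstance(value, bool):
                # Assumption: True -> bare flag, False -> flag omitted.
                options.append([f"--{key}"] if value else [])
            else:
                options.append([f"--{key}", str(value)])
        per_key_options.append(options)

    cmd_lines = []
    # product() picks exactly one option per key for every combination.
    for combo in itertools.product(*per_key_options):
        cmd_lines.append([token for option in combo for token in option])
    return cmd_lines


example = {
    "block_size": ["128K", "1M"],
    "queue_depth": [32, 64],
    "sequential_requests": [True, False],
}
cmd_lines = expand_sweep(example)
print(len(cmd_lines))  # 2 * 2 * 2 = 8 combinations
print(cmd_lines[0])    # ['--block_size', '128K', '--queue_depth', '32', '--sequential_requests']
```

Because the number of combinations is the product of the list lengths, adding one more value to any parameter multiplies the total run count, which is worth keeping in mind when sizing a sweep.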

Usage

Use this tool to benchmark and tune DeepSpeed AIO performance on your NVMe hardware before deploying large-scale training jobs. Run parameter sweeps to find optimal block sizes, queue depths, and parallelism settings for your specific storage devices and model checkpointing patterns.

Code Reference

Source Location

Signature

class SweepConfig:
    def __init__(self, args):
        # Configures sweep parameters from command-line arguments
        ...

def get_sweep_cmd_lines(sweep_config_dict):
    # Generates command-line combinations for the parameter sweep
    ...

def run_read_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes the read performance sweep
    ...

def run_write_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes the write performance sweep
    ...

Import

from ds_aio_job import Job, run_job
from perf_sweep_utils import READ_OP_DESC, WRITE_OP_DESC, BENCH_LOG_DIR
from deepspeed.ops.op_builder import AsyncIOBuilder

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| nvme_dir | list[str] | Yes | List of NVMe mount points for testing |
| sweep_config | str | No | JSON file with custom sweep parameters |
| no_read | bool | No | Disable read performance measurements |
| no_write | bool | No | Disable write performance measurements |
| io_size | str | No | Size of I/O operations (default: 400M) |
| gpu | bool | No | Test GPU-to-NVMe tensor transfers |
| gds | bool | No | Use GPUDirect Storage operator |
| no_sudo | bool | No | Run without page cache flushing |
| log_dir | str | No | Output directory for logs |
| loops | int | No | Number of repetitions per configuration |
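The inputs listed above map naturally onto an `argparse` parser. The sketch below mirrors the names, types, and required status from the table; it is not the script's actual parser, the function name `build_parser` is hypothetical, and every default other than `io_size` is an assumption because the table does not state one.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented inputs."""
    parser = argparse.ArgumentParser(description="AIO performance sweep (sketch)")
    parser.add_argument("--nvme_dir", nargs="+", required=True,
                        help="List of NVMe mount points for testing")
    parser.add_argument("--sweep_config", type=str, default=None,
                        help="JSON file with custom sweep parameters")
    parser.add_argument("--no_read", action="store_true",
                        help="Disable read performance measurements")
    parser.add_argument("--no_write", action="store_true",
                        help="Disable write performance measurements")
    parser.add_argument("--io_size", type=str, default="400M",
                        help="Size of I/O operations")
    parser.add_argument("--gpu", action="store_true",
                        help="Test GPU-to-NVMe tensor transfers")
    parser.add_argument("--gds", action="store_true",
                        help="Use GPUDirect Storage operator")
    parser.add_argument("--no_sudo", action="store_true",
                        help="Run without page cache flushing")
    # Assumption: defaults for log_dir and loops are not documented, so None is used.
    parser.add_argument("--log_dir", type=str, default=None,
                        help="Output directory for logs")
    parser.add_argument("--loops", type=int, default=None,
                        help="Number of repetitions per configuration")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["--nvme_dir", "/mnt/nvme0", "/mnt/nvme1", "--gpu"])
print(args.nvme_dir, args.io_size, args.gpu)
```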

Outputs

| Name | Type | Description |
|---|---|---|
| log_files | text files | Performance results organized by operation and configuration |
| read_logs | directory | Contains read operation benchmarks |
| write_logs | directory | Contains write operation benchmarks |

Usage Examples

# Basic performance sweep on two NVMe devices
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme0 /mnt/nvme1

# Custom sweep with GPU tensors and larger I/O
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme0 \
    --gpu \
    --io_size 1G \
    --loops 5

# Sweep with custom configuration JSON
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme0 \
    --sweep_config custom_sweep.json

# GPUDirect Storage sweep
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme0 \
    --gds \
    --gpu

# Write-only sweep without sudo (no cache flushing)
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme0 \
    --no_read \
    --no_sudo

# Example custom sweep configuration JSON:
# {
#   "block_size": ["128K", "256K", "1M"],
#   "queue_depth": [32, 64, 128, 256],
#   "sequential_requests": [true, false],
#   "single_submit": [false],
#   "io_parallel": [1, 2, 4, 8]
# }
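Before launching a sweep, it can help to generate the configuration file programmatically and check how many runs it implies. The sketch below writes the example configuration above to a JSON file and counts the combinations; the helper names are hypothetical, and the count assumes every combination is run once per operation (read or write).

```python
import json
import math
import os
import tempfile

# Same values as the example configuration above.
custom_sweep = {
    "block_size": ["128K", "256K", "1M"],
    "queue_depth": [32, 64, 128, 256],
    "sequential_requests": [True, False],
    "single_submit": [False],
    "io_parallel": [1, 2, 4, 8],
}


def count_configurations(sweep):
    """Number of combinations in the Cartesian product of all value lists."""
    return math.prod(len(values) for values in sweep.values())


def write_sweep_config(sweep, path):
    """Serialize the sweep dictionary as JSON (Python bools become true/false)."""
    with open(path, "w") as f:
        json.dump(sweep, f, indent=2)


# A temporary directory keeps the example from touching the working directory.
config_path = os.path.join(tempfile.mkdtemp(), "custom_sweep.json")
write_sweep_config(custom_sweep, config_path)

with open(config_path) as f:
    reloaded = json.load(f)

print(count_configurations(reloaded))  # 3 * 4 * 2 * 1 * 4 = 96
```

The resulting file can then be passed with `--sweep_config`, as in the command shown earlier.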
