Implementation: deepspeedai/DeepSpeed AIO Bench Perf Sweep
| Knowledge Sources | |
|---|---|
| Domains | Async_IO, NVMe_Offload |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Performance benchmarking tool that sweeps through multiple configuration parameters to characterize DeepSpeed AIO library performance.
Description
This Python script provides a comprehensive performance evaluation framework for the DeepSpeed asynchronous I/O library. It performs parameter sweeps across multiple dimensions including block size, queue depth, sequential vs. overlapped request modes, single vs. batch submission, and intra-operation parallelism levels. The tool generates performance logs for both read and write operations, optionally flushes the page cache between runs for accurate read measurements, and supports both CPU and GPU tensor transfers as well as GPUDirect Storage (GDS) operations.
The sweep creates a Cartesian product of all parameter combinations, runs each configuration through the test harness, and logs results to organized directories. It's designed to help users identify optimal configuration parameters for their specific hardware and workload characteristics.
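The Cartesian-product expansion described above can be sketched as follows. This is an illustrative re-implementation, not the script's actual code, and the flag names in the example dictionary are placeholders:

```python
import itertools


def get_sweep_cmd_lines(sweep_config_dict):
    # Expand {flag: [values...]} into the Cartesian product of flag settings.
    # Boolean values toggle the presence of the flag itself; other values
    # become "--flag value" pairs.
    def flag_variants(flag, values):
        variants = []
        for v in values:
            if isinstance(v, bool):
                variants.append([f"--{flag}"] if v else [])
            else:
                variants.append([f"--{flag}", str(v)])
        return variants

    per_flag = [flag_variants(f, vals) for f, vals in sweep_config_dict.items()]
    # Flatten each combination of per-flag variants into one argument list.
    return [sum(combo, []) for combo in itertools.product(*per_flag)]


cmds = get_sweep_cmd_lines({
    "block_size": ["256K", "1M"],
    "queue_depth": [32, 64],
    "single_submit": [True, False],
})
print(len(cmds))  # 2 * 2 * 2 = 8 combinations
```

Each resulting argument list is one benchmark configuration; the harness runs them all and logs each result separately.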
Usage
Use this tool to benchmark and tune DeepSpeed AIO performance on your NVMe hardware before deploying large-scale training jobs. Run parameter sweeps to find optimal block sizes, queue depths, and parallelism settings for your specific storage devices and model checkpointing patterns.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/aio/py_test/aio_bench_perf_sweep.py
Signature
class SweepConfig:
    def __init__(self, args):
        # Configures sweep parameters from command-line arguments

def get_sweep_cmd_lines(sweep_config_dict):
    # Generates command-line combinations for the parameter sweep

def run_read_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes read performance sweep

def run_write_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    # Executes write performance sweep
Import
from ds_aio_job import Job, run_job
from perf_sweep_utils import READ_OP_DESC, WRITE_OP_DESC, BENCH_LOG_DIR
from deepspeed.ops.op_builder import AsyncIOBuilder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| nvme_dir | list[str] | Yes | List of NVMe mount points for testing |
| sweep_config | str | No | JSON file with custom sweep parameters |
| no_read | bool | No | Disable read performance measurements |
| no_write | bool | No | Disable write performance measurements |
| io_size | str | No | Size of I/O operations (default: 400M) |
| gpu | bool | No | Test GPU-to-NVMe tensor transfers |
| gds | bool | No | Use GPUDirect Storage operator |
| no_sudo | bool | No | Run without page cache flushing |
| log_dir | str | No | Output directory for logs |
| loops | int | No | Number of repetitions per configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| log_files | text files | Performance results organized by operation and configuration |
| read_logs | directory | Contains read operation benchmarks |
| write_logs | directory | Contains write operation benchmarks |
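Once the sweep finishes, the per-configuration logs can be post-processed to find the best-performing settings. The sketch below is a hedged example: the `GB/sec` line format and `.txt` extension are assumptions about the log contents, so adapt the regex and glob pattern to what your sweep actually writes:

```python
import pathlib
import re


def best_config(log_dir):
    # Scan each per-configuration log file and return (throughput, filename)
    # for the highest reported bandwidth. Assumes each log contains a line
    # ending in "<number> GB/sec" -- adjust the pattern for real log output.
    best = (0.0, None)
    for log in pathlib.Path(log_dir).glob("*.txt"):
        for line in log.read_text().splitlines():
            m = re.search(r"([\d.]+)\s*GB/sec", line)
            if m:
                best = max(best, (float(m.group(1)), log.name))
    return best
```

Because log file names encode the configuration (block size, queue depth, etc.), the winning filename identifies the parameter combination to adopt.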
Usage Examples
# Basic performance sweep on two NVMe devices
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 /mnt/nvme1
# Custom sweep with GPU tensors and larger I/O
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--gpu \
--io_size 1G \
--loops 5
# Sweep with custom configuration JSON
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--sweep_config custom_sweep.json
# GPUDirect Storage sweep
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--gds \
--gpu
# Write-only sweep without sudo (no cache flushing)
python aio_bench_perf_sweep.py \
--nvme_dir /mnt/nvme0 \
--no_read \
--no_sudo
# Example custom sweep configuration JSON:
# {
# "block_size": ["128K", "256K", "1M"],
# "queue_depth": [32, 64, 128, 256],
# "sequential_requests": [true, false],
# "single_submit": [false],
# "io_parallel": [1, 2, 4, 8]
# }
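Since the sweep takes the Cartesian product of every parameter list, the custom JSON above multiplies out quickly. A quick check of the configuration count (using the example JSON from this section) can be done before launching a long-running sweep:

```python
import json
import math

# The example sweep configuration from this section.
sweep_json = """
{
  "block_size": ["128K", "256K", "1M"],
  "queue_depth": [32, 64, 128, 256],
  "sequential_requests": [true, false],
  "single_submit": [false],
  "io_parallel": [1, 2, 4, 8]
}
"""

params = json.loads(sweep_json)
# Total configurations = product of the number of values per parameter.
num_configs = math.prod(len(v) for v in params.values())
print(num_configs)  # 3 * 4 * 2 * 1 * 4 = 96 configurations per loop
```

Multiply by `--loops` to estimate total runs; trimming a parameter list is the fastest way to shorten a sweep.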