Implementation:FMInference FlexLLMGen DeepSpeed AIO Perf Sweep
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, NVMe Storage, Performance Testing |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Python-based performance sweep harness that systematically benchmarks DeepSpeed's async I/O library across a combinatorial search space of configuration parameters.
Description
This script automates the process of finding optimal async I/O configurations for NVMe tensor swapping. It generates a combinatorial sweep across tunable parameters including block size, queue depth, overlap events mode, I/O parallelism, and single-submit mode. For each configuration point, it runs the underlying test_ds_aio.py benchmark script and captures performance results to structured log files.
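The combinatorial expansion described above can be sketched with `itertools.product`. This is an illustrative sketch rather than the script's actual code: the helper names `flatten_options` and `expand_sweep` are hypothetical, and it assumes that boolean values map to bare flags (present when true, omitted when false) while all other values become `--key value` pairs.

```python
import itertools


def flatten_options(key, values):
    """Turn one parameter's value list into command-line fragments."""
    fragments = []
    for value in values:
        if isinstance(value, bool):
            # Assumed convention: True emits the bare flag, False emits nothing.
            fragments.append(f"--{key}" if value else "")
        else:
            fragments.append(f"--{key} {value}")
    return fragments


def expand_sweep(search_space):
    """Return one argument string per point in the Cartesian product."""
    per_key = [flatten_options(k, v) for k, v in search_space.items()]
    return [
        " ".join(fragment for fragment in combo if fragment)
        for combo in itertools.product(*per_key)
    ]


# A deliberately small search space: 2 * 2 * 2 = 8 configuration points.
space = {
    "block_size": ["128K", "256K"],
    "queue_depth": [4, 16],
    "overlap_events": [True, False],
}
for args in expand_sweep(space):
    print(args)
```

Each printed line would be appended to the benchmark invocation for one test point. The number of points grows multiplicatively with every parameter added, so it is worth keeping each value list short.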
The sweep is organized into two phases: a read sweep that creates a test file via dd and measures read throughput, and a write sweep that measures write throughput. Between tests, the script calls sync and, unless --no_sudo is given, flushes the page cache by writing 1 to /proc/sys/vm/drop_caches with root privileges obtained through sudo, so that measurements reflect storage device performance rather than cached data. Results are written to a log directory, with one log file per test whose name encodes the configuration parameters.
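The sync-and-flush pattern around each test might look like the sketch below. The function name `run_sweep`, the injectable `run` parameter, and the exact ordering of the sync and flush steps are assumptions made for illustration; the real script may order them differently.

```python
import subprocess


def run_sweep(perf_cmds, flush_cache=True, run=subprocess.run):
    """Run each benchmark command with a clean page cache (hypothetical helper)."""
    sync_cmd = ["sync"]
    # The redirect must happen inside a root shell. With a plain
    # `sudo echo 1 > file`, the calling (unprivileged) shell performs the
    # redirection and the write is denied.
    flush_cmd = ["sudo", "sh", "-c", "echo 1 > /proc/sys/vm/drop_caches"]

    for cmd in perf_cmds:
        run(sync_cmd, check=True)        # write dirty pages back to the device
        if flush_cache:
            run(flush_cmd, check=True)   # drop clean pages from the page cache
        run(cmd, check=True)             # the measured benchmark run
        run(sync_cmd, check=True)        # make sure written data reaches the device


# Example (not executed here, since it requires sudo and a benchmark command):
# run_sweep([["python", "test_ds_aio.py", "--read_file", "/mnt/nvme/test/f.pt"]])
```

Passing `flush_cache=False` corresponds to running without sudo: the benchmark still runs, but read results may be served from memory and look faster than the device really is.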
Key classes include Job (encapsulating a subprocess command with output redirection) and SweepConfig (managing the sweep parameter space, I/O size, and output directories).
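As a rough illustration of what a Job-style wrapper does, the sketch below pairs a command line with an optional log file that receives the command's output. Only the constructor signature comes from the source; the `run` method and its behavior are assumptions for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class Job:
    """A command line plus an optional file that receives its stdout and stderr."""

    def __init__(self, cmd_line, output_file=None, work_dir=None):
        self.cmd_line = cmd_line
        self.output_file = output_file
        self.work_dir = work_dir

    def run(self):
        """Run the command and return its exit code (hypothetical method)."""
        if self.output_file is None:
            return subprocess.run(self.cmd_line, cwd=self.work_dir).returncode
        with open(self.output_file, "w") as log:
            proc = subprocess.run(
                self.cmd_line,
                cwd=self.work_dir,
                stdout=log,
                stderr=subprocess.STDOUT,  # keep errors in the same log
            )
        return proc.returncode


# Usage: run a trivial command and capture its output in a log file.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "read_example.txt"
    rc = Job([sys.executable, "-c", "print('hello')"], output_file=log_path).run()
    captured = log_path.read_text().strip()
    print(rc, captured)
```

Redirecting stderr into the same file keeps any benchmark failure next to the partial results for that configuration point.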
Usage
Run this sweep to characterize NVMe I/O performance on a specific hardware configuration before deploying offloaded inference. The optimal parameters discovered by this sweep inform the AIO configuration used in DeepSpeed's ZeRO-Inference and FlexLLMGen's offloading pipeline.
Code Reference
Source Location
- Repository: FMInference_FlexLLMGen
- File: benchmark/third_party/DeepSpeed/csrc/aio/py_test/aio_bench_perf_sweep.py
- Lines: 1-396
Signature
def main():
    """Entry point: runs read and write performance sweeps."""

class Job(object):
    def __init__(self, cmd_line, output_file=None, work_dir=None):
        """Encapsulates a subprocess command with optional output redirection."""

class SweepConfig(object):
    def __init__(self, args):
        """Manages sweep parameters: nvme_dir, io_size, search_space, etc."""

def get_sweep_cmd_lines(sweep_config_dict):
    """Generates combinatorial command lines from sweep parameter space."""

def run_read_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    """Executes read performance sweep with page cache flushing."""

def run_write_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    """Executes write performance sweep."""
Import
# Standalone script. Run from the DeepSpeed AIO py_test directory:
# python aio_bench_perf_sweep.py --nvme_dir /mnt/nvme/test
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --nvme_dir | str | Yes | Writable directory on an NVMe device for I/O tests |
| --sweep_config | str | No | JSON file specifying the parameter search space (defaults to built-in config) |
| --io_size | str | No (default: 400M) | Total bytes to read/write per test point |
| --no_read | flag | No | Disable read performance measurements |
| --no_write | flag | No | Disable write performance measurements |
| --no_sudo | flag | No | Skip page cache flushing (may inflate read speeds) |
| --log_dir | str | No (default: aio_perf_sweep) | Output directory for performance log files |
| --loops | int | No (default: 1) | Number of repetitions per configuration point |
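Size arguments such as 400M and block sizes such as 256K are given as strings with a unit suffix. The helper below is a minimal sketch of how such a string could be converted to bytes; `parse_size` is a hypothetical name, and the use of binary (1024-based) units is an assumption rather than something the source states.

```python
def parse_size(text):
    """Convert a size string such as '400M' or '1G' to a byte count.

    Assumes binary suffixes: K = 1024, M = 1024**2, G = 1024**3.
    A string without a suffix is treated as a plain byte count.
    """
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


io_bytes = parse_size("400M")
block_bytes = parse_size("256K")
# Number of block-sized requests needed to cover the whole I/O size.
requests = io_bytes // block_bytes
print(io_bytes, block_bytes, requests)
```

Under these assumptions, the default I/O size of 400M with a 256K block size corresponds to 1600 requests per test point.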
Outputs
| Name | Type | Description |
|---|---|---|
| log files | text files | One log per configuration point, with filenames encoding parameters (e.g., read_single_overlap_d16_bs256K.txt) |
| stdout | text | Progress messages and executed command lines |
Usage Examples
# Basic sweep with default configuration
python aio_bench_perf_sweep.py --nvme_dir /mnt/nvme/test_io
# Custom sweep with larger I/O size and multiple loops
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --io_size 1G \
    --loops 3 \
    --log_dir /results/aio_perf
# Write-only sweep without sudo (no cache flushing)
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --no_read \
    --no_sudo
# Custom parameter space via JSON config
# sweep_config.json:
# {
#   "block_size": ["128K", "256K", "512K", "1M"],
#   "queue_depth": [4, 8, 16, 32, 64],
#   "overlap_events": [true, false],
#   "io_parallel": [1, 2, 4, 8],
#   "single_submit": [false]
# }
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --sweep_config sweep_config.json
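Before launching a custom sweep, it can help to estimate how many test points the JSON config implies. The sketch below counts them for the example config above; it assumes the sweep covers the full Cartesian product of all value lists, with each point run once per enabled phase.

```python
import json
import math

# Same search space as the sweep_config.json example above.
config_text = """
{
  "block_size": ["128K", "256K", "512K", "1M"],
  "queue_depth": [4, 8, 16, 32, 64],
  "overlap_events": [true, false],
  "io_parallel": [1, 2, 4, 8],
  "single_submit": [false]
}
"""

search_space = json.loads(config_text)
# Size of the Cartesian product: multiply the lengths of all value lists.
points = math.prod(len(values) for values in search_space.values())
print(points)  # 4 * 5 * 2 * 4 * 1 = 160
```

With both the read and write phases enabled, that would be about 320 benchmark runs, each transferring the full I/O size, so trimming a value list or two can shorten the sweep considerably.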