Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:FMInference FlexLLMGen DeepSpeed AIO Perf Sweep

From Leeroopedia


Knowledge Sources
Domains Benchmarking, NVMe Storage, Performance Testing
Last Updated 2026-02-09 12:00 GMT

Overview

Python-based performance sweep harness that systematically benchmarks DeepSpeed's async I/O library across a combinatorial search space of configuration parameters.

Description

This script automates the process of finding optimal async I/O configurations for NVMe tensor swapping. It generates a combinatorial sweep across tunable parameters including block size, queue depth, overlap events mode, I/O parallelism, and single-submit mode. For each configuration point, it runs the underlying test_ds_aio.py benchmark script and captures performance results to structured log files.

The sweep is organized into two phases: a read sweep that creates a test file via dd and measures read throughput, and a write sweep that measures write throughput. Between each test, the system calls sync and optionally flushes the page cache (via sudo echo 1 > /proc/sys/vm/drop_caches) to ensure measurements reflect true storage device performance rather than cached data. Results are organized into a directory structure with descriptive log filenames encoding the configuration parameters.

Key classes include Job (encapsulating a subprocess command with output redirection) and SweepConfig (managing the sweep parameter space, I/O size, and output directories).

Usage

Run this sweep to characterize NVMe I/O performance on a specific hardware configuration before deploying offloaded inference. The optimal parameters discovered by this sweep inform the AIO configuration used in DeepSpeed's ZeRO-Inference and FlexLLMGen's offloading pipeline.

Code Reference

Source Location

Signature

def main():
    """Entry point: runs read and write performance sweeps."""

class Job(object):
    def __init__(self, cmd_line, output_file=None, work_dir=None):
        """Encapsulates a subprocess command with optional output redirection."""

class SweepConfig(object):
    def __init__(self, args):
        """Manages sweep parameters: nvme_dir, io_size, search_space, etc."""

def get_sweep_cmd_lines(sweep_config_dict):
    """Generates combinatorial command lines from sweep parameter space."""

def run_read_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    """Executes read performance sweep with page cache flushing."""

def run_write_sweep(sweep_config, flush_cache_job, sync_job, cmd_lines):
    """Executes write performance sweep."""

Import

# Standalone script. Run from the DeepSpeed AIO py_test directory:
# python aio_bench_perf_sweep.py --nvme_dir /mnt/nvme/test

I/O Contract

Inputs

Name Type Required Description
--nvme_dir str Yes Writable directory on an NVMe device for I/O tests
--sweep_config str No JSON file specifying the parameter search space (defaults to built-in config)
--io_size str No (default: 400M) Total bytes to read/write per test point
--no_read flag No Disable read performance measurements
--no_write flag No Disable write performance measurements
--no_sudo flag No Skip page cache flushing (may inflate read speeds)
--log_dir str No (default: aio_perf_sweep) Output directory for performance log files
--loops int No (default: 1) Number of repetitions per configuration point

Outputs

Name Type Description
log files text files One log per configuration point, with filenames encoding parameters (e.g., read_single_overlap_d16_bs256K.txt)
stdout text Progress messages and executed command lines

Usage Examples

# Basic sweep with default configuration
python aio_bench_perf_sweep.py --nvme_dir /mnt/nvme/test_io

# Custom sweep with larger I/O size and multiple loops
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --io_size 1G \
    --loops 3 \
    --log_dir /results/aio_perf

# Write-only sweep without sudo (no cache flushing)
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --no_read \
    --no_sudo

# Custom parameter space via JSON config
# sweep_config.json:
# {
#   "block_size": ["128K", "256K", "512K", "1M"],
#   "queue_depth": [4, 8, 16, 32, 64],
#   "overlap_events": [true, false],
#   "io_parallel": [1, 2, 4, 8],
#   "single_submit": [false]
# }
python aio_bench_perf_sweep.py \
    --nvme_dir /mnt/nvme/test_io \
    --sweep_config sweep_config.json

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment