Implementation:FMInference FlexLLMGen DeepSpeed Setup

Field	Value
Sources	Repo: FlexLLMGen, Upstream: DeepSpeed
Domains	Build_System, Package_Distribution
Last Updated	2026-02-09 00:00 GMT

Overview

A vendored copy of DeepSpeed's setup script. It configures the build system to compile and install DeepSpeed, and handles optional pre-compiled CUDA operator extensions, dependency management, and version string generation.

Description

The setup.py file (316 lines) is a vendored copy of DeepSpeed's package build configuration. It orchestrates the compilation of optional CUDA operator extensions, manages dependencies, and produces distributable Python packages.

Key components include:

CUDA operator pre-compilation:
- Iterates over ALL_OPS (registered operator builders) and checks each for compatibility and enabled status via environment variables (DS_BUILD_OPS, DS_BUILD_<op_name>).
- Compatible and enabled ops are compiled as setuptools extensions using BuildExtension with ninja disabled for stability.
- ROCm ops are hipified before compilation on AMD platforms.
- When DS_BUILD_OPS is unset, the default depends on the platform: 0 on Linux (ops are JIT-compiled at runtime) and 1 on Windows (ops are pre-compiled at install time).
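
The selection logic in the list above can be sketched as follows. This is a minimal illustration, not the vendored code: the function names are hypothetical, and deriving the per-op variable by upper-casing the op name is an assumption (in the real script each operator builder defines its own variable name).

```python
import os
import sys


def default_build_ops(platform=sys.platform):
    """Platform default: pre-compile on Windows, JIT elsewhere."""
    return 1 if platform == "win32" else 0


def op_enabled(op_name, env=os.environ, platform=sys.platform):
    """Return True if `op_name` should be pre-compiled.

    A per-op variable DS_BUILD_<OP_NAME> overrides the global DS_BUILD_OPS,
    which in turn overrides the platform default.
    """
    global_default = int(env.get("DS_BUILD_OPS", default_build_ops(platform)))
    per_op_var = f"DS_BUILD_{op_name.upper()}"  # assumed naming scheme
    return bool(int(env.get(per_op_var, global_default)))


def select_ops(all_ops, env=os.environ, platform=sys.platform):
    """Filter a {name: is_compatible} mapping down to the ops to pre-compile."""
    return [
        name
        for name, compatible in all_ops.items()
        if compatible and op_enabled(name, env, platform)
    ]
```

With DS_BUILD_OPS=1 and DS_BUILD_FUSED_ADAM=0 in the environment, this sketch would pre-compile every compatible op except fused_adam.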

Dependency management:
- Core dependencies loaded from requirements/requirements.txt.
- Optional extras for 1-bit communication (with CUDA/ROCm-specific cupy), MPI, readthedocs, dev, autotuning, sparse attention, inference, and Stable Diffusion.
- CuPy version is automatically matched to the installed CUDA/ROCm version.
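
As a rough sketch of the dependency handling described above, the following shows one way a requirements file could be read and a version-specific cupy requirement appended to the 1-bit extra. The body of fetch_requirements, the helper cupy_requirement, and the package naming scheme are all assumptions made for illustration; they are not taken from the vendored script.

```python
from pathlib import Path


def fetch_requirements(path):
    """Read a requirements file, returning one entry per non-empty line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def cupy_requirement(backend, major, minor):
    """Build an illustrative cupy package name from a toolkit version.

    The naming scheme is an assumption for this sketch, not a verified
    mapping to packages published on PyPI.
    """
    if backend == "rocm":
        return f"cupy-rocm-{major}-{minor}"
    return f"cupy-cuda{major}{minor}"


# The '1bit' extra starts empty and receives the matching cupy package.
extras_require = {"1bit": []}
extras_require["1bit"].append(cupy_requirement("cuda", 11, 1))
```

Keeping the 1-bit extra empty until the toolkit version is known avoids pinning a cupy build that does not match the local CUDA or ROCm installation.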

Version string generation:
- Base version read from version.txt.
- Build string taken from the first available source: the DS_BUILD_STRING env var (when building for distribution), build.txt (when installing from a distribution), or the git hash (when installing from source).
- Git hash and branch recorded in deepspeed/git_version_info_installed.py for runtime version reporting.
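
The fallback chain for the build string can be illustrated with a small sketch. The function names, the "+hash" suffix format, and the exact contents of the generated module are assumptions; only the three sources and the recorded fields come from the description above.

```python
def resolve_version(base_version, env, build_txt=None, git_hash="unknown"):
    """Pick the version string from the first available source."""
    if "DS_BUILD_STRING" in env:
        # Building for distribution: the build string is given explicitly.
        return base_version + env["DS_BUILD_STRING"]
    if build_txt is not None:
        # Installing from a distribution: reuse the recorded build string.
        return base_version + build_txt.strip()
    # Installing from source: fall back to the git hash (assumed "+hash" form).
    return f"{base_version}+{git_hash}"


def render_git_version_info(version, git_hash, git_branch):
    """Render module text that records version details for runtime lookup."""
    return (
        f"version={version!r}\n"
        f"git_hash={git_hash!r}\n"
        f"git_branch={git_branch!r}\n"
    )
```

Here build_txt stands for the contents of build.txt, or None when the file does not exist. In this sketch the text returned by render_git_version_info would be written to deepspeed/git_version_info_installed.py.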

Windows support:
- Creates symbolic links for csrc and op_builder directories.
- Uses a custom MANIFEST template (MANIFEST_win.in).

Cross-compilation support:
- If CUDA is unavailable at build time (for example, in a Docker build without GPU access), the script sets TORCH_CUDA_ARCH_LIST to a default set of compute capabilities so the extensions can still be compiled.
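
A minimal sketch of this fallback is shown below. The helper name, the placeholder default value, and the choice to leave an already-set TORCH_CUDA_ARCH_LIST untouched are assumptions made for illustration.

```python
import os

# Placeholder default, assumed for this sketch only.
DEFAULT_ARCH_LIST = "6.0;6.1;7.0"


def ensure_arch_list(cuda_available, env=os.environ, default=DEFAULT_ARCH_LIST):
    """Set TORCH_CUDA_ARCH_LIST when no GPU is visible at build time.

    An existing value is preserved so that callers can still target
    specific architectures explicitly.
    """
    if not cuda_available and env.get("TORCH_CUDA_ARCH_LIST") is None:
        env["TORCH_CUDA_ARCH_LIST"] = default
    return env.get("TORCH_CUDA_ARCH_LIST")
```

In a real build, cuda_available would typically come from torch.cuda.is_available(); it is passed in here so the sketch runs without PyTorch installed.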

Usage

This setup script is invoked by pip install or python setup.py during the FlexLLMGen benchmark dependency installation. It is part of the vendored third-party DeepSpeed package.
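
As an illustration of how the environment variables listed under I/O Contract reach the script, the sketch below assembles a pip command and an environment for it without running anything. The helper name build_install_invocation is hypothetical; the source path is the vendored directory named in the Code Reference.

```python
import os
import sys


def build_install_invocation(source_dir, build_ops=None, build_string=None,
                             base_env=None):
    """Return (command, environment) for installing the vendored package."""
    env = dict(os.environ if base_env is None else base_env)
    if build_ops is not None:
        # 1 requests pre-compiled ops, 0 leaves them to JIT compilation.
        env["DS_BUILD_OPS"] = "1" if build_ops else "0"
    if build_string is not None:
        env["DS_BUILD_STRING"] = build_string
    cmd = [sys.executable, "-m", "pip", "install", source_dir]
    return cmd, env


cmd, env = build_install_invocation(
    "benchmark/third_party/DeepSpeed", build_ops=False
)
```

The pair can then be passed to subprocess.run(cmd, env=env, check=True) to perform the actual installation.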

Code Reference

Field	Value
Repository	FlexLLMGen
File	benchmark/third_party/DeepSpeed/setup.py
Lines	1-316
Type	AUTO_KEEP (vendored dependency)

Key configuration:

install_requires = fetch_requirements('requirements/requirements.txt')
extras_require = {
    '1bit': [],  # cupy is appended later to match the installed CUDA/ROCm version
    '1bit_mpi': fetch_requirements('requirements/requirements-1bit-mpi.txt'),
    'dev': fetch_requirements('requirements/requirements-dev.txt'),
    'autotuning': fetch_requirements('requirements/requirements-autotuning.txt'),
    'sparse_attn': fetch_requirements('requirements/requirements-sparse_attn.txt'),
    'inf': fetch_requirements('requirements/requirements-inf.txt'),
    'sd': fetch_requirements('requirements/requirements-sd.txt'),
}

I/O Contract

Inputs

Parameter	Type	Required	Description
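DS_BUILD_OPS	env var	No	Global switch for pre-compiling CUDA ops (0=JIT at runtime, 1=pre-compile at install time)
DS_BUILD_<op_name>	env var	No	Per-op switch that enables or disables pre-compilation of a single op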
DS_BUILD_STRING	env var	No	Build string for distribution versioning
TORCH_CUDA_ARCH_LIST	env var	No	CUDA compute capabilities for cross-compilation

Outputs

Output	Type	Description
deepspeed package	wheel/sdist	Installable Python package with optional pre-compiled CUDA extensions
git_version_info	Python module	Version, git hash, and branch information available at runtime (written to deepspeed/git_version_info_installed.py)
