Implementation:FMInference FlexLLMGen DeepSpeed Setup
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen, Upstream: DeepSpeed |
| Domains | Build_System, Package_Distribution |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored DeepSpeed package setup script that configures the build system for compiling and installing DeepSpeed with optional pre-compiled CUDA operator extensions, dependency management, and version string generation.
Description
The setup.py file (316 lines) is a vendored copy of DeepSpeed's package build configuration. It orchestrates the compilation of optional CUDA operator extensions, manages dependencies, and produces distributable Python packages.
Key components include:
- CUDA operator pre-compilation:
  - Iterates over ALL_OPS (the registered operator builders) and checks each one for compatibility and for enabled status via environment variables (DS_BUILD_OPS, DS_BUILD_<op_name>).
  - Ops that are both compatible and enabled are compiled as setuptools extensions using BuildExtension, with ninja disabled for stability.
  - On AMD platforms, ROCm ops are hipified before compilation.
  - The default differs by platform: JIT compilation on Linux (DS_BUILD_OPS=0), pre-compilation on Windows (DS_BUILD_OPS=1).
- Dependency management:
  - Core dependencies are loaded from requirements/requirements.txt.
  - Optional extras cover 1-bit communication (with a CUDA/ROCm-specific cupy package), MPI, readthedocs, dev, autotuning, sparse attention, inference, and Stable Diffusion.
  - The CuPy package is matched automatically to the installed CUDA/ROCm version.
- Version string generation:
  - The base version is read from version.txt.
  - The build string comes from the DS_BUILD_STRING env var (when building a distribution), from build.txt (when installing from a distribution), or from the git hash (when installing from source).
  - The git hash and branch are recorded in deepspeed/git_version_info_installed.py for runtime version reporting.
- Windows support:
  - Creates symbolic links for the csrc and op_builder directories.
  - Uses a custom MANIFEST template (MANIFEST_win.in).
- Cross-compilation support:
  - If CUDA is unavailable at build time, sets TORCH_CUDA_ARCH_LIST to the default compute capabilities so that Docker-based builds can proceed.
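The operator selection described in the list above can be sketched roughly as follows. This is a minimal illustration, not the vendored code: the upper-cased DS_BUILD_<op_name> naming, the helper names, and the use of plain callables in place of the real builder objects are all assumptions made for the example.

```python
import os
import sys


def build_ops_default(environ, platform=sys.platform):
    """Global default: DS_BUILD_OPS if set, else 1 on Windows and 0 elsewhere."""
    platform_default = 1 if platform == "win32" else 0
    return int(environ.get("DS_BUILD_OPS", platform_default))


def op_enabled(op_name, environ=None, platform=sys.platform):
    """True if the per-op variable (or the global default) enables pre-compilation."""
    environ = os.environ if environ is None else environ
    env_var = f"DS_BUILD_{op_name.upper()}"  # assumed naming scheme
    return bool(int(environ.get(env_var, build_ops_default(environ, platform))))


def select_ops(all_ops, environ=None, platform=sys.platform):
    """Keep only the ops that are both compatible and enabled.

    `all_ops` maps an op name to a zero-argument compatibility check; this is
    a hypothetical stand-in for the registered builder objects.
    """
    return [name for name, is_compatible in all_ops.items()
            if is_compatible() and op_enabled(name, environ, platform)]
```

With this shape, a per-op variable overrides the global DS_BUILD_OPS setting, and an incompatible op is skipped even when it is explicitly enabled.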
Usage
This setup script is invoked by pip install or python setup.py during the FlexLLMGen benchmark dependency installation. It is part of the vendored third-party DeepSpeed package.
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/setup.py |
| Lines | 1-316 |
| Type | AUTO_KEEP (vendored dependency) |
Key configuration:
install_requires = fetch_requirements('requirements/requirements.txt')
extras_require = {
    '1bit': [],  # cupy is added later, matched to the installed CUDA/ROCm version
    '1bit_mpi': fetch_requirements('requirements/requirements-1bit-mpi.txt'),
    'dev': fetch_requirements('requirements/requirements-dev.txt'),
    'autotuning': fetch_requirements('requirements/requirements-autotuning.txt'),
    'sparse_attn': fetch_requirements('requirements/requirements-sparse_attn.txt'),
    'inf': fetch_requirements('requirements/requirements-inf.txt'),
    'sd': fetch_requirements('requirements/requirements-sd.txt'),
}
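For orientation, the two helpers this configuration relies on might look like the sketch below. The body of fetch_requirements, the comment and blank-line filtering, and the cupy_requirement and one_bit_extra helpers are assumptions for illustration; the package name patterns follow CuPy's wheel naming for older CUDA and ROCm releases and may not match what the vendored script produces.

```python
import os
import tempfile


def fetch_requirements(path):
    """Read a requirements file and return its non-empty, non-comment lines."""
    with open(path, "r") as fd:
        lines = (line.strip() for line in fd)
        return [line for line in lines if line and not line.startswith("#")]


def cupy_requirement(cuda_version=None, rocm_version=None):
    """Pick a cupy package name from a (major, minor) toolkit version (assumed patterns)."""
    if rocm_version is not None:
        major, minor = rocm_version
        return f"cupy-rocm-{major}-{minor}"
    if cuda_version is not None:
        major, minor = cuda_version
        return f"cupy-cuda{major}{minor}"
    return None  # no GPU toolkit detected: leave the extra empty


def one_bit_extra(cuda_version=None, rocm_version=None):
    """Build the '1bit' extra: empty by default, plus cupy when a toolkit is known."""
    cupy = cupy_requirement(cuda_version, rocm_version)
    return [cupy] if cupy else []
```

Keeping '1bit' empty in the dictionary literal and filling it afterwards lets the script defer the toolkit detection until the installed CUDA or ROCm version is known.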
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
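| DS_BUILD_OPS | env var | No | Global switch for pre-compilation of CUDA ops (0=JIT, 1=pre-compile); defaults to 0 on Linux and 1 on Windows |
| DS_BUILD_<op_name> | env var | No | Per-operator switch for pre-compilation of a single op |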
| DS_BUILD_STRING | env var | No | Build string for distribution versioning |
| TORCH_CUDA_ARCH_LIST | env var | No | CUDA compute capabilities for cross-compilation |
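The TORCH_CUDA_ARCH_LIST fallback used for builds without a visible GPU can be illustrated as below. This is a sketch under assumptions: the default list shown is hypothetical, the helper name is invented for the example, and it assumes a value already set by the user is left untouched.

```python
import os

# Hypothetical default list; the real defaults are defined by the build code.
DEFAULT_ARCH_LIST = "6.0;6.1;7.0"


def ensure_arch_list(cuda_available, environ=None, default=DEFAULT_ARCH_LIST):
    """Set TORCH_CUDA_ARCH_LIST when no GPU is visible and none was provided."""
    environ = os.environ if environ is None else environ
    if not cuda_available and environ.get("TORCH_CUDA_ARCH_LIST") is None:
        # Without a device to query, the compiler needs explicit target architectures.
        environ["TORCH_CUDA_ARCH_LIST"] = default
    return environ.get("TORCH_CUDA_ARCH_LIST")
```

In a real build the `cuda_available` flag would typically come from `torch.cuda.is_available()`; it is passed in here so the sketch runs without PyTorch installed.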
Outputs
| Output | Type | Description |
|---|---|---|
| deepspeed package | wheel/sdist | Installable Python package with optional pre-compiled CUDA extensions |
| git_version_info | Python module | Runtime-accessible version, git hash, and branch information |
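How the reported version could be assembled from its three possible build-string sources is sketched below. The precedence order, the function name, and the "+" separator before the git hash are assumptions for illustration rather than a transcription of the script.

```python
def build_version(base, build_string=None, build_txt=None, git_hash="unknown"):
    """Combine the base version with a build string.

    base         -- contents of version.txt
    build_string -- value of DS_BUILD_STRING, if set
    build_txt    -- contents of build.txt, if the file exists
    git_hash     -- short git hash, used for source installs
    """
    base = base.strip()
    if build_string is not None:
        return base + build_string        # distribution build
    if build_txt is not None:
        return base + build_txt.strip()   # install from a distribution
    return f"{base}+{git_hash}"           # source install
```

For example, `build_version("0.7.0\n", git_hash="abc1234")` yields `0.7.0+abc1234`, while passing `build_string=".post1"` yields `0.7.0.post1`.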