Implementation:FMInference FlexLLMGen DeepSpeed Setup

From Leeroopedia


Field Value
Sources Repo: FlexLLMGen, Upstream: DeepSpeed
Domains Build_System, Package_Distribution
Last Updated 2026-02-09 00:00 GMT

Overview

A vendored copy of DeepSpeed's package setup script. It configures the build for compiling and installing DeepSpeed, covering optional pre-compiled CUDA operator extensions, dependency management, and version string generation.

Description

The setup.py file (316 lines) is a vendored copy of DeepSpeed's package build configuration. It orchestrates the compilation of optional CUDA operator extensions, manages dependencies, and produces distributable Python packages.

Key components include:

  • CUDA operator pre-compilation:
    • Iterates over ALL_OPS (registered operator builders) and checks each for compatibility and enabled status via environment variables (DS_BUILD_OPS, DS_BUILD_<op_name>).
    • Compatible and enabled ops are compiled as setuptools extensions using BuildExtension with ninja disabled for stability.
    • ROCm ops are hipified before compilation on AMD platforms.
    • The default differs by platform when DS_BUILD_OPS is unset: ops are JIT-compiled at runtime on Linux (equivalent to DS_BUILD_OPS=0) and pre-compiled at install time on Windows (equivalent to DS_BUILD_OPS=1).
  • Dependency management:
    • Core dependencies loaded from requirements/requirements.txt.
    • Optional extras for 1-bit communication (with CUDA/ROCm-specific cupy), MPI, readthedocs, dev, autotuning, sparse attention, inference, and Stable Diffusion.
    • CuPy version is automatically matched to the installed CUDA/ROCm version.
  • Version string generation:
    • Base version read from version.txt.
    • Build string taken from the first available source: the DS_BUILD_STRING env var (when building for distribution), build.txt (when installing from a distribution), or the git hash (when installing from source).
    • Git hash and branch recorded in deepspeed/git_version_info_installed.py for runtime version reporting.
  • Windows support:
    • Creates symbolic links for csrc and op_builder directories.
    • Uses a custom MANIFEST template (MANIFEST_win.in).
  • Cross-compilation support:
    • If CUDA is unavailable at build time, sets TORCH_CUDA_ARCH_LIST to default compute capabilities to allow Docker-based builds.
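The operator selection described in the first bullet group can be sketched as follows. This is a minimal illustration, not the vendored code: the function names, the `ops` dictionary shape, and the example variable names `DS_BUILD_FUSED_ADAM` and `DS_BUILD_SPARSE_ATTN` (which follow the DS_BUILD_<op_name> pattern) are assumptions made for the example.

```python
import os
import sys


def default_build_ops(platform=sys.platform, environ=os.environ):
    """Global default: DS_BUILD_OPS if set, else 1 on Windows and 0 elsewhere."""
    platform_default = 1 if platform == "win32" else 0
    return int(environ.get("DS_BUILD_OPS", platform_default))


def op_enabled(build_var, default, environ=os.environ):
    """A per-op variable overrides the global default when it is set."""
    return bool(int(environ.get(build_var, default)))


def select_ops(ops, platform=sys.platform, environ=os.environ):
    """Return the names of ops to pre-compile.

    `ops` is a hypothetical stand-in for ALL_OPS: it maps an op name to a
    (build_var, is_compatible) pair instead of a real builder object.
    """
    default = default_build_ops(platform, environ)
    selected = []
    for name, (build_var, compatible) in ops.items():
        # An op is pre-compiled only if it is both enabled and compatible.
        if op_enabled(build_var, default, environ) and compatible:
            selected.append(name)
    return selected


ops = {
    "fused_adam": ("DS_BUILD_FUSED_ADAM", True),
    "sparse_attn": ("DS_BUILD_SPARSE_ATTN", False),
}
```

With an empty environment, `select_ops(ops, platform="linux", environ={})` selects nothing, so every op is left to JIT compilation, while setting `DS_BUILD_FUSED_ADAM=1` opts that single op into pre-compilation.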

Usage

This setup script runs when the vendored third-party DeepSpeed package is installed, via pip install or python setup.py, as part of the FlexLLMGen benchmark dependency installation.

Code Reference

Field Value
Repository FlexLLMGen
File benchmark/third_party/DeepSpeed/setup.py
Lines 1-316
Type AUTO_KEEP (vendored dependency)

Key configuration:

# Core dependencies, one requirement per line in the requirements file.
install_requires = fetch_requirements('requirements/requirements.txt')
# Optional extras, keyed by feature name.
extras_require = {
    '1bit': [],  # CuPy requirement is added separately, matched to the CUDA/ROCm version
    '1bit_mpi': fetch_requirements('requirements/requirements-1bit-mpi.txt'),
    'dev': fetch_requirements('requirements/requirements-dev.txt'),
    'autotuning': fetch_requirements('requirements/requirements-autotuning.txt'),
    'sparse_attn': fetch_requirements('requirements/requirements-sparse_attn.txt'),
    'inf': fetch_requirements('requirements/requirements-inf.txt'),
    'sd': fetch_requirements('requirements/requirements-sd.txt'),
}
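To make the excerpt easier to follow, here is a hedged sketch of what a helper like fetch_requirements might do and how a CuPy requirement could be matched to the detected toolkit version. The body of `fetch_requirements` is an assumption, `cupy_requirement` is a hypothetical helper, and the package name patterns follow CuPy's older per-version wheel naming, so treat them as illustrative only.

```python
from pathlib import Path


def fetch_requirements(path):
    """Read one requirement per line, skipping blank lines (assumed behaviour)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def cupy_requirement(cuda_version=None, rocm_version=None):
    """Build a CuPy package name from a (major, minor) version tuple.

    Hypothetical helper: returns None when no GPU toolkit version is known.
    """
    if rocm_version is not None:
        major, minor = rocm_version
        return f"cupy-rocm-{major}-{minor}"
    if cuda_version is not None:
        major, minor = cuda_version
        return f"cupy-cuda{major}{minor}"
    return None


# The '1bit' extra starts empty and receives the matching CuPy package.
extras_require = {"1bit": []}
cupy = cupy_requirement(cuda_version=(11, 3))
if cupy:
    extras_require["1bit"].append(cupy)
```

Keeping the extra empty until the toolkit version is known means a CPU-only build simply installs no CuPy package rather than failing on a mismatched wheel.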

I/O Contract

Inputs

Parameter Type Required Description
DS_BUILD_OPS env var No Enable pre-compilation of CUDA ops (0=JIT, 1=pre-compile); when unset, defaults to 1 on Windows and 0 on Linux
DS_BUILD_STRING env var No Build string for distribution versioning
TORCH_CUDA_ARCH_LIST env var No CUDA compute capabilities for cross-compilation
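The TORCH_CUDA_ARCH_LIST input ties into the cross-compilation fallback described earlier. A minimal sketch of that fallback is shown below. The function name `ensure_arch_list` and the placeholder capability string are assumptions, as is the detail that a value already supplied by the user is left untouched. In the real script the CUDA check would come from PyTorch; here it is passed in as a boolean so the example runs without a GPU stack.

```python
import os

# Placeholder only: the actual defaults come from the vendored op builder.
DEFAULT_COMPUTE_CAPABILITIES = "6.0;6.1;7.0"


def ensure_arch_list(cuda_available, environ=os.environ,
                     default=DEFAULT_COMPUTE_CAPABILITIES):
    """Set TORCH_CUDA_ARCH_LIST when no GPU is visible at build time.

    Returns the resulting value, or None if the variable stays unset.
    """
    if not cuda_available and environ.get("TORCH_CUDA_ARCH_LIST") is None:
        # No device to query (e.g. a Docker build), so fall back to defaults.
        environ["TORCH_CUDA_ARCH_LIST"] = default
    return environ.get("TORCH_CUDA_ARCH_LIST")
```

When a GPU is visible and the variable is unset, the function leaves the environment alone, so the target architectures can be detected from the device itself.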

Outputs

Output Type Description
deepspeed package wheel/sdist Installable Python package with optional pre-compiled CUDA extensions
deepspeed/git_version_info_installed.py Python module Runtime-accessible version, git hash, and branch information
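The version outputs can be illustrated with a short sketch of the build string selection and the generated version module. The function names, the `+` separator before the git hash, and the exact attribute names written to the module are assumptions for illustration; only the file names and the selection order come from the description above.

```python
import os
from pathlib import Path


def build_version(root=".", environ=os.environ, git_hash="unknown"):
    """Compose the version string from version.txt plus a build string."""
    root = Path(root)
    version = (root / "version.txt").read_text().strip()
    build_file = root / "build.txt"
    if "DS_BUILD_STRING" in environ:
        # Explicit build string: building for distribution.
        return version + environ["DS_BUILD_STRING"]
    if build_file.is_file():
        # build.txt present: installing from a distribution.
        return version + build_file.read_text().strip()
    # Neither present: installing from source, so tag with the git hash.
    return f"{version}+{git_hash}"


def write_version_module(path, version, git_hash, git_branch):
    """Write a small importable module holding the version metadata."""
    Path(path).write_text(
        f"version = {version!r}\n"
        f"git_hash = {git_hash!r}\n"
        f"git_branch = {git_branch!r}\n"
    )
```

Writing the metadata into a generated module lets the installed package report its version at runtime without needing a git checkout.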
