Implementation:FMInference FlexLLMGen DeepSpeed Op Builder

From Leeroopedia


Field | Value
Sources | Repo: FlexLLMGen, Upstream: DeepSpeed
Domains | Build_System, CUDA_Operations
Last Updated | 2026-02-09 00:00 GMT

Overview

Vendored DeepSpeed module that provides the abstract base class and build infrastructure for compiling custom CUDA/C++ operator extensions, with support for both ahead-of-time and just-in-time (JIT) compilation.

Description

The builder.py file (699 lines) is a vendored copy of DeepSpeed's operator builder system. It defines the OpBuilder abstract base class and the supporting infrastructure for compiling the custom CUDA operators behind DeepSpeed's accelerated kernels (fused Adam, transformer inference, quantization, etc.).

Key components include:

  • OpBuilder (abstract base class) -- Defines the interface for all operator builders:
    • absolute_name() -- Returns the fully-qualified module path for pre-installed ops (e.g., deepspeed.ops.adam.cpu_adam).
    • sources() -- Returns the list of C++/CUDA source files to compile.
    • include_paths() -- Returns the list of include directories.
    • nvcc_args() / cxx_args() -- Return the compiler flags for CUDA and C++ compilation, respectively.
    • is_compatible() -- Checks whether the op can be compiled on the current system by looking for the required tools and libraries.
    • load() -- Attempts to load a pre-compiled op, falling back to JIT compilation via jit_load().
    • builder() -- Returns a setuptools Extension object for ahead-of-time compilation via setup.py.
  • CUDA version management:
    • installed_cuda_version() -- Detects the system CUDA version from nvcc.
    • assert_no_cuda_mismatch() -- Validates that the system CUDA version matches the PyTorch CUDA version, with tolerance for compatible minor versions.
    • get_default_compute_capabilities() -- Returns the default CUDA compute capability targets: 6.0, 6.1, and 7.0 for CUDA versions before 11.x, with 8.0 and 8.6 added for 11.x and later.
    • cuda_minor_mismatch_ok -- Lookup table of compatible CUDA minor versions within major versions.
  • ROCm support:
    • is_rocm_pytorch() -- Detects AMD ROCm PyTorch builds.
    • installed_rocm_version() -- Detects the system ROCm version.
    • hipify_extension() -- Hook for converting CUDA sources to HIP for AMD GPUs.
  • Build path management:
    • DEFAULT_TORCH_EXTENSION_PATH -- Default JIT compilation cache directory (/tmp/torch_extensions).
    • Detection of mismatches between the PyTorch/CUDA versions an op was compiled against and the versions present at runtime.
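
The CUDA version checks in the list above can be sketched in a few lines of standalone Python. This is a minimal illustration under stated assumptions, not the vendored code: the helper names `parse_nvcc_release`, `cuda_versions_compatible`, and `default_compute_capabilities`, the entries of `MINOR_MISMATCH_OK`, and the semicolon-separated output format are all made up for the example.

```python
import re

# Hypothetical tolerance table: for each CUDA major version, the minor
# versions treated as mutually compatible. The real table is
# cuda_minor_mismatch_ok; these entries are illustrative only.
MINOR_MISMATCH_OK = {
    10: ["10.0", "10.1", "10.2"],
    11: ["11.0", "11.1", "11.2", "11.3"],
}


def parse_nvcc_release(nvcc_output: str) -> tuple:
    """Extract (major, minor) from text in the style printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' token found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cuda_versions_compatible(system: tuple, torch_cuda: str) -> bool:
    """True if both versions are equal or sit in the same tolerated group."""
    system_str = f"{system[0]}.{system[1]}"
    if system_str == torch_cuda:
        return True
    tolerated = MINOR_MISMATCH_OK.get(system[0], [])
    return system_str in tolerated and torch_cuda in tolerated


def default_compute_capabilities(cuda_major: int) -> str:
    """Apply the rule described above: extra targets from CUDA 11.x on."""
    caps = ["6.0", "6.1", "7.0"]
    if cuda_major >= 11:
        caps += ["8.0", "8.6"]
    return ";".join(caps)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.3, V11.3.109"
    version = parse_nvcc_release(sample)
    print(version, cuda_versions_compatible(version, "11.1"))
    print(default_compute_capabilities(version[0]))
```

Keeping the tolerance table as data rather than as branching logic means a new compatible minor version only requires a table entry.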

Usage

Concrete op builders (e.g., FusedAdamBuilder, TransformerBuilder, QuantizerBuilder) inherit from OpBuilder and are registered in the ALL_OPS dictionary. They run either ahead of time, when setup.py builds the extensions, or just in time, the first time an op is loaded. In FlexLLMGen, this module belongs to the vendored benchmark dependencies.
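
The two invocation paths can be illustrated with a small standalone sketch. It assumes a simplified model in which load() tries to import a pre-installed module and falls back to jit_load() when the import fails. The vendored code decides this differently in detail, and `SketchBuilder`, `JsonOpBuilder`, `MissingOpBuilder`, and `ALL_OPS_SKETCH` are hypothetical names used only here.

```python
import importlib
from abc import ABC, abstractmethod


class SketchBuilder(ABC):
    """Hypothetical stand-in for OpBuilder; only the load path is modeled."""

    def __init__(self, name):
        self.name = name
        self.jit_mode = False

    @abstractmethod
    def absolute_name(self):
        """Dotted module path of the pre-installed op."""

    def load(self):
        # Prefer a module that was built ahead of time and installed.
        try:
            return importlib.import_module(self.absolute_name())
        except ImportError:
            # Nothing pre-built is importable, so compile at first use.
            return self.jit_load()

    def jit_load(self):
        # Placeholder: a real builder would compile its sources here.
        self.jit_mode = True
        return f"<jit-compiled {self.name}>"


class JsonOpBuilder(SketchBuilder):
    # Resolves to a stdlib module, standing in for a pre-installed op.
    def absolute_name(self):
        return "json"


class MissingOpBuilder(SketchBuilder):
    # Resolves to a module that does not exist, forcing the JIT path.
    def absolute_name(self):
        return "no_such_package_for_sketch.missing_op"


# Registry keyed by op name, in the spirit of the ALL_OPS dictionary.
ALL_OPS_SKETCH = {
    builder.name: builder
    for builder in (JsonOpBuilder("json_op"), MissingOpBuilder("missing_op"))
}

if __name__ == "__main__":
    for name, builder in ALL_OPS_SKETCH.items():
        print(name, builder.load(), "jit" if builder.jit_mode else "pre-built")
```

Callers only ever call load(); whether the op was built ahead of time or compiled on demand stays an internal detail of the builder.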

Code Reference

Field | Value
Repository | FlexLLMGen
File | benchmark/third_party/DeepSpeed/op_builder/builder.py
Lines | 1-699
Type | AUTO_KEEP (vendored dependency)

Key class signature:

from abc import ABC, abstractmethod

class OpBuilder(ABC):
    def __init__(self, name):
        self.name = name
        self.jit_mode = False
        self.error_log = None

    @abstractmethod
    def absolute_name(self): ...

    @abstractmethod
    def sources(self): ...

    def load(self, verbose=True): ...
    def jit_load(self, verbose=True): ...
    def builder(self): ...
    def is_compatible(self, verbose=True): ...
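
To show how a subclass fills in this interface, here is a minimal sketch built on a stand-in base class that mirrors the signature above. Everything in it is hypothetical: `OpBuilderSketch`, `ToyAdamBuilder`, the `required_tools()` hook, the PATH-based compatibility check, and the placeholder file paths are assumptions for illustration, not part of the vendored module.

```python
import shutil
from abc import ABC, abstractmethod


class OpBuilderSketch(ABC):
    """Stand-in mirroring the signature above; not the vendored class."""

    def __init__(self, name):
        self.name = name
        self.jit_mode = False
        self.error_log = None

    @abstractmethod
    def absolute_name(self): ...

    @abstractmethod
    def sources(self): ...

    def include_paths(self):
        return []

    def required_tools(self):
        # Hypothetical hook, not part of the documented interface.
        return []

    def is_compatible(self, verbose=True):
        # Assumed check for this sketch: every required tool is on PATH.
        missing = [t for t in self.required_tools() if shutil.which(t) is None]
        if missing:
            self.error_log = f"{self.name}: missing tools {missing}"
            if verbose:
                print(self.error_log)
        return not missing


class ToyAdamBuilder(OpBuilderSketch):
    """Hypothetical op; the paths below are placeholders, not real files."""

    def __init__(self):
        super().__init__(name="toy_adam")

    def absolute_name(self):
        return f"toy_pkg.ops.{self.name}"

    def sources(self):
        return ["csrc/toy_adam/toy_adam.cpp"]

    def include_paths(self):
        return ["csrc/includes"]


if __name__ == "__main__":
    builder = ToyAdamBuilder()
    print(builder.absolute_name(), builder.sources(), builder.is_compatible())
```

Because absolute_name() and sources() are abstract, instantiating the base class directly raises TypeError, so an incomplete builder fails at construction rather than partway through a build.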

I/O Contract

Inputs

Parameter | Type | Required | Description
name | str | Yes | Name of the operator (e.g., 'cpu_adam', 'transformer')
sources | List[str] | Yes | Paths to C++/CUDA source files (abstract, provided by subclass)
verbose | bool | No | Enable verbose build logging (default: True)

Outputs

Output | Type | Description
loaded module | module | Compiled and loaded Python extension module
Extension | setuptools.Extension | Build extension for ahead-of-time compilation
is_compatible | bool | Whether the op can be compiled on the current system
