Implementation:FMInference FlexLLMGen DeepSpeed Op Builder
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen, Upstream: DeepSpeed |
| Domains | Build_System, CUDA_Operations |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored DeepSpeed module providing the abstract base class and build infrastructure for compiling custom CUDA/C++ operator extensions, supporting both ahead-of-time compilation and JIT (just-in-time) compilation.
Description
The builder.py file (699 lines) is a vendored copy of DeepSpeed's operator builder system. It defines the OpBuilder abstract base class and the supporting infrastructure for compiling the custom CUDA/C++ operators behind DeepSpeed's accelerated kernels (fused Adam, transformer inference, quantization, etc.).
Key components include:
- OpBuilder (abstract base class) -- Defines the interface for all operator builders:
  - absolute_name() -- Returns the fully-qualified module path for pre-installed ops (e.g., deepspeed.ops.adam.cpu_adam).
  - sources() -- Returns the list of C++/CUDA source files to compile.
  - include_paths() -- Returns the list of include directories.
  - nvcc_args() / cxx_args() -- Return compiler flags for CUDA and C++ compilation respectively.
  - is_compatible() -- Checks whether the op can be compiled on the current system (required tools and libraries are present).
  - load() -- Attempts to load a pre-compiled op, falling back to JIT compilation via jit_load().
  - builder() -- Returns a setuptools Extension object for ahead-of-time compilation via setup.py.
- CUDA version management:
  - installed_cuda_version() -- Detects the system CUDA version from nvcc.
  - assert_no_cuda_mismatch() -- Validates that the system CUDA version matches the PyTorch CUDA version, with tolerance for compatible minor versions.
  - get_default_compute_capabilities() -- Returns CUDA compute capability targets (6.0, 6.1, 7.0 for pre-11.x; adding 8.0, 8.6 for 11.x+).
  - cuda_minor_mismatch_ok -- Lookup table of compatible CUDA minor versions within each major version.
- ROCm support:
  - is_rocm_pytorch() -- Detects AMD ROCm PyTorch builds.
  - installed_rocm_version() -- Detects the system ROCm version.
  - hipify_extension() -- Hook for converting CUDA sources to HIP for AMD GPUs.
- Build path management:
  - DEFAULT_TORCH_EXTENSION_PATH -- Default JIT compilation cache directory (/tmp/torch_extensions).
  - Handling of mismatches between the PyTorch/CUDA versions used at compile time and those found at runtime.
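The CUDA version checks listed above can be pictured with a small standalone sketch. It is not the vendored code: the function names, the table entries, and the boolean return value are assumptions made for illustration (the real module reads versions from nvcc and PyTorch, and its table contents may differ).

```python
import warnings

# Illustrative table only: the actual entries in the vendored file may differ.
MINOR_MISMATCH_OK = {
    10: ["10.0", "10.1", "10.2"],
    11: ["11.0", "11.1", "11.2", "11.3"],
}


def compute_capabilities(cuda_major):
    """Compute capability targets as described above, joined with ';'."""
    caps = ["6.0", "6.1", "7.0"]
    if cuda_major >= 11:
        caps += ["8.0", "8.6"]
    return ";".join(caps)


def versions_compatible(system_cuda, torch_cuda):
    """Return True when the two 'major.minor' version strings may be mixed.

    An exact match is always accepted. A differing minor version is tolerated
    (with a warning) only if both versions appear in the same table row.
    """
    if system_cuda == torch_cuda:
        return True
    major = int(system_cuda.split(".")[0])
    allowed = MINOR_MISMATCH_OK.get(major, [])
    if system_cuda in allowed and torch_cuda in allowed:
        warnings.warn(
            f"CUDA {system_cuda} (system) vs {torch_cuda} (PyTorch): "
            "minor mismatch tolerated"
        )
        return True
    return False
```

Returning a boolean keeps the sketch easy to test; a stricter variant could raise an exception on an incompatible pair instead.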
Usage
Concrete op builders (e.g., FusedAdamBuilder, TransformerBuilder, QuantizerBuilder) inherit from OpBuilder and are registered in the ALL_OPS dictionary. They are invoked either from setup.py (ahead-of-time build) or on first use (JIT build). This module is part of the vendored benchmark dependencies in FlexLLMGen.
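The registry and the "pre-compiled first, JIT second" loading order can be sketched without DeepSpeed or PyTorch installed. Everything here is hypothetical: SketchBuilder, SKETCH_OPS, and the module paths are stand-ins, the standard-library math module plays the role of a pre-built op, and the JIT step only returns an empty module instead of compiling anything.

```python
import importlib
import types


class SketchBuilder:
    """Hypothetical stand-in for a concrete op builder (not the DeepSpeed class)."""

    def __init__(self, name, module_path):
        self.name = name
        self.module_path = module_path  # plays the role of absolute_name()
        self.jit_mode = False

    def load(self):
        # Prefer a module that was built ahead of time and is importable.
        try:
            return importlib.import_module(self.module_path)
        except ImportError:
            return self.jit_load()

    def jit_load(self):
        # A real builder would invoke the compiler here; this sketch only
        # records that the JIT path was taken and returns an empty module.
        self.jit_mode = True
        return types.ModuleType(self.name)


# Registry keyed by op name, analogous in spirit to ALL_OPS.
builders = [
    SketchBuilder("preinstalled_op", "math"),  # importable: stands in for a prebuilt op
    SketchBuilder("missing_op", "no_such_pkg.ops.missing_op"),  # forces the JIT path
]
SKETCH_OPS = {b.name: b for b in builders}
```

Calling SKETCH_OPS["missing_op"].load() takes the fallback branch and sets jit_mode, while the "preinstalled_op" entry is satisfied by a plain import.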
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/op_builder/builder.py |
| Lines | 1-699 |
| Type | AUTO_KEEP (vendored dependency) |
Key class signature:
    from abc import ABC, abstractmethod

    class OpBuilder(ABC):
        def __init__(self, name):
            self.name = name
            self.jit_mode = False
            self.error_log = None

        @abstractmethod
        def absolute_name(self): ...

        @abstractmethod
        def sources(self): ...

        def load(self, verbose=True): ...
        def jit_load(self, verbose=True): ...
        def builder(self): ...
        def is_compatible(self, verbose=True): ...
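To show what the abstract methods in this signature require of a subclass, here is a minimal sketch using a reduced stand-in base class. OpBuilderSketch, ToyAdamBuilder, IncompleteBuilder, and the source paths are all made up for illustration and do not exist in the vendored module.

```python
from abc import ABC, abstractmethod


class OpBuilderSketch(ABC):
    """Reduced stand-in for the interface; not the vendored class itself."""

    def __init__(self, name):
        self.name = name
        self.jit_mode = False
        self.error_log = None

    @abstractmethod
    def absolute_name(self):
        """Module path of the pre-installed op."""

    @abstractmethod
    def sources(self):
        """C++/CUDA source files to compile."""

    def include_paths(self):
        # Optional hook with a default: no extra include directories.
        return []


class ToyAdamBuilder(OpBuilderSketch):
    """Hypothetical concrete builder; the paths below are made up."""

    def __init__(self):
        super().__init__(name="toy_adam")

    def absolute_name(self):
        return f"toy_pkg.ops.{self.name}"

    def sources(self):
        return ["csrc/toy_adam.cpp", "csrc/toy_adam_kernel.cu"]


class IncompleteBuilder(OpBuilderSketch):
    # sources() is not implemented, so this class remains abstract.
    def absolute_name(self):
        return "toy_pkg.ops.incomplete"
```

Instantiating ToyAdamBuilder() works, whereas IncompleteBuilder("x") raises TypeError because one abstract method is still unimplemented.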
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Name of the operator (e.g., 'cpu_adam', 'transformer') |
| sources | List[str] | Yes | Paths to C++/CUDA source files, returned by the subclass's implementation of the abstract sources() method |
| verbose | bool | No | Enable verbose build logging (default: True) |
Outputs
| Output | Type | Description |
|---|---|---|
| loaded module | module | Compiled and loaded Python extension module |
| Extension | setuptools.Extension | Build extension for ahead-of-time compilation |
| is_compatible | bool | Whether the op can be compiled on the current system |