Implementation:Vllm project Vllm Machete Generate
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Machete, Code_Generation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Python code generator that produces CUTLASS-based Machete mixed-precision GEMM kernel instantiations and dispatch logic via Jinja2 templates.
Description
This script uses dataclasses (TypeConfig, ScheduleConfig, ImplConfig, PrepackTypeConfig) combined with Jinja2 templates to generate C++ source files for the Machete kernel library. It produces dispatch functions (mm_dispatch), kernel implementations, schedule selection heuristics, and weight prepacking routines for various combinations of activation types (FP16, BF16, FP8), weight types (INT4, INT8, FP4, FP8), and quantization configurations (group scales, zero points, channel scales, token scales). To keep compile time manageable, the generator instantiates only the configured type and schedule combinations rather than every possible one, while still covering a range of quantization schemes.
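The pattern described above, frozen dataclasses feeding a text template, can be sketched as follows. This is a minimal illustration and not the script's own code: it uses the standard library's `string.Template` as a stand-in for Jinja2, and `ToyTypeConfig`, `TOY_TEMPLATE`, `ToyKernel`, and the type-name strings are all hypothetical.

```python
from dataclasses import dataclass
from string import Template


# Hypothetical, simplified stand-in for TypeConfig; the real class holds
# data-type enum values rather than plain strings.
@dataclass(frozen=True)
class ToyTypeConfig:
    a: str    # activation type
    b: str    # weight type
    out: str  # output type


# string.Template is used here only as a stand-in for a Jinja2 template.
TOY_TEMPLATE = Template(
    "template struct ToyKernel<$a, $b, $out>;  // explicit instantiation\n"
)


def render_instantiations(configs: list[ToyTypeConfig]) -> str:
    """Emit one C++ explicit-instantiation line per type combination."""
    return "".join(
        TOY_TEMPLATE.substitute(a=c.a, b=c.b, out=c.out) for c in configs
    )


configs = [
    ToyTypeConfig(a="half_t", b="uint4b_t", out="half_t"),
    ToyTypeConfig(a="bfloat16_t", b="uint4b_t", out="bfloat16_t"),
]
generated = render_instantiations(configs)
print(generated)
```

Keeping the configuration in plain data objects and the C++ text in a template means a new type combination is added by appending a config entry rather than by editing C++ by hand.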
Usage
This script is executed during the vLLM build process to auto-generate the Machete kernel C++ source files. It is invoked as a standalone Python script and writes generated kernel files to the Machete source directory.
Code Reference
Source Location
- Repository: vllm
- File: csrc/quantization/machete/generate.py
- Lines: 1-694
Signature
@dataclass(frozen=True)
class ScheduleConfig:
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: MixedInputKernelScheduleType
    epilogue_schedule: EpilogueScheduleType
    tile_scheduler: TileSchedulerType

@dataclass(frozen=True)
class TypeConfig:
    a: DataType
    b: DataType | VLLMDataType
    b_group_scale: DataType
    b_group_zeropoint: DataType
    b_channel_scale: DataType
    a_token_scale: DataType
    out: DataType
    accumulator: DataType

@dataclass(frozen=True)
class PrepackTypeConfig:
    a: DataType
    b_num_bits: int
    convert: DataType
    accumulator: DataType

@dataclass
class ImplConfig:
    types: TypeConfig
    schedules: list[ScheduleConfig]
    heuristic: list[tuple[str | None, ScheduleConfig]]

def generate_sch_sig(schedule_config: ScheduleConfig) -> str: ...
def generate_type_signature(kernel_types: TypeConfig) -> str: ...
def generate_type_option_name(kernel_types: TypeConfig) -> str: ...
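To illustrate why the schedule dataclass is frozen and what a signature helper such as `generate_sch_sig` might return, here is a small sketch. It assumes the signature is a string that encodes every schedule field. The class `ToyScheduleConfig`, the function `toy_sch_sig`, and the string values are hypothetical simplifications, since the real fields are enum values and the real naming scheme is not shown on this page.

```python
from dataclasses import dataclass


# Hypothetical simplification: schedule fields are plain strings here,
# whereas the real ScheduleConfig holds enum values.
@dataclass(frozen=True)
class ToyScheduleConfig:
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: str
    epilogue_schedule: str
    tile_scheduler: str


def toy_sch_sig(s: ToyScheduleConfig) -> str:
    """Build an identifier-safe name that encodes every schedule field."""
    tile = "x".join(str(d) for d in s.tile_shape_mn)
    cluster = "x".join(str(d) for d in s.cluster_shape_mnk)
    return "_".join(
        [tile, cluster, s.kernel_schedule, s.epilogue_schedule, s.tile_scheduler]
    )


a = ToyScheduleConfig((128, 128), (1, 1, 1), "TmaMI", "TmaCoop", "streamK")
b = ToyScheduleConfig((128, 128), (1, 1, 1), "TmaMI", "TmaCoop", "streamK")
c = ToyScheduleConfig((256, 64), (2, 1, 1), "TmaMI", "TmaCoop", "streamK")

# frozen=True (with the default eq=True) makes instances hashable, so
# duplicate schedules collapse when collected into a set.
unique = {a, b, c}
print(len(unique))     # 2
print(toy_sch_sig(a))  # 128x128_1x1x1_TmaMI_TmaCoop_streamK
```

Hashable configs can be deduplicated or used as dictionary keys, which is useful when several type configurations share the same schedule and it should be emitted only once.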
Import
# This is a build-time code generator script; it is not imported at runtime.
# It is executed via:
python csrc/quantization/machete/generate.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| DISPATCH_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating dispatch logic (mm_dispatch, supported_schedules_dispatch) |
| IMPL_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating kernel implementation files with schedule structs |
| PREPACK_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating weight prepacking dispatch logic |
| impl_configs | list[ImplConfig] | Yes | List of implementation configurations specifying type combinations and schedules |
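Each `ImplConfig` carries a `heuristic` list of `(condition, schedule)` pairs. One plausible reading, assumed here rather than confirmed by this page, is that each condition is a C++ boolean expression and a `None` condition marks the final fallback. Under that assumption, rendering the list into a C++ if/else chain could look like the sketch below, where `render_heuristic`, the conditions, and the schedule names are hypothetical and schedules are represented by plain strings.

```python
from typing import Optional


def render_heuristic(heuristic: list[tuple[Optional[str], str]]) -> str:
    """Turn (condition, schedule-name) pairs into a C++ if/else-if chain.

    Assumption: a condition of None means "unconditional fallback" and is
    only meaningful as the last entry.
    """
    lines = []
    for i, (cond, sch) in enumerate(heuristic):
        if cond is None:
            # A leading unconditional entry needs no `else` keyword.
            prefix = "else" if i > 0 else ""
        else:
            prefix = f"if ({cond})" if i == 0 else f"else if ({cond})"
        lines.append(f"{prefix} return impl_{sch}(args);".strip())
        if cond is None:
            break  # anything after an unconditional branch is unreachable
    return "\n".join(lines)


# Hypothetical conditions and schedule names, for illustration only.
toy_heuristic = [
    ("M > 64", "sch_128x128"),
    ("M > 16", "sch_128x64"),
    (None, "sch_128x16"),
]
cpp = render_heuristic(toy_heuristic)
print(cpp)
```

Because the entries are evaluated in order, the most specific conditions belong first and the unconditional fallback last.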
Outputs
| Name | Type | Description |
|---|---|---|
| Generated C++ dispatch file | .cuh file | Contains mm_dispatch() and supported_schedules_dispatch() functions |
| Generated C++ impl files | .cuh files | Contains kernel template instantiations for each type/schedule combination |
| Generated C++ prepack file | .cuh file | Contains prepack_B_dispatch() for weight prepacking |
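The impl files contain one instantiation per type/schedule combination, so the number of combinations drives compile time. The following sketch compares instantiating every combination with instantiating only a selected subset. The type names, schedule names, and the selection table are assumptions made for the illustration and are not taken from the script.

```python
from itertools import product

# Hypothetical type and schedule names, for illustration only.
activations = ["f16", "bf16"]
weights = ["u4", "u4b8", "u8b128"]
schedules = ["128x128", "128x64", "128x16", "256x128"]

# Instantiating every combination grows multiplicatively.
full = list(product(activations, weights, schedules))

# Selective instantiation: only the schedules each type pair actually uses.
# This mapping is an assumption for the sketch, not the script's table.
selected_schedules = {
    ("f16", "u4b8"): ["128x128", "128x16"],
    ("bf16", "u4b8"): ["128x128", "128x16"],
    ("f16", "u8b128"): ["128x64"],
}
selected = [
    (a, b, s) for (a, b), schs in selected_schedules.items() for s in schs
]

print(len(full), len(selected))  # 24 5
```

In this toy setup the selected list covers 5 kernels instead of 24, which is the kind of reduction that makes per-combination template instantiation practical.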
Usage Examples
# Build-time generation (typically called by CMake or setup.py)
import subprocess
import sys

# Run with the current interpreter and raise if generation fails.
subprocess.run([sys.executable, "csrc/quantization/machete/generate.py"], check=True)
# The script generates files like:
# machete_mm_dispatch.cuh
# machete_mm_impl_*.cuh
# machete_prepack_dispatch.cuh