Implementation:Vllm project Vllm Machete Generate

Knowledge Sources	vllm
Domains	Quantization, Machete, Code_Generation
Last Updated	2026-02-08 00:00 GMT

Overview

Python code generator that produces CUTLASS-based Machete mixed-precision GEMM kernel instantiations and dispatch logic via Jinja2 templates.

Description

This script uses dataclasses (TypeConfig, ScheduleConfig, ImplConfig, PrepackTypeConfig) together with Jinja2 templates to generate C++ source files for the Machete kernel library. It produces dispatch functions (mm_dispatch), kernel implementations, schedule-selection heuristics, and weight-prepacking routines for various combinations of activation types (FP16, BF16, FP8), weight types (INT4, INT8, FP4, FP8), and quantization configurations (group scales, zero points, channel scales, token scales). Because it instantiates only the explicitly listed type and schedule combinations rather than every possible pairing, the generator keeps compile time manageable while still covering the supported quantization schemes.
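To make the combination-and-naming idea concrete, here is a minimal sketch that enumerates candidate type combinations with `itertools.product`, keeps only a selected subset, and derives a unique signature string for each. `MiniTypeConfig`, `type_signature`, the short dtype labels, and the filtering rule are all hypothetical; the real script works with CUTLASS and vLLM enum types and may construct its configuration lists differently.

```python
from dataclasses import dataclass, fields
from itertools import product


@dataclass(frozen=True)
class MiniTypeConfig:
    # Hypothetical, reduced stand-in for TypeConfig: dtypes are plain strings
    # instead of CUTLASS DataType / VLLMDataType enum members.
    a: str              # activation type
    b: str              # (quantized) weight type
    b_group_scale: str  # group-scale type
    out: str            # output type


def type_signature(cfg: MiniTypeConfig) -> str:
    # Join field values in declaration order to get a deterministic name
    # that can be embedded in generated C++ identifiers and file names.
    return "_".join(getattr(cfg, f.name) for f in fields(cfg))


def enumerate_configs() -> list[MiniTypeConfig]:
    configs = []
    for a, b, scale in product(["f16", "bf16"], ["u4b8", "u8b128"], ["f16", "bf16"]):
        # Selective instantiation (illustrative rule): of the 8 candidate
        # combinations, keep only those whose scale type matches the activation.
        if scale != a:
            continue
        configs.append(MiniTypeConfig(a=a, b=b, b_group_scale=scale, out=a))
    return configs


if __name__ == "__main__":
    for cfg in enumerate_configs():
        print(type_signature(cfg))
```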

Usage

This script is executed during the vLLM build process to auto-generate the Machete kernel C++ source files. It is invoked as a standalone Python script and writes generated kernel files to the Machete source directory.
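The write step can be sketched as follows, assuming the generator first produces (file name, C++ source) pairs. The `write_if_changed` helper, its skip-unchanged behavior, and the example file names are illustrative choices, not taken from `generate.py`; skipping files whose contents are identical avoids needless recompilation when the generator is rerun on every build.

```python
import os
import tempfile
from collections.abc import Iterable


def write_if_changed(files: Iterable[tuple[str, str]], output_dir: str) -> list[str]:
    """Write each (filename, source) pair under output_dir.

    A file is rewritten only when its contents differ, so build tools that
    compare timestamps do not recompile unchanged kernels. Returns the paths
    that were actually (re)written.
    """
    os.makedirs(output_dir, exist_ok=True)
    changed = []
    for filename, source in files:
        path = os.path.join(output_dir, filename)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                if f.read() == source:
                    continue  # identical content: leave the timestamp alone
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical file names and contents, for demonstration only.
    outputs = [("example_dispatch.cuh", "// dispatch\n"),
               ("example_impl.cuh", "// impl\n")]
    with tempfile.TemporaryDirectory() as d:
        print(len(write_if_changed(outputs, d)))  # first run writes both files
        print(len(write_if_changed(outputs, d)))  # second run writes nothing
```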

Code Reference

Source Location

Repository: vllm
File: csrc/quantization/machete/generate.py
Lines: 1-694

Signature

@dataclass(frozen=True)
class ScheduleConfig:
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: MixedInputKernelScheduleType
    epilogue_schedule: EpilogueScheduleType
    tile_scheduler: TileSchedulerType

@dataclass(frozen=True)
class TypeConfig:
    a: DataType
    b: DataType | VLLMDataType
    b_group_scale: DataType
    b_group_zeropoint: DataType
    b_channel_scale: DataType
    a_token_scale: DataType
    out: DataType
    accumulator: DataType

@dataclass(frozen=True)
class PrepackTypeConfig:
    a: DataType
    b_num_bits: int
    convert: DataType
    accumulator: DataType

@dataclass
class ImplConfig:
    types: TypeConfig
    schedules: list[ScheduleConfig]
    heuristic: list[tuple[str | None, ScheduleConfig]]

def generate_sch_sig(schedule_config: ScheduleConfig) -> str: ...
def generate_type_signature(kernel_types: TypeConfig) -> str: ...
def generate_type_option_name(kernel_types: TypeConfig) -> str: ...
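The `generate_sch_sig` helper turns a schedule into a string usable in generated C++ names. The sketch below shows one plausible encoding, assuming each enum field has already been mapped to a qualified C++ tag; `MiniScheduleConfig`, `sch_sig`, and the `example::` tag names are hypothetical, so the exact format produced by the real helper may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniScheduleConfig:
    # Hypothetical stand-in for ScheduleConfig: each CUTLASS enum member is
    # replaced by the qualified C++ tag string it would correspond to.
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: str
    epilogue_schedule: str
    tile_scheduler: str


def sch_sig(s: MiniScheduleConfig) -> str:
    tile = "x".join(str(v) for v in s.tile_shape_mn)          # e.g. "128x64"
    cluster = "x".join(str(v) for v in s.cluster_shape_mnk)   # e.g. "1x1x1"
    # Keep only the last component of each qualified name ("ns::Foo" -> "Foo")
    # so the pieces are valid inside a C++ identifier.
    tags = [t.split("::")[-1]
            for t in (s.kernel_schedule, s.epilogue_schedule, s.tile_scheduler)]
    return "_".join([tile, cluster, *tags])


if __name__ == "__main__":
    cfg = MiniScheduleConfig(
        tile_shape_mn=(128, 64),
        cluster_shape_mnk=(1, 1, 1),
        kernel_schedule="example::KernelSchedA",
        epilogue_schedule="example::EpilogueSchedA",
        tile_scheduler="example::TileSchedA",
    )
    print(sch_sig(cfg))  # 128x64_1x1x1_KernelSchedA_EpilogueSchedA_TileSchedA
```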

Import

# This is a build-time code generator script; it is not imported at runtime.
# It is executed via:
python csrc/quantization/machete/generate.py

I/O Contract

Inputs

Name	Type	Required	Description
DISPATCH_TEMPLATE	str (Jinja2)	Yes	Jinja2 template string for generating dispatch logic (mm_dispatch, supported_schedules_dispatch)
IMPL_TEMPLATE	str (Jinja2)	Yes	Jinja2 template string for generating kernel implementation files with schedule structs
PREPACK_TEMPLATE	str (Jinja2)	Yes	Jinja2 template string for generating weight prepacking dispatch logic
impl_configs	list[ImplConfig]	Yes	List of implementation configurations, each specifying a type combination, its candidate schedules, and the heuristic used to choose among them
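Each `heuristic` entry pairs an optional C++ condition with a schedule, which is enough for a template to emit schedule-selection code. A minimal sketch of that rendering, assuming entries are tried in order and a `None` condition serves as the final fallback (the `str | None` type suggests this, but the sketch does not reproduce the real template); `render_heuristic`, the `M`/`K` variables in the conditions, and the schedule names are hypothetical.

```python
from typing import Optional


def render_heuristic(heuristic: list[tuple[Optional[str], str]]) -> str:
    """Render (C++ condition, schedule name) pairs as a C++ if-chain.

    Entries are tested in order; a None condition is the unconditional
    fallback and must be the final entry, so every path returns a value.
    """
    if not heuristic or heuristic[-1][0] is not None:
        raise ValueError("heuristic must end with a (None, schedule) fallback")
    lines = []
    for cond, sched in heuristic[:-1]:
        if cond is None:
            raise ValueError("only the last entry may have a None condition")
        lines.append(f'if ({cond}) return "{sched}";')
    lines.append(f'return "{heuristic[-1][1]}";')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_heuristic([
        ("M > 256 && K <= 16384", "sch_large_m"),  # hypothetical condition/name
        ("M > 64", "sch_medium_m"),
        (None, "sch_default"),
    ]))
```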

Outputs

Name	Type	Description
Generated C++ dispatch file	.cuh file	Contains mm_dispatch() and supported_schedules_dispatch() functions
Generated C++ impl files	.cuh files	Contains kernel template instantiations for each type/schedule combination
Generated C++ prepack file	.cuh file	Contains prepack_B_dispatch() for weight prepacking
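Since `PrepackTypeConfig` records only the activation type, the weight bit width, and the conversion and accumulator types, several GEMM type configurations can map to the same prepacking routine. A sketch of deriving the unique set, assuming prepacking is keyed on activation type and bit width alone; the classes, the `BITS` table, and `prepack_configs` are hypothetical simplifications rather than code from `generate.py`.

```python
from dataclasses import dataclass

# Hypothetical bit widths for illustrative weight-type labels.
BITS = {"u4b8": 4, "u4": 4, "u8b128": 8}


@dataclass(frozen=True)
class MiniTypeConfig:
    a: str               # activation type
    b: str               # weight type
    b_group_scale: str   # group-scale type


@dataclass(frozen=True)
class MiniPrepackTypeConfig:
    # Reduced stand-in for PrepackTypeConfig (convert/accumulator omitted).
    a: str
    b_num_bits: int


def prepack_configs(type_configs: list[MiniTypeConfig]) -> list[MiniPrepackTypeConfig]:
    # frozen=True makes instances hashable, so a set collapses GEMM type
    # configs that need the same prepacking into a single entry.
    unique = {MiniPrepackTypeConfig(t.a, BITS[t.b]) for t in type_configs}
    # Sort for deterministic output, keeping regenerated files stable.
    return sorted(unique, key=lambda p: (p.a, p.b_num_bits))


if __name__ == "__main__":
    gemm_types = [
        MiniTypeConfig("f16", "u4b8", "f16"),
        MiniTypeConfig("f16", "u4", "f16"),      # same prepack as the entry above
        MiniTypeConfig("bf16", "u8b128", "bf16"),
    ]
    for p in prepack_configs(gemm_types):
        print(p)
```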

Usage Examples

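# Build-time generation (typically called by CMake or setup.py)
import subprocess
import sys

# Run with the current interpreter; check=True raises if generation fails.
# Assumes the script's CUTLASS Python dependencies are importable (e.g. via PYTHONPATH).
subprocess.run([sys.executable, "csrc/quantization/machete/generate.py"], check=True)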

# The script generates files like:
#   machete_mm_dispatch.cuh
#   machete_mm_impl_*.cuh
#   machete_prepack_dispatch.cuh

Related Pages

Environment:Vllm_project_Vllm_CUDA_GPU_Runtime
