Implementation:Vllm project Vllm Machete Generate

From Leeroopedia


Knowledge Sources
Domains Quantization, Machete, Code_Generation
Last Updated 2026-02-08 00:00 GMT

Overview

Python code generator that produces CUTLASS-based Machete mixed-precision GEMM kernel instantiations and dispatch logic via Jinja2 templates.

Description

This script combines dataclasses (TypeConfig, ScheduleConfig, ImplConfig, PrepackTypeConfig) with Jinja2 templates to generate C++ source files for the Machete kernel library. It emits dispatch functions (mm_dispatch), kernel implementations, schedule selection heuristics, and weight prepacking routines for combinations of activation types (FP16, BF16, FP8), weight types (INT4, INT8, FP4, FP8), and quantization configurations (group scales, zero points, channel scales, token scales). To keep compile time down, the generator instantiates only selected type and schedule combinations while still covering a diverse set of quantization schemes.
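The idea of selective instantiation can be illustrated with a small sketch. Everything here is hypothetical: ToyTypeConfig is a simplified stand-in for TypeConfig (plain strings instead of the real type enums), and the selection rule (group scales must share the activation dtype) is an assumption chosen only to show how filtering shrinks the number of kernels to compile. It is not the rule the actual script applies.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ToyTypeConfig:
    """Hypothetical, simplified stand-in for TypeConfig (strings, not enums)."""
    a: str              # activation dtype
    b: str              # quantized weight dtype
    b_group_scale: str  # dtype of the per-group scales


ACTIVATIONS = ["f16", "bf16"]
WEIGHTS = ["u4", "u4b8"]
GROUP_SCALES = ["void", "f16", "bf16"]


def all_combinations() -> list[ToyTypeConfig]:
    """Full cross product: every activation x weight x scale dtype."""
    return [ToyTypeConfig(a, b, s)
            for a, b, s in product(ACTIVATIONS, WEIGHTS, GROUP_SCALES)]


def selected_combinations() -> list[ToyTypeConfig]:
    """Assumed rule: keep only configs whose group scales match the activation dtype."""
    return [cfg for cfg in all_combinations() if cfg.b_group_scale == cfg.a]


if __name__ == "__main__":
    print(len(all_combinations()), "possible,", len(selected_combinations()), "instantiated")
```

Because the dataclass is frozen, instances are hashable, so a generator written this way could also deduplicate configurations with a set before rendering templates.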

Usage

This script is executed during the vLLM build process to auto-generate the Machete kernel C++ source files. It is invoked as a standalone Python script and writes generated kernel files to the Machete source directory.

Code Reference

Source Location

Signature

@dataclass(frozen=True)
class ScheduleConfig:
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: MixedInputKernelScheduleType
    epilogue_schedule: EpilogueScheduleType
    tile_scheduler: TileSchedulerType

@dataclass(frozen=True)
class TypeConfig:
    a: DataType
    b: DataType | VLLMDataType
    b_group_scale: DataType
    b_group_zeropoint: DataType
    b_channel_scale: DataType
    a_token_scale: DataType
    out: DataType
    accumulator: DataType

@dataclass(frozen=True)
class PrepackTypeConfig:
    a: DataType
    b_num_bits: int
    convert: DataType
    accumulator: DataType

@dataclass
class ImplConfig:
    types: TypeConfig
    schedules: list[ScheduleConfig]
    heuristic: list[tuple[str | None, ScheduleConfig]]

def generate_sch_sig(schedule_config: ScheduleConfig) -> str: ...
def generate_type_signature(kernel_types: TypeConfig) -> str: ...
def generate_type_option_name(kernel_types: TypeConfig) -> str: ...
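The signature helpers above return strings derived from a configuration object. A minimal sketch of that pattern follows; it does not reproduce the real generate_sch_sig. ToyScheduleConfig, toy_sch_sig, the field values, and the exact string layout are all assumptions for illustration, with plain strings replacing the schedule enum types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyScheduleConfig:
    """Hypothetical stand-in for ScheduleConfig with strings in place of enums."""
    tile_shape_mn: tuple[int, int]
    cluster_shape_mnk: tuple[int, int, int]
    kernel_schedule: str
    epilogue_schedule: str
    tile_scheduler: str


def toy_sch_sig(cfg: ToyScheduleConfig) -> str:
    """Build a deterministic identifier that is safe to embed in a C++ symbol name."""
    tile = "x".join(str(d) for d in cfg.tile_shape_mn)
    cluster = "x".join(str(d) for d in cfg.cluster_shape_mnk)
    return "_".join([tile, cluster, cfg.kernel_schedule,
                     cfg.epilogue_schedule, cfg.tile_scheduler])


if __name__ == "__main__":
    cfg = ToyScheduleConfig((128, 64), (1, 1, 1), "TmaMI", "TmaWS", "StreamK")
    print(toy_sch_sig(cfg))  # 128x64_1x1x1_TmaMI_TmaWS_StreamK
```

Deriving the name purely from the frozen fields means equal configurations always map to the same identifier, which is useful when the same schedule appears in several ImplConfig entries.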

Import

# This is a build-time code generator script; it is not imported at runtime.
# It is executed via:
python csrc/quantization/machete/generate.py

I/O Contract

Inputs

Name | Type | Required | Description
DISPATCH_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating dispatch logic (mm_dispatch, supported_schedules_dispatch)
IMPL_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating kernel implementation files with schedule structs
PREPACK_TEMPLATE | str (Jinja2) | Yes | Jinja2 template string for generating weight prepacking dispatch logic
impl_configs | list[ImplConfig] | Yes | List of implementation configurations specifying type combinations and schedules
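Each ImplConfig carries a heuristic typed as a list of (str | None, ScheduleConfig) pairs. One plausible reading, assumed here and not confirmed by this page, is that the string is a C++ condition and None marks the unconditional fallback. The sketch below renders such a list into an if-chain using plain string formatting; the real script uses Jinja2 templates, and render_heuristic, the conditions, and the schedule names are hypothetical.

```python
from typing import Optional


def render_heuristic(heuristic: list[tuple[Optional[str], str]]) -> str:
    """Render (condition, schedule_name) pairs as a C++ if-chain.

    A condition of None is treated as the default branch; entries after it
    would be unreachable, so rendering stops there.
    """
    lines = []
    for condition, schedule_name in heuristic:
        if condition is None:
            lines.append(f"return {schedule_name};")
            break
        lines.append(f"if ({condition}) return {schedule_name};")
    return "\n".join(lines)


# Hypothetical heuristic: pick larger tiles for larger M.
TOY_HEURISTIC = [
    ("M > 64", "sch_128x128"),
    ("M > 16", "sch_128x32"),
    (None, "sch_128x16"),
]

if __name__ == "__main__":
    print(render_heuristic(TOY_HEURISTIC))
```

Keeping the heuristic as ordered data rather than hand-written C++ lets the generator emit a matching selection chain for every type combination from one template.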

Outputs

Name | Type | Description
Generated C++ dispatch file | .cuh file | Contains mm_dispatch() and supported_schedules_dispatch() functions
Generated C++ impl files | .cuh files | Contains kernel template instantiations for each type/schedule combination
Generated C++ prepack file | .cuh file | Contains prepack_B_dispatch() for weight prepacking

Usage Examples

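# Build-time generation (typically called by CMake or setup.py).
# Run from the repository root; check=True raises if generation fails.
import subprocess
import sys
subprocess.run([sys.executable, "csrc/quantization/machete/generate.py"], check=True)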

# The script generates files like:
#   machete_mm_dispatch.cuh
#   machete_mm_impl_*.cuh
#   machete_prepack_dispatch.cuh
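The wildcard in machete_mm_impl_* indicates that kernel instantiations are spread over several generated files. A common way to do this, sketched below under the assumption that configurations are simply divided into near-equal contiguous groups, is to chunk the list before rendering each file. split_evenly and the part-number naming are hypothetical and may not match how the actual script partitions its output.

```python
def split_evenly(items: list, num_parts: int) -> list[list]:
    """Split items into num_parts contiguous chunks whose sizes differ by at most one."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(len(items), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    configs = [f"impl_config_{i}" for i in range(10)]  # placeholder config names
    for part, chunk in enumerate(split_evenly(configs, 4), start=1):
        print(f"part {part}: {len(chunk)} configs")
```

Splitting instantiations across files lets the build system compile them in parallel, which complements the selective instantiation described earlier.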

Related Pages
