Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Unslothai Unsloth MoE Autotune Cache

From Leeroopedia


Knowledge Sources
Domains MoE, Kernel_Optimization
Last Updated 2026-02-07 08:40 GMT

Overview

Concrete tool for caching and retrieving MoE Triton kernel autotuning configurations to avoid repeated expensive autotuning across sessions.

Description

The autotune_cache module provides a persistent caching system for Triton grouped GEMM kernel configurations. It generates MD5 cache keys from model parameters (num_experts, hidden_dim, intermediate_dim, top_k, dtype, device capability), stores optimized kernel configs as JSON under ~/.cache/unsloth/moe_autotune/, and maintains in-memory caches for runtime access. When autotuning fails or is disabled via UNSLOTH_MOE_DISABLE_AUTOTUNE=1, it falls back to heuristic or default configurations.

Usage

Import the get_or_autotune_moe_kernels function when initializing MoE model layers to obtain optimized kernel configurations for forward and backward passes without re-running autotuning.

Code Reference

Source Location

Signature

def get_or_autotune_moe_kernels(
    num_experts: int,
    hidden_dim: int,
    intermediate_dim: int,
    top_k: int,
    dtype: torch.dtype,
    force_autotune: bool = False,
    seq_len: int = 8192,
) -> Tuple[Any, Any, Any]:
    """
    Returns (config_fwd, config_bwd_dx, config_bwd_dw) kernel configs.
    Checks in-memory cache, disk cache, runs autotuning, or falls back to defaults.
    """

def clear_cache() -> None:
    """Clears in-memory kernel config caches."""

def load_cached_config(cache_key: str) -> Optional[Dict[str, Any]]:
    """Loads config from disk cache file."""

def save_cached_config(
    cache_key: str,
    config_fwd: Any,
    config_bwd_dx: Any,
    config_bwd_dw: Any,
    metadata: Dict[str, Any] = None,
) -> None:
    """Saves kernel configs to disk as JSON."""

Import

from unsloth.kernels.moe.autotune_cache import get_or_autotune_moe_kernels

I/O Contract

Inputs

Name Type Required Description
num_experts int Yes Number of experts in the MoE layer
hidden_dim int Yes Hidden dimension size
intermediate_dim int Yes Intermediate (FFN) dimension size
top_k int Yes Number of experts per token
dtype torch.dtype Yes Data type (e.g., torch.bfloat16)
force_autotune bool No Skip caches and re-run autotuning (default: False)
seq_len int No Sequence length for dummy tensors (default: 8192)

Outputs

Name Type Description
config_fwd KernelConfigForward Forward pass kernel configuration
config_bwd_dx KernelConfigBackward_dX Input gradient kernel configuration
config_bwd_dw KernelConfigBackward_dW Weight gradient kernel configuration

Usage Examples

Get Cached or Autotuned Configs

from unsloth.kernels.moe.autotune_cache import get_or_autotune_moe_kernels
import torch

# Get kernel configs (cached or autotuned)
config_fwd, config_bwd_dx, config_bwd_dw = get_or_autotune_moe_kernels(
    num_experts=8,
    hidden_dim=4096,
    intermediate_dim=14336,
    top_k=2,
    dtype=torch.bfloat16,
)

Disable Autotuning via Environment

# Use heuristic configs instead of autotuning
export UNSLOTH_MOE_DISABLE_AUTOTUNE=1

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment