
Implementation:Sgl project Sglang Sgl Kernel Init

From Leeroopedia


Knowledge Sources
Domains Kernel, Package Initialization, API Surface
Last Updated 2026-02-10 00:00 GMT

Overview

Package initialization file that loads architecture-specific C++ extensions and re-exports all kernel Python APIs into a flat namespace.

Description

The sgl_kernel/__init__.py module is the public entry point for the sgl_kernel package. On import, it calls _load_architecture_specific_ops() to load the common_ops shared library that matches the detected GPU architecture (e.g., SM90 versus other architectures). If CUDA is available, it then preloads the CUDA runtime library via _preload_cuda_library().

The module re-exports the public APIs of the following submodules: allreduce, attention, cutlass_moe, elementwise, expert_specialization, fused_moe, gemm, grammar, hadamard, kvcacheio, mamba, marlin, memory, moe, quantization, sampling, speculative, top_k, and version.

It also provides lazy-loaded wrappers for create_greenctx_stream_by_value and get_sm_available from the spatial submodule, and it imports gelu_quick only on ROCm platforms.
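The architecture-dependent loading step can be pictured with a small sketch. This is not the package's actual loader: the function names select_ops_variant and candidate_library_paths, the variant names "sm90" and "default", and the directory layout are all assumptions made for illustration. The only detail taken from the description above is that an SM90 device is treated differently from other architectures.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def select_ops_variant(capability: Optional[Tuple[int, int]]) -> str:
    """Map a (major, minor) compute capability to a build-variant name.

    Hypothetical helper: the variant names are illustrative only.
    """
    # Compute capability 9.x corresponds to SM90-class devices.
    if capability is not None and capability[0] == 9:
        return "sm90"
    # Any other device, or no detected device, uses the generic build.
    return "default"


def candidate_library_paths(package_dir: Path, variant: str) -> List[Path]:
    """List files named common_ops* under an assumed <package>/<variant>/ layout."""
    return sorted((package_dir / variant).glob("common_ops*"))


print(select_ops_variant((9, 0)))  # sm90
print(select_ops_variant((8, 0)))  # default
```

In a real environment the capability tuple would typically come from a call such as torch.cuda.get_device_capability(); it is passed in as a plain argument here so the sketch runs without a GPU.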

Usage

Import sgl_kernel directly to access any kernel operation via the flat namespace, such as sgl_kernel.rmsnorm, sgl_kernel.fp8_scaled_mm, or sgl_kernel.moe_align_block_size.
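Because everything is exposed through one flat namespace, it can be handy to enumerate what a given build actually exports. The helper below is a hypothetical convenience function, not part of sgl_kernel, and the demo object is a stand-in namespace whose attributes only borrow names from this page.

```python
from types import SimpleNamespace
from typing import Any, List


def list_public_callables(namespace: Any) -> List[str]:
    """Return the sorted names of public callables found on a module-like object."""
    return sorted(
        name
        for name in dir(namespace)
        if not name.startswith("_") and callable(getattr(namespace, name))
    )


# Stand-in for an imported package; these lambdas are NOT the real kernels.
demo = SimpleNamespace(
    rmsnorm=lambda x: x,
    merge_state=lambda a, b: (a, b),
    __version__="0.0.0",
)

print(list_public_callables(demo))  # ['merge_state', 'rmsnorm']
```

With the real package installed, the same helper could be called as list_public_callables(sgl_kernel) to check whether a particular operation is present before relying on it.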

Code Reference

Source Location

Signature

# Architecture-specific ops loaded at import time
common_ops = _load_architecture_specific_ops()

# Lazy-loaded spatial functions
def create_greenctx_stream_by_value(*args, **kwargs) -> Any: ...
def get_sm_available(*args, **kwargs) -> Any: ...
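The (*args, **kwargs) signatures above are typical of wrappers that defer an import until the first call. A minimal sketch of that pattern follows, assuming nothing about how sgl_kernel implements it: make_lazy_wrapper is a hypothetical helper, and the standard-library math module stands in for the spatial submodule.

```python
import importlib
from typing import Any, Callable


def make_lazy_wrapper(module_name: str, attr_name: str) -> Callable[..., Any]:
    """Build a function that imports module_name only when it is first called."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # import_module consults sys.modules, so repeated calls are cheap.
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)

    wrapper.__name__ = attr_name
    return wrapper


# Stand-in target: math.sqrt instead of a function from the spatial submodule.
lazy_sqrt = make_lazy_wrapper("math", "sqrt")
print(lazy_sqrt(9.0))  # 3.0
```

The benefit of this design is that importing the package does not fail, or pay an import cost, for functionality that a given caller never uses.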

Import

import sgl_kernel

# Or import specific operations
from sgl_kernel import rmsnorm, fp8_scaled_mm, merge_state

I/O Contract

Inputs

Name      Type    Required    Description
(none)    -       -           Module is initialized on import; no direct input parameters

Outputs

Name                    Type        Description
common_ops              module      Architecture-specific C++ extension module loaded at import
__version__             str         Package version string from sgl_kernel.version
(exported functions)    callable    All kernel operations re-exported from submodules

Usage Examples

import sgl_kernel

# Access kernel operations directly
output = sgl_kernel.rmsnorm(input_tensor, weight, eps)

# Access attention operations
v_merged, s_merged = sgl_kernel.merge_state(v_a, s_a, v_b, s_b)

# Access quantization operations
result = sgl_kernel.fp8_scaled_mm(a, b, scale_a, scale_b)

# Check version
print(sgl_kernel.__version__)
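Since importing the package loads native libraries as a side effect, code that must also run on machines without a matching build may want to guard the import. The sketch below shows one way to do that; try_import is a hypothetical helper, and catching OSError alongside ImportError is an assumption about how a failed shared-library load might surface.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError: package missing; OSError: assumed shared-library failure.
        return None


kernels = try_import("sgl_kernel")
if kernels is None:
    print("sgl_kernel is unavailable; use a fallback code path")
else:
    print(kernels.__version__)
```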
