Implementation:Sgl project Sglang Sgl Kernel Init

Knowledge Sources	Sgl_project_Sglang
Domains	Kernel, Package Initialization, API Surface
Last Updated	2026-02-10 00:00 GMT

Overview

Package initialization file that loads architecture-specific C++ extensions and re-exports all kernel Python APIs into a flat namespace.

Description

The sgl_kernel/__init__.py module is the public entry point of the sgl_kernel package. On import, it calls _load_architecture_specific_ops() to load the common_ops shared library that matches the detected GPU architecture (e.g., SM90 versus other architectures), then, if CUDA is available, preloads the CUDA runtime library via _preload_cuda_library(). It re-exports the public APIs of the following submodules: allreduce, attention, cutlass_moe, elementwise, expert_specialization, fused_moe, gemm, grammar, hadamard, kvcacheio, mamba, marlin, memory, moe, quantization, sampling, speculative, top_k, and version. It also provides lazy-loaded wrappers for create_greenctx_stream_by_value and get_sm_available from the spatial submodule, and imports gelu_quick only on ROCm platforms.
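The architecture selection step described above can be illustrated with a minimal sketch. It is not the package's actual loader: select_ops_variant and the labels "sm90" and "default" are hypothetical names chosen for illustration, and the real logic lives in _load_architecture_specific_ops(). The input is a (major, minor) compute capability tuple of the kind returned by torch.cuda.get_device_capability().

```python
from typing import Optional, Tuple


def select_ops_variant(capability: Optional[Tuple[int, int]]) -> str:
    """Pick a build variant label from a (major, minor) compute capability.

    Hypothetical helper: the labels "sm90" and "default" are illustrative,
    not the directory or file names used by sgl_kernel.
    """
    if capability == (9, 0):
        return "sm90"
    # Any other architecture, or no detected GPU, falls back to one variant.
    return "default"


# On a CUDA machine the tuple could come from torch.cuda.get_device_capability().
print(select_ops_variant((9, 0)))
print(select_ops_variant((8, 0)))
print(select_ops_variant(None))
```

Keeping the selection in a pure function of the capability tuple makes the branch easy to test without a GPU; only the final step of loading the shared library needs real hardware.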

Usage

Import sgl_kernel directly to access any kernel operation via the flat namespace, such as sgl_kernel.rmsnorm, sgl_kernel.fp8_scaled_mm, or sgl_kernel.moe_align_block_size.
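Since every operation is expected on the top-level namespace, a small check can confirm that the names a caller depends on actually resolve in the installed build. The helper below is a hypothetical sketch (find_missing is not part of sgl_kernel); it is demonstrated on the standard-library math module so that it runs on any machine.

```python
import importlib
from typing import List, Optional, Sequence


def find_missing(module_name: str, names: Sequence[str]) -> Optional[List[str]]:
    """Return the names absent from a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in names if not hasattr(module, name)]


# Demonstration on a stand-in module.
print(find_missing("math", ["sqrt", "not_a_real_function"]))

# With the package installed, the same check could be run as:
# find_missing("sgl_kernel", ["rmsnorm", "fp8_scaled_mm", "moe_align_block_size"])
```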

Code Reference

Source Location

Repository: Sgl_project_Sglang
File: sgl-kernel/python/sgl_kernel/__init__.py
Lines: 1-154

Signature

# Architecture-specific ops loaded at import time
common_ops = _load_architecture_specific_ops()

# Lazy-loaded spatial functions
def create_greenctx_stream_by_value(*args, **kwargs) -> Any: ...
def get_sm_available(*args, **kwargs) -> Any: ...
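A lazy-loaded wrapper of the kind listed above defers importing its submodule until the wrapper is called. The sketch below shows the general pattern only, not the code in sgl_kernel: make_lazy_wrapper is a hypothetical helper, and the standard-library math module stands in for the spatial submodule so that the example runs without a GPU.

```python
import importlib
from typing import Any, Callable


def make_lazy_wrapper(module_name: str, attr: str) -> Callable[..., Any]:
    """Return a function that imports module_name at call time, not at definition time."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # importlib caches modules in sys.modules, so repeated calls are cheap.
        module = importlib.import_module(module_name)
        return getattr(module, attr)(*args, **kwargs)

    wrapper.__name__ = attr
    return wrapper


# Stand-in target: math.sqrt plays the role of a function in a heavy submodule.
lazy_sqrt = make_lazy_wrapper("math", "sqrt")
print(lazy_sqrt(9.0))  # 3.0
```

The trade-off of this pattern is that an import failure in the submodule surfaces on first use of the wrapper rather than when the package itself is imported.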

Import

import sgl_kernel

# Or import specific operations
from sgl_kernel import rmsnorm, fp8_scaled_mm, merge_state

I/O Contract

Inputs

Name	Type	Required	Description
(none)	-	-	Module is initialized on import; no direct input parameters

Outputs

Name	Type	Description
common_ops	module	Architecture-specific C++ extension module loaded at import
__version__	str	Package version string from sgl_kernel.version
(exported functions)	callable	All kernel operations re-exported from submodules

Usage Examples

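import torch
import sgl_kernel

# The tensor arguments below (input_tensor, weight, v_a, a, ...) are placeholders
# for CUDA tensors that the caller prepares with the shapes and dtypes each
# kernel expects.

# Access kernel operations directly
output = sgl_kernel.rmsnorm(input_tensor, weight, eps)

# Access attention operations
v_merged, s_merged = sgl_kernel.merge_state(v_a, s_a, v_b, s_b)

# Access quantization operations
# The trailing out_dtype argument is assumed to be required; torch.bfloat16 is
# an illustrative choice. Check sgl_kernel/gemm.py for the exact signature.
result = sgl_kernel.fp8_scaled_mm(a, b, scale_a, scale_b, torch.bfloat16)

# Check version
print(sgl_kernel.__version__)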

Related Pages

Environment:Sgl_project_Sglang_CUDA_GPU_Runtime
