Implementation:Sgl project Sglang Sgl Kernel Init
| Knowledge Sources | |
|---|---|
| Domains | Kernel, Package Initialization, API Surface |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Package initialization file that loads architecture-specific C++ extensions and re-exports all kernel Python APIs into a flat namespace.
Description
The sgl_kernel/__init__.py module is the public entry point for the sgl_kernel package. On import, it calls _load_architecture_specific_ops() to load the common_ops shared library that matches the detected GPU architecture (for example, SM90 versus other architectures) and then, if CUDA is available, preloads the CUDA runtime library via _preload_cuda_library(). It re-exports the public APIs of the following submodules: allreduce, attention, cutlass_moe, elementwise, expert_specialization, fused_moe, gemm, grammar, hadamard, kvcacheio, mamba, marlin, memory, moe, quantization, sampling, speculative, top_k, and version. It also provides lazily loaded wrappers for create_greenctx_stream_by_value and get_sm_available from the spatial submodule, and it imports gelu_quick only on ROCm platforms.
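The architecture-dependent loading step can be pictured with a minimal sketch. This is not the actual body of _load_architecture_specific_ops(); the helper names select_ops_variant and find_common_ops, the variant labels "sm90" and "default", and the directory layout are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def select_ops_variant(major: int, minor: int) -> str:
    """Map a CUDA compute capability to a hypothetical build-variant label."""
    # Assumption: SM90 (compute capability 9.0) has its own build and every
    # other architecture shares a default build. The real logic may differ.
    return "sm90" if (major, minor) == (9, 0) else "default"


def find_common_ops(package_dir: Path, variant: str) -> Optional[Path]:
    """Return the first common_ops shared library in the variant directory, if any."""
    # Hypothetical layout: <package_dir>/<variant>/common_ops.*
    matches = sorted((package_dir / variant).glob("common_ops.*"))
    return matches[0] if matches else None


print(select_ops_variant(9, 0))  # sm90
print(select_ops_variant(8, 0))  # default
```

In a real loader, the compute capability would typically come from torch.cuda.get_device_capability(), and the located file would then be loaded as an extension module.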
Usage
Import sgl_kernel directly to access any kernel operation via the flat namespace, such as sgl_kernel.rmsnorm, sgl_kernel.fp8_scaled_mm, or sgl_kernel.moe_align_block_size.
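Some names, such as gelu_quick, are exported only on certain platforms, so callers may want to probe the flat namespace before using an operation. The sketch below shows one way to do that. get_optional_op is a hypothetical helper, not part of sgl_kernel, and a types.SimpleNamespace stands in for the imported package so the snippet runs without a GPU.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def get_optional_op(namespace: Any, name: str) -> Optional[Callable[..., Any]]:
    """Return the callable exported as `name`, or None if it is not exported."""
    op = getattr(namespace, name, None)
    return op if callable(op) else None


# Stand-in for `import sgl_kernel`; only `rmsnorm` is "exported" here.
fake_kernel = SimpleNamespace(rmsnorm=lambda x, w, eps: x)

assert get_optional_op(fake_kernel, "rmsnorm") is not None
assert get_optional_op(fake_kernel, "gelu_quick") is None
```

With the real package, the same helper would be called as get_optional_op(sgl_kernel, "gelu_quick").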
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: sgl-kernel/python/sgl_kernel/__init__.py
- Lines: 1-154
Signature
# Architecture-specific ops loaded at import time
common_ops = _load_architecture_specific_ops()
# Lazy-loaded spatial functions
def create_greenctx_stream_by_value(*args, **kwargs) -> Any: ...
def get_sm_available(*args, **kwargs) -> Any: ...
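A common way to write a lazy-loaded wrapper like the two above is to defer the import into the function body, so the underlying module is loaded only on the first call. The sketch below illustrates that pattern under stated assumptions: make_lazy_wrapper is a hypothetical helper, and json.dumps stands in for a function from the spatial submodule so the example runs anywhere.

```python
import importlib
from typing import Any, Callable


def make_lazy_wrapper(module_name: str, attr: str) -> Callable[..., Any]:
    """Build a function that imports `module_name` only when it is called."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # The import happens here, at call time, not at definition time.
        module = importlib.import_module(module_name)
        return getattr(module, attr)(*args, **kwargs)

    wrapper.__name__ = attr
    return wrapper


# Stand-in target: json.dumps instead of a spatial-submodule function.
lazy_dumps = make_lazy_wrapper("json", "dumps")
print(lazy_dumps([1, 2]))  # [1, 2]
```

Deferring the import keeps the cost and any platform-specific failure of the wrapped module out of the package import, and surfaces it only when the function is actually used.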
Import
import sgl_kernel
# Or import specific operations
from sgl_kernel import rmsnorm, fp8_scaled_mm, merge_state
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | - | - | Module is initialized on import; no direct input parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| common_ops | module | Architecture-specific C++ extension module loaded at import |
| __version__ | str | Package version string from sgl_kernel.version |
| (exported functions) | callable | All kernel operations re-exported from submodules |
Usage Examples
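import torch
import sgl_kernel
# The tensors used below (input_tensor, weight, v_a, a, ...) are assumed to be
# CUDA tensors already prepared by the caller.
# Access kernel operations directly
output = sgl_kernel.rmsnorm(input_tensor, weight, eps)
# Access attention operations: merge two partial attention states
v_merged, s_merged = sgl_kernel.merge_state(v_a, s_a, v_b, s_b)
# Access quantization operations; out_dtype selects the output dtype
result = sgl_kernel.fp8_scaled_mm(a, b, scale_a, scale_b, out_dtype=torch.bfloat16)
# Check version
print(sgl_kernel.__version__)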