Implementation:Sgl project Sglang CPU Common Header
| Knowledge Sources | |
|---|---|
| Domains | Kernel Infrastructure, CPU Compute |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Central header file providing shared macros, type dispatch utilities, validation helpers, and common functions used across all CPU kernel implementations in the sgl-kernel package.
Description
common.h is the foundation header for the entire CPU kernel subsystem. Every CPU kernel file (decode, extend, flash_attn, gemm, etc.) includes this header for shared definitions.
The file defines several key macro categories:
- Boolean Dispatch Macros: AT_DISPATCH_BOOL and AT_DISPATCH_BOOL2 turn runtime boolean values into compile-time template arguments. Each macro generates both code paths (true and false) and binds the value as a constexpr constant, so the compiler can eliminate the unused branch inside each specialization.
- Type Dispatch Macros: CPU_DISPATCH_PACKED_TYPES dispatches across BFloat16, Half, int8_t, and Float8_e4m3fn types using a packed_t typedef. CPU_DISPATCH_REDUCED_FLOATING_TYPES_EXT handles mixed-dtype dispatch where the primary type (scalar_t for input/output/weight) and secondary type (param_t for bias) can differ.
- Validation Macros: CHECK_CPU, CHECK_CONTIGUOUS, CHECK_INPUT, CHECK_DIM, CHECK_EQ, and CHECK_GE, together with the CHECK_INPUT_SHAPE_DTYPE function template, provide tensor validation with informative error messages.
- Parallel Routines: Provides a parallel_for routine (using balance211 partitioning) and documents its use alongside at::parallel_for for parallelizing kernel workloads across threads.
- Helper Functions: data_index_init and data_index_step for multi-dimensional index decomposition, and div_up for ceiling integer division.
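The boolean dispatch idea from the list above can be sketched without ATen. The block below is a minimal illustration, not the definition in common.h: `SKETCH_DISPATCH_BOOL`, `scale_kernel`, and `run_scale` are hypothetical names introduced here, and the macro body is an assumption about how such a dispatch is typically written.

```cpp
// Hypothetical stand-in for the boolean dispatch pattern; not copied from common.h.
// The runtime value BOOL_V selects one of two branches, and each branch
// re-declares BOOL_NAME as a constexpr so it can be used as a template argument.
#define SKETCH_DISPATCH_BOOL(BOOL_V, BOOL_NAME, ...) \
  [&] {                                              \
    if (BOOL_V) {                                    \
      constexpr bool BOOL_NAME = true;               \
      return __VA_ARGS__();                          \
    } else {                                         \
      constexpr bool BOOL_NAME = false;              \
      return __VA_ARGS__();                          \
    }                                                \
  }()

// Hypothetical kernel: has_bias is a compile-time constant in each
// instantiation, so the compiler can drop the branch that is never taken.
template <bool has_bias>
float scale_kernel(float x, float bias) {
  if (has_bias) {
    return 2.0f * x + bias;
  }
  return 2.0f * x;
}

// The caller only has a runtime flag; the macro picks the matching specialization.
float run_scale(float x, float bias, bool use_bias) {
  return SKETCH_DISPATCH_BOOL(use_bias, has_bias, [&] {
    return scale_kernel<has_bias>(x, bias);
  });
}
```

Both specializations are compiled, so this pattern trades code size for branch-free inner loops; it tends to pay off when the flag is checked inside a hot loop.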
Usage
Include this header in any new CPU kernel implementation file. Use the dispatch macros to handle multiple data types, the validation macros to check input tensors, and the parallel routines for thread-level parallelism.
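To make the thread-level partitioning concrete, here is a small sketch of ceiling division and a balanced range split in the style commonly associated with the name balance211. It is an assumption for illustration: `div_up_sketch`, `Range`, and `partition_sketch` are hypothetical names, and the exact partitioning scheme used in common.h may differ.

```cpp
#include <cstdint>

// Ceiling integer division for x >= 0 and y > 0.
inline int64_t div_up_sketch(int64_t x, int64_t y) {
  return (x + y - 1) / y;
}

// Half-open range [begin, end) of work items assigned to one thread.
struct Range {
  int64_t begin;
  int64_t end;
};

// Split n work items across nth threads and return the range for thread ith.
// The first t1 threads receive n1 items each; the remaining threads receive n1 - 1.
inline Range partition_sketch(int64_t n, int64_t nth, int64_t ith) {
  if (nth <= 1 || n == 0) {
    return {0, n};
  }
  const int64_t n1 = div_up_sketch(n, nth);
  const int64_t n2 = n1 - 1;
  const int64_t t1 = n - n2 * nth;  // number of threads that get n1 items
  const int64_t len = ith < t1 ? n1 : n2;
  const int64_t begin = ith <= t1 ? ith * n1 : t1 * n1 + (ith - t1) * n2;
  return {begin, begin + len};
}
```

With 10 items and 4 threads this sketch yields chunks of sizes 3, 3, 2, 2, so no thread carries more than one extra item.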
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: sgl-kernel/csrc/cpu/common.h
- Lines: 1-365
Signature
// Boolean dispatch macros
#define AT_DISPATCH_BOOL(BOOL_V, BOOL_NAME, ...)
#define AT_DISPATCH_BOOL2(BOOL_V1, BOOL_NAME1, BOOL_V2, BOOL_NAME2, ...)
// Type dispatch macros
#define CPU_DISPATCH_PACKED_TYPES(TYPE, ...)
#define CPU_DISPATCH_REDUCED_FLOATING_TYPES_EXT(TYPE1, TYPE2, ...)
// Validation macros
#define CHECK_CPU(x)
#define CHECK_CONTIGUOUS(x)
#define CHECK_INPUT(x)
#define CHECK_DIM(d, x)
#define CHECK_EQ(a, b)
#define CHECK_GE(a, b)
// Validation helper (function template, not a macro)
template <bool is_only_lastdim_contiguous>
static inline void CHECK_INPUT_SHAPE_DTYPE(
    const at::Tensor& tensor,
    const at::IntArrayRef sizes,
    at::ScalarType st);
Import
#include "common.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ATen/ATen.h | Header | Yes | Core ATen tensor library |
| ATen/Parallel.h | Header | Yes | Parallel execution utilities (at::parallel_for) |
| ATen/record_function.h | Header | Yes | Performance profiling hooks |
| omp.h | Header | Conditional | OpenMP support, included when _OPENMP is defined |
Outputs
| Name | Type | Description |
|---|---|---|
| Dispatch macros | Preprocessor | Generate type-specialized code paths from runtime type info |
| Validation macros | Preprocessor | Provide tensor shape, dtype, device, and contiguity checks |
| Helper functions | Inline | Common utilities for indexing and parallelism |
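The indexing helpers listed above decompose a flat offset into per-dimension indices and then advance them element by element. The following is a simplified two-dimensional sketch of that idea, assuming row-major order; `index_init_2d`, `index_step_2d`, and `walk_sketch` are hypothetical names and do not reproduce the signatures of data_index_init or data_index_step.

```cpp
#include <cstdint>

// Decompose a flat offset into (row, col) for a rows x cols grid, row-major.
inline void index_init_2d(int64_t offset, int64_t& row, int64_t rows,
                          int64_t& col, int64_t cols) {
  col = offset % cols;
  row = (offset / cols) % rows;
}

// Advance (row, col) by one element, carrying from col into row.
inline void index_step_2d(int64_t& row, int64_t rows, int64_t& col, int64_t cols) {
  if (++col == cols) {
    col = 0;
    if (++row == rows) {
      row = 0;
    }
  }
}

// Walk the flat range [begin, end) and accumulate row * 10 + col,
// which makes the visited indices easy to check by hand.
inline int64_t walk_sketch(int64_t begin, int64_t end, int64_t rows, int64_t cols) {
  int64_t row = 0, col = 0, acc = 0;
  index_init_2d(begin, row, rows, col, cols);
  for (int64_t i = begin; i < end; ++i) {
    acc += row * 10 + col;
    index_step_2d(row, rows, col, cols);
  }
  return acc;
}
```

Initializing once and stepping afterwards avoids a division and modulo per element, which is the usual reason to pair the two helpers inside a parallel loop body.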
Usage Examples
Type Dispatch for BFloat16/Half/INT8/FP8
CPU_DISPATCH_PACKED_TYPES(input.scalar_type(), [&] {
  // packed_t is now one of: at::BFloat16, at::Half, int8_t, at::Float8_e4m3fn
  my_kernel<packed_t>(input.data_ptr<packed_t>(), output.data_ptr<packed_t>(), N);
});
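For readers who want to see how a type dispatch macro of this shape can work, here is an ATen-free sketch. It is an assumption for illustration only: `SketchDType`, `SKETCH_DISPATCH_TYPES`, and `element_size` are hypothetical, and the real CPU_DISPATCH_PACKED_TYPES operates on at::ScalarType and covers the four types named above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime type tag standing in for at::ScalarType.
enum class SketchDType { Int8, Float32 };

// Each case introduces a local alias packed_t and then invokes the callable,
// so the body passed by the caller is instantiated once per supported type.
#define SKETCH_DISPATCH_TYPES(TYPE, ...)               \
  [&] {                                                \
    switch (TYPE) {                                    \
      case SketchDType::Int8: {                        \
        using packed_t = int8_t;                       \
        return __VA_ARGS__();                          \
      }                                                \
      case SketchDType::Float32: {                     \
        using packed_t = float;                        \
        return __VA_ARGS__();                          \
      }                                                \
      default:                                         \
        throw std::runtime_error("unsupported dtype"); \
    }                                                  \
  }()

// Example use: the lambda body refers to packed_t, which the macro defines.
inline std::size_t element_size(SketchDType t) {
  return SKETCH_DISPATCH_TYPES(t, [&] { return sizeof(packed_t); });
}
```

Unsupported tags fail loudly at runtime in this sketch, which mirrors the general expectation that a dispatch macro rejects dtypes it does not list.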
Input Validation
CHECK_INPUT(tensor); // Checks CPU device + contiguous
CHECK_DIM(2, tensor); // Checks tensor is 2D
CHECK_EQ(tensor.size(1), hidden_size); // Checks dimension size