Implementation:Sgl project Sglang CPU Common Header

Knowledge Sources	Sgl_project_Sglang
Domains	Kernel Infrastructure, CPU Compute
Last Updated	2026-02-10 00:00 GMT

Overview

Central header file providing shared macros, type dispatch utilities, validation helpers, and common functions used across all CPU kernel implementations in the sgl-kernel package.

Description

common.h is the foundation header for the entire CPU kernel subsystem. Every CPU kernel file (decode, extend, flash_attn, gemm, etc.) includes this header for shared definitions.

The file defines several key macro categories:

Boolean Dispatch Macros: AT_DISPATCH_BOOL and AT_DISPATCH_BOOL2 turn runtime boolean values into compile-time template arguments. Each boolean gets a true path and a false path in which the named flag is a constexpr, so the compiler can specialize on it and optimize away the unused branches.
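The boolean dispatch idea can be sketched in isolation as follows. This is a minimal illustration, not the definition from common.h: the macro name SKETCH_DISPATCH_BOOL, the kernel sum_with_optional_bias, and the wrapper run are all hypothetical, and the real macros may differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a boolean dispatch macro. It branches once on the
// runtime value and exposes a constexpr flag with the requested name.
#define SKETCH_DISPATCH_BOOL(BOOL_V, BOOL_NAME, ...) \
  [&] {                                              \
    if (BOOL_V) {                                    \
      constexpr bool BOOL_NAME = true;               \
      return __VA_ARGS__();                          \
    } else {                                         \
      constexpr bool BOOL_NAME = false;              \
      return __VA_ARGS__();                          \
    }                                                \
  }()

// Hypothetical kernel: the bias addition is compiled in or out per instantiation.
template <bool kHasBias>
float sum_with_optional_bias(const float* x, std::size_t n, float bias) {
  float acc = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    if constexpr (kHasBias) {
      acc += x[i] + bias;
    } else {
      acc += x[i];
    }
  }
  return acc;
}

// The runtime flag is converted to a template argument exactly once, outside the loop.
float run(const float* x, std::size_t n, bool has_bias, float bias) {
  assert(x != nullptr || n == 0);
  return SKETCH_DISPATCH_BOOL(has_bias, kHasBias, [&] {
    return sum_with_optional_bias<kHasBias>(x, n, bias);
  });
}
```

Both instantiations of the kernel are compiled, and the branch on has_bias is taken a single time per call rather than once per element.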

Type Dispatch Macros: CPU_DISPATCH_PACKED_TYPES dispatches across BFloat16, Half, int8_t, and Float8_e4m3fn types using a packed_t typedef. CPU_DISPATCH_REDUCED_FLOATING_TYPES_EXT handles mixed-dtype dispatch where the primary type (scalar_t for input/output/weight) and secondary type (param_t for bias) can differ.
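The mechanism behind a packed_t style dispatch can be sketched without ATen. Everything below is an assumption made for illustration: the DType enum, the BF16, FP16, and FP8 storage structs, and the SKETCH_ macros are hypothetical stand-ins, whereas the real macro switches on at::ScalarType and uses ATen's own reduced-precision types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical dtype tags and storage types standing in for ATen's.
enum class DType { BFloat16, Half, Int8, Float8_e4m3fn, Float32 };
struct BF16 { std::uint16_t bits; };
struct FP16 { std::uint16_t bits; };
struct FP8 { std::uint8_t bits; };

// One switch case: bind packed_t to the concrete type, then run the body.
#define SKETCH_CASE(ENUM, CTYPE, ...) \
  case ENUM: {                        \
    using packed_t = CTYPE;           \
    return __VA_ARGS__();             \
  }

// Hypothetical dispatcher: selects packed_t from a runtime dtype tag.
#define SKETCH_DISPATCH_PACKED_TYPES(TYPE, ...)                  \
  [&] {                                                          \
    switch (TYPE) {                                              \
      SKETCH_CASE(DType::BFloat16, BF16, __VA_ARGS__)            \
      SKETCH_CASE(DType::Half, FP16, __VA_ARGS__)                \
      SKETCH_CASE(DType::Int8, std::int8_t, __VA_ARGS__)         \
      SKETCH_CASE(DType::Float8_e4m3fn, FP8, __VA_ARGS__)        \
      default:                                                   \
        throw std::invalid_argument("unsupported packed dtype"); \
    }                                                            \
  }()

// The lambda body sees packed_t as an ordinary type alias.
std::size_t packed_element_size(DType dtype) {
  return SKETCH_DISPATCH_PACKED_TYPES(dtype, [&] { return sizeof(packed_t); });
}
```

An unsupported tag falls through to the default case and raises an error, which mirrors the general pattern of rejecting dtypes the kernel was not compiled for.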

Validation Macros: CHECK_CPU, CHECK_CONTIGUOUS, CHECK_INPUT, CHECK_DIM, CHECK_EQ, CHECK_GE, and CHECK_INPUT_SHAPE_DTYPE provide tensor validation with informative error messages.

Parallel Routines: The header documents the available parallel routines and provides its own parallel_for (with balance211 partitioning) alongside at::parallel_for for distributing kernel workloads across threads.
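A balance211-style split gives every worker either the ceiling or the floor of n / nth items, with the larger chunks going to the lowest-numbered workers. The following is a minimal sketch of that partitioning under stated assumptions: the names balance211_sketch and div_up_sketch are hypothetical, and the signature in common.h may differ.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division for non-negative x and positive y.
inline std::int64_t div_up_sketch(std::int64_t x, std::int64_t y) {
  return (x + y - 1) / y;
}

// Splits n items among nth workers; worker ith receives the range [begin, end).
inline void balance211_sketch(std::int64_t n, int nth, int ith,
                              std::int64_t& begin, std::int64_t& end) {
  if (nth <= 1 || n == 0) {
    begin = 0;
    end = n;
    return;
  }
  assert(ith >= 0 && ith < nth);
  const std::int64_t n_hi = div_up_sketch(n, nth);  // size of the larger chunks
  const std::int64_t n_lo = n_hi - 1;               // size of the smaller chunks
  const std::int64_t num_hi = n - n_lo * nth;       // workers that get a larger chunk
  const std::int64_t len = ith < num_hi ? n_hi : n_lo;
  begin = ith <= num_hi ? ith * n_hi
                        : num_hi * n_hi + (ith - num_hi) * n_lo;
  end = begin + len;
}
```

With 10 items and 4 workers, this sketch assigns the ranges [0, 3), [3, 6), [6, 8), and [8, 10), so no worker holds more than one item above any other.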

Helper Functions: data_index_init and data_index_step decompose a flat offset into multi-dimensional indices and advance those indices one element at a time; div_up performs ceiling integer division.
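The index helpers are typically used when a worker starts in the middle of a flattened iteration space. The sketch below shows one way such helpers can be written; it is placed in a hypothetical sketch namespace and assumes the dimensions are passed as (index, extent) pairs from outermost to innermost, which may not match the exact signatures in common.h.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

namespace sketch {

// Ceiling division for non-negative x and positive y.
template <typename T>
inline T div_up(T x, T y) {
  return (x + y - 1) / y;
}

// Base case: no dimensions left, so the offset is returned unchanged.
template <typename T>
inline T data_index_init(T offset) {
  return offset;
}

// Decomposes a flat offset into per-dimension indices, innermost pair first.
template <typename T, typename... Args>
inline T data_index_init(T offset, T& x, const T& X, Args&&... args) {
  offset = data_index_init(offset, std::forward<Args>(args)...);
  x = offset % X;
  return offset / X;
}

// Base case: reports a carry so the innermost index is always advanced.
inline bool data_index_step() {
  return true;
}

// Advances the indices by one element; returns true when this dimension wraps.
template <typename T, typename... Args>
inline bool data_index_step(T& x, const T& X, Args&&... args) {
  if (data_index_step(std::forward<Args>(args)...)) {
    x = (x + 1 == X) ? 0 : x + 1;
    return x == 0;
  }
  return false;
}

}  // namespace sketch
```

For a 3 x 4 iteration space, initializing from flat offset 7 yields indices (1, 3), and a single step then moves to (2, 0).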

Usage

Include this header in any new CPU kernel implementation file. Use the dispatch macros to handle multiple data types, the validation macros to check input tensors, and the parallel routines for thread-level parallelism.

Code Reference

Source Location

Repository: Sgl_project_Sglang
File: sgl-kernel/csrc/cpu/common.h
Lines: 1-365

Signature

// Boolean dispatch macros
#define AT_DISPATCH_BOOL(BOOL_V, BOOL_NAME, ...)
#define AT_DISPATCH_BOOL2(BOOL_V1, BOOL_NAME1, BOOL_V2, BOOL_NAME2, ...)

// Type dispatch macros
#define CPU_DISPATCH_PACKED_TYPES(TYPE, ...)
#define CPU_DISPATCH_REDUCED_FLOATING_TYPES_EXT(TYPE1, TYPE2, ...)

// Validation macros
#define CHECK_CPU(x)
#define CHECK_CONTIGUOUS(x)
#define CHECK_INPUT(x)
#define CHECK_DIM(d, x)
#define CHECK_EQ(a, b)
#define CHECK_GE(a, b)

template <bool is_only_lastdim_contiguous>
static inline void CHECK_INPUT_SHAPE_DTYPE(
    const at::Tensor& tensor,
    const at::IntArrayRef sizes,
    at::ScalarType st);

Import

#include "common.h"

I/O Contract

Inputs

Name	Type	Required	Description
ATen/ATen.h	Header	Yes	Core ATen tensor library
ATen/Parallel.h	Header	Yes	Parallel execution utilities (at::parallel_for)
ATen/record_function.h	Header	Yes	Performance profiling hooks
omp.h	Header	Conditional	OpenMP support, included when _OPENMP is defined

Outputs

Name	Type	Description
Dispatch macros	Preprocessor	Generate type-specialized code paths from runtime type info
Validation macros	Preprocessor	Provide tensor shape, dtype, device, and contiguity checks
Helper functions	Inline	Common utilities for indexing and parallelism

Usage Examples

Type Dispatch for BFloat16/Half/INT8/FP8

CPU_DISPATCH_PACKED_TYPES(input.scalar_type(), [&] {
  // packed_t is now one of: at::BFloat16, at::Half, int8_t, at::Float8_e4m3fn
  my_kernel<packed_t>(input.data_ptr<packed_t>(), output.data_ptr<packed_t>(), N);
});

Input Validation

CHECK_INPUT(tensor);        // Checks CPU device + contiguous
CHECK_DIM(2, tensor);       // Checks tensor is 2D
CHECK_EQ(tensor.size(1), hidden_size);  // Checks dimension size

Related Pages

Environment:Sgl_project_Sglang_CPU_Runtime
