Implementation:InternLM Lmdeploy AttentionTemplate

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, Attention
Last Updated	2026-02-07 15:00 GMT

Overview

Template function that orchestrates the launch of prefill attention kernels, including shared memory configuration, occupancy-based split-K computation, and optional post-kernel reduction.

Description

invokeAttention<Kernel> is the host-side entry point for launching fused multi-head attention kernels during the prefill phase. It derives the required dynamic shared memory size from Kernel::SharedStorage, queries device occupancy to choose a split count, and constructs the CTA map and cache iterator factory before launching the kernel. When the workload is distributed across multiple splits or context-parallel ranks, it then invokes a split-K reduction pass to combine the partial results.
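
The occupancy-based split computation can be illustrated with a minimal host-only sketch. This is not the lmdeploy implementation: the helper name chooseSplitCount, its parameters, and the "fill one wave of CTAs" rule are assumptions chosen for illustration. In a real launcher the ctas_per_sm value would typically come from an occupancy query such as cudaOccupancyMaxActiveBlocksPerMultiprocessor.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of lmdeploy): choose a split count so that the
// launched grid roughly fills one wave of resident CTAs on the device.
//   sm_count        - number of SMs on the device
//   ctas_per_sm     - max resident CTAs per SM, as an occupancy query would report
//   base_cta_count  - CTAs in the launch before any splitting
//   max_split_count - upper bound on the number of splits
inline int chooseSplitCount(int sm_count, int ctas_per_sm, int base_cta_count, int max_split_count)
{
    assert(sm_count > 0 && ctas_per_sm > 0 && base_cta_count > 0 && max_split_count > 0);
    // CTAs that can be resident at the same time across the whole device.
    const long long capacity = 1LL * sm_count * ctas_per_sm;
    // Ceil-divide: smallest split count with base_cta_count * splits >= capacity.
    const long long wanted = (capacity + base_cta_count - 1) / base_cta_count;
    // Never go below one split or above the caller's limit.
    const long long clamped = std::min<long long>(std::max<long long>(wanted, 1), max_split_count);
    return static_cast<int>(clamped);
}
```

Under these assumptions, a launch that already oversubscribes the device keeps a split count of 1, so the reduction pass is only needed when the unsplit grid is too small to occupy every SM.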

Usage

Used by the TurboMind attention dispatch layer to launch prefill attention for a given architecture-specific kernel configuration. The Kernel template parameter is typically an AttentionUniversal specialization assembled from AttentionConfig.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/attention/attention_template.h
Lines: 1-103

Signature

template<class Kernel>
void invokeAttention(const typename Kernel::ParamType& params);
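
The signature implies a small contract on the Kernel type: it must expose a ParamType and, per the description, a SharedStorage whose size drives the dynamic shared memory request. The sketch below shows that contract with stand-in types. MockKernel, MockParams, LaunchPlan, and planLaunch are hypothetical names, not lmdeploy APIs, and the tile sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the real types live in the TurboMind sources.
struct MockParams {
    int batch_size;
    int num_heads;
};

struct MockKernel {
    using ParamType = MockParams;
    // Per-CTA scratch space; its size becomes the dynamic shared memory request.
    struct SharedStorage {
        float q_tile[64 * 128];
        float k_tile[64 * 128];
    };
};

// CUDA kernels may use up to 48 KB of shared memory per block without opting in.
constexpr std::size_t kDefaultSmemLimit = 48 * 1024;

struct LaunchPlan {
    std::size_t smem_bytes;
    // True when the kernel would need cudaFuncSetAttribute with
    // cudaFuncAttributeMaxDynamicSharedMemorySize before launch.
    bool needs_smem_opt_in;
};

// Derive launch properties purely from the Kernel type, mirroring how a
// launcher templated on Kernel can size shared memory at compile time.
template<class Kernel>
LaunchPlan planLaunch(const typename Kernel::ParamType& /*params*/)
{
    const std::size_t smem = sizeof(typename Kernel::SharedStorage);
    return {smem, smem > kDefaultSmemLimit};
}
```

With the stand-in above, planLaunch<MockKernel>(params) reports a request larger than 48 KB, which is the case where a CUDA launcher has to raise the dynamic shared memory limit explicitly.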

Import

#include "src/turbomind/kernels/attention/attention_template.h"

I/O Contract

Inputs

Name	Type	Required	Description
params	Kernel::ParamType (AttentionParams<T>)	Yes	Fully populated attention parameters struct

Outputs

Name	Type	Description
params.out	T*	Final attention output, written through the pointer held in the params struct
params.partial_O	float*	Partial outputs, written when the split count is greater than 1 and combined by the reduction pass
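
To make the role of the partial outputs concrete, the following host-side sketch merges per-split softmax partials for one output row using the standard log-sum-exp rescaling. It illustrates the general split-K attention reduction technique only. The PartialRow layout and the reduceSplits name are assumptions and do not describe how lmdeploy stores params.partial_O.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One split's contribution to a single (token, head) output row.
// All names here are illustrative, not lmdeploy identifiers.
struct PartialRow {
    float              max_logit;  // m_i: largest score seen by this split
    float              exp_sum;    // l_i: sum of exp(score - m_i) over this split
    std::vector<float> acc;        // O_i: sum of exp(score - m_i) * V, not yet normalized
};

// Merge the partial rows into the final, normalized attention output.
inline std::vector<float> reduceSplits(const std::vector<PartialRow>& parts)
{
    assert(!parts.empty());
    // Find the maximum logit across all splits.
    float global_max = parts[0].max_logit;
    for (const auto& p : parts) {
        global_max = std::max(global_max, p.max_logit);
    }
    std::vector<float> out(parts[0].acc.size(), 0.f);
    float total = 0.f;
    for (const auto& p : parts) {
        // Rescale each split from its local max to the shared global max.
        const float scale = std::exp(p.max_logit - global_max);
        total += scale * p.exp_sum;
        for (std::size_t d = 0; d < out.size(); ++d) {
            out[d] += scale * p.acc[d];
        }
    }
    // Apply the softmax normalization once, after all splits are merged.
    for (float& x : out) {
        x /= total;
    }
    return out;
}
```

Keeping the accumulators in float, as the partial_O type suggests, limits rounding error when several rescaled splits are summed.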

Usage Examples

// Launch prefill attention for SM80 with block cache.
// `params` is a fully populated AttentionParams<half> prepared by the caller.
using Config = AttentionConfig<arch::Sm80, half, 128, CacheType::kBlock>;
invokeAttention<Config::Kernel>(params);

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
