Implementation:InternLM Lmdeploy Batch

Knowledge Sources	InternLM_Lmdeploy
Domains	Inference Engine, Batch Processing
Last Updated	2026-02-07 15:00 GMT

Overview

Defines the batch operation lifecycle enum and the BatchData struct used to manage per-batch state during inference execution in the TurboMind engine.

Description

The batch.h header establishes the core batch processing primitives for TurboMind's inference pipeline. It provides two key components:

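BatchOp enum: Enumerates the eight operations in a batch's lifecycle, in order: kAdd (submit a request from the session to the request cache), kSetup (copy host data to the device), kPrepare (transition from device state to stepping state), kForward (execute the model forward pass), kUnprep (the reverse of kPrepare, from stepping state back to device state), kFetch (copy device results back to the host), kUpdate (update the request cache from the batch), and kDel (release the request from the cache back to the session). The header comments abbreviate the states involved; read against the descriptions above, Se appears to denote the session, Rc the request cache, B the host-side batch, D the device-side state, and St the stepping state. The overall flow is Se -> Rc -> (B -> D) -> St -> (D -> B) -> Rc -> Se.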

BatchData struct: Holds the mutable state for an active batch: the batch sizes (bs0, bsz), a permutation buffer (perm), a vector of shared pointers to RequestCache (rc), local and global token counts (local_token_num, global_token_num), and synchronization events (ready, done, next). Copy and move operations are deleted, so the object's address stays fixed; the struct stores a pointer to itself (self), and buf() returns a Buffer built from it. Notify() records events on the stream and signals completion through the std::promise<Event> member.
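
As a rough illustration of the lifecycle, the sketch below walks the eight operations in declaration order and prints where each one runs (H, H2D, D, D2H), using the placement column from the header comments. It assumes that declaration order matches execution order; the enum is copied locally so the sketch compiles on its own, and kLifecycle, Placement and LifecyclePlacements are hypothetical helpers that do not exist in batch.h.

```cpp
#include <array>
#include <string>

// Local copy of the enum from batch.h so the sketch compiles on its own.
enum class BatchOp { kAdd, kSetup, kPrepare, kForward, kUnprep, kFetch, kUpdate, kDel };

// All operations in declaration order (assumed here to match lifecycle order).
constexpr std::array<BatchOp, 8> kLifecycle = {
    BatchOp::kAdd,    BatchOp::kSetup, BatchOp::kPrepare, BatchOp::kForward,
    BatchOp::kUnprep, BatchOp::kFetch, BatchOp::kUpdate,  BatchOp::kDel};

// Hypothetical helper: where each operation runs, taken from the
// trailing column of the comments in the header.
inline const char* Placement(BatchOp op)
{
    switch (op) {
        case BatchOp::kAdd:     return "H";    // host only
        case BatchOp::kSetup:   return "H2D";  // host-to-device copy
        case BatchOp::kPrepare: return "D";    // device only
        case BatchOp::kForward: return "D";
        case BatchOp::kUnprep:  return "D";
        case BatchOp::kFetch:   return "D2H";  // device-to-host copy
        case BatchOp::kUpdate:  return "H";
        case BatchOp::kDel:     return "H";
    }
    return "?";
}

// Hypothetical helper: space-separated placements for the whole lifecycle.
inline std::string LifecyclePlacements()
{
    std::string out;
    for (BatchOp op : kLifecycle) {
        if (!out.empty()) {
            out += ' ';
        }
        out += Placement(op);
    }
    return out;
}
```

Read left to right, the result shows the symmetric shape of the lifecycle: host-side bookkeeping, a copy to the device, device-only work, a copy back, and host-side bookkeeping again.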

Usage

Used internally by the TurboMind engine to orchestrate batch-level operations. Engine components (InputProcessor, OutputProcessor, Generation modules) receive a BatchOp to determine which phase of processing to execute, and access BatchData through a TensorMap environment to read and modify batch state.
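
The dispatch pattern described above can be sketched as a switch on BatchOp inside a component. This is a minimal illustration under stated assumptions, not the engine's actual interface: MiniBatch stands in for BatchData, ToyProcessor and its Run method are hypothetical, and the batch is passed directly rather than looked up in a TensorMap.

```cpp
#include <string>
#include <vector>

// Local copy of the enum from batch.h so the sketch compiles on its own.
enum class BatchOp { kAdd, kSetup, kPrepare, kForward, kUnprep, kFetch, kUpdate, kDel };

// Stand-in for BatchData; only the field this sketch reads.
struct MiniBatch {
    int bsz = 0;
};

// Hypothetical component that only takes part in the host/device copies.
struct ToyProcessor {
    std::vector<std::string> log;

    void Run(BatchOp op, MiniBatch& b)
    {
        switch (op) {
            case BatchOp::kSetup:
                log.push_back("setup bsz=" + std::to_string(b.bsz));
                break;
            case BatchOp::kFetch:
                log.push_back("fetch bsz=" + std::to_string(b.bsz));
                break;
            default:
                break;  // phases this component does not handle
        }
    }
};
```

Each component reacts only to the phases it cares about and ignores the rest, so the caller can drive every component with the same sequence of operations.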

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/engine/batch.h
Lines: 1-83

Signature

enum class BatchOp
{
    kAdd,      //  Se ->  Rc         H
    kSetup,    //  Rc -> (B  -> D)   H2D
    kPrepare,  // (D  ->  St)        D
    kForward,  //  St ->  St         D
    kUnprep,   // (St ->  D)         D
    kFetch,    // (D  ->  B)         D2H
    kUpdate,   //  B  ->  Rc         H
    kDel,      //  Rc ->  Se         H
};

struct BatchData {
    explicit BatchData(int phase);

    BatchData(const BatchData&)     = delete;
    BatchData(BatchData&&) noexcept = delete;
    BatchData& operator=(const BatchData&) = delete;
    BatchData& operator=(BatchData&&) noexcept = delete;

    BatchData* self;
    const int phase;
    int bs0 = 0;
    int bsz = 0;
    Buffer_<int> perm;
    std::vector<std::shared_ptr<RequestCache>> rc;
    std::vector<int> local_token_num;
    int global_token_num = 0;
    Event ready;
    Event done;
    Event next;
    std::promise<Event> promise;

    Buffer buf();
    void Notify();
};

Import

#include "src/turbomind/engine/batch.h"

I/O Contract

Inputs

Name	Type	Required	Description
phase	int	Yes	The phase index this batch belongs to (set at construction)

Outputs

Name	Type	Description
bs0	int	Original batch size before permutation
bsz	int	Current active batch size
perm	Buffer_<int>	Permutation indices mapping batch positions
rc	std::vector<std::shared_ptr<RequestCache>>	Per-request cached state for the batch
local_token_num	std::vector<int>	Token counts per local rank
global_token_num	int	Total token count across all ranks

Usage Examples

// Creating a BatchData for phase 0
BatchData batch(0);

// Accessing the batch through a TensorMap environment (common pattern in processors);
// `env` is assumed to be supplied by the calling engine component
auto& b = *env.at("batch").data<BatchData*>()[0];
int batch_size = b.bsz;
for (const auto& r : b.rc) {
    // Process each request in the batch
}

// Notify completion of a batch operation
batch.Notify();
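
The Notify() call above pairs with a future on the consumer side. The sketch below shows that promise/future handoff in isolation, using standard C++ only. MiniEvent and MiniBatch are stand-ins for Event and BatchData, WaitRecorded is a hypothetical helper, and the choice of which event is recorded and passed through the promise is an assumption, since the header excerpt does not show the body of Notify().

```cpp
#include <future>

// Stand-in for the engine's Event type; the real one is recorded on a stream.
struct MiniEvent {
    bool recorded = false;
    void Record() { recorded = true; }
};

// Stand-in for BatchData with only the members involved in notification.
struct MiniBatch {
    MiniEvent               done;
    std::promise<MiniEvent> promise;

    // Record the event, then hand a copy to whoever holds the future.
    // Assumption: the real Notify() may record a different event.
    void Notify()
    {
        done.Record();
        promise.set_value(done);
    }
};

// Hypothetical consumer: blocks until the batch has been notified,
// then reports whether the delivered event was recorded.
inline bool WaitRecorded(std::future<MiniEvent>& fut)
{
    return fut.get().recorded;
}
```

A consumer would obtain the future with promise.get_future() before the batch is processed and call get() on it afterwards. Because a std::promise can be fulfilled only once, each batch object can signal completion a single time in this scheme.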

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
