Implementation:InternLM Lmdeploy Batch
| Knowledge Sources | |
|---|---|
| Domains | Inference Engine, Batch Processing |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Defines the batch operation lifecycle enum and the BatchData struct used to manage per-batch state during inference execution in the TurboMind engine.
Description
The batch.h header establishes the core batch processing primitives for TurboMind's inference pipeline. It provides two key components:
- BatchOp enum: Enumerates the eight operations in a batch's lifecycle: kAdd (submit a request from the session to the request cache), kSetup (copy host data to the device), kPrepare (transition from device state to stepping state), kForward (execute the model forward pass), kUnprep (reverse kPrepare), kFetch (copy device results back to the host), kUpdate (update the request cache from the batch), and kDel (return the request from the cache to the session). Together these form a state machine with the flow Se -> Rc -> (B -> D) -> St -> (D -> B) -> Rc -> Se, where Se is the session, Rc the request cache, B the host-side batch, D the device state, and St the stepping state.
- BatchData struct: Holds the mutable state of an active batch: the batch sizes (bs0, bsz), a permutation buffer (perm), a vector of shared pointers to RequestCache (rc), token counts (local_token_num, global_token_num), synchronization events (ready, done, next), and a std::promise<Event>. The struct stores a pointer to itself (self), which buf() uses to create a Buffer, and it is neither copyable nor movable. Notify() records events to the stream and signals completion via the promise.
Usage
Used internally by the TurboMind engine to orchestrate batch-level operations. Engine components (InputProcessor, OutputProcessor, Generation modules) receive a BatchOp to determine which phase of processing to execute, and access BatchData through a TensorMap environment to read and modify batch state.
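A component written in this style can switch on the op and ignore the phases it has no work for. The sketch below illustrates the idea with stand-in types: `MiniBatch` and `CountingProcessor` are hypothetical, the enum is mirrored locally, and the batch is passed by reference instead of being looked up in a TensorMap, so none of this reflects the real processor interfaces.

```cpp
// Local mirror of BatchOp so this sketch compiles without batch.h.
enum class BatchOp { kAdd, kSetup, kPrepare, kForward, kUnprep, kFetch, kUpdate, kDel };

// Hypothetical stand-in for the one BatchData field this sketch reads.
struct MiniBatch {
    int bsz = 0;  // current active batch size
};

// Hypothetical component: reacts to three ops and ignores the rest.
struct CountingProcessor {
    int staged  = 0;  // requests staged during kSetup
    int steps   = 0;  // number of kForward calls seen
    int fetched = 0;  // requests whose results were fetched

    void Run(BatchOp op, const MiniBatch& b)
    {
        switch (op) {
            case BatchOp::kSetup:
                staged = b.bsz;  // pretend to copy inputs for bsz requests
                break;
            case BatchOp::kForward:
                ++steps;  // pretend to run one forward pass
                break;
            case BatchOp::kFetch:
                fetched = staged;  // pretend to copy results back
                break;
            default:
                break;  // no work for this component in the other phases
        }
    }
};
```

Calling `Run` with kSetup, then kForward twice, then kFetch on a `MiniBatch` with `bsz = 3` leaves the processor with `staged == 3`, `steps == 2`, and `fetched == 3`, while ops such as kAdd leave it unchanged.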
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/engine/batch.h
- Lines: 1-83
Signature
enum class BatchOp
{
    kAdd,      // Se -> Rc          H
    kSetup,    // Rc -> (B -> D)    H2D
    kPrepare,  // (D -> St)         D
    kForward,  // St -> St          D
    kUnprep,   // (St -> D)         D
    kFetch,    // (D -> B)          D2H
    kUpdate,   // B -> Rc           H
    kDel,      // Rc -> Se          H
};

struct BatchData {
    explicit BatchData(int phase);

    BatchData(const BatchData&)     = delete;
    BatchData(BatchData&&) noexcept = delete;
    BatchData& operator=(const BatchData&) = delete;
    BatchData& operator=(BatchData&&) noexcept = delete;

    BatchData* self;
    const int  phase;

    int bs0 = 0;
    int bsz = 0;

    Buffer_<int> perm;

    std::vector<std::shared_ptr<RequestCache>> rc;

    std::vector<int> local_token_num;
    int              global_token_num = 0;

    Event ready;
    Event done;
    Event next;

    std::promise<Event> promise;

    Buffer buf();
    void   Notify();
};
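The deleted copy and move operations matter for the self-pointer: if the object could be relocated, `self` would point at the old address. The following is a minimal sketch of that idea using a hypothetical `PinnedBatch` type. The `slot()` accessor is an assumption standing in for `buf()`, whose real return type (Buffer) is not reproduced here; it only shows how a stored `this` pointer can be handed out as a one-element array of pointers.

```cpp
#include <type_traits>

// Hypothetical stand-in for BatchData, reduced to the self-pointer mechanics.
struct PinnedBatch {
    explicit PinnedBatch(int phase): self{this}, phase{phase} {}

    // Copying or moving would leave `self` pointing at another object.
    PinnedBatch(const PinnedBatch&) = delete;
    PinnedBatch(PinnedBatch&&)      = delete;
    PinnedBatch& operator=(const PinnedBatch&) = delete;
    PinnedBatch& operator=(PinnedBatch&&) = delete;

    PinnedBatch* self;   // always equal to `this`
    const int    phase;  // fixed at construction
    int          bsz = 0;

    // Assumed analogue of buf(): exposes `self` as a one-element array of pointers.
    PinnedBatch** slot()
    {
        return &self;
    }
};

static_assert(!std::is_copy_constructible<PinnedBatch>::value, "must not be copyable");
static_assert(!std::is_move_constructible<PinnedBatch>::value, "must not be movable");
```

A reader of the slot can recover the object with `*batch.slot()[0]`, which has the same shape as the `data<BatchData*>()[0]` access shown in the Usage Examples section.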
Import
#include "src/turbomind/engine/batch.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| phase | int | Yes | The phase index this batch belongs to (set at construction) |
Outputs
| Name | Type | Description |
|---|---|---|
| bs0 | int | Original batch size before permutation |
| bsz | int | Current active batch size |
| perm | Buffer_<int> | Permutation indices mapping batch positions |
| rc | std::vector<std::shared_ptr<RequestCache>> | Per-request cached state for the batch |
| local_token_num | std::vector<int> | Token counts per local rank |
| global_token_num | int | Total token count across all ranks |
Usage Examples
// Creating a BatchData for phase 0
BatchData batch(0);
// Accessing the batch through a TensorMap environment (common pattern in processors).
// `env` is the TensorMap handed to the component; its "batch" entry holds a BatchData*.
auto& b = *env.at("batch").data<BatchData*>()[0];
int batch_size = b.bsz;
for (const auto& r : b.rc) {
    // Process each request in the batch
}
// Notify completion of a batch operation
batch.Notify();
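The promise-based signalling behind Notify() can be sketched with the standard library alone. In the sketch below, `FakeEvent`, `NotifyingBatch`, and `WaitUntilDone` are hypothetical stand-ins: setting a flag replaces recording a real event on a stream, and the actual body of BatchData::Notify() is not shown in this document, so this only illustrates the general pattern of pairing a std::promise with a std::future.

```cpp
#include <future>
#include <utility>

// Hypothetical stand-in for Event.
struct FakeEvent {
    bool recorded = false;
};

// Hypothetical stand-in for the Notify()-related part of BatchData.
struct NotifyingBatch {
    FakeEvent               done;
    std::promise<FakeEvent> promise;

    void Notify()
    {
        done.recorded = true;     // stands in for recording the event on the stream
        promise.set_value(done);  // fulfils the future held by the waiting side
    }
};

// Hypothetical consumer: blocks until Notify() has run, then inspects the event.
inline bool WaitUntilDone(std::future<FakeEvent> fut)
{
    return fut.get().recorded;
}
```

A caller would obtain the future with `batch.promise.get_future()` before the work starts and pass it to the waiting side. Because a std::promise can be fulfilled only once, each object in this sketch supports a single Notify() call; a second call would throw std::future_error.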