Implementation:InternLM Lmdeploy Batch

From Leeroopedia


Knowledge Sources
Domains: Inference Engine, Batch Processing
Last Updated: 2026-02-07 15:00 GMT

Overview

Defines the batch operation lifecycle enum and the BatchData struct used to manage per-batch state during inference execution in the TurboMind engine.

Description

The batch.h header establishes the core batch processing primitives for TurboMind's inference pipeline. It provides two key components:

BatchOp enum: Enumerates the eight operations in a batch's lifecycle, in order:

kAdd: submit a request from the session to the request cache
kSetup: copy host data to the device
kPrepare: transition from device state to stepping state
kForward: execute the model forward pass
kUnprep: reverse kPrepare
kFetch: copy device results back to the host
kUpdate: update the request cache from the batch
kDel: move the request from the cache back to the session

Together these form a state machine with the flow Se -> Rc -> (B -> D) -> St -> (D -> B) -> Rc -> Se. Judging from the operation descriptions, Se is the session, Rc the request cache, B the host-side batch, D the device state, and St the stepping state.
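The lifecycle can be made concrete with a small lookup. The sketch below repeats the enumerators in the order given in batch.h and adds a hypothetical helper, Placement(), that returns the placement tag (H, H2D, D, D2H) written in each enumerator's trailing comment. Placement() is not part of TurboMind; it only restates those comments as code.

```cpp
#include <cassert>
#include <string>

// Same enumerators, in the same order, as in batch.h.
enum class BatchOp { kAdd, kSetup, kPrepare, kForward, kUnprep, kFetch, kUpdate, kDel };

// Hypothetical helper (not in TurboMind): returns the placement tag that
// the header's comment attaches to each operation.
inline std::string Placement(BatchOp op)
{
    switch (op) {
        case BatchOp::kAdd:     return "H";    // Se -> Rc
        case BatchOp::kSetup:   return "H2D";  // Rc -> (B -> D)
        case BatchOp::kPrepare: return "D";    // (D -> St)
        case BatchOp::kForward: return "D";    // St -> St
        case BatchOp::kUnprep:  return "D";    // (St -> D)
        case BatchOp::kFetch:   return "D2H";  // (D -> B)
        case BatchOp::kUpdate:  return "H";    // B -> Rc
        case BatchOp::kDel:     return "H";    // Rc -> Se
    }
    assert(false && "unknown BatchOp");
    return "?";
}
```

Only kSetup and kFetch cross the host/device boundary; the other six steps stay on one side.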

BatchData struct: Holds the mutable state for an active batch: the batch sizes (bs0, bsz), a permutation buffer (perm), a vector of shared pointers to RequestCache (rc), per-rank and global token counts, and three synchronization events (ready, done, next). Copy and move operations are deleted, so a BatchData object stays at a fixed address for its whole lifetime. The struct keeps a pointer to itself (self) that supports buffer creation through buf(), and it provides a Notify() method that records events to the stream and signals completion through a promise.
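A minimal sketch of the same shape, using only the standard library, may help show how the deleted copy/move operations, the self pointer, and the promise fit together. MiniBatch and FakeEvent are hypothetical stand-ins, not TurboMind types. Two assumptions are made here: that the constructor sets self to this, and that Notify() records the done event before fulfilling the promise. The page does not state which events the real Notify() records.

```cpp
#include <cassert>
#include <future>
#include <type_traits>

// Hypothetical stand-in for TurboMind's Event; it only remembers
// whether Record() was called.
struct FakeEvent {
    bool recorded = false;
    void Record() { recorded = true; }
};

// Hypothetical, simplified stand-in for BatchData.
struct MiniBatch {
    // Assumption: the self pointer is initialized to `this`.
    explicit MiniBatch(int phase_index): self(this), phase(phase_index) {}

    // Deleting copy and move keeps the address stable, so `self` stays valid.
    MiniBatch(const MiniBatch&)            = delete;
    MiniBatch(MiniBatch&&)                 = delete;
    MiniBatch& operator=(const MiniBatch&) = delete;
    MiniBatch& operator=(MiniBatch&&)      = delete;

    MiniBatch*              self;
    const int               phase;
    FakeEvent               done;
    std::promise<FakeEvent> promise;

    // Assumption: record `done`, then hand it to whoever holds the future.
    void Notify()
    {
        assert(self == this);
        done.Record();
        promise.set_value(done);
    }
};

static_assert(!std::is_copy_constructible<MiniBatch>::value, "must not be copyable");
static_assert(!std::is_move_constructible<MiniBatch>::value, "must not be movable");
```

A consumer would call promise.get_future() before the batch is dispatched and block on the future until Notify() runs. Because a std::promise can be fulfilled only once, this sketch allows a single Notify() call per object.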

Usage

Used internally by the TurboMind engine to orchestrate batch-level operations. Engine components (InputProcessor, OutputProcessor, Generation modules) receive a BatchOp to determine which phase of processing to execute, and access BatchData through a TensorMap environment to read and modify batch state.
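The dispatch pattern described above can be sketched as follows. ToyInputProcessor, BatchState, and Env are hypothetical stand-ins: Env replaces TensorMap with a plain std::map, and BatchState carries only bsz plus an illustrative log. Which operations each real component handles is not stated on this page, so the cases chosen below are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class BatchOp { kAdd, kSetup, kPrepare, kForward, kUnprep, kFetch, kUpdate, kDel };

// Hypothetical stand-in for the parts of BatchData used here.
struct BatchState {
    int                      bsz = 0;
    std::vector<std::string> log;  // illustrative only, not a BatchData field
};

// Hypothetical stand-in for TensorMap: a name-to-pointer lookup.
using Env = std::map<std::string, BatchState*>;

// Hypothetical component: reacts to the phases it cares about, ignores the rest.
struct ToyInputProcessor {
    void Run(BatchOp op, Env& env)
    {
        assert(env.count("batch") == 1);
        BatchState& b = *env.at("batch");
        switch (op) {
            case BatchOp::kSetup:
                b.log.push_back("setup: " + std::to_string(b.bsz) + " requests");
                break;
            case BatchOp::kPrepare:
                b.log.push_back("prepare");
                break;
            default:
                break;  // other phases belong to other components
        }
    }
};
```

Passing the phase as an argument lets the engine drive every component through the same call, with each component acting only on the operations relevant to it.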

Code Reference

Source Location

Signature

enum class BatchOp
{
    kAdd,      //  Se ->  Rc         H
    kSetup,    //  Rc -> (B  -> D)   H2D
    kPrepare,  // (D  ->  St)        D
    kForward,  //  St ->  St         D
    kUnprep,   // (St ->  D)         D
    kFetch,    // (D  ->  B)         D2H
    kUpdate,   //  B  ->  Rc         H
    kDel,      //  Rc ->  Se         H
};

struct BatchData {
    explicit BatchData(int phase);

    BatchData(const BatchData&)     = delete;
    BatchData(BatchData&&) noexcept = delete;
    BatchData& operator=(const BatchData&) = delete;
    BatchData& operator=(BatchData&&) noexcept = delete;

    BatchData* self;
    const int phase;
    int bs0 = 0;
    int bsz = 0;
    Buffer_<int> perm;
    std::vector<std::shared_ptr<RequestCache>> rc;
    std::vector<int> local_token_num;
    int global_token_num = 0;
    Event ready;
    Event done;
    Event next;
    std::promise<Event> promise;

    Buffer buf();
    void Notify();
};

Import

#include "src/turbomind/engine/batch.h"

I/O Contract

Inputs

Name | Type | Required | Description
phase | int | Yes | The phase index this batch belongs to (set at construction)

Outputs

Name | Type | Description
bs0 | int | Original batch size before permutation
bsz | int | Current active batch size
perm | Buffer_<int> | Permutation indices mapping batch positions
rc | std::vector<std::shared_ptr<RequestCache>> | Per-request cached state for the batch
local_token_num | std::vector<int> | Token counts per local rank
global_token_num | int | Total token count across all ranks
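One plausible reading of the two token fields is that global_token_num equals the sum of the entries in local_token_num. The page does not confirm this relationship, so the sketch below treats it as an assumption; SumTokens() is a hypothetical helper, not a TurboMind function.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: total of the per-rank token counts.
// Assumption: this is how global_token_num relates to local_token_num.
inline int SumTokens(const std::vector<int>& local_token_num)
{
    for (int n : local_token_num) {
        assert(n >= 0);  // a token count cannot be negative
    }
    return std::accumulate(local_token_num.begin(), local_token_num.end(), 0);
}
```

For example, per-rank counts of 3, 5, and 2 would give a global count of 10 under this assumption.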

Usage Examples

// Creating a BatchData for phase 0
BatchData batch(0);

// Accessing the batch through a TensorMap environment (common pattern in processors).
// `env` is assumed to be the TensorMap passed to the component.
auto& b = *env.at("batch").data<BatchData*>()[0];
int batch_size = b.bsz;
for (const auto& r : b.rc) {
    // Process each request in the batch
}

// Notify completion of a batch operation
batch.Notify();
