
Implementation: ggml-org/llama.cpp Common Batch Add

From Leeroopedia
Implementation Name: Common Batch Add
Doc Type: API Doc
Domain: Batch Construction, Token Management
Description: common_batch_add(batch, id, pos, seq_ids, logits) and common_batch_clear(batch) functions for constructing token batches
Related Workflow: Embedding_Extraction

Overview

Description

The Common Batch Add implementation provides two utility functions for managing llama_batch structures: common_batch_add() appends a single token with its metadata to the batch, and common_batch_clear() resets the batch for reuse. These functions abstract the low-level array manipulation required to populate the batch structure, ensuring correct field assignment for token ID, position, sequence IDs, and the logits flag.

Usage

#include "common.h"

struct llama_batch batch = llama_batch_init(512, 0, 1);

// Add tokens for sequence 0
common_batch_add(batch, token_id, /*pos=*/0, {0}, /*logits=*/true);
common_batch_add(batch, token_id2, /*pos=*/1, {0}, /*logits=*/true);

// Process batch...
llama_decode(ctx, batch);

// Clear and reuse
common_batch_clear(batch);

// Free when done
llama_batch_free(batch);

Code Reference

Source Location (header): common/common.h:775-780
Source Location (implementation): common/common.cpp:1438-1455
Signature (batch_add): void common_batch_add(struct llama_batch & batch, llama_token id, llama_pos pos, const std::vector<llama_seq_id> & seq_ids, bool logits)
Signature (batch_clear): void common_batch_clear(struct llama_batch & batch)
Import: #include "common.h"

common_batch_clear implementation:

void common_batch_clear(struct llama_batch & batch) {
    batch.n_tokens = 0;
}

common_batch_add implementation:

void common_batch_add(
                 struct llama_batch & batch,
                        llama_token   id,
                          llama_pos   pos,
    const std::vector<llama_seq_id> & seq_ids,
                               bool   logits) {
    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded");

    batch.token   [batch.n_tokens] = id;
    batch.pos     [batch.n_tokens] = pos;
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();
    for (size_t i = 0; i < seq_ids.size(); ++i) {
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;

    batch.n_tokens++;
}

Helper function used in embedding example:

static void batch_add_seq(llama_batch & batch, const std::vector<int32_t> & tokens, llama_seq_id seq_id) {
    size_t n_tokens = tokens.size();
    for (size_t i = 0; i < n_tokens; i++) {
        common_batch_add(batch, tokens[i], i, { seq_id }, true);
    }
}

I/O Contract

Input (batch_add): Mutable reference to llama_batch; token ID (llama_token); position (llama_pos); sequence ID vector (std::vector<llama_seq_id>); logits flag (bool)
Output (batch_add): Batch modified in place with one additional token at index n_tokens; n_tokens incremented by 1
Input (batch_clear): Mutable reference to llama_batch
Output (batch_clear): batch.n_tokens set to 0 (token arrays are not zeroed, only the count is reset)
Preconditions: Batch must have been initialized with llama_batch_init(); batch.n_tokens must be less than the allocated capacity (asserted)
Error Handling: GGML_ASSERT failure if batch capacity is exceeded (the check is that batch.seq_id[batch.n_tokens] is non-null)

Parameter details for common_batch_add:

batch (struct llama_batch &): the batch to append to, modified in place
id (llama_token): token ID from the vocabulary
pos (llama_pos): position index within the sequence (starts at 0 for each sequence)
seq_ids (const std::vector<llama_seq_id> &): sequence IDs this token belongs to (typically a single-element vector)
logits (bool): whether to compute output (embeddings/logits) for this token position

Usage Examples

Adding a single sequence for embedding:

std::vector<llama_token> tokens = common_tokenize(vocab, "Hello world", true, true);
struct llama_batch batch = llama_batch_init(512, 0, 1);

for (size_t i = 0; i < tokens.size(); i++) {
    common_batch_add(batch, tokens[i], i, {0}, true);  // seq_id = 0
}

llama_decode(ctx, batch);
float * emb = llama_get_embeddings_seq(ctx, 0);

Adding multiple sequences in one batch:

std::vector<std::vector<llama_token>> inputs = {
    common_tokenize(vocab, "First text", true, true),
    common_tokenize(vocab, "Second text", true, true),
    common_tokenize(vocab, "Third text", true, true),
};

struct llama_batch batch = llama_batch_init(2048, 0, 1);

for (int seq = 0; seq < (int)inputs.size(); seq++) {
    for (size_t i = 0; i < inputs[seq].size(); i++) {
        common_batch_add(batch, inputs[seq][i], i, {seq}, true);
    }
}

llama_decode(ctx, batch);

// Retrieve embeddings per sequence
for (int seq = 0; seq < (int)inputs.size(); seq++) {
    float * emb = llama_get_embeddings_seq(ctx, seq);
    // process embedding...
}

Batch overflow handling pattern (from embedding example):

int s = 0; // number of prompts in current batch
for (int k = 0; k < n_prompts; k++) {
    auto & inp = inputs[k];
    const uint64_t n_toks = inp.size();

    // encode if at capacity
    if (batch.n_tokens + n_toks > n_batch || s >= n_seq_max) {
        batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
        s = 0;
        common_batch_clear(batch);
    }

    // add to batch
    batch_add_seq(batch, inp, s);
    s += 1;
}

// final batch
batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
