Implementation: ggml-org/llama.cpp Common Batch Add
| Field | Value |
|---|---|
| Implementation Name | Common Batch Add |
| Doc Type | API Doc |
| Domain | Batch Construction, Token Management |
| Description | common_batch_add(batch, id, pos, seq_ids, logits) and common_batch_clear(batch) functions for constructing token batches |
| Related Workflow | Embedding_Extraction |
Overview
Description
The Common Batch Add implementation provides two utility functions for managing llama_batch structures: common_batch_add() appends a single token with its metadata to the batch, and common_batch_clear() resets the batch for reuse. These functions abstract the low-level array manipulation required to populate the batch structure, ensuring correct field assignment for token ID, position, sequence IDs, and the logits flag.
Usage
```cpp
#include "common.h"

struct llama_batch batch = llama_batch_init(512, 0, 1);

// Add tokens for sequence 0
common_batch_add(batch, token_id,  /*pos=*/0, {0}, /*logits=*/true);
common_batch_add(batch, token_id2, /*pos=*/1, {0}, /*logits=*/true);

// Process the batch
llama_decode(ctx, batch);

// Clear and reuse
common_batch_clear(batch);
```
Code Reference
| Field | Value |
|---|---|
| Source Location (header) | common/common.h:775-780 |
| Source Location (implementation) | common/common.cpp:1438-1455 |
| Signature (batch_add) | `void common_batch_add(struct llama_batch & batch, llama_token id, llama_pos pos, const std::vector<llama_seq_id> & seq_ids, bool logits)` |
| Signature (batch_clear) | `void common_batch_clear(struct llama_batch & batch)` |
| Import | `#include "common.h"` |
common_batch_clear implementation:
```cpp
void common_batch_clear(struct llama_batch & batch) {
    batch.n_tokens = 0;
}
```
common_batch_add implementation:
```cpp
void common_batch_add(
                 struct llama_batch & batch,
                        llama_token   id,
                          llama_pos   pos,
    const std::vector<llama_seq_id> & seq_ids,
                               bool   logits) {
    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded");

    batch.token   [batch.n_tokens] = id;
    batch.pos     [batch.n_tokens] = pos;
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();
    for (size_t i = 0; i < seq_ids.size(); ++i) {
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;

    batch.n_tokens++;
}
```
Helper function used in the embedding example:
```cpp
static void batch_add_seq(llama_batch & batch, const std::vector<int32_t> & tokens, llama_seq_id seq_id) {
    size_t n_tokens = tokens.size();
    for (size_t i = 0; i < n_tokens; i++) {
        common_batch_add(batch, tokens[i], i, { seq_id }, true);
    }
}
```
I/O Contract
| Direction | Description |
|---|---|
| Input (batch_add) | Mutable reference to llama_batch; token ID (llama_token); position (llama_pos); sequence ID vector (std::vector<llama_seq_id>); logits flag (bool) |
| Output (batch_add) | Batch modified in place with one additional token at index n_tokens; n_tokens incremented by 1 |
| Input (batch_clear) | Mutable reference to llama_batch |
| Output (batch_clear) | batch.n_tokens set to 0 (token arrays are not zeroed; only the count is reset) |
| Preconditions | Batch must have been initialized with llama_batch_init(); batch.n_tokens must be less than the allocated capacity (asserted) |
| Error Handling | GGML_ASSERT failure if batch capacity is exceeded (the check relies on batch.seq_id[batch.n_tokens] being non-null only within the allocated capacity) |
Parameter details for common_batch_add:
| Parameter | Type | Description |
|---|---|---|
| batch | struct llama_batch & | The batch to append to (modified in place) |
| id | llama_token | Token ID from the vocabulary |
| pos | llama_pos | Position index within the sequence (starts at 0 for each sequence) |
| seq_ids | const std::vector<llama_seq_id> & | Sequence IDs this token belongs to (typically a single-element vector) |
| logits | bool | Whether to compute output (embeddings/logits) for this token position |
Usage Examples
Adding a single sequence for embedding:
```cpp
std::vector<llama_token> tokens = common_tokenize(vocab, "Hello world", true, true);

struct llama_batch batch = llama_batch_init(512, 0, 1);
for (size_t i = 0; i < tokens.size(); i++) {
    common_batch_add(batch, tokens[i], i, {0}, true); // seq_id = 0
}

llama_decode(ctx, batch);
float * emb = llama_get_embeddings_seq(ctx, 0);
```
Adding multiple sequences in one batch:
```cpp
std::vector<std::vector<llama_token>> inputs = {
    common_tokenize(vocab, "First text",  true, true),
    common_tokenize(vocab, "Second text", true, true),
    common_tokenize(vocab, "Third text",  true, true),
};

struct llama_batch batch = llama_batch_init(2048, 0, 1);
for (int seq = 0; seq < (int) inputs.size(); seq++) {
    for (size_t i = 0; i < inputs[seq].size(); i++) {
        common_batch_add(batch, inputs[seq][i], i, {seq}, true);
    }
}

llama_decode(ctx, batch);

// Retrieve embeddings per sequence
for (int seq = 0; seq < (int) inputs.size(); seq++) {
    float * emb = llama_get_embeddings_seq(ctx, seq);
    // process embedding...
}
```
Batch overflow handling pattern (from the embedding example):
```cpp
int s = 0; // number of prompts in the current batch
for (int k = 0; k < n_prompts; k++) {
    auto & inp = inputs[k];
    const uint64_t n_toks = inp.size();

    // flush the batch if adding this prompt would exceed capacity
    if (batch.n_tokens + n_toks > n_batch || s >= n_seq_max) {
        batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
        s = 0;
        common_batch_clear(batch);
    }

    // add to batch
    batch_add_seq(batch, inp, s);
    s += 1;
}

// final batch
batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
```
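The flush logic above can be isolated into a self-contained simulation to reason about how many decode calls a workload produces. This is a hypothetical helper with no llama.cpp dependencies, mirroring only the control flow, and it assumes every individual prompt fits within n_batch on its own:

```cpp
#include <cstdint>
#include <vector>

// Simulate the overflow-handling loop: count how many times the batch would
// be decoded for the given prompt lengths, batch capacity, and sequence limit.
int count_decodes(const std::vector<uint64_t> & prompt_lens,
                  uint64_t n_batch, int n_seq_max) {
    int n_decodes  = 0;
    uint64_t n_tokens = 0; // tokens in the current batch
    int s = 0;             // prompts in the current batch
    for (uint64_t n_toks : prompt_lens) {
        if (n_tokens + n_toks > n_batch || s >= n_seq_max) {
            n_decodes++;   // batch_decode(...) followed by common_batch_clear(...)
            n_tokens = 0;
            s = 0;
        }
        n_tokens += n_toks; // batch_add_seq(batch, inp, s)
        s += 1;
    }
    n_decodes++; // final batch
    return n_decodes;
}
```

For example, three 10-token prompts with n_batch = 25 trigger one mid-loop flush plus the final decode, for two decode calls total.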