Implementation: ggml-org/llama.cpp Common Batch Add
| Field | Value |
|---|---|
| Implementation Name | Common Batch Add |
| Doc Type | API Doc |
| Domain | Batch Construction, Token Management |
| Description | common_batch_add(batch, id, pos, seq_ids, logits) and common_batch_clear(batch) functions for constructing token batches |
| Related Workflow | Embedding_Extraction |
Overview
Description
The Common Batch Add implementation provides two utility functions for managing llama_batch structures: common_batch_add() appends a single token with its metadata to the batch, and common_batch_clear() resets the batch for reuse. These functions abstract the low-level array manipulation required to populate the batch structure, ensuring correct field assignment for token ID, position, sequence IDs, and the logits flag.
Usage
```cpp
#include "common.h"

struct llama_batch batch = llama_batch_init(512, 0, 1);

// Add tokens for sequence 0
common_batch_add(batch, token_id,  /*pos=*/0, {0}, /*logits=*/true);
common_batch_add(batch, token_id2, /*pos=*/1, {0}, /*logits=*/true);

// Process the batch
llama_decode(ctx, batch);

// Clear and reuse
common_batch_clear(batch);
```
Code Reference
| Field | Value |
|---|---|
| Source Location (header) | common/common.h:775-780 |
| Source Location (implementation) | common/common.cpp:1438-1455 |
| Signature (batch_add) | `void common_batch_add(struct llama_batch & batch, llama_token id, llama_pos pos, const std::vector<llama_seq_id> & seq_ids, bool logits)` |
| Signature (batch_clear) | `void common_batch_clear(struct llama_batch & batch)` |
| Import | `#include "common.h"` |
common_batch_clear implementation:
```cpp
void common_batch_clear(struct llama_batch & batch) {
    batch.n_tokens = 0;
}
```
common_batch_add implementation:
```cpp
void common_batch_add(
                 struct llama_batch & batch,
                        llama_token   id,
                          llama_pos   pos,
    const std::vector<llama_seq_id> & seq_ids,
                               bool   logits) {
    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded");

    batch.token   [batch.n_tokens] = id;
    batch.pos     [batch.n_tokens] = pos;
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();
    for (size_t i = 0; i < seq_ids.size(); ++i) {
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;

    batch.n_tokens++;
}
```
Helper function used in the embedding example:
```cpp
static void batch_add_seq(llama_batch & batch, const std::vector<int32_t> & tokens, llama_seq_id seq_id) {
    size_t n_tokens = tokens.size();
    for (size_t i = 0; i < n_tokens; i++) {
        common_batch_add(batch, tokens[i], i, { seq_id }, true);
    }
}
```
I/O Contract
| Direction | Description |
|---|---|
| Input (batch_add) | Mutable reference to llama_batch; token ID (llama_token); position (llama_pos); sequence ID vector (std::vector<llama_seq_id>); logits flag (bool) |
| Output (batch_add) | Batch modified in place with one additional token at index n_tokens; n_tokens incremented by 1 |
| Input (batch_clear) | Mutable reference to llama_batch |
| Output (batch_clear) | batch.n_tokens set to 0 (token arrays are not zeroed; only the count is reset) |
| Preconditions | Batch must have been initialized with llama_batch_init(); batch.n_tokens must be less than the allocated capacity (asserted) |
| Error Handling | GGML_ASSERT failure if batch capacity is exceeded (the check relies on batch.seq_id[batch.n_tokens] being non-null only within the allocated capacity) |
Parameter details for common_batch_add:
| Parameter | Type | Description |
|---|---|---|
| batch | struct llama_batch & | The batch to append to (modified in place) |
| id | llama_token | Token ID from the vocabulary |
| pos | llama_pos | Position index within the sequence (starts at 0 for each sequence) |
| seq_ids | const std::vector<llama_seq_id> & | Sequence IDs this token belongs to (typically a single-element vector) |
| logits | bool | Whether to compute output (embeddings/logits) for this token position |
Usage Examples
Adding a single sequence for embedding:
```cpp
std::vector<llama_token> tokens = common_tokenize(vocab, "Hello world", true, true);

struct llama_batch batch = llama_batch_init(512, 0, 1);
for (size_t i = 0; i < tokens.size(); i++) {
    common_batch_add(batch, tokens[i], i, {0}, true); // seq_id = 0
}

llama_decode(ctx, batch);
float * emb = llama_get_embeddings_seq(ctx, 0);
```
Adding multiple sequences in one batch:
```cpp
std::vector<std::vector<llama_token>> inputs = {
    common_tokenize(vocab, "First text",  true, true),
    common_tokenize(vocab, "Second text", true, true),
    common_tokenize(vocab, "Third text",  true, true),
};

struct llama_batch batch = llama_batch_init(2048, 0, 1);
for (int seq = 0; seq < (int) inputs.size(); seq++) {
    for (size_t i = 0; i < inputs[seq].size(); i++) {
        common_batch_add(batch, inputs[seq][i], i, {seq}, true);
    }
}

llama_decode(ctx, batch);

// Retrieve embeddings per sequence
for (int seq = 0; seq < (int) inputs.size(); seq++) {
    float * emb = llama_get_embeddings_seq(ctx, seq);
    // process embedding...
}
```
Batch overflow handling pattern (from the embedding example):
```cpp
int s = 0; // number of prompts in the current batch
for (int k = 0; k < n_prompts; k++) {
    auto & inp = inputs[k];
    const uint64_t n_toks = inp.size();

    // flush the batch if adding this prompt would exceed capacity
    if (batch.n_tokens + n_toks > n_batch || s >= n_seq_max) {
        batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
        s = 0;
        common_batch_clear(batch);
    }

    // add to batch
    batch_add_seq(batch, inp, s);
    s += 1;
}

// final batch
batch_decode(ctx, batch, out, s, n_embd_out, params.embd_normalize);
```
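The flush logic above can be isolated into a self-contained simulation to reason about how many decode calls a workload produces. This is a hypothetical helper with no llama.cpp dependencies, mirroring only the control flow, and it assumes every individual prompt fits within n_batch on its own:

```cpp
#include <cstdint>
#include <vector>

// Simulate the overflow-handling loop: count how many times the batch would
// be decoded for the given prompt lengths, batch capacity, and sequence limit.
int count_decodes(const std::vector<uint64_t> & prompt_lens,
                  uint64_t n_batch, int n_seq_max) {
    int n_decodes  = 0;
    uint64_t n_tokens = 0; // tokens in the current batch
    int s = 0;             // prompts in the current batch
    for (uint64_t n_toks : prompt_lens) {
        if (n_tokens + n_toks > n_batch || s >= n_seq_max) {
            n_decodes++;   // batch_decode(...) followed by common_batch_clear(...)
            n_tokens = 0;
            s = 0;
        }
        n_tokens += n_toks; // batch_add_seq(batch, inp, s)
        s += 1;
    }
    n_decodes++; // final batch
    return n_decodes;
}
```

For example, three 10-token prompts with n_batch = 25 trigger one mid-loop flush plus the final decode, for two decode calls total.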