Implementation: ggml-org/llama.cpp Batch Header
| Knowledge Sources | |
|---|---|
| Domains | Batch_Processing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the internal `llama_ubatch` struct (micro-batch) and the `llama_batch_allocr` class for batch management during inference.
Description
The `llama_ubatch` struct holds pointers to token IDs, embeddings, positions, sequence IDs, and output flags for a subset of a batch, supporting multi-positional embeddings (M-RoPE). It optionally owns its data via a shared `data_t` struct. The `llama_batch_allocr` class provides the interface for initializing, validating, and splitting batches into ubatches with different strategies (simple, equal, per-sequence), tracking per-sequence position ranges and coupling information.
Usage
This is a core header used throughout the inference pipeline. The `llama_batch_allocr` class is used by memory implementations to split input batches into appropriately-sized micro-batches for graph computation.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-batch.h
- Lines: 1-173
Signature
struct llama_ubatch {
    bool equal_seqs() const; // true if every sequence contributes the same number of tokens
    bool is_pos_2d() const;  // true for multi-positional (M-RoPE) embeddings

    uint32_t n_tokens;     // total tokens in the ubatch
    uint32_t n_seq_tokens; // tokens per sequence
    uint32_t n_seqs;       // number of sequence sets
    uint32_t n_seqs_unq;   // number of unique sequence ids
    uint32_t n_pos;        // positions per token (> 1 for M-RoPE)

    llama_token  *  token;      // token ids
    float        *  embd;       // input embeddings
    llama_pos    *  pos;        // token positions
    int32_t      *  n_seq_id;   // number of sequence ids per token set
    llama_seq_id ** seq_id;     // sequence ids
    llama_seq_id *  seq_id_unq; // unique sequence ids
    int32_t      *  seq_idx;    // sequence indices
    int8_t       *  output;     // output flags
};

class llama_batch_allocr {
public:
    llama_batch_allocr(uint32_t n_pos_per_embd);

    // sanitize and validate the input batch
    bool init(const llama_batch & batch_inp, const llama_vocab & vocab,
              const llama_memory_i * memory, uint32_t n_embd,
              uint32_t n_seq_max, bool output_all);

    // restart splitting of the current batch
    void split_reset();

    // splitting strategies: simple, equal, per-sequence
    llama_ubatch split_simple(uint32_t n_ubatch);
    llama_ubatch split_equal (uint32_t n_ubatch, bool sequential);
    llama_ubatch split_seq   (uint32_t n_ubatch);

    llama_ubatch ubatch_reserve(uint32_t n_seq_tokens, uint32_t n_seqs);
};
Import
#include "llama-batch.h"
// Dependencies:
#include "llama.h"
#include "llama-cparams.h"
#include <array>
#include <vector>
#include <set>
#include <bitset>
#include <memory>
#include <unordered_map>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch_inp | const llama_batch & | Yes | Input batch to sanitize and split |
| vocab | const llama_vocab & | Yes | Vocabulary for validation |
| memory | const llama_memory_i * | No | Memory instance for sequence continuity checks and position determination |
| n_embd | uint32_t | Yes | Embedding dimension |
| n_seq_max | uint32_t | Yes | Maximum number of sequences |
| n_ubatch | uint32_t | Yes | Maximum tokens per micro-batch for split operations |
| output_all | bool | Yes | Whether all tokens should be marked as output |
Outputs
| Name | Type | Description |
|---|---|---|
| llama_ubatch | struct | Micro-batch containing token IDs, positions, sequence info, and output flags |
| out_ids | std::vector<int32_t> | Output indices in the order encountered during splitting |
| n_outputs | uint32_t | Number of output tokens across all ubatches |
Usage Examples
#include "llama-batch.h"

llama_batch_allocr balloc(1); // one position per embedding (no M-RoPE)

// sanitize and validate the input batch
if (!balloc.init(batch, vocab, memory, n_embd, n_seq_max, false)) {
    // handle invalid batch
}

// split into micro-batches with the simple strategy
balloc.split_reset();
while (true) {
    llama_ubatch ubatch = balloc.split_simple(n_ubatch);
    if (ubatch.n_tokens == 0) {
        break; // no tokens left to split
    }
    // process ubatch...
}