
Implementation: ggml-org/llama.cpp Batch Header

From Leeroopedia
Knowledge Sources
Domains: Batch_Processing
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the internal `llama_ubatch` struct (micro-batch) and the `llama_batch_allocr` class for batch management during inference.

Description

The `llama_ubatch` struct holds pointers to token IDs, embeddings, positions, sequence IDs, and output flags for a subset of a batch, supporting multi-positional embeddings (M-RoPE). It optionally owns its data via a shared `data_t` struct. The `llama_batch_allocr` class provides the interface for initializing, validating, and splitting batches into ubatches with different strategies (simple, equal, per-sequence), tracking per-sequence position ranges and coupling information.

Usage

This is a core header used throughout the inference pipeline. The `llama_batch_allocr` class is used by memory implementations to split input batches into appropriately-sized micro-batches for graph computation.

Code Reference

Source Location

Signature

struct llama_ubatch {
    bool equal_seqs() const;
    bool is_pos_2d() const;

    uint32_t n_tokens;
    uint32_t n_seq_tokens;
    uint32_t n_seqs;
    uint32_t n_seqs_unq;
    uint32_t n_pos;

    llama_token  * token;
    float        * embd;
    llama_pos    * pos;
    int32_t      * n_seq_id;
    llama_seq_id ** seq_id;
    llama_seq_id * seq_id_unq;
    int32_t      * seq_idx;
    int8_t       * output;
};

class llama_batch_allocr {
public:
    llama_batch_allocr(uint32_t n_pos_per_embd);

    bool init(const llama_batch & batch_inp, const llama_vocab & vocab,
              const llama_memory_i * memory, uint32_t n_embd,
              uint32_t n_seq_max, bool output_all);

    void split_reset();
    llama_ubatch split_simple(uint32_t n_ubatch);
    llama_ubatch split_equal(uint32_t n_ubatch, bool sequential);
    llama_ubatch split_seq(uint32_t n_ubatch);
    llama_ubatch ubatch_reserve(uint32_t n_seq_tokens, uint32_t n_seqs);
};

Import

#include "llama-batch.h"
// Dependencies:
#include "llama.h"
#include "llama-cparams.h"
#include <array>
#include <vector>
#include <set>
#include <bitset>
#include <memory>
#include <unordered_map>

I/O Contract

Inputs

Name Type Required Description
batch_inp const llama_batch & Yes Input batch to sanitize and split
vocab const llama_vocab & Yes Vocabulary for validation
memory const llama_memory_i * No Memory instance for sequence continuity checks and position determination
n_embd uint32_t Yes Embedding dimension
n_seq_max uint32_t Yes Maximum number of sequences
n_ubatch uint32_t Yes Maximum tokens per micro-batch for split operations
output_all bool Yes Whether all tokens should be marked as output

Outputs

Name Type Description
llama_ubatch struct Micro-batch containing token IDs, positions, sequence info, and output flags
out_ids std::vector<int32_t> Output indices in the order encountered during splitting
n_outputs uint32_t Number of output tokens across all ubatches

Usage Examples

#include "llama-batch.h"

// batch, vocab, memory, n_embd, n_seq_max, and n_ubatch are assumed to be
// provided by the surrounding inference code.
llama_batch_allocr balloc(1); // 1 position per embedding (no M-RoPE)

if (!balloc.init(batch, vocab, memory, n_embd, n_seq_max, false)) {
    // invalid batch (e.g. bad token or sequence IDs)
    return;
}

balloc.split_reset();
while (true) {
    llama_ubatch ubatch = balloc.split_simple(n_ubatch);
    if (ubatch.n_tokens == 0) {
        break; // batch fully consumed
    }
    // process ubatch...
}
