Implementation:Ollama Ollama Llama Batch Types

Knowledge Sources	Ollama
Domains	Inference, Batching
Last Updated	2025-02-15 00:00 GMT

Overview

Header declaring the internal batch representation (llama_ubatch) and batch allocator/splitter (llama_batch_allocr) used during inference.

Description

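Defines llama_ubatch, the internal micro-batch structure. Its fields hold tokens or embeddings, positions (multi-dimensional when n_pos > 1, as with M-RoPE), per-token sequence IDs, the unique sequence IDs present in the micro-batch, and per-token output flags. The raw pointers in llama_ubatch either point into the vectors of the nested data_t struct, which a shared_ptr keeps alive, or are non-owning views of data owned elsewhere. The llama_batch_allocr class initializes from user input, validates token and sequence IDs, auto-fills missing data (positions, sequence IDs, output flags), and splits the batch into micro-batches with three strategies: split_simple (consecutive tokens, sequence sets of unequal length), split_equal (sequence sets of equal length) and split_seq (one sequence set per micro-batch). It also tracks per-sequence positions and which sequences are coupled across the batch.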
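A minimal sketch of the data_t ownership pattern, assuming the raw pointers in a ubatch point into vectors held by a shared data_t: copies of the ubatch share the owner, so the pointers stay valid after the original goes out of scope. mini_ubatch, make_ubatch and token_at are hypothetical names; the real llama_ubatch has more fields and is filled in by llama_batch_allocr.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down stand-in for llama_ubatch (not the real type).
// llama_token and llama_pos are 32-bit integers in llama.h; int32_t is used here.
struct mini_ubatch {
    uint32_t  n_tokens = 0;
    int32_t * token    = nullptr; // view into data->token when data is set
    int32_t * pos      = nullptr; // view into data->pos when data is set

    struct data_t {
        std::vector<int32_t> token;
        std::vector<int32_t> pos;
    };
    std::shared_ptr<data_t> data; // shared owner of the storage behind the views
};

// Build an owning ubatch of n tokens with sequential positions starting at p0.
mini_ubatch make_ubatch(uint32_t n, int32_t first_token, int32_t p0) {
    auto udata = std::make_shared<mini_ubatch::data_t>();
    udata->token.resize(n);
    udata->pos.resize(n);
    for (uint32_t i = 0; i < n; ++i) {
        udata->token[i] = first_token + static_cast<int32_t>(i);
        udata->pos[i]   = p0 + static_cast<int32_t>(i);
    }

    mini_ubatch ub;
    ub.n_tokens = n;
    ub.token    = udata->token.data(); // take the views before handing over ownership
    ub.pos      = udata->pos.data();
    ub.data     = std::move(udata);    // moving the shared_ptr does not move the vectors
    return ub;
}

// Read token i through the view pointer.
int32_t token_at(const mini_ubatch & ub, uint32_t i) {
    assert(ub.token != nullptr && i < ub.n_tokens);
    return ub.token[i];
}
```

Copying a mini_ubatch copies the pointers and increments the reference count of data, so the copy stays readable after the original is destroyed. A ubatch whose data is empty would instead depend on the lifetime of whatever owns the pointed-to memory.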

Usage

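This header defines the core data structures of the inference pipeline. The compute graph consumes a llama_ubatch, not the user-supplied llama_batch. Validation in init rejects malformed input (such as out-of-range token or sequence IDs) before any graph evaluation, and the split strategy together with n_ubatch determines how many tokens each graph evaluation processes and how sequences are grouped.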

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-batch.h
Lines: 1-173

Signature

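struct llama_ubatch {
    bool equal_seqs() const;     // b_equal_seqs != 0
    bool is_pos_2d() const;      // multi-dimensional positions (M-RoPE)

    uint32_t b_equal_seqs;       // boolean stored as an integer
    uint32_t n_tokens;           // total tokens (n_seq_tokens * n_seqs)
    uint32_t n_seq_tokens;       // tokens per sequence set
    uint32_t n_seqs;             // sequence sets in the ubatch
    uint32_t n_seqs_unq;         // unique sequence IDs in the ubatch
    uint32_t n_pos;              // position components per token

    llama_token  *  token;       // [n_tokens], nullptr when the input is embeddings
    float        *  embd;        // [n_embd, n_tokens], nullptr when the input is tokens
    llama_pos    *  pos;         // [n_tokens * n_pos]
    int32_t      *  n_seq_id;    // [n_tokens] number of sequence IDs per token
    llama_seq_id ** seq_id;      // [n_tokens] sequence IDs of each token
    llama_seq_id *  seq_id_unq;  // [n_seqs_unq] unique sequence IDs
    int32_t      *  seq_idx;     // sequence ID -> index in [0, n_seqs_unq)
    int8_t       *  output;      // [n_tokens] nonzero if the token's output is requested

    struct data_t { /* owning vectors */ };
    std::shared_ptr<data_t> data; // when set, the pointers above point into it
};

class llama_batch_allocr {
public:
    llama_batch_allocr(uint32_t n_pos_per_embd);
    bool init(const llama_batch & batch_inp, const llama_vocab & vocab,
              const llama_memory_i * memory, uint32_t n_embd,
              uint32_t n_seq_max, bool output_all);
    void split_reset();          // call once before splitting the batch
    llama_ubatch split_simple(uint32_t n_ubatch);
    llama_ubatch split_equal(uint32_t n_ubatch, bool sequential);
    llama_ubatch split_seq(uint32_t n_ubatch);
};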

Import

#include "llama-batch.h"

I/O Contract

Inputs

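Name	Type	Required	Description
n_pos_per_embd	uint32_t	Yes	Number of position dimensions per token (1 for standard, 4 for M-RoPE)
batch_inp	const llama_batch &	Yes	User-provided batch to process
vocab	const llama_vocab &	Yes	Model vocabulary, used to validate token IDs
memory	const llama_memory_i *	No	Optional memory module; when provided, used to check sequence continuity and determine positions
n_embd	uint32_t	Yes	Model embedding size
n_seq_max	uint32_t	Yes	Maximum number of sequences; sequence IDs must lie in [0, n_seq_max)
output_all	bool	Yes	If true, every token is marked for output
n_ubatch	uint32_t	Yes	Maximum tokens per micro-batch (split_simple, split_equal, split_seq)
sequential	bool	Yes (split_equal)	If true, the tokens in the micro-batch have increasing sequential sequence IDs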
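When n_pos_per_embd is greater than 1, the pos array holds n_tokens * n_pos values. The sketch below assumes a dimension-major layout, in which component d of token i is stored at pos[d * n_tokens + i], and assumes that text tokens repeat their single sequential position in every component. pos_at and text_positions are hypothetical helpers, not llama.cpp functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using pos_t = int32_t; // llama_pos is a 32-bit integer in llama.h

// Assumed layout: n_pos contiguous blocks of n_tokens positions each,
// so component d of token i lives at pos[d * n_tokens + i].
pos_t pos_at(const std::vector<pos_t> & pos, uint32_t n_tokens, uint32_t n_pos,
             uint32_t i, uint32_t d) {
    assert(i < n_tokens && d < n_pos);
    assert(pos.size() == static_cast<std::size_t>(n_tokens) * n_pos);
    return pos[static_cast<std::size_t>(d) * n_tokens + i];
}

// Text tokens (assumption): broadcast the sequential position p0 + i to every component.
std::vector<pos_t> text_positions(uint32_t n_tokens, uint32_t n_pos, pos_t p0) {
    std::vector<pos_t> pos(static_cast<std::size_t>(n_tokens) * n_pos);
    for (uint32_t d = 0; d < n_pos; ++d) {
        for (uint32_t i = 0; i < n_tokens; ++i) {
            pos[static_cast<std::size_t>(d) * n_tokens + i] = p0 + static_cast<pos_t>(i);
        }
    }
    return pos;
}
```

With n_pos_per_embd == 1 this layout reduces to pos[i], so under this assumption reading pos[i] yields the first (sequential) component in both the standard and the M-RoPE case.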

Outputs

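Name	Type	Description
init result	bool	true on success; false if the input batch fails validation
ubatch	llama_ubatch	Micro-batch returned by split_simple, split_equal or split_seq; n_tokens is 0 once every token in the batch has been assigned
equal_seqs	bool	Returned by llama_ubatch::equal_seqs(); true when all sequence sets in the micro-batch have the same length (n_seq_tokens)
is_pos_2d	bool	Returned by llama_ubatch::is_pos_2d(); true when the micro-batch carries multi-dimensional (M-RoPE) positions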
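As a sketch, assuming the token layout suggested by the n_tokens, n_seq_tokens and n_seqs counters (n_seqs sequence sets of n_seq_tokens tokens each, stored one set after another), the checks below show what equal_seqs() lets a consumer rely on. ubatch_shape, shape_is_consistent and seq_set_of are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the llama_ubatch shape counters (not the real type).
struct ubatch_shape {
    uint32_t b_equal_seqs; // boolean stored as an integer, as in llama_ubatch
    uint32_t n_tokens;     // total tokens
    uint32_t n_seq_tokens; // tokens per sequence set
    uint32_t n_seqs;       // sequence sets

    bool equal_seqs() const { return b_equal_seqs != 0; }
};

// Assumed invariant: the micro-batch holds n_seqs sets of n_seq_tokens tokens each.
bool shape_is_consistent(const ubatch_shape & s) {
    return s.n_tokens == s.n_seq_tokens * s.n_seqs;
}

// With equal-length sets stored one after another, token i belongs to
// sequence set i / n_seq_tokens.
uint32_t seq_set_of(const ubatch_shape & s, uint32_t i) {
    assert(s.equal_seqs() && s.n_seq_tokens > 0 && i < s.n_tokens);
    return i / s.n_seq_tokens;
}
```

Under these assumptions an equal-length micro-batch can be viewed as an n_seq_tokens by n_seqs grid; when equal_seqs() is false, no per-set stride should be assumed.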

Usage Examples

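#include "llama-batch.h"

// Assumes `allocr` is a llama_batch_allocr constructed with the model's
// n_pos_per_embd whose init(...) call returned true for the user's batch.
allocr.split_reset(); // reset the split state before producing micro-batches

while (true) {
    llama_ubatch ubatch = allocr.split_simple(512);
    if (ubatch.n_tokens == 0) {
        break; // every token of the batch has been assigned to a micro-batch
    }

    // Check ubatch properties
    if (ubatch.equal_seqs()) {
        // All sequence sets have n_seq_tokens tokens (set for split_equal ubatches)
    }
    if (ubatch.is_pos_2d()) {
        // Multi-dimensional positions (M-RoPE)
    }

    // Access token data; token is nullptr when the batch carries embeddings
    for (uint32_t i = 0; ubatch.token && i < ubatch.n_tokens; ++i) {
        llama_token tok = ubatch.token[i];
        llama_pos   pos = ubatch.pos[i]; // first position component of token i
    }
}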
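The split loop can be exercised without a model. The sketch below is a hypothetical mock (mock_allocr and ubatch_sizes are not llama.cpp names) that models only the token-consumption behaviour assumed for split_simple: each call takes the next unused tokens in input order, at most n_ubatch of them, and an empty result ends the loop. The real allocator also fills positions, sequence IDs and output flags, and signals completion with a llama_ubatch whose n_tokens is 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for llama_batch_allocr that models only the
// "next unused tokens, in order" behaviour assumed for split_simple.
class mock_allocr {
public:
    explicit mock_allocr(std::vector<int32_t> tokens) : tokens_(std::move(tokens)) {}

    // Rewind so the batch can be split again from the first token.
    void split_reset() { next_ = 0; }

    // Return up to n_ubatch unused tokens; an empty vector means "done".
    std::vector<int32_t> split_simple(uint32_t n_ubatch) {
        assert(n_ubatch > 0);
        const std::size_t n = std::min<std::size_t>(n_ubatch, tokens_.size() - next_);
        const auto first = tokens_.begin() + static_cast<std::ptrdiff_t>(next_);
        std::vector<int32_t> ubatch(first, first + static_cast<std::ptrdiff_t>(n));
        next_ += n;
        return ubatch;
    }

private:
    std::vector<int32_t> tokens_;
    std::size_t next_ = 0; // index of the first unused token
};

// Drain the allocator with a split_reset() + loop-until-empty pattern and
// record the size of every micro-batch produced.
std::vector<std::size_t> ubatch_sizes(mock_allocr & allocr, uint32_t n_ubatch) {
    std::vector<std::size_t> sizes;
    allocr.split_reset();
    while (true) {
        std::vector<int32_t> ubatch = allocr.split_simple(n_ubatch);
        if (ubatch.empty()) {
            break;
        }
        sizes.push_back(ubatch.size());
    }
    return sizes;
}
```

With 1300 tokens and n_ubatch = 512, the mock produces micro-batches of 512, 512 and 276 tokens. Calling split_reset() first is what allows the same batch to be split again.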

Related Pages

Principle:Ollama_Ollama_Inference_Pipeline
