
Implementation:Ollama Llama Batch

From Leeroopedia
Knowledge Sources

  • Domains: Inference, Batching
  • Last Updated: 2025-02-15 00:00 GMT

Overview

Implements batch allocation, validation, splitting, and management for processing groups of tokens through the model during inference.

Description

The llama_batch_allocr class handles input batch validation (checking token IDs, sequence IDs, and positions), auto-generation of missing metadata (positions from memory state, default sequence IDs, output flags), and splitting of large batches into smaller llama_ubatch chunks that fit within compute limits. It supports three splitting strategies: simple (sequential sub-batches), equal (equal-sized sequence groups), and per-sequence splitting. It also tracks per-sequence position sets and sequence-coupling information for correct multi-sequence handling.
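The "simple" strategy is the easiest to picture: walk the batch front to back, carving off at most n_ubatch tokens per chunk. The sketch below illustrates only that arithmetic; it is not the actual llama_batch_allocr code, and split_simple_sizes is an invented name.

```cpp
// Illustrative sketch of the "simple" (sequential) splitting arithmetic.
// Not llama.cpp internals; split_simple_sizes is a hypothetical helper.
#include <algorithm>
#include <cstdint>
#include <vector>

// Split n_tokens into sequential chunks of at most n_ubatch tokens each,
// mirroring the split_simple strategy described above.
std::vector<uint32_t> split_simple_sizes(uint32_t n_tokens, uint32_t n_ubatch) {
    std::vector<uint32_t> sizes;
    for (uint32_t done = 0; done < n_tokens; ) {
        const uint32_t n = std::min(n_ubatch, n_tokens - done);
        sizes.push_back(n);
        done += n;
    }
    return sizes;
}

// Example: a 10-token batch with a 4-token cap splits into chunks of 4, 4, 2:
//   split_simple_sizes(10, 4) -> {4, 4, 2}
```

The equal and per-sequence strategies follow the same outer loop but choose which tokens go into each chunk differently, so that tokens from multiple sequences stay grouped correctly.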

Usage

Used internally by llama_context during encode/decode to manage input batches. Proper splitting ensures batches fit within hardware memory constraints while maintaining correct sequence state.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-batch.cpp
  • Lines: 1-917

Signature

llama_batch_allocr::llama_batch_allocr(uint32_t n_pos_per_embd);

bool llama_batch_allocr::init(
    const llama_batch & batch_inp,
    const llama_vocab & vocab,
    const llama_memory_i * memory,
    uint32_t n_embd,
    uint32_t n_seq_max,
    bool output_all);

const llama_batch & llama_batch_allocr::get_batch() const;
uint32_t llama_batch_allocr::get_n_tokens()  const;
uint32_t llama_batch_allocr::get_n_outputs() const;

void llama_batch_allocr::split_reset();
llama_ubatch llama_batch_allocr::split_simple(uint32_t n_ubatch);
llama_ubatch llama_batch_allocr::split_equal(uint32_t n_ubatch, bool sequential);
llama_ubatch llama_batch_allocr::split_seq(uint32_t n_ubatch);

Import

#include "llama-batch.h"

I/O Contract

Inputs

Name Type Required Description
batch_inp const llama_batch & Yes User-provided input batch
vocab const llama_vocab & Yes Vocabulary for token validation
memory const llama_memory_i * No Memory system for position tracking
n_ubatch uint32_t Yes Maximum micro-batch size (passed to the split_* functions, not to init)

Outputs

Name Type Description
ubatch llama_ubatch Split micro-batch ready for compute graph
n_tokens uint32_t Total number of tokens in the batch
n_outputs uint32_t Number of output positions in the batch
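The n_outputs value depends on how init() fills in missing output flags: when the caller supplies no per-token flags, the output_all argument decides whether every token or only the last one produces output. A hedged illustration of that counting rule follows; count_outputs is an invented name, not the library's actual code.

```cpp
// Hedged sketch of how n_outputs could be derived from per-token output
// flags, with the output_all fallback described for init().
// Not the real llama_batch_allocr implementation.
#include <cstdint>
#include <vector>

uint32_t count_outputs(const std::vector<int8_t> & logits,
                       uint32_t n_tokens, bool output_all) {
    if (logits.empty()) {
        // No explicit flags: either every token produces output (output_all)
        // or only the final token does.
        return output_all ? n_tokens : 1;
    }
    // Explicit flags: count the tokens marked for output.
    uint32_t n = 0;
    for (uint32_t i = 0; i < n_tokens && i < logits.size(); ++i) {
        if (logits[i]) ++n;
    }
    return n;
}
```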

Usage Examples

#include "llama-batch.h"

// Create batch allocator (1 position per embedding)
llama_batch_allocr allocr(1);

// Validate the user batch and fill in missing metadata;
// init() returns false on invalid input (e.g. out-of-range token IDs)
if (!allocr.init(batch, vocab, memory, n_embd, n_seq_max, false)) {
    // handle invalid batch
}

// Split into micro-batches of at most n_ubatch tokens
allocr.split_reset();
while (true) {
    llama_ubatch ubatch = allocr.split_simple(n_ubatch);
    if (ubatch.n_tokens == 0) {
        break; // no tokens left
    }
    // Process ubatch through the compute graph
}
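The equal strategy described earlier draws the same number of tokens from every sequence that still has tokens remaining, keeping sequence groups equal-sized. A rough, self-contained sketch of that idea (take_equal is an invented helper, not part of llama.cpp, and the real split_equal handles positions and coupling on top of this):

```cpp
// Rough sketch of the "equal" splitting idea: one micro-batch takes an
// equal share of tokens from each non-empty sequence. Invented helper;
// the real llama_batch_allocr::split_equal does considerably more.
#include <algorithm>
#include <cstdint>
#include <vector>

// remaining[i] = tokens left in sequence i. Deducts one micro-batch's
// equal per-sequence share (capped by n_ubatch and the shortest sequence)
// and returns that share; returns 0 when all sequences are exhausted.
uint32_t take_equal(std::vector<uint32_t> & remaining, uint32_t n_ubatch) {
    uint32_t n_seqs   = 0;
    uint32_t min_left = UINT32_MAX;
    for (uint32_t r : remaining) {
        if (r > 0) { ++n_seqs; min_left = std::min(min_left, r); }
    }
    if (n_seqs == 0) return 0;
    const uint32_t per_seq =
        std::min(min_left, std::max<uint32_t>(1, n_ubatch / n_seqs));
    for (uint32_t & r : remaining) {
        if (r > 0) r -= per_seq;
    }
    return per_seq;
}
```

For example, two sequences of 3 tokens each under a 4-token cap yield a first micro-batch taking 2 tokens per sequence, then one taking 1 per sequence.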
