Implementation:Ollama Ollama Llama Batch Types
| Knowledge Sources | |
|---|---|
| Domains | Inference, Batching |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring the internal batch representation (llama_ubatch) and batch allocator/splitter (llama_batch_allocr) used during inference.
Description
Defines llama_ubatch, the internal micro-batch structure. Its fields cover tokens, embeddings, positions (including the multi-dimensional positions used by M-RoPE), sequence IDs, unique-sequence tracking, and output flags. An inner data_t struct, held through a std::shared_ptr, owns the underlying data vectors. The llama_batch_allocr class initializes from the user-provided batch, validates tokens and sequences, auto-fills missing data, and splits the input into micro-batches using one of three strategies: split_simple, split_equal, or split_seq. It also tracks sequence positions and coupling information across the batch.
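The ownership arrangement described above (raw pointer fields next to a shared data_t that owns the vectors) can be illustrated with a reduced, self-contained sketch. Everything named toy_* here is hypothetical and exists only for illustration: the field subset, the typedefs, and the make_toy_ubatch helper are assumptions, not part of llama-batch.h.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real llama.cpp typedefs.
using toy_token = int32_t;
using toy_pos   = int32_t;

// Toy micro-batch: the raw pointers are views, `data` (if set) owns the storage.
struct toy_ubatch {
    uint32_t    n_tokens = 0;
    toy_token * token    = nullptr; // [n_tokens]
    toy_pos   * pos      = nullptr; // [n_tokens]
    int8_t    * output   = nullptr; // [n_tokens]

    struct data_t {
        std::vector<toy_token> token;
        std::vector<toy_pos>   pos;
        std::vector<int8_t>    output;
    };

    std::shared_ptr<data_t> data; // keeps the vectors alive across copies
};

// Build a micro-batch that owns its storage.
// Positions start at first_pos; only the last token is flagged as an output.
inline toy_ubatch make_toy_ubatch(const std::vector<toy_token> & tokens, toy_pos first_pos) {
    auto data = std::make_shared<toy_ubatch::data_t>();
    data->token = tokens;
    data->pos.resize(tokens.size());
    data->output.assign(tokens.size(), 0);
    for (size_t i = 0; i < tokens.size(); ++i) {
        data->pos[i] = first_pos + static_cast<toy_pos>(i);
    }
    if (!tokens.empty()) {
        data->output.back() = 1;
    }

    toy_ubatch ub;
    ub.n_tokens = static_cast<uint32_t>(tokens.size());
    ub.token    = data->token.data();
    ub.pos      = data->pos.data();
    ub.output   = data->output.data();
    ub.data     = std::move(data); // the heap-allocated vectors do not move, so the views stay valid
    return ub;
}
```

Because the vectors live behind a shared_ptr, copying the struct copies the pointer views and shares ownership, so a copy stays usable after the original goes away. This is likely the reason for pairing raw pointers with a shared owner, though the sketch does not claim to mirror the real field set.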
Usage
This header defines the core data structures for the inference pipeline. The llama_ubatch is the structure that is passed to the compute graph, so validating the input batch and splitting it correctly affects both correctness and performance.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-batch.h
- Lines: 1-173
Signature
struct llama_ubatch {
    bool equal_seqs() const;
    bool is_pos_2d() const;

    uint32_t b_equal_seqs;
    uint32_t n_tokens;
    uint32_t n_seq_tokens;
    uint32_t n_seqs;
    uint32_t n_seqs_unq;
    uint32_t n_pos;

    llama_token  *  token;
    float        *  embd;
    llama_pos    *  pos;
    int32_t      *  n_seq_id;
    llama_seq_id ** seq_id;
    llama_seq_id *  seq_id_unq;
    int32_t      *  seq_idx;
    int8_t       *  output;

    struct data_t { /* owning vectors */ };
    std::shared_ptr<data_t> data;
};

class llama_batch_allocr {
public:
    llama_batch_allocr(uint32_t n_pos_per_embd);

    bool init(const llama_batch & batch_inp, const llama_vocab & vocab,
              const llama_memory_i * memory, uint32_t n_embd,
              uint32_t n_seq_max, bool output_all);

    llama_ubatch split_simple(uint32_t n_ubatch);
    llama_ubatch split_equal(uint32_t n_ubatch, bool sequential);
    llama_ubatch split_seq(uint32_t n_ubatch);
};
Import
#include "llama-batch.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_pos_per_embd | uint32_t | Yes | Number of position dimensions per token (1 for standard, 4 for M-RoPE) |
| batch_inp | const llama_batch & | Yes | User-provided batch to process |
| n_ubatch | uint32_t | Yes | Maximum tokens per micro-batch |
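To make the role of n_ubatch concrete, here is a minimal toy model of the simplest splitting pattern: hand out consecutive chunks of at most n_ubatch tokens until nothing is left. The toy_splitter class, the toy_chunk struct, and the convention that an empty chunk signals the end are assumptions of this sketch. It does not reproduce the real llama_batch_allocr, which also handles sequences, positions, and outputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: a contiguous range of the input batch.
struct toy_chunk {
    uint32_t first    = 0; // index of the first token in the input batch
    uint32_t n_tokens = 0; // 0 means "nothing left to split" (assumed convention)
};

// Hypothetical splitter that hands out consecutive chunks of at most n_ubatch tokens.
class toy_splitter {
public:
    explicit toy_splitter(uint32_t n_tokens_total) : n_total(n_tokens_total) {}

    toy_chunk split_simple(uint32_t n_ubatch) {
        toy_chunk res;
        res.first    = n_used;
        res.n_tokens = std::min(n_ubatch, n_total - n_used);
        n_used += res.n_tokens;
        return res;
    }

private:
    uint32_t n_total;
    uint32_t n_used = 0;
};

// Drain the splitter and collect the size of every chunk it produced.
inline std::vector<uint32_t> toy_chunk_sizes(uint32_t n_tokens_total, uint32_t n_ubatch) {
    toy_splitter splitter(n_tokens_total);
    std::vector<uint32_t> sizes;
    while (true) {
        const toy_chunk chunk = splitter.split_simple(n_ubatch);
        if (chunk.n_tokens == 0) {
            break; // input exhausted
        }
        sizes.push_back(chunk.n_tokens);
    }
    return sizes;
}
```

With 1200 input tokens and n_ubatch set to 512, this toy splitter yields chunks of 512, 512, and 176 tokens. Only the final chunk may be smaller than the limit.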
Outputs
| Name | Type | Description |
|---|---|---|
| ubatch | llama_ubatch | Micro-batch with all fields populated |
| equal_seqs | bool | Whether all sequence sets have equal length |
Usage Examples
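#include "llama-batch.h"

// `allocr` is assumed to be a llama_batch_allocr on which init() has already succeeded
llama_ubatch ubatch = allocr.split_simple(512);

// Check ubatch properties
if (ubatch.equal_seqs()) {
    // Can use optimized equal-length processing
}
if (ubatch.is_pos_2d()) {
    // Multi-dimensional positions (M-RoPE)
}

// Access token data (token may be null when the batch carries embeddings instead)
if (ubatch.token != nullptr) {
    for (uint32_t i = 0; i < ubatch.n_tokens; ++i) {
        llama_token tok = ubatch.token[i];
        llama_pos   pos = ubatch.pos[i]; // with multi-dimensional positions, pos holds n_pos values per token
    }
}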
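When positions are multi-dimensional, a single index into pos is no longer enough to address every value. The helper below is a minimal sketch that assumes a dimension-major layout, where all tokens' values for dimension 0 come first, followed by dimension 1, and so on. That layout and the toy_pos_at name are assumptions made for illustration and should be checked against the actual source before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: pos has n_tokens * n_pos entries, stored one dimension after another.
// Returns the value of position dimension `dim` for token `i`.
inline int32_t toy_pos_at(const std::vector<int32_t> & pos, uint32_t n_tokens, uint32_t dim, uint32_t i) {
    assert(i < n_tokens);
    const size_t idx = static_cast<size_t>(dim) * n_tokens + i;
    assert(idx < pos.size());
    return pos[idx];
}
```

Under this assumed layout, reading pos[i] for i below n_tokens returns the first dimension only, which matches what the loop in the usage example does.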