Implementation:Ggml org Llama cpp KV Cells

Knowledge Sources	Ggml_org_Llama_cpp
Domains	KV_Cache, Memory
Last Updated	2026-02-15 00:00 GMT

Overview

Declares the `llama_kv_cells` class and `llama_kv_cell_ext` struct for managing per-cell metadata in the KV cache.

Description

`llama_kv_cell_ext` stores 2D spatial positions (for M-RoPE). `llama_kv_cells` manages arrays of cell state: positions (`pos`), extensions (`ext`), shift amounts (`shift`), and sequence membership (`seq` as bitsets). It tracks used cells via an ordered set and maintains per-sequence position maps (`seq_pos`) for efficient min/max position queries. It provides operations for checking emptiness, copying/restoring cell ranges (for state save/load), adding/removing sequence membership, position shifting, and finding contiguous or sparse free slots. All operations maintain consistency between the position arrays, used-set, and seq_pos maps.

Usage

Use this module as the cell-level bookkeeping layer for the KV cache. It tracks which positions are occupied, which sequences they belong to, and where free space exists for new tokens.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-kv-cells.h
Lines: 1-533

Signature

struct llama_kv_cell_ext {
    llama_pos x = 0;
    llama_pos y = 0;
    bool is_2d_gt(llama_pos ox, llama_pos oy) const;
    void reset();
};

class llama_kv_cells {
public:
    void reset();
    void reset_shift();
    uint32_t size() const;
    void resize(uint32_t n);
    bool is_empty(uint32_t i) const;
    uint32_t get_used() const;

    // Position and sequence management
    llama_pos get_pos(uint32_t i) const;
    bool has_seq_id(uint32_t i, llama_seq_id seq_id) const;
    void set_pos(uint32_t i, llama_pos pos);
    void seq_add(uint32_t i, llama_seq_id seq_id);
    void seq_rm(uint32_t i, llama_seq_id seq_id);

    // Slot finding
    uint32_t find_contiguous(uint32_t n) const;
    // ...

private:
    std::vector<llama_pos> pos;
    std::vector<llama_kv_cell_ext> ext;
    std::vector<llama_pos> shift;
    std::vector<std::bitset<LLAMA_MAX_SEQ>> seq;
    std::set<uint32_t> used;
    std::map<llama_seq_id, std::map<llama_pos, uint32_t>> seq_pos;
    bool has_shift = false;
};

Import

#pragma once
#include "llama.h"
#include "llama-cparams.h"
#include <bitset>
#include <cassert>
#include <cstring>
#include <map>
#include <set>
#include <vector>

I/O Contract

Inputs

Name	Type	Required	Description
n	uint32_t	Yes	Number of cells to allocate on resize
i	uint32_t	Yes	Cell index for per-cell operations
seq_id	llama_seq_id	Yes	Sequence identifier for membership operations
pos	llama_pos	Yes	Token position to assign to a cell

Outputs

Name	Type	Description
is_empty	bool	Whether a specific cell is unoccupied
get_used	uint32_t	Total number of occupied cells
get_pos	llama_pos	Position stored in a specific cell
has_seq_id	bool	Whether a cell belongs to a given sequence
find_contiguous	uint32_t	Starting index of n contiguous free cells

Usage Examples

// Initialize cells
llama_kv_cells cells;
cells.resize(4096);

// Check and manage cell state
if (cells.is_empty(i)) {
    cells.set_pos(i, new_pos);
    cells.seq_add(i, seq_id);
}

// Query cell state
llama_pos p = cells.get_pos(i);
bool belongs = cells.has_seq_id(i, seq_id);
uint32_t n_used = cells.get_used();

// Find free slots
uint32_t start = cells.find_contiguous(n_tokens);

Related Pages

Principle:Ggml_org_Llama_cpp_KVCacheManagement

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment