Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Ggml org Llama cpp KV Cells

From Leeroopedia
Knowledge Sources
Domains KV_Cache, Memory
Last Updated 2026-02-15 00:00 GMT

Overview

Declares the `llama_kv_cells` class and `llama_kv_cell_ext` struct for managing per-cell metadata in the KV cache.

Description

`llama_kv_cell_ext` stores 2D spatial positions (for M-RoPE). `llama_kv_cells` manages arrays of cell state: positions (`pos`), extensions (`ext`), shift amounts (`shift`), and sequence membership (`seq` as bitsets). It tracks used cells via an ordered set and maintains per-sequence position maps (`seq_pos`) for efficient min/max position queries. It provides operations for checking emptiness, copying/restoring cell ranges (for state save/load), adding/removing sequence membership, position shifting, and finding contiguous or sparse free slots. All operations maintain consistency between the position arrays, used-set, and seq_pos maps.

Usage

Use this module as the cell-level bookkeeping layer for the KV cache. It tracks which positions are occupied, which sequences they belong to, and where free space exists for new tokens.

Code Reference

Source Location

Signature

struct llama_kv_cell_ext {
    llama_pos x = 0;
    llama_pos y = 0;
    bool is_2d_gt(llama_pos ox, llama_pos oy) const;
    void reset();
};

class llama_kv_cells {
public:
    void reset();
    void reset_shift();
    uint32_t size() const;
    void resize(uint32_t n);
    bool is_empty(uint32_t i) const;
    uint32_t get_used() const;

    // Position and sequence management
    llama_pos get_pos(uint32_t i) const;
    bool has_seq_id(uint32_t i, llama_seq_id seq_id) const;
    void set_pos(uint32_t i, llama_pos pos);
    void seq_add(uint32_t i, llama_seq_id seq_id);
    void seq_rm(uint32_t i, llama_seq_id seq_id);

    // Slot finding
    uint32_t find_contiguous(uint32_t n) const;
    // ...

private:
    std::vector<llama_pos> pos;
    std::vector<llama_kv_cell_ext> ext;
    std::vector<llama_pos> shift;
    std::vector<std::bitset<LLAMA_MAX_SEQ>> seq;
    std::set<uint32_t> used;
    std::map<llama_seq_id, std::map<llama_pos, uint32_t>> seq_pos;
    bool has_shift = false;
};

Import

#pragma once
#include "llama.h"
#include "llama-cparams.h"
#include <bitset>
#include <cassert>
#include <cstring>
#include <map>
#include <set>
#include <vector>

I/O Contract

Inputs

Name Type Required Description
n uint32_t Yes Number of cells to allocate on resize
i uint32_t Yes Cell index for per-cell operations
seq_id llama_seq_id Yes Sequence identifier for membership operations
pos llama_pos Yes Token position to assign to a cell

Outputs

Name Type Description
is_empty bool Whether a specific cell is unoccupied
get_used uint32_t Total number of occupied cells
get_pos llama_pos Position stored in a specific cell
has_seq_id bool Whether a cell belongs to a given sequence
find_contiguous uint32_t Starting index of n contiguous free cells

Usage Examples

// Initialize cells
llama_kv_cells cells;
cells.resize(4096);

// Check and manage cell state
if (cells.is_empty(i)) {
    cells.set_pos(i, new_pos);
    cells.seq_add(i, seq_id);
}

// Query cell state
llama_pos p = cells.get_pos(i);
bool belongs = cells.has_seq_id(i, seq_id);
uint32_t n_used = cells.get_used();

// Find free slots
uint32_t start = cells.find_contiguous(n_tokens);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment