Implementation:Ggml org Llama cpp KV Cells
| Knowledge Sources | |
|---|---|
| Domains | KV_Cache, Memory |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the `llama_kv_cells` class and `llama_kv_cell_ext` struct for managing per-cell metadata in the KV cache.
Description
`llama_kv_cell_ext` stores 2D spatial positions (for M-RoPE). `llama_kv_cells` manages arrays of cell state: positions (`pos`), extensions (`ext`), shift amounts (`shift`), and sequence membership (`seq` as bitsets). It tracks used cells via an ordered set and maintains per-sequence position maps (`seq_pos`) for efficient min/max position queries. It provides operations for checking emptiness, copying/restoring cell ranges (for state save/load), adding/removing sequence membership, position shifting, and finding contiguous or sparse free slots. All operations maintain consistency between the position arrays, used-set, and seq_pos maps.
Usage
Use this module as the cell-level bookkeeping layer for the KV cache. It tracks which positions are occupied, which sequences they belong to, and where free space exists for new tokens.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-kv-cells.h
- Lines: 1-533
Signature
struct llama_kv_cell_ext {
llama_pos x = 0;
llama_pos y = 0;
bool is_2d_gt(llama_pos ox, llama_pos oy) const;
void reset();
};
class llama_kv_cells {
public:
void reset();
void reset_shift();
uint32_t size() const;
void resize(uint32_t n);
bool is_empty(uint32_t i) const;
uint32_t get_used() const;
// Position and sequence management
llama_pos get_pos(uint32_t i) const;
bool has_seq_id(uint32_t i, llama_seq_id seq_id) const;
void set_pos(uint32_t i, llama_pos pos);
void seq_add(uint32_t i, llama_seq_id seq_id);
void seq_rm(uint32_t i, llama_seq_id seq_id);
// Slot finding
uint32_t find_contiguous(uint32_t n) const;
// ...
private:
std::vector<llama_pos> pos;
std::vector<llama_kv_cell_ext> ext;
std::vector<llama_pos> shift;
std::vector<std::bitset<LLAMA_MAX_SEQ>> seq;
std::set<uint32_t> used;
std::map<llama_seq_id, std::map<llama_pos, uint32_t>> seq_pos;
bool has_shift = false;
};
Import
#pragma once
#include "llama.h"
#include "llama-cparams.h"
#include <bitset>
#include <cassert>
#include <cstring>
#include <map>
#include <set>
#include <vector>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | uint32_t | Yes | Number of cells to allocate on resize |
| i | uint32_t | Yes | Cell index for per-cell operations |
| seq_id | llama_seq_id | Yes | Sequence identifier for membership operations |
| pos | llama_pos | Yes | Token position to assign to a cell |
Outputs
| Name | Type | Description |
|---|---|---|
| is_empty | bool | Whether a specific cell is unoccupied |
| get_used | uint32_t | Total number of occupied cells |
| get_pos | llama_pos | Position stored in a specific cell |
| has_seq_id | bool | Whether a cell belongs to a given sequence |
| find_contiguous | uint32_t | Starting index of n contiguous free cells |
Usage Examples
// Initialize cells
llama_kv_cells cells;
cells.resize(4096);
// Check and manage cell state
if (cells.is_empty(i)) {
cells.set_pos(i, new_pos);
cells.seq_add(i, seq_id);
}
// Query cell state
llama_pos p = cells.get_pos(i);
bool belongs = cells.has_seq_id(i, seq_id);
uint32_t n_used = cells.get_used();
// Find free slots
uint32_t start = cells.find_contiguous(n_tokens);