Implementation:Duckdb Duckdb SkipList
| Knowledge Sources | |
|---|---|
| Domains | Data_Structures, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
SkipList is a generic ordered data structure by Paul Ross that maintains elements in sorted order with O(log n) insert, remove, and indexed access, used by DuckDB to implement efficient rolling median and other order-statistic operations.
Description
The implementation resides in the OrderedStructs::SkipList namespace and consists of several template classes:
- HeadNode<T, _Compare> -- The main skip list container and entry point. It manages a probabilistic multi-level linked list where level 0 contains all nodes, and each higher level ideally spans every 2^n nodes. The HeadNode provides:
insert(const T &value)-- Insert a value, maintaining sorted orderremove(const T &value)-- Remove and return a valuehas(const T &value)-- Check for existenceat(size_t index)-- Indexed access in O(log n) time via width-based traversalindex(const T &value)-- Find the index of a valuesize()-- Return the number of elements
- Node<T, _Compare> -- Represents an individual element in the skip list. Each node has a value and a
SwappableNodeRefStackof forward references to other nodes at various levels.
- SwappableNodeRefStack<T, _Compare> / NodeRef<T, _Compare> -- Manages the per-node stack of forward references. Each
NodeRefholds a pointer to the next node at that level plus a width (the number of level-0 nodes spanned), enabling efficient O(log n) indexed access by accumulating widths during traversal.
- RollingMedian -- A utility built on top of the skip list that computes rolling medians over a sliding window. It provides
dest_size()anddest_count()helpers for sizing the output buffer and error-checked computation viaRollingMedianResultcodes.
Node heights are determined probabilistically at insertion time by a virtual coin toss, creating the multi-level structure without requiring explicit rebalancing.
Usage
DuckDB uses the SkipList primarily for window function computations that require order-statistic queries, particularly the MEDIAN window function and other quantile-based operations over sliding windows. The O(log n) indexed access makes it efficient for repeatedly finding the middle element as the window slides.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
Signature
namespace OrderedStructs {
namespace SkipList {
template <typename T, typename _Compare = std::less<T>>
class HeadNode {
public:
HeadNode(_Compare cmp = _Compare());
virtual ~HeadNode();
// Lookup
bool has(const T &value) const;
const T &at(size_t index) const;
void at(size_t index, size_t count, std::vector<T> &dest) const;
size_t index(const T &value) const;
size_t size() const;
// Modification
void insert(const T &value);
T remove(const T &value);
// Inspection
size_t height() const;
size_t height(size_t idx) const;
size_t width(size_t idx, size_t level) const;
IntegrityCheck lacksIntegrity() const;
size_t size_of() const;
};
} // namespace SkipList
namespace RollingMedian {
enum RollingMedianResult {
ROLLING_MEDIAN_SUCCESS = 0,
ROLLING_MEDIAN_SOURCE_STRIDE,
ROLLING_MEDIAN_DESTINATION_STRIDE,
ROLLING_MEDIAN_WIN_LENGTH,
};
// Helpers
size_t dest_size(size_t count, size_t win_length, size_t dest_stride);
size_t dest_count(size_t count, size_t win_length);
} // namespace RollingMedian
} // namespace OrderedStructs
Import
#include "skiplist/SkipList.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| value | const T& | Yes (insert/remove/has) | The value to insert, remove, or search for |
| index | size_t | Yes (at) | Zero-based index for positional access into the sorted sequence |
| count | size_t | Yes (at range) | Number of consecutive values to retrieve starting from index |
| cmp | _Compare | No | Comparison function for ordering; defaults to std::less<T>
|
Outputs
| Name | Type | Description |
|---|---|---|
| return (has) | bool | True if the value exists in the skip list |
| return (at) | const T& | The value at the given sorted index; throws IndexError if out of range |
| return (remove) | T | The removed value; throws ValueError if not present |
| return (index) | size_t | Index of the first occurrence; throws ValueError if not found |
| return (size) | size_t | Number of elements currently in the skip list |
Usage Examples
#include "skiplist/SkipList.h"
// Create a skip list of doubles
OrderedStructs::SkipList::HeadNode<double> sl;
// Insert values
sl.insert(42.0);
sl.insert(21.0);
sl.insert(84.0);
sl.insert(63.0);
// Size and indexed access
size_t n = sl.size(); // 4
double median = sl.at(n / 2); // O(log n) access to the median element
// Check existence and find index
bool found = sl.has(42.0); // true
size_t idx = sl.index(42.0); // 1 (sorted position)
// Remove a value
double removed = sl.remove(21.0);
// Rolling median example: maintain a sliding window
OrderedStructs::SkipList::HeadNode<double> window;
// Add elements to the window
window.insert(5.0);
window.insert(3.0);
window.insert(8.0);
// Get median
double med = window.at(window.size() / 2);
// Slide: remove oldest, add newest
window.remove(5.0);
window.insert(1.0);