Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Duckdb Duckdb Skip List Data Structure

From Leeroopedia


Knowledge Sources
Domains Data_Structure, Probabilistic_Algorithm
Last Updated 2026-02-07 12:00 GMT

Overview

A probabilistic ordered data structure that uses multiple layers of linked lists with geometrically decreasing density to provide O(log n) expected time for search, insertion, and deletion operations.

Description

A skip list is a data structure that allows fast search, insertion, and deletion within an ordered sequence of elements. Invented by William Pugh in 1990, it provides an alternative to balanced binary search trees (AVL trees, red-black trees) with simpler implementation and comparable expected performance.

The structure consists of multiple layers of sorted linked lists. The bottom layer (level 0) contains all elements. Each higher layer contains a randomly selected subset of the elements from the layer below, with each element being promoted to the next level with probability p (typically 1/2 or 1/4). This creates a hierarchy where the top levels act as "express lanes" that allow the search to skip over large portions of the list.

Search begins at the top-left corner and proceeds right along each level until the next element would overshoot the target, then drops down one level and continues. This is analogous to a binary search, and the expected number of comparisons is O(log n). Unlike balanced trees, skip lists do not require complex rebalancing operations; the probabilistic level assignment during insertion naturally maintains balance in expectation.

Skip lists are particularly well-suited for applications that require maintaining a sorted collection with frequent insertions and deletions, such as sliding window computations where elements enter and leave the window as it moves across data.

Usage

Skip lists are used in DuckDB for efficiently computing rolling window aggregate functions that require ordered data, particularly the rolling median and quantile computations. When processing window functions like `MEDIAN() OVER (ORDER BY ... ROWS BETWEEN ...)`, the skip list maintains the current window's values in sorted order, enabling O(log n) insertion of new values and removal of expired values as the window slides, with O(1) access to the median position.

Theoretical Basis

Structure: Multiple layers of sorted linked lists:

Level 3: head --------------------------> 50 ---------> nil
Level 2: head -----------> 20 ----------> 50 --> 70 --> nil
Level 1: head --> 10 ----> 20 --> 30 ---> 50 --> 70 --> nil
Level 0: head --> 10 -> 15 -> 20 -> 30 -> 50 -> 60 -> 70 -> 80 -> nil

Search Algorithm:

function search(skip_list, target):
    node = skip_list.head
    for level in max_level downto 0:
        while node.next[level] != nil AND node.next[level].value < target:
            node = node.next[level]   // move right
        // drop down to next level
    node = node.next[0]
    if node != nil AND node.value == target:
        return node
    return NOT_FOUND

Insertion with Random Level:

function random_level(p = 0.5, max_level):
    level = 0
    while random() < p AND level < max_level:
        level += 1
    return level

function insert(skip_list, value):
    // Find insertion point at each level
    update = array[max_level]   // predecessors at each level
    node = skip_list.head
    for level in max_level downto 0:
        while node.next[level] != nil AND node.next[level].value < value:
            node = node.next[level]
        update[level] = node

    // Create new node with random level
    new_level = random_level()
    new_node = Node(value, level = new_level)
    for level in 0 to new_level:
        new_node.next[level] = update[level].next[level]
        update[level].next[level] = new_node

Complexity Analysis:

Expected height: O(log n)
Expected search time: O(log n)
Expected insertion time: O(log n)
Expected deletion time: O(log n)
Space: O(n) expected (each element in ~1/(1-p) levels on average)

With p = 1/2:
  Average pointers per node: 2
  Average search comparisons: (log2 n) / 2 + O(1)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment