Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Duckdb Duckdb VergeSort

From Leeroopedia


Knowledge Sources
Domains Sorting, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

VergeSort is an adaptive hybrid sorting algorithm by Morwenn that detects and exploits existing sorted runs (both ascending and descending) in the input data, merging them efficiently while falling back to a configurable comparison sort for unsorted regions.

Description

The implementation resides in the duckdb_vergesort namespace and provides a general-purpose sorting algorithm with the following characteristics:

  • Run detection: The algorithm scans the input for naturally occurring sorted runs. It detects both ascending runs (where consecutive elements satisfy the comparator) and descending runs (which are reversed in-place to become ascending). Runs are identified using a custom is_sorted_until implementation.
  • Minimum run threshold: For random-access iterators, runs must span at least n / log2(n) elements to be considered significant. Shorter runs are grouped with unsorted regions to avoid excessive fragmentation. For bidirectional iterators, the threshold is fixed at 80 elements.
  • Fallback sort: Unsorted regions between detected runs are sorted using a configurable fallback sorter. In DuckDB's configuration, this defaults to quicksort (via the included detail/quicksort.h). The fallback is provided as a template parameter, allowing integration with pdqsort or ska_sort.
  • Three-way merge: The algorithm includes an optimized inplace_merge3 function that merges three sorted regions by choosing the merge order that minimizes the total number of comparisons based on the relative sizes of the regions.
  • Final merge pass: After all runs are identified and unsorted regions are sorted, the algorithm performs pairwise in-place merges using std::inplace_merge until a single sorted run remains.
  • Iterator support: The algorithm dispatches on iterator category, providing specialized implementations for both bidirectional and random-access iterators.

Usage

DuckDB uses vergesort as an adaptive front-end to its sorting pipeline. When data has significant pre-existing order (e.g., partially sorted results from index scans or time-series data), vergesort can exploit these runs to achieve near-linear performance. It is called by ska_sort's header, which includes vergesort as a dependency for its fallback path.

Code Reference

Source Location

Signature

namespace duckdb_vergesort {

// Sort with a custom comparator and fallback sorter
template<typename BidirectionalIterator, typename Compare, typename Fallback>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
               Compare compare, Fallback fallback);

// Sort with a custom fallback sorter and default comparator (std::less<T>)
template<typename BidirectionalIterator, typename Fallback>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
               Fallback fallback);

namespace detail {

// Optimized three-way in-place merge
template<typename BidirectionalIterator, typename Compare>
void inplace_merge3(BidirectionalIterator first,
                    BidirectionalIterator middle1,
                    BidirectionalIterator middle2,
                    BidirectionalIterator last,
                    Compare compare);

// Bidirectional iterator specialization
template<typename BidirectionalIterator, typename Compare>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
               Compare compare, std::bidirectional_iterator_tag);

// Random-access iterator specialization
template<typename RandomAccessIterator, typename Compare, typename Fallback>
void vergesort(RandomAccessIterator first, RandomAccessIterator last,
               Compare compare, std::random_access_iterator_tag, Fallback fallback);

} // namespace detail
} // namespace duckdb_vergesort

Import

#include "vergesort/vergesort.h"

I/O Contract

Inputs

Name Type Required Description
first BidirectionalIterator Yes Iterator to the first element of the range to sort
last BidirectionalIterator Yes Iterator past the last element of the range to sort
compare Compare No Comparison function object; defaults to std::less<T>
fallback Fallback (callable) Yes Fallback sort function called as fallback(first, last) for unsorted regions

Outputs

Name Type Description
[first, last) sorted range (in-place) The input range is sorted in-place according to the comparator

Usage Examples

#include "vergesort/vergesort.h"
#include "pdqsort/pdqsort.h"
#include <vector>
#include <functional>

// Sort a vector using vergesort with pdqsort as the fallback
std::vector<int> data = {1, 2, 3, 8, 7, 6, 4, 5, 9, 10};

// Using a lambda fallback to pdqsort
duckdb_vergesort::vergesort(data.begin(), data.end(), std::less<int>(),
    [](auto first, auto last) {
        duckdb_pdqsort::pdqsort(first, last);
    });

// Partially sorted data benefits most from vergesort:
// Runs [1,2,3] and [6,7,8] are detected and preserved,
// only the unsorted gaps are sorted with the fallback.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment