Implementation:Duckdb Duckdb VergeSort
| Knowledge Sources | |
|---|---|
| Domains | Sorting, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
VergeSort is an adaptive hybrid sorting algorithm by Morwenn that detects and exploits existing sorted runs (both ascending and descending) in the input data, merging them efficiently while falling back to a configurable comparison sort for unsorted regions.
Description
The implementation resides in the duckdb_vergesort namespace and provides a general-purpose sorting algorithm with the following characteristics:
- Run detection: The algorithm scans the input for naturally occurring sorted runs. It detects both ascending runs (where consecutive elements satisfy the comparator) and descending runs (which are reversed in-place to become ascending). Runs are identified using a custom
is_sorted_untilimplementation.
- Minimum run threshold: For random-access iterators, runs must span at least
n / log2(n)elements to be considered significant. Shorter runs are grouped with unsorted regions to avoid excessive fragmentation. For bidirectional iterators, the threshold is fixed at 80 elements.
- Fallback sort: Unsorted regions between detected runs are sorted using a configurable fallback sorter. In DuckDB's configuration, this defaults to quicksort (via the included
detail/quicksort.h). The fallback is provided as a template parameter, allowing integration with pdqsort or ska_sort.
- Three-way merge: The algorithm includes an optimized
inplace_merge3function that merges three sorted regions by choosing the merge order that minimizes the total number of comparisons based on the relative sizes of the regions.
- Final merge pass: After all runs are identified and unsorted regions are sorted, the algorithm performs pairwise in-place merges using
std::inplace_mergeuntil a single sorted run remains.
- Iterator support: The algorithm dispatches on iterator category, providing specialized implementations for both bidirectional and random-access iterators.
Usage
DuckDB uses vergesort as an adaptive front-end to its sorting pipeline. When data has significant pre-existing order (e.g., partially sorted results from index scans or time-series data), vergesort can exploit these runs to achieve near-linear performance. It is called by ska_sort's header, which includes vergesort as a dependency for its fallback path.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
Signature
namespace duckdb_vergesort {
// Sort with a custom comparator and fallback sorter
template<typename BidirectionalIterator, typename Compare, typename Fallback>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
Compare compare, Fallback fallback);
// Sort with a custom fallback sorter and default comparator (std::less<T>)
template<typename BidirectionalIterator, typename Fallback>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
Fallback fallback);
namespace detail {
// Optimized three-way in-place merge
template<typename BidirectionalIterator, typename Compare>
void inplace_merge3(BidirectionalIterator first,
BidirectionalIterator middle1,
BidirectionalIterator middle2,
BidirectionalIterator last,
Compare compare);
// Bidirectional iterator specialization
template<typename BidirectionalIterator, typename Compare>
void vergesort(BidirectionalIterator first, BidirectionalIterator last,
Compare compare, std::bidirectional_iterator_tag);
// Random-access iterator specialization
template<typename RandomAccessIterator, typename Compare, typename Fallback>
void vergesort(RandomAccessIterator first, RandomAccessIterator last,
Compare compare, std::random_access_iterator_tag, Fallback fallback);
} // namespace detail
} // namespace duckdb_vergesort
Import
#include "vergesort/vergesort.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| first | BidirectionalIterator | Yes | Iterator to the first element of the range to sort |
| last | BidirectionalIterator | Yes | Iterator past the last element of the range to sort |
| compare | Compare | No | Comparison function object; defaults to std::less<T>
|
| fallback | Fallback (callable) | Yes | Fallback sort function called as fallback(first, last) for unsorted regions
|
Outputs
| Name | Type | Description |
|---|---|---|
| [first, last) | sorted range (in-place) | The input range is sorted in-place according to the comparator |
Usage Examples
#include "vergesort/vergesort.h"
#include "pdqsort/pdqsort.h"
#include <vector>
#include <functional>
// Sort a vector using vergesort with pdqsort as the fallback
std::vector<int> data = {1, 2, 3, 8, 7, 6, 4, 5, 9, 10};
// Using a lambda fallback to pdqsort
duckdb_vergesort::vergesort(data.begin(), data.end(), std::less<int>(),
[](auto first, auto last) {
duckdb_pdqsort::pdqsort(first, last);
});
// Partially sorted data benefits most from vergesort:
// Runs [1,2,3] and [6,7,8] are detected and preserved,
// only the unsorted gaps are sorted with the fallback.