Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Duckdb Duckdb PDQSort

From Leeroopedia


Knowledge Sources
Domains Sorting, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

PDQSort (Pattern-Defeating Quicksort) is a hybrid sorting algorithm by Orson Peters that combines quicksort, insertion sort, and heap sort with pattern-detection heuristics to achieve robust O(n log n) performance while gracefully handling adversarial and pre-sorted input patterns.

Description

The implementation resides in the duckdb_pdqsort namespace as a single header file. Key design elements:

  • Hybrid approach: Partitions below insertion_sort_threshold (24 elements) are sorted using insertion sort. The main algorithm uses quicksort with a fallback to partial insertion sort when the data is detected to be nearly sorted.
  • Pivot selection: Uses median-of-three for small partitions and Tukey's ninther (median of medians of three) for partitions above ninther_threshold (128 elements), which provides excellent pivot selection resistant to adversarial inputs.
  • Pattern detection: When a partition is detected as already partitioned (the pivot ends up at the same position as if the data were sorted), the algorithm attempts insertion sort with a limited number of moves (partial_insertion_sort_limit = 8). If insertion sort fails, it swaps elements to break up patterns and decrements a "bad partition" counter.
  • Bad partition limit: After log2(n) bad partitions, the algorithm avoids worst-case O(n^2) by shuffling pivot candidates, effectively defeating adversarial patterns.
  • Branchless variant: pdqsort_branchless uses branchless comparison-and-swap operations for arithmetic types, which can improve performance on modern CPUs by avoiding branch mispredictions during partitioning.
  • Block partitioning: Uses a block size of 64 elements to exploit cache locality and enable loop unrolling during the partitioning step.

Usage

DuckDB uses PDQSort as its primary in-place comparison sort. It is also used as the fallback sorter within the ska_sort and vergesort adaptive sorting algorithms included in DuckDB's third-party dependencies. PDQSort is applied in ORDER BY processing, sort-based aggregation, merge joins, and any operator that requires sorted data.

Code Reference

Source Location

Signature

namespace duckdb_pdqsort {

// Standard pdqsort with a custom comparator
template<class Iter, class Compare>
inline void pdqsort(Iter begin, Iter end, Compare comp);

// Standard pdqsort with std::less<T> (default comparator)
template<class Iter>
inline void pdqsort(Iter begin, Iter end);

// Branchless variant with a custom comparator (faster for arithmetic types)
template<class Iter, class Compare>
inline void pdqsort_branchless(Iter begin, Iter end, Compare comp);

// Branchless variant with std::less<T>
template<class Iter>
inline void pdqsort_branchless(Iter begin, Iter end);

} // namespace duckdb_pdqsort

Import

#include "pdqsort/pdqsort.h"

I/O Contract

Inputs

Name Type Required Description
begin Iter (RandomAccessIterator) Yes Iterator to the first element of the range to sort
end Iter (RandomAccessIterator) Yes Iterator past the last element of the range to sort
comp Compare No Comparison function object; defaults to std::less<T>

Outputs

Name Type Description
[begin, end) sorted range (in-place) The input range is sorted in-place according to the comparator

Internal Parameters

Constant Value Description
insertion_sort_threshold 24 Partitions below this size use insertion sort
ninther_threshold 128 Partitions above this size use Tukey's ninther for pivot selection
partial_insertion_sort_limit 8 Maximum element moves attempted for partial insertion sort on detected sorted runs
block_size 64 Block size for the partitioning loop (must be a multiple of 8)
cacheline_size 64 Assumed cache line size in bytes

Usage Examples

#include "pdqsort/pdqsort.h"
#include <vector>

// Sort a vector of integers
std::vector<int> data = {5, 3, 8, 1, 9, 2, 7, 4, 6};
duckdb_pdqsort::pdqsort(data.begin(), data.end());

// Sort with a custom comparator (descending)
duckdb_pdqsort::pdqsort(data.begin(), data.end(), std::greater<int>());

// Use branchless variant for better performance on numeric data
std::vector<double> values = {3.14, 1.41, 2.72, 1.62};
duckdb_pdqsort::pdqsort_branchless(values.begin(), values.end());

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment