Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Duckdb Duckdb SKA Sort

From Leeroopedia


Knowledge Sources
Domains Sorting, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

SKA Sort is an in-place radix sort implementation by Malte Skarupke that automatically decomposes keys byte-by-byte to achieve O(n * k) sorting performance (where k is the key size in bytes), falling back to comparison-based sorting for small partitions.

Description

The implementation resides in the duckdb_ska_sort namespace as a single header file. Key design elements:

  • Hybrid radix sort: The algorithm performs MSD (Most Significant Digit) radix sort by extracting one byte at a time from the key using a customizable ExtractKey functor. For each byte position, it uses a counting sort pass to partition elements into 256 buckets.
  • Adaptive fallback: Partitions smaller than StdSortThreshold (128 elements) fall back to pdqsort (pattern-defeating quicksort) for comparison-based sorting. Partitions smaller than AmericanFlagSortThreshold (1024 elements) use American flag sort (an in-place variant of MSD radix sort) instead of the full counting sort.
  • Type decomposition: The library provides SubKey template specializations that know how to decompose various types into their byte representations for radix sorting. This includes support for:
    • Integral types (signed and unsigned)
    • Floating-point types (via to_unsigned_or_bool conversion)
    • Booleans, chars, and wide character types
    • Tuples and pairs (via recursive decomposition)
    • Strings (variable-length, sorted lexicographically byte-by-byte)
  • Counting sort optimization: The counting sort implementation includes a fast-path: if all elements map to the same bucket (only one nonzero count), the data movement step is skipped entirely.
  • In-place operation: Unlike out-of-place radix sorts, ska_sort modifies the input range directly, using the InplaceSorter template to manage recursive partitioning without allocating additional arrays.

Usage

DuckDB uses ska_sort for sorting operations where the key type has a natural byte decomposition (e.g., integers, fixed-width types). The radix-based approach avoids comparison overhead entirely for these types, providing significant speedups over comparison sorts for large datasets. It integrates with pdqsort and vergesort as part of DuckDB's sorting strategy.

Code Reference

Source Location

Signature

namespace duckdb_ska_sort {

// Sort a range using radix sort with a custom key extraction function
template<typename It, typename ExtractKey>
static void ska_sort(It begin, It end, ExtractKey && extract_key);

// Sort a range using radix sort with identity key extraction
template<typename It>
static void ska_sort(It begin, It end);

namespace detail {

// Internal: counting sort for a single byte pass
template<typename count_type, typename It, typename OutIt, typename ExtractKey>
void counting_sort_impl(It begin, It end, OutIt out_begin, ExtractKey && extract_key);

// Internal: in-place radix sort with configurable thresholds
template<size_t StdSortThreshold, size_t AmericanFlagSortThreshold, typename It, typename ExtractKey>
void inplace_radix_sort(It begin, It end, ExtractKey & extract_key);

// Identity functor for default key extraction
struct IdentityFunctor {
    template<typename T>
    typename std::remove_reference<T>::type && operator()(T && i) const;
};

} // namespace detail
} // namespace duckdb_ska_sort

Import

#include "ska_sort/ska_sort.hpp"

I/O Contract

Inputs

Name Type Required Description
begin It (RandomAccessIterator) Yes Iterator to the first element of the range to sort
end It (RandomAccessIterator) Yes Iterator past the last element of the range to sort
extract_key ExtractKey (callable) No Function that extracts the sort key from each element; defaults to identity

Outputs

Name Type Description
[begin, end) sorted range (in-place) The input range is sorted in ascending order of the extracted keys

Internal Parameters

Constant Value Description
StdSortThreshold 128 Partitions below this size fall back to pdqsort
AmericanFlagSortThreshold 1024 Partitions below this size use American flag sort instead of full counting sort

Usage Examples

#include "ska_sort/ska_sort.hpp"
#include <vector>

// Sort a vector of integers using radix sort
std::vector<int> data = {50, 30, 80, 10, 90, 20};
duckdb_ska_sort::ska_sort(data.begin(), data.end());

// Sort with a custom key extraction function
struct Record {
    int key;
    std::string value;
};
std::vector<Record> records = {{3, "c"}, {1, "a"}, {2, "b"}};
duckdb_ska_sort::ska_sort(records.begin(), records.end(),
    [](const Record &r) { return r.key; });

// Sort unsigned 64-bit integers (radix sort excels here)
std::vector<uint64_t> ids = {999999, 123456, 654321, 111111};
duckdb_ska_sort::ska_sort(ids.begin(), ids.end());

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment