
Principle:Duckdb Duckdb Artifact Compression And Upload

From Leeroopedia


Field         Value
sources       scripts/extension-upload-single.sh, scripts/extension-upload-all.sh
domains       Extension_Development, Distribution
last_updated  2026-02-07

Overview

Compressing and distributing build artifacts via cloud object storage. This principle governs the process of taking signed extension binaries, compressing them for efficient network transfer, and uploading them to versioned, platform-specific paths in S3-compatible cloud storage for end-user consumption.

Description

Once an extension binary has been built, annotated with metadata, and cryptographically signed, it must be made available for download by DuckDB users worldwide. This distribution step involves two operations:

Compression

Extension binaries are compressed before upload to reduce download times and storage costs. The compression format depends on the target platform:

  • Native platforms (Linux, macOS, Windows) -- gzip compression is used, producing .duckdb_extension.gz files. Gzip provides a good balance between compression ratio and decompression speed, and is universally supported across operating systems.
  • WebAssembly (WASM) platform -- brotli compression is used instead of gzip. Brotli achieves higher compression ratios than gzip, which is particularly beneficial for browser-based distribution where bandwidth is at a premium. Browsers also natively support Brotli content encoding.
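
The platform-dependent choice above can be sketched in a few lines of Python. This is an illustration only, not the logic of the upload scripts: the helper names `pick_compression` and `gzip_extension` are hypothetical, and the rule that WASM platform identifiers start with "wasm" is an assumption. Only the gzip path is implemented, because brotli is not part of the Python standard library.

```python
import gzip


def pick_compression(platform: str) -> str:
    """Return the name of the compression format for a platform string."""
    # Assumption: WASM platform identifiers begin with "wasm".
    return "brotli" if platform.startswith("wasm") else "gzip"


def gzip_extension(data: bytes) -> bytes:
    """Gzip-compress an extension binary that is already in memory."""
    return gzip.compress(data, compresslevel=9)


# Stand-in payload; a real extension binary would be read from disk instead.
payload = b"\x7fELF" + b"duckdb-extension-placeholder" * 200
compressed = gzip_extension(payload)
```

A brotli branch would follow the same shape with a third-party brotli binding or the `brotli` command-line tool; neither is shown here.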

Upload to Versioned Paths

Compressed extensions are uploaded to an S3-compatible object store using a versioned, platform-specific path scheme (shown here for a gzip-compressed native extension):

s3://<bucket>/<duckdb_version>/<platform>/<extension_name>.duckdb_extension.gz

This path structure provides:

  • Version isolation -- each DuckDB release has its own directory, preventing version conflicts
  • Platform isolation -- each platform has its own subdirectory, allowing the DuckDB runtime to construct the correct download URL based on the detected platform
  • Immutability -- once uploaded, artifacts at a given path are not overwritten (except during re-releases), enabling reliable caching and reproducibility
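
As a minimal sketch, the path scheme above can be built with plain string formatting. The function names are hypothetical, and the rejection of empty components or components containing "/" is an added safeguard, not something the source describes.

```python
def extension_key(duckdb_version: str, platform: str, extension_name: str) -> str:
    """Build the object key <duckdb_version>/<platform>/<name>.duckdb_extension.gz."""
    for part in (duckdb_version, platform, extension_name):
        # Safeguard (assumption): a stray "/" would add an unintended path level.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{duckdb_version}/{platform}/{extension_name}.duckdb_extension.gz"


def extension_s3_uri(bucket: str, duckdb_version: str, platform: str,
                     extension_name: str) -> str:
    """Prefix the object key with the bucket to form a full s3:// URI."""
    key = extension_key(duckdb_version, platform, extension_name)
    return f"s3://{bucket}/{key}"
```

For example, `extension_s3_uri("my-bucket", "v1.0.0", "linux_amd64", "my_ext")` yields a URI under `s3://my-bucket/v1.0.0/linux_amd64/`. All four argument values are placeholders.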

Why compression and deterministic paths matter for distribution:

  • Extension binaries can be several megabytes in size. Gzip typically achieves 50-70% compression on compiled binaries, significantly reducing download times.
  • DuckDB transparently decompresses extensions during the INSTALL command, so compression is invisible to end users.
  • The S3 path scheme allows the DuckDB runtime to construct download URLs deterministically from the DuckDB version and detected platform, requiring no index file or API endpoint.
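
To make the first two points concrete, the sketch below measures the size reduction gzip gives on a payload and reverses the compression, as a client would before loading the binary. It does not reproduce DuckDB's INSTALL implementation: `size_reduction` and `unpack_download` are hypothetical helpers, and the sample bytes are a stand-in for a real extension.

```python
import gzip


def size_reduction(original: bytes, compressed: bytes) -> float:
    """Fraction of bytes saved by compression (0.6 means 60% smaller)."""
    return 1.0 - len(compressed) / len(original)


def unpack_download(blob: bytes) -> bytes:
    """Decompress a gzip-compressed download back into the raw binary."""
    return gzip.decompress(blob)


# Stand-in data; the saving on a real compiled binary will differ.
sample = b"placeholder-extension-bytes" * 500
blob = gzip.compress(sample)
saved = size_reduction(sample, blob)
restored = unpack_download(blob)
```

The repetitive sample compresses much better than a real binary would, so `saved` here says nothing about the 50-70% figure quoted above.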

Usage

This principle applies after signing, to distribute artifacts to end users. It is the third step in the Extension Development and Distribution workflow, following build/metadata and signing, and preceding loading verification and release promotion.

Typical scenarios:

  • CI/CD pipelines upload extensions after each release build
  • Nightly builds upload to a separate nightly bucket for pre-release testing
  • Community extension authors upload to custom repositories

Theoretical Basis

  • Gzip/Brotli compression -- standard compression algorithms that trade CPU time for reduced data size. Gzip (built on the DEFLATE algorithm) is supported almost everywhere, which makes it a safe default for binary distribution; Brotli typically compresses better, which suits web delivery.
  • S3-style object storage with versioned paths -- a flat key-value store where the key encodes version and platform metadata, enabling efficient lookup without a database or index service. This pattern is widely used in package registries (e.g., Maven Central, npm registry).
