Principle:Duckdb Duckdb HTTP Communication

From Leeroopedia


Knowledge Sources
Domains: Networking, Data_Transfer
Last Updated: 2026-02-07 12:00 GMT

Overview

A request-response protocol for client-server communication over TCP/IP, providing methods for retrieving, submitting, and managing resources identified by URIs, with optional TLS encryption for secure transport.

Description

The Hypertext Transfer Protocol (HTTP) is the foundational application-layer protocol of the World Wide Web. It defines a stateless request-response model where a client sends a request message containing a method (GET, POST, PUT, DELETE, etc.), a target URI, headers, and an optional body, and the server responds with a status code, headers, and an optional body.

HTTP methods define the semantics of each request. GET retrieves a resource without side effects. POST submits data to a resource for processing. PUT replaces a resource entirely. DELETE removes a resource. HEAD retrieves only the response headers (useful for checking file existence and size). Range requests (using the Range header) allow partial retrieval of a resource, which is critical for reading portions of remote files without downloading the entire file.
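As a minimal sketch of how a client might express the HEAD and Range semantics described above, the following builds request objects with Python's standard urllib without sending anything. The helper names head_request and range_request and the example URL are assumptions made for illustration; they are not part of DuckDB.

```python
from urllib.request import Request


def head_request(url: str) -> Request:
    """Build (but do not send) a HEAD request, e.g. to learn a file's size."""
    return Request(url, method="HEAD")


def range_request(url: str, first: int, last: int) -> Request:
    """Build (but do not send) a GET for the inclusive byte range first..last."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    # Byte ranges are inclusive on both ends: bytes=0-999 is 1000 bytes.
    return Request(url, headers={"Range": f"bytes={first}-{last}"}, method="GET")


# Hypothetical URL, used only to show the resulting header.
req = range_request("https://example.com/data/sales.parquet", 1024, 2047)
print(req.get_method(), req.get_header("Range"))
```

A server that honors the range answers with 206 Partial Content; a server that ignores it may answer 200 with the full body, so a robust client checks the status code before trusting the length.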

HTTPS (HTTP over TLS) adds transport-layer encryption to HTTP, providing confidentiality, integrity, and server authentication. Before any HTTP data is transmitted, the client and server perform a TLS handshake in which they negotiate a cipher suite, the server presents its certificate (a client certificate is optional), and both sides derive shared keys. As a result, third parties cannot read data in transit or modify it without detection.

Content encoding allows the response body to be compressed for transfer efficiency. Common encodings include gzip, deflate, and Brotli (sent as the token br). The client advertises the encodings it supports in the Accept-Encoding request header, and the server names the encoding it applied in the Content-Encoding response header.
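The negotiation can be illustrated with a small sketch that handles gzip only. The functions encode_body and decode_body are hypothetical names, and the parsing is deliberately simplified: it ignores quality values (q=...) in Accept-Encoding, which a real implementation would have to respect.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Server side: gzip the body only if the client advertised gzip."""
    # Simplification: strip any ";q=..." parameter instead of honoring it.
    offered = {tok.split(";")[0].strip().lower() for tok in accept_encoding.split(",")}
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}  # no Content-Encoding header: body is sent as-is


def decode_body(body: bytes, headers: dict[str, str]) -> bytes:
    """Client side: undo the encoding named in Content-Encoding, if any."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


payload = b"id,amount\n" + b"1,9.99\n" * 1000
wire, response_headers = encode_body(payload, "gzip, br")
restored = decode_body(wire, response_headers)
```

Note that a byte range applies to the encoded representation when Content-Encoding is in effect, so clients that rely on precise offsets into a file often request it without compression.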

Usage

HTTP communication serves several purposes in DuckDB. The httpfs extension lets DuckDB read CSV, Parquet, and JSON files from HTTP/HTTPS URLs, whether they are hosted on web servers or on object storage services (S3, GCS, Azure Blob). DuckDB also uses HTTP to download extensions from its extension repository. Range requests let DuckDB read specific portions of a remote Parquet file, such as individual column chunks, without downloading the entire file, which is essential for efficient remote query processing.

Theoretical Basis

HTTP Request/Response Format:

// HTTP Request
METHOD SP Request-URI SP HTTP-Version CRLF
Header-Field: Header-Value CRLF
...
CRLF
[Message Body]

// Example GET request:
GET /data/sales.parquet HTTP/1.1
Host: example.com
Accept-Encoding: gzip, br
Range: bytes=1024-2047

// HTTP Response
HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Header-Field: Header-Value CRLF
...
CRLF
[Message Body]

// Example response:
HTTP/1.1 206 Partial Content
Content-Type: application/octet-stream
Content-Range: bytes 1024-2047/1048576
Content-Length: 1024
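To make the message layout concrete, here is a minimal sketch that parses the head of a raw HTTP/1.x response and its Content-Range header. The function names are hypothetical, and the parser is intentionally naive: it does not handle repeated headers or the obsolete folded-header syntax.

```python
import re


def parse_response_head(raw: bytes):
    """Split a raw HTTP/1.x response into (version, status, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")  # blank line ends the head
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # the reason phrase may contain spaces
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, status, reason, headers


def parse_content_range(value: str):
    """Parse 'bytes first-last/total'; total is None when given as '*'."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    total = None if m.group(3) == "*" else int(m.group(3))
    return int(m.group(1)), int(m.group(2)), total


raw = (b"HTTP/1.1 206 Partial Content\r\n"
       b"Content-Type: application/octet-stream\r\n"
       b"Content-Range: bytes 1024-2047/1048576\r\n"
       b"Content-Length: 1024\r\n\r\n")
version, status, reason, headers = parse_response_head(raw)
first, last, total = parse_content_range(headers["content-range"])
```

Because ranges are inclusive, last - first + 1 should equal Content-Length for a single-range 206 response, which is a cheap consistency check.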

Range Requests: Partial resource retrieval:

// Single range request
Request:  Range: bytes=0-999
Response: 206 Partial Content
          Content-Range: bytes 0-999/8000

// Suffix range (last N bytes)
Request:  Range: bytes=-500
Response: 206 Partial Content
          Content-Range: bytes 7500-7999/8000

// Use case for Parquet: read file footer
// (the file ends with: metadata | 4-byte metadata length | 4-byte magic "PAR1")
// 1. HEAD request -> get Content-Length (file size)
// 2. GET with Range: bytes=(size-8)-(size-1) -> read footer length and magic
// 3. GET with Range: bytes=(size-8-footer_len)-(size-9) -> read metadata
// 4. GET with Range for specific column chunks as needed
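The footer-reading steps come down to byte-range arithmetic, sketched below. It relies on the Parquet file layout, in which the file ends with the serialized metadata, then a 4-byte little-endian metadata length, then the 4-byte magic "PAR1". The function names are hypothetical and this is not DuckDB's reader; it only computes the ranges a client would request.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte metadata length + 4-byte magic


def tail_range(file_size: int) -> tuple[int, int]:
    """Inclusive byte range covering the last 8 bytes of the file."""
    if file_size < 4 + TAIL_LEN:  # leading magic plus tail
        raise ValueError("too small to be a Parquet file")
    return file_size - TAIL_LEN, file_size - 1


def parse_tail(tail: bytes) -> int:
    """Return the metadata length encoded in the 8-byte tail."""
    if len(tail) != TAIL_LEN or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing Parquet magic")
    (metadata_len,) = struct.unpack("<I", tail[:4])  # little-endian uint32
    return metadata_len


def metadata_range(file_size: int, metadata_len: int) -> tuple[int, int]:
    """Inclusive byte range of the metadata, which sits just before the tail."""
    last = file_size - TAIL_LEN - 1
    first = last - metadata_len + 1
    if first < 4:  # bytes 0..3 hold the leading magic
        raise ValueError("metadata length exceeds file size")
    return first, last


def range_header(first: int, last: int) -> str:
    return f"bytes={first}-{last}"


size = 1_048_576                               # as learned from a HEAD request
fake_tail = struct.pack("<I", 2000) + PARQUET_MAGIC  # stand-in for the fetched bytes
meta_first, meta_last = metadata_range(size, parse_tail(fake_tail))
```

A client that wants to save a round trip can fetch a larger suffix speculatively (for example with a suffix range such as bytes=-65536) and fall back to a second request only when the metadata turns out to be longer.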

Connection Management:

// HTTP/1.1 persistent connections (default)
Connection: keep-alive    // reuse TCP connection for multiple requests
Connection: close         // close after this request/response

// Connection pooling for database workloads:
pool = ConnectionPool(max_connections=N)
for each request:
    conn = pool.acquire(host, port)  // reuse or create
    send_request(conn)
    response = read_response(conn)
    pool.release(conn)               // return for reuse

// Benefits:
// - Avoids TCP handshake overhead per request
// - Avoids TLS handshake overhead per request
// - Critical for Parquet reading (many small range requests)
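The pooling pseudocode above can be sketched with Python's standard http.client. This is an illustrative, simplified variant: the pool is bound to a single host, max_connections only bounds how many idle connections are kept (not how many are open at once), and the class is not DuckDB's implementation.

```python
import http.client
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Tiny single-host pool of idle keep-alive connections (illustrative only)."""

    def __init__(self, host: str, port: int = 443, max_connections: int = 4):
        self.host, self.port = host, port
        self._idle = queue.LifoQueue(maxsize=max_connections)
        self.created = 0  # how many connection objects were constructed

    def acquire(self) -> http.client.HTTPSConnection:
        try:
            return self._idle.get_nowait()  # reuse: skips TCP and TLS handshakes
        except queue.Empty:
            self.created += 1
            # No socket is opened here; http.client connects on the first request.
            return http.client.HTTPSConnection(self.host, self.port, timeout=10)

    def release(self, conn: http.client.HTTPSConnection) -> None:
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            conn.close()  # pool already holds enough idle connections

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        except Exception:
            conn.close()  # state is unknown after an error: do not reuse
            raise
        else:
            self.release(conn)


pool = ConnectionPool("example.com", max_connections=2)
# Usage (requires network access, so it is left commented out):
# with pool.connection() as conn:
#     conn.request("GET", "/data/sales.parquet", headers={"Range": "bytes=0-999"})
#     body = conn.getresponse().read()  # read fully before the connection is reused
```

The response body must be read to the end before the connection goes back to the pool; otherwise the next request on that connection would see leftover bytes from the previous response.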

TLS Handshake: Establishing secure connection:

// Simplified TLS 1.3 handshake
Client -> Server: ClientHello
    supported_versions, cipher_suites, key_share

Server -> Client: ServerHello
    selected_version, selected_cipher, key_share
Server -> Client: EncryptedExtensions, Certificate, CertificateVerify, Finished

Client -> Server: Finished
// Handshake complete, encrypted HTTP begins

// After handshake, all HTTP data is encrypted:
// plaintext -> TLS record (AEAD: encrypt + authenticate) -> TCP
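In application code the handshake is usually delegated to a TLS library. The sketch below uses Python's standard ssl module to configure a client that verifies the server certificate and hostname; the function names and the choice of TLS 1.2 as the minimum version are assumptions for illustration, and open_tls is defined but not called because it needs network access.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the certificate chain and the hostname."""
    ctx = ssl.create_default_context()  # loads the system's trusted CA certificates
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def open_tls(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and run the TLS handshake on top of it."""
    sock = socket.create_connection((host, port), timeout=10)
    try:
        # The handshake runs inside wrap_socket; server_hostname is sent to the
        # server (SNI) and is also the name checked against the certificate.
        return make_client_context().wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


ctx = make_client_context()
```

After a successful handshake, the returned socket's version() method reports the negotiated protocol version, which is one way to confirm whether TLS 1.3 was actually selected.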
