Principle:Duckdb Duckdb HTTP Communication
| Knowledge Sources | |
|---|---|
| Domains | Networking, Data_Transfer |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A request-response protocol for client-server communication over TCP/IP, providing methods for retrieving, submitting, and managing resources identified by URIs, with optional TLS encryption for secure transport.
Description
The Hypertext Transfer Protocol (HTTP) is the foundational application-layer protocol of the World Wide Web. It defines a stateless request-response model where a client sends a request message containing a method (GET, POST, PUT, DELETE, etc.), a target URI, headers, and an optional body, and the server responds with a status code, headers, and an optional body.
HTTP methods define the semantics of each request. GET retrieves a resource without side effects. POST submits data to a resource for processing. PUT replaces a resource entirely. DELETE removes a resource. HEAD retrieves only the response headers (useful for checking file existence and size). Range requests (using the Range header) allow partial retrieval of a resource, which is critical for reading portions of remote files without downloading the entire file.
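To make the Range header semantics concrete, the following is a minimal sketch of how a single byte range could be resolved against a resource of known size. The function names `resolve_range` and `content_range` are hypothetical and exist only for illustration. Multi-range requests are deliberately not handled.

```python
from typing import Tuple


def resolve_range(header_value: str, size: int) -> Tuple[int, int]:
    """Resolve a single-range 'Range' header value to inclusive byte offsets.

    Illustrative only: handles 'bytes=first-last', 'bytes=first-' and
    'bytes=-suffix_length'; multi-range requests are rejected.
    """
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single byte range is handled in this sketch")
    first_text, sep, last_text = spec.strip().partition("-")
    if not sep:
        raise ValueError("malformed range: " + header_value)
    if first_text == "":
        # Suffix range: the last N bytes of the resource.
        suffix_len = int(last_text)
        if suffix_len == 0 or size == 0:
            raise ValueError("unsatisfiable range")
        return max(size - suffix_len, 0), size - 1
    first = int(first_text)
    # An open-ended range runs to the end of the resource.
    last = int(last_text) if last_text else size - 1
    if first >= size or last < first:
        raise ValueError("unsatisfiable range")
    # A last offset past the end is clamped to the final byte.
    return first, min(last, size - 1)


def content_range(first: int, last: int, size: int) -> str:
    """Format the Content-Range value a 206 response would carry."""
    return f"bytes {first}-{last}/{size}"
```

For example, `resolve_range("bytes=-500", 8000)` yields the offsets 7500 and 7999, which a server would report as `bytes 7500-7999/8000`.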
HTTPS (HTTP over TLS) adds transport-layer encryption to HTTP, providing confidentiality, integrity, and server authentication. The client and server perform a TLS handshake to negotiate cipher suites and exchange certificates before any HTTP data is transmitted. As a result, third parties cannot read data in transit, and any modification of it is detected by the receiver.
Content encoding allows the response body to be compressed for transfer efficiency. Common encodings include gzip, deflate, and Brotli (sent as the token br). The client advertises supported encodings in the Accept-Encoding request header, and the server indicates the applied encoding in the Content-Encoding response header.
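As an illustration of the client side of this negotiation, here is a minimal sketch that undoes the encoding named in a Content-Encoding header. The helper name `decode_body` is hypothetical, and only the encodings available in the Python standard library are covered (Brotli would need a third-party package, so it is left out).

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Reverse the Content-Encoding applied to a response body.

    Illustrative sketch: supports identity, gzip, and deflate only.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # The HTTP "deflate" coding is the zlib format; some servers send
        # raw DEFLATE instead, which this sketch does not try to detect.
        return zlib.decompress(body)
    raise ValueError("unsupported Content-Encoding: " + encoding)
```

A client using this helper would advertise only what it can decode, for example `Accept-Encoding: gzip, deflate`.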
Usage
HTTP communication is used in DuckDB for several purposes. The httpfs extension enables DuckDB to read files from HTTP/HTTPS URLs, including CSV, Parquet, and JSON files hosted on web servers or object storage services (S3, GCS, Azure Blob). DuckDB also uses HTTP for downloading extensions from its extension repository. Range requests enable DuckDB to read specific portions of remote Parquet files (column chunks) without downloading the entire file, which is essential for efficient remote query processing.
Theoretical Basis
HTTP Request/Response Format:
// HTTP Request
METHOD SP Request-URI SP HTTP-Version CRLF
Header-Field: Header-Value CRLF
...
CRLF
[Message Body]
// Example GET request:
GET /data/sales.parquet HTTP/1.1
Host: example.com
Accept-Encoding: gzip, br
Range: bytes=1024-2047
// HTTP Response
HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Header-Field: Header-Value CRLF
...
CRLF
[Message Body]
// Example response:
HTTP/1.1 206 Partial Content
Content-Type: application/octet-stream
Content-Range: bytes 1024-2047/1048576
Content-Length: 1024
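The message grammar above can be sketched in a few lines of Python. The helpers `build_request` and `parse_response_head` are hypothetical names used only for illustration. A real client would also need to handle chunked transfer encoding, repeated header fields, and partial reads from the socket.

```python
from typing import Dict, Tuple


def build_request(method: str, target: str, headers: Dict[str, str],
                  body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a complete response into version, status, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body
```

Calling `build_request("GET", "/data/sales.parquet", {"Host": "example.com", "Range": "bytes=1024-2047"})` produces a request like the GET example above, terminated by the empty line that separates the headers from the (absent) body.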
Range Requests: Partial resource retrieval:
// Single range request
Request: Range: bytes=0-999
Response: 206 Partial Content
Content-Range: bytes 0-999/8000
// Suffix range (last N bytes)
Request: Range: bytes=-500
Response: Content-Range: bytes 7500-7999/8000
// Use case for Parquet: read file footer
// 1. HEAD request -> get Content-Length (file size)
// 2. GET with Range: bytes=(size-8)-(size-1) -> read 4-byte footer length + 4-byte magic "PAR1"
// 3. GET with Range: bytes=(size-8-footer_len)-(size-9) -> read metadata
// 4. GET with Range for specific column chunks as needed
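The offset arithmetic for the footer steps can be sketched as follows. It relies on the Parquet file layout, in which the file ends with a 4-byte little-endian footer length followed by the 4-byte magic `PAR1`. The function names are hypothetical, and this is not a description of how DuckDB itself implements the read (files with encrypted footers, which end in a different magic, are not handled).

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def range_header(first: int, last: int) -> str:
    """Format inclusive byte offsets as a Range header value."""
    return f"bytes={first}-{last}"


def tail_range(file_size: int) -> Tuple[int, int]:
    """Offsets of the final 8 bytes (footer length + magic)."""
    if file_size < TAIL_SIZE:
        raise ValueError("file too small to be a Parquet file")
    return file_size - TAIL_SIZE, file_size - 1


def parse_tail(tail: bytes) -> int:
    """Extract the footer (metadata) length from the final 8 bytes."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing Parquet magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


def metadata_range(file_size: int, footer_len: int) -> Tuple[int, int]:
    """Offsets of the metadata, which sits directly before the 8-byte tail."""
    first = file_size - TAIL_SIZE - footer_len
    if first < len(PARQUET_MAGIC):  # the file also starts with the magic
        raise ValueError("footer length exceeds file size")
    return first, file_size - TAIL_SIZE - 1
```

For a 1,048,576-byte file whose tail reports a 500-byte footer, the two requests would carry `Range: bytes=1048568-1048575` and `Range: bytes=1048068-1048567`. A client can often save a round trip by fetching a larger suffix speculatively, so that the tail and the metadata arrive in one response.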
Connection Management:
// HTTP/1.1 persistent connections (default)
Connection: keep-alive // reuse TCP connection for multiple requests
Connection: close // close after this request/response
// Connection pooling for database workloads:
pool = ConnectionPool(max_connections=N)
for each request:
conn = pool.acquire(host, port) // reuse or create
send_request(conn)
response = read_response(conn)
pool.release(conn) // return for reuse
// Benefits:
// - Avoids TCP handshake overhead per request
// - Avoids TLS handshake overhead per request
// - Critical for Parquet reading (many small range requests)
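The pooling pseudocode above could look roughly like this in Python, using the standard library `http.client` module. This is a simplified sketch under several assumptions: the class caps idle connections per host rather than total connections, the `max_idle_per_host` and `factory` parameters are hypothetical, and the pool is not thread-safe.

```python
import http.client
from collections import defaultdict, deque


class ConnectionPool:
    """Keep idle connections per (host, port) so later requests can reuse them."""

    def __init__(self, max_idle_per_host=4, factory=http.client.HTTPSConnection):
        self._idle = defaultdict(deque)
        self._max_idle = max_idle_per_host
        self._factory = factory

    def acquire(self, host, port):
        """Return an idle connection if one exists, otherwise create one."""
        idle = self._idle[(host, port)]
        if idle:
            return idle.pop()
        # http.client connects lazily, on the first request sent.
        return self._factory(host, port)

    def release(self, host, port, conn):
        """Return a connection for reuse, or close it if the pool is full."""
        idle = self._idle[(host, port)]
        if len(idle) < self._max_idle:
            idle.append(conn)
        else:
            conn.close()
```

A caller would acquire a connection, call `conn.request("GET", path, headers={"Range": ...})`, read the full response with `conn.getresponse().read()`, and only then release the connection. Releasing before the body has been read completely would leave the connection unusable for the next request.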
TLS Handshake: Establishing a secure connection:
// Simplified TLS 1.3 handshake
Client -> Server: ClientHello
supported_versions, cipher_suites, key_share
Server -> Client: ServerHello
selected_version, selected_cipher, key_share
Server -> Client: EncryptedExtensions, Certificate, CertificateVerify, Finished
Client -> Server: Finished
// Handshake complete, encrypted HTTP begins
// After handshake, all HTTP data is encrypted:
// plaintext -> TLS record (AEAD: encrypt + authenticate) -> TCP
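From an application's point of view the handshake is usually delegated to a TLS library. The sketch below uses Python's standard `ssl` module to show the client-side settings that correspond to the guarantees described above (certificate verification, hostname checking, a minimum protocol version). The function names `make_client_context` and `open_tls` are hypothetical, and the TLS 1.2 floor is an assumption chosen for illustration.

```python
import socket
import ssl


def make_client_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Build a client context that verifies certificates and hostnames."""
    # create_default_context() loads the system trust store and enables
    # CERT_REQUIRED plus hostname checking.
    context = ssl.create_default_context()
    context.minimum_version = min_version
    return context


def open_tls(host, port=443, timeout=10.0):
    """Open a TCP connection and run the TLS handshake on top of it."""
    context = make_client_context()
    sock = socket.create_connection((host, port), timeout=timeout)
    # The handshake happens inside wrap_socket(); server_hostname supplies
    # SNI and the name the certificate is checked against.
    return context.wrap_socket(sock, server_hostname=host)
```

After `open_tls` returns, the `version()` method of the wrapped socket reports the negotiated protocol version, and every HTTP byte written to the socket is sent as encrypted TLS records.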