Principle:Tensorflow Serving HTTP Compression
| Knowledge Sources | |
|---|---|
| Domains | Compression |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A data compression layer implementing the gzip format (RFC 1952) over the DEFLATE algorithm (RFC 1951) for transparent compression and decompression of HTTP request and response bodies.
Description
HTTP Compression uses the zlib library to provide gzip-format compression and decompression. The implementation has three parts: a state-machine-based gzip header parser that processes header bytes incrementally, including the optional FEXTRA, FNAME, FCOMMENT, and FHCRC fields; a compression engine that prepends a gzip header to the DEFLATE-compressed data and appends a footer carrying the CRC32 and the uncompressed size; and a decompression engine that strips the gzip envelope and validates the footer checksum.
Both one-shot and streaming (chunked) modes are supported, covering use cases from whole-body compression to streaming with chunked transfer encoding. The internal zlib stream state is kept and reused across operations, which avoids repeated, expensive initialization. A safety limit of 100 MB on the uncompressed size guards against denial of service via decompression bombs. The compression level, window size, and memory level are configurable, so the tradeoff between compression ratio and speed can be tuned.
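The compression and decompression paths described above can be sketched in Python with the standard `zlib` module. This is an illustration under stated assumptions, not the code this page describes: the names `gzip_compress`, `gzip_decompress_chunks`, and `MAX_UNCOMPRESSED_BYTES` are hypothetical, and only the 100 MB figure comes from the text. Adding 16 to the window bits asks zlib to write and expect the gzip envelope, so the header, CRC32, and size checks happen inside zlib. Each call here creates a fresh stream object, so the stream-state reuse mentioned above is not shown.

```python
import zlib

# Limit taken from the description; the constant name is an assumption.
MAX_UNCOMPRESSED_BYTES = 100 * 1024 * 1024


def gzip_compress(data: bytes, level: int = 6, window_bits: int = 15,
                  mem_level: int = 8) -> bytes:
    """One-shot gzip compression with tunable level, window, and memory."""
    # 16 + window_bits selects the gzip wrapper instead of the zlib wrapper.
    comp = zlib.compressobj(level, zlib.DEFLATED, 16 + window_bits, mem_level)
    return comp.compress(data) + comp.flush()


def gzip_decompress_chunks(chunks, limit: int = MAX_UNCOMPRESSED_BYTES) -> bytes:
    """Streaming gzip decompression that stops once `limit` is exceeded."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = bytearray()
    for chunk in chunks:
        buf = chunk
        while buf:
            # Ask for at most one byte beyond the remaining budget, so an
            # oversized stream is detected without inflating all of it.
            out += decomp.decompress(buf, limit - len(out) + 1)
            if len(out) > limit:
                raise ValueError("uncompressed size exceeds limit")
            buf = decomp.unconsumed_tail
    if not decomp.eof:
        raise ValueError("truncated gzip stream")
    return bytes(out)
```

A corrupted footer surfaces as `zlib.error`, because zlib itself compares the CRC32 and length stored in the footer against the data it inflated.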
Usage
Use this for transparent compression/decompression in HTTP server and client implementations. The HTTP server can automatically decompress gzip-encoded request bodies and compress response bodies, reducing bandwidth usage for large model prediction payloads.
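As a rough sketch of what transparent handling might look like at the HTTP layer, the two helpers below choose between gzip and pass-through based on the standard `Content-Encoding` and `Accept-Encoding` headers. The function names are hypothetical, headers are assumed to arrive as a dict with lowercase keys, and `Accept-Encoding` quality values are ignored for brevity. Python's `gzip` module stands in for the compression layer, and no size limit is applied here.

```python
import gzip


def decode_request_body(headers: dict, body: bytes) -> bytes:
    """Decompress the request body if the client marked it as gzip-encoded."""
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


def encode_response_body(request_headers: dict, body: bytes):
    """Return (extra response headers, body), compressing if the client accepts gzip."""
    accepted = [
        token.split(";")[0].strip().lower()  # drop any ";q=..." parameter
        for token in request_headers.get("accept-encoding", "").split(",")
    ]
    if "gzip" in accepted:
        return {"content-encoding": "gzip"}, gzip.compress(body)
    return {}, body
```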
Theoretical Basis
Gzip compression is based on the DEFLATE algorithm (RFC 1951), which combines LZ77 (Lempel-Ziv 1977, a dictionary-based scheme that replaces repeated byte sequences with back-references to earlier data) with Huffman coding (an entropy coding scheme that assigns shorter codes to more frequent symbols). The gzip format (RFC 1952) wraps the DEFLATE stream in a header carrying metadata and a footer carrying a CRC32 checksum and the uncompressed size modulo 2^32 for integrity verification. The header parser is modeled as a finite automaton: it consumes one byte at a time and advances through a fixed sequence of states, so it can resume at any chunk boundary. The streaming mode follows a producer-consumer pattern, in which input chunks are fed in as they arrive and output is emitted incrementally.
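The byte-at-a-time header automaton can be illustrated with a small Python sketch. The field layout and flag bits follow RFC 1952; the class, state, and method names are assumptions made for illustration and do not mirror the actual implementation. The sketch only locates the end of the header: it does not verify the optional header CRC or reject reserved flag bits.

```python
from enum import Enum, auto

# Flag bits in the FLG byte, as defined by RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


class State(Enum):
    FIXED = auto()    # 10 fixed bytes: ID1 ID2 CM FLG MTIME(4) XFL OS
    XLEN = auto()     # 2-byte little-endian length of the extra field
    EXTRA = auto()    # XLEN bytes of extra data
    NAME = auto()     # zero-terminated original file name
    COMMENT = auto()  # zero-terminated comment
    HCRC = auto()     # 2-byte header CRC16
    DONE = auto()


# Optional sections in wire order, each gated by a flag bit.
_OPTIONAL = [(State.XLEN, FEXTRA), (State.NAME, FNAME),
             (State.COMMENT, FCOMMENT), (State.HCRC, FHCRC)]


class GzipHeaderParser:
    def __init__(self) -> None:
        self.state = State.FIXED
        self.flags = 0
        self.count = 0      # bytes consumed in the current state
        self.extra_len = 0

    def _enter_next(self, start: int) -> None:
        """Enter the first optional section at index >= start whose flag is set."""
        self.count = 0
        for state, flag in _OPTIONAL[start:]:
            if self.flags & flag:
                self.state = state
                return
        self.state = State.DONE

    def _step(self, byte: int) -> None:
        if self.state is State.FIXED:
            # Magic bytes 0x1f 0x8b, then compression method 8 (DEFLATE).
            expected = {0: 0x1F, 1: 0x8B, 2: 0x08}.get(self.count)
            if expected is not None and byte != expected:
                raise ValueError("not a gzip header")
            if self.count == 3:
                self.flags = byte
            self.count += 1
            if self.count == 10:
                self._enter_next(0)
        elif self.state is State.XLEN:
            self.extra_len |= byte << (8 * self.count)
            self.count += 1
            if self.count == 2:
                if self.extra_len:
                    self.state, self.count = State.EXTRA, 0
                else:
                    self._enter_next(1)
        elif self.state is State.EXTRA:
            self.count += 1
            if self.count == self.extra_len:
                self._enter_next(1)
        elif self.state in (State.NAME, State.COMMENT):
            if byte == 0:  # the terminating zero byte ends the string
                self._enter_next(2 if self.state is State.NAME else 3)
        elif self.state is State.HCRC:
            self.count += 1
            if self.count == 2:
                self._enter_next(4)

    def feed(self, data: bytes) -> int:
        """Consume header bytes from `data`; return how many were used."""
        consumed = 0
        for byte in data:
            if self.state is State.DONE:
                break
            self._step(byte)
            consumed += 1
        return consumed
```

Because all progress lives in `state`, `count`, and `extra_len`, the caller can pass chunks of any size to `feed`. Once `state` is `State.DONE`, the bytes left unconsumed in the current chunk belong to the DEFLATE stream.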