Principle:Tensorflow Serving Lock Free Read Pattern

Knowledge Sources	Tensorflow_Serving
Domains	Concurrency
Last Updated	2026-02-13 00:00 GMT

Overview

A concurrent data access pattern that optimizes for high-throughput reads of infrequently-updated shared data by sharding read pointers across CPU cores and using atomic operations to minimize contention.

Description

The Lock-Free Read Pattern addresses the common scenario where shared data is read frequently by many threads but updated rarely. A single mutex creates a contention bottleneck, and a single atomic shared_ptr creates cache-line contention on the reference count. This pattern instead shards read pointers across multiple slots -- ideally one per CPU core. Each shard holds its own shared_ptr behind a per-shard mutex, so readers on different shards do not contend with one another. Because a shard's mutex is almost always uncontended, the read path is cheap in practice, although it is not lock-free in the strict formal sense. Updates are performed by one writer at a time, who: (1) creates a new shared object, (2) double-buffers the shards (writing new pointers to the inactive buffer, then atomically swapping the active index), and (3) waits for all outstanding references to the old object to be released before returning the old object to the caller. The double-buffering ensures temporal consistency -- no reader ever sees an older version after having seen a newer one. False sharing between shards is prevented by padding each shard to at least 64 bytes (a typical cache line size).
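The sharding and double-buffering steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TensorFlow Serving implementation: the class name ShardedReadPtr, the shard count of 8, the choice of shard by hashing the thread id (rather than by CPU core), and the decision to hand the old shared_ptr back to the caller instead of blocking are all hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch of a sharded, double-buffered read pointer.
// All names here are illustrative and not taken from TensorFlow Serving.
template <typename T, std::size_t kNumShards = 8>
class ShardedReadPtr {
 public:
  explicit ShardedReadPtr(std::shared_ptr<const T> initial) {
    for (Shard& s : shards_) s.slots[0] = initial;
  }

  // Reader path: lock only the calling thread's shard and copy the pointer
  // stored in the currently active buffer.
  std::shared_ptr<const T> get() const {
    const Shard& s = shards_[ShardForThisThread()];
    std::lock_guard<std::mutex> lock(s.mu);
    return s.slots[active_.load(std::memory_order_acquire)];
  }

  // Writer path: fill the inactive buffer of every shard, flip the active
  // index, then clear the old buffer. Returns the previous object.
  std::shared_ptr<const T> update(std::shared_ptr<const T> next) {
    std::lock_guard<std::mutex> writer_lock(update_mu_);
    const unsigned cur = active_.load(std::memory_order_relaxed);
    const unsigned spare = 1 - cur;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> lock(s.mu);
      s.slots[spare] = next;
    }
    // After this store, new readers pick up `next` on every shard.
    active_.store(spare, std::memory_order_release);
    std::shared_ptr<const T> old;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> lock(s.mu);
      old = std::move(s.slots[cur]);  // leaves the old slot empty
    }
    return old;
  }

 private:
  // Each shard is aligned to 64 bytes (an assumed cache line size) so that
  // neighbouring shards never share a cache line.
  struct alignas(64) Shard {
    mutable std::mutex mu;
    std::shared_ptr<const T> slots[2];
  };
  static_assert(alignof(Shard) == 64 && sizeof(Shard) >= 64,
                "each shard must occupy its own cache line");

  // Assumption: hashing the thread id stands in for a per-core lookup.
  static std::size_t ShardForThisThread() {
    return std::hash<std::thread::id>()(std::this_thread::get_id()) %
           kNumShards;
  }

  std::array<Shard, kNumShards> shards_;
  std::atomic<unsigned> active_{0};
  std::mutex update_mu_;  // serializes writers
};
```

In this sketch, request-handling threads would call get() and hold the returned shared_ptr for the duration of a request, while a single management thread calls update(). The writer clears the old buffer only after flipping the index and re-acquiring each shard's mutex, so a reader that takes the mutex afterwards always observes the new index and never an emptied slot.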

Usage

Use this pattern for any shared data that follows a read-heavy, write-light access pattern, such as model data, configuration, or routing tables. It is the foundation for managing servable versions in production serving systems where request-handling threads must access the current model with minimal latency overhead.

Theoretical Basis

This pattern draws on several concurrency principles. Read-Copy-Update (RCU) from the Linux kernel contributes the idea that readers see a consistent snapshot while a writer prepares a new version. Cache-line-aware data structures contribute padding to avoid false sharing. Reference counting with deferred reclamation, similar in spirit to an RCU grace period, means the writer waits for all readers to release the old data before reclaiming it. Double-buffering maintains two copies of each pointer so that the switch can be a single atomic operation. The per-CPU sharding strategy is inspired by per-CPU data structures in operating system kernels. The atomic index swap makes each update appear to readers to take effect at a single point in time, which is the property usually described as linearizability.
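The "wait for all readers to release" idea can be illustrated with a shared_ptr whose deleter signals instead of deleting. The sketch below is an assumption-laden illustration rather than the library's code: ReleaseTracker, share(), released(), and reclaim() are hypothetical names, and a real implementation would combine this with the sharded pointers rather than keep a single shared_ptr.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical helper: hands out shared_ptr references to an object it keeps
// alive itself, and lets the owner block until every reference is released.
template <typename T>
class ReleaseTracker {
 public:
  explicit ReleaseTracker(std::unique_ptr<T> object)
      : state_(std::make_shared<State>()), object_(std::move(object)) {
    std::shared_ptr<State> state = state_;
    // The deleter does NOT delete the object; it only records that the last
    // reference is gone. The captured `state` keeps the mutex and condition
    // variable alive even if the tracker is destroyed first.
    refs_ = std::shared_ptr<const T>(object_.get(), [state](const T*) {
      {
        std::lock_guard<std::mutex> lock(state->mu);
        state->released = true;
      }
      state->cv.notify_all();
    });
  }

  // Reader side: take a counted reference to the tracked object.
  std::shared_ptr<const T> share() const { return refs_; }

  // True once every reference, including the tracker's own, is gone.
  bool released() const {
    std::lock_guard<std::mutex> lock(state_->mu);
    return state_->released;
  }

  // Writer side: drop the tracker's own reference, block until all reader
  // references are released, then return exclusive ownership of the object.
  std::unique_ptr<T> reclaim() {
    refs_.reset();
    std::unique_lock<std::mutex> lock(state_->mu);
    state_->cv.wait(lock, [this] { return state_->released; });
    return std::move(object_);
  }

 private:
  struct State {
    std::mutex mu;
    std::condition_variable cv;
    bool released = false;
  };

  std::shared_ptr<State> state_;
  std::unique_ptr<T> object_;
  std::shared_ptr<const T> refs_;
};
```

With this arrangement the object is destroyed only when the caller of reclaim() lets the returned unique_ptr go, so destruction happens on the writer's thread rather than on whichever reader happened to drop the last reference.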

Related Pages

Implementation:Tensorflow_Serving_Fast_Read_Dynamic_Ptr
