Principle:TensorFlow Serving Resource Tracking
| Knowledge Sources | |
|---|---|
| Domains | Resource Management |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
An admission control pattern that tracks available and consumed resources in a serving system, making conservative reservation decisions to ensure that loading a new servable will not exceed system capacity.
Description
Resource Tracking maintains two quantities. The first is the total resources available in the system, which are fixed when the tracker is created and must be bound to specific device instances. The second is the currently used resources, accumulated across all active servables. When a new servable requests loading, the tracker decides conservatively. It first overbinds the current used resources: any quantity not yet bound to a particular device instance is assumed, in the worst case, to occupy every instance of that device. It then adds the new servable's estimated requirements and checks whether the result fits within the total resources. If it fits, the servable's estimate is reserved (added to the used resources) and the load is approved; otherwise the load is refused.
The tracker also supports full recomputation of the used resources from a list of active loaders, which enables periodic reconciliation. Resource estimates come from the Loader interface's EstimateResources method, which decouples the tracker from specific servable implementations. Overbinding trades some utilization efficiency for safety: the tracker may reject a load that would in fact have fit, but it never commits more resources than are available, as measured by the loaders' estimates.
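The reservation and recomputation steps can be sketched in Python. This is a minimal illustration under stated assumptions, not TensorFlow Serving's C++ implementation. An allocation is modeled as a dictionary keyed by a hypothetical `(device, instance, kind)` tuple, with `instance=None` meaning unbound. The names `overbind`, `fits`, `ResourceTracker.reserve`, `recompute_used`, and `estimate_resources` are illustrative stand-ins. As an extra simplification, any unbound part of the new estimate is also overbound before the fit check.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple

# All names here are hypothetical; this is not TensorFlow Serving's C++ API.
# Key is (device, instance, kind). instance=None means "unbound", i.e. the
# quantity may end up on any instance of that device.
Key = Tuple[str, Optional[int], str]
Allocation = Dict[Key, int]


class Loader(Protocol):
    """Stand-in for a loader that can estimate its resource needs."""

    def estimate_resources(self) -> Allocation: ...


def add(a: Allocation, b: Allocation) -> Allocation:
    """Return the element-wise sum of two allocations."""
    out = dict(a)
    for key, qty in b.items():
        out[key] = out.get(key, 0) + qty
    return out


def overbind(alloc: Allocation, total: Allocation) -> Allocation:
    """Worst case: charge each unbound quantity to every instance of its device."""
    out: Allocation = {}
    for (device, instance, kind), qty in alloc.items():
        if instance is not None:
            targets = [(device, instance, kind)]
        else:
            targets = [k for k in total if k[0] == device and k[2] == kind]
            if not targets:
                # Unknown device/kind: keep it unbound so fits() rejects it.
                targets = [(device, None, kind)]
        for key in targets:
            out[key] = out.get(key, 0) + qty
    return out


def fits(alloc: Allocation, total: Allocation) -> bool:
    """True if every quantity in alloc is within the matching total."""
    return all(qty <= total.get(key, 0) for key, qty in alloc.items())


class ResourceTracker:
    def __init__(self, total: Allocation):
        if any(instance is None for (_, instance, _) in total):
            raise ValueError("total resources must be bound to device instances")
        self.total = dict(total)
        self.used: Allocation = {}

    def reserve(self, loader: Loader) -> bool:
        """Approve and record the load only if the worst case still fits."""
        estimate = loader.estimate_resources()
        proposed = overbind(add(overbind(self.used, self.total), estimate), self.total)
        if not fits(proposed, self.total):
            return False  # refuse; used resources are unchanged
        self.used = add(self.used, estimate)  # reserve the raw estimate
        return True

    def recompute_used(self, loaders: List[Loader]) -> None:
        """Rebuild used resources from the currently active loaders."""
        self.used = {}
        for loader in loaders:
            self.used = add(self.used, loader.estimate_resources())


@dataclass
class FakeLoader:
    estimate: Allocation

    def estimate_resources(self) -> Allocation:
        return self.estimate


total = {("gpu", 0, "ram"): 16, ("gpu", 1, "ram"): 16}
tracker = ResourceTracker(total)
model_a = FakeLoader({("gpu", None, "ram"): 10})  # unbound: may land on either GPU
model_b = FakeLoader({("gpu", 1, "ram"): 8})      # bound to GPU 1
print(tracker.reserve(model_a))  # True: 10 <= 16 on every GPU
print(tracker.reserve(model_b))  # False: worst case GPU 1 holds 10 + 8 = 18 > 16
tracker.recompute_used([])       # reconcile after model_a is unloaded
print(tracker.reserve(model_b))  # True: GPU 1 now holds only 8
```

Storing the raw estimate rather than its overbound form keeps `used` faithful to what loaders reported; in this sketch the pessimism is applied only at decision time.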
Usage
Use this pattern as the admission control layer in a model serving system to decide whether new models can be loaded without exceeding memory, compute, or other resource limits. It integrates with the model manager's load/unload decision logic.
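To show where the tracker sits in a manager's load/unload logic, here is a hypothetical manager that consults admission control before every load and reconciles on unload. It assumes all estimates are already bound to instances, so overbinding is a no-op and is omitted. `SimpleTracker`, `Manager`, and their methods are illustrative names, not TensorFlow Serving APIs.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sketch, not TensorFlow Serving's API. Keys are
# (device, instance, kind) and are assumed to be bound, so overbinding is a no-op.
Key = Tuple[str, int, str]
Allocation = Dict[Key, int]


class SimpleTracker:
    """Admission control over bound resources only."""

    def __init__(self, total: Allocation):
        self.total = dict(total)
        self.used: Allocation = {}

    def try_reserve(self, estimate: Allocation) -> bool:
        """Reserve the estimate if used + estimate stays within total."""
        proposed = dict(self.used)
        for key, qty in estimate.items():
            proposed[key] = proposed.get(key, 0) + qty
        if any(qty > self.total.get(key, 0) for key, qty in proposed.items()):
            return False
        self.used = proposed
        return True

    def recompute(self, estimates: Iterable[Allocation]) -> None:
        """Rebuild used resources from scratch."""
        self.used = {}
        for estimate in estimates:
            for key, qty in estimate.items():
                self.used[key] = self.used.get(key, 0) + qty


class Manager:
    """Hypothetical model manager that asks the tracker before every load."""

    def __init__(self, tracker: SimpleTracker):
        self.tracker = tracker
        self.loaded: Dict[str, Allocation] = {}

    def load(self, name: str, estimate: Allocation) -> bool:
        if name in self.loaded or not self.tracker.try_reserve(estimate):
            return False  # already loaded, or admission control refused
        self.loaded[name] = estimate
        return True

    def unload(self, name: str) -> None:
        self.loaded.pop(name, None)
        # Reconcile from the still-active servables instead of subtracting.
        self.tracker.recompute(self.loaded.values())


ram = ("host", 0, "ram")
manager = Manager(SimpleTracker({ram: 32}))
print(manager.load("resnet", {ram: 20}))  # True
print(manager.load("bert", {ram: 16}))    # False: 20 + 16 > 32
manager.unload("resnet")
print(manager.load("bert", {ram: 16}))    # True: 16 <= 32 after reconciliation
```

Recomputing from the remaining servables on unload, instead of subtracting the departing estimate, keeps the tracked usage from drifting; reconciling on a periodic timer is another option.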
Theoretical Basis
This pattern implements admission control, a concept from queuing theory and operating systems where new work is only admitted if sufficient resources are guaranteed to be available. The conservative overbinding approach follows the principle of pessimistic resource reservation, ensuring safety at the cost of potentially underutilizing resources. The periodic recomputation via RecomputeUsedResources is a form of state reconciliation, a technique from distributed systems for correcting drift between tracked state and reality.
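The admission rule and the reconciliation step can be written compactly. The notation is introduced here for illustration: T is the total (fully bound) allocation, U the used allocation, r the new servable's estimate (assumed bound to instances for simplicity), I_d the instances of device d, and U_{d,*,k} the unbound quantity of kind k on device d.

```latex
\begin{align*}
\operatorname{Overbind}(U)_{d,i,k} &= U_{d,i,k} + U_{d,\ast,k}, && i \in I_d \\
\operatorname{admit}(r) &\iff \operatorname{Overbind}(U)_{d,i,k} + r_{d,i,k} \le T_{d,i,k} && \forall\, d,\ i \in I_d,\ k \\
U &\leftarrow U + r && \text{if } \operatorname{admit}(r) \\
U &\leftarrow \sum_{\ell \in \mathcal{L}} \operatorname{EstimateResources}(\ell) && \text{recomputation over active loaders } \mathcal{L}
\end{align*}
```

As a worked case, take two GPUs with 16 units of RAM each and 10 units of unbound usage. Overbinding charges 10 to each GPU, so a request for 8 units bound to GPU 1 gives 10 + 8 = 18 > 16 and is refused, even though placing the existing 10 units on GPU 0 would have left room. This is the underutilization that pessimistic reservation accepts in exchange for safety.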