Implementation:Tensorflow Serving load servables fast h
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Core Framework |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
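ConnectSourceWithFastInitialLoad connects a source to an AspiredVersionsManager while temporarily enlarging the manager's load thread pool, so that the initial servables load quickly at startup. ConnectSourcesWithFastInitialLoad does the same for several sources.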
Description
The functions in this header accelerate the initial loading of servables during server startup. The strategy is to temporarily increase the number of load threads in the AspiredVersionsManager to a high value (default: 4 times the number of schedulable CPUs), connect the source(s) to the manager, wait until all initial servables are loaded, and then reset the thread count to the manager's original configured value.
Two functions are provided:
- ConnectSourceWithFastInitialLoad(): for a single source.
- ConnectSourcesWithFastInitialLoad(): for multiple sources.
Both take the manager, the source(s), a ServableStateMonitor used to detect when loading is complete, a list of ServableRequest objects naming the initial servables to wait for, and an optional thread count override.
The internal namespace exposes helper functions for testing: GetManagerNumLoadThreads() and SetManagerNumLoadThreadsNotifier().
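The boost, connect, wait, and reset sequence described above can be sketched with stand-in types. This is a minimal illustration of the pattern only, not the library's implementation: FakeManager, its accessors, and LoadWithTemporaryBoost are hypothetical names, and the connect_and_wait callback stands in for connecting the source(s) and waiting on the ServableStateMonitor. The sketch restores the original thread count whether or not the wait succeeds, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a manager with an adjustable load thread pool.
// This is NOT the real AspiredVersionsManager interface.
class FakeManager {
 public:
  explicit FakeManager(std::uint32_t num_load_threads)
      : num_load_threads_(num_load_threads) {}

  std::uint32_t num_load_threads() const { return num_load_threads_; }

  void set_num_load_threads(std::uint32_t n) {
    num_load_threads_ = n;
    history_.push_back(n);  // Record every resize so callers can inspect it.
  }

  const std::vector<std::uint32_t>& history() const { return history_; }

 private:
  std::uint32_t num_load_threads_;
  std::vector<std::uint32_t> history_;
};

// Hypothetical helper: remember the configured thread count, raise it, run
// the connect-and-wait step, then restore the remembered count. Returns the
// result of the connect-and-wait step.
bool LoadWithTemporaryBoost(FakeManager& manager,
                            std::uint32_t boosted_threads,
                            const std::function<bool()>& connect_and_wait) {
  const std::uint32_t original = manager.num_load_threads();
  manager.set_num_load_threads(boosted_threads);
  const bool ok = connect_and_wait();
  manager.set_num_load_threads(original);  // Reset regardless of outcome.
  return ok;
}
```

A caller would pass a callback that connects its sources and blocks until the requested servables are available; once the helper returns, the manager is back at its configured thread count.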
Usage
Use these functions during server initialization when you want to load the initial set of servables as quickly as possible by leveraging maximum CPU parallelism, without keeping that elevated thread count for the lifetime of the server.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- File: tensorflow_serving/core/load_servables_fast.h
- Lines: 1-70
Signature
Status ConnectSourceWithFastInitialLoad(
AspiredVersionsManager* manager,
Source<std::unique_ptr<Loader>>* source,
ServableStateMonitor* servable_state_monitor,
const std::vector<ServableRequest>& initial_servables,
uint32 num_threads = 4 * port::NumSchedulableCPUs());
Status ConnectSourcesWithFastInitialLoad(
AspiredVersionsManager* manager,
std::vector<Source<std::unique_ptr<Loader>>*> sources,
ServableStateMonitor* servable_state_monitor,
const std::vector<ServableRequest>& initial_servables,
uint32 num_threads = 4 * port::NumSchedulableCPUs());
Import
#include "tensorflow_serving/core/load_servables_fast.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| manager | AspiredVersionsManager* | Yes | The manager to connect sources to and temporarily boost threads on |
| source / sources | Source<std::unique_ptr<Loader>>* / vector | Yes | The source(s) to connect to the manager |
| servable_state_monitor | ServableStateMonitor* | Yes | Monitor used to detect when initial servables are loaded |
| initial_servables | const std::vector<ServableRequest>& | Yes | The set of servables to wait for before reverting thread count |
| num_threads | uint32 | No | Number of temporary load threads; defaults to 4 * NumSchedulableCPUs() |
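To give a feel for the size of the default num_threads value, the sketch below computes four times the CPU count using only the standard library. It assumes std::thread::hardware_concurrency() as a rough substitute for port::NumSchedulableCPUs(); the two may report different values on a given machine, and DefaultBoostedLoadThreads is a hypothetical name used only here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper approximating the documented default of
// 4 * NumSchedulableCPUs(). hardware_concurrency() may return 0 when the
// CPU count cannot be determined, so the value is clamped to at least 1.
std::uint32_t DefaultBoostedLoadThreads() {
  const unsigned int cpus = std::max(1u, std::thread::hardware_concurrency());
  return static_cast<std::uint32_t>(4u * cpus);
}
```

On a machine where the standard library reports 8 hardware threads, this helper yields 32. Pass an explicit num_threads instead when the default would oversubscribe a shared host.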
Outputs
| Name | Type | Description |
|---|---|---|
| return | Status | OK if all initial servables loaded successfully; error otherwise |
Usage Examples
Fast Initial Load at Server Startup
#include "tensorflow_serving/core/load_servables_fast.h"
using namespace tensorflow::serving;
// Assumes `manager` and `source` are smart pointers and `servable_state_monitor`
// is a ServableStateMonitor, all already set up.
std::vector<ServableRequest> initial_servables = {
ServableRequest::Latest("model_a"),
ServableRequest::Latest("model_b"),
};
TF_CHECK_OK(ConnectSourceWithFastInitialLoad(
manager.get(), source.get(), &servable_state_monitor,
initial_servables));
// All initial servables are now loaded, and the load thread count is back to
// its original value.