Implementation:Tensorflow Serving load servables fast h

Knowledge Sources	Tensorflow_Serving
Domains	Model Serving, Core Framework
Last Updated	2026-02-13 00:00 GMT

Overview

ConnectSourceWithFastInitialLoad and ConnectSourcesWithFastInitialLoad connect one or more sources to an AspiredVersionsManager while temporarily raising its load-thread count, so that the initial servables load quickly.

Description

The functions in this header accelerate the initial loading of servables during server startup. The strategy is to temporarily increase the number of load threads in the AspiredVersionsManager to a high value (default: 4 times the number of schedulable CPUs), connect the source(s) to the manager, wait until all initial servables are loaded, and then reset the thread count to the manager's original configured value.
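The boost, connect, wait, and restore sequence described above can be sketched in isolation. The block below is a minimal illustration, not the library's implementation: FakeManager and ConnectWithTemporaryBoost are hypothetical names invented for this page, and the sketch assumes the original thread count is restored whether or not the wait step succeeds.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a manager with an adjustable load-thread count.
// It is NOT the real AspiredVersionsManager.
class FakeManager {
 public:
  explicit FakeManager(std::uint32_t num_load_threads)
      : num_load_threads_(num_load_threads) {}

  std::uint32_t num_load_threads() const { return num_load_threads_; }

  void set_num_load_threads(std::uint32_t n) {
    num_load_threads_ = n;
    history_.push_back(n);
  }

  // Every value the thread count was set to, in call order.
  const std::vector<std::uint32_t>& history() const { return history_; }

 private:
  std::uint32_t num_load_threads_;
  std::vector<std::uint32_t> history_;
};

// Hypothetical helper: raises the thread count, runs the connect and wait
// steps, then restores the original count. Returns the wait step's result.
// Assumption: the count is restored even when the wait step reports failure.
inline bool ConnectWithTemporaryBoost(
    FakeManager* manager, const std::function<void()>& connect_sources,
    const std::function<bool()>& wait_until_loaded,
    std::uint32_t boosted_threads) {
  assert(manager != nullptr);
  const std::uint32_t original = manager->num_load_threads();
  manager->set_num_load_threads(boosted_threads);  // 1. boost
  connect_sources();                               // 2. connect source(s)
  const bool all_loaded = wait_until_loaded();     // 3. wait for initial load
  manager->set_num_load_threads(original);         // 4. restore
  return all_loaded;
}
```

Passing the connect and wait steps as callables keeps the sequencing separate from how loading is detected, which also makes the ordering easy to check in a test.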

Two functions are provided:

ConnectSourceWithFastInitialLoad() - For a single source.
ConnectSourcesWithFastInitialLoad() - For multiple sources.

Both take the manager, source(s), a ServableStateMonitor for detecting when loading is complete, a list of ServableRequest describing the initial servables to wait for, and an optional thread count override.
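To illustrate what "waiting for the initial servables" involves, here is a simplified sketch of a monitor that blocks until a set of named servables has been reported as available. FakeStateMonitor and its methods are hypothetical and are not the ServableStateMonitor API; the sketch identifies servables by name only, adds a timeout for convenience, and does not model load failures.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified monitor. It is NOT the real ServableStateMonitor.
class FakeStateMonitor {
 public:
  // Records that a servable finished loading and wakes any waiters.
  void MarkAvailable(const std::string& name) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      available_.insert(name);
    }
    cv_.notify_all();
  }

  // Blocks until every requested servable is available or the timeout
  // expires. Returns true only if all requested servables became available.
  bool WaitUntilAvailable(const std::vector<std::string>& requested,
                          std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] {
      for (const std::string& name : requested) {
        if (available_.count(name) == 0) return false;
      }
      return true;
    });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::set<std::string> available_;
};
```

In this sketch a loader thread would call MarkAvailable() as each servable finishes, while the startup thread blocks in WaitUntilAvailable() before lowering the thread count again.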

The internal namespace exposes helper functions for testing: GetManagerNumLoadThreads() and SetManagerNumLoadThreadsNotifier().

Usage

Use these functions during server initialization when you want to load the initial set of servables as quickly as possible by leveraging maximum CPU parallelism, without keeping that elevated thread count for the lifetime of the server.

Code Reference

Source Location

Repository: Tensorflow_Serving
File: tensorflow_serving/core/load_servables_fast.h
Lines: 1-70

Signature

Status ConnectSourceWithFastInitialLoad(
    AspiredVersionsManager* manager,
    Source<std::unique_ptr<Loader>>* source,
    ServableStateMonitor* servable_state_monitor,
    const std::vector<ServableRequest>& initial_servables,
    uint32 num_threads = 4 * port::NumSchedulableCPUs());

Status ConnectSourcesWithFastInitialLoad(
    AspiredVersionsManager* manager,
    std::vector<Source<std::unique_ptr<Loader>>*> sources,
    ServableStateMonitor* servable_state_monitor,
    const std::vector<ServableRequest>& initial_servables,
    uint32 num_threads = 4 * port::NumSchedulableCPUs());

Import

#include "tensorflow_serving/core/load_servables_fast.h"

I/O Contract

Inputs

Name	Type	Required	Description
manager	AspiredVersionsManager*	Yes	The manager to connect sources to and temporarily boost threads on
source / sources	Source<std::unique_ptr<Loader>>* / vector	Yes	The source(s) to connect to the manager
servable_state_monitor	ServableStateMonitor*	Yes	Monitor used to detect when initial servables are loaded
initial_servables	const std::vector<ServableRequest>&	Yes	The set of servables to wait for before reverting thread count
num_threads	uint32	No	Number of temporary load threads; defaults to 4 * NumSchedulableCPUs()
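As a rough illustration of the num_threads default, the arithmetic is four load threads per schedulable CPU. The sketch below uses std::thread::hardware_concurrency() as a stand-in for port::NumSchedulableCPUs(), which is an assumption (the two may report different values), and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <thread>

// Multiplier quoted for the default: 4 load threads per schedulable CPU.
constexpr std::uint32_t kThreadsPerCpu = 4;

// Hypothetical helper: boosted thread count for a given CPU count.
constexpr std::uint32_t BoostedLoadThreads(std::uint32_t num_cpus) {
  return kThreadsPerCpu * num_cpus;
}

static_assert(BoostedLoadThreads(8) == 32, "8 CPUs give 32 load threads");

// Hypothetical helper that approximates the default on this machine.
// std::thread::hardware_concurrency() is only a stand-in for
// port::NumSchedulableCPUs(); it may return 0 when the count is unknown,
// so this sketch falls back to 1 CPU in that case.
inline std::uint32_t ApproximateDefaultLoadThreads() {
  const unsigned int cpus = std::thread::hardware_concurrency();
  return BoostedLoadThreads(cpus == 0 ? 1u : cpus);
}
```

On an 8-CPU host this arithmetic gives 32 temporary load threads. Passing an explicit num_threads overrides the default, which may be useful when loading is limited by memory or I/O rather than CPU.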

Outputs

Name	Type	Description
return	Status	OK if all initial servables loaded successfully; error otherwise

Usage Examples

Fast Initial Load at Server Startup

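#include <vector>

#include "tensorflow_serving/core/load_servables_fast.h"

using namespace tensorflow::serving;

// Assumes `manager` and `source` are owning pointers that are already set up,
// and that `servable_state_monitor` is a ServableStateMonitor observing the
// manager's servable state changes.
std::vector<ServableRequest> initial_servables = {
    ServableRequest::Latest("model_a"),
    ServableRequest::Latest("model_b"),
};

// Blocks until the initial load finishes. TF_CHECK_OK aborts the process if
// the returned Status is not OK.
TF_CHECK_OK(ConnectSourceWithFastInitialLoad(
    manager.get(), source.get(), &servable_state_monitor,
    initial_servables));
// The initial servables are now loaded, and the manager is back to its
// original load-thread count.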

Related Pages

Principle:Tensorflow_Serving_Manager_Construction
