Implementation: TensorFlow Serving Util
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Utilities, Monitoring |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Provides shared utility functions for TensorFlow Serving including input serialization, one-shot tensor computation, model spec construction, resource estimation, latency recording, and metric tracking.
Description
The Util module is a foundational utility library used across TensorFlow Serving's inference pipeline. Key functions include:
Input Processing:
- InputToSerializedExampleTensor: Converts an Input protobuf (ExampleList or ExampleListWithContext) into a string Tensor of serialized examples. Uses an optimized serialization path via SerializedInput to avoid costly lazy deserialization of protobuf fields. Handles context merging for ExampleListWithContext by concatenating serialized context and example bytes.
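The context-merging step can be sketched in standalone C++. This is a hedged illustration of the byte-concatenation behavior described above, not the actual SerializedInput code; the function name is illustrative.

```cpp
#include <string>
#include <vector>

// Sketch of the context merge described above: for an
// ExampleListWithContext, the serialized context bytes are prepended to
// each example's serialized bytes. Protobuf wire format permits this
// because concatenating two serialized messages of the same type merges
// their fields.
std::vector<std::string> MergeContextIntoExamples(
    const std::string& serialized_context,
    const std::vector<std::string>& serialized_examples) {
  std::vector<std::string> merged;
  merged.reserve(serialized_examples.size());
  for (const std::string& example : serialized_examples) {
    // Context bytes + example bytes: a valid serialized Example whose
    // features are the union of both.
    merged.push_back(serialized_context + example);
  }
  return merged;
}
```

This avoids deserializing either protobuf just to merge fields, which is the optimization the serialized path is built around.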
Session Execution:
- PerformOneShotTensorComputation (two overloads): Convenience functions that serialize input, run a session, and return outputs. The first overload accepts a single input tensor name; the second accepts a set of input names (each fed the same input tensor). Both support custom thread pools and optional runtime latency output.
Model Spec:
- MakeModelSpec: Populates a ModelSpec protobuf with model name, optional signature name (defaulting to the default serving signature key), and optional version.
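The defaulting logic can be illustrated with a standalone sketch. The struct below is a stand-in for the generated ModelSpec protobuf, and the default key `"serving_default"` is TensorFlow's default serving signature key; the sketch mirrors the documented behavior rather than reproducing the real implementation.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Stand-in for the ModelSpec protobuf (the real type is generated code).
struct ModelSpecLike {
  std::string name;
  std::string signature_name;
  std::optional<int64_t> version;
};

// Sketch of MakeModelSpec's behavior: an unset signature name falls back
// to the default serving signature key, and the version is copied only
// when present.
void MakeModelSpecSketch(const std::string& model_name,
                         const std::optional<std::string>& signature_name,
                         const std::optional<int64_t>& version,
                         ModelSpecLike* spec) {
  spec->name = model_name;
  // "serving_default" is TensorFlow's default serving signature key.
  spec->signature_name = signature_name.value_or("serving_default");
  spec->version = version;
}
```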
Resource Estimation:
- GetModelDiskSize: Performs a parallelized BFS traversal of the model directory using a ThreadPoolExecutor (256 threads) to efficiently calculate total file size.
- EstimateResourceFromPathUsingDiskState: Estimates RAM requirements from on-disk size: RAM = disk_size * 1.2 (a 20% overhead factor), plus a fixed padding that is currently zero bytes.
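The estimation formula above is simple enough to show directly. This is a hedged sketch; the constant names are illustrative, though the 1.2 multiplier and zero-byte padding match the description.

```cpp
#include <cstdint>

// RAM estimate per the formula above: disk size scaled by a 20% overhead
// factor, plus a fixed padding that is currently zero bytes.
constexpr double kRamMultiplier = 1.2;  // illustrative constant name
constexpr uint64_t kRamPadBytes = 0;    // illustrative constant name

uint64_t EstimateRamBytes(uint64_t disk_size_bytes) {
  return static_cast<uint64_t>(disk_size_bytes * kRamMultiplier) +
         kRamPadBytes;
}
```

A 1 GB SavedModel directory would thus be budgeted roughly 1.2 GB of RAM before loading.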
Monitoring:
- RecordRequestExampleCount: Records example count per request in a histogram and counter metric.
- RecordModelRequestCount: Records per-model request counts by status code.
- RecordRuntimeLatency: Records TF/TFRT runtime latency by model name, API, and runtime.
- RecordRequestLatency: Records request-level latency by model name, API, and entrypoint.
- SetSignatureMethodNameCheckFeature / GetSignatureMethodNameCheckFeature: Feature flag for enabling/disabling method_name checks on SignatureDefs (disabled for native TF2 models).
- IsTfrtErrorLoggingEnabled: Checks the ENABLE_TFRT_SERVING_ERROR_LOGGING environment variable.
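The environment-variable check can be sketched in portable C++. The accepted truthy values here are an assumption; the real IsTfrtErrorLoggingEnabled may parse the variable differently.

```cpp
#include <cstdlib>
#include <cstring>

// Sketch of an env-var-driven feature flag in the style of
// ENABLE_TFRT_SERVING_ERROR_LOGGING: treated as enabled when the variable
// is set to "true" or "1". The exact accepted values are an assumption.
bool IsEnvFlagEnabled(const char* var_name) {
  const char* value = std::getenv(var_name);
  if (value == nullptr) return false;
  return std::strcmp(value, "true") == 0 || std::strcmp(value, "1") == 0;
}
```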
Set Operations:
- GetMapKeys (template): Extracts string keys from a map.
- SetDifference: Computes set difference (A \ B).
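Standalone equivalents of these two helpers, as a hedged sketch: the real signatures may differ, but these mirror the documented behavior.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Extract the keys of a map into a set (mirrors the documented GetMapKeys).
template <typename T>
std::set<std::string> MapKeys(const std::map<std::string, T>& m) {
  std::set<std::string> keys;
  for (const auto& [key, value] : m) keys.insert(key);
  return keys;
}

// Compute A \ B (mirrors the documented SetDifference): elements of a
// that are not in b.
std::set<std::string> Difference(const std::set<std::string>& a,
                                 const std::set<std::string>& b) {
  std::set<std::string> result;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::inserter(result, result.begin()));
  return result;
}
```

In the serving pipeline this pattern is typically used to validate requested signature or tensor names against those a model actually exposes.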
Usage
Use these utilities throughout the serving pipeline. They are consumed by classifiers, regressors, predict utilities, multi-inference runners, and the model warmup infrastructure. The monitoring functions are critical for observability in production deployments.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/util.h (lines 1-144)
  - tensorflow_serving/servables/tensorflow/util.cc (lines 1-375)
Signature
Status InputToSerializedExampleTensor(const Input& input, Tensor* examples);
Status PerformOneShotTensorComputation(
const RunOptions& run_options, const Input& input,
const string& input_tensor_name,
const std::vector<string>& output_tensor_names, Session* session,
std::vector<Tensor>* outputs, int* num_input_examples,
const thread::ThreadPoolOptions& thread_pool_options =
thread::ThreadPoolOptions(),
int64_t* runtime_latency = nullptr);
void MakeModelSpec(const string& model_name,
const absl::optional<string>& signature_name,
const absl::optional<int64_t>& version,
ModelSpec* model_spec);
void RecordRuntimeLatency(const string& model_name, const string& api,
const string& runtime, int64_t latency_usec);
void RecordRequestExampleCount(const string& model_name, size_t count);
Status GetModelDiskSize(const string& path, FileProbingEnv* env,
uint64_t* total_file_size);
Import
#include "tensorflow_serving/servables/tensorflow/util.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | Input | Yes | Input protobuf containing ExampleList or ExampleListWithContext |
| session | Session* | Yes | Active TensorFlow session (for PerformOneShotTensorComputation) |
| model_name | string | Yes | Model name for metrics and ModelSpec |
| path | string | Yes | Model directory path (for disk size estimation) |
Outputs
| Name | Type | Description |
|---|---|---|
| examples | Tensor* | String tensor of serialized examples with shape {num_examples} |
| outputs | vector<Tensor>* | Session::Run output tensors |
| model_spec | ModelSpec* | Populated model specification protobuf |
| total_file_size | uint64_t* | Total disk size of the model directory in bytes |
Usage Examples
Serializing Input and Running Computation
Input input;
// Populate input with examples
Tensor serialized_examples;
TF_RETURN_IF_ERROR(InputToSerializedExampleTensor(input, &serialized_examples));
std::vector<Tensor> outputs;
int num_examples;
// Tensor names below are illustrative; substitute the input/output names
// from your model's signature.
TF_RETURN_IF_ERROR(PerformOneShotTensorComputation(
    RunOptions(), input, "serving_default_inputs",
    {"StatefulPartitionedCall:0"}, session, &outputs, &num_examples));
RecordRequestExampleCount("my_model", num_examples);