
Implementation:TensorFlow Serving Batching CLI Configuration

From Leeroopedia
Knowledge Sources
Domains Performance, Configuration
Last Updated 2026-02-13 17:00 GMT

Overview

Concrete configuration pattern for enabling and parameterizing request batching via tensorflow_model_server CLI flags.

Description

Batching is activated through CLI flags on the tensorflow_model_server binary. The --enable_batching flag wraps all model sessions with BatchingSession. The --batching_parameters_file points to a text-format protobuf file containing BatchingParameters (max_batch_size, batch_timeout_micros, etc.). Alternatively, --enable_per_model_batching_parameters reads a batching_params.pbtxt file from each model's SavedModel assets.extra/ directory.

These flags populate the Server::Options struct which is passed to ServerCore.
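As a sketch of the per-model variant (paths and values here are illustrative, not defaults), the batching_params.pbtxt file sits in the assets.extra/ directory of each model version's SavedModel, and the server is launched with --enable_per_model_batching_parameters instead of a global --batching_parameters_file:

```shell
# Illustrative layout: per-model batching parameters shipped inside the
# SavedModel version directory, where --enable_per_model_batching_parameters
# looks for assets.extra/batching_params.pbtxt.
MODEL_DIR=/tmp/my_model/1
mkdir -p "$MODEL_DIR/assets.extra"
cat > "$MODEL_DIR/assets.extra/batching_params.pbtxt" << 'EOF'
max_batch_size { value: 64 }
batch_timeout_micros { value: 5000 }
EOF

# Then launch with both flags (no --batching_parameters_file needed):
#   tensorflow_model_server --enable_batching=true \
#       --enable_per_model_batching_parameters=true \
#       --model_name=my_model --model_base_path=/tmp/my_model
grep -c 'value' "$MODEL_DIR/assets.extra/batching_params.pbtxt"
```

This lets models with different latency budgets carry their own batching settings instead of sharing one server-wide file.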

Usage

Set these flags when launching tensorflow_model_server, or append them as extra arguments to the tensorflow/serving Docker image, whose entrypoint forwards them to the binary. The batching parameters file is optional; default parameter values are used when only --enable_batching is specified.

Code Reference

Source Location

  • Repository: tensorflow/serving
  • File: tensorflow_serving/model_servers/main.cc (L107-130 flag definitions)
  • Header: tensorflow_serving/model_servers/server.h (L39-111 Options struct)

Signature

// CLI flags (defined in main.cc)
--enable_batching            // bool, default: false
--batching_parameters_file   // string, default: ""
--enable_per_model_batching_parameters  // bool, default: false
--enable_model_warmup        // bool, default: true

// Corresponding Server::Options fields
struct Server::Options {
    bool enable_batching = false;                    // L59
    bool enable_per_model_batching_params = false;   // L60
    string batching_parameters_file;                 // L64
    bool enable_model_warmup = true;                 // L91
};

Import

# No code import — these are CLI flags on the binary
tensorflow_model_server --enable_batching=true ...

I/O Contract

Inputs

Name                                   | Type   | Required | Description
--enable_batching                      | bool   | No       | Activates batching (default false)
--batching_parameters_file             | string | No       | Path to BatchingParameters text proto file
--enable_per_model_batching_parameters | bool   | No       | Read per-model batching params from SavedModel

Outputs

Name            | Type   | Description
Server::Options | struct | Populated configuration passed to ServerCore

Usage Examples

Enable Batching with Default Parameters

tensorflow_model_server \
    --port=8500 \
    --model_name=my_model \
    --model_base_path=/models/my_model \
    --enable_batching=true

With Custom Batching Parameters File

# Create batching_params.txt
cat > /tmp/batching_params.txt << 'EOF'
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000000 }
EOF

tensorflow_model_server \
    --port=8500 \
    --model_name=my_model \
    --model_base_path=/models/my_model \
    --enable_batching=true \
    --batching_parameters_file=/tmp/batching_params.txt
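For context on the numbers in the file above: batch_timeout_micros bounds how long a request waits for its batch to fill before a (possibly partial) batch is dispatched, so it is a rough cap on the extra latency batching adds. A minimal sketch of that arithmetic (the helper name is ours, not part of TensorFlow Serving):

```python
# Hypothetical helper: convert the batch_timeout_micros setting into the
# worst-case extra queueing latency in milliseconds.
def max_added_latency_ms(batch_timeout_micros: int) -> float:
    # A request waits at most this long for more requests to arrive
    # before the scheduler dispatches a partial batch.
    return batch_timeout_micros / 1000.0

# The 10000 microseconds in the example file corresponds to 10 ms:
print(max_added_latency_ms(10_000))  # 10.0
```

When tuning, weigh this bound against your serving latency budget: larger timeouts yield fuller batches and better throughput at the cost of tail latency.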

Docker with Batching

docker run -p 8500:8500 -p 8501:8501 \
    --mount type=bind,source=/models/my_model,target=/models/my_model \
    -t tensorflow/serving \
    --model_name=my_model \
    --enable_batching=true
