Implementation: TensorFlow Serving BatchingOptions Configuration
| Knowledge Sources | |
|---|---|
| Domains | Performance, Hardware_Optimization |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete configuration structs for tuning batch formation and scheduling behavior to match target hardware, provided by the batching_options and streaming_batch_scheduler modules.
Description
BatchingOptions (aliased as BatchingSessionOptions) controls batch formation behavior:
- allowed_batch_sizes: Restricts batch sizes to specific values; batches are padded up to the next allowed size
- pad_variable_length_inputs: Enables padding for tensors with different non-batch dimensions
StreamingBatchScheduler::Options provides parameters for the streaming (low-latency) scheduler:
- max_batch_size: Maximum tasks per batch (default 1000)
- batch_timeout_micros: Maximum time to wait for a batch to fill (default 0, yielding single-item batches; a negative value disables the timeout)
- num_batch_threads: Processing thread count (default MaxParallelism())
Usage
Configure via the BatchingParameters text proto file passed to --batching_parameters_file (batching itself is enabled with --enable_batching), or via a per-model batching_params.pbtxt in the SavedModel's assets.extra/ directory.
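A minimal sketch of the file-based route. The paths and model name are illustrative; --enable_batching and --batching_parameters_file are the standard tensorflow_model_server flags, and the server launch is left commented out since it assumes a running installation:

```shell
# Write a batching config as a BatchingParameters text proto
# (path and values are illustrative).
cat > /tmp/batching_params.pbtxt <<'EOF'
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
EOF

# Uncomment to launch; assumes tensorflow_model_server is on PATH and a
# SavedModel lives under /models/my_model.
# tensorflow_model_server \
#   --model_base_path=/models/my_model \
#   --model_name=my_model \
#   --enable_batching=true \
#   --batching_parameters_file=/tmp/batching_params.pbtxt
```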
Code Reference
Source Location
- Repository: tensorflow/serving
- File: tensorflow_serving/batching/batching_options.h (L25-78)
- Streaming: tensorflow_serving/batching/streaming_batch_scheduler.h (L117-156)
Signature
// BatchingOptions (batching_options.h)
struct BatchingOptions {
// Batch sizes to allow. If empty, any size is allowed.
// Entries must be in increasing order. Last entry must equal max_batch_size.
std::vector<int> allowed_batch_sizes;
// If true, pads variable-length inputs to uniform length within batch.
bool pad_variable_length_inputs = false;
};
using BatchingSessionOptions = BatchingOptions;
// StreamingBatchScheduler::Options
struct Options {
size_t max_batch_size = 1000;
int64_t batch_timeout_micros = 0; // 0 = no waiting
int num_batch_threads = MaxParallelism();
string thread_pool_name = "batch_threads";
uint64_t no_tasks_wait_time_micros = 1000;
};
Import
#include "tensorflow_serving/batching/batching_options.h"
#include "tensorflow_serving/batching/streaming_batch_scheduler.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| allowed_batch_sizes | vector<int> | No | Restricted batch sizes (empty = any size) |
| pad_variable_length_inputs | bool | No | Default false; pad variable-length tensors |
| max_batch_size | size_t | No | Default 1000; maximum batch size |
| batch_timeout_micros | int64_t | No | Default 0; max wait for batch fill |
| num_batch_threads | int | No | Default MaxParallelism(); processing threads |
Outputs
| Name | Type | Description |
|---|---|---|
| Configured options | struct | Parameters used by BatchingSession or StreamingBatchScheduler |
Usage Examples
BatchingParameters Proto File
# /tmp/batching_params.txt
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000000 }
allowed_batch_sizes: 8
allowed_batch_sizes: 16
allowed_batch_sizes: 32
allowed_batch_sizes: 64
allowed_batch_sizes: 128
pad_variable_length_inputs: true
Per-Model Batching Parameters
# Place in model's SavedModel directory:
# /models/my_model/1/assets.extra/batching_params.pbtxt
max_batch_size { value: 64 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
Related Pages
Implements Principle
Requires Environment
Uses Heuristic