Implementation: TensorFlow Serving BatchingOptions Configuration
| Knowledge Sources | |
|---|---|
| Domains | Performance, Hardware_Optimization |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete configuration structs for tuning batch formation and scheduling behavior to match target hardware, provided by the batching_options and streaming_batch_scheduler modules.
Description
BatchingOptions (aliased as BatchingSessionOptions) controls batch formation behavior:
- allowed_batch_sizes: Restricts batch sizes to specific values; batches are padded up to the next allowed size
- pad_variable_length_inputs: Enables padding for tensors with different non-batch dimensions
StreamingBatchScheduler::Options provides parameters for the streaming (low-latency) scheduler:
- max_batch_size: Maximum tasks per batch (default 1000)
- batch_timeout_micros: Maximum time to wait for a batch to fill (default 0, yielding single-item batches; a negative value disables the timeout)
- num_batch_threads: Processing thread count (default MaxParallelism())
Usage
Configure via the BatchingParameters text proto file passed to --batching_parameters_file (batching itself is enabled with --enable_batching), or via a per-model batching_params.pbtxt in the SavedModel's assets.extra/ directory.
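A minimal sketch of the file-based route. The paths and model name are illustrative; --enable_batching and --batching_parameters_file are the standard tensorflow_model_server flags, and the server launch is left commented out since it assumes a running installation:

```shell
# Write a batching config as a BatchingParameters text proto
# (path and values are illustrative).
cat > /tmp/batching_params.pbtxt <<'EOF'
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
EOF

# Uncomment to launch; assumes tensorflow_model_server is on PATH and a
# SavedModel lives under /models/my_model.
# tensorflow_model_server \
#   --model_base_path=/models/my_model \
#   --model_name=my_model \
#   --enable_batching=true \
#   --batching_parameters_file=/tmp/batching_params.pbtxt
```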
Code Reference
Source Location
- Repository: tensorflow/serving
- File: tensorflow_serving/batching/batching_options.h (L25-78)
- Streaming: tensorflow_serving/batching/streaming_batch_scheduler.h (L117-156)
Signature
// BatchingOptions (batching_options.h)
struct BatchingOptions {
// Batch sizes to allow. If empty, any size is allowed.
// Entries must be in increasing order. Last entry must equal max_batch_size.
std::vector<int> allowed_batch_sizes;
// If true, pads variable-length inputs to uniform length within batch.
bool pad_variable_length_inputs = false;
};
using BatchingSessionOptions = BatchingOptions;
// StreamingBatchScheduler::Options
struct Options {
size_t max_batch_size = 1000;
int64_t batch_timeout_micros = 0; // 0 = no waiting
int num_batch_threads = MaxParallelism();
string thread_pool_name = "batch_threads";
uint64_t no_tasks_wait_time_micros = 1000;
};
Import
#include "tensorflow_serving/batching/batching_options.h"
#include "tensorflow_serving/batching/streaming_batch_scheduler.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| allowed_batch_sizes | vector<int> | No | Restricted batch sizes (empty = any size) |
| pad_variable_length_inputs | bool | No | Default false; pad variable-length tensors |
| max_batch_size | size_t | No | Default 1000; maximum batch size |
| batch_timeout_micros | int64_t | No | Default 0; max wait for batch fill |
| num_batch_threads | int | No | Default MaxParallelism(); processing threads |
Outputs
| Name | Type | Description |
|---|---|---|
| Configured options | struct | Parameters used by BatchingSession or StreamingBatchScheduler |
Usage Examples
BatchingParameters Proto File
# /tmp/batching_params.txt
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000000 }
allowed_batch_sizes: 8
allowed_batch_sizes: 16
allowed_batch_sizes: 32
allowed_batch_sizes: 64
allowed_batch_sizes: 128
pad_variable_length_inputs: true
Per-Model Batching Parameters
# Place in model's SavedModel directory:
# /models/my_model/1/assets.extra/batching_params.pbtxt
max_batch_size { value: 64 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
Related Pages
Implements Principle
Requires Environment
Uses Heuristic