
Implementation:TensorFlow Serving BatchingOptions Configuration

From Leeroopedia
Knowledge Sources
Domains Performance, Hardware_Optimization
Last Updated 2026-02-13 17:00 GMT

Overview

The batching_options and streaming_batch_scheduler modules provide concrete configuration structs for tuning batch formation and scheduling behavior to match the target hardware.

Description

BatchingOptions (aliased as BatchingSessionOptions) controls batch formation behavior:

  • allowed_batch_sizes: Restricts batch sizes to specific values; batches are padded up to the next allowed size
  • pad_variable_length_inputs: Enables padding for tensors with different non-batch dimensions

StreamingBatchScheduler::Options provides parameters for the streaming (low-latency) scheduler:

  • max_batch_size: Maximum tasks per batch (default 1000)
  • batch_timeout_micros: Maximum time to wait for a batch to fill (default 0, which produces single-item batches; a negative value disables the timeout)
  • num_batch_threads: Processing thread count (default MaxParallelism())

Usage

Configure via the BatchingParameters text proto file specified by --batching_parameters_file, or via per-model batching_params.pbtxt in the SavedModel's assets.extra/ directory.

Code Reference

Source Location

  • Repository: tensorflow/serving
  • File: tensorflow_serving/batching/batching_options.h (L25-78)
  • Streaming: tensorflow_serving/batching/streaming_batch_scheduler.h (L117-156)

Signature

// BatchingOptions (batching_options.h)
struct BatchingOptions {
    // Batch sizes to allow. If empty, any size is allowed.
    // Entries must be in increasing order. Last entry must equal max_batch_size.
    std::vector<int> allowed_batch_sizes;

    // If true, pads variable-length inputs to uniform length within batch.
    bool pad_variable_length_inputs = false;
};

using BatchingSessionOptions = BatchingOptions;

// StreamingBatchScheduler::Options
struct Options {
    size_t max_batch_size = 1000;
    int64_t batch_timeout_micros = 0;     // 0 = no waiting
    int num_batch_threads = MaxParallelism();
    string thread_pool_name = "batch_threads";
    uint64_t no_tasks_wait_time_micros = 1000;
};

Import

#include "tensorflow_serving/batching/batching_options.h"
#include "tensorflow_serving/batching/streaming_batch_scheduler.h"

I/O Contract

Inputs

Name                        Type         Required  Description
allowed_batch_sizes         vector<int>  No        Restricted batch sizes (empty = any size)
pad_variable_length_inputs  bool         No        Default false; pad variable-length tensors
max_batch_size              size_t       No        Default 1000; maximum batch size
batch_timeout_micros        int64_t      No        Default 0; max wait for batch fill
num_batch_threads           int          No        Default MaxParallelism(); processing threads

Outputs

Name                Type    Description
Configured options  struct  Parameters consumed by BatchingSession or StreamingBatchScheduler

Usage Examples

BatchingParameters Proto File

# /tmp/batching_params.txt
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000000 }
allowed_batch_sizes: 8
allowed_batch_sizes: 16
allowed_batch_sizes: 32
allowed_batch_sizes: 64
allowed_batch_sizes: 128
pad_variable_length_inputs: true

Per-Model Batching Parameters

# Place in model's SavedModel directory:
# /models/my_model/1/assets.extra/batching_params.pbtxt
max_batch_size { value: 64 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }

