# Principle: TensorFlow Serving Batch Scheduling Configuration
| Knowledge Sources | |
|---|---|
| Domains | Performance, Scheduling |
| Last Updated | 2026-02-13 17:00 GMT |
## Overview
A session-wrapping mechanism that transparently intercepts individual TensorFlow `Session::Run()` calls and groups them into batches for efficient execution.
## Description
Batch scheduling wraps a TensorFlow `Session` with a `BatchingSession` that provides the same interface but internally batches requests. When a client calls `Run()`, the request is:
- Converted into a `BatchingSessionTask` with the input tensors
- Enqueued into a `BasicBatchScheduler`, which groups tasks into batches
- Blocked until the batch is processed
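The enqueue-and-block flow above can be sketched with Python's standard threading primitives; the `Task` class and worker thread here are hypothetical stand-ins for `BatchingSessionTask` and the scheduler's batch thread, not the real C++ implementation:

```python
# Hypothetical sketch: enqueue a task, then block until its batch is processed.
import threading
import queue

class Task:
    def __init__(self, inputs):
        self.inputs = inputs
        self.outputs = None
        self.done = threading.Event()  # signaled once the batch has run

tasks = queue.Queue()

def worker():
    # Stand-in for the batch thread: collect one "batch" of two tasks.
    batch = [tasks.get(), tasks.get()]
    for t in batch:
        t.outputs = [x * 2 for x in t.inputs]  # placeholder for session->Run()
        t.done.set()  # wake the blocked caller

threading.Thread(target=worker, daemon=True).start()

t1, t2 = Task([1, 2]), Task([3, 4])
tasks.put(t1)
tasks.put(t2)
t1.done.wait()  # the caller blocks here, as in BatchingSession::Run()
t2.done.wait()
print(t1.outputs, t2.outputs)  # [2, 4] [6, 8]
```

The key design point mirrored here is that the caller's thread sleeps on a per-task event rather than polling, so latency cost is bounded by the batch timeout.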
When a batch is ready (full or timed out), `ProcessBatch()`:
- Merges all input tensors by concatenating along the 0th (batch) dimension via `MergeInputTensors()`
- Executes a single `session->Run()` on the merged batch
- Splits output tensors back into individual results via `SplitOutputTensors()`
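The merge/split steps can be illustrated with NumPy; the concatenation and slicing below are analogues of `MergeInputTensors()` and `SplitOutputTensors()`, with a multiply standing in for the single `session->Run()` call:

```python
# Sketch of batching arithmetic: three requests with batch sizes 1, 2, and 4
# are merged along axis 0, processed once, and split back per task.
import numpy as np

requests = [np.ones((1, 3)), np.ones((2, 3)) * 2, np.ones((4, 3)) * 3]
sizes = [r.shape[0] for r in requests]          # each task's zeroth_dim_size

merged = np.concatenate(requests, axis=0)       # MergeInputTensors() analogue
assert merged.shape == (7, 3)

batched_output = merged * 10                    # stand-in for one session->Run()

offsets = np.cumsum([0] + sizes[:-1])           # start row of each task's slice
outputs = [batched_output[o:o + s] for o, s in zip(offsets, sizes)]
assert [o.shape[0] for o in outputs] == sizes   # SplitOutputTensors() analogue
```

This also shows why all tensors in a batch must agree on every dimension except the 0th: otherwise the concatenation is ill-defined.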
## Usage
This is the core batching mechanism. It is created automatically when `--enable_batching` is set. Users control behavior through scheduling parameters such as `max_batch_size`, the batch timeout, and the number of batch threads.
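A typical way to set these parameters is a text-proto file passed to the model server via `--batching_parameters_file`. The field names below follow TensorFlow Serving's `BatchingParameters` message; the specific values are illustrative, not recommended defaults:

```proto
# batching_parameters.txt (example values only)
max_batch_size { value: 128 }          # cap on merged 0th-dimension size
batch_timeout_micros { value: 1000 }   # how long to wait for a batch to fill
num_batch_threads { value: 8 }         # concurrency of ProcessBatch()
max_enqueued_batches { value: 100 }    # queue depth before rejecting requests
```

Tuning is a throughput/latency trade-off: a larger `max_batch_size` or timeout improves hardware utilization at the cost of per-request latency.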
## Theoretical Basis

```python
# Abstract batch scheduling (NOT the real implementation)
def batching_session_run(inputs):
    task = BatchingSessionTask(inputs, zeroth_dim_size=batch_dim(inputs))
    scheduler.enqueue(task)
    task.wait()  # block until the batch is processed
    return task.outputs

def process_batch(batch_of_tasks):
    merged_inputs = concatenate([t.inputs for t in batch_of_tasks], axis=0)
    merged_outputs = original_session.run(merged_inputs)
    offset = 0  # running start index into the merged batch dimension
    for task in batch_of_tasks:
        task.outputs = slice(merged_outputs, start=offset, size=task.zeroth_dim_size)
        offset += task.zeroth_dim_size
        task.notify_done()
```
## Related Pages
- Implemented By
- Uses Heuristic