Implementation:Tensorflow Serving Tflite Interpreter Pool
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, TFLite, Resource Pooling |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Manages a thread-safe pool of TFLite interpreter instances with optimized string tensor handling, enabling concurrent TFLite inference within TensorFlow Serving.
Description
The TFLite Interpreter Pool module provides two classes in the internal namespace that manage TFLite interpreter lifecycle and concurrency:
TfLiteInterpreterWrapper wraps a single TFLite interpreter instance with additional capabilities:
- String tensor buffer management: Maintains an internal buffer (tensor_buffer_) for each string input tensor. The SetStringData method serializes TensorFlow string tensors into TFLite's internal string format (count + offsets + data), dynamically allocating buffer memory as needed and avoiding repeated allocations by tracking maximum buffer sizes.
- CPU backend context: Configures a TFLite ExternalCpuBackendContext with caching enabled and single-thread execution to reduce contention across sessions.
- Profiling support: When compiled with TFLITE_PROFILE, supports buffered profiling with configurable event counts, profile summarization, and output writing.
- Batch size tracking: Maintains the current batch size for string tensor resize optimization.
- CreateTfLiteInterpreterWrapper: Static factory that builds an interpreter from a FlatBufferModel with the BuiltinOpResolver (including ParseExample custom op), configures the CPU backend context, and performs initial tensor allocation.
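The "count + offsets + data" layout mentioned for SetStringData can be illustrated with a small standalone sketch. It assumes the layout documented in TFLite's string_util.h: an int32 string count, then count + 1 int32 offsets measured from the start of the buffer, then the raw bytes. PackStrings and ReadHeaderSlot are hypothetical helper names, not part of TensorFlow Serving, and the sketch allocates a fresh buffer on every call rather than reusing one as the wrapper does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Packs strings as: [int32 count][int32 offset x (count + 1)][string bytes].
// Offsets are measured from the start of the buffer, and the final offset
// equals the total buffer size. Values are written in host byte order
// (assumed little-endian for this sketch).
std::vector<char> PackStrings(const std::vector<std::string>& strings) {
  const int32_t count = static_cast<int32_t>(strings.size());
  const size_t header_bytes =
      sizeof(int32_t) * (static_cast<size_t>(count) + 2);
  size_t data_bytes = 0;
  for (const std::string& s : strings) data_bytes += s.size();

  std::vector<char> buffer(header_bytes + data_bytes);
  char* out = buffer.data();
  std::memcpy(out, &count, sizeof(count));

  int32_t offset = static_cast<int32_t>(header_bytes);
  for (int32_t i = 0; i < count; ++i) {
    // Header slot i + 1 holds the start offset of string i.
    std::memcpy(out + sizeof(int32_t) * (i + 1), &offset, sizeof(offset));
    std::memcpy(out + offset, strings[i].data(), strings[i].size());
    offset += static_cast<int32_t>(strings[i].size());
  }
  // The trailing offset marks the end of the last string.
  std::memcpy(out + sizeof(int32_t) * (count + 1), &offset, sizeof(offset));
  assert(static_cast<size_t>(offset) == buffer.size());
  return buffer;
}

// Reads the int32 stored in a header slot (slot 0 = count, slots 1.. = offsets).
int32_t ReadHeaderSlot(const std::vector<char>& buffer, int slot) {
  int32_t value = 0;
  std::memcpy(&value, buffer.data() + sizeof(int32_t) * slot, sizeof(value));
  return value;
}
```

Because the header size depends only on the number of strings, a wrapper that remembers the largest buffer it has needed so far can reuse that allocation for later batches of the same or smaller size.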
TfLiteInterpreterPool provides a mutex-protected pool of TfLiteInterpreterWrapper instances:
- GetInterpreter: Blocks (using absl::Condition) until an interpreter is available, then returns it as a unique_ptr, removing it from the pool.
- ReturnInterpreter: Puts a borrowed interpreter back into the pool, making it available to other callers.
- CreateTfLiteInterpreterPool: Static factory that creates a pool of the specified size, initializing each interpreter wrapper from the model.
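The borrow/return behavior described above can be sketched as a minimal blocking pool. This is an illustration only: it swaps in std::mutex and std::condition_variable for the absl::Mutex and absl::Condition that the module uses, and BlockingPool, Get, Return, and NumAvailable are hypothetical names, with T standing in for TfLiteInterpreterWrapper.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical generic pool; T stands in for TfLiteInterpreterWrapper.
template <typename T>
class BlockingPool {
 public:
  explicit BlockingPool(std::vector<std::unique_ptr<T>> items)
      : available_(std::move(items)) {}

  // Blocks until an item is available, then transfers ownership to the caller.
  std::unique_ptr<T> Get() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !available_.empty(); });
    std::unique_ptr<T> item = std::move(available_.back());
    available_.pop_back();
    return item;
  }

  // Hands an item back to the pool and wakes one waiting caller.
  void Return(std::unique_ptr<T> item) {
    assert(item != nullptr);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      available_.push_back(std::move(item));
    }
    not_empty_.notify_one();
  }

  // Number of items currently idle in the pool.
  size_t NumAvailable() {
    std::lock_guard<std::mutex> lock(mutex_);
    return available_.size();
  }

 private:
  std::mutex mutex_;
  std::condition_variable not_empty_;
  std::vector<std::unique_ptr<T>> available_;
};
```

Returning ownership as a unique_ptr means a borrowed item cannot be handed to a second caller until it is explicitly returned, so the pool size caps the number of concurrent users.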
The constant kInitialBatchSize (500) is used as the default batch scheduling maximum.
Usage
Use this module as the concurrency management layer for TFLite inference. It is used internally by TfLiteSession to manage interpreter access. The pool size determines maximum concurrency for TFLite inference operations.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tflite_interpreter_pool.h (lines 1-162)
  - tensorflow_serving/servables/tensorflow/tflite_interpreter_pool.cc (lines 1-203)
Signature
namespace internal {

class TfLiteInterpreterWrapper {
 public:
  static Status CreateTfLiteInterpreterWrapper(
      const tflite::FlatBufferModel& model,
      const tensorflow::SessionOptions& options,
      std::unique_ptr<TfLiteInterpreterWrapper>& wrapper);
  tflite::Interpreter* Get();
  TfLiteStatus Invoke();
  tensorflow::Status SetStringData(const std::vector<const Tensor*>& tensors,
                                   TfLiteTensor* tflite_tensor,
                                   int tensor_index, int batch_size);
};

class TfLiteInterpreterPool {
 public:
  static tensorflow::Status CreateTfLiteInterpreterPool(
      const tflite::FlatBufferModel* model,
      const tensorflow::SessionOptions& options, int pool_size,
      std::unique_ptr<TfLiteInterpreterPool>& interpreter_pool);
  std::unique_ptr<TfLiteInterpreterWrapper> GetInterpreter();
  void ReturnInterpreter(
      std::unique_ptr<TfLiteInterpreterWrapper> interpreter);
};

}  // namespace internal
Import
#include "tensorflow_serving/servables/tensorflow/tflite_interpreter_pool.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const tflite::FlatBufferModel* | Yes | Compiled TFLite model to create interpreters from |
| options | const tensorflow::SessionOptions& | Yes | Session options (currently reserved for future extension) |
| pool_size | int | Yes | Number of interpreter instances to create in the pool |
Outputs
| Name | Type | Description |
|---|---|---|
| interpreter_pool | std::unique_ptr<TfLiteInterpreterPool>& | Created interpreter pool, ready for concurrent use |
| return | Status | OK on success; Internal error if interpreter creation fails |
Usage Examples
Creating and Using an Interpreter Pool
// `model` is assumed to be a std::unique_ptr<tflite::FlatBufferModel> loaded elsewhere.
std::unique_ptr<internal::TfLiteInterpreterPool> pool;
TF_RETURN_IF_ERROR(internal::TfLiteInterpreterPool::CreateTfLiteInterpreterPool(
    model.get(), SessionOptions(), /*pool_size=*/4, pool));
// Borrow an interpreter (blocks if none is available).
auto interpreter = pool->GetInterpreter();
// Use interpreter->Get() to reach the underlying tflite::Interpreter.
auto status = interpreter->Invoke();
// Return the interpreter to the pool so other callers can use it.
pool->ReturnInterpreter(std::move(interpreter));
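Because a borrowed interpreter must be handed back explicitly, an early return between GetInterpreter and ReturnInterpreter would destroy the interpreter and leave the pool one instance short. One way to guard against that is a small RAII wrapper. The sketch below is hypothetical and not part of TensorFlow Serving: ScopedLease is an invented helper, and FakeInterpreter and FakePool are stand-ins so the example compiles without the real headers.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical RAII helper: borrows from a pool on construction and returns
// the item on destruction. Pool must expose GetInterpreter() and
// ReturnInterpreter(std::unique_ptr<Item>), as TfLiteInterpreterPool does.
template <typename Pool, typename Item>
class ScopedLease {
 public:
  explicit ScopedLease(Pool* pool)
      : pool_(pool), item_(pool_->GetInterpreter()) {
    assert(item_ != nullptr);
  }
  ~ScopedLease() {
    if (item_ != nullptr) pool_->ReturnInterpreter(std::move(item_));
  }
  ScopedLease(const ScopedLease&) = delete;
  ScopedLease& operator=(const ScopedLease&) = delete;

  Item* operator->() const { return item_.get(); }
  Item& operator*() const { return *item_; }

 private:
  Pool* pool_;
  std::unique_ptr<Item> item_;
};

// Stand-in types so the sketch compiles without TensorFlow Serving.
struct FakeInterpreter {
  int invoke_count = 0;
  void Invoke() { ++invoke_count; }
};

struct FakePool {
  std::unique_ptr<FakeInterpreter> slot = std::make_unique<FakeInterpreter>();
  std::unique_ptr<FakeInterpreter> GetInterpreter() { return std::move(slot); }
  void ReturnInterpreter(std::unique_ptr<FakeInterpreter> interpreter) {
    slot = std::move(interpreter);
  }
};
```

With the real classes, the lease would be declared as ScopedLease<internal::TfLiteInterpreterPool, internal::TfLiteInterpreterWrapper>, and the interpreter would go back to the pool whenever the lease leaves scope.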