Implementation:Tensorflow Serving Tflite Session
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, TFLite, Batching |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Provides a TensorFlow Session-compatible interface for running inference on TensorFlow Lite models, with integrated batch scheduling and input splitting support.
Description
TfLiteSession extends ServingSession to provide TFLite model inference through the standard TensorFlow Session API. It manages a pool of TFLite interpreters for concurrent execution and supports automatic batching through a BasicBatchScheduler.
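The interpreter pool can be modeled as a blocking object pool. A caller borrows an interpreter, runs inference, and returns it so that another thread can use it. The sketch below uses only the standard library and assumes that model. `FakeInterpreter`, `InterpreterPool`, `Handle`, and `Borrow` are hypothetical names, not the TensorFlow Serving or TFLite API.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TFLite interpreter (not tflite::Interpreter).
struct FakeInterpreter {
  int id = 0;
  int invocations = 0;
};

// Minimal blocking pool: Borrow() waits until an interpreter is free and
// returns a handle that puts it back into the pool when the handle is destroyed.
class InterpreterPool {
 public:
  explicit InterpreterPool(int size) {
    for (int i = 0; i < size; ++i) {
      auto interpreter = std::make_unique<FakeInterpreter>();
      interpreter->id = i;
      available_.push_back(std::move(interpreter));
    }
  }

  // RAII handle: exclusive use of one interpreter for as long as it is alive.
  class Handle {
   public:
    Handle(InterpreterPool* pool, std::unique_ptr<FakeInterpreter> interpreter)
        : pool_(pool), interpreter_(std::move(interpreter)) {}
    Handle(Handle&&) = default;
    Handle& operator=(Handle&&) = delete;
    ~Handle() {
      if (interpreter_) pool_->Return(std::move(interpreter_));
    }
    FakeInterpreter* operator->() const { return interpreter_.get(); }

   private:
    InterpreterPool* pool_;
    std::unique_ptr<FakeInterpreter> interpreter_;
  };

  Handle Borrow() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !available_.empty(); });
    std::unique_ptr<FakeInterpreter> interpreter = std::move(available_.back());
    available_.pop_back();
    return Handle(this, std::move(interpreter));
  }

  std::size_t NumAvailable() {
    std::lock_guard<std::mutex> lock(mu_);
    return available_.size();
  }

 private:
  void Return(std::unique_ptr<FakeInterpreter> interpreter) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      available_.push_back(std::move(interpreter));
    }
    cv_.notify_one();  // wake one thread blocked in Borrow()
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::unique_ptr<FakeInterpreter>> available_;
};
```

The interpreter is returned from a destructor, so it goes back to the pool on every exit path. This includes early error returns such as those produced by `TF_RETURN_IF_ERROR`.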
Key components and behaviors:
- Create: Static factory method that builds a TfLiteSession from a serialized FlatBuffer model. It initializes the model, creates an interpreter pool, extracts input/output tensor maps, reads SignatureDefs from the model (or generates a default one if absent), and configures batch scheduling when multiple interpreters per pool are requested.
- TfLiteBatchTask: A BatchTask subclass encapsulating input tensors, output pointers, notification mechanisms, and support for partial tasks when input splitting is used.
- Run: Implements three overloads of the Session::Run interface. When no scheduler is configured, it directly invokes RunInternal. With a scheduler, it creates a TfLiteBatchTask, submits it to the batch scheduler, and waits for completion.
- ProcessBatch: The batch processing callback that merges input tensors from multiple tasks, invokes RunInternal on the combined batch, splits output tensors back to individual tasks, and handles timeout expiration.
- SplitTfLiteInputTask: Splits a single large input task into multiple smaller tasks that fit within batch size constraints, using an IncrementalBarrier to synchronize completion and concatenate results.
- RunInternal: Core execution method that borrows an interpreter from the pool, sets input data (handling both memcpy-able and string tensor types), resizes tensors as needed, invokes the interpreter, creates output tensors, and copies results.
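ProcessBatch and SplitTfLiteInputTask both rely on size bookkeeping along the batch (first) dimension. The sketch below illustrates that bookkeeping with plain vectors in place of tensors. Several parts are assumptions rather than code taken from tflite_session.cc:

- The split policy: fill the open batch's remaining capacity first, then emit chunks of at most `max_batch_size`.
- The helper names `SplitSizes`, `MergeTasks`, `SplitOutputs`, and `CountdownBarrier`, which are hypothetical.
- `CountdownBarrier` is a simplified stand-in for `IncrementalBarrier`. It models only the "run a callback after the last partial task finishes" behavior, not that class's interface.

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <utility>
#include <vector>

// One inner vector per example; the outer index is the batch dimension.
using Rows = std::vector<std::vector<float>>;

// Assumed split policy: first fill whatever capacity the currently open batch
// has left, then emit chunks of at most max_batch_size (must be positive).
std::vector<int> SplitSizes(int task_size, int open_batch_remaining,
                            int max_batch_size) {
  std::vector<int> sizes;
  int remaining = task_size;
  const int first = std::min(remaining, open_batch_remaining);
  if (first > 0) {
    sizes.push_back(first);
    remaining -= first;
  }
  while (remaining > 0) {
    const int chunk = std::min(remaining, max_batch_size);
    sizes.push_back(chunk);
    remaining -= chunk;
  }
  return sizes;
}

// Concatenates the inputs of several tasks along the batch dimension and
// records how many rows each task contributed.
Rows MergeTasks(const std::vector<Rows>& tasks, std::vector<int>* task_sizes) {
  Rows merged;
  task_sizes->clear();
  for (const Rows& task : tasks) {
    merged.insert(merged.end(), task.begin(), task.end());
    task_sizes->push_back(static_cast<int>(task.size()));
  }
  return merged;
}

// Slices a batched output back into per-task outputs using the recorded
// sizes (their sum must not exceed batched.size()).
std::vector<Rows> SplitOutputs(const Rows& batched,
                               const std::vector<int>& task_sizes) {
  std::vector<Rows> per_task;
  auto it = batched.begin();
  for (int size : task_sizes) {
    per_task.emplace_back(it, it + size);
    it += size;
  }
  return per_task;
}

// Runs `done` exactly once, when the last of `count` partial tasks calls
// Finish(), e.g. to concatenate the partial outputs of a split task.
class CountdownBarrier {
 public:
  CountdownBarrier(int count, std::function<void()> done)
      : pending_(count), done_(std::move(done)) {}

  void Finish() {
    if (pending_.fetch_sub(1) == 1) done_();
  }

 private:
  std::atomic<int> pending_;
  std::function<void()> done_;
};
```

Under this policy, a 10-row task that arrives when the open batch has 3 free slots and `max_batch_size` is 4 is split into parts of 3, 4, and 3 rows.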
The module handles type conversion between TFLite and TensorFlow types, legacy tensor naming (stripping ':0' suffixes), and dynamic tensor resizing for variable batch sizes.
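The legacy-name handling can be illustrated with a small normalization helper. The sketch assumes that stripping ':0' suffixes means "name:0" and "name" refer to the same tensor. The text above does not say which side the real code normalizes, so the sketch normalizes both the stored names and the requested name. `StripLegacySuffix`, `BuildTensorMap`, and `ResolveTensorIndex` are hypothetical helpers, not TensorFlow Serving functions.

```cpp
#include <map>
#include <optional>
#include <string>

// Removes a trailing ":0" suffix, so "input:0" becomes "input".
// Names with any other suffix (e.g. "input:1") are returned unchanged.
std::string StripLegacySuffix(const std::string& name) {
  const std::string suffix = ":0";
  if (name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return name.substr(0, name.size() - suffix.size());
  }
  return name;
}

// Builds a name -> tensor index map with normalized keys. The argument is a
// hypothetical stand-in for the tensor names a model exposes.
std::map<std::string, int> BuildTensorMap(
    const std::map<int, std::string>& names_by_index) {
  std::map<std::string, int> by_name;
  for (const auto& entry : names_by_index) {
    by_name[StripLegacySuffix(entry.second)] = entry.first;
  }
  return by_name;
}

// Looks up a requested name after applying the same normalization, so that
// "x" and "x:0" resolve to the same tensor index.
std::optional<int> ResolveTensorIndex(const std::map<std::string, int>& tensors,
                                      const std::string& requested) {
  auto it = tensors.find(StripLegacySuffix(requested));
  if (it == tensors.end()) return std::nullopt;
  return it->second;
}
```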
Usage
Use TfLiteSession when serving TFLite models in TensorFlow Serving. The SavedModelBundleFactory creates it when a .tflite model file is detected. Because the session exposes the standard TensorFlow Session interface, the serving infrastructure can treat TFLite models interchangeably with TensorFlow models.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tflite_session.h (lines 1-212)
  - tensorflow_serving/servables/tensorflow/tflite_session.cc (lines 1-792)
Signature
```cpp
class TfLiteSession : public ServingSession {
 public:
  static Status Create(string&& buffer, const SessionOptions& options,
                       int num_pools, int num_interpreters_per_pool,
                       std::unique_ptr<TfLiteSession>* tflite_session,
                       ::google::protobuf::Map<string, SignatureDef>* signatures);

  Status Run(const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs) override;

  Status Run(const RunOptions& run_options,
             const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs, RunMetadata* run_metadata,
             const thread::ThreadPoolOptions& thread_pool_options) override;

  // Excerpt: the remaining Run overload and other members are omitted.
};
```
Import
```cpp
#include "tensorflow_serving/servables/tensorflow/tflite_session.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| buffer | string&& | Yes | Serialized TFLite FlatBuffer model bytes (Create) |
| options | SessionOptions | Yes | TensorFlow session options (Create) |
| num_pools | int | Yes | Number of interpreter pools to create (Create) |
| num_interpreters_per_pool | int | Yes | Number of interpreters in each pool; values >1 enable batch scheduling (Create) |
| inputs | vector<pair<string, Tensor>> | Yes | Named input tensors for inference (Run) |
| output_tensor_names | vector<string> | Yes | Names of the requested output tensors (Run) |
Outputs
| Name | Type | Description |
|---|---|---|
| tflite_session | std::unique_ptr<TfLiteSession>* | Session created for the TFLite model (Create) |
| signatures | Map<string, SignatureDef>* | SignatureDefs read from the model, or a generated default if the model has none (Create) |
| outputs | vector<Tensor>* | Output tensors produced by inference (Run) |
| return | Status | OK on success; an error for an invalid model, tensor mismatches, or execution failures |
Usage Examples
Creating and Using a TfLiteSession
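```cpp
#include "tensorflow/core/platform/env.h"
#include "tensorflow_serving/servables/tensorflow/tflite_session.h"

// Assumes namespace tensorflow::serving. "input_tensor", "output_tensor" and the
// {1, 4} float input are placeholders; take real names/shapes from `signatures`.
Status LoadAndRun(const string& model_path, std::vector<Tensor>* outputs) {
  string model_bytes;
  TF_RETURN_IF_ERROR(ReadFileToString(Env::Default(), model_path, &model_bytes));
  std::unique_ptr<TfLiteSession> session;
  ::google::protobuf::Map<string, SignatureDef> signatures;
  // One interpreter per pool: no batch scheduler, so Run calls RunInternal directly.
  TF_RETURN_IF_ERROR(TfLiteSession::Create(
      std::move(model_bytes), SessionOptions(),
      /*num_pools=*/4, /*num_interpreters_per_pool=*/1,
      &session, &signatures));
  Tensor input(DT_FLOAT, TensorShape({1, 4}));
  input.flat<float>().setZero();
  return session->Run({{"input_tensor", input}}, {"output_tensor"}, {}, outputs);
}
```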