Implementation:Tensorflow Serving Tflite Session

Knowledge Sources	Tensorflow_Serving
Domains	Model Serving, TFLite, Batching
Last Updated	2026-02-13 00:00 GMT

Overview

Provides a TensorFlow Session-compatible interface for running inference on TensorFlow Lite models, with integrated batch scheduling and input splitting support.

Description

TfLiteSession extends ServingSession to provide TFLite model inference through the standard TensorFlow Session API. It manages a pool of TFLite interpreters for concurrent execution and supports automatic batching through a BasicBatchScheduler.
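
The interpreter pool idea can be pictured with a minimal, standalone sketch. This is not the TensorFlow Serving implementation: FakeInterpreter and InterpreterPool are hypothetical names invented for illustration, and the blocking borrow/return policy shown here is an assumption about how such a pool might behave.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TFLite interpreter; only an id is kept here.
struct FakeInterpreter {
  int id;
};

// A fixed-size pool: Borrow() blocks until an interpreter is free, and
// Return() hands it back and wakes one waiting caller.
class InterpreterPool {
 public:
  explicit InterpreterPool(int size) {
    assert(size > 0);
    for (int i = 0; i < size; ++i) {
      free_.push_back(std::make_unique<FakeInterpreter>(FakeInterpreter{i}));
    }
  }

  std::unique_ptr<FakeInterpreter> Borrow() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !free_.empty(); });
    std::unique_ptr<FakeInterpreter> interp = std::move(free_.back());
    free_.pop_back();
    return interp;
  }

  void Return(std::unique_ptr<FakeInterpreter> interp) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      free_.push_back(std::move(interp));
    }
    cv_.notify_one();
  }

  // Number of interpreters currently idle in the pool.
  std::size_t Available() {
    std::lock_guard<std::mutex> lock(mu_);
    return free_.size();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::unique_ptr<FakeInterpreter>> free_;
};
```

In this sketch each concurrent request would call Borrow(), run inference on the interpreter it receives, and call Return() afterwards, so at most as many requests execute at once as there are interpreters in the pool.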

Key components and behaviors:

Create: Static factory method that builds a TfLiteSession from a serialized FlatBuffer model. It initializes the model, creates an interpreter pool, extracts input/output tensor maps, reads SignatureDefs from the model (or generates a default one if absent), and configures batch scheduling when multiple interpreters per pool are requested.

TfLiteBatchTask: A BatchTask subclass encapsulating input tensors, output pointers, notification mechanisms, and support for partial tasks when input splitting is used.

Run: Implements three overloads of the Session::Run interface. When no scheduler is configured, it directly invokes RunInternal. With a scheduler, it creates a TfLiteBatchTask, submits it to the batch scheduler, and waits for completion.

ProcessBatch: The batch processing callback that merges input tensors from multiple tasks, invokes RunInternal on the combined batch, splits output tensors back to individual tasks, and handles timeout expiration.

SplitTfLiteInputTask: Splits a single large input task into multiple smaller tasks that fit within batch size constraints, using an IncrementalBarrier to synchronize completion and concatenate results.

RunInternal: Core execution method that borrows an interpreter from the pool, sets input data (handling both memcpy-able and string tensor types), resizes tensors as needed, invokes the interpreter, creates output tensors, and copies results.
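
The batching steps described for ProcessBatch and SplitTfLiteInputTask can be sketched on plain vectors, without any TensorFlow types. Everything below is illustrative: Batch, MergeInputs, SplitOutputs, and SplitSizes are hypothetical names, and the chunking policy (fill the open batch first, then emit chunks of at most the maximum batch size) is an assumption rather than a statement about the actual scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One task's tensor, flattened row-major: `rows` examples of `cols` values.
struct Batch {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;
};

// Concatenates task inputs along the batch (first) dimension.
Batch MergeInputs(const std::vector<Batch>& tasks) {
  assert(!tasks.empty());
  Batch merged{0, tasks[0].cols, {}};
  for (const Batch& t : tasks) {
    assert(t.cols == merged.cols);
    assert(t.data.size() == t.rows * t.cols);
    merged.rows += t.rows;
    merged.data.insert(merged.data.end(), t.data.begin(), t.data.end());
  }
  return merged;
}

// Slices a merged output back into per-task outputs, given each task's
// row count in submission order.
std::vector<Batch> SplitOutputs(const Batch& merged,
                                const std::vector<std::size_t>& task_rows) {
  std::vector<Batch> outputs;
  std::size_t offset = 0;
  for (std::size_t rows : task_rows) {
    assert(offset + rows <= merged.rows);
    auto first = merged.data.begin() + offset * merged.cols;
    auto last = first + rows * merged.cols;
    outputs.push_back(Batch{rows, merged.cols, std::vector<float>(first, last)});
    offset += rows;
  }
  assert(offset == merged.rows);
  return outputs;
}

// Assumed policy for splitting an oversized task: the first chunk fills the
// open batch's remaining slots, later chunks hold at most `max_batch_size`.
std::vector<std::size_t> SplitSizes(std::size_t task_size,
                                    std::size_t open_slots,
                                    std::size_t max_batch_size) {
  assert(max_batch_size > 0);
  std::vector<std::size_t> sizes;
  std::size_t remaining = task_size;
  if (open_slots > 0 && remaining > 0) {
    const std::size_t first = std::min(remaining, open_slots);
    sizes.push_back(first);
    remaining -= first;
  }
  while (remaining > 0) {
    const std::size_t chunk = std::min(remaining, max_batch_size);
    sizes.push_back(chunk);
    remaining -= chunk;
  }
  return sizes;
}
```

Under these assumptions, a task of 10 examples arriving when the open batch has 3 free slots and the maximum batch size is 4 would be split into chunks of 3, 4, and 3 examples. Splitting the merged output by the original row counts returns each task exactly the rows it contributed.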

The module handles type conversion between TFLite and TensorFlow types, legacy tensor naming (stripping ':0' suffixes), and dynamic tensor resizing for variable batch sizes.
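
As an illustration of the legacy naming step, the sketch below strips a trailing ':0' from a tensor name and leaves every other name untouched. StripSuffix and ToLegacyTensorName are hypothetical helper names chosen for this example; the real module may parse names differently.

```cpp
#include <cassert>
#include <string>

// Returns `name` without `suffix` if it ends with it; otherwise returns
// `name` unchanged.
std::string StripSuffix(const std::string& name, const std::string& suffix) {
  assert(!suffix.empty());
  if (name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return name.substr(0, name.size() - suffix.size());
  }
  return name;
}

// Maps a TensorFlow-style name such as "input_tensor:0" to the bare name
// "input_tensor". Names with other output indices (e.g. ":1") are kept.
std::string ToLegacyTensorName(const std::string& tf_name) {
  return StripSuffix(tf_name, ":0");
}
```

A caller could use such a helper as a fallback: look up the name as given first, and retry with the stripped form only if the tensor map has no entry for it.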

Usage

Use TfLiteSession to serve TFLite models in TensorFlow Serving. The SavedModelBundleFactory creates it when it detects a .tflite model file. Because the session exposes the standard TensorFlow Session interface, the serving infrastructure can treat TFLite models and TF models interchangeably.

Code Reference

Source Location

Repository: Tensorflow_Serving
Files:
- tensorflow_serving/servables/tensorflow/tflite_session.h (lines 1-212)
- tensorflow_serving/servables/tensorflow/tflite_session.cc (lines 1-792)

Signature

class TfLiteSession : public ServingSession {
 public:
  static Status Create(string&& buffer, const SessionOptions& options,
                       int num_pools, int num_interpreters_per_pool,
                       std::unique_ptr<TfLiteSession>* tflite_session,
                       ::google::protobuf::Map<string, SignatureDef>* signatures);

  Status Run(const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs) override;

  Status Run(const RunOptions& run_options,
             const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs, RunMetadata* run_metadata,
             const thread::ThreadPoolOptions& thread_pool_options) override;
};

Import

#include "tensorflow_serving/servables/tensorflow/tflite_session.h"

I/O Contract

Inputs

Name	Type	Required	Description
buffer	`string&&`	Yes	Serialized TFLite FlatBuffer model bytes
options	`SessionOptions`	Yes	TensorFlow session options
num_pools	`int`	Yes	Number of interpreter instances in the pool
num_interpreters_per_pool	`int`	Yes	When >1, enables batch scheduling with this concurrency
inputs	`vector<pair<string, Tensor>>`	Yes	Named input tensors for inference
output_tensor_names	`vector<string>`	Yes	Names of desired output tensors

Outputs

Name	Type	Description
tflite_session	`std::unique_ptr<TfLiteSession>*`	Created session for the TFLite model
signatures	`Map<string, SignatureDef>*`	Extracted or generated signature definitions
outputs	`vector<Tensor>*`	Output tensors from inference
return	`Status`	OK on success; errors for invalid model, tensor mismatches, or execution failures

Usage Examples

Creating and Using a TfLiteSession

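// ReadModelFile is a placeholder for whatever loads the .tflite bytes.
string model_bytes = ReadModelFile("/path/to/model.tflite");
std::unique_ptr<TfLiteSession> session;
::google::protobuf::Map<string, SignatureDef> signatures;
// Create takes ownership of the buffer and fills in session and signatures.
TF_RETURN_IF_ERROR(TfLiteSession::Create(
    std::move(model_bytes), SessionOptions(),
    /*num_pools=*/4, /*num_interpreters_per_pool=*/1,
    &session, &signatures));

// `input` is a Tensor prepared by the caller. The tensor names below are
// placeholders and must match the names defined by the model.
std::vector<Tensor> outputs;
TF_RETURN_IF_ERROR(session->Run(
    {{"input_tensor", input}}, {"output_tensor"}, {}, &outputs));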

Related Pages

Principle:Tensorflow_Serving_TFLite_Serving
