Implementation:Tensorflow Serving Tflite Session

From Leeroopedia
Revision as of 13:54, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Tensorflow_Serving_Tflite_Session.md)
Knowledge Sources
Domains: Model Serving, TFLite, Batching
Last Updated: 2026-02-13 00:00 GMT

Overview

Provides a TensorFlow Session-compatible interface for running inference on TensorFlow Lite models, with integrated batch scheduling and input splitting support.

Description

TfLiteSession extends ServingSession to provide TFLite model inference through the standard TensorFlow Session API. It manages a pool of TFLite interpreters for concurrent execution and supports automatic batching through a BasicBatchScheduler.
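
As a rough illustration of the interpreter pool idea, the following is a minimal sketch of a blocking borrow/return pool in standard C++. It is not the TensorFlow Serving implementation: FakeInterpreter, InterpreterPool, Borrow, Return, and Available are hypothetical names chosen for this example, and the real pool may use a different synchronization scheme.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TFLite interpreter (illustration only).
struct FakeInterpreter {
  int id;
};

// Minimal blocking object pool (assumed design, not the library's code).
class InterpreterPool {
 public:
  explicit InterpreterPool(int size) {
    for (int i = 0; i < size; ++i) {
      free_.push_back(std::make_unique<FakeInterpreter>(FakeInterpreter{i}));
    }
  }

  // Blocks until an interpreter is free, then hands ownership to the caller.
  std::unique_ptr<FakeInterpreter> Borrow() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !free_.empty(); });
    std::unique_ptr<FakeInterpreter> interp = std::move(free_.back());
    free_.pop_back();
    return interp;
  }

  // Puts an interpreter back and wakes one waiting caller, if any.
  void Return(std::unique_ptr<FakeInterpreter> interp) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      free_.push_back(std::move(interp));
    }
    cv_.notify_one();
  }

  // Number of interpreters currently available to borrow.
  int Available() {
    std::lock_guard<std::mutex> lock(mu_);
    return static_cast<int>(free_.size());
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::unique_ptr<FakeInterpreter>> free_;
};
```

In this sketch, each concurrent request borrows one interpreter for the duration of a run and returns it afterwards, so at most `size` inferences execute at once.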

Key components and behaviors:

  • Create: Static factory method that builds a TfLiteSession from a serialized FlatBuffer model. It initializes the model, creates an interpreter pool, extracts input/output tensor maps, reads SignatureDefs from the model (or generates a default one if absent), and configures batch scheduling when multiple interpreters per pool are requested.
  • TfLiteBatchTask: A BatchTask subclass encapsulating input tensors, output pointers, notification mechanisms, and support for partial tasks when input splitting is used.
  • Run: Implements three overloads of the Session::Run interface. When no scheduler is configured, it directly invokes RunInternal. With a scheduler, it creates a TfLiteBatchTask, submits it to the batch scheduler, and waits for completion.
  • ProcessBatch: The batch processing callback that merges input tensors from multiple tasks, invokes RunInternal on the combined batch, splits output tensors back to individual tasks, and handles timeout expiration.
  • SplitTfLiteInputTask: Splits a single large input task into multiple smaller tasks that fit within batch size constraints, using an IncrementalBarrier to synchronize completion and concatenate results.
  • RunInternal: Core execution method that borrows an interpreter from the pool, sets input data (handling both memcpy-able and string tensor types), resizes tensors as needed, invokes the interpreter, creates output tensors, and copies results.
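
The split, merge, and re-split steps described for SplitTfLiteInputTask and ProcessBatch can be sketched in plain C++ as follows. This is a simplified illustration under stated assumptions: tensors are modeled as vectors of rows, and Row, Batch, SplitTask, MergeTasks, and SplitOutputs are hypothetical names that do not appear in the library. Synchronization (the IncrementalBarrier) and timeout handling are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: one Row per example, one Batch per task.
using Row = std::vector<float>;
using Batch = std::vector<Row>;

// Splits an oversized task into chunks of at most max_batch_size rows.
// Requires max_batch_size > 0.
std::vector<Batch> SplitTask(const Batch& task, std::size_t max_batch_size) {
  std::vector<Batch> chunks;
  for (std::size_t start = 0; start < task.size(); start += max_batch_size) {
    const std::size_t end = std::min(task.size(), start + max_batch_size);
    chunks.emplace_back(task.begin() + start, task.begin() + end);
  }
  return chunks;
}

// Concatenates the rows of several tasks along the batch dimension.
Batch MergeTasks(const std::vector<Batch>& tasks) {
  Batch merged;
  for (const Batch& task : tasks) {
    merged.insert(merged.end(), task.begin(), task.end());
  }
  return merged;
}

// Slices a merged output back into per-task outputs, using each task's
// original row count. The sizes must sum to merged_output.size().
std::vector<Batch> SplitOutputs(const Batch& merged_output,
                                const std::vector<std::size_t>& task_sizes) {
  std::vector<Batch> outputs;
  std::size_t offset = 0;
  for (std::size_t size : task_sizes) {
    outputs.emplace_back(merged_output.begin() + offset,
                         merged_output.begin() + offset + size);
    offset += size;
  }
  return outputs;
}
```

Recording each task's row count before merging is what lets the outputs be routed back to the correct caller after the combined batch has run.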

The module handles type conversion between TFLite and TensorFlow types, legacy tensor naming (stripping ':0' suffixes), and dynamic tensor resizing for variable batch sizes.
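
To make the legacy-name handling concrete, here is a minimal sketch of stripping a trailing ':0' from a tensor name. StripLegacySuffix is a hypothetical helper written for this page; the actual module may normalize names differently.

```cpp
#include <cassert>
#include <string>

// Returns `name` without a trailing ":0". Names with any other suffix
// (for example ":1") or with no suffix are returned unchanged.
// Hypothetical helper, not part of the TensorFlow Serving API.
std::string StripLegacySuffix(const std::string& name) {
  const std::string suffix = ":0";
  if (name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return name.substr(0, name.size() - suffix.size());
  }
  return name;
}
```

With a helper like this, a request that names its input "input:0" and one that names it "input" would resolve to the same TFLite tensor.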

Usage

Use TfLiteSession when serving TFLite models in TensorFlow Serving. It is created by the SavedModelBundleFactory when a .tflite model file is detected. Because the session exposes the standard TensorFlow Session interface, the serving infrastructure can treat TFLite models interchangeably with TensorFlow models.

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/tflite_session.h (lines 1-212)
    • tensorflow_serving/servables/tensorflow/tflite_session.cc (lines 1-792)

Signature

class TfLiteSession : public ServingSession {
 public:
  static Status Create(string&& buffer, const SessionOptions& options,
                       int num_pools, int num_interpreters_per_pool,
                       std::unique_ptr<TfLiteSession>* tflite_session,
                       ::google::protobuf::Map<string, SignatureDef>* signatures);

  Status Run(const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs) override;

  Status Run(const RunOptions& run_options,
             const std::vector<std::pair<string, Tensor>>& inputs,
             const std::vector<string>& output_tensor_names,
             const std::vector<string>& target_node_names,
             std::vector<Tensor>* outputs, RunMetadata* run_metadata,
             const thread::ThreadPoolOptions& thread_pool_options) override;
};

Import

#include "tensorflow_serving/servables/tensorflow/tflite_session.h"

I/O Contract

Inputs

Name | Type | Required | Description
buffer | string&& | Yes | Serialized TFLite FlatBuffer model bytes
options | SessionOptions | Yes | TensorFlow session options
num_pools | int | Yes | Number of interpreter instances in the pool
num_interpreters_per_pool | int | Yes | When >1, enables batch scheduling with this concurrency
inputs | vector<pair<string, Tensor>> | Yes | Named input tensors for inference
output_tensor_names | vector<string> | Yes | Names of desired output tensors

Outputs

Name | Type | Description
tflite_session | std::unique_ptr<TfLiteSession>* | Created session for the TFLite model
signatures | Map<string, SignatureDef>* | Extracted or generated signature definitions
outputs | vector<Tensor>* | Output tensors from inference
return | Status | OK on success; errors for invalid model, tensor mismatches, or execution failures

Usage Examples

Creating and Using a TfLiteSession

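#include "tensorflow_serving/servables/tensorflow/tflite_session.h"

// ReadModelFile is a hypothetical helper that returns the raw .tflite bytes.
// "input_tensor" and "output_tensor" are placeholders for the model's names.
Status RunTfLiteModel(const Tensor& input, std::vector<Tensor>* outputs) {
  string model_bytes = ReadModelFile("/path/to/model.tflite");
  std::unique_ptr<TfLiteSession> session;
  ::google::protobuf::Map<string, SignatureDef> signatures;
  // Create takes the buffer as an rvalue, so model_bytes is moved in.
  TF_RETURN_IF_ERROR(TfLiteSession::Create(
      std::move(model_bytes), SessionOptions(),
      /*num_pools=*/4, /*num_interpreters_per_pool=*/1,
      &session, &signatures));

  // With num_interpreters_per_pool == 1 no batch scheduler is configured.
  return session->Run(
      {{"input_tensor", input}}, {"output_tensor"}, {}, outputs);
}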
