Heuristic: Recommenders TensorFlow Session Ordering
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Debugging, TensorFlow |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Critical ordering constraint: Keras models must be built after setting the TensorFlow session, or model weights become unavailable in threaded execution.
Description
When using TensorFlow 2.x in `tf.compat.v1` graph execution mode (which all Recommenders TF models use), the Keras backend session must be configured before any model construction. The TF session is created with `GPUOptions(allow_growth=True)` for dynamic GPU memory allocation, then registered as the Keras default session via `tf.compat.v1.keras.backend.set_session()`. Only after this step should `_build_graph()` be called to construct the model. Reversing this order causes model weights to silently become unavailable in multi-threaded contexts.
Usage
Apply this heuristic whenever initializing a TensorFlow-based model in the Recommenders library (NRMS, NAML, LSTUR, NPA, or any DeepRec model). This applies to both the newsrec `BaseModel` and the deeprec `BaseModel`. If you are writing a new TF-based model class, follow the same session-then-build pattern.
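As a minimal sketch of the session-then-build pattern for a new model class (the `MyNewsModel` class, its `_build_graph` body, and the `hparams` dict are illustrative, not part of the Recommenders library):

```python
import tensorflow as tf


class MyNewsModel:
    """Illustrative sketch of a Recommenders-style TF model class.

    Follows the required ordering: configure the session first,
    register it with Keras, and only then build the model.
    """

    def __init__(self, hparams):
        self.hparams = hparams
        tf.compat.v1.disable_eager_execution()  # graph mode, as in Recommenders

        # 1. Create the session with on-demand GPU memory growth.
        gpu_options = tf.compat.v1.GPUOptions(allow_growth=True)
        self.sess = tf.compat.v1.Session(
            config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)
        )

        # 2. Register it as the Keras default session BEFORE any model construction.
        tf.compat.v1.keras.backend.set_session(self.sess)

        # 3. Only now build the graph; the weights bind to the session above.
        self.model = self._build_graph()

    def _build_graph(self):
        # Placeholder architecture; a real model would use self.hparams fully.
        inputs = tf.compat.v1.keras.layers.Input(shape=(self.hparams["dim"],))
        outputs = tf.compat.v1.keras.layers.Dense(1, activation="sigmoid")(inputs)
        return tf.compat.v1.keras.Model(inputs, outputs)
```

Swapping steps 2 and 3 is exactly the bug this heuristic guards against: the model would build in whatever session Keras had defaulted to, and its weights would be unreachable from `self.sess` in worker threads.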
The Insight (Rule of Thumb)
- Action: Always call `tf.compat.v1.keras.backend.set_session(sess)` before calling `self._build_graph()`.
- Value: The session must use `GPUOptions(allow_growth=True)` to avoid pre-allocating all GPU memory.
- Trade-off: None. This is a correctness requirement, not a performance trade-off.
- Failure mode: Building the model before setting the session leaves its weights unreachable from threads once the session is set, leading to silent failures or incorrect predictions.
Reasoning
TensorFlow 1.x-style sessions bind variables to a specific session context. When Keras creates model weights in the default session and a new session is subsequently set, the original weights become unreachable from the new session's thread context. The Recommenders codebase explicitly documents this as an `IMPORTANT` comment in the source code, indicating it was discovered through debugging production failures.
The `allow_growth=True` GPU option is also critical: without it, TensorFlow pre-allocates all available GPU memory, preventing PyTorch models or other processes from using the GPU concurrently.
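For comparison only: in TF2 eager code (not the `tf.compat.v1` path the Recommenders models use), the equivalent on-demand growth setting is applied per device before any GPU operation runs:

```python
import tensorflow as tf

# TF2 eager equivalent of GPUOptions(allow_growth=True): must be called
# before the GPUs are initialized by any op. A no-op on CPU-only machines.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```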
Code evidence from `recommenders/models/newsrec/models/base_model.py:61-72`:
```python
# set GPU use with on demand growth
gpu_options = tf.compat.v1.GPUOptions(allow_growth=True)
sess = tf.compat.v1.Session(
    config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)
)
# set this TensorFlow session as the default session for Keras
tf.compat.v1.keras.backend.set_session(sess)

# IMPORTANT: models have to be loaded AFTER SETTING THE SESSION for keras!
# Otherwise, their weights will be unavailable in the threads after the session there has been set
self.model, self.scorer = self._build_graph()
```
The same pattern is used in `recommenders/models/deeprec/models/base_model.py:69-72` and `recommenders/models/ncf/ncf_singlenode.py:85-89`.