Principle:Tensorflow Tfjs Layer Freezing

From Leeroopedia


Metadata

Field | Value
Principle Name | Tensorflow Tfjs Layer Freezing
Library | TensorFlow.js
Domains | Transfer_Learning, Optimization
Type | Principle
Implemented By | Implementation:Tensorflow_Tfjs_Layer_Trainable_Setter
Source | TensorFlow.js
Last Updated | 2026-02-10 00:00 GMT

Overview

Layer Freezing is the practice of making specific layers in a neural network non-trainable so that their weights are not updated during backpropagation. In transfer learning, freezing the pretrained base model's layers preserves the valuable feature representations learned from the source dataset while allowing only the new task-specific head to be trained. This is a critical mechanism for preventing catastrophic forgetting -- the phenomenon where fine-tuning destroys the pretrained representations.

Description

When a pretrained model is adapted for a new task, not all layers should be updated during training. The trainable property on each layer controls whether that layer's weights participate in gradient computation and optimizer updates during backpropagation.

Setting a layer's trainable property to false has two effects:

  • The layer's weights are excluded from gradient computation -- no gradients are calculated with respect to these weights.
  • The layer's weights are excluded from the optimizer's update step -- even if gradients were somehow available, the optimizer will not modify these weights.

The layer continues to perform its forward computation normally. It still transforms input tensors to output tensors using its fixed weights. Only the learning (weight update) is disabled.

When to Freeze

Scenario | Freezing Strategy | Rationale
Small target dataset, similar tasks | Freeze all base layers | Prevents overfitting; pretrained features are already relevant
Small target dataset, dissimilar tasks | Freeze early layers, unfreeze later layers | Early features are general; later features need adaptation
Large target dataset, similar tasks | Freeze early layers or none | Enough data to fine-tune without overfitting
Large target dataset, dissimilar tasks | Freeze none (or only very early layers) | The model needs substantial adaptation
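The "freeze early layers" strategies above reduce to setting a boundary index in the layer stack. A framework-agnostic sketch (the helper name `freezeEarlyLayers` is illustrative); it operates on any ordered list of layer-like objects exposing a `trainable` flag, as TensorFlow.js layers do:

```javascript
// Freeze the first `numFrozen` layers; leave the rest trainable.
function freezeEarlyLayers(layers, numFrozen) {
  layers.forEach((layer, i) => {
    layer.trainable = i >= numFrozen;
  });
}

// Example: "small target dataset, dissimilar tasks" -> freeze early
// layers only. Plain objects stand in for real layers here.
const layers = [
  {name: 'conv1', trainable: true},
  {name: 'conv2', trainable: true},
  {name: 'conv3', trainable: true},
  {name: 'head', trainable: true},
];
freezeEarlyLayers(layers, 2);
// conv1 and conv2 are now frozen; conv3 and head remain trainable.
```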

Theoretical Basis

Catastrophic Forgetting

Without freezing, updating all layers with a small target dataset can cause catastrophic forgetting: the network rapidly overwrites the general features learned during pretraining with task-specific features that overfit to the small dataset. Freezing prevents this by keeping the base layers' weights at their pretrained values.

Parameter Reduction

Freezing layers directly reduces the number of trainable parameters:

  • A full MobileNet V1 (width multiplier 1.0) has approximately 4.2 million parameters, roughly 3.2 million of which are in the convolutional base.
  • Freezing all but the last 5 layers might leave only a few hundred thousand trainable parameters.
  • Fewer trainable parameters means faster training, lower memory usage, and reduced overfitting risk.
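The trainable-parameter count can be tallied from per-layer counts. A sketch with illustrative numbers (in TensorFlow.js the analogous per-layer data comes from `layer.countParams()` and `layer.trainable`):

```javascript
// Sum total vs trainable parameters over a list of layer descriptors.
function countParams(layers) {
  let total = 0, trainable = 0;
  for (const l of layers) {
    total += l.params;
    if (l.trainable) trainable += l.params;
  }
  return {total, trainable};
}

// Frozen backbone plus a small trainable head (illustrative sizes).
const layers = [
  {name: 'base', params: 3000000, trainable: false},
  {name: 'head', params: 200000, trainable: true},
];
console.log(countParams(layers));  // { total: 3200000, trainable: 200000 }
```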

Gradient Flow

When a layer is frozen, the backpropagation algorithm still passes gradients through the frozen layer (to reach any trainable layers before it in the graph), but it does not accumulate gradients for the frozen layer's own weights. This distinction is important: freezing does not block gradient flow to earlier trainable layers.
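This distinction can be seen in a hand-computed example. Below, a chain of three scalar "layers" has its middle weight frozen: no gradient is computed for the frozen weight, yet the chain rule still carries the gradient through it to the earlier trainable weight. All values here are made up for illustration:

```javascript
// Chain: y = w3 * (w2 * (w1 * x)), loss L = y, so dL/dy = 1.
const x = 2.0;
let w1 = 0.5;    // trainable (before the frozen layer)
const w2 = 3.0;  // FROZEN: excluded from gradient accumulation and updates
let w3 = 1.5;    // trainable (after the frozen layer)

// Forward pass: the frozen layer still computes normally.
const h1 = w1 * x;   // 1.0
const h2 = w2 * h1;  // 3.0
const y = w3 * h2;   // 4.5

// Backward pass.
const dh2 = w3;        // gradient arriving at the frozen layer's output
const dh1 = dh2 * w2;  // gradient PASSES THROUGH the frozen layer: 4.5
const dw1 = dh1 * x;   // gradient for the earlier trainable weight: 9.0
const dw3 = h2;        // gradient for the later trainable weight: 3.0
// No dw2 is computed: the frozen layer's own weight gradient is skipped.

const lr = 0.1;
w1 -= lr * dw1;  // w1 is updated
w3 -= lr * dw3;  // w3 is updated; w2 stays at 3.0
```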

Two-Phase Training Strategy

A common and effective approach is two-phase training:

  1. Phase 1: Feature extraction -- Freeze all base model layers. Train only the new task head. Use a moderate learning rate (e.g., 0.001).
  2. Phase 2: Fine-tuning -- Unfreeze some of the later base model layers. Train the entire unfrozen portion with a very low learning rate (e.g., 0.00001) to gently adapt the pretrained features.

This two-phase approach prevents the large random gradients from the untrained task head from disrupting the pretrained weights during the initial training phase.

Usage

Layer freezing is applied in the following contexts:

  • Transfer learning -- Freezing all or most pretrained layers while training a new task head.
  • Progressive unfreezing -- Gradually unfreezing layers from the output end toward the input during training, so the earlier, more general layers adapt last and most gently.
  • Multi-task learning -- Freezing shared layers while training task-specific branches.
  • Model distillation -- Freezing a teacher model while training a student model.

Practical Considerations

  • Batch normalization layers -- Frozen batch normalization layers use their stored running mean and variance (inference mode) rather than batch statistics. This is important for maintaining stable predictions.
  • Compile after freezing -- In some frameworks, the model must be recompiled after changing trainable flags so the optimizer is aware of the updated set of trainable weights. In TensorFlow.js, the compile method should be called after modifying trainable properties.
  • Verification -- Always verify the number of trainable parameters after freezing to confirm the expected behavior.
