Principle:Tensorflow_Tfjs_Layer_Freezing
Metadata
| Field | Value |
|---|---|
| Principle Name | Tensorflow Tfjs Layer Freezing |
| Library | TensorFlow.js |
| Domains | Transfer_Learning, Optimization |
| Type | Principle |
| Implemented By | Implementation:Tensorflow_Tfjs_Layer_Trainable_Setter |
| Source | TensorFlow.js |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Layer Freezing is the practice of making specific layers in a neural network non-trainable so that their weights are not updated during backpropagation. In transfer learning, freezing the pretrained base model's layers preserves the valuable feature representations learned from the source dataset while allowing only the new task-specific head to be trained. This is a critical mechanism for preventing catastrophic forgetting -- the phenomenon where fine-tuning destroys the pretrained representations.
Description
When a pretrained model is adapted for a new task, not all layers should be updated during training. The trainable property on each layer controls whether that layer's weights participate in gradient computation and optimizer updates during backpropagation.
Setting a layer's trainable property to false has two effects:
- The layer's weights are excluded from gradient computation -- no gradients are calculated with respect to these weights.
- The layer's weights are excluded from the optimizer's update step -- even if gradients were somehow available, the optimizer will not modify these weights.
The layer continues to perform its forward computation normally. It still transforms input tensors to output tensors using its fixed weights. Only the learning (weight update) is disabled.
When to Freeze
| Scenario | Freezing Strategy | Rationale |
|---|---|---|
| Small target dataset, similar tasks | Freeze all base layers | Prevents overfitting; pretrained features are already relevant |
| Small target dataset, dissimilar tasks | Freeze early layers, unfreeze later layers | Early features are general; later features need adaptation |
| Large target dataset, similar tasks | Freeze early layers or none | Enough data to fine-tune without overfitting |
| Large target dataset, dissimilar tasks | Freeze none (or only very early layers) | The model needs substantial adaptation |
Theoretical Basis
Catastrophic Forgetting
Without freezing, updating all layers with a small target dataset can cause catastrophic forgetting: the network rapidly overwrites the general features learned during pretraining with task-specific features that overfit to the small dataset. Freezing prevents this by keeping the base layers' weights at their pretrained values.
Parameter Reduction
Freezing layers directly reduces the number of trainable parameters:
- The convolutional base of MobileNet V1 has roughly 3.2 million parameters (about 4.2 million including the original classification head).
- Freezing all but the last 5 layers might leave only a few hundred thousand trainable parameters.
- Fewer trainable parameters means faster training, lower memory usage, and reduced overfitting risk.
Gradient Flow
When a layer is frozen, the backpropagation algorithm still passes gradients through the frozen layer (to reach any trainable layers before it in the graph), but it does not accumulate gradients for the frozen layer's own weights. This distinction is important: freezing does not block gradient flow to earlier trainable layers.
Two-Phase Training Strategy
A common and effective approach is two-phase training:
- Phase 1: Feature extraction -- Freeze all base model layers. Train only the new task head. Use a moderate learning rate (e.g., 0.001).
- Phase 2: Fine-tuning -- Unfreeze some of the later base model layers. Train the entire unfrozen portion with a very low learning rate (e.g., 0.00001) to gently adapt the pretrained features.
This two-phase approach prevents the large random gradients from the untrained task head from disrupting the pretrained weights during the initial training phase.
Usage
Layer freezing is applied in the following contexts:
- Transfer learning -- Freezing all or most pretrained layers while training a new task head.
- Progressive unfreezing -- Gradually unfreezing layers from top to bottom during training, allowing deeper layers to adapt slowly.
- Multi-task learning -- Freezing shared layers while training task-specific branches.
- Model distillation -- Freezing a teacher model while training a student model.
Practical Considerations
- Batch normalization layers -- Frozen batch normalization layers use their stored running mean and variance (inference mode) rather than batch statistics. This is important for maintaining stable predictions.
- Compile after freezing -- In some frameworks, the model must be recompiled after changing trainable flags so the optimizer is aware of the updated set of trainable weights. In TensorFlow.js, the compile method must be called again after modifying trainable properties, because the set of trainable weights is captured at compile time.
- Verification -- Always verify the number of trainable parameters after freezing to confirm the expected behavior.
Related Pages
- Principle:Tensorflow_Tfjs_Base_Model_Loading -- Loading the pretrained model to be frozen
- Principle:Tensorflow_Tfjs_Feature_Extraction_Layer_Selection -- Selecting which layer defines the freeze boundary
- Principle:Tensorflow_Tfjs_Task_Head_Construction -- Building trainable layers on top of frozen features
- Principle:Tensorflow_Tfjs_Fine_Tuning -- Training with frozen and unfrozen layers
- Implementation:Tensorflow_Tfjs_Layer_Trainable_Setter -- TensorFlow.js implementation of layer freezing