Principle:Tensorflow Tfjs Layer Freezing

From Leeroopedia


Metadata

Field | Value
Principle Name | Tensorflow Tfjs Layer Freezing
Library | TensorFlow.js
Domains | Transfer_Learning, Optimization
Type | Principle
Implemented By | Implementation:Tensorflow_Tfjs_Layer_Trainable_Setter
Source | TensorFlow.js
Last Updated | 2026-02-10 00:00 GMT

Overview

Layer Freezing is the practice of making specific layers in a neural network non-trainable so that their weights are not updated during backpropagation. In transfer learning, freezing the pretrained base model's layers preserves the valuable feature representations learned from the source dataset while allowing only the new task-specific head to be trained. This is a critical mechanism for preventing catastrophic forgetting -- the phenomenon where fine-tuning destroys the pretrained representations.

Description

When a pretrained model is adapted for a new task, not all layers should be updated during training. The trainable property on each layer controls whether that layer's weights participate in gradient computation and optimizer updates during backpropagation.

Setting a layer's trainable property to false has two effects:

  • The layer's weights are excluded from gradient computation -- no gradients are calculated with respect to these weights.
  • The layer's weights are excluded from the optimizer's update step -- even if gradients were somehow available, the optimizer will not modify these weights.

The layer continues to perform its forward computation normally. It still transforms input tensors to output tensors using its fixed weights. Only the learning (weight update) is disabled.

When to Freeze

Scenario | Freezing Strategy | Rationale
Small target dataset, similar tasks | Freeze all base layers | Prevents overfitting; pretrained features are already relevant
Small target dataset, dissimilar tasks | Freeze early layers, unfreeze later layers | Early features are general; later features need adaptation
Large target dataset, similar tasks | Freeze early layers or none | Enough data to fine-tune without overfitting
Large target dataset, dissimilar tasks | Freeze none (or only very early layers) | The model needs substantial adaptation
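The "freeze early layers" strategies above reduce to setting a boundary index in the layer stack. A framework-agnostic sketch (the helper name `freezeEarlyLayers` is illustrative); it operates on any ordered list of layer-like objects exposing a `trainable` flag, as TensorFlow.js layers do:

```javascript
// Freeze the first `numFrozen` layers; leave the rest trainable.
function freezeEarlyLayers(layers, numFrozen) {
  layers.forEach((layer, i) => {
    layer.trainable = i >= numFrozen;
  });
}

// Example: "small target dataset, dissimilar tasks" -> freeze early
// layers only. Plain objects stand in for real layers here.
const layers = [
  {name: 'conv1', trainable: true},
  {name: 'conv2', trainable: true},
  {name: 'conv3', trainable: true},
  {name: 'head', trainable: true},
];
freezeEarlyLayers(layers, 2);
// conv1 and conv2 are now frozen; conv3 and head remain trainable.
```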

Theoretical Basis

Catastrophic Forgetting

Without freezing, updating all layers with a small target dataset can cause catastrophic forgetting: the network rapidly overwrites the general features learned during pretraining with task-specific features that overfit to the small dataset. Freezing prevents this by keeping the base layers' weights at their pretrained values.

Parameter Reduction

Freezing layers directly reduces the number of trainable parameters:

  • A full MobileNet V1 (width multiplier 1.0) has approximately 4.2 million parameters, roughly 3.2 million of which are in the convolutional base.
  • Freezing all but the last 5 layers might leave only a few hundred thousand trainable parameters.
  • Fewer trainable parameters means faster training, lower memory usage, and reduced overfitting risk.
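The trainable-parameter count can be tallied from per-layer counts. A sketch with illustrative numbers (in TensorFlow.js the analogous per-layer data comes from `layer.countParams()` and `layer.trainable`):

```javascript
// Sum total vs trainable parameters over a list of layer descriptors.
function countParams(layers) {
  let total = 0, trainable = 0;
  for (const l of layers) {
    total += l.params;
    if (l.trainable) trainable += l.params;
  }
  return {total, trainable};
}

// Frozen backbone plus a small trainable head (illustrative sizes).
const layers = [
  {name: 'base', params: 3000000, trainable: false},
  {name: 'head', params: 200000, trainable: true},
];
console.log(countParams(layers));  // { total: 3200000, trainable: 200000 }
```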

Gradient Flow

When a layer is frozen, the backpropagation algorithm still passes gradients through the frozen layer (to reach any trainable layers before it in the graph), but it does not accumulate gradients for the frozen layer's own weights. This distinction is important: freezing does not block gradient flow to earlier trainable layers.
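This distinction can be seen in a hand-computed example. Below, a chain of three scalar "layers" has its middle weight frozen: no gradient is computed for the frozen weight, yet the chain rule still carries the gradient through it to the earlier trainable weight. All values here are made up for illustration:

```javascript
// Chain: y = w3 * (w2 * (w1 * x)), loss L = y, so dL/dy = 1.
const x = 2.0;
let w1 = 0.5;    // trainable (before the frozen layer)
const w2 = 3.0;  // FROZEN: excluded from gradient accumulation and updates
let w3 = 1.5;    // trainable (after the frozen layer)

// Forward pass: the frozen layer still computes normally.
const h1 = w1 * x;   // 1.0
const h2 = w2 * h1;  // 3.0
const y = w3 * h2;   // 4.5

// Backward pass.
const dh2 = w3;        // gradient arriving at the frozen layer's output
const dh1 = dh2 * w2;  // gradient PASSES THROUGH the frozen layer: 4.5
const dw1 = dh1 * x;   // gradient for the earlier trainable weight: 9.0
const dw3 = h2;        // gradient for the later trainable weight: 3.0
// No dw2 is computed: the frozen layer's own weight gradient is skipped.

const lr = 0.1;
w1 -= lr * dw1;  // w1 is updated
w3 -= lr * dw3;  // w3 is updated; w2 stays at 3.0
```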

Two-Phase Training Strategy

A common and effective approach is two-phase training:

  1. Phase 1: Feature extraction -- Freeze all base model layers. Train only the new task head. Use a moderate learning rate (e.g., 0.001).
  2. Phase 2: Fine-tuning -- Unfreeze some of the later base model layers. Train the entire unfrozen portion with a very low learning rate (e.g., 0.00001) to gently adapt the pretrained features.

This two-phase approach prevents the large random gradients from the untrained task head from disrupting the pretrained weights during the initial training phase.

Usage

Layer freezing is applied in the following contexts:

  • Transfer learning -- Freezing all or most pretrained layers while training a new task head.
  • Progressive unfreezing -- Gradually unfreezing layers from the output end toward the input during training, so the earlier, more general layers adapt last and most gently.
  • Multi-task learning -- Freezing shared layers while training task-specific branches.
  • Model distillation -- Freezing a teacher model while training a student model.

Practical Considerations

  • Batch normalization layers -- Frozen batch normalization layers use their stored running mean and variance (inference mode) rather than batch statistics. This is important for maintaining stable predictions.
  • Compile after freezing -- In some frameworks, the model must be recompiled after changing trainable flags so the optimizer is aware of the updated set of trainable weights. In TensorFlow.js, the compile method should be called after modifying trainable properties.
  • Verification -- Always verify the number of trainable parameters after freezing to confirm the expected behavior.
