
Principle:Tensorflow Tfjs Model Compilation

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Optimization
Last Updated 2026-02-10 00:00 GMT

Overview

Model compilation is the process of configuring a neural network for training by binding an optimizer, a loss function, and optional evaluation metrics to the model. It is the mandatory bridge between architecture definition and the training loop.

Description

After a neural network's architecture has been defined (layers stacked, weights initialized), the model exists as a computational graph that can perform forward passes but cannot be trained. Compilation transforms the model from a passive function approximator into an active learner by specifying three critical components:

  1. Optimizer — The algorithm that updates model weights based on computed gradients. Different optimizers implement different strategies for navigating the loss landscape (learning rate scheduling, momentum, adaptive per-parameter rates).
  2. Loss function — The differentiable objective function L(y, y_hat) that quantifies the discrepancy between the model's predictions y_hat and the true labels y. The loss function defines what the model is trying to minimize.
  3. Metrics — Optional functions that evaluate model performance during training and validation. Unlike the loss function, metrics do not need to be differentiable and are not used for gradient computation. They provide human-interpretable measures of model quality.

Compilation is a configuration step, not a computation step. It does not modify the model's weights or architecture. Instead, it stores references to the optimizer, loss, and metrics so that the subsequent training loop (fit()) knows how to compute gradients and update parameters.
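This "configuration, not computation" idea can be sketched in plain JavaScript. The minimal Model class below is hypothetical (it is not the TensorFlow.js API): compile() only stores references, leaves the weights untouched, and fit() refuses to run until those references exist.

```javascript
// Hypothetical minimal model illustrating compilation as pure configuration.
class Model {
  constructor() {
    this.weights = [0.0];   // set during architecture definition
    this.optimizer = null;
    this.loss = null;
    this.metrics = [];
  }

  // compile() stores references; it modifies neither weights nor architecture.
  compile({ optimizer, loss, metrics = [] }) {
    this.optimizer = optimizer;
    this.loss = loss;
    this.metrics = metrics;
  }

  // fit() raises an error on an uncompiled model.
  fit() {
    if (!this.optimizer || !this.loss) {
      throw new Error("Model must be compiled before calling fit()");
    }
    // ...a real training loop would use this.optimizer and this.loss here...
  }
}

const model = new Model();
const before = [...model.weights];
model.compile({ optimizer: "sgd", loss: "mse" });
console.log(model.weights[0] === before[0]); // compilation left the weights unchanged
```

Because compile() only rebinds references, calling it again (for example with a new optimizer) is cheap and safe, which is exactly what makes the recompilation techniques described below possible.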

Usage

Compilation must occur:

  • After architecture definition — all layers must be added before compiling.
  • Before training — calling fit() on an uncompiled model raises an error.
  • After any architecture change — if layers are added or modified, the model must be recompiled.

A model can be compiled multiple times. Recompiling changes the optimizer and/or loss without affecting the current weight values, which is useful for techniques like:

  • Switching optimizers mid-training (e.g., from Adam to SGD for fine-tuning).
  • Changing the loss function between training phases.
  • Adjusting the learning rate by recompiling with a new optimizer instance.

Theoretical Basis

The Optimizer

The optimizer implements the parameter update rule applied after each gradient computation. All optimizers follow the general form:

w_{t+1} = w_t - alpha * g(gradient_t, state_t)

where alpha is the learning rate and g is the optimizer-specific gradient transformation function.

Common optimizers, their update rules, and key properties:

  • SGD: w = w - alpha * gradient. Simplest; may oscillate in ravines.
  • SGD + Momentum: v = beta * v + gradient; w = w - alpha * v. Accelerates convergence; dampens oscillation.
  • RMSProp: s = gamma * s + (1 - gamma) * gradient^2; w = w - alpha * gradient / sqrt(s + epsilon). Adaptive per-parameter rates; good for RNNs.
  • Adam: combines momentum (first moment) and RMSProp (second moment) with bias correction. Default choice; works well across most tasks.
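The update rules above translate directly into code. The sketch below applies one step of each rule to a single scalar weight; the hyperparameter values (alpha, beta, gamma, epsilon) are illustrative, not library defaults.

```javascript
// One update step per optimizer, on a single scalar weight.
function sgdStep(w, grad, alpha) {
  return w - alpha * grad;
}

function momentumStep(w, grad, state, alpha, beta) {
  // state.v is the velocity accumulator carried between steps
  state.v = beta * state.v + grad;
  return w - alpha * state.v;
}

function rmspropStep(w, grad, state, alpha, gamma, epsilon) {
  // state.s is the running average of squared gradients
  state.s = gamma * state.s + (1 - gamma) * grad * grad;
  return w - alpha * grad / Math.sqrt(state.s + epsilon);
}

// Example: one step from w = 1.0 with gradient 0.5
console.log(sgdStep(1.0, 0.5, 0.1));                      // 0.95
console.log(momentumStep(1.0, 0.5, { v: 0 }, 0.1, 0.9));  // 0.95 (first step: v starts at 0)
console.log(rmspropStep(1.0, 0.5, { s: 0 }, 0.01, 0.9, 1e-8));
```

Note that momentum and RMSProp carry state (v, s) between steps; this is why recompiling with a fresh optimizer instance discards that accumulated state even though the weights are preserved.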

The Loss Function

The loss function L(y, y_hat) must be differentiable with respect to y_hat so that gradients can be computed via backpropagation. Common loss functions:

  • Mean Squared Error: (1/n) * sum((y - y_hat)^2). Use case: regression.
  • Binary Crossentropy: -mean(y * log(y_hat) + (1-y) * log(1-y_hat)). Use case: binary classification.
  • Categorical Crossentropy: -sum(y * log(y_hat)). Use case: multi-class classification (one-hot labels).
  • Sparse Categorical Crossentropy: same formula as categorical crossentropy, but with integer labels. Use case: multi-class classification (integer labels).

The choice of loss function must be consistent with the output layer activation:

  • Softmax output -> Categorical crossentropy
  • Sigmoid output -> Binary crossentropy
  • Linear output -> Mean squared error

Metrics

Metrics provide evaluation feedback but do not influence gradient computation. Common metrics include:

  • accuracy — Fraction of correct predictions.
  • precision / recall — For imbalanced classification tasks.
  • mse — Mean squared error reported as a metric (same formula as loss but tracked separately).

Metrics are computed on both training and validation data at the end of each epoch, enabling the practitioner to monitor for overfitting (training metric improves while validation metric degrades).
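Accuracy is a good illustration of why metrics need not be differentiable: it involves an argmax, whose gradient is zero almost everywhere, so it can only ever be a metric, never a loss. A plain JavaScript sketch:

```javascript
// Index of the largest value in an array.
function argmax(arr) {
  return arr.reduce((best, v, i) => (v > arr[best] ? i : best), 0);
}

// Accuracy: fraction of samples where argmax(prediction) matches the integer label.
function accuracy(labels, predictions) {
  const correct = labels.filter((y, i) => argmax(predictions[i]) === y).length;
  return correct / labels.length;
}

const labels = [2, 0, 1, 1];
const preds = [
  [0.1, 0.2, 0.7], // argmax 2, correct
  [0.6, 0.3, 0.1], // argmax 0, correct
  [0.5, 0.4, 0.1], // argmax 0, wrong
  [0.2, 0.7, 0.1], // argmax 1, correct
];
console.log(accuracy(labels, preds)); // 0.75
```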

Compilation as Binding

Mathematically, compilation defines the optimization problem:

minimize_{w} E_{(x,y) ~ D} [ L(f_w(x), y) ]

where:

  • f_w is the model parameterized by weights w (defined during architecture specification)
  • L is the loss function (bound during compilation)
  • The minimization strategy is the optimizer (bound during compilation)
  • D is the data distribution (provided during training)
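The pieces of this optimization problem can be tied together on a toy instance: a one-weight linear model f_w(x) = w * x, MSE as the bound loss, SGD as the bound optimizer, and data drawn from y = 2x standing in for D (all values here are illustrative).

```javascript
// Toy instance of minimize_w E[L(f_w(x), y)]:
// model f_w(x) = w * x, data generated from y = 2x, MSE loss, SGD optimizer.
const data = [[1, 2], [2, 4], [3, 6]]; // (x, y) pairs with true w = 2

// Analytic gradient of MSE for this model:
// d/dw (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
function mseGradient(w, data) {
  const n = data.length;
  return (2 / n) * data.reduce((acc, [x, y]) => acc + (w * x - y) * x, 0);
}

let w = 0.0;        // initialized during "architecture definition"
const alpha = 0.05; // learning rate, bound via the "optimizer"
for (let step = 0; step < 200; step++) {
  w = w - alpha * mseGradient(w, data); // SGD update rule
}
console.log(w); // converges to the true value 2
```

Every role in the formal statement appears here: f_w from architecture, L and the update rule from compilation, and D from the training data; only their binding (the choice of alpha, the gradient function) happens at "compile time", while the loop itself is training.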

Related Pages

Implemented By
