
Principle:Tensorflow Tfjs Define Model Architecture

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Neural_Networks
Last Updated 2026-02-10 00:00 GMT

Overview

Defining model architecture is the process of specifying the structure of a neural network by composing differentiable layers in a linear (sequential) topology, where each layer has exactly one input tensor and one output tensor.

Description

Before a neural network can be trained or used for inference, its architecture must be defined. Architecture definition answers the fundamental questions: how many layers does the network have, what type is each layer, how many parameters does each layer contain, and how do tensors flow from input to output?

The sequential model pattern is the simplest and most common form of neural network construction. In a sequential model, layers are stacked linearly: layer i's output becomes layer i+1's input. There are no branches, skip connections, or multiple inputs or outputs. This topology covers a wide range of practical architectures, including multilayer perceptrons (MLPs), simple convolutional networks, and autoencoders.

The sequential pattern involves two core operations:

  1. Creating an empty model container — An empty sequential model is instantiated as a container that will hold an ordered list of layers.
  2. Adding layers one by one — Layers are appended to the model in order. Each layer specifies its own configuration: the number of units (neurons), the activation function, whether to include a bias term, weight initialization strategy, and regularization. The first layer must additionally specify the input shape so the model knows the dimensionality of incoming data.

Usage

Use the sequential model pattern when:

  • The network has a linear topology — each layer feeds directly into the next with no branching.
  • Building feedforward networks such as MLPs for classification or regression.
  • Prototyping simple architectures before moving to more complex functional or subclassed models.
  • The model has a single input and single output.

Do not use the sequential pattern when:

  • The model requires skip connections (e.g., ResNet).
  • The model has multiple inputs or outputs (e.g., multi-task learning).
  • Layers need to be shared across different parts of the graph.

Theoretical Basis

Sequential Composition

A sequential model with n layers computes a function:

f(x) = f_n(f_{n-1}(...f_2(f_1(x))))

where each f_i is a differentiable layer function. This is a direct application of function composition. The key mathematical property that enables training is that the composition of differentiable functions is itself differentiable, allowing gradients to flow backward through the chain via the chain rule.
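The composition above can be sketched in plain JavaScript: each layer is just a function, and the model is their composition. The two "layer" functions below are illustrative scalar stand-ins for real differentiable layers:

```javascript
// Compose an ordered list of layer functions f_1 ... f_n into
// a single model function f(x) = f_n(...f_2(f_1(x))).
const compose = (layers) => (x) => layers.reduce((h, f) => f(h), x);

// Toy "layers": an affine map followed by a ReLU non-linearity.
const affine = (w, b) => (x) => w * x + b;
const relu = (x) => Math.max(0, x);

const f = compose([affine(2, -3), relu, affine(0.5, 1)]);

console.log(f(4));  // affine: 2*4-3 = 5; relu: 5; affine: 0.5*5+1 = 3.5
console.log(f(0));  // affine: -3; relu: 0; affine: 1
```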

Layer Configuration Parameters

Each layer in a sequential model is configured by several parameters:

  • units: Number of neurons (output dimensionality). Example values: 32, 64, 128, 256, 512.
  • activation: Non-linear function applied to the layer output. Example values: 'relu', 'sigmoid', 'softmax', 'tanh'.
  • inputShape: Shape of the input tensor (first layer only). Example values: [784], [28, 28, 1].
  • useBias: Whether to include a bias vector. Values: true (default), false.
  • kernelInitializer: Strategy for initializing the weight matrix. Example values: 'glorotUniform' (default), 'heNormal'.
  • kernelRegularizer: Penalty applied to weight magnitudes during training. Example values: 'l1', 'l2', 'l1l2'.

Input Shape Inference

Only the first layer in a sequential model requires an explicit inputShape. All subsequent layers infer their input shape automatically from the output shape of the preceding layer. This is possible because the sequential topology guarantees a one-to-one mapping between consecutive layers.
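This inference rule can be sketched as a simple fold over the layer stack: starting from the declared inputShape, each dense layer's output shape is just [units]. The helper below is a hypothetical illustration, not a TensorFlow.js API:

```javascript
// Propagate shapes through a stack of dense layers.
// A dense layer with n units maps an input of shape [m] to an output of shape [n].
function inferShapes(inputShape, layers) {
  const shapes = [inputShape];
  for (const layer of layers) {
    shapes.push([layer.units]);  // output shape depends only on units
  }
  return shapes;
}

// Only the first layer needs an explicit inputShape; the rest are inferred.
const shapes = inferShapes([784], [{ units: 128 }, { units: 64 }, { units: 10 }]);
console.log(shapes);  // [ [784], [128], [64], [10] ]
```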

For a dense (fully connected) layer with m input features and n units:

  • Weight matrix shape: [m, n]
  • Bias vector shape: [n]
  • Output shape: [batchSize, n]
  • Total trainable parameters: m * n + n (weights + biases)
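These shape rules give a direct parameter-count formula, sketched below:

```javascript
// Trainable parameters of a dense layer: an [m, n] weight matrix
// plus, optionally, an [n] bias vector.
function denseParamCount(m, n, useBias = true) {
  return m * n + (useBias ? n : 0);
}

// A dense layer mapping 784 input features to 128 units:
console.log(denseParamCount(784, 128));        // 100480
console.log(denseParamCount(784, 128, false)); // 100352 (weights only)
```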

Activation Functions

Activation functions introduce non-linearity into the network. Without them, a multi-layer network would collapse into a single linear transformation. Common choices:

  • ReLU: max(0, x). Typical use: hidden layers (default choice).
  • Sigmoid: 1 / (1 + e^(-x)). Typical use: binary classification output.
  • Softmax: e^(x_i) / sum_j e^(x_j). Typical use: multi-class classification output.
  • Tanh: (e^x - e^(-x)) / (e^x + e^(-x)). Typical use: hidden layers (zero-centered output).
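The formulas above translate directly into code; a quick numerical sketch in plain JavaScript:

```javascript
const relu = (x) => Math.max(0, x);
const sigmoid = (x) => 1 / (1 + Math.exp(-x));
const tanh = (x) => Math.tanh(x);

// Softmax operates on a vector; subtracting the max before exponentiating
// is a standard numerical-stability trick that does not change the result.
function softmax(xs) {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

console.log(relu(-2), relu(3));  // 0 3
console.log(sigmoid(0));         // 0.5
console.log(tanh(0));            // 0
const p = softmax([1, 2, 3]);
console.log(p.reduce((a, b) => a + b, 0));  // probabilities sum to 1
```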

Architecture Design Heuristics

Common patterns for sequential architectures:

  • Funnel shape: Layers progressively decrease in width (e.g., 784 -> 128 -> 64 -> 10), compressing representations toward the output.
  • Output layer activation: Matches the task — 'softmax' for multi-class classification, 'sigmoid' for binary classification, linear (no activation) for regression.
  • First layer inputShape: Must exactly match the dimensionality of one training sample (excluding the batch dimension).
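As a worked example of the funnel shape, the total parameter count of a 784 -> 128 -> 64 -> 10 dense stack can be computed layer by layer with the m * n + n formula:

```javascript
// Total trainable parameters of a funnel-shaped MLP: 784 -> 128 -> 64 -> 10.
// Each dense layer contributes m * n weights plus n biases.
const widths = [784, 128, 64, 10];

let total = 0;
for (let i = 1; i < widths.length; i++) {
  const [m, n] = [widths[i - 1], widths[i]];
  console.log(`layer ${i}: ${m} -> ${n}, params = ${m * n + n}`);
  total += m * n + n;
}
console.log(`total: ${total}`);  // total: 109386
```

Note how the first layer dominates the count (100480 of 109386 parameters), a common property of funnel-shaped MLPs over flattened image input.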

Related Pages

Implemented By
