Principle: TensorFlow.js Define Model Architecture
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Neural_Networks |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Defining model architecture is the process of specifying the structure of a neural network by composing differentiable layers in a linear (sequential) topology, where each layer has exactly one input tensor and one output tensor.
Description
Before a neural network can be trained or used for inference, its architecture must be defined. Architecture definition answers the fundamental questions: how many layers does the network have, what type is each layer, how many parameters does each layer contain, and how do tensors flow from input to output?
The sequential model pattern is the simplest and most common form of neural network construction. In a sequential model, layers are stacked linearly: layer i's output becomes layer i+1's input. There are no branches, skip connections, or multiple inputs/outputs. This topology covers a wide range of practical architectures including multilayer perceptrons (MLPs), simple convolutional networks, and autoencoders.
The sequential pattern involves two core operations:
- Creating an empty model container — An empty sequential model is instantiated as a container that will hold an ordered list of layers.
- Adding layers one by one — Layers are appended to the model in order. Each layer specifies its own configuration: the number of units (neurons), the activation function, whether to include a bias term, weight initialization strategy, and regularization. The first layer must additionally specify the input shape so the model knows the dimensionality of incoming data.
Usage
Use the sequential model pattern when:
- The network has a linear topology — each layer feeds directly into the next with no branching.
- Building feedforward networks such as MLPs for classification or regression.
- Prototyping simple architectures before moving to more complex functional or subclassed models.
- The model has a single input and single output.
Do not use the sequential pattern when:
- The model requires skip connections (e.g., ResNet).
- The model has multiple inputs or outputs (e.g., multi-task learning).
- Layers need to be shared across different parts of the graph.
Theoretical Basis
Sequential Composition
A sequential model with n layers computes a function:
f(x) = f_n(f_{n-1}(...f_2(f_1(x))))
where each f_i is a differentiable layer function. This is a direct application of function composition. The key mathematical property that enables training is that the composition of differentiable functions is itself differentiable, allowing gradients to flow backward through the chain via the chain rule.
Layer Configuration Parameters
Each layer in a sequential model is configured by several parameters:
| Parameter | Description | Example Values |
|---|---|---|
| units | Number of neurons (output dimensionality) | 32, 64, 128, 256, 512 |
| activation | Non-linear function applied to layer output | 'relu', 'sigmoid', 'softmax', 'tanh' |
| inputShape | Shape of input tensor (first layer only) | [784], [28, 28, 1] |
| useBias | Whether to include a bias vector | true (default), false |
| kernelInitializer | Strategy for initializing weight matrices | 'glorotUniform' (default), 'heNormal' |
| kernelRegularizer | Penalty applied to weight magnitudes during training | 'l1', 'l2', 'l1l2' |
Input Shape Inference
Only the first layer in a sequential model requires an explicit inputShape. All subsequent layers infer their input shape automatically from the output shape of the preceding layer. This is possible because the sequential topology guarantees a one-to-one mapping between consecutive layers.
For a dense (fully connected) layer with m input features and n units:
- Weight matrix shape: [m, n]
- Bias vector shape: [n]
- Output shape: [batchSize, n]
- Total trainable parameters: m * n + n (weights + biases)
Activation Functions
Activation functions introduce non-linearity into the network. Without them, a multi-layer network would collapse into a single linear transformation. Common choices:
| Activation | Formula | Typical Use |
|---|---|---|
| ReLU | max(0, x) | Hidden layers (default choice) |
| Sigmoid | 1 / (1 + e^(-x)) | Binary classification output |
| Softmax | e^(x_i) / sum(e^(x_j)) | Multi-class classification output |
| Tanh | (e^x - e^(-x)) / (e^x + e^(-x)) | Hidden layers (centered output) |
Architecture Design Heuristics
Common patterns for sequential architectures:
- Funnel shape: Layers progressively decrease in width (e.g., 784 -> 128 -> 64 -> 10), compressing representations toward the output.
- Output layer activation: Matches the task — 'softmax' for multi-class classification, 'sigmoid' for binary classification, linear (no activation) for regression.
- First layer inputShape: Must exactly match the dimensionality of one training sample (excluding the batch dimension).