Principle: LaurentMazare tch-rs Linear Layer
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Neural_Network_Layers |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Fully-connected layer that applies an affine transformation to input data, mapping from one dimensionality to another.
Description
A linear (fully-connected) layer computes y = xW^T + b, where W is a weight matrix of shape [out_dim, in_dim] and b is an optional bias vector of shape [out_dim]. It is the most fundamental building block in neural networks, used for dimensionality changes, classification heads, and feature projections. By default, the weight is initialized with Kaiming uniform initialization, and the bias is drawn from a uniform distribution with bounds ±1/sqrt(in_dim).
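The affine transform described above can be sketched in plain Rust. This is an illustrative, dependency-free implementation of the math, not the tch-rs code path; `linear_forward` and the toy shapes are assumptions made for the example:

```rust
// Sketch of y = x W^T + b with Vec-based matrices.
// Shapes follow the description: x is [batch, in_dim],
// w is [out_dim, in_dim], b (optional) is [out_dim].
fn linear_forward(x: &[Vec<f32>], w: &[Vec<f32>], b: Option<&[f32]>) -> Vec<Vec<f32>> {
    x.iter()
        .map(|row| {
            w.iter()
                .enumerate()
                .map(|(o, w_row)| {
                    // Dot product of one input row with one weight row,
                    // plus the optional bias term for output unit o.
                    let dot: f32 = row.iter().zip(w_row).map(|(xi, wi)| xi * wi).sum();
                    dot + b.map_or(0.0, |bias| bias[o])
                })
                .collect()
        })
        .collect()
}

fn main() {
    // x: [1, 3], w: [2, 3], b: [2]  ->  y: [1, 2]
    let x = vec![vec![1.0, 2.0, 3.0]];
    let w = vec![vec![1.0, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    let b = vec![0.5, -0.5];
    let y = linear_forward(&x, &w, Some(&b));
    println!("{:?}", y); // [[1.5, 1.5]]
}
```

Note that W is stored row-major as [out_dim, in_dim], so each output unit is a dot product with one row of W; this matches the y = xW^T + b convention used throughout this page.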
Usage
Use this principle whenever you need to transform feature vectors between dimensions: input-to-hidden, hidden-to-hidden, or hidden-to-output projections. Essential for classifier heads, MLP blocks, and any dense connection between layers.
Theoretical Basis
The layer computes the affine transformation:
y = xW^T + b
Where:
- x: Input tensor of shape [batch_size, in_dim]
- W: Weight matrix of shape [out_dim, in_dim]
- b: Optional bias vector of shape [out_dim]
- y: Output tensor of shape [batch_size, out_dim]
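The shape rule implied by the definitions above (in_dim of x must match in_dim of W; the output is [batch_size, out_dim]) can be captured in a small helper. `output_shape` is a hypothetical name introduced for this sketch:

```rust
// Given x of shape (batch_size, in_dim) and w of shape (out_dim, in_dim),
// y = x W^T + b has shape (batch_size, out_dim).
fn output_shape(x_shape: (usize, usize), w_shape: (usize, usize)) -> (usize, usize) {
    // The inner dimensions must agree for the matrix product to be defined.
    assert_eq!(x_shape.1, w_shape.1, "in_dim of x must equal in_dim of W");
    (x_shape.0, w_shape.0)
}

fn main() {
    // A batch of 4 vectors projected from 3 features down to 2.
    let y_shape = output_shape((4, 3), (2, 3));
    println!("{:?}", y_shape); // (4, 2)
}
```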
Default initialization:
- Weight: Kaiming uniform — U(-√(1/in_dim), √(1/in_dim))
- Bias: Uniform — U(-√(1/in_dim), √(1/in_dim))
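Both default distributions share the same bound, 1/√(in_dim), so it only depends on the fan-in. A minimal sketch of computing that bound (`init_bound` is an illustrative helper, not part of the tch-rs API):

```rust
// Bound of the default initialization interval U(-b, b) where
// b = 1 / sqrt(in_dim), applied to both weight and bias.
fn init_bound(in_dim: usize) -> f64 {
    1.0 / (in_dim as f64).sqrt()
}

fn main() {
    // For in_dim = 256 the bound is 1/16.
    let b = init_bound(256);
    println!("U(-{b}, {b})"); // U(-0.0625, 0.0625)
}
```

Scaling the interval by fan-in keeps the variance of each output unit roughly constant regardless of input width, which is the motivation behind Kaiming-style initialization.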