Principle: LaurentMazare tch-rs Linear Layer
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Neural_Network_Layers |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Fully-connected layer that applies an affine transformation to input data, mapping from one dimensionality to another.
Description
A linear (fully-connected) layer computes y = xW^T + b, where W is a weight matrix of shape [out_dim, in_dim] and b is an optional bias vector of shape [out_dim]. It is the most fundamental building block in neural networks, used for dimensionality changes, classification heads, and feature projections. By default, the weight is initialized with Kaiming uniform initialization, and the bias is drawn from a uniform distribution with bounds ±1/sqrt(in_dim).
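The affine transform described above can be sketched in plain Rust. This is an illustrative, dependency-free implementation of the math, not the tch-rs code path; `linear_forward` and the toy shapes are assumptions made for the example:

```rust
// Sketch of y = x W^T + b with Vec-based matrices.
// Shapes follow the description: x is [batch, in_dim],
// w is [out_dim, in_dim], b (optional) is [out_dim].
fn linear_forward(x: &[Vec<f32>], w: &[Vec<f32>], b: Option<&[f32]>) -> Vec<Vec<f32>> {
    x.iter()
        .map(|row| {
            w.iter()
                .enumerate()
                .map(|(o, w_row)| {
                    // Dot product of one input row with one weight row,
                    // plus the optional bias term for output unit o.
                    let dot: f32 = row.iter().zip(w_row).map(|(xi, wi)| xi * wi).sum();
                    dot + b.map_or(0.0, |bias| bias[o])
                })
                .collect()
        })
        .collect()
}

fn main() {
    // x: [1, 3], w: [2, 3], b: [2]  ->  y: [1, 2]
    let x = vec![vec![1.0, 2.0, 3.0]];
    let w = vec![vec![1.0, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    let b = vec![0.5, -0.5];
    let y = linear_forward(&x, &w, Some(&b));
    println!("{:?}", y); // [[1.5, 1.5]]
}
```

Note that W is stored row-major as [out_dim, in_dim], so each output unit is a dot product with one row of W; this matches the y = xW^T + b convention used throughout this page.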
Usage
Use this principle whenever you need to transform feature vectors between dimensions: input-to-hidden, hidden-to-hidden, or hidden-to-output projections. Essential for classifier heads, MLP blocks, and any dense connection between layers.
Theoretical Basis
The layer computes the affine transformation:
y = xW^T + b
Where:
- x: Input tensor of shape [batch_size, in_dim]
- W: Weight matrix of shape [out_dim, in_dim]
- b: Optional bias vector of shape [out_dim]
- y: Output tensor of shape [batch_size, out_dim]
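The shape rule implied by the definitions above (in_dim of x must match in_dim of W; the output is [batch_size, out_dim]) can be captured in a small helper. `output_shape` is a hypothetical name introduced for this sketch:

```rust
// Given x of shape (batch_size, in_dim) and w of shape (out_dim, in_dim),
// y = x W^T + b has shape (batch_size, out_dim).
fn output_shape(x_shape: (usize, usize), w_shape: (usize, usize)) -> (usize, usize) {
    // The inner dimensions must agree for the matrix product to be defined.
    assert_eq!(x_shape.1, w_shape.1, "in_dim of x must equal in_dim of W");
    (x_shape.0, w_shape.0)
}

fn main() {
    // A batch of 4 vectors projected from 3 features down to 2.
    let y_shape = output_shape((4, 3), (2, 3));
    println!("{:?}", y_shape); // (4, 2)
}
```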
Default initialization:
- Weight: Kaiming uniform — U(-√(1/in_dim), √(1/in_dim))
- Bias: Uniform — U(-√(1/in_dim), √(1/in_dim))
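Both default distributions share the same bound, 1/√(in_dim), so it only depends on the fan-in. A minimal sketch of computing that bound (`init_bound` is an illustrative helper, not part of the tch-rs API):

```rust
// Bound of the default initialization interval U(-b, b) where
// b = 1 / sqrt(in_dim), applied to both weight and bias.
fn init_bound(in_dim: usize) -> f64 {
    1.0 / (in_dim as f64).sqrt()
}

fn main() {
    // For in_dim = 256 the bound is 1/16.
    let b = init_bound(256);
    println!("U(-{b}, {b})"); // U(-0.0625, 0.0625)
}
```

Scaling the interval by fan-in keeps the variance of each output unit roughly constant regardless of input width, which is the motivation behind Kaiming-style initialization.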