Principle:Fastai Fastbook Tensor Fundamentals

Knowledge Sources: Deep Learning for Coders with fastai and PyTorch; Numerical Python (NumPy) Documentation
Domains: Deep Learning, Data Representation, Computer Vision
Last Updated: 2026-02-09 17:00 GMT

Overview

A tensor is a multi-dimensional array of numerical values that serves as the fundamental data structure for all computation in deep learning, generalizing scalars (rank 0), vectors (rank 1), and matrices (rank 2) to any number of dimensions.

Description

Before a neural network can process any real-world data, that data must be converted into tensors. For images, each pixel intensity becomes a numerical value inside a tensor. The critical insight is that all deep learning operations, from simple addition to gradient computation, operate on tensors. Understanding tensor rank, shape, and element type is therefore a prerequisite for every subsequent step in building a neural network.

A rank-3 tensor of shape (N, H, W) is the natural representation for a dataset of N grayscale images, each of height H and width W. Stacking individual image tensors along a new first axis produces this rank-3 tensor and enables vectorized operations across the entire dataset simultaneously.
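As an illustration of the vectorized operations mentioned above, the following minimal sketch uses NumPy as a stand-in for PyTorch tensors; the random pixel data and the sizes are illustrative assumptions. A single reduction over the height and width axes computes the average intensity of every image in the rank-3 tensor at once, with no Python loop over samples.

```python
import numpy as np

# Illustrative data: N = 4 grayscale "images" of 28x28 random pixel values.
rng = np.random.default_rng(0)
images = rng.random((4, 28, 28))  # rank-3 tensor, shape (N, H, W)

# Average pixel intensity of each image, computed for all N images in one call
# by reducing over the height and width axes (axes 1 and 2).
per_image_mean = images.mean(axis=(1, 2))

print(per_image_mean.shape)  # (4,)
```

The result keeps only the first axis, so there is one value per image.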

Usage

Use tensor representation whenever you need to:

Convert raw data (images, text, tabular records) into a form suitable for mathematical operations.
Aggregate multiple data samples into a single batch for efficient parallel computation.
Compute summary statistics (mean, standard deviation) across a specific axis of the data.
Normalize values into a standard range (e.g., dividing pixel values by 255 to obtain floats in [0, 1]).

Theoretical Basis

Tensor Rank and Shape

The rank (also called ndim) of a tensor is the number of axes it possesses:

Rank 0 (scalar): A single number, e.g., 42.0
Rank 1 (vector): A one-dimensional sequence, shape (n,)
Rank 2 (matrix): A two-dimensional grid, shape (rows, cols)
Rank 3: A three-dimensional block, shape (depth, rows, cols)

The shape is a tuple giving the size along each axis. For example, a tensor of shape (6131, 28, 28) contains 6,131 images, each 28 pixels high and 28 pixels wide.
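The ranks and shapes listed above can be inspected directly. This sketch uses NumPy arrays as an assumed stand-in; PyTorch tensors expose the same `.ndim` and `.shape` attributes. The rank-3 example reuses the (6131, 28, 28) shape from the text, filled with zeros stored as 8-bit integers purely for illustration.

```python
import numpy as np

scalar = np.array(42.0)                           # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])                # rank 1, shape (3,)
matrix = np.zeros((2, 3))                         # rank 2, shape (2, 3)
block = np.zeros((6131, 28, 28), dtype=np.uint8)  # rank 3, shape (6131, 28, 28)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("block", block)]:
    # The rank (ndim) always equals the length of the shape tuple.
    print(name, t.ndim, t.shape)
```

Running this prints `scalar 0 ()`, `vector 1 (3,)`, `matrix 2 (2, 3)` and `block 3 (6131, 28, 28)`.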

Stacking

Given a list of N tensors, each of shape (H, W), the stacking operation produces a single tensor of shape (N, H, W):

stack([t_1, t_2, ..., t_N]) -> T  where T.shape = (N, H, W)
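A minimal sketch of the stacking operation, using NumPy's `np.stack` (PyTorch's counterpart is `torch.stack`, which likewise stacks along a new first dimension by default). The list of constant-valued 28x28 images is a hypothetical example.

```python
import numpy as np

H, W = 28, 28
# Hypothetical list of N = 3 individual image tensors, each of shape (H, W);
# image k is filled with the value k so the images are easy to tell apart.
image_list = [np.full((H, W), fill_value=k, dtype=np.uint8) for k in range(3)]

# np.stack inserts a new first axis (axis=0) and places the images along it.
T = np.stack(image_list)

print(T.shape)  # (3, 28, 28)

# Indexing along the new axis recovers the original images.
assert np.array_equal(T[1], image_list[1])
```

Stacking requires every input to have the same shape; the element type (`uint8` here) is preserved.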

Mean Along an Axis

Computing the mean along axis 0 of a rank-3 tensor T of shape (N, H, W) collapses the first dimension and produces a rank-2 tensor of shape (H, W):

mean_image[i, j] = (1/N) * sum(T[k, i, j] for k in 0..N-1)

This yields the "average" or "ideal" image of the class: each pixel holds the mean intensity at that position across all N samples.
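The formula above can be checked on a tiny tensor. This sketch uses NumPy with an illustrative 2x2x2 tensor (two 2x2 "images"); in PyTorch the equivalent reduction is `T.mean(0)` on a floating-point tensor.

```python
import numpy as np

# Illustrative rank-3 tensor: N = 2 images, each 2x2 pixels.
T = np.array([
    [[0.0, 1.0],
     [2.0, 3.0]],
    [[4.0, 5.0],
     [6.0, 7.0]],
])

mean_image = T.mean(axis=0)   # collapses axis 0: shape (2, 2, 2) -> (2, 2)
print(mean_image)
# [[2. 3.]
#  [4. 5.]]

# Same result from the formula mean_image[i, j] = (1/N) * sum_k T[k, i, j].
N = T.shape[0]
manual = sum(T[k] for k in range(N)) / N
assert np.allclose(mean_image, manual)
```

Each output pixel is the average of the two pixels at the same position, for example (0 + 4) / 2 = 2 in the top-left corner.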

Type Conversion and Normalization

Raw image pixels are typically stored as unsigned 8-bit integers (0 to 255). For gradient-based learning they are converted to floating-point values, since gradients are only defined for floating-point tensors, and scaled into the range [0, 1]:

T_float = T.float() / 255.0   (PyTorch: .float() converts to 32-bit floating point)

This keeps all inputs on a small, consistent scale, which helps numerical stability during training.
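A minimal sketch of the conversion in NumPy, assuming a hypothetical 2x2 array of raw 8-bit pixels. The explicit cast to `float32` mirrors PyTorch's `.float()`; without it, NumPy's true division would produce 64-bit floats instead.

```python
import numpy as np

# Hypothetical raw pixels as stored in an 8-bit grayscale image.
raw = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# Convert to 32-bit floats first, then scale into [0, 1].
scaled = raw.astype(np.float32) / 255.0

print(scaled.dtype)                  # float32
print(scaled.min(), scaled.max())    # 0.0 1.0
```

The darkest pixel (0) maps to 0.0 and the brightest (255) maps to 1.0; intermediate values are scaled proportionally.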

Related Pages

Implemented By

Implementation:Fastai_Fastbook_Tensor_Data_Pipeline
