Principle:Fastai Fastbook Tensor Fundamentals
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Data Representation, Computer Vision |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
A tensor is a multi-dimensional array of numerical values that serves as the fundamental data structure for all computation in deep learning, generalizing scalars (rank 0), vectors (rank 1), and matrices (rank 2) to arbitrary dimensions.
Description
Before a neural network can process any real-world data, that data must be converted into tensors. In the case of images, each pixel intensity becomes a numerical value inside a tensor. The critical insight is that all deep learning operations, from simple addition to complex gradient computation, operate on tensors. Understanding tensor rank, shape, and element types is therefore a prerequisite for every subsequent step in building a neural network.
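The following is a minimal sketch of that conversion. It assumes PyTorch as the tensor library and uses a tiny hand-written 3x3 grayscale "image" whose pixel values are made up for illustration. Each intensity becomes one element of an unsigned 8-bit integer tensor:

```python
import torch

# A hypothetical 3x3 grayscale image: each entry is a pixel intensity in 0..255
pixels = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 16],
]

# Each intensity becomes one element of a rank-2 tensor of unsigned 8-bit integers
img = torch.tensor(pixels, dtype=torch.uint8)

print(img.shape)   # torch.Size([3, 3])
print(img.dtype)   # torch.uint8
print(img[0, 2])   # the pixel in row 0, column 2 (value 255)
```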
A rank-3 tensor of shape (N, H, W) is the natural representation for a dataset of N grayscale images, each of height H and width W. Stacking individual image tensors along a new first axis produces this rank-3 tensor and enables vectorized operations across the entire dataset simultaneously.
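The sketch below shows what "vectorized across the entire dataset" means in practice. It assumes PyTorch and uses randomly generated 28x28 images as a stand-in for a real dataset. After stacking, a single call computes every image's mean brightness, and the result matches an explicit per-image loop:

```python
import torch

torch.manual_seed(0)  # reproducible fake data

N, H, W = 10, 28, 28
# Hypothetical stand-in for N loaded grayscale images, each of shape (H, W)
images = [torch.randint(0, 256, (H, W), dtype=torch.uint8) for _ in range(N)]

# Stack along a new first axis: shape (N, H, W)
dataset = torch.stack(images)

# One vectorized call averages over the height and width axes of every image at once
brightness = dataset.float().mean(dim=(1, 2))   # shape (N,)

# Same result computed with an explicit Python loop, for comparison
looped = torch.stack([img.float().mean() for img in images])

print(dataset.shape)                        # torch.Size([10, 28, 28])
print(brightness.shape)                     # torch.Size([10])
print(torch.allclose(brightness, looped))   # True
```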
Usage
Use tensor representation whenever you need to:
- Convert raw data (images, text, tabular records) into a form suitable for mathematical operations.
- Aggregate multiple data samples into a single batch for efficient parallel computation.
- Compute summary statistics (mean, standard deviation) across a specific axis of the data.
- Normalize values into a standard range (e.g., dividing pixel values by 255 to obtain floats in [0, 1]).
Theoretical Basis
Tensor Rank and Shape
The rank of a tensor (exposed as the ndim attribute in PyTorch and NumPy) is the number of axes it possesses, which is also the length of its shape:
- Rank 0 (scalar): A single number, e.g., 42.0
- Rank 1 (vector): A one-dimensional sequence, shape (n,)
- Rank 2 (matrix): A two-dimensional grid, shape (rows, cols)
- Rank 3: A three-dimensional block, shape (depth, rows, cols)
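A quick way to confirm these ranks, assuming PyTorch, is to inspect the ndim attribute, which always equals the length of the shape tuple:

```python
import torch

scalar = torch.tensor(42.0)               # rank 0
vector = torch.tensor([1.0, 2.0, 3.0])    # rank 1, shape (3,)
matrix = torch.zeros(2, 3)                # rank 2, shape (rows, cols)
block = torch.zeros(4, 2, 3)              # rank 3, shape (depth, rows, cols)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("block", block)]:
    # ndim is the rank; it always equals the number of entries in the shape
    assert t.ndim == len(t.shape)
    print(name, t.ndim, tuple(t.shape))
```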
The shape is a tuple giving the size along each axis. For example, a tensor of shape (6131, 28, 28) contains 6,131 images, each 28 pixels high and 28 pixels wide.
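To make that shape concrete, the sketch below builds a zero-filled placeholder with the same shape (PyTorch assumed; no real image data is loaded). Indexing the first axis selects one 28x28 image, and the total element count is the product of the shape entries:

```python
import torch

# Placeholder with the shape quoted above: 6,131 images of 28x28 pixels
T = torch.zeros(6131, 28, 28, dtype=torch.uint8)

first_image = T[0]            # indexing axis 0 selects a single image
print(first_image.shape)      # torch.Size([28, 28])
print(T.shape[0])             # 6131 images
print(T.numel())              # 4806704, i.e. 6131 * 28 * 28
```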
Stacking
Given a list of N tensors, each of shape (H, W), the stacking operation produces a single tensor of shape (N, H, W):
stack([t_1, t_2, ..., t_N]) -> T where T.shape = (N, H, W)
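In PyTorch, assumed here as the backing library, this operation is torch.stack. The sketch uses three made-up 2x2 tensors. Stacking inserts a new first axis, so T[k] recovers the k-th input; by contrast, torch.cat joins along an existing axis and produces no new dimension:

```python
import torch

# Three hypothetical 2x2 "images" (H = W = 2); stack requires identical shapes
t1 = torch.full((2, 2), 1.0)
t2 = torch.full((2, 2), 2.0)
t3 = torch.full((2, 2), 3.0)

T = torch.stack([t1, t2, t3])     # new first axis: shape (3, 2, 2)
print(T.shape)                    # torch.Size([3, 2, 2])
print(torch.equal(T[1], t2))      # True: T[k] recovers the k-th input

C = torch.cat([t1, t2, t3])       # joins along existing axis 0: shape (6, 2)
print(C.shape)                    # torch.Size([6, 2])
```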
Mean Along an Axis
Computing the mean along axis 0 of a rank-3 tensor T of shape (N, H, W) collapses the first dimension and produces a rank-2 tensor of shape (H, W):
mean_image[i, j] = (1/N) * sum(T[k, i, j] for k in 0..N-1)
This yields the "average" or "ideal" image, where each pixel represents the central tendency of that position across all samples in the class.
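As a minimal check of the formula, assuming PyTorch and using a small random tensor as a hypothetical stand-in for one class of images, T.mean(0) should agree with the explicit sum over k divided by N:

```python
import torch

torch.manual_seed(0)
N, H, W = 5, 4, 4
# Hypothetical stand-in for N normalized images of one class
T = torch.rand(N, H, W)

mean_image = T.mean(0)            # collapses axis 0: shape (H, W)

# Explicit version of mean_image[i, j] = (1/N) * sum(T[k, i, j] for k in 0..N-1)
manual = torch.zeros(H, W)
for i in range(H):
    for j in range(W):
        manual[i, j] = sum(T[k, i, j] for k in range(N)) / N

print(mean_image.shape)                    # torch.Size([4, 4])
print(torch.allclose(mean_image, manual))  # True
```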
Type Conversion and Normalization
Raw image pixels are typically stored as unsigned 8-bit integers (0 to 255). For gradient-based learning, they must be converted to floating-point values and normalized to the range [0, 1]:
T_float = T.float() / 255.0
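Floating-point values are needed for operations such as taking a mean, and scaling into [0, 1] keeps input magnitudes small and on a consistent scale across features, which helps numerical stability.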
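A short sketch of the conversion, assuming PyTorch, where .float() casts a tensor to 32-bit floating point. The pixel values below are hypothetical:

```python
import torch

# Hypothetical raw pixels stored as unsigned 8-bit integers (0 to 255)
T = torch.tensor([[0, 51, 255], [102, 204, 153]], dtype=torch.uint8)

# Cast to 32-bit floats, then scale into [0, 1]
T_float = T.float() / 255

print(T_float.dtype)                               # torch.float32
print(T_float.min().item(), T_float.max().item())  # 0.0 1.0

# Averaging works on the float tensor; in PyTorch, mean() on a uint8 tensor raises an error
print(T_float.mean().item())
```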