Implementation:Fastai Fastbook Tabular Learner
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Tabular Data |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tool for training deep learning tabular models with entity embeddings, provided by fastai. It creates a neural network that automatically handles categorical variables via embedding layers and continuous variables via normalization.
Description
tabular_learner is a fastai factory function that creates a Learner wrapping a TabularModel neural network. The network architecture is determined by the TabularPandas data object: each categorical column gets an embedding layer whose dimension is automatically computed based on cardinality, and continuous columns are concatenated after normalization. The resulting input vector is passed through user-specified fully connected layers (default: [200, 100]) with batch normalization and dropout.
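The automatic embedding-dimension computation follows fastai's `emb_sz_rule` heuristic: roughly 1.6 times the cardinality raised to the 0.56 power, capped at 600. A minimal sketch of that rule (the cardinalities below are illustrative):

```python
# Sketch of fastai's default embedding-size heuristic (emb_sz_rule in
# fastai.tabular.model): ~1.6 * cardinality^0.56, capped at 600.
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

print(emb_sz_rule(74))      # -> 18, e.g. a YearMade column with 74 categories
print(emb_sz_rule(2))       # -> 2, low-cardinality columns get tiny embeddings
print(emb_sz_rule(100000))  # very high cardinality is capped at 600
```

This matches the `YearMade: torch.Size([74, 18])` shape shown in the embedding-inspection example below: 74 categories map to an 18-dimensional embedding.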
In the fastbook chapter, tabular_learner is configured with layers=[500, 250] for the larger Bulldozers dataset, y_range=(8, 12) to constrain log-price predictions, n_out=1 for single-value regression, and loss_func=F.mse_loss to match the competition metric.
Usage
Use tabular_learner after creating a TabularPandas object with Normalize in the processor list and converting it to DataLoaders. The typical workflow is: create DataLoaders, build the learner, find the learning rate, and train with fit_one_cycle.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/09_tabular.md (Lines 1327-1352)
- Library source: fastai.tabular.learner
Signature
# Create DataLoaders from TabularPandas
dls = to_nn.dataloaders(bs=1024)
# Create the tabular learner
tabular_learner(
    dls,                      # DataLoaders object
    y_range=(8, 12),          # Output range for sigmoid clamping
    layers=[500, 250],        # Hidden layer sizes
    n_out=1,                  # Number of output values
    loss_func=F.mse_loss,     # Loss function
    # Additional optional parameters:
    # emb_szs=None,           # Override embedding sizes (dict or list of tuples)
    # config=None,            # TabularModel configuration
    # ps=None,                # Dropout probabilities for each hidden layer
    # embed_p=0.0,            # Embedding dropout
    # metrics=None,           # Metrics to track during training
)
Import
from fastai.tabular.all import *
import torch.nn.functional as F
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dls | DataLoaders | Yes | Created from TabularPandas.dataloaders(bs). Contains training and validation DataLoaders with properly preprocessed tabular batches. |
| layers | list of int | No | Sizes of hidden fully connected layers. Default [200, 100]; the fastbook chapter uses [500, 250] for the larger Bulldozers dataset. |
| y_range | tuple of (float, float) | No | Min and max values for the output sigmoid. Should slightly exceed the actual target range, e.g. (8, 12) for log prices in [8.5, 11.9]. |
| n_out | int | No | Number of output values. Default 1 for regression. |
| loss_func | callable | No | Loss function. Default depends on the task; set to F.mse_loss for MSE regression. |
| emb_szs | dict or list | No | Override automatic embedding dimension computation. A dict maps column names to sizes. |
| ps | list of float | No | Dropout probabilities for each hidden layer. Defaults to fastai heuristics. |
| embed_p | float | No | Dropout applied to the concatenated embedding vector. Default 0.0. |
| metrics | list | No | Metrics tracked during training (e.g., [rmse]). |
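When `emb_szs` is given as a dict, fastai applies the override for matching column names and falls back to the heuristic for the rest (this is what `get_emb_sz` does internally). A sketch of that merge logic; the column names and cardinalities here are illustrative:

```python
# Sketch of how a per-column emb_szs dict merges with the default heuristic
# (mirrors fastai's get_emb_sz behaviour; names/cardinalities are made up).
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

def resolve_emb_szs(cardinalities: dict, overrides: dict) -> dict:
    # Use the override where present, otherwise fall back to the heuristic.
    return {name: overrides.get(name, emb_sz_rule(n))
            for name, n in cardinalities.items()}

cards = {"YearMade": 74, "ProductSize": 7}
print(resolve_emb_szs(cards, {"ProductSize": 4}))
# YearMade keeps the heuristic size (18); ProductSize uses the override (4)
```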
Outputs
| Name | Type | Description |
|---|---|---|
| learn | Learner | A fastai Learner wrapping a TabularModel. Supports .lr_find(), .fit_one_cycle(), .get_preds(), .save(), .load(). |
| learn.model | TabularModel | The underlying PyTorch module containing embedding layers, batch normalization, linear layers, and the output head. |
| preds, targs (from get_preds) | tuple of Tensors | Predictions and targets on the validation set, used for RMSE computation. |
Usage Examples
Basic Usage
from fastai.tabular.all import *
import torch.nn.functional as F
import math
# Prerequisites: df_nn_final is prepared, splits are defined
dep_var = 'SalePrice'
procs_nn = [Categorify, FillMissing, Normalize]
cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=9000, dep_var=dep_var)
# Ensure saleElapsed is treated as continuous for extrapolation
if 'saleElapsed' in cat_nn:
cont_nn.append('saleElapsed')
cat_nn.remove('saleElapsed')
# Create TabularPandas with Normalize
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
splits=splits, y_names=dep_var)
# Create DataLoaders with large batch size (tabular data uses less GPU memory)
dls = to_nn.dataloaders(1024)
# Build the tabular learner
learn = tabular_learner(dls, y_range=(8, 12), layers=[500, 250],
n_out=1, loss_func=F.mse_loss)
# Find optimal learning rate
learn.lr_find()
# Train for 5 epochs with 1-cycle policy
learn.fit_one_cycle(5, 1e-2)
# Evaluate on validation set
preds, targs = learn.get_preds()
rmse = round(math.sqrt(((preds - targs)**2).mean().item()), 4)
print(f"Validation RMSE: {rmse}") # ~0.2258
# Save the model for later use or ensembling
learn.save('nn')
Inspecting Embeddings
# Access learned embedding matrices: TabularModel stores them in .embeds,
# a ModuleList ordered to match the categorical column names
for name, emb in zip(dls.cat_names, learn.model.embeds):
    print(f"{name}: {emb.weight.shape}")
# e.g., "YearMade: torch.Size([74, 18])"
# 74 categories mapped to 18-dimensional embedding vectors
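The saved model is later combined with a random forest in the fastbook chapter by averaging the two models' validation predictions. A minimal sketch of that ensemble step; the prediction values below are illustrative stand-ins for the NN's and forest's outputs:

```python
# Simple ensemble by averaging two models' predictions, as done in the
# fastbook chapter with the neural net and a random forest. The lists are
# illustrative stand-ins for the two models' validation predictions.
nn_preds = [10.1, 9.8, 11.2]
rf_preds = [10.3, 9.6, 11.0]

ens_preds = [round((a + b) / 2, 4) for a, b in zip(nn_preds, rf_preds)]
print(ens_preds)  # -> [10.2, 9.7, 11.1]
```

Because the two model families make different kinds of errors, the averaged predictions typically score better than either model alone.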
Related Pages
Implements Principle
Requires Environment
- Environment:Fastai_Fastbook_Python_FastAI_Environment
- Environment:Fastai_Fastbook_CUDA_GPU_Environment