Implementation:Fastai Fastbook Tabular Learner
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Tabular Data |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tool for training deep learning tabular models with entity embeddings, provided by fastai. It creates a neural network that automatically handles categorical variables via embedding layers and continuous variables via normalization.
Description
tabular_learner is a fastai factory function that creates a Learner wrapping a TabularModel neural network. The network architecture is determined by the TabularPandas data object: each categorical column gets an embedding layer whose dimension is automatically computed based on cardinality, and continuous columns are concatenated after normalization. The resulting input vector is passed through user-specified fully connected layers (default: [200, 100]) with batch normalization and dropout.
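The automatic embedding-dimension computation follows fastai's `emb_sz_rule` heuristic: roughly 1.6 times the cardinality raised to the 0.56 power, capped at 600. A minimal sketch of that rule (the cardinalities below are illustrative):

```python
# Sketch of fastai's default embedding-size heuristic (emb_sz_rule in
# fastai.tabular.model): ~1.6 * cardinality^0.56, capped at 600.
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

print(emb_sz_rule(74))      # -> 18, e.g. a YearMade column with 74 categories
print(emb_sz_rule(2))       # -> 2, low-cardinality columns get tiny embeddings
print(emb_sz_rule(100000))  # very high cardinality is capped at 600
```

This matches the `YearMade: torch.Size([74, 18])` shape shown in the embedding-inspection example below: 74 categories map to an 18-dimensional embedding.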
In the fastbook chapter, tabular_learner is configured with layers=[500, 250] for the larger Bulldozers dataset, y_range=(8, 12) to constrain log-price predictions, n_out=1 for single-value regression, and loss_func=F.mse_loss to match the competition metric.
Usage
Use tabular_learner after creating a TabularPandas object with Normalize in the processor list and converting it to DataLoaders. The typical workflow is: create DataLoaders, build the learner, find the learning rate, and train with fit_one_cycle.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/09_tabular.md (Lines 1327-1352)
- Library source: fastai.tabular.learner
Signature
# Create DataLoaders from TabularPandas
dls = to_nn.dataloaders(bs=1024)
# Create the tabular learner
tabular_learner(
    dls,                      # DataLoaders object
    y_range=(8, 12),          # Output range for sigmoid clamping
    layers=[500, 250],        # Hidden layer sizes
    n_out=1,                  # Number of output values
    loss_func=F.mse_loss,     # Loss function
    # Additional optional parameters:
    # emb_szs=None,           # Override embedding sizes (dict or list of tuples)
    # config=None,            # TabularModel configuration
    # ps=None,                # Dropout probabilities for each hidden layer
    # embed_p=0.0,            # Embedding dropout
    # metrics=None,           # Metrics to track during training
)
Import
from fastai.tabular.all import *
import torch.nn.functional as F
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dls | DataLoaders | Yes | Created from TabularPandas.dataloaders(bs). Contains training and validation DataLoaders with properly preprocessed tabular batches. |
| layers | list of int | No | Sizes of hidden fully connected layers. Default [200, 100]; the fastbook chapter uses [500, 250] for the larger Bulldozers dataset. |
| y_range | tuple of (float, float) | No | Min and max values for the output sigmoid. Should slightly exceed the actual target range, e.g. (8, 12) for log prices in [8.5, 11.9]. |
| n_out | int | No | Number of output values. Default 1 for regression. |
| loss_func | callable | No | Loss function. Default depends on the task; set to F.mse_loss for MSE regression. |
| emb_szs | dict or list | No | Override automatic embedding dimension computation. A dict maps column names to sizes. |
| ps | list of float | No | Dropout probabilities for each hidden layer. Defaults to fastai heuristics. |
| embed_p | float | No | Dropout applied to the concatenated embedding vector. Default 0.0. |
| metrics | list | No | Metrics tracked during training (e.g., [rmse]). |
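When `emb_szs` is given as a dict, fastai applies the override for matching column names and falls back to the heuristic for the rest (this is what `get_emb_sz` does internally). A sketch of that merge logic; the column names and cardinalities here are illustrative:

```python
# Sketch of how a per-column emb_szs dict merges with the default heuristic
# (mirrors fastai's get_emb_sz behaviour; names/cardinalities are made up).
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

def resolve_emb_szs(cardinalities: dict, overrides: dict) -> dict:
    # Use the override where present, otherwise fall back to the heuristic.
    return {name: overrides.get(name, emb_sz_rule(n))
            for name, n in cardinalities.items()}

cards = {"YearMade": 74, "ProductSize": 7}
print(resolve_emb_szs(cards, {"ProductSize": 4}))
# YearMade keeps the heuristic size (18); ProductSize uses the override (4)
```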
Outputs
| Name | Type | Description |
|---|---|---|
| learn | Learner | A fastai Learner wrapping a TabularModel. Supports .lr_find(), .fit_one_cycle(), .get_preds(), .save(), .load(). |
| learn.model | TabularModel | The underlying PyTorch module containing embedding layers, batch normalization, linear layers, and the output head. |
| preds, targs (from get_preds) | tuple of Tensors | Predictions and targets on the validation set, used for RMSE computation. |
Usage Examples
Basic Usage
from fastai.tabular.all import *
import torch.nn.functional as F
import math
# Prerequisites: df_nn_final is prepared, splits are defined
dep_var = 'SalePrice'
procs_nn = [Categorify, FillMissing, Normalize]
cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=9000, dep_var=dep_var)
# Ensure saleElapsed is treated as continuous for extrapolation
if 'saleElapsed' in cat_nn:
cont_nn.append('saleElapsed')
cat_nn.remove('saleElapsed')
# Create TabularPandas with Normalize
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
splits=splits, y_names=dep_var)
# Create DataLoaders with large batch size (tabular data uses less GPU memory)
dls = to_nn.dataloaders(1024)
# Build the tabular learner
learn = tabular_learner(dls, y_range=(8, 12), layers=[500, 250],
n_out=1, loss_func=F.mse_loss)
# Find optimal learning rate
learn.lr_find()
# Train for 5 epochs with 1-cycle policy
learn.fit_one_cycle(5, 1e-2)
# Evaluate on validation set
preds, targs = learn.get_preds()
rmse = round(math.sqrt(((preds - targs)**2).mean().item()), 4)
print(f"Validation RMSE: {rmse}") # ~0.2258
# Save the model for later use or ensembling
learn.save('nn')
Inspecting Embeddings
# Access learned embedding matrices: TabularModel stores them in .embeds,
# a ModuleList ordered to match the categorical column names
for name, emb in zip(dls.cat_names, learn.model.embeds):
    print(f"{name}: {emb.weight.shape}")
# e.g., "YearMade: torch.Size([74, 18])"
# 74 categories mapped to 18-dimensional embedding vectors
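The saved model is later combined with a random forest in the fastbook chapter by averaging the two models' validation predictions. A minimal sketch of that ensemble step; the prediction values below are illustrative stand-ins for the NN's and forest's outputs:

```python
# Simple ensemble by averaging two models' predictions, as done in the
# fastbook chapter with the neural net and a random forest. The lists are
# illustrative stand-ins for the two models' validation predictions.
nn_preds = [10.1, 9.8, 11.2]
rf_preds = [10.3, 9.6, 11.0]

ens_preds = [round((a + b) / 2, 4) for a, b in zip(nn_preds, rf_preds)]
print(ens_preds)  # -> [10.2, 9.7, 11.1]
```

Because the two model families make different kinds of errors, the averaged predictions typically score better than either model alone.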
Related Pages
Implements Principle
Requires Environment
- Environment:Fastai_Fastbook_Python_FastAI_Environment
- Environment:Fastai_Fastbook_CUDA_GPU_Environment