
Implementation:Fastai Fastbook Tabular Learner

From Leeroopedia


Knowledge Sources
Domains Deep Learning, Tabular Data
Last Updated 2026-02-09 17:00 GMT

Overview

A concrete fastai tool for training deep learning models on tabular data with entity embeddings. It creates a neural network that automatically handles categorical variables via embedding layers and continuous variables via normalization.

Description

tabular_learner is a fastai factory function that creates a Learner wrapping a TabularModel neural network. The network architecture is determined by the TabularPandas data object: each categorical column gets an embedding layer whose dimension is automatically computed based on cardinality, and continuous columns are concatenated after normalization. The resulting input vector is passed through user-specified fully connected layers (default: [200, 100]) with batch normalization and dropout.
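The automatic embedding sizing described above follows fastai's default heuristic (emb_sz_rule). A minimal sketch of how input width is derived; the column names and cardinalities below are illustrative, not taken from the chapter:

```python
# Sketch of fastai's default embedding-size heuristic (emb_sz_rule):
# a categorical column with n levels gets an embedding of dimension
# min(600, round(1.6 * n ** 0.56)).
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

# Illustrative cardinalities (not real chapter values)
cardinalities = {'YearMade': 74, 'ProductSize': 7, 'Coupler_System': 3}
emb_szs = {name: emb_sz_rule(n) for name, n in cardinalities.items()}

# The network's input width is the sum of the embedding dimensions
# plus the number of continuous columns (assumed here to be 2).
input_width = sum(emb_szs.values()) + 2
print(emb_szs)  # {'YearMade': 18, 'ProductSize': 5, 'Coupler_System': 3}
```

Note that a 74-level column maps to an 18-dimensional embedding, matching the shape shown in the embedding-inspection example below.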

In the fastbook chapter, tabular_learner is configured with layers=[500, 250] for the larger Bulldozers dataset, y_range=(8, 12) to constrain log-price predictions, n_out=1 for single-value regression, and loss_func=F.mse_loss to match the competition metric.

Usage

Use tabular_learner after creating a TabularPandas object with Normalize in the processor list and converting it to DataLoaders. The typical workflow is: create DataLoaders, build the learner, find the learning rate, and train with fit_one_cycle.

Code Reference

Source Location

  • Repository: fastbook
  • File: translations/cn/09_tabular.md (Lines 1327-1352)
  • Library source: fastai.tabular.learner

Signature

# Create DataLoaders from TabularPandas
dls = to_nn.dataloaders(bs=1024)

# Create the tabular learner
learn = tabular_learner(
    dls,               # DataLoaders object
    y_range=(8, 12),   # Output range for sigmoid clamping
    layers=[500, 250], # Hidden layer sizes
    n_out=1,           # Number of output values
    loss_func=F.mse_loss,  # Loss function
    # Additional optional parameters:
    # emb_szs=None,    # Override embedding sizes (dict or list of tuples)
    # config=None,     # TabularModel configuration
    # ps=None,         # Dropout probabilities for each hidden layer
    # embed_p=0.0,     # Embedding dropout
    # metrics=None,    # Metrics to track during training
)

Import

from fastai.tabular.all import *
import torch.nn.functional as F

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| dls | DataLoaders | Yes | Created from TabularPandas.dataloaders(bs). Contains training and validation DataLoaders with properly preprocessed tabular batches. |
| layers | list of int | No | Sizes of hidden fully connected layers. Default [200, 100]. The fastbook chapter uses [500, 250] for larger datasets. |
| y_range | tuple of (float, float) | No | Min and max values for the output sigmoid. Should slightly exceed the actual target range, e.g. (8, 12) for log prices in [8.5, 11.9]. |
| n_out | int | No | Number of output values. Default 1 for regression. |
| loss_func | callable | No | Loss function. Default depends on the task; set to F.mse_loss for MSE regression. |
| emb_szs | dict or list | No | Override automatic embedding dimension computation. A dict maps column names to sizes. |
| ps | list of float | No | Dropout probabilities for each hidden layer. Defaults to fastai heuristics. |
| embed_p | float | No | Dropout applied to the concatenated embedding vector. Default 0.0. |
| metrics | list | No | Metrics tracked during training (e.g., [rmse]). |
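The y_range parameter works by appending a scaled sigmoid to the output head (fastai's SigmoidRange), which maps any raw activation into the given interval. A minimal sketch of that mapping:

```python
import math

def sigmoid_range(x: float, low: float, high: float) -> float:
    # fastai's SigmoidRange head computes sigmoid(x) * (high - low) + low,
    # so every prediction lands inside (low, high).
    return (high - low) / (1 + math.exp(-x)) + low

# With y_range=(8, 12), a raw activation of 0 maps to the midpoint:
print(sigmoid_range(0.0, 8, 12))  # 10.0
```

This is why y_range should slightly exceed the true target range: values near the interval's edges require extreme activations, so a range of exactly [8.5, 11.9] would make the boundary prices hard to reach.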

Outputs

| Name | Type | Description |
|---|---|---|
| learn | Learner | A fastai Learner wrapping a TabularModel. Supports .lr_find(), .fit_one_cycle(), .get_preds(), .save(), .load(). |
| learn.model | TabularModel | The underlying PyTorch module containing embedding layers, batch normalization, linear layers, and the output head. |
| preds, targs (from get_preds) | tuple of Tensors | Predictions and targets on the validation set, used for RMSE computation. |

Usage Examples

Basic Usage

from fastai.tabular.all import *
import torch.nn.functional as F
import math  # used below for math.sqrt in the RMSE computation

# Prerequisites: df_nn_final is prepared, splits are defined
dep_var = 'SalePrice'
procs_nn = [Categorify, FillMissing, Normalize]
cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=9000, dep_var=dep_var)

# Ensure saleElapsed is treated as continuous for extrapolation
if 'saleElapsed' in cat_nn:
    cont_nn.append('saleElapsed')
    cat_nn.remove('saleElapsed')

# Create TabularPandas with Normalize
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
                      splits=splits, y_names=dep_var)

# Create DataLoaders with large batch size (tabular data uses less GPU memory)
dls = to_nn.dataloaders(1024)

# Build the tabular learner
learn = tabular_learner(dls, y_range=(8, 12), layers=[500, 250],
                        n_out=1, loss_func=F.mse_loss)

# Find optimal learning rate
learn.lr_find()

# Train for 5 epochs with 1-cycle policy
learn.fit_one_cycle(5, 1e-2)

# Evaluate on validation set
preds, targs = learn.get_preds()
rmse = round(math.sqrt(((preds.squeeze() - targs)**2).mean().item()), 4)
print(f"Validation RMSE: {rmse}")  # ~0.2258

# Save the model for later use or ensembling
learn.save('nn')
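The saved model feeds the chapter's final ensembling step, where the neural net's predictions are averaged with a random forest's on the same validation rows. A minimal sketch with stand-in numbers (the real code averages the flattened NN predictions against the forest's predictions):

```python
# Illustrative sketch of the fastbook ensembling step: average the neural
# net's log-price predictions with a random forest's, row by row.
# The values below are stand-ins, not real model outputs.
nn_preds = [9.1, 10.2, 11.0]   # hypothetical NN predictions
rf_preds = [9.3, 10.0, 11.2]   # hypothetical random-forest predictions

ens_preds = [(n + r) / 2 for n, r in zip(nn_preds, rf_preds)]
```

Because the two models make different kinds of errors, the averaged predictions typically score a lower validation RMSE than either model alone.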

Inspecting Embeddings

# Access learned embedding matrices (one per categorical column);
# TabularModel stores them in the `embeds` ModuleList, in the same
# order as the categorical column names on the DataLoaders.
for name, emb in zip(learn.dls.cat_names, learn.model.embeds):
    print(f"{name}: {emb.weight.shape}")
    # e.g., "YearMade: torch.Size([74, 18])"
    # 74 categories mapped to 18-dimensional embedding vectors

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
