Implementation: Fastai Fastbook TabularPandas
| Knowledge Sources | |
|---|---|
| Domains | Tabular Data, Data Preprocessing |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tool for tabular data preprocessing and train/validation splitting provided by fastai. It wraps a pandas DataFrame and applies a pipeline of TabularProc transforms (Categorify, FillMissing, Normalize) in place.
Description
TabularPandas is a fastai class that wraps a pandas DataFrame and provides a convenient interface for applying preprocessing transforms, splitting into training and validation sets, and accessing features and targets. It differs from standard scikit-learn transforms in two key ways: (1) transforms mutate the wrapped DataFrame rather than returning a transformed copy (by default the wrapper works on an internal copy of df; pass inplace=True to mutate the original), and (2) transforms run eagerly at construction time rather than lazily at access time.
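The in-place, eager behavior can be illustrated without fastai. The sketch below mimics what FillMissing does by default (median fill plus an indicator column) using plain pandas; the helper name fill_missing_inplace is hypothetical, not a fastai API:

```python
import numpy as np
import pandas as pd

def fill_missing_inplace(df, col):
    """Sketch of FillMissing-style behavior: mutate df directly,
    recording which rows were missing in a new boolean column."""
    df[col + "_na"] = df[col].isna()            # indicator column, like fastai's <col>_na
    df[col] = df[col].fillna(df[col].median())  # median fill, fastai's default strategy
    return df                                   # same object, not a copy

df = pd.DataFrame({"YearMade": [1990.0, np.nan, 2004.0]})
out = fill_missing_inplace(df, "YearMade")
assert out is df  # eager and in-place: no new DataFrame is created
print(df)
```

Scikit-learn's `SimpleImputer`, by contrast, returns a new array and only transforms when you call `transform`.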
The companion function cont_cat_split automatically classifies columns as continuous or categorical based on their cardinality (number of unique values). Columns with at most max_card unique values are treated as categorical; float-typed columns and higher-cardinality columns are treated as continuous.
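The cardinality rule can be reproduced in plain pandas. This is a simplified sketch of the classification logic only (the real cont_cat_split also handles datetime and other dtypes), with the function name split_by_cardinality chosen for illustration:

```python
import pandas as pd
from pandas.api.types import is_float_dtype

def split_by_cardinality(df, max_card=20, dep_var=None):
    """Simplified sketch: float columns and high-cardinality columns
    are continuous; everything else is categorical."""
    cont, cat = [], []
    for col in df.columns:
        if col == dep_var:
            continue  # the dependent variable is excluded from both lists
        if is_float_dtype(df[col]) or df[col].nunique() > max_card:
            cont.append(col)
        else:
            cat.append(col)
    return cont, cat

df = pd.DataFrame({
    "price": [1.5, 2.0, 3.5],         # float -> continuous
    "rooms": [2, 3, 2],               # low-cardinality int -> categorical
    "color": ["red", "blue", "red"],  # string -> categorical
})
cont, cat = split_by_cardinality(df, max_card=20)
print(cont, cat)  # ['price'] ['rooms', 'color']
```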
Usage
Use TabularPandas after feature engineering (e.g., after add_datepart) and before model training. It is the central data object for both tree-based and deep learning tabular workflows in fastai. For neural networks, add Normalize to the processor list. For tree-based models, omit it.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/09_tabular.md (Lines 346-377)
- Library source: fastai.tabular.core
Signature
```python
# Automatic column classification
cont_cat_split(df, max_card=20, dep_var=None)

# Main preprocessing wrapper
TabularPandas(df, procs=None, cat_names=None, cont_names=None,
              y_names=None, y_block=None, splits=None, do_setup=True,
              inplace=False, reduce_memory=True)
```
Import
```python
from fastai.tabular.all import (TabularPandas, Categorify, FillMissing,
                                Normalize, cont_cat_split)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| df | pandas.DataFrame | Yes | The DataFrame to preprocess. Must already have date features extracted and ordinal categories set. |
| procs | list of TabularProc classes | Yes | Preprocessing transforms to apply. Common choices: [Categorify, FillMissing] for trees, [Categorify, FillMissing, Normalize] for neural nets. |
| cat_names | list of str | Yes | Names of categorical columns. Can be obtained from cont_cat_split. |
| cont_names | list of str | Yes | Names of continuous columns. Can be obtained from cont_cat_split. |
| y_names | str or list of str | Yes | Name(s) of the dependent variable column(s). |
| splits | tuple of (list, list) | No | Tuple of (train_indices, valid_indices). If None, all rows are placed in the training set. |
| max_card (for cont_cat_split) | int | No | Maximum cardinality threshold. Columns with at most this many unique values are treated as categorical. Default 20. |
| dep_var (for cont_cat_split) | str | No | Name of the dependent variable, excluded from the feature classification. |
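The splits argument is just a pair of index lists. A minimal numpy sketch, assuming a toy saleYear column, shows how a boolean condition becomes the (train_indices, valid_indices) tuple:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"saleYear": [2009, 2010, 2011, 2011, 2012]})

cond = df["saleYear"] < 2011                      # training condition
train_idx = np.where(cond)[0]                     # positions where cond is True
valid_idx = np.where(~cond)[0]                    # everything else
splits = (train_idx.tolist(), valid_idx.tolist()) # format TabularPandas expects

print(splits)  # ([0, 1], [2, 3, 4])
```

Using positional indices (rather than label indices) matters: TabularPandas slices rows by position.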
Outputs
| Name | Type | Description |
|---|---|---|
| to.train | TabularPandas subset | Training split with .xs (features DataFrame) and .y (target Series) attributes. |
| to.valid | TabularPandas subset | Validation split with .xs (features DataFrame) and .y (target Series) attributes. |
| to.classes | dict | Maps each categorical column name to its category list (including #na# for missing). |
| to.items | pandas.DataFrame | The fully transformed underlying DataFrame. |
| to.dataloaders(bs) | DataLoaders | Creates PyTorch DataLoaders for neural network training. |
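The category lists in to.classes can be mimicked with pandas categoricals. This sketch (not the fastai implementation) reserves code 0 for missing values, matching the #na# convention Categorify uses:

```python
import pandas as pd

col = pd.Series(["high", "low", None, "high"])
cat = pd.Categorical(col)   # learns the sorted category list
codes = cat.codes + 1       # shift codes so 0 means missing (#na#)
classes = ["#na#"] + list(cat.categories)

print(classes)         # ['#na#', 'high', 'low']
print(codes.tolist())  # [1, 2, 0, 1]
```

The shifted integer codes are what a tree model or an embedding layer actually consumes.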
Usage Examples
Basic Usage
```python
from fastai.tabular.all import *
import pandas as pd
import numpy as np

# Assume df is already loaded and add_datepart has been applied
dep_var = 'SalePrice'

# Define processors for tree-based models
procs = [Categorify, FillMissing]

# Time-based split: train on sales before October 2011, validate on the rest
cond = (df.saleYear < 2011) | (df.saleMonth < 10)
train_idx = np.where(cond)[0]
valid_idx = np.where(~cond)[0]
splits = (list(train_idx), list(valid_idx))

# Automatically classify columns as continuous or categorical
# (max_card=1 makes every numeric column continuous, so only
#  string columns end up categorical -- a good fit for trees)
cont, cat = cont_cat_split(df, max_card=1, dep_var=dep_var)

# Create the TabularPandas object
to = TabularPandas(df, procs, cat, cont, y_names=dep_var, splits=splits)

# Access training and validation data
xs, y = to.train.xs, to.train.y
valid_xs, valid_y = to.valid.xs, to.valid.y
print(f"Training: {len(xs)} rows, {len(xs.columns)} features")
print(f"Validation: {len(valid_xs)} rows")
```
For Neural Networks (with Normalize)
```python
# Add Normalize for neural network preprocessing
procs_nn = [Categorify, FillMissing, Normalize]
cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=9000, dep_var=dep_var)

to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
                      splits=splits, y_names=dep_var)

# Create DataLoaders with a large batch size for tabular data
dls = to_nn.dataloaders(1024)
```