Implementation:Fastai Fastbook DataBlock

From Leeroopedia


Knowledge Sources
Domains: Computer_Vision, Data_Engineering
Last Updated: 2026-02-09 17:00 GMT

Overview

A concrete tool, provided by fastai.data.block.DataBlock, for constructing a declarative data pipeline blueprint.

Description

The DataBlock class is fastai's mid-level API for defining data pipelines. It accepts a set of keyword arguments that answer the five key data questions (types, item retrieval, labeling, splitting, transforms) and produces a reusable blueprint object. The blueprint does not load or process any data until its dataloaders method is called with a source path.
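The "blueprint first, data later" behavior described above can be sketched in plain Python. This is only an illustration of the pattern, not fastai's implementation: the class MiniBlueprint, its materialize method, and the helper functions are hypothetical names, and the item paths are made up.

```python
from typing import Callable, Tuple


class MiniBlueprint:
    """Hypothetical stand-in for a declarative pipeline blueprint (not fastai code)."""

    def __init__(self, get_items: Callable, splitter: Callable, get_y: Callable):
        # Only the recipe is stored here; no data is touched yet.
        self.get_items = get_items
        self.splitter = splitter
        self.get_y = get_y

    def materialize(self, source) -> Tuple[list, list]:
        # Work happens only now, when a concrete source is supplied.
        items = list(self.get_items(source))
        train_idx, valid_idx = self.splitter(items)
        train = [(items[i], self.get_y(items[i])) for i in train_idx]
        valid = [(items[i], self.get_y(items[i])) for i in valid_idx]
        return train, valid


calls = []  # records every time the item getter actually runs


def list_items(source):
    # Made-up item listing; a real getter would walk a directory.
    calls.append(source)
    return [f"{source}/cat/1.jpg", f"{source}/dog/2.jpg", f"{source}/cat/3.jpg"]


def last_item_is_valid(items):
    # Toy splitter: hold out the last item for validation.
    n = len(items)
    return list(range(n - 1)), [n - 1]


def folder_name(item):
    # Toy labeller: use the containing folder as the label.
    return item.split("/")[-2]


blueprint = MiniBlueprint(list_items, last_item_is_valid, folder_name)
calls_before = len(calls)  # defining the blueprint has not listed any items
train, valid = blueprint.materialize("pets")
```

The same blueprint object can be materialized again with a different source, which is the sense in which it is reusable.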

DataBlock supports arbitrary combinations of input/output block types (ImageBlock, CategoryBlock, MultiCategoryBlock, BBoxBlock, etc.), making it applicable far beyond image classification.

Usage

Import DataBlock and its companion components from fastai.vision.all at the start of any fastbook notebook that trains a vision model. Configure it once per task, then call .dataloaders(path) to materialize the data.

Code Reference

Source Location

  • Repository: fastbook
  • File: translations/cn/02_production.md (lines 276-281), translations/cn/05_pet_breeds.md (lines 94-99)

Signature

DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)

Import

from fastai.vision.all import (
    DataBlock, ImageBlock, CategoryBlock,
    get_image_files, RandomSplitter, parent_label,
    Resize, aug_transforms
)

I/O Contract

Inputs

Name | Type | Required | Description
blocks | tuple | Yes | Tuple of block types defining input and output types (e.g., (ImageBlock, CategoryBlock))
get_items | callable | Yes | Function that takes a path and returns a list of items (e.g., get_image_files)
splitter | callable | No | Splitter object that divides items into train/valid sets (default: RandomSplitter)
get_y | callable | Yes | Function that extracts the label from each item (e.g., parent_label, RegexLabeller)
get_x | callable | No | Function that extracts the input from each item (default: identity)
item_tfms | Transform or list | No | Transforms applied to individual items on CPU (e.g., Resize(128))
batch_tfms | Transform or list | No | Transforms applied to batches on GPU (e.g., aug_transforms(size=224))
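The splitter entry describes a callable that divides the items into train and valid sets. A minimal plain-Python sketch of that contract follows, assuming a seeded shuffle over item indices. The function random_split_sketch is a hypothetical name and uses Python's random module, so it will not reproduce the exact indices that fastai's RandomSplitter returns for the same seed.

```python
import random


def random_split_sketch(valid_pct=0.2, seed=None):
    """Return a splitter callable (illustrative only, not fastai's RandomSplitter)."""

    def _split(items):
        # Shuffle the indices with a private, seeded generator for reproducibility.
        idxs = list(range(len(items)))
        random.Random(seed).shuffle(idxs)
        # The first `cut` shuffled indices become the validation set.
        cut = int(valid_pct * len(items))
        return idxs[cut:], idxs[:cut]

    return _split


items = [f"img_{i}.jpg" for i in range(10)]  # made-up file names
split = random_split_sketch(valid_pct=0.2, seed=42)
train_idx, valid_idx = split(items)
```

Fixing the seed means repeated calls yield the same partition, so validation results stay comparable between runs.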

Outputs

Name | Type | Description
datablock | DataBlock | A blueprint object that defines the complete data pipeline but holds no data

Usage Examples

Basic Usage: Bear Classifier (Chapter 2)

from fastai.vision.all import *
from pathlib import Path

path = Path('bears')

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128)
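)

# Nothing is read from disk until the blueprint is given a source path
dls = bears.dataloaders(path)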
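In the bear example, get_y=parent_label takes each image's label from the name of the folder that contains it. The idea can be sketched with pathlib alone; parent_label_sketch is a hypothetical stand-in written for illustration, and the file paths below are made up.

```python
from pathlib import Path


def parent_label_sketch(item):
    # The label is the name of the directory that directly contains the file.
    return Path(item).parent.name


# Made-up paths following a one-folder-per-class layout.
files = [
    Path("bears/grizzly/0001.jpg"),
    Path("bears/black/0002.jpg"),
    Path("bears/teddy/0003.jpg"),
]
labels = [parent_label_sketch(f) for f in files]
```

This labelling scheme only works when the dataset is organized with one folder per class.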

Advanced Usage: Pet Breeds with Presizing (Chapter 5)

from fastai.vision.all import *

path = untar_data(URLs.PETS) / 'images'

pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=using_attr(RegexLabeller(r'^(.+)_\d+\.jpg$'), 'name'),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)

# Build the DataLoaders from the images directory
dls = pets.dataloaders(path)

Regex Labeling from Filename

from fastai.vision.all import *

# When labels are embedded in filenames like 'Persian_123.jpg'
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=using_attr(RegexLabeller(r'^(.+)_\d+\.jpg$'), 'name'),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)
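To see what the labelling pattern extracts, it can be tried on bare filenames with Python's re module. This is a sketch under assumptions: label_from_name is a hypothetical helper, the filenames are made up, and the pattern here escapes the dot so that it matches only a literal ".jpg" extension.

```python
import re

# Capture everything before the final "_<digits>.jpg" suffix.
PATTERN = re.compile(r'^(.+)_\d+\.jpg$')


def label_from_name(filename):
    match = PATTERN.search(filename)
    if match is None:
        raise ValueError(f"filename does not fit the pattern: {filename!r}")
    return match.group(1)


names = ["Persian_123.jpg", "great_pyrenees_173.jpg"]  # made-up filenames
labels = [label_from_name(n) for n in names]

# An unescaped dot matches any character, so a looser pattern accepts bad names.
loose_match = re.search(r'^(.+)_\d+.jpg$', "Persian_123xjpg")
strict_match = PATTERN.search("Persian_123xjpg")
```

Because the capture group is greedy, a multi-word breed name with underscores is kept whole and only the trailing numeric suffix is dropped.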

Related Pages

Implements Principle

Requires Environment
