Implementation:Fastai Fastbook DataBlock
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Data_Engineering |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
fastai.data.block.DataBlock is a concrete tool, provided by fastai, for constructing a declarative data pipeline blueprint.
Description
The DataBlock class is fastai's mid-level API for defining data pipelines. It accepts a set of keyword arguments that answer the five key data questions (types, item retrieval, labeling, splitting, transforms) and produces a reusable blueprint object. The blueprint does not load or process any data until its dataloaders method is called with a source path.
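The deferred behaviour described above can be illustrated with a small, library-free sketch. The names `ToyBlueprint`, `materialize`, `fake_get_items`, and `label_from_parent` are hypothetical and exist only for this illustration. This is not fastai's implementation; it only mimics the idea that the blueprint stores callables and runs them when a source is supplied.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ToyBlueprint:
    """Hypothetical stand-in for a DataBlock-style blueprint (not fastai code)."""
    get_items: Callable[[str], Sequence[str]]  # how to find the items
    get_y: Callable[[str], str]                # how to label one item

    def materialize(self, source: str) -> List[Tuple[str, str]]:
        """Run the stored steps against a concrete source.

        Nothing is read or labeled before this method is called.
        """
        items = list(self.get_items(source))
        return [(item, self.get_y(item)) for item in items]


calls: List[str] = []  # records every time the item getter actually runs


def fake_get_items(source: str) -> List[str]:
    """Pretend to scan a folder; the file names here are made up."""
    calls.append(source)
    return [f"{source}/{kind}/{i}.jpg" for kind in ("grizzly", "teddy") for i in range(3)]


def label_from_parent(item: str) -> str:
    """Use the parent folder name as the label."""
    return item.split("/")[-2]


# Defining the blueprint only stores the two callables.
blueprint = ToyBlueprint(get_items=fake_get_items, get_y=label_from_parent)
calls_before = len(calls)

# Supplying a source is what triggers item collection and labeling.
pairs = blueprint.materialize("bears")
```

Because the blueprint holds no data, the same object could be materialized against a different source without being redefined.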
DataBlock supports arbitrary combinations of input/output block types (ImageBlock, CategoryBlock, MultiCategoryBlock, BBoxBlock, etc.), making it applicable far beyond image classification.
Usage
Import DataBlock and its companion components from fastai.vision.all at the start of any fastbook notebook that trains a vision model. Configure it once per task, then call .dataloaders(path) to materialize the data.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/02_production.md (lines 276-281), translations/cn/05_pet_breeds.md (lines 94-99)
Signature
# Representative call as used in the book, not the full constructor signature
DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)
Import
from fastai.vision.all import (
    DataBlock, ImageBlock, CategoryBlock,
    get_image_files, RandomSplitter, parent_label,
    Resize, aug_transforms
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| blocks | tuple | Yes | Tuple of block types defining input and output types (e.g., (ImageBlock, CategoryBlock)) |
| get_items | callable | Yes | Function that takes a path and returns a list of items (e.g., get_image_files) |
| splitter | callable | No | Splitter object that divides items into train/valid sets (default: RandomSplitter) |
| get_y | callable | Yes | Function that extracts the label from each item (e.g., parent_label, RegexLabeller) |
| get_x | callable | No | Function that extracts the input from each item (default: identity) |
| item_tfms | Transform or list | No | Transforms applied to individual items on CPU (e.g., Resize(128)) |
| batch_tfms | Transform or list | No | Transforms applied to batches on GPU (e.g., aug_transforms(size=224)) |
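To make the call contracts in the table concrete, here is a minimal sketch of what a splitter and a `get_y` function look like as plain callables. `toy_random_splitter` and `toy_parent_label` are hypothetical names written for this page. They assume that a splitter maps a list of items to a pair of index lists (train, valid) and that a parent-folder labeller returns the name of the containing directory. The shuffling here uses Python's `random` module, so the resulting indices will not match those produced by fastai's `RandomSplitter`.

```python
import random
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def toy_random_splitter(
    valid_pct: float = 0.2, seed: int = 42
) -> Callable[[Sequence], Tuple[List[int], List[int]]]:
    """Return a function that splits item indices into (train, valid)."""

    def _split(items: Sequence) -> Tuple[List[int], List[int]]:
        idxs = list(range(len(items)))
        # A fixed seed makes the split reproducible across runs.
        random.Random(seed).shuffle(idxs)
        cut = int(valid_pct * len(items))
        return idxs[cut:], idxs[:cut]

    return _split


def toy_parent_label(item) -> str:
    """Label an item by the name of the folder that contains it."""
    return Path(item).parent.name


# Made-up file paths laid out as <root>/<label>/<file>.
items = [Path("bears") / kind / f"{i}.jpg" for kind in ("black", "grizzly") for i in range(10)]

splitter = toy_random_splitter(valid_pct=0.2, seed=42)
train_idxs, valid_idxs = splitter(items)
labels = [toy_parent_label(p) for p in items]
```

Returning indices rather than the items themselves keeps the splitter independent of how items are later opened or labeled.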
Outputs
| Name | Type | Description |
|---|---|---|
| datablock | DataBlock | A blueprint object that defines the complete data pipeline but holds no data |
Usage Examples
Basic Usage: Bear Classifier (Chapter 2)
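from fastai.vision.all import *
from pathlib import Path

# Folder containing one subfolder of images per bear type
path = Path('bears')

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # image in, single category out
    get_items=get_image_files,                        # collect image files under path
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # hold out 20% for validation
    get_y=parent_label,                               # label = parent folder name
    item_tfms=Resize(128)                             # resize each image to 128px
)

# The blueprint loads data only when given a source
dls = bears.dataloaders(path)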
Advanced Usage: Pet Breeds with Presizing (Chapter 5)
from fastai.vision.all import *

# Download the Oxford-IIIT Pet dataset and point at its images folder
path = untar_data(URLs.PETS) / 'images'

pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=using_attr(RegexLabeller(r'^(.+)_\d+\.jpg$'), 'name'),  # breed from filename
    item_tfms=Resize(460),                                        # presize on CPU
    batch_tfms=aug_transforms(size=224, min_scale=0.75)           # augment and crop per batch
)

dls = pets.dataloaders(path)
Regex Labeling from Filename
from fastai.vision.all import *

# When labels are embedded in filenames like 'Persian_123.jpg',
# the capture group keeps everything before the final '_<digits>.jpg'.
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=using_attr(RegexLabeller(r'^(.+)_\d+\.jpg$'), 'name'),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)
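The effect of the filename pattern can be checked without fastai, using only Python's `re` module. The sketch below is an illustration of the pattern, not of `RegexLabeller` itself. `label_from_name` is a hypothetical helper, the example paths are made up, and the dot before `jpg` is escaped here so that it matches a literal period.

```python
import re
from pathlib import Path

# Capture everything before the final '_<digits>.jpg' in a file name.
PATTERN = re.compile(r'^(.+)_\d+\.jpg$')


def label_from_name(path) -> str:
    """Extract the label from the file name (not the full path)."""
    name = Path(path).name
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"File name does not match the pattern: {name}")
    return match.group(1)


# Hypothetical file paths in the '<label>_<number>.jpg' naming scheme.
examples = ["images/Persian_123.jpg", "images/great_pyrenees_173.jpg"]
labels = [label_from_name(p) for p in examples]
```

Because `(.+)` is greedy, labels that themselves contain underscores are kept whole, and only the trailing number is dropped.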