Implementation:Fastai Fastbook Search Images
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Data_Engineering |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tools for collecting, downloading, and verifying training images provided by the fastbook utils.py module and the fastai.vision.utils package.
Description
Four utility functions, two from the fastbook utils.py module and two from fastai.vision.utils, together cover the full data-collection pipeline:
- search_images_bing -- queries the Bing Image Search API and returns a list of result records; each record's image URL is stored in its contentUrl field.
- search_images_ddg -- queries DuckDuckGo image search (no API key required) and returns a list of image URLs.
- download_images -- downloads a list of URLs to a local directory in parallel, skipping URLs that time out or fail instead of aborting the run.
- verify_images -- takes a list of image file paths and returns the paths that cannot be opened as images.
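The download step in the list above can be sketched with the standard library alone. This is not fastai's implementation: `download_all`, `fetch_bytes`, and the `fetch` parameter are hypothetical names introduced for illustration, and the defaults are illustrative. The sketch only shows the general shape of a parallel download that caps the number of URLs and skips failures.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url, timeout=4):
    """Hypothetical default fetcher: return the body, or raise on error/timeout."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(dest, urls, max_pics=200, n_workers=8, fetch=fetch_bytes):
    """Save up to max_pics URLs into dest; return the paths actually written."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    def save_one(item):
        i, url = item
        try:
            data = fetch(url)
        except Exception:
            return None  # skip URLs that time out or fail
        # Keep the URL's extension when it has one; fall back to .jpg otherwise.
        suffix = Path(url.split("?")[0]).suffix or ".jpg"
        out = dest / f"{i:08d}{suffix}"
        out.write_bytes(data)
        return out

    # Download in parallel threads; results keep the input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(save_one, enumerate(urls[:max_pics])))
    return [p for p in results if p is not None]
```

Passing the fetcher as a parameter keeps the sketch testable without network access; in a notebook, the fastai download_images function fills this role.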
Usage
Import these functions at the start of any fastbook notebook that needs to collect images from the web. Use the Bing variant when you have a Microsoft Azure Cognitive Services API key for higher quotas and reliability. Use the DuckDuckGo variant for quick prototyping without requiring an API key.
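The choice between the two search variants can be made at runtime. The following is a minimal sketch, assuming the key is stored in the AZURE_SEARCH_KEY environment variable; `pick_search_backend` is a hypothetical helper, not part of fastbook.

```python
import os


def pick_search_backend(env=None, key_var="AZURE_SEARCH_KEY"):
    """Return ('bing', key) when a non-empty key is set, else ('ddg', None)."""
    env = os.environ if env is None else env
    key = env.get(key_var, "").strip()
    return ("bing", key) if key else ("ddg", None)


backend, key = pick_search_backend()
print(f"Using the {backend} search variant")
```

A notebook could then call search_images_bing(key, term) when the backend is 'bing' and search_images_ddg(term) otherwise.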
Code Reference
Source Location
- Repository: fastbook
- File: utils.py (lines 33-69)
Signature
# Bing Image Search (utils.py)
def search_images_bing(key, term, min_sz=128, max_images=150):
...
# DuckDuckGo Image Search (utils.py)
def search_images_ddg(term, max_images=200):
...
# Download images to disk (fastai.vision.utils)
def download_images(dest, url_file=None, urls=None, max_pics=1000, n_workers=8, timeout=4, preserve_filename=False):
...
# Verify image integrity (fastai.vision.utils)
def verify_images(fns):
...
Import
from fastbook import search_images_bing, search_images_ddg
from fastai.vision.utils import download_images, verify_images, get_image_files
from fastcore.all import L
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| key | str | Yes (Bing only) | Azure Cognitive Services API key for Bing Image Search |
| term | str | Yes | Search query string describing the desired image category (e.g., "grizzly bear") |
| min_sz | int | No | Minimum width and height, in pixels, requested for Bing results (default: 128) |
| max_images | int | No | Maximum number of image URLs to return (default: 150 for Bing, 200 for DDG) |
| dest | Path | Yes (download) | Local directory path where images will be saved |
| urls | list | Yes (download) | List of image URLs to download |
| fns | L | Yes (verify) | List of file paths to verify (typically from get_image_files) |
Outputs
| Name | Type | Description |
|---|---|---|
| urls | L (fastcore list) | Search results: image URL strings from DuckDuckGo; result records with a contentUrl field from Bing |
| downloaded files | files on disk | JPEG/PNG image files saved to the dest directory |
| failed | L | List of Path objects for images that failed verification (could not be opened) |
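When search results arrive as records rather than bare URL strings, the URLs have to be pulled out before downloading. The sketch below assumes each record is a dict with a contentUrl field, as in the Bing Image Search v7 JSON response; `content_urls` and the sample data are hypothetical and use plain Python lists.

```python
def content_urls(records, field="contentUrl"):
    """Collect the image URL from each record, skipping records without one."""
    return [r[field] for r in records if r.get(field)]


# Made-up sample records for illustration only.
sample = [
    {"name": "Grizzly", "contentUrl": "https://example.com/grizzly.jpg"},
    {"name": "No URL"},
    {"name": "Black bear", "contentUrl": "https://example.com/black.png"},
]
print(content_urls(sample))
```

On a fastcore L, `results.attrgot('contentUrl')` plays a similar role, except that it returns None for a record missing the field rather than dropping it.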
Usage Examples
Basic Usage: Bing Image Search
from fastbook import search_images_bing
from fastai.vision.utils import download_images, verify_images, get_image_files
from pathlib import Path
import os
key = os.environ.get('AZURE_SEARCH_KEY', 'YOUR_KEY_HERE')
bear_types = ['grizzly', 'black', 'teddy']
path = Path('bears')
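for bear in bear_types:
    dest = path / bear
    dest.mkdir(parents=True, exist_ok=True)
    results = search_images_bing(key, f'{bear} bear', max_images=150)
    # Bing returns result records; the image URL is in each record's contentUrl field
    download_images(dest, urls=results.attrgot('contentUrl'))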
# Verify all downloaded images
fns = get_image_files(path)
failed = verify_images(fns)
print(f'Found {len(failed)} corrupted images')
failed.map(Path.unlink) # Delete corrupted files
Basic Usage: DuckDuckGo (No API Key)
from fastbook import search_images_ddg
from fastai.vision.utils import download_images, verify_images, get_image_files
from pathlib import Path
categories = ['grizzly bear', 'black bear', 'teddy bear']
path = Path('bears')
for category in categories:
    folder_name = category.split()[0]  # 'grizzly', 'black', 'teddy'
    dest = path / folder_name
    dest.mkdir(parents=True, exist_ok=True)
    urls = search_images_ddg(category, max_images=200)
    download_images(dest, urls=urls)
# Verify and clean up
fns = get_image_files(path)
failed = verify_images(fns)
print(f'Removing {len(failed)} failed images')
failed.map(Path.unlink)
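After failed files are removed, it can be useful to check how many images remain in each class folder. This is a standard-library sketch: `images_per_class` and the suffix set are assumptions for illustration, not part of fastbook or fastai.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def images_per_class(root):
    """Count image files under each top-level subfolder of root."""
    root = Path(root)
    counts = Counter()
    for p in root.rglob("*"):
        parts = p.relative_to(root).parts
        # Ignore files sitting directly in root; they belong to no class folder.
        if p.is_file() and len(parts) > 1 and p.suffix.lower() in IMAGE_SUFFIXES:
            counts[parts[0]] += 1
    return dict(counts)
```

Calling `images_per_class('bears')` after the cleanup step shows whether any class lost a large share of its images and needs another search pass.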
Related Pages
Implements Principle
Requires Environment