Implementation:Fastai Fastbook Search Images

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Data_Engineering
Last Updated 2026-02-09 17:00 GMT

Overview

Concrete tools for searching for, downloading, and verifying training images. The search functions are provided by the fastbook utils.py module; the download and verification functions are provided by the fastai.vision.utils package.

Description

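Four utility functions together cover the full data-collection pipeline. The two search functions ship with the fastbook repository (utils.py); the download and verification functions come from fastai.vision.utils: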

  • search_images_bing -- queries the Bing Image Search API and returns a list of result records; the image URL of each record is in its contentUrl field.
  • search_images_ddg -- queries DuckDuckGo image search (no API key required) and returns a list of image URLs.
  • download_images -- downloads a list of URLs to a local directory, applying a per-request timeout and skipping URLs that fail instead of aborting the whole batch.
  • verify_images -- takes a list of image file paths (typically from get_image_files) and returns the paths that cannot be opened.
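
The download step described above can be illustrated with a simplified, standard-library stand-in. This is a sketch only, not fastai's implementation: download_all is a hypothetical name, it runs sequentially (no n_workers), and the zero-padded file naming is an assumption made for the example.

```python
import urllib.parse
import urllib.request
from pathlib import Path

def download_all(dest, urls, max_pics=None, timeout=4):
    """Save each URL under dest, skipping any URL that fails.

    Hypothetical stand-in for illustration; not fastai's download_images.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    # max_pics=None means "no limit" in this sketch
    for i, url in enumerate(list(urls)[:max_pics]):
        # Keep the URL's file extension if it has one; '.jpg' is an assumed fallback
        suffix = Path(urllib.parse.urlparse(url).path).suffix or ".jpg"
        target = dest / f"{i:08d}{suffix}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                target.write_bytes(resp.read())
        except (OSError, ValueError):
            # Network errors, timeouts, and malformed URLs are skipped
            continue
        saved.append(target)
    return saved
```

A failed URL costs at most one timeout period and never stops the remaining downloads, which is the behavior the pipeline relies on when search results contain dead links.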

Usage

Import these functions at the start of any fastbook notebook that needs to collect images from the web. Use the Bing variant when you have a Microsoft Azure Cognitive Services API key for higher quotas and reliability. Use the DuckDuckGo variant for quick prototyping without requiring an API key.
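
The choice between the two search variants comes down to whether a key is available. A minimal sketch, assuming the key is stored in the AZURE_SEARCH_KEY environment variable (the name used in the usage examples on this page); choose_search_backend is a hypothetical helper, not part of fastbook or fastai:

```python
import os

def choose_search_backend(env=None):
    """Return 'bing' when an Azure key is configured, otherwise 'ddg'.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only value as "no key"
    key = env.get("AZURE_SEARCH_KEY", "").strip()
    return "bing" if key else "ddg"
```

The returned name can then decide whether a notebook calls search_images_bing(key, term) or search_images_ddg(term).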

Code Reference

Source Location

  • Repository: fastbook
  • File: utils.py (lines 33-69)

Signature

# Bing Image Search (utils.py)
def search_images_bing(key, term, min_sz=128, max_images=150):
    ...

# DuckDuckGo Image Search (utils.py)
def search_images_ddg(term, max_images=200):
    ...

# Download images to disk (fastai.vision.utils)
def download_images(dest, url_file=None, urls=None, max_pics=1000, n_workers=8, timeout=4, preserve_filename=False):
    ...

# Verify image integrity (fastai.vision.utils)
def verify_images(fns):
    ...

Import

from fastbook import search_images_bing, search_images_ddg
from fastai.vision.utils import download_images, verify_images, get_image_files
from fastcore.all import L

I/O Contract

Inputs

Name | Type | Required | Description
key | str | Yes (Bing only) | Azure Cognitive Services API key for Bing Image Search
term | str | Yes | Search query string describing the desired image category (e.g., "grizzly bear")
min_sz | int | No | Minimum image width and height in pixels for Bing results (default: 128)
max_images | int | No | Maximum number of results to return (default: 150 for Bing, 200 for DDG)
dest | Path | Yes (download) | Local directory path where images will be saved
urls | list | Yes (download) | List of image URLs to download
fns | L | Yes (verify) | List of file paths to verify (typically from get_image_files)

Outputs

Name | Type | Description
urls | L (fastcore list) | Search results: image URL strings from search_images_ddg; result records with a contentUrl field from search_images_bing
downloaded files | files on disk | JPEG/PNG image files saved to the dest directory
failed | L | List of Path objects for images that failed verification (could not be opened)
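
To make the meaning of the failed output concrete, the sketch below is a simplified stand-in for verification that inspects only the leading bytes of each file. It is a weaker check than verify_images, which attempts to open each file. The names looks_like_image and find_failed are hypothetical, and only the JPEG and PNG formats listed above are recognized.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two formats named in the table above
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}

def looks_like_image(path):
    """Return True if the file starts with a JPEG or PNG signature."""
    try:
        with open(path, "rb") as f:
            head = f.read(8)
    except OSError:
        # Missing or unreadable files count as failures
        return False
    return any(head.startswith(sig) for sig in SIGNATURES.values())

def find_failed(paths):
    """Return the paths that do not pass the signature check, in input order."""
    return [Path(p) for p in paths if not looks_like_image(p)]
```

A common failure this catches is an HTML error page saved under an image file name; a truncated image with a valid header would pass this check, so it does not replace a full open-and-decode test.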

Usage Examples

Basic Usage: Bing Image Search

from fastbook import search_images_bing
from fastai.vision.utils import download_images, verify_images, get_image_files
from pathlib import Path
import os

key = os.environ.get('AZURE_SEARCH_KEY', 'YOUR_KEY_HERE')
bear_types = ['grizzly', 'black', 'teddy']
path = Path('bears')

for bear in bear_types:
    dest = path / bear
    dest.mkdir(parents=True, exist_ok=True)
    results = search_images_bing(key, f'{bear} bear', max_images=150)
    # Each Bing result is a record; the image URL is in its 'contentUrl' field
    download_images(dest, urls=results.attrgot('contentUrl'))

# Verify all downloaded images
fns = get_image_files(path)
failed = verify_images(fns)
print(f'Found {len(failed)} corrupted images')
failed.map(Path.unlink)  # Delete corrupted files

Basic Usage: DuckDuckGo (No API Key)

from fastbook import search_images_ddg
from fastai.vision.utils import download_images, verify_images, get_image_files
from pathlib import Path

categories = ['grizzly bear', 'black bear', 'teddy bear']
path = Path('bears')

for category in categories:
    folder_name = category.split()[0]  # 'grizzly', 'black', 'teddy'
    dest = path / folder_name
    dest.mkdir(parents=True, exist_ok=True)
    urls = search_images_ddg(category, max_images=200)
    download_images(dest, urls=urls)

# Verify and clean up
fns = get_image_files(path)
failed = verify_images(fns)
print(f'Removing {len(failed)} failed images')
failed.map(Path.unlink)
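
After cleanup it can be useful to check how many images remain in each category, since failed downloads and deleted files may leave the classes unbalanced. A minimal sketch using only the standard library, assuming the path/<category>/<file> layout from the examples above; count_images_per_folder and the suffix set are hypothetical choices for illustration:

```python
from collections import Counter
from pathlib import Path

# Assumed set of extensions for this sketch
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images_per_folder(root):
    """Count image files under root, grouped by their immediate parent folder name."""
    counts = Counter()
    for p in Path(root).rglob("*"):
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
            counts[p.parent.name] += 1
    return dict(counts)
```

Calling count_images_per_folder('bears') on the directory built above would return one count per category folder, which makes a badly under-filled class easy to spot before training.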
