Implementation:Neuml Txtai Image Captioning

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Computer Vision, Image Captioning, Transformers
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool provided by txtai for generating text captions from images with Hugging Face image-to-text models.

Description

Caption extends HFPipeline and wraps the Hugging Face image-to-text pipeline to generate descriptive captions for images. It accepts a single image or a list of images, where each image is either a file path string or a PIL Image object; file paths are opened automatically with PIL. When the model returns multiple generated text segments for one image, they are joined with spaces into a single caption string. The pipeline requires the PIL (Pillow) library.
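The input handling described above can be sketched in plain Python. This is a minimal sketch, not txtai's source: build_captions, generate and open_image are hypothetical names. generate stands in for the Hugging Face pipeline and is assumed to return, for each image, a list of dicts with a generated_text key; open_image stands in for PIL's Image.open. Stubs replace both so the example runs without Pillow or a model.

```python
from typing import Any, Callable, List, Union


def build_captions(
    images: Any,
    generate: Callable[[List[Any]], List[List[dict]]],
    open_image: Callable[[str], Any],
) -> Union[str, List[str]]:
    """Hypothetical sketch of the Caption input/output handling (not txtai's source)."""
    # Normalize a single image into a one-element list
    values = images if isinstance(images, list) else [images]

    # Open file path strings; pass already-loaded image objects through unchanged
    values = [open_image(v) if isinstance(v, str) else v for v in values]

    # Join every generated text segment for an image into one caption string
    captions = []
    for result in generate(values):
        captions.append(" ".join(r["generated_text"] for r in result).strip())

    # Unwrap the result when a single image was passed in
    return captions if isinstance(images, list) else captions[0]


# Stub for PIL.Image.open: records which path was "opened"
def fake_open(path: str) -> dict:
    return {"opened": path}


# Stub for the Hugging Face pipeline: two text segments per image
def fake_generate(batch: List[Any]) -> List[List[dict]]:
    return [[{"generated_text": "an image of"}, {"generated_text": img["opened"] + " "}] for img in batch]


print(build_captions("photo.jpg", fake_generate, fake_open))  # an image of photo.jpg
print(build_captions(["a.jpg", "b.jpg"], fake_generate, fake_open))  # ['an image of a.jpg', 'an image of b.jpg']
```

The return type follows the input type: a bare image yields a string, while a list (even a one-element list) yields a list.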

Usage

Use Caption when you need to generate human-readable descriptions of images. This is useful for image indexing, accessibility (alt text generation), content moderation, and multimodal search applications.
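For the alt-text use case, captions can be written into HTML alt attributes. The sketch below is an illustration under assumptions: alt_text_tags is a hypothetical helper, and captioner is any callable that, like Caption, returns a list of strings for a list of inputs. A stub captioner is used so the example runs without downloading a model.

```python
import html
from typing import Callable, List


def alt_text_tags(paths: List[str], captioner: Callable[[List[str]], List[str]]) -> List[str]:
    """Hypothetical helper: build <img> tags whose alt text comes from a captioner."""
    # One batched call; a list input yields a list of captions in the same order
    captions = captioner(paths)
    tags = []
    for path, text in zip(paths, captions):
        # Escape both attributes so quotes or ampersands in captions cannot break the markup
        tags.append(f'<img src="{html.escape(path)}" alt="{html.escape(text)}">')
    return tags


def stub_captioner(paths: List[str]) -> List[str]:
    # Stand-in for a Caption instance; returns a fixed caption per image
    return ['a "red" car & a bike' for _ in paths]


for tag in alt_text_tags(["car.jpg"], stub_captioner):
    print(tag)  # <img src="car.jpg" alt="a &quot;red&quot; car &amp; a bike">
```

With txtai installed, passing a Caption instance as captioner would use the real pipeline instead of the stub.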

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/image/caption.py

Signature

class Caption(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs): ...
    def __call__(self, images): ...

Import

from txtai.pipeline.image.caption import Caption

I/O Contract

Inputs

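Name | Type | Required | Description
path | str | No | Model path; accepts a Hugging Face model hub id or a local path. Uses the default image-to-text model if not provided.
quantize | bool | No | If True, quantizes the model to int8 (CPU only). Defaults to False.
gpu | bool or int | No | True/False to enable or disable the GPU, or a specific GPU device id. Defaults to True.
model | object | No | Optional existing pipeline model to wrap. Defaults to None.
**kwargs | dict | No | Additional keyword arguments passed to the underlying pipeline model.
images | str, PIL.Image, or list | Yes (__call__ argument) | A single image (file path string or PIL Image object) or a list of images.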
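The gpu parameter accepts either a flag or a device id. The sketch below illustrates one plausible mapping from such a value to a Hugging Face pipeline device index, where -1 selects the CPU. resolve_device and accelerator_available are hypothetical names, and txtai's own device handling may cover additional cases.

```python
from typing import Optional, Union


def resolve_device(gpu: Optional[Union[bool, int]], accelerator_available: bool) -> int:
    """Hypothetical sketch: map a bool-or-int gpu option to a pipeline device index."""
    # Fall back to CPU (-1) when no gpu option is given or no accelerator exists
    if gpu is None or not accelerator_available:
        return -1
    # Check bool before int, because bool is a subclass of int in Python
    if isinstance(gpu, bool):
        return 0 if gpu else -1
    # Any other integer is treated as an explicit device id
    return int(gpu)


print(resolve_device(True, accelerator_available=True))   # 0
print(resolve_device(False, accelerator_available=True))  # -1
print(resolve_device(2, accelerator_available=True))      # 2
print(resolve_device(True, accelerator_available=False))  # -1
```

The ordering of the checks matters: isinstance(True, int) is True, so testing for int first would turn gpu=True into device 1.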

Outputs

Name | Type | Description
caption | str or list | A caption string for a single image input, or a list of caption strings (one per image, in input order) for a list input.
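Because a list input returns one caption per image in input order, results can be paired with their sources directly. A minimal sketch, assuming caption_map is a hypothetical helper and captioner behaves like Caption (string in, string out; list in, list out):

```python
from typing import Callable, Dict, List, Union

Captioner = Callable[[Union[str, List[str]]], Union[str, List[str]]]


def caption_map(paths: List[str], captioner: Captioner) -> Dict[str, str]:
    """Hypothetical helper: map each image path to its caption."""
    # Always pass a list so the output is always a list, never a bare string
    captions = captioner(list(paths))
    if len(captions) != len(paths):
        raise ValueError("captioner returned a different number of captions than inputs")
    return dict(zip(paths, captions))


def stub(images):
    # Mimics the str-or-list contract with a deterministic fake caption
    if isinstance(images, list):
        return [f"caption of {p}" for p in images]
    return f"caption of {images}"


print(caption_map(["a.jpg", "b.jpg"], stub))
# {'a.jpg': 'caption of a.jpg', 'b.jpg': 'caption of b.jpg'}
```

Wrapping the input in a list sidesteps the str-or-list return type, so callers never need to special-case a single image.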

Usage Examples

from PIL import Image
from txtai.pipeline.image.caption import Caption

# Create a caption pipeline with the default image-to-text model
caption = Caption(gpu=True)

# Caption a single image from a file path; returns a single string
result = caption("photo.jpg")
# Example output (actual text depends on the model and image): "a dog sitting on a park bench"

# Caption a PIL Image object
img = Image.open("photo.jpg")
result = caption(img)

# Caption multiple images; returns a list of strings in input order
results = caption(["photo1.jpg", "photo2.jpg", "photo3.jpg"])
# Example output: ["a cat on a sofa", "a sunset over mountains", "a city street at night"]
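For larger collections, one option is to collect image paths from a directory and caption them in fixed-size chunks, which bounds how many images go into a single call. This is a sketch under assumptions: caption_directory and its batch_size and suffixes parameters are hypothetical, and the default batch size of 8 and the extension filter are illustrative choices. The demo uses empty placeholder files and a stub captioner, so no model is loaded.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def caption_directory(
    directory: str,
    captioner: Callable[[List[str]], List[str]],
    batch_size: int = 8,
    suffixes: Iterable[str] = (".jpg", ".jpeg", ".png"),
) -> Dict[str, str]:
    """Hypothetical helper: caption every image file in a directory, one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    allowed = {s.lower() for s in suffixes}
    # Keep regular files with an allowed extension, sorted for a deterministic order
    paths = sorted(
        str(p) for p in Path(directory).iterdir() if p.is_file() and p.suffix.lower() in allowed
    )

    results: Dict[str, str] = {}
    for start in range(0, len(paths), batch_size):
        chunk = paths[start : start + batch_size]
        # A list input yields one caption per image, in input order
        results.update(zip(chunk, captioner(chunk)))
    return results


# Demo with empty placeholder files and a stub captioner that records batch sizes
calls: List[int] = []


def stub_captioner(batch: List[str]) -> List[str]:
    calls.append(len(batch))
    return ["caption for " + os.path.basename(p) for p in batch]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.jpg", "a.PNG", "notes.txt", "c.jpeg"):
        Path(tmp, name).touch()
    demo = caption_directory(tmp, stub_captioner, batch_size=2)

print({os.path.basename(k): v for k, v in demo.items()})
print(calls)  # [2, 1]
```

To use the real pipeline, pass a Caption instance as captioner, for example caption_directory("images", Caption()).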
