Implementation:Neuml Txtai Image Captioning

Knowledge Sources	Neuml_Txtai
Domains	Machine Learning, Computer Vision, Image Captioning, Transformers
Last Updated	2026-02-10 01:00 GMT

Overview

A concrete tool, provided by txtai, for generating text captions from images with image-to-text models.

Description

Caption extends HFPipeline and wraps the Hugging Face image-to-text pipeline to generate descriptive captions for images. It accepts images as file path strings or PIL Image objects. When file paths are provided, they are automatically opened using PIL. Multiple generated text segments per image are joined into a single caption string. The pipeline requires the PIL (Pillow) library.
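
The call flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not txtai's source: `caption_images`, `fake_open`, and `fake_pipeline` are hypothetical names, the stand-ins replace PIL and the model so the snippet runs on its own, and joining segments with a single space is an assumption.

```python
def caption_images(images, run_pipeline, open_image):
    """Sketch of the described call flow (hypothetical helper, not txtai code)."""
    # Wrap a single input in a list so the remaining logic is uniform
    values = images if isinstance(images, list) else [images]

    # File path strings are opened; anything else is assumed to be an image object
    values = [open_image(v) if isinstance(v, str) else v for v in values]

    captions = []
    for result in run_pipeline(values):
        # Each result is a list of dicts carrying a "generated_text" segment
        text = " ".join(r["generated_text"] for r in result).strip()
        captions.append(text)

    # Mirror the input shape: a single image in, a single caption out
    return captions if isinstance(images, list) else captions[0]


# Stand-ins so the sketch runs without Pillow or a model (assumptions)
def fake_open(path):
    return {"opened": path}


def fake_pipeline(values):
    return [[{"generated_text": "a dog"}, {"generated_text": "on a bench"}] for _ in values]


single = caption_images("photo.jpg", fake_pipeline, fake_open)
batch = caption_images(["a.jpg", "b.jpg"], fake_pipeline, fake_open)
```

The return type follows the input type, which is why callers passing a list should expect a list back even when it holds one element.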

Usage

Use Caption when you need to generate human-readable descriptions of images. This is useful for image indexing, accessibility (alt text generation), content moderation, and multimodal search applications.
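
As one illustration of the alt text use case, a generated caption can be escaped and placed in an `alt` attribute. This is a sketch only: `img_tag` and `fake_captioner` are hypothetical names, and the stand-in callable takes the place of a real `Caption` instance so the snippet runs without a model.

```python
from html import escape


def img_tag(src, captioner):
    """Build an <img> tag whose alt text comes from a captioning callable."""
    alt = captioner(src)
    # Escape both attributes so quotes in a caption cannot break the markup
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


# Stand-in captioner (assumption); in practice this would be a Caption instance
def fake_captioner(path):
    return 'a dog sitting on a "park" bench'


tag = img_tag("photo.jpg", fake_captioner)
```

Escaping matters here because caption text is model output and may contain characters that are significant in HTML.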

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/pipeline/image/caption.py

Signature

class Caption(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, images)

Import

from txtai.pipeline.image.caption import Caption

I/O Contract

Inputs

Name	Type	Required	Description
path	str	No	Model path; accepts Hugging Face model hub id or local path. Uses default image-to-text model if not provided.
quantize	bool	No	If True, quantizes the model to int8 (CPU only). Defaults to False.
gpu	bool or int	No	True/False to enable GPU, or a specific GPU device id. Defaults to True.
model	object	No	Optional existing pipeline model to wrap. Defaults to None.
kwargs	dict	No	Additional keyword arguments passed through to the underlying Hugging Face pipeline.
images	str, PIL.Image, or list	Yes (call)	A single image (file path string or PIL Image object) or a list of images.

Outputs

Name	Type	Description
caption	str or list	A caption string for single image input, or a list of caption strings for list input.

Usage Examples

from PIL import Image
from txtai.pipeline.image.caption import Caption

# Create a caption pipeline
caption = Caption(gpu=True)

# Caption a single image from file path
result = caption("photo.jpg")
# Example output (depends on the model): "a dog sitting on a park bench"

# Caption a PIL Image object
img = Image.open("photo.jpg")
result = caption(img)

# Caption multiple images; the result is a list in the same order as the input
results = caption(["photo1.jpg", "photo2.jpg", "photo3.jpg"])
# Example output (depends on the model): ["a cat on a sofa", "a sunset over mountains", "a city street at night"]
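
Because file path inputs are opened with PIL, a path that does not exist would be expected to fail when the image is opened. A small pre-check can separate usable paths from missing ones before a batch call. This is a sketch: `split_existing` is a hypothetical helper, not part of txtai, and a temporary empty file stands in for a real image.

```python
import os
import tempfile


def split_existing(paths):
    """Partition paths into (existing files, missing paths), preserving order."""
    found = [p for p in paths if os.path.isfile(p)]
    missing = [p for p in paths if not os.path.isfile(p)]
    return found, missing


# Demonstration with a temporary file standing in for an image
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "photo1.jpg")
    open(real, "wb").close()
    found, missing = split_existing([real, os.path.join(tmp, "nope.jpg")])
    found_names = [os.path.basename(p) for p in found]
    missing_names = [os.path.basename(p) for p in missing]

# The found list would then be passed to the pipeline, e.g. caption(found)
```

The check only confirms that a file exists; it does not verify that the file is a readable image.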

Related Pages

Environment:Neuml_Txtai_Python_Core_Dependencies
