Implementation:Neuml Txtai Image Captioning
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Computer Vision, Image Captioning, Transformers |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete txtai pipeline for generating text captions from images using Hugging Face image-to-text models.
Description
Caption extends HFPipeline and wraps the Hugging Face image-to-text pipeline to generate descriptive captions for images. It accepts images as file path strings or PIL Image objects; file paths are opened automatically with PIL. When the model returns multiple generated text segments for an image, they are joined into a single caption string. The pipeline requires the PIL (Pillow) library.
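The input handling described above can be sketched without loading a model. This is a minimal illustration, not txtai's actual source: `caption_images`, `fake_open`, and `fake_pipeline` are hypothetical names, the space separator used for joining is an assumption, and the stub assumes the pipeline returns one list of `{"generated_text": ...}` dictionaries per image.

```python
def caption_images(images, pipeline, open_image):
    """Sketch of the normalize -> open -> join flow described above.

    pipeline and open_image are injected so the example runs without
    a model or Pillow; both are stand-ins (assumptions), not txtai APIs.
    """
    # Wrap a single input in a list so both cases share one code path
    values = images if isinstance(images, list) else [images]

    # Open file path strings; pass already-loaded image objects through
    values = [open_image(v) if isinstance(v, str) else v for v in values]

    captions = []
    for result in pipeline(values):
        # Join multiple generated segments for one image into one caption
        # (joining with a single space is an assumption of this sketch)
        captions.append(" ".join(r["generated_text"] for r in result).strip())

    # Mirror the input shape: single input -> str, list input -> list
    return captions if isinstance(images, list) else captions[0]


def fake_open(path):
    # Stand-in for PIL.Image.open
    return {"opened": path}


def fake_pipeline(images):
    # Stand-in model: two text segments per image
    return [
        [{"generated_text": "a photo"}, {"generated_text": f"of image {i}"}]
        for i, _ in enumerate(images)
    ]


single = caption_images("photo.jpg", fake_pipeline, fake_open)
batch = caption_images(["a.jpg", "b.jpg"], fake_pipeline, fake_open)
```

Keeping the single-versus-list decision at the very start and end of the function means the middle of the flow only ever deals with lists.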
Usage
Use Caption when you need to generate human-readable descriptions of images. This is useful for image indexing, accessibility (alt text generation), content moderation, and multimodal search applications.
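For the alt text use case, a generated caption can be placed directly into an HTML `alt` attribute. The helper below is a hedged sketch: `img_tag` is a hypothetical function, not part of txtai, and a fixed string stands in for the output of a Caption call so the example runs without a model.

```python
from html import escape


def img_tag(src, caption):
    """Build an <img> tag whose alt attribute is the generated caption."""
    # Escape both values so quotes or angle brackets in a caption
    # cannot break out of the attribute
    return f'<img src="{escape(src, quote=True)}" alt="{escape(caption, quote=True)}">'


# In real use the caption would come from a Caption instance;
# a fixed string keeps this example runnable without a model
tag = img_tag("dog.jpg", 'a dog next to a "keep off" sign')
```

Escaping matters here because model output is free text and may contain characters that are significant in HTML.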
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/image/caption.py
Signature
class Caption(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, images)
Import
from txtai.pipeline.image.caption import Caption
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | No | Model path; accepts Hugging Face model hub id or local path. Uses default image-to-text model if not provided. |
| quantize | bool | No | If True, quantizes the model to int8 (CPU only). Defaults to False. |
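| gpu | bool or int | No | True/False to enable GPU, or a specific GPU device id. Defaults to True. |
| model | object | No | Optional existing pipeline model to wrap. Defaults to None. |
| kwargs | dict | No | Additional keyword arguments passed to the underlying pipeline. |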
| images | str, PIL.Image, or list | Yes (call) | A single image (file path string or PIL Image object) or a list of images. |
Outputs
| Name | Type | Description |
|---|---|---|
| caption | str or list | A caption string for single image input, or a list of caption strings for list input. |
Usage Examples
from txtai.pipeline.image.caption import Caption
# Create a caption pipeline
caption = Caption(gpu=True)
# Caption a single image from file path
result = caption("photo.jpg")
# Example output (varies by model and image): "a dog sitting on a park bench"
# Caption a PIL Image object
from PIL import Image
img = Image.open("photo.jpg")
result = caption(img)
# Caption multiple images
results = caption(["photo1.jpg", "photo2.jpg", "photo3.jpg"])
# Example output (varies by model and image): ["a cat on a sofa", "a sunset over mountains", "a city street at night"]
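Since the pipeline accepts a list of file paths, captioning a whole folder comes down to collecting paths and making one call. The following is a minimal sketch under stated assumptions: `find_images`, `caption_directory`, and `IMAGE_SUFFIXES` are hypothetical helpers, not txtai APIs, and the demo uses a stub in place of a Caption instance so it runs without a model.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set; extend as needed


def find_images(directory):
    """Return sorted image file paths (as strings) directly under directory."""
    return sorted(
        str(p) for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_directory(directory, captioner):
    """Map each image path in directory to its caption."""
    paths = find_images(directory)
    # One call with the full list; the captioner returns one caption per path.
    # Skip the call entirely when no images are found.
    return dict(zip(paths, captioner(paths))) if paths else {}


# Demo with a stub captioner standing in for a Caption instance
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    demo = caption_directory(
        tmp, lambda paths: [f"caption for {Path(p).name}" for p in paths]
    )
    names = [Path(p).name for p in demo]
```

In real use the second argument would be a Caption instance, for example `caption_directory("photos", caption)` with the `caption` object created earlier in this section.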