Implementation:Neuml Txtai ImageHash
| Knowledge Sources | |
|---|---|
| Domains | Computer Vision, Image Processing, Hashing, Near-Duplicate Detection |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool, provided by txtai, for generating perceptual image hashes used in near-duplicate image detection.
Description
ImageHash extends the base Pipeline class and generates perceptual hashes for images using the imagehash library. Unlike cryptographic hashes, where any change to the input produces an unrelated digest, perceptual hashes produce similar values for visually similar images, which enables near-duplicate detection. The pipeline supports five hashing algorithms: average (aHash, the default), perceptual (pHash), difference (dHash), wavelet (wHash), and color (colorhash). Output is either hex strings (default) or numpy float32 arrays. This method is not backed by machine learning models and is not intended for finding conceptually similar images.
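The contrast between perceptual and cryptographic hashes can be illustrated with a toy sketch. The `toy_average_hash` function below is a hypothetical, simplified stand-in for the average hash idea (one bit per pixel, set when the pixel is brighter than the mean); it is not the imagehash implementation and it skips the resize and grayscale steps a real pipeline would perform. The 4x4 pixel grid is made-up data.

```python
import hashlib


def toy_average_hash(pixels):
    """Return a hex string with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)
    width = (len(bits) + 3) // 4
    return f"{int(bits, 2):0{width}x}"


# Hypothetical 4x4 grayscale image and a slightly brightened copy
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brightened = [[min(255, value + 5) for value in row] for row in original]

# Perceptual-style hash: the brightness shift does not change any bit
hash_a = toy_average_hash(original)
hash_b = toy_average_hash(brightened)
print(hash_a, hash_b)  # 3333 3333

# Cryptographic hash: the same small change yields an unrelated digest
sha_a = hashlib.sha256(bytes(v for row in original for v in row)).hexdigest()
sha_b = hashlib.sha256(bytes(v for row in brightened for v in row)).hexdigest()
print(sha_a == sha_b)  # False
```

Because every pixel moves by the same amount, each pixel keeps its position relative to the mean, so the toy hash is unchanged while the SHA-256 digest is not.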
Usage
Use ImageHash when you need fast, non-ML-based near-duplicate image detection. Perceptual hashes are useful for deduplication, content-based image retrieval at scale, and detecting modified versions of the same image.
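As a rough sketch of the deduplication use case, hex hashes can be grouped by Hamming distance (the number of differing bits). Everything below is illustrative: `group_near_duplicates`, the threshold of 5 bits, the file names, and the hash values are assumptions made for this example, not part of txtai. In practice the hashes would come from an `ImageHash()` call.

```python
def hamming(a, b):
    """Count differing bits between two equal-length hex hash strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def group_near_duplicates(hashes, threshold=5):
    """Greedily place each image in the first group whose representative
    hash is within `threshold` bits; otherwise start a new group."""
    groups = []  # list of (representative_hash, [names])
    for name, value in hashes.items():
        for representative, members in groups:
            if hamming(representative, value) <= threshold:
                members.append(name)
                break
        else:
            groups.append((value, [name]))
    return [members for _, members in groups]


# Hypothetical 64-bit hex hashes for three files
hashes = {
    "a.jpg": "f8f0e0c0c0e0f0f8",
    "a_resized.jpg": "f8f0e0c0c0e0f0fa",
    "b.jpg": "0f0f1f3f3f1f0f07",
}

groups = group_near_duplicates(hashes)
print(groups)  # [['a.jpg', 'a_resized.jpg'], ['b.jpg']]
```

The greedy pass compares each hash only against group representatives, which keeps the sketch short; a suitable threshold depends on the algorithm and hash size and should be tuned on real data.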
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/image/imagehash.py
Signature
class ImageHash(Pipeline):
    def __init__(self, algorithm="average", size=8, strings=True)
    def __call__(self, images)
    def ihash(self, image)
Import
from txtai.pipeline.image.imagehash import ImageHash
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| algorithm | str | No | Hashing algorithm: "average" (default), "perceptual", "difference", "wavelet", or "color". |
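| size | int | No | Hash size passed to the underlying imagehash function. Defaults to 8; for the average, perceptual, difference, and wavelet algorithms, a size of N yields an N x N bit hash (64 bits at the default). |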
| strings | bool | No | If True (default), outputs hex strings. If False, outputs numpy float32 arrays. |
| images | str, PIL.Image, or list | Yes (call) | A single image (file path string or PIL Image object) or a list of images. |
Outputs
| Name | Type | Description |
|---|---|---|
| hash | str, numpy.ndarray, or list | Hex string hash for single image (when strings=True), numpy float32 array (when strings=False), or a list of hashes for list input. |
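The two output forms carry the same information. The sketch below assumes that the array form is the flattened bit matrix of the hash and that the hex string encodes those bits most significant bit first; this should be verified against the installed imagehash version. `hex_to_bits` and `bits_to_hex` are hypothetical helpers written for this illustration and use plain Python lists rather than numpy arrays.

```python
def hex_to_bits(hex_hash):
    """Expand a hex hash into a list of 0.0/1.0 floats, most significant bit first."""
    width = len(hex_hash) * 4
    bits = bin(int(hex_hash, 16))[2:].zfill(width)
    return [float(bit) for bit in bits]


def bits_to_hex(bits):
    """Pack a list of 0/1 values back into a zero-padded hex string."""
    width = (len(bits) + 3) // 4
    value = int("".join(str(int(bit)) for bit in bits), 2)
    return f"{value:0{width}x}"


# Illustrative 16-character hex hash (64 bits, as for size=8)
bits = hex_to_bits("f8f0e0c0c0e0f0f8")
print(len(bits))   # 64
print(bits[:8])    # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(bits_to_hex(bits))  # f8f0e0c0c0e0f0f8
```

The bit-vector form is convenient when hashes are fed into vector operations, while the hex form is compact for storage and logging.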
Usage Examples
from txtai.pipeline.image.imagehash import ImageHash
# Create an image hash pipeline with default average hash
hasher = ImageHash()
# Hash a single image
result = hasher("photo.jpg")
# Returns a 16-character hex string (64 bits), e.g. "f8f0e0c0c0e0f0f8"
# Hash multiple images
results = hasher(["photo1.jpg", "photo2.jpg", "photo3.jpg"])
# Use the perceptual hash algorithm with a larger hash size (16 x 16 = 256 bits)
hasher = ImageHash(algorithm="perceptual", size=16)
result = hasher("photo.jpg")
# Get numpy array output instead of hex strings
hasher = ImageHash(algorithm="difference", strings=False)
result = hasher("photo.jpg")
# Returns a numpy float32 array holding the hash bits instead of a hex string
# Compare two images for near-duplicate detection
hasher = ImageHash()
hash1 = hasher("original.jpg")
hash2 = hasher("modified.jpg")
# Similar images yield hashes that differ in only a few bits (small Hamming distance)
distance = bin(int(hash1, 16) ^ int(hash2, 16)).count("1")