Implementation:Neuml Txtai ImageHash

From Leeroopedia


Knowledge Sources
Domains: Computer Vision, Image Processing, Hashing, Near-Duplicate Detection
Last Updated: 2026-02-10 01:00 GMT

Overview

ImageHash is a txtai pipeline that generates perceptual image hashes for near-duplicate image detection.

Description

ImageHash extends the base Pipeline class and generates perceptual hashes for images using the imagehash library. Unlike cryptographic hashes, perceptual hashes produce similar values for visually similar images, enabling near-duplicate detection. The pipeline supports five hashing algorithms: average (default), perceptual (pHash), difference (dHash), wavelet (wHash), and color hash. Output can be hex strings (default) or numpy float32 arrays. This method is not backed by machine learning models and is not intended for finding conceptually similar images.
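
To make the idea of "similar images, similar hashes" concrete, the following is a minimal pure-Python sketch of an average hash. It is an illustration only, not the imagehash library's implementation: the `average_hash` function and the tiny 4x4 pixel grids are hypothetical, and a real pipeline would first convert the image to grayscale and shrink it to the target size.

```python
def average_hash(pixels):
    """Return a hex hash for a square grid of grayscale values (0-255).

    Hypothetical helper for illustration: each bit is 1 when the pixel is
    brighter than the mean of the grid, otherwise 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)
    width = (len(bits) + 3) // 4  # hex digits needed to hold all bits
    return format(int(bits, 2), "0{}x".format(width))


# A 4x4 "image": bright left half, dark right half
original = [[200, 200, 10, 10] for _ in range(4)]
# The same picture, slightly brightened everywhere
brightened = [[value + 20 for value in row] for row in original]
# A different picture: dark left half, bright right half
inverted = [[10, 10, 200, 200] for _ in range(4)]

print(average_hash(original))    # cccc
print(average_hash(brightened))  # cccc
print(average_hash(inverted))    # 3333
```

The brightened copy keeps the same hash because every pixel stays on the same side of the mean, while the inverted picture flips every bit. A cryptographic hash would differ completely for all three inputs.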

Usage

Use ImageHash when you need fast, non-ML-based near-duplicate image detection. Perceptual hashes are useful for deduplication, content-based image retrieval at scale, and detecting modified versions of the same image.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/image/imagehash.py

Signature

class ImageHash(Pipeline):
    def __init__(self, algorithm="average", size=8, strings=True)
    def __call__(self, images)
    def ihash(self, image)

Import

from txtai.pipeline.image.imagehash import ImageHash

I/O Contract

Inputs

Name | Type | Required | Description
algorithm | str | No | Hashing algorithm: "average" (default), "perceptual", "difference", "wavelet", or "color".
size | int | No | Hash size parameter. Defaults to 8.
strings | bool | No | If True (default), outputs hex strings. If False, outputs numpy float32 arrays.
images | str, PIL.Image, or list | Yes (call) | A single image (file path string or PIL Image object) or a list of images.

Outputs

Name | Type | Description
hash | str, numpy.ndarray, or list | Hex string hash for single image (when strings=True), numpy float32 array (when strings=False), or a list of hashes for list input.
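
The two output forms carry the same information: a hex string is a compact encoding of the hash bits, and the array form exposes those bits as numbers. The sketch below expands a hex string into a flat list of 0.0/1.0 values in plain Python. The `hex_to_bits` helper is hypothetical, and the bit order (row-major, most significant bit first) is an assumption; when the array form is needed in practice, passing strings=False to the pipeline is the direct route.

```python
def hex_to_bits(hexhash):
    """Expand a hex hash string into a flat list of 0.0/1.0 floats.

    Hypothetical helper. Assumes each hex digit encodes four hash bits,
    most significant bit first.
    """
    nbits = len(hexhash) * 4
    binary = format(int(hexhash, 16), "0{}b".format(nbits))
    return [float(bit) for bit in binary]


bits = hex_to_bits("f8f0e0c0c0e0f0f8")
print(len(bits))  # 64
print(bits[:8])   # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

A 16-digit hex string corresponds to 64 bits, which matches an 8x8 hash grid.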

Usage Examples

from txtai.pipeline.image.imagehash import ImageHash

# Create an image hash pipeline with default average hash
hasher = ImageHash()

# Hash a single image
result = hasher("photo.jpg")
# Returns a hex string, for example "f8f0e0c0c0e0f0f8"

# Hash multiple images
results = hasher(["photo1.jpg", "photo2.jpg", "photo3.jpg"])

# Use perceptual hash algorithm
hasher = ImageHash(algorithm="perceptual", size=16)
result = hasher("photo.jpg")

# Get numpy array output instead of hex strings
hasher = ImageHash(algorithm="difference", strings=False)
result = hasher("photo.jpg")
# Returns: numpy float32 array

# Compare two images for near-duplicate detection
hasher = ImageHash()
hash1 = hasher("original.jpg")
hash2 = hasher("modified.jpg")
# Visually similar images produce hashes that differ in only a few bits
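
A common way to turn two hex hashes into a near-duplicate decision is to count the bits in which they differ (the Hamming distance) and compare that count with a threshold. The following is a minimal sketch using only the standard library. The `hamming_distance` and `is_near_duplicate` helpers and the threshold of 5 are assumptions for illustration and are not part of txtai; a suitable threshold depends on the algorithm, hash size, and data.

```python
def hamming_distance(hash1, hash2):
    """Count differing bits between two equal-length hex hash strings."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def is_near_duplicate(hash1, hash2, threshold=5):
    """Treat hashes within `threshold` bits as near-duplicates (assumed cutoff)."""
    return hamming_distance(hash1, hash2) <= threshold


# Illustrative hash values, not produced from real images
original = "f8f0e0c0c0e0f0f8"
modified = "f8f0e0c0c0e0f0fa"   # differs from original in one bit
unrelated = "070f1f3f3f1f0f07"  # every bit flipped

print(hamming_distance(original, modified))    # 1
print(hamming_distance(original, unrelated))   # 64
print(is_near_duplicate(original, modified))   # True
print(is_near_duplicate(original, unrelated))  # False
```

In practice the two inputs would be the hash1 and hash2 strings returned by the pipeline with strings=True.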
