Implementation:Neuml Txtai Similarity Pipeline

From Leeroopedia
Revision as of 16:05, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Neuml_Txtai_Similarity_Pipeline.md)


Knowledge Sources
Domains: Machine Learning, NLP, Semantic Similarity, Transformers
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool for computing semantic similarity between a query and candidate texts using multiple backend strategies provided by txtai.

Description

Similarity extends Labels and provides a single similarity interface over three backend strategies: zero-shot classification (the default, inherited from Labels), cross-encoder scoring (via CrossEncoder), and late interaction scoring (via LateEncoder). The backend is chosen once, at initialization, from the crossencode and lateencode constructor flags. In zero-shot mode, the query is used as the candidate label and each text is classified against it; the resulting scores are then transposed so that each query gets its own ranking over the texts. All modes return results as (id, score) tuples sorted by descending score, where id is the position of the text in the input list.
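
The transpose-and-rank step described for zero-shot mode can be sketched in plain Python. This is a minimal illustration and not txtai's own code: the function name rank_by_query and the label_scores layout (one row per text, each holding (query index, score) pairs) are assumptions made for the example, and the numbers are made up.

```python
def rank_by_query(label_scores):
    """Turn per-text label scores into per-query rankings.

    label_scores: one row per text; each row is a list of
    (query_index, score) pairs in arbitrary order (hypothetical layout).
    """
    # Order each row by query index so columns line up across texts
    rows = [[score for _, score in sorted(row)] for row in label_scores]

    # Transpose: rows become queries, columns become texts
    per_query = [list(column) for column in zip(*rows)]

    # Pair each score with its text id and sort by descending score
    return [
        sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
        for scores in per_query
    ]


# Two texts scored against two queries (made-up numbers)
label_scores = [
    [(0, 0.9), (1, 0.2)],  # text 0
    [(1, 0.7), (0, 0.1)],  # text 1
]
rankings = rank_by_query(label_scores)
```

Each entry of rankings corresponds to one query and lists (text id, score) pairs from best to worst match, which mirrors the output shape documented in the I/O contract.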

Usage

Use Similarity when you need a flexible similarity pipeline that can switch between zero-shot, cross-encoder, and late interaction backends. It is the primary similarity interface used by other txtai components such as the Reranker pipeline.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/text/similarity.py

Signature

class Similarity(Labels):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, dynamic=True, crossencode=False, lateencode=False, **kwargs)
    def __call__(self, query, texts, multilabel=True, **kwargs)
    def encode(self, data, category)

Import

from txtai.pipeline.text.similarity import Similarity

I/O Contract

Inputs

Name | Type | Required | Description
path | str | No | Model path; accepts Hugging Face model hub id or local path.
quantize | bool | No | If True, quantizes the model to int8 (CPU only). Defaults to False.
gpu | bool or int | No | True/False to enable GPU, or a specific GPU device id. Defaults to True.
model | Pipeline | No | Optional existing pipeline model to wrap.
dynamic | bool | No | If True (default), uses zero-shot classification. If False, uses standard text classification.
crossencode | bool | No | If True, uses a cross-encoder backend. Defaults to False.
lateencode | bool | No | If True, uses a late interaction encoder backend (e.g. ColBERT). Defaults to False.
query | str or list | Yes (call) | Query text or list of query texts.
texts | list | Yes (call) | List of candidate text strings to compare against the query.
multilabel | bool or None | No (call) | Score normalization mode. Defaults to True (sigmoid).
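
To make the multilabel normalization modes concrete, here is a small standalone sketch. The table above only states that True maps to sigmoid; treating False as softmax and None as "leave raw scores unchanged" is an assumption made for this example, and normalize is a hypothetical helper, not part of txtai.

```python
import math


def normalize(scores, multilabel=True):
    """Normalize raw scores following the multilabel convention (sketch)."""
    if multilabel is None:
        # Assumed: None leaves raw scores untouched
        return list(scores)
    if multilabel:
        # Sigmoid: each score is squashed independently into (0, 1)
        return [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Assumed: False applies softmax, so scores compete and sum to 1
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 0.0, -2.0]
independent = normalize(raw, multilabel=True)
competing = normalize(raw, multilabel=False)
```

With sigmoid, every text is scored on its own, so several texts can all score high for the same query. Under the assumed softmax branch, scores are relative to one another, which matters when a downstream threshold is applied.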

Outputs

Name | Type | Description
result | list of (int, float) | List of (id, score) tuples sorted by descending score. If query is a string, returns a 1D list. If query is a list, returns a 2D list with one row per query.
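
Because the returned ids are positions in the texts list, callers usually need to map them back to the original strings. The helper below is a hypothetical sketch (attach_texts is not a txtai function) and assumes the results are tuples shaped exactly as described in the Outputs table; the sample scores are made up.

```python
def attach_texts(results, texts):
    """Replace text ids with the texts they index (hypothetical helper)."""
    if results and isinstance(results[0], tuple):
        # 1D result: the query was a single string
        return [(texts[uid], score) for uid, score in results]
    # 2D result: one ranked row per query
    return [[(texts[uid], score) for uid, score in row] for row in results]


texts = ["AI is intelligence", "Python is a language"]

# Shape returned for a single string query
single = [(0, 0.9), (1, 0.1)]
ranked_single = attach_texts(single, texts)

# Shape returned for a list of two queries
multi = [[(0, 0.9), (1, 0.1)], [(1, 0.8), (0, 0.2)]]
ranked_multi = attach_texts(multi, texts)
```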

Usage Examples

from txtai.pipeline.text.similarity import Similarity

# Zero-shot similarity (default)
similarity = Similarity()
results = similarity("What is machine learning?", [
    "Machine learning is a type of AI",
    "Python is a programming language",
    "Neural networks process data"
])
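# Returns (id, score) tuples sorted by score, e.g. [(0, 0.92), (2, 0.45), (1, 0.08)]
# (scores shown are illustrative; actual values depend on the model)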

# Cross-encoder similarity
similarity = Similarity("cross-encoder/ms-marco-MiniLM-L-6-v2", crossencode=True)
results = similarity("What is AI?", ["AI is intelligence", "Python is a language"])

# Late interaction similarity (ColBERT)
similarity = Similarity("colbert-ir/colbertv2.0", lateencode=True)
results = similarity("What is AI?", ["AI is intelligence", "Python is a language"])

# Multiple queries (reuses the instance created above; returns one ranked list per query)
results = similarity(["What is AI?", "What is Python?"], ["AI is intelligence", "Python is a language"])
