Implementation:Neuml Txtai CrossEncoder

Knowledge Sources	Neuml_Txtai
Domains	Machine Learning, NLP, Semantic Similarity, Transformers
Last Updated	2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for computing pairwise similarity between a query and a list of texts using a cross-encoder model.

Description

CrossEncoder extends HFPipeline and wraps a Hugging Face text-classification model to compute similarity scores between a query and a list of candidate texts. Each (query, text) pair is passed through the model as a text-classification input with text_pair. The raw logit scores can be transformed using sigmoid (independent labels), softmax (normalized scores), or returned raw, controlled by the multilabel parameter. Results are returned as a list of (id, score) tuples sorted by descending score.
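The score transformation and sorting described above can be sketched with NumPy alone. This is a minimal illustration, not the library's code: the helper names transform and rank and the logit values are hypothetical, and the sketch assumes softmax normalizes across the candidate texts of a single query.

```python
import numpy as np

def transform(scores, multilabel=True):
    """Apply the output function selected by multilabel to raw logits."""
    scores = np.asarray(scores, dtype=float)
    if multilabel is None:
        # Raw scores, no transformation
        return scores
    if multilabel:
        # Sigmoid: each score is mapped to (0, 1) independently
        return 1.0 / (1.0 + np.exp(-scores))
    # Softmax: scores are normalized to sum to 1 (max subtracted for stability)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def rank(scores):
    """Return (id, score) tuples sorted by descending score."""
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)

# Made-up raw scores for three candidate texts
logits = [2.0, -1.0, 0.0]
ranked = rank(transform(logits, multilabel=True))
```

Sigmoid and softmax are both monotonic, so the ordering of candidates is the same in every mode; only the scale of the scores changes.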

Usage

Use CrossEncoder when you need accurate pairwise similarity scores between a query and candidate texts. Cross-encoders process the query and text jointly, which typically produces more accurate scores than bi-encoder approaches at the cost of more computation, since every (query, text) pair requires its own forward pass. For this reason, CrossEncoder is typically used to re-rank a small set of search results rather than to score an entire corpus.
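A re-ranking step can be sketched as follows. The rerank helper and the overlap_scorer stand-in are hypothetical and exist only so the example runs without downloading a model; the sketch assumes the scorer takes (query, texts) and returns (id, score) tuples sorted by descending score.

```python
def rerank(query, candidates, scorer, limit=3):
    """Re-rank first-stage candidates with a pairwise scorer.

    scorer(query, texts) is assumed to return (id, score) tuples sorted by
    descending score, where id is an index into texts.
    """
    ranked = scorer(query, candidates)
    return [(candidates[i], score) for i, score in ranked[:limit]]

def overlap_scorer(query, texts):
    """Hypothetical stand-in scorer: fraction of query tokens found in each text."""
    tokens = set(query.lower().split())
    scores = [len(tokens & set(t.lower().split())) / len(tokens) for t in texts]
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)

# Candidates as they might come back from a cheaper first-stage search
candidates = [
    "python is a programming language",
    "machine learning is a branch of ai",
    "deep learning uses neural networks",
]
top = rerank("what is machine learning", candidates, overlap_scorer, limit=2)
```

With txtai installed, a CrossEncoder instance could presumably be passed as scorer in place of overlap_scorer, since its call signature also takes a query and a list of texts.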

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/pipeline/text/crossencoder.py

Signature

class CrossEncoder(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, query, texts, multilabel=True, workers=0)
    def function(self, scores, multilabel)

Import

from txtai.pipeline.text.crossencoder import CrossEncoder

I/O Contract

Inputs

Name	Type	Required	Description
query	str or list	Yes	Query text or list of query texts.
texts	list	Yes	List of candidate text strings to compare against the query.
multilabel	bool or None	No	If True, applies sigmoid (independent scores). If False, applies softmax (normalized to sum to 1). If None, returns raw scores. Defaults to True.
workers	int	No	Number of concurrent workers for data processing. Defaults to 0.

Outputs

Name	Type	Description
result	list of (int, float)	List of (id, score) tuples sorted by descending score. If query is a string, returns a 1D list. If query is a list, returns a 2D list with one row per query.
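Because the ids in the output are positions in the texts list, callers usually map them back to the original strings. The following is a minimal sketch of handling both output shapes; attach_texts is a hypothetical helper and the scores are made-up values, not model output.

```python
def attach_texts(results, texts):
    """Replace ids with texts for a 1D (single query) or 2D (multiple queries) result."""
    if results and isinstance(results[0], tuple):
        # 1D: a single list of (id, score) tuples
        return [(texts[i], score) for i, score in results]
    # 2D: one list of (id, score) tuples per query
    return [[(texts[i], score) for i, score in row] for row in results]

texts = ["AI is artificial intelligence", "Python is a language"]

# Made-up results in the two documented shapes
single = [(0, 0.9), (1, 0.1)]
multi = [[(0, 0.9), (1, 0.1)], [(1, 0.8), (0, 0.2)]]

single_with_text = attach_texts(single, texts)
multi_with_text = attach_texts(multi, texts)
```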

Usage Examples

from txtai.pipeline.text.crossencoder import CrossEncoder

# Create a cross-encoder pipeline
encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", gpu=True)

# Compute similarity for a single query
results = encoder("What is machine learning?", [
    "Machine learning is a branch of AI",
    "Python is a programming language",
    "Deep learning uses neural networks"
])
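# Returns (id, score) tuples sorted by descending score, where id is the
# index into the texts list. Illustrative values only; actual scores depend
# on the model: [(0, 0.98), (2, 0.85), (1, 0.12)]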

# Compute similarity for multiple queries
results = encoder(
    ["What is AI?", "What is Python?"],
    ["AI is artificial intelligence", "Python is a language", "ML is a subset of AI"]
)
# Returns a 2D list with one row of (id, score) tuples per query

Related Pages

Environment:Neuml_Txtai_Python_Core_Dependencies
