
Implementation:Neuml Txtai CrossEncoder

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, NLP, Semantic Similarity, Transformers
Last Updated: 2026-02-10 01:00 GMT

Overview

A concrete txtai pipeline for computing pairwise similarity between a query and a list of texts using a cross-encoder model.

Description

CrossEncoder extends HFPipeline and wraps a Hugging Face text-classification model to score a query against a list of candidate texts. Each (query, text) pair is passed to the model as a text-classification input, with the candidate supplied as text_pair. The multilabel parameter controls how the raw logits are transformed: sigmoid (each score computed independently, in the range 0 to 1), softmax (scores normalized across the candidates so that they sum to 1), or no transform (raw logits returned as-is). Results are returned as a list of (id, score) tuples sorted by descending score, where id is the index of the candidate in texts.
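The three score transforms described above can be illustrated with a minimal sketch. This is not the txtai source; the function name transform and the logit values are made up for illustration, and only the Python standard library is used.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single value."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def transform(scores, multilabel=True):
    """Apply the transform selected by multilabel to a list of raw logits."""
    if multilabel is None:
        # No transform: raw logits are returned unchanged
        return list(scores)
    if multilabel:
        # Sigmoid: each score is mapped into (0, 1) independently of the others
        return [sigmoid(x) for x in scores]
    # Softmax: scores are normalized across all candidates and sum to 1
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, -2.0, 1.5]  # made-up logits for three candidate texts
raw = transform(logits, multilabel=None)
independent = transform(logits, multilabel=True)
normalized = transform(logits, multilabel=False)
```

With sigmoid, adding or removing a candidate does not change the other scores; with softmax, every score depends on the full candidate list, so scores are only comparable within a single call.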

Usage

Use CrossEncoder when you need accurate pairwise similarity scores between a query and candidate texts. Cross-encoders process the query and text jointly, which generally yields higher-quality scores than bi-encoder approaches at the cost of more computation, since every (query, text) pair requires its own model pass. For that reason it is typically used to re-rank a small set of search results rather than to score an entire corpus.
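A re-ranking step of this kind might look like the sketch below. The rerank helper and the overlap_scorer stand-in are hypothetical names introduced here so the example runs without downloading a model; the only assumption about the scorer is that it is called as scorer(query, texts) and returns (id, score) tuples sorted by descending score.

```python
def rerank(query, candidates, scorer, top_k=3):
    """
    Re-rank first-stage candidates with a pairwise scorer.

    scorer(query, texts) is assumed to return (id, score) tuples sorted by
    descending score, where id indexes into texts.
    """
    if not candidates:
        return []
    ranked = scorer(query, candidates)
    # Map ids back to the candidate texts and keep the best top_k
    return [(candidates[i], score) for i, score in ranked[:top_k]]


def overlap_scorer(query, texts):
    """Hypothetical stand-in scorer: fraction of query words found in each text."""
    words = set(query.lower().split())
    scores = [len(words & set(t.lower().split())) / max(len(words), 1) for t in texts]
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Candidates as they might come back from a cheaper first-stage search
candidates = [
    "Python is a programming language",
    "Machine learning is a branch of AI",
    "Deep learning uses neural networks",
]
top = rerank("what is machine learning", candidates, overlap_scorer, top_k=2)
```

Because a CrossEncoder instance is called with the same (query, texts) arguments and returns the same (id, score) shape, it should be possible to pass one in place of overlap_scorer.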

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/text/crossencoder.py

Signature

class CrossEncoder(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, query, texts, multilabel=True, workers=0)
    def function(self, scores, multilabel)

Import

from txtai.pipeline.text.crossencoder import CrossEncoder

I/O Contract

Inputs

Name | Type | Required | Description
query | str or list | Yes | Query text or list of query texts.
texts | list | Yes | List of candidate text strings to compare against the query.
multilabel | bool or None | No | If True, applies sigmoid (independent scores). If False, applies softmax (normalized to sum to 1). If None, returns raw scores. Defaults to True.
workers | int | No | Number of concurrent workers for data processing. Defaults to 0.

Outputs

Name | Type | Description
result | list of (int, float) | List of (id, score) tuples sorted by descending score, where id is the index of the text in texts. If query is a string, returns a 1D list. If query is a list, returns a 2D list with one row per query.
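The output shape can be illustrated with a small sketch that turns rows of already-computed scores into the documented result format. The rank helper and the score values are illustrative assumptions, not part of txtai.

```python
def rank(score_rows, single_query):
    """Sort each row of scores into (id, score) tuples, highest score first."""
    rows = [sorted(enumerate(row), key=lambda x: x[1], reverse=True) for row in score_rows]
    # A string query yields one flat list; a list of queries yields one row per query
    return rows[0] if single_query else rows


# One query scored against three texts (made-up scores)
one = rank([[0.98, 0.12, 0.85]], single_query=True)

# Two queries scored against the same three texts (made-up scores)
many = rank([[0.9, 0.1, 0.7], [0.2, 0.8, 0.1]], single_query=False)
```

Each id refers to a position in the original texts list, so texts[id] recovers the matching candidate regardless of the sorted order.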

Usage Examples

from txtai.pipeline.text.crossencoder import CrossEncoder

# Create a cross-encoder pipeline
encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", gpu=True)

# Compute similarity for a single query
results = encoder("What is machine learning?", [
    "Machine learning is a branch of AI",
    "Python is a programming language",
    "Deep learning uses neural networks"
])
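# Returns (id, score) tuples sorted by descending score, e.g. [(0, 0.98), (2, 0.85), (1, 0.12)]
# (illustrative values; actual scores depend on the model)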

# Compute similarity for multiple queries (returns one sorted list of (id, score) tuples per query)
results = encoder(
    ["What is AI?", "What is Python?"],
    ["AI is artificial intelligence", "Python is a language", "ML is a subset of AI"]
)
