Implementation:Neuml Txtai CrossEncoder
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, NLP, Semantic Similarity, Transformers |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool, provided by txtai, for computing pairwise similarity between a query and a list of texts using a cross-encoder model.
Description
CrossEncoder extends HFPipeline and wraps a Hugging Face text-classification model to compute similarity scores between a query and a list of candidate texts. Each (query, text) pair is passed through the model as a text-classification input, with the candidate supplied as text_pair. The multilabel parameter controls how the raw logits are transformed: True applies sigmoid (independent scores), False applies softmax (scores normalized to sum to 1), and None returns the raw scores. Results are returned as a list of (id, score) tuples sorted by descending score, where id is the index of the text in the input list.
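The three score modes and the final ranking can be sketched in plain Python. This is a minimal illustration under stated assumptions, not txtai's code: the helper names `transform` and `rank` are hypothetical, and the softmax is assumed to be taken across the candidate texts of a single query.

```python
import math

def transform(scores, multilabel=True):
    """Hypothetical helper: map raw logits to scores using the three modes."""
    if multilabel is None:
        return list(scores)  # raw logits, unchanged
    if multilabel:
        # Sigmoid: each score is squashed into (0, 1) independently
        return [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Softmax: scores sum to 1; subtracting the max keeps exp() stable
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank(scores):
    """Hypothetical helper: build (id, score) tuples sorted by descending score."""
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

logits = [2.0, -1.0, 0.5]  # made-up logits for three candidate texts
ranked = rank(transform(logits, multilabel=False))
```

Both sigmoid and softmax are monotonic, so the ranking order is the same in all three modes; only the scale of the scores changes.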
Usage
Use CrossEncoder when you need accurate pairwise similarity scores between a query and candidate texts. Cross-encoders process the query and text jointly, which typically produces more accurate scores than bi-encoder approaches that embed the query and text separately, at the cost of one model forward pass per (query, text) pair. A common use is re-ranking a short list of search results.
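The re-ranking pattern can be sketched as follows. To keep the example runnable offline, it uses a hypothetical stand-in scorer (`overlap_scorer`, a simple word-overlap score) in place of a real model; the `rerank` helper is also an assumption for illustration. Any callable that takes (query, texts) and returns (id, score) tuples sorted by descending score would fit.

```python
def rerank(scorer, query, candidates, top_k=3):
    """Hypothetical helper: keep the top_k candidates as ranked by scorer."""
    ranked = scorer(query, candidates)  # [(id, score), ...], best first
    return [(candidates[i], score) for i, score in ranked[:top_k]]

def overlap_scorer(query, texts):
    """Stand-in scorer (NOT a cross-encoder): fraction of query words found in each text."""
    words = set(query.lower().split())
    scores = [len(words & set(t.lower().split())) / len(words) for t in texts]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Candidates as they might arrive from a cheaper first-stage search
candidates = [
    "python is a programming language",
    "machine learning is a branch of ai",
    "deep learning uses neural networks",
]
top = rerank(overlap_scorer, "what is machine learning", candidates, top_k=2)
```

In practice, a CrossEncoder instance would presumably be passed as `scorer`, since its call takes the same (query, texts) arguments and returns the same (id, score) shape for a single string query.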
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/text/crossencoder.py
Signature
class CrossEncoder(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, query, texts, multilabel=True, workers=0)
    def function(self, scores, multilabel)
Import
from txtai.pipeline.text.crossencoder import CrossEncoder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | str or list | Yes | Query text or list of query texts. |
| texts | list | Yes | List of candidate text strings to compare against the query. |
| multilabel | bool or None | No | If True, applies sigmoid (independent scores). If False, applies softmax (normalized to sum to 1). If None, returns raw scores. Defaults to True. |
| workers | int | No | Number of concurrent workers for data processing. Defaults to 0. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | list of (int, float) | List of (id, score) tuples sorted by descending score. If query is a string, returns a 1D list. If query is a list, returns a 2D list with one row per query. |
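Because the ids in the output are positions in the input list, a small helper can map them back to the candidate texts for both output shapes. This is a sketch only: `attach_texts` is a hypothetical name, the sample results are made-up values rather than model output, and it assumes each (id, score) entry is a tuple.

```python
def attach_texts(result, texts):
    """Hypothetical helper: replace ids with texts for a 1D or 2D result."""
    if result and isinstance(result[0], tuple):
        # 1D result: a single string query
        return [(texts[i], score) for i, score in result]
    # 2D result: one ranked row per query
    return [[(texts[i], score) for i, score in row] for row in result]

texts = ["AI is artificial intelligence", "Python is a language"]

# Made-up scores shaped like the documented outputs
single = [(1, 0.91), (0, 0.07)]
multi = [[(0, 0.95), (1, 0.03)], [(1, 0.91), (0, 0.07)]]

single_texts = attach_texts(single, texts)
multi_texts = attach_texts(multi, texts)
```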
Usage Examples
from txtai.pipeline.text.crossencoder import CrossEncoder

# Create a cross-encoder pipeline
encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", gpu=True)

# Compute similarity for a single query
results = encoder("What is machine learning?", [
    "Machine learning is a branch of AI",
    "Python is a programming language",
    "Deep learning uses neural networks"
])
# Returns (id, score) tuples sorted by descending score,
# e.g. [(0, 0.98), (2, 0.85), (1, 0.12)] (illustrative values)

# Compute similarity for multiple queries
# Returns a 2D list with one ranked row of (id, score) tuples per query
results = encoder(
    ["What is AI?", "What is Python?"],
    ["AI is artificial intelligence", "Python is a language", "ML is a subset of AI"]
)