Principle:Deepset ai Haystack Query Text Embedding

From Leeroopedia

Metadata

  • Principle Name: Query Text Embedding
  • Domains: NLP, Embeddings
  • Related Implementation: Deepset_ai_Haystack_SentenceTransformersTextEmbedder
  • Source Reference: haystack/components/embedders/sentence_transformers_text_embedder.py:L17-243
  • Repository: Deepset_ai_Haystack

Overview

Query text embedding converts a single text query into a dense vector representation for semantic retrieval against pre-embedded documents. It is the query-side counterpart to document embedding and must use the same model and parameters to ensure that query and document vectors exist in the same semantic space.

Description

In a dense retrieval system, the query must be transformed into a vector that is directly comparable to the document vectors stored in the document store. Query text embedding performs this transformation at query time: a user's natural language question or search string is passed through the same sentence transformer model that was used to embed the documents, producing a single embedding vector.

The key constraint of query text embedding is model consistency. The query embedder and the document embedder must use:

  • The same model (same architecture and weights).
  • The same normalization settings (if documents were L2-normalized, queries must be as well).
  • The same precision settings.
  • Compatible prefix and suffix settings (some models use different prefixes for queries vs. documents, such as "query: " for queries and "passage: " for documents).
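The consistency constraint can be illustrated with a toy stand-in for the model (a deterministic hash-based "embedding", not a real sentence transformer): the same function and the same settings must be applied on both sides, except where the model itself expects asymmetric prefixes.

```python
import hashlib
import math

def toy_embed(text: str, *, prefix: str = "", normalize: bool = False) -> list[float]:
    """Deterministic stand-in for a sentence-transformer model:
    hashes (prefix + text) into a small fixed-size vector."""
    digest = hashlib.sha256((prefix + text).encode()).digest()
    vec = [b / 255.0 for b in digest[:8]]
    if normalize:
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

# Documents were indexed with L2 normalization and a "passage: " prefix ...
doc_vec = toy_embed("Haystack is an LLM framework.", prefix="passage: ", normalize=True)

# ... so queries must mirror the document-side settings (here with the
# model-expected asymmetric prefix "query: " and the same normalization).
query_vec = toy_embed("What is Haystack?", prefix="query: ", normalize=True)

# A query embedded with mismatched settings lands at a different point
# in the vector space, silently degrading retrieval quality.
mismatched = toy_embed("What is Haystack?", prefix="", normalize=False)
assert query_vec != mismatched
```

The assertion at the end makes the failure mode concrete: nothing crashes when settings drift apart; the query simply stops being comparable to the documents.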

Unlike document embedding, which processes batches of documents during an offline indexing phase, query embedding operates on a single text string at query time and must be fast enough for interactive use.

The resulting query embedding is then passed to a retriever component (such as InMemoryEmbeddingRetriever) which computes similarity scores between the query vector and all stored document vectors to identify the most relevant documents.

Theoretical Basis

Bi-Encoder Query Encoding

Query text embedding uses the query tower of the bi-encoder architecture. In a bi-encoder system:

  • The document encoder maps documents to vectors during indexing (offline).
  • The query encoder maps the query to a vector at search time (online).
  • Both encoders produce vectors in the same shared space, enabling direct comparison.
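The shared-space comparison boils down to a similarity computation between the single online query vector and the offline document vectors. A minimal sketch with cosine similarity and illustrative hand-picked vectors (not real model output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Offline: document vectors produced by the document encoder (toy values).
doc_vectors = {
    "doc_semantic_search": [0.9, 0.1, 0.2],
    "doc_cooking":         [0.1, 0.8, 0.3],
}

# Online: the query encoder maps the query into the same space.
query_vector = [0.85, 0.15, 0.25]

# Direct comparison is valid only because both sides share the space.
ranked = sorted(doc_vectors, key=lambda d: cosine(query_vector, doc_vectors[d]), reverse=True)
print(ranked[0])  # doc_semantic_search
```

This is essentially what an embedding retriever does internally, plus truncating the ranking to the top-k documents.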

Because the two encoders are typically the same model (a symmetric bi-encoder), the critical requirement is that all preprocessing steps (prefix, suffix, normalization, precision) are either identical on both sides, or asymmetric in exactly the way the model was trained to expect.

Prefix and Suffix Instructions

Some embedding models are trained with task-specific prefixes. For example:

  • E5 models: Expect "query: " prepended to queries and "passage: " prepended to documents.
  • BGE models: Use a query-side instruction prefix such as "Represent this sentence for searching relevant passages: ".

The text embedder provides prefix and suffix parameters to support these patterns natively.
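Mechanically, the affixes are concatenated onto the raw text before tokenization. A small sketch of that behavior (the helper name is illustrative, not the library's API):

```python
def apply_affixes(text: str, prefix: str = "", suffix: str = "") -> str:
    """Mirror of what a text embedder does before encoding:
    the prefix and suffix are attached verbatim to the raw text."""
    return f"{prefix}{text}{suffix}"

# E5-style asymmetric prefixes: queries and passages get different markers,
# which the model was trained to expect.
query_input = apply_affixes("how do bi-encoders work", prefix="query: ")
passage_input = apply_affixes("Bi-encoders embed text into vectors.", prefix="passage: ")

print(query_input)  # query: how do bi-encoders work
```

Because the prefix becomes part of the encoded string, forgetting it on one side is equivalent to querying with a different model input distribution than the one used at indexing time.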

Single-Vector Output

Unlike document embedding, which produces a list of enriched Document objects, query embedding produces a single flat vector (list[float]). This vector is the only output and is designed to be passed directly to an embedding retriever's query_embedding input.
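The two output contracts can be contrasted with toy stand-ins (illustrative functions and a fixed fake vector, not the real embedders):

```python
def document_embedder_run(documents: list[dict]) -> dict:
    # Document side: returns the same documents, each enriched
    # with an "embedding" field, under the "documents" key.
    return {"documents": [{**doc, "embedding": [0.1, 0.2, 0.3]} for doc in documents]}

def text_embedder_run(text: str) -> dict:
    # Query side: returns one flat vector under the "embedding" key,
    # ready to wire into a retriever's query_embedding input.
    return {"embedding": [0.1, 0.2, 0.3]}

out = text_embedder_run("What is semantic search?")
assert isinstance(out["embedding"], list)
assert all(isinstance(x, float) for x in out["embedding"])
```

The flat list[float] shape is what makes the pipeline connection "text_embedder.embedding" → "retriever.query_embedding" in the Usage section below type-compatible.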

Usage

Query text embedding is used in query pipelines (also called retrieval pipelines). A typical semantic search pipeline consists of:

  1. A SentenceTransformersTextEmbedder that converts the user query into a vector.
  2. An InMemoryEmbeddingRetriever (or another embedding retriever) that finds the most similar documents.
  3. Optionally, a ranker that reranks the retrieved documents for higher precision.

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
# (assume documents have been pre-embedded and written to document_store)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "What is semantic search?"}})
print(result["retriever"]["documents"])

Related Pages

Implemented By

Uses Heuristic
