Principle:Deepset ai Haystack Query Text Embedding

From Leeroopedia

Metadata

  • Principle Name: Query Text Embedding
  • Domains: NLP, Embeddings
  • Related Implementation: Deepset_ai_Haystack_SentenceTransformersTextEmbedder
  • Source Reference: haystack/components/embedders/sentence_transformers_text_embedder.py:L17-243
  • Repository: Deepset_ai_Haystack

Overview

Query text embedding converts a single text query into a dense vector representation for semantic retrieval against pre-embedded documents. It is the query-side counterpart to document embedding and must use the same model and parameters to ensure that query and document vectors exist in the same semantic space.

Description

In a dense retrieval system, the query must be transformed into a vector that is directly comparable to the document vectors stored in the document store. Query text embedding performs this transformation at query time: a user's natural language question or search string is passed through the same sentence transformer model that was used to embed the documents, producing a single embedding vector.

The key constraint of query text embedding is model consistency. The query embedder and the document embedder must use:

  • The same model (same architecture and weights).
  • The same normalization settings (if documents were L2-normalized, queries must be as well).
  • The same precision settings.
  • Compatible prefix and suffix settings (some models use different prefixes for queries vs. documents, such as "query: " for queries and "passage: " for documents).
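The consistency constraint can be illustrated with a toy stand-in for the model (a deterministic hash-based "embedding", not a real sentence transformer): the same function and the same settings must be applied on both sides, except where the model itself expects asymmetric prefixes.

```python
import hashlib
import math

def toy_embed(text: str, *, prefix: str = "", normalize: bool = False) -> list[float]:
    """Deterministic stand-in for a sentence-transformer model:
    hashes (prefix + text) into a small fixed-size vector."""
    digest = hashlib.sha256((prefix + text).encode()).digest()
    vec = [b / 255.0 for b in digest[:8]]
    if normalize:
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

# Documents were indexed with L2 normalization and a "passage: " prefix ...
doc_vec = toy_embed("Haystack is an LLM framework.", prefix="passage: ", normalize=True)

# ... so queries must mirror the document-side settings (here with the
# model-expected asymmetric prefix "query: " and the same normalization).
query_vec = toy_embed("What is Haystack?", prefix="query: ", normalize=True)

# A query embedded with mismatched settings lands at a different point
# in the vector space, silently degrading retrieval quality.
mismatched = toy_embed("What is Haystack?", prefix="", normalize=False)
assert query_vec != mismatched
```

The assertion at the end makes the failure mode concrete: nothing crashes when settings drift apart; the query simply stops being comparable to the documents.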

Unlike document embedding, which processes batches of documents during an offline indexing phase, query embedding operates on a single text string at query time and must be fast enough for interactive use.

The resulting query embedding is then passed to a retriever component (such as InMemoryEmbeddingRetriever) which computes similarity scores between the query vector and all stored document vectors to identify the most relevant documents.

Theoretical Basis

Bi-Encoder Query Encoding

Query text embedding uses the query tower of the bi-encoder architecture. In a bi-encoder system:

  • The document encoder maps documents to vectors during indexing (offline).
  • The query encoder maps the query to a vector at search time (online).
  • Both encoders produce vectors in the same shared space, enabling direct comparison.
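The shared-space comparison boils down to a similarity computation between the single online query vector and the offline document vectors. A minimal sketch with cosine similarity and illustrative hand-picked vectors (not real model output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Offline: document vectors produced by the document encoder (toy values).
doc_vectors = {
    "doc_semantic_search": [0.9, 0.1, 0.2],
    "doc_cooking":         [0.1, 0.8, 0.3],
}

# Online: the query encoder maps the query into the same space.
query_vector = [0.85, 0.15, 0.25]

# Direct comparison is valid only because both sides share the space.
ranked = sorted(doc_vectors, key=lambda d: cosine(query_vector, doc_vectors[d]), reverse=True)
print(ranked[0])  # doc_semantic_search
```

This is essentially what an embedding retriever does internally, plus truncating the ranking to the top-k documents.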

Because the two encoders are typically the same model (a symmetric bi-encoder), the critical requirement is that all preprocessing steps (prefix, suffix, normalization, precision) are either identical on both sides, or asymmetric in exactly the way the model was trained to expect.

Prefix and Suffix Instructions

Some embedding models are trained with task-specific prefixes. For example:

  • E5 models: Expect "query: " prepended to queries and "passage: " prepended to documents.
  • BGE models: Use a query-side instruction prefix such as "Represent this sentence for searching relevant passages: ".

The text embedder provides prefix and suffix parameters to support these patterns natively.
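Mechanically, the affixes are concatenated onto the raw text before tokenization. A small sketch of that behavior (the helper name is illustrative, not the library's API):

```python
def apply_affixes(text: str, prefix: str = "", suffix: str = "") -> str:
    """Mirror of what a text embedder does before encoding:
    the prefix and suffix are attached verbatim to the raw text."""
    return f"{prefix}{text}{suffix}"

# E5-style asymmetric prefixes: queries and passages get different markers,
# which the model was trained to expect.
query_input = apply_affixes("how do bi-encoders work", prefix="query: ")
passage_input = apply_affixes("Bi-encoders embed text into vectors.", prefix="passage: ")

print(query_input)  # query: how do bi-encoders work
```

Because the prefix becomes part of the encoded string, forgetting it on one side is equivalent to querying with a different model input distribution than the one used at indexing time.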

Single-Vector Output

Unlike document embedding, which produces a list of enriched Document objects, query embedding produces a single flat vector (list[float]). This vector is the only output and is designed to be passed directly to an embedding retriever's query_embedding input.
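The two output contracts can be contrasted with toy stand-ins (illustrative functions and a fixed fake vector, not the real embedders):

```python
def document_embedder_run(documents: list[dict]) -> dict:
    # Document side: returns the same documents, each enriched
    # with an "embedding" field, under the "documents" key.
    return {"documents": [{**doc, "embedding": [0.1, 0.2, 0.3]} for doc in documents]}

def text_embedder_run(text: str) -> dict:
    # Query side: returns one flat vector under the "embedding" key,
    # ready to wire into a retriever's query_embedding input.
    return {"embedding": [0.1, 0.2, 0.3]}

out = text_embedder_run("What is semantic search?")
assert isinstance(out["embedding"], list)
assert all(isinstance(x, float) for x in out["embedding"])
```

The flat list[float] shape is what makes the pipeline connection "text_embedder.embedding" → "retriever.query_embedding" in the Usage section below type-compatible.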

Usage

Query text embedding is used in query pipelines (also called retrieval pipelines). A typical semantic search pipeline consists of:

  1. A SentenceTransformersTextEmbedder that converts the user query into a vector.
  2. An InMemoryEmbeddingRetriever (or another embedding retriever) that finds the most similar documents.
  3. Optionally, a ranker that reranks the retrieved documents for higher precision.

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
# (assume documents have been pre-embedded and written to document_store)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "What is semantic search?"}})
print(result["retriever"]["documents"])

Related Pages

Implemented By

Uses Heuristic
