Principle:PacktPublishing LLM Engineers Handbook Vector Similarity Search

Field	Value
Concept	Approximate nearest-neighbor search in vector space
Category	Retrieval / Vector Search
Workflow	RAG_Inference
Repository	PacktPublishing/LLM-Engineers-Handbook
Implemented by	Implementation:PacktPublishing_LLM_Engineers_Handbook_VectorBaseDocument_Search

Overview

Dense vector retrieval finds semantically similar documents by computing the cosine similarity between a query embedding and the stored document embeddings. The query text is first embedded with the same model that was used for the documents; Qdrant's ANN (Approximate Nearest Neighbor) index then finds the closest vectors. Optional metadata filters (from self-query) restrict the search space, and searching multiple collections (posts, articles, repositories) in parallel broadens coverage.

Theory

Mathematical Basis

The core operation of vector similarity search is computing the cosine similarity between a query embedding and each document embedding:

similarity(q, d) = cos(embed(q), embed(d)) = (embed(q) . embed(d)) / (||embed(q)|| * ||embed(d)||)

Where:

embed(q) is the vector representation of the query
embed(d) is the vector representation of the document
The dot product in the numerator measures directional alignment
The norms in the denominator normalize for vector magnitude
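
As a minimal sketch of the formula above, the following pure-Python function computes cosine similarity for two vectors. The three-dimensional vectors are made-up stand-ins for real embeddings, and the function name is illustrative rather than taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))          # numerator: dot product
    norm_a = math.sqrt(sum(x * x for x in a))       # ||a||
    norm_b = math.sqrt(sum(y * y for y in b))       # ||b||
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]  # same direction, larger magnitude
doc_orthogonal = [0.0, 1.0, 0.0]      # no directional overlap with the query

print(cosine_similarity(query, doc_same_direction))
print(cosine_similarity(query, doc_orthogonal))
```

The first document scores approximately 1 despite its larger magnitude, which is the effect of the norms in the denominator; the orthogonal document scores 0.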

Approximate Nearest Neighbor (ANN)

Exact nearest-neighbor search compares the query against every stored vector, so each query costs O(n) distance computations for n documents, which is prohibitive for large collections. Qdrant uses HNSW (Hierarchical Navigable Small World) graphs to perform approximate nearest-neighbor search in sub-linear time, trading a small amount of recall for a large speedup.
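
For contrast, here is a sketch of the exact baseline that an ANN index is meant to avoid: a linear scan that scores every document against the query. This is not HNSW and not Qdrant code; the function names and the two-dimensional vectors are assumptions made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query, docs, k):
    """Brute-force search: one similarity computation per stored document.

    docs maps a document id to its vector. Returns (id, score) pairs,
    highest score first.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items())
    best = heapq.nlargest(k, scored)  # keeps only the k best while scanning
    return [(doc_id, score) for score, doc_id in best]


# Toy collection (assumed values).
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = exact_top_k([1.0, 0.1], docs, k=2)
print([doc_id for doc_id, _ in results])
```

Every query touches all of `docs`, so the cost grows linearly with the collection size. A graph index such as HNSW visits only a small subset of vectors per query, which is where the recall trade-off comes from.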

Multi-Collection Search

The retrieval system searches across multiple Qdrant collections in parallel:

Posts - social media and blog posts
Articles - long-form articles and documentation
Repositories - code repository content

Results from all collections are aggregated into a unified candidate set for reranking.
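
The fan-out and aggregation step can be sketched with a thread pool. In this sketch each collection is an in-memory dictionary standing in for a Qdrant collection, and `search_collection` and `search_all` are hypothetical names; the repository's actual implementation may differ.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_collection(name, points, query, k):
    """Stand-in for a per-collection vector search; returns the top-k hits."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in points.items()]
    scored.sort(reverse=True)
    return [{"collection": name, "id": doc_id, "score": score} for score, doc_id in scored[:k]]


def search_all(collections, query, k):
    """Search every collection concurrently and merge the hits into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = [
            pool.submit(search_collection, name, points, query, k)
            for name, points in collections.items()
        ]
        # Flatten per-collection results into a single candidate set.
        return [hit for future in futures for hit in future.result()]


# Toy data (assumed values).
collections = {
    "posts": {"p1": [1.0, 0.0], "p2": [0.0, 1.0]},
    "articles": {"a1": [0.7, 0.7]},
    "repositories": {"r1": [0.9, 0.1], "r2": [0.1, 0.9]},
}
hits = search_all(collections, [1.0, 0.0], k=1)
print([(h["collection"], h["id"]) for h in hits])
```

The merged list is deliberately left unsorted: each hit keeps its source collection and score so that a downstream reranker can order the combined candidates.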

Metadata Filtering

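When self-query extracts metadata (e.g., author_id), the values are passed to Qdrant as filter conditions that are applied as part of the vector search rather than to its results. This is preferable to post-filtering because non-matching documents are excluded before the top-k candidates are selected, so the search neither spends distance computations on them nor returns fewer results than requested.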
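
The difference between filtering before and after the top-k selection can be seen in a small in-memory sketch. It does not reproduce Qdrant's filtering mechanism; the point structure, the function names, and the data are assumptions chosen to make the behavior visible.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, points, k):
    ranked = sorted(points, key=lambda p: cosine_similarity(query, p["vector"]), reverse=True)
    return ranked[:k]


def pre_filter_search(query, points, author_id, k):
    """Restrict to matching points first, then rank only those."""
    candidates = [p for p in points if p["author_id"] == author_id]
    return top_k(query, candidates, k)


def post_filter_search(query, points, author_id, k):
    """Rank everything, keep the top k, then discard non-matching points."""
    return [p for p in top_k(query, points, k) if p["author_id"] == author_id]


# Toy points (assumed values).
points = [
    {"id": 1, "author_id": "a", "vector": [1.0, 0.0]},
    {"id": 2, "author_id": "b", "vector": [0.9, 0.1]},
    {"id": 3, "author_id": "b", "vector": [0.8, 0.2]},
    {"id": 4, "author_id": "a", "vector": [0.0, 1.0]},
]
query = [1.0, 0.0]

pre = [p["id"] for p in pre_filter_search(query, points, "a", k=2)]
post = [p["id"] for p in post_filter_search(query, points, "a", k=2)]
print(pre, post)
```

With k=2, filtering first returns both of author "a"'s documents, while filtering afterwards returns only one, because a document by author "b" occupied a slot in the unfiltered top 2. Filtering first also scores two vectors instead of four.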

When to Use

When retrieving relevant context documents for RAG based on semantic similarity
When the query and documents may use different vocabulary for the same concepts
When you need to search across multiple document collections simultaneously
When metadata constraints should restrict the search space for precision

Related Concepts

Bi-encoder models - models that encode query and document independently
HNSW index - graph-based ANN algorithm used by Qdrant
Hybrid search - combining dense vector search with sparse keyword search
Embedding models - neural models that map text to dense vector representations
