Principle:PacktPublishing LLM Engineers Handbook Vector Similarity Search
| Field | Value |
|---|---|
| Concept | Approximate nearest-neighbor search in vector space |
| Category | Retrieval / Vector Search |
| Workflow | RAG_Inference |
| Repository | PacktPublishing/LLM-Engineers-Handbook |
| Implemented by | Implementation:PacktPublishing_LLM_Engineers_Handbook_VectorBaseDocument_Search |
Overview
Dense vector retrieval finds semantically similar documents by comparing a query embedding with stored document embeddings, using cosine similarity as the measure. The query text is first embedded with the same model that was used for the documents, then Qdrant's ANN (Approximate Nearest Neighbor) index finds the closest vectors. Optional metadata filters (from self-query) restrict the search space. Searching multiple collections in parallel (posts, articles, repositories) broadens coverage.
Theory
Mathematical Basis
The core operation of vector similarity search is computing the cosine similarity between a query embedding and each document embedding:
similarity(q, d) = cos(embed(q), embed(d)) = (embed(q) . embed(d)) / (||embed(q)|| * ||embed(d)||)
Where:
- embed(q) is the vector representation of the query
- embed(d) is the vector representation of the document
- The dot product in the numerator measures directional alignment
- The norms in the denominator normalize for vector magnitude
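As a minimal sketch of the formula above, the following pure-Python function computes cosine similarity for two plain lists of floats. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not taken from the repository; a production system would normally leave this computation to the vector database or a numerical library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Numerator: dot product, which measures directional alignment.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the Euclidean norms, which removes magnitude.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative vectors (assumed values, not real embeddings).
query_vec = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction, different magnitude
orthogonal = [0.0, 2.0]      # perpendicular to the query
```

Because the norms cancel out magnitude, `same_direction` scores the same as a unit vector pointing the same way, while `orthogonal` scores zero.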
Approximate Nearest Neighbor (ANN)
Exact nearest-neighbor search is O(n) in the number of documents, which is prohibitive for large collections. Qdrant uses HNSW (Hierarchical Navigable Small World) graphs to perform approximate nearest-neighbor search in sub-linear time. This trades a small amount of recall for dramatic speedups.
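For contrast, the exact O(n) baseline that an ANN index approximates can be sketched as a brute-force scan. This is not how Qdrant or HNSW works internally; it only illustrates the linear cost that the index avoids. The names `exact_top_k` and `docs`, and the toy two-dimensional vectors, are assumptions made for the example.

```python
import heapq
import math


def _cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, documents, k):
    """Score every document against the query and keep the k best.

    documents maps a document id to its vector. Every vector is visited
    once, so the cost grows linearly with the size of the collection.
    """
    scored = ((doc_id, _cosine(query, vec)) for doc_id, vec in documents.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy collection with assumed vectors.
docs = {
    "a": [1.0, 0.0],
    "b": [0.7, 0.7],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
top2 = exact_top_k([1.0, 0.1], docs, k=2)
```

An HNSW index aims to return the same neighbors most of the time while visiting only a small fraction of the stored vectors.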
Multi-Collection Search
The retrieval system searches across multiple Qdrant collections in parallel:
- Posts - social media and blog posts
- Articles - long-form articles and documentation
- Repositories - code repository content
Results from all collections are aggregated into a unified candidate set for reranking.
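The fan-out and aggregation described above can be sketched with a thread pool. This is an illustration under stated assumptions, not the repository's implementation: `search_collection` is a hypothetical stand-in that reads from an in-memory dictionary instead of calling Qdrant, and the ids and scores are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the three collections.
FAKE_COLLECTIONS = {
    "posts": [{"id": "p1", "score": 0.91}, {"id": "p2", "score": 0.40}],
    "articles": [{"id": "a1", "score": 0.85}],
    "repositories": [{"id": "r1", "score": 0.77}, {"id": "r2", "score": 0.12}],
}


def search_collection(name, query_vector, k):
    """Hypothetical per-collection search; a real one would query the vector database."""
    hits = sorted(FAKE_COLLECTIONS[name], key=lambda h: h["score"], reverse=True)
    # Tag each hit with its source collection so later stages can tell them apart.
    return [{**hit, "collection": name} for hit in hits[:k]]


def search_all(query_vector, k_per_collection):
    """Search every collection concurrently and merge the hits into one list."""
    names = list(FAKE_COLLECTIONS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_collection = pool.map(
            lambda name: search_collection(name, query_vector, k_per_collection),
            names,
        )
        # Flatten the per-collection lists into a single candidate set.
        return [hit for hits in per_collection for hit in hits]


candidates = search_all([0.1, 0.2, 0.3], k_per_collection=1)
```

Threads suit this pattern because each search is dominated by waiting on network I/O. The merged list is deliberately left unsorted here, since ordering is the job of the reranking step.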
Metadata Filtering
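When self-query extracts metadata (e.g., author_id), Qdrant applies it as a filter condition on the search itself rather than on the returned results. This is more efficient than post-filtering because it reduces the search space before distances are computed, and it avoids returning fewer results than requested when many of the nearest vectors fail the filter.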
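The difference between filtering before and after scoring can be seen in a small pure-Python sketch. It does not use Qdrant; the `points` list, the `author_id` values, and both search functions are assumptions made for illustration, with a plain dot product standing in for the similarity score.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy points with an assumed author_id payload field.
points = [
    {"id": 1, "author_id": "alice", "vector": [1.0, 0.0]},
    {"id": 2, "author_id": "alice", "vector": [0.9, 0.1]},
    {"id": 3, "author_id": "bob", "vector": [0.5, 0.5]},
    {"id": 4, "author_id": "bob", "vector": [0.0, 1.0]},
]


def top_k(query, candidates, k):
    """Return the k candidates with the highest dot-product score."""
    ranked = sorted(candidates, key=lambda p: dot(query, p["vector"]), reverse=True)
    return ranked[:k]


def pre_filter_search(query, author_id, k):
    """Restrict to matching points first, then score only those."""
    allowed = [p for p in points if p["author_id"] == author_id]
    return top_k(query, allowed, k)


def post_filter_search(query, author_id, k):
    """Score everything, keep the top k, then drop non-matching points."""
    return [p for p in top_k(query, points, k) if p["author_id"] == author_id]


query = [1.0, 0.0]
pre = pre_filter_search(query, "bob", k=2)
post = post_filter_search(query, "bob", k=2)
```

Here `pre` contains both of bob's points, while `post` is empty because the two nearest vectors overall belong to alice and are discarded after the fact.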
When to Use
- When retrieving relevant context documents for RAG based on semantic similarity
- When the query and documents may use different vocabulary for the same concepts
- When you need to search across multiple document collections simultaneously
- When metadata constraints should restrict the search space for precision
Related Concepts
- Bi-encoder models - models that encode query and document independently
- HNSW index - graph-based ANN algorithm used by Qdrant
- Hybrid search - combining dense vector search with sparse keyword search
- Embedding models - neural models that map text to dense vector representations