Principle:PacktPublishing LLM Engineers Handbook Vector Similarity Search

From Leeroopedia


Concept: Approximate nearest-neighbor search in vector space
Category: Retrieval / Vector Search
Workflow: RAG_Inference
Repository: PacktPublishing/LLM-Engineers-Handbook
Implemented by: Implementation:PacktPublishing_LLM_Engineers_Handbook_VectorBaseDocument_Search

Overview

Dense vector retrieval finds semantically similar documents by computing the cosine similarity between a query embedding and stored document embeddings. The query text is first embedded with the same model used for the documents, so that both live in the same vector space; Qdrant's ANN (Approximate Nearest Neighbor) index then returns the closest vectors. Optional metadata filters extracted by the self-query step restrict the search space, and searching several collections (posts, articles, repositories) in parallel broadens coverage.

Theory

Mathematical Basis

The core operation of vector similarity search is computing the cosine similarity between a query embedding and each document embedding:

similarity(q, d) = cos(embed(q), embed(d)) = (embed(q) . embed(d)) / (||embed(q)|| * ||embed(d)||)

Where:

  • embed(q) is the vector representation of the query
  • embed(d) is the vector representation of the document
  • The dot product in the numerator measures directional alignment
  • The norms in the denominator normalize for vector magnitude
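
The formula can be checked by hand with a minimal sketch in plain Python. The function name `cosine_similarity` and the two-dimensional example vectors are illustrative assumptions, not code from the repository; real embeddings have hundreds of dimensions and would normally be handled by a numerical library.

```python
import math


def cosine_similarity(q, d):
    """Cosine similarity between two equal-length vectors (illustrative helper)."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same dimension")
    # Numerator: dot product, measuring directional alignment.
    dot = sum(a * b for a, b in zip(q, d))
    # Denominator: product of the Euclidean norms, removing the effect of magnitude.
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0.0 or norm_d == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_q * norm_d)


# Toy vectors standing in for embed(q) and embed(d).
query = [1.0, 0.0]
same_direction = cosine_similarity(query, [3.0, 0.0])   # longer vector, same direction
orthogonal = cosine_similarity(query, [0.0, 2.0])       # perpendicular vector
opposite = cosine_similarity(query, [-2.0, 0.0])        # reversed direction
```

Because the norms cancel out magnitude, the vector `[3.0, 0.0]` scores the same as `[1.0, 0.0]` would; only direction matters.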

Approximate Nearest Neighbor (ANN)

Exact nearest-neighbor search compares the query against every stored vector, so it requires O(n) similarity computations per query in the number of documents, which is prohibitive for large collections. Qdrant uses HNSW (Hierarchical Navigable Small World) graphs to perform approximate nearest-neighbor search in sub-linear time. This trades a small amount of recall for a large reduction in query latency.
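
For contrast, the exact baseline that an ANN index avoids can be sketched as a full scan. This is not HNSW and not Qdrant's code; the helper names `cosine` and `exact_top_k` and the toy document set are assumptions made for illustration.

```python
import heapq
import math


def cosine(q, d):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))


def exact_top_k(query, documents, k):
    """Score every document against the query and keep the k best.

    documents maps a document id to its embedding. The loop touches all
    n entries, which is the linear cost an ANN index is built to avoid.
    """
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in documents.items())
    return heapq.nlargest(k, scored)  # sorted from most to least similar


# Hypothetical toy collection of four 2-dimensional embeddings.
documents = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
top = exact_top_k([1.0, 0.1], documents, k=2)
top_ids = [doc_id for _, doc_id in top]
```

An HNSW index would visit only a small part of the collection by walking a layered neighbor graph, at the cost of occasionally missing a true nearest neighbor.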

Multi-Collection Search

The retrieval system searches across multiple Qdrant collections in parallel:

  • Posts - social media and blog posts
  • Articles - long-form articles and documentation
  • Repositories - code repository content

Results from all collections are aggregated into a unified candidate set for reranking.
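
The fan-out and aggregation described above can be sketched with a thread pool. The in-memory `COLLECTIONS` dictionary is a hypothetical stand-in for the three Qdrant collections, and `search_collection` and `search_all` are illustrative names rather than functions from the repository; in a real system each call would be a network request to the vector store.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the three collections (id -> embedding).
COLLECTIONS = {
    "posts": {"post-1": [1.0, 0.0], "post-2": [0.0, 1.0]},
    "articles": {"article-1": [0.9, 0.1], "article-2": [-1.0, 0.0]},
    "repositories": {"repo-1": [0.5, 0.5]},
}


def cosine(q, d):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))


def search_collection(name, query, k):
    """Return the k most similar points of one collection as (score, collection, id)."""
    scored = [(cosine(query, vec), name, doc_id) for doc_id, vec in COLLECTIONS[name].items()]
    scored.sort(reverse=True)
    return scored[:k]


def search_all(query, k):
    """Search every collection concurrently and flatten the hits into one candidate list."""
    names = list(COLLECTIONS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map preserves the order of `names`, so results are deterministic.
        per_collection = pool.map(lambda n: search_collection(n, query, k), names)
        return [hit for hits in per_collection for hit in hits]


candidates = search_all([1.0, 0.0], k=1)
candidate_ids = [doc_id for _, _, doc_id in candidates]
```

The candidates are deliberately left unsorted across collections, since ordering the unified set is the job of the reranking step that follows.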

Metadata Filtering

When self-query extracts metadata (e.g., author_id), it is passed to Qdrant as a filter condition that is applied as part of the vector search rather than to the returned results. This is more efficient than post-filtering because non-matching points are excluded before their distances are computed.
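
The difference between filtering first and filtering afterwards can be illustrated with a small in-memory sketch. It does not reproduce Qdrant's internal filtering; the `POINTS` data, the helper names, and the author values are all hypothetical.

```python
import math

# Hypothetical stored points, each with an embedding and an author_id payload.
POINTS = [
    {"id": "p1", "author_id": "alice", "vector": [1.0, 0.0]},
    {"id": "p2", "author_id": "bob", "vector": [0.99, 0.1]},
    {"id": "p3", "author_id": "bob", "vector": [0.9, 0.3]},
    {"id": "p4", "author_id": "alice", "vector": [0.0, 1.0]},
]


def cosine(q, d):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))


def top_k(query, points, k):
    """Rank the given points by similarity to the query and keep the first k."""
    return sorted(points, key=lambda p: cosine(query, p["vector"]), reverse=True)[:k]


def search_filter_first(query, author_id, k):
    """Drop non-matching points, then score only what remains."""
    allowed = [p for p in POINTS if p["author_id"] == author_id]
    return [p["id"] for p in top_k(query, allowed, k)]


def search_filter_last(query, author_id, k):
    """Score every point, take the top k, then discard non-matching hits."""
    return [p["id"] for p in top_k(query, POINTS, k) if p["author_id"] == author_id]


query = [1.0, 0.0]
filter_first = search_filter_first(query, "alice", k=2)
filter_last = search_filter_last(query, "alice", k=2)
```

In this toy data, filtering first scores only two of the four points and still returns two matches, while filtering last scores all four and keeps a single match, because one of its top two hits belongs to another author.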

When to Use

  • When retrieving relevant context documents for RAG based on semantic similarity
  • When the query and documents may use different vocabulary for the same concepts
  • When you need to search across multiple document collections simultaneously
  • When metadata constraints should restrict the search space for precision

Related Concepts

  • Bi-encoder models - models that encode query and document independently
  • HNSW index - graph-based ANN algorithm used by Qdrant
  • Hybrid search - combining dense vector search with sparse keyword search
  • Embedding models - neural models that map text to dense vector representations
