Implementation:PacktPublishing LLM Engineers Handbook VectorBaseDocument Search

Field	Value
Type	API Doc
Workflow	RAG_Inference
Repository	PacktPublishing/LLM-Engineers-Handbook
Source	vector.py:L138-161, retriever.py:L63-97
Implements	Principle:PacktPublishing_LLM_Engineers_Handbook_Vector_Similarity_Search

API Signature

VectorBaseDocument.search(
    cls,
    query_vector: list,
    limit: int = 3,
    query_filter: Filter | None = None
) -> list[T]

Import

from llm_engineering.domain.base.vector import VectorBaseDocument

Key Code

From vector.py (the base class search method):

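@classmethod
def search(cls, query_vector: list, limit: int = 3, query_filter=None) -> list:
    # Each document class maps to exactly one Qdrant collection.
    collection_name = cls.get_collection_name()
    qdrant_client = connection.get_qdrant_client()
    # Nearest-neighbor lookup, optionally restricted by a metadata filter.
    hits = qdrant_client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=limit,
        query_filter=query_filter,
    )
    # Convert the raw Qdrant hits into domain objects of the calling class.
    return [cls.from_record(hit) for hit in hits]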

From retriever.py (the orchestration logic that calls search across multiple collections):

The retriever embeds each expanded query with a single embedding model, so all query vectors live in the same space. It then builds an optional metadata filter from the self-query results and searches the document collections (posts, articles, repositories) in parallel. The hits from all searches are aggregated and deduplicated before being passed to the reranker.
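The fan-out described above can be sketched as follows. This is a minimal illustration, not the code in retriever.py: `Chunk`, `make_fake_search`, and `search_all` are hypothetical names, the fake search functions stand in for the `search` class method of each document type, and deduplication by `id` is an assumption about how duplicates are identified.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a retrieved document such as EmbeddedChunk."""
    id: str
    content: str


def make_fake_search(chunks):
    """Build a stand-in for a document class's `search`; it ignores the vector and returns canned hits."""
    def search(query_vector, limit=3, query_filter=None):
        return chunks[:limit]
    return search


def search_all(search_fns, query_vectors, limit=3, query_filter=None):
    """Run every (query vector, collection) search concurrently, then merge and deduplicate by id."""
    jobs = [(fn, vec) for vec in query_vectors for fn in search_fns]
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(fn, vec, limit=limit, query_filter=query_filter) for fn, vec in jobs]
        results = [f.result() for f in futures]  # collected in submission order

    seen, merged = set(), []
    for hits in results:
        for hit in hits:
            if hit.id not in seen:  # keep only the first occurrence of each chunk
                seen.add(hit.id)
                merged.append(hit)
    return merged


posts = make_fake_search([Chunk("p1", "post one"), Chunk("p2", "post two")])
articles = make_fake_search([Chunk("a1", "article one")])
repositories = make_fake_search([Chunk("r1", "repo one")])

# Two expanded queries hit the same three collections, so every chunk is found twice.
merged = search_all([posts, articles, repositories], query_vectors=[[0.1, 0.2], [0.3, 0.4]])
print([c.id for c in merged])  # ['p1', 'p2', 'a1', 'r1']
```

Threads suit this step because each search is a network call that spends most of its time waiting on Qdrant. Collecting results in submission order keeps the merged list deterministic.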

Parameters

Parameter	Type	Default	Description
query_vector	list[float]	(required)	The embedding vector for the query text
limit	int	3	Maximum number of results to return per collection
query_filter	Filter or None	None	Optional Qdrant filter for metadata-based pre-filtering

Inputs and Outputs

Inputs:

query_vector (list[float]) - Dense vector embedding of the query text
limit (int) - Maximum number of results to return
query_filter (Qdrant Filter) - Optional metadata filter (e.g., filtering by author_id)
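To make the `query_filter` input concrete, the sketch below expresses an `author_id` restriction in the shape of Qdrant's JSON filter syntax and applies it to a few payloads in plain Python. The `matches` helper and the sample payloads are hypothetical and cover exact-match `must` conditions only. With the Python client, the same condition would typically be built from `Filter`, `FieldCondition`, and `MatchValue` in `qdrant_client.models`.

```python
from typing import Optional

# Filter in the shape of Qdrant's JSON filter syntax: every `must` condition has to match.
author_filter = {"must": [{"key": "author_id", "match": {"value": "author-123"}}]}


def matches(payload: dict, query_filter: Optional[dict]) -> bool:
    """Return True when the payload satisfies every `must` condition (exact match only)."""
    if query_filter is None:
        return True  # no filter means every point is eligible
    return all(
        payload.get(cond["key"]) == cond["match"]["value"]
        for cond in query_filter.get("must", [])
    )


payloads = [
    {"author_id": "author-123", "content": "kept"},
    {"author_id": "author-456", "content": "dropped"},
]
kept = [p["content"] for p in payloads if matches(p, author_filter)]
print(kept)  # ['kept']
```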

Outputs:

list[T] - List of matching documents sorted by descending similarity score (cosine similarity, assuming the collection uses the cosine distance metric), where T is a subclass of VectorBaseDocument (e.g., EmbeddedChunk)

How It Works

1. The class method resolves the collection name from the document type (posts, articles, or repositories).
2. A Qdrant client is obtained via connection.get_qdrant_client().
3. The search call is made to Qdrant with the query vector, limit, and optional filter.
4. Qdrant performs approximate nearest neighbor (ANN) search using its HNSW index, applying any metadata filter during the search rather than to the returned results.
5. Raw search hits are converted to domain objects via cls.from_record(hit).
6. Results are returned sorted by descending similarity score.
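The steps above can be mimicked without a running Qdrant instance. The sketch below is an in-memory stand-in that uses exact (brute-force) cosine search instead of HNSW, so it shows the data flow rather than the index. `Record`, `Document`, the `keep` predicate, and this two-argument `from_record` are hypothetical names, not part of the repository.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stored point: an id, a vector, and a metadata payload."""
    id: str
    vector: list
    payload: dict


@dataclass
class Document:
    """Hypothetical domain object, playing the role of a VectorBaseDocument subclass."""
    id: str
    content: str
    score: float

    @classmethod
    def from_record(cls, record, score):
        return cls(id=record.id, content=record.payload["content"], score=score)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, limit=3, keep=None):
    """Exact search: filter first, score, sort by descending score, truncate, convert."""
    candidates = [r for r in records if keep is None or keep(r.payload)]
    scored = sorted(
        ((cosine(query_vector, r.vector), r) for r in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [Document.from_record(r, s) for s, r in scored[:limit]]


records = [
    Record("a", [1.0, 0.0], {"content": "exact match", "author_id": "u1"}),
    Record("b", [0.6, 0.8], {"content": "partial match", "author_id": "u1"}),
    Record("c", [0.0, 1.0], {"content": "orthogonal", "author_id": "u1"}),
    Record("d", [1.0, 0.0], {"content": "other author", "author_id": "u2"}),
]

hits = search(records, [1.0, 0.0], limit=2, keep=lambda p: p["author_id"] == "u1")
print([h.id for h in hits])  # ['a', 'b']
```

Record "d" has the same vector as "a" but is excluded before scoring, which is the practical effect of applying the filter before the limit: the filter narrows the candidate set, and the limit is then spent only on eligible points.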

External Dependencies

qdrant_client - Python client for the Qdrant vector database

Source Files

llm_engineering/domain/base/vector.py (lines 138-161)
llm_engineering/application/rag/retriever.py (lines 63-97)
