Implementation:PacktPublishing LLM Engineers Handbook NoSQLBaseDocument Bulk Find
| Type | API Doc |
|---|---|
| API | `NoSQLBaseDocument.bulk_find(cls, **filter_options) -> list[T]` |
| Source | llm_engineering/domain/base/nosql.py:L122-130 |
| Repository | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Data_Warehouse_Query |
Overview
The bulk_find class method on NoSQLBaseDocument retrieves multiple documents from a MongoDB collection based on arbitrary filter criteria. It is the primary data ingestion API used by the feature engineering pipeline to load raw documents (articles, posts, repositories) from the data warehouse.
API Signature
```python
@classmethod
def bulk_find(cls, **filter_options) -> list:
    ...
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `**filter_options` | keyword arguments (collected into a `dict`) | MongoDB query filter predicates. Python collects the keyword arguments into a dict, which is passed unchanged as the filter document to PyMongo's `collection.find()`. A common filter is `author_id=<uuid>`, which retrieves all documents by a specific author. |
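Because `bulk_find` declares `**filter_options`, Python packs whatever keyword arguments the caller passes into an ordinary dict before `collection.find()` sees it. The stdlib-only sketch below reproduces just that packing step with a hypothetical `build_filter` helper (not part of the codebase); the `platform` and `content.title` field names are illustrative assumptions.

```python
from typing import Any


def build_filter(**filter_options: Any) -> dict[str, Any]:
    """Hypothetical helper mimicking how bulk_find receives its arguments.

    Python gathers every keyword argument into a plain dict, which is the
    shape PyMongo's collection.find() expects as its filter document.
    """
    return filter_options


# Simple equality filter: matches documents whose author_id equals the value.
by_author = build_filter(author_id="550e8400-e29b-41d4-a716-446655440000")

# Operator expressions are ordinary dict values, so they pass through unchanged.
by_platforms = build_filter(platform={"$in": ["medium", "substack"]})

# Keys that are not valid Python identifiers (dotted paths, "$or", ...) must be
# supplied via dictionary unpacking; they cannot be written as bare keywords.
by_nested = build_filter(**{"content.title": "Intro"})

print(by_author)
print(by_platforms)
print(by_nested)
```

A consequence of this design: calling `bulk_find()` with no arguments passes an empty filter `{}`, which MongoDB treats as matching every document in the collection.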
Return Value
| Type | Description |
|---|---|
| `list[T]` | A list of deserialized document instances matching the filter, where `T` is the concrete subclass of `NoSQLBaseDocument` on which the method is called (e.g., `ArticleDocument`, `PostDocument`, `RepositoryDocument`). Returns an empty list if no documents match or if an error occurs. |
Source Code
```python
@classmethod
def bulk_find(cls, **filter_options) -> list:
    collection = cls._get_collection()
    try:
        instances = collection.find(filter_options)
        return [cls.from_mongo(instance) for instance in instances]
    except Exception:
        logger.exception("Failed to retrieve documents.")
        return []
```
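PyMongo's `find()` returns a lazy cursor, so the list comprehension inside the `try` block is where documents are actually fetched and parsed. The sketch below imitates that with a generator that raises partway through iteration. `flaky_cursor` and `fail_safe_find` are hypothetical stand-ins, and the stdlib `logging` module replaces loguru so the example has no third-party dependencies.

```python
import logging
from typing import Any, Callable, Iterator

logger = logging.getLogger(__name__)


def flaky_cursor(docs: list[dict[str, Any]], fail_after: int) -> Iterator[dict[str, Any]]:
    """Hypothetical cursor stand-in: yields documents lazily and raises at
    index `fail_after`, as a dropped connection might mid-fetch."""
    for i, doc in enumerate(docs):
        if i == fail_after:
            raise ConnectionError("simulated network failure")
        yield doc


def fail_safe_find(
    make_cursor: Callable[[], Iterator[dict[str, Any]]],
    parse: Callable[[dict[str, Any]], Any],
) -> list[Any]:
    """Same shape as bulk_find: obtain the cursor, then consume it inside the
    try block so errors raised during iteration are caught as well."""
    try:
        cursor = make_cursor()  # creating a generator runs none of its body yet
        return [parse(doc) for doc in cursor]
    except Exception:
        logger.exception("Failed to retrieve documents.")
        return []


docs = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]
ok = fail_safe_find(lambda: flaky_cursor(docs, fail_after=99), lambda d: d["_id"])
broken = fail_safe_find(lambda: flaky_cursor(docs, fail_after=1), lambda d: d["_id"])
print(ok)      # ['a', 'b', 'c']
print(broken)  # [] because the already-parsed "a" is discarded with the failure
```

Because the whole list is built in a single comprehension, a failure on a late document discards every document parsed before it, not just the failing one.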
Import
```python
from llm_engineering.domain.base.nosql import NoSQLBaseDocument
```
In practice, callers import concrete subclasses rather than the base class:
```python
from llm_engineering.domain.documents import ArticleDocument, PostDocument, RepositoryDocument
```
How It Works
- Collection resolution: `cls._get_collection()` returns the PyMongo collection object for the calling class. Each document subclass defines its collection name via an inner `Settings` class with a `name` attribute.
- Query execution: the `filter_options` keyword arguments are passed to PyMongo's `collection.find()`, which returns a lazy cursor over matching MongoDB documents. Documents are fetched from the server as the cursor is iterated, which happens inside the `try` block.
- Deserialization: each raw MongoDB document (a plain dictionary keyed by `_id`) is converted into a typed Pydantic model via `cls.from_mongo(instance)`, which maps the `_id` field to the model's `id` field.
- Error handling: if any exception occurs during the query or deserialization, it is logged via loguru and an empty list is returned, so the pipeline does not crash on transient database errors. Since the result list is built in one step, a failure partway through also discards documents that were already deserialized.
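The collection-resolution and `_id`-to-`id` steps above can be mimicked without a database. In this sketch a module-level dict stands in for MongoDB, dataclasses stand in for Pydantic, and matching is equality-only. `FAKE_DB`, `BaseDocument`, and all field values are hypothetical, and error handling is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any, TypeVar

# Hypothetical in-memory stand-in for MongoDB: collection name -> raw documents.
FAKE_DB: dict[str, list[dict[str, Any]]] = {
    "articles": [
        {"_id": "1", "author_id": "u1", "content": {"title": "A"}},
        {"_id": "2", "author_id": "u2", "content": {"title": "B"}},
        {"_id": "3", "author_id": "u1", "content": {"title": "C"}},
    ],
}

T = TypeVar("T", bound="BaseDocument")


@dataclass
class BaseDocument:
    id: str

    class Settings:
        name = ""  # concrete subclasses override this with their collection name

    @classmethod
    def _get_collection(cls) -> list[dict[str, Any]]:
        # Resolve the collection from the calling class's inner Settings.
        return FAKE_DB.get(cls.Settings.name, [])

    @classmethod
    def from_mongo(cls: type[T], data: dict[str, Any]) -> T:
        # Copy first so the stored document keeps its "_id" key, then rename it.
        fields = dict(data)
        fields["id"] = fields.pop("_id")
        return cls(**fields)

    @classmethod
    def bulk_find(cls: type[T], **filter_options: Any) -> list[T]:
        # Equality-only matching; real MongoDB filters are far richer.
        matches = (
            doc
            for doc in cls._get_collection()
            if all(doc.get(key) == value for key, value in filter_options.items())
        )
        return [cls.from_mongo(doc) for doc in matches]


@dataclass
class ArticleDocument(BaseDocument):
    author_id: str
    content: dict[str, Any]

    class Settings:
        name = "articles"


articles = ArticleDocument.bulk_find(author_id="u1")
print([a.id for a in articles])    # ['1', '3']
print(type(articles[0]).__name__)  # ArticleDocument
```

Because `cls` is bound to whichever subclass the method is called on, the same inherited `bulk_find` picks up that subclass's `Settings.name` and constructs instances of that subclass.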
Usage Example
```python
from llm_engineering.domain.documents import ArticleDocument

# Retrieve all articles by a specific author
author_uuid = "550e8400-e29b-41d4-a716-446655440000"
articles = ArticleDocument.bulk_find(author_id=author_uuid)

print(f"Found {len(articles)} articles")
for article in articles:
    # content may be a dict rather than a string, so convert before slicing
    print(f" - {article.id}: {str(article.content)[:80]}...")
```
External Dependencies
| Dependency | Purpose |
|---|---|
| pymongo | MongoDB driver; provides `collection.find()` for executing queries |
| pydantic | Data validation and serialization; `NoSQLBaseDocument` extends Pydantic's `BaseModel` |
| loguru | Structured logging; used for exception reporting on query failure |
Design Notes
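- The method is a classmethod, meaning it is called on the document subclass itself (e.g., `ArticleDocument.bulk_find(...)`), and the returned list contains instances of that specific subclass.
- The use of `**filter_options` provides a flexible, Pythonic interface that maps directly to MongoDB's query syntax without requiring callers to construct query dictionaries manually. Field names that are not valid Python identifiers, such as dotted paths into embedded documents or top-level operators like `$or`, still have to be passed by dictionary unpacking (e.g., `bulk_find(**{"embedded.field": value})`).
- The fail-safe return of an empty list on exception is a deliberate design choice that prioritizes pipeline resilience over strict error propagation. The trade-off is that the return value alone cannot tell "no matching documents" apart from "the query failed"; the failure is visible only in the logs.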
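The ambiguity in the fail-safe policy is easiest to see side by side. This sketch contrasts a wrapper that follows `bulk_find`'s log-and-return-`[]` behavior with a hypothetical strict variant that lets the error propagate. `fail_safe`, `strict`, and the two query functions are illustrative stand-ins, and stdlib `logging` replaces loguru.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


def fail_safe(query: Callable[[], Iterable[Any]]) -> list[Any]:
    """Mirrors bulk_find's policy: log any exception and return an empty list."""
    try:
        return list(query())
    except Exception:
        logger.exception("Failed to retrieve documents.")
        return []


def strict(query: Callable[[], Iterable[Any]]) -> list[Any]:
    """Hypothetical alternative: let the exception reach the caller."""
    return list(query())


def no_matches() -> Iterable[Any]:
    return []  # the query succeeded but found nothing


def server_down() -> Iterable[Any]:
    raise TimeoutError("simulated server timeout")


# Under the fail-safe policy both outcomes look identical to the caller.
print(fail_safe(no_matches) == fail_safe(server_down))  # True

# The strict variant keeps them apart.
try:
    strict(server_down)
except TimeoutError as err:
    print(f"strict variant raised: {err}")
```

Which policy fits depends on the caller: a batch pipeline may prefer to skip and continue, while code that treats an empty result as meaningful would likely want the strict behavior.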