Implementation:PacktPublishing LLM Engineers Handbook NoSQLBaseDocument Bulk Find

From Leeroopedia


Type: API Doc
API: NoSQLBaseDocument.bulk_find(cls, **filter_options) -> list[T]
Source: llm_engineering/domain/base/nosql.py:L122-130
Repository: PacktPublishing/LLM-Engineers-Handbook
Implements: Principle:PacktPublishing_LLM_Engineers_Handbook_Data_Warehouse_Query

Overview

The bulk_find class method on NoSQLBaseDocument retrieves multiple documents from a MongoDB collection based on arbitrary filter criteria. It is the primary data ingestion API used by the feature engineering pipeline to load raw documents (articles, posts, repositories) from the data warehouse.

API Signature

@classmethod
def bulk_find(cls, **filter_options) -> list:

Parameters

Parameter | Type | Description
**filter_options | keyword arguments (collected into a dict) | Field/value pairs passed unchanged to collection.find() as the MongoDB query filter. A typical call is author_id=uuid, which retrieves all documents by one author.
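
To illustrate how keyword arguments become the filter document, the sketch below uses a hypothetical stand-in function, build_filter, that only captures **filter_options the way bulk_find does; it never touches MongoDB. The field names platform and content.Title are illustrative assumptions, not taken from the schema described here.

```python
def build_filter(**filter_options) -> dict:
    """Hypothetical helper: return the dict that would be handed to collection.find()."""
    return filter_options


# Simple equality filter, as in the author_id example above.
simple = build_filter(author_id="550e8400-e29b-41d4-a716-446655440000")

# MongoDB operator expressions are ordinary dict values.
by_platform = build_filter(platform={"$in": ["medium", "substack"]})

# Field names that are not valid Python identifiers (e.g. dotted paths)
# cannot be written as keywords; unpack a dictionary instead.
nested = build_filter(**{"content.Title": "Intro"})

print(simple)
print(by_platform)
print(nested)
```

Because the keywords are forwarded as-is, anything expressible as a single filter dictionary can be passed, but options such as projections, sorting, or limits are outside what this signature accepts.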

Return Value

Type | Description
list[T] | A list of deserialized document instances matching the filter, where T is the concrete subclass of NoSQLBaseDocument on which the method is called (e.g., ArticleDocument, PostDocument, RepositoryDocument). Returns an empty list both when no documents match and when an error occurs, so the two cases cannot be told apart from the return value alone.

Source Code

@classmethod
def bulk_find(cls, **filter_options) -> list:
    collection = cls._get_collection()
    try:
        instances = collection.find(filter_options)
        return [cls.from_mongo(instance) for instance in instances]
    except Exception:
        logger.exception("Failed to retrieve documents.")
        return []

Import

from llm_engineering.domain.base.nosql import NoSQLBaseDocument

In practice, callers import concrete subclasses rather than the base class:

from llm_engineering.domain.documents import ArticleDocument, PostDocument, RepositoryDocument

How It Works

  1. Collection resolution: cls._get_collection() returns the PyMongo collection object for the calling class. Each document subclass defines its collection name via an inner Settings class with a name attribute.
  2. Query execution: The filter_options keyword arguments are passed directly to PyMongo's collection.find(), which returns a cursor over matching MongoDB documents.
  3. Deserialization: Each raw MongoDB document (a dictionary whose primary key is stored under _id) is converted into a typed Pydantic model via cls.from_mongo(instance), which maps _id to the model's id field.
  4. Error handling: If any exception is raised during the query or deserialization, it is logged via loguru and an empty list is returned. This keeps the pipeline running through transient database errors, but it also hides persistent failures such as a bad connection or a document that fails validation.
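
The steps above can be sketched without a database. The following is a minimal, self-contained approximation, not the repository's code: FakeCollection and SketchDocument are hypothetical stand-ins, the collection is passed in explicitly instead of being resolved by cls._get_collection(), a dataclass replaces the Pydantic model, and the standard logging module replaces loguru.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


class FakeCollection:
    """Hypothetical in-memory stand-in for a PyMongo collection (equality filters only)."""

    def __init__(self, docs, fail=False):
        self._docs = docs
        self._fail = fail

    def find(self, query):
        if self._fail:
            raise RuntimeError("simulated database outage")
        return [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]


@dataclass
class SketchDocument:
    id: str
    author_id: str

    @classmethod
    def from_mongo(cls, raw: dict) -> "SketchDocument":
        data = dict(raw)              # copy so the stored document is not mutated
        data["id"] = data.pop("_id")  # step 3: map _id to id
        return cls(**data)

    @classmethod
    def bulk_find(cls, collection, **filter_options) -> list:
        try:
            instances = collection.find(filter_options)     # step 2: run the query
            return [cls.from_mongo(i) for i in instances]   # step 3: deserialize
        except Exception:
            logger.exception("Failed to retrieve documents.")  # step 4: log and swallow
            return []


docs = [
    {"_id": "a1", "author_id": "u1"},
    {"_id": "a2", "author_id": "u2"},
    {"_id": "a3", "author_id": "u1"},
]
found = SketchDocument.bulk_find(FakeCollection(docs), author_id="u1")
failed = SketchDocument.bulk_find(FakeCollection(docs, fail=True), author_id="u1")
```

In this sketch, found holds the two documents for author u1, while failed is an empty list even though the query never ran, which is the same ambiguity the real method has.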

Usage Example

from llm_engineering.domain.documents import ArticleDocument

# Retrieve all articles by a specific author
author_uuid = "550e8400-e29b-41d4-a716-446655440000"
articles = ArticleDocument.bulk_find(author_id=author_uuid)

print(f"Found {len(articles)} articles")
for article in articles:
    # str() keeps the preview working whether content is a string or a dict
    print(f"  - {article.id}: {str(article.content)[:80]}...")
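
Since every document subclass exposes the same class method, a caller can gather several document types for one author in a single pass. The sketch below assumes a hypothetical helper, fetch_all_for_author, and uses stub classes in place of ArticleDocument and PostDocument so that it runs without a database.

```python
def fetch_all_for_author(document_classes, author_id: str) -> dict:
    """Hypothetical helper: call bulk_find on each class and group results by class name."""
    return {cls.__name__: cls.bulk_find(author_id=author_id) for cls in document_classes}


class StubArticle:
    """Stand-in for a document subclass; returns canned data instead of querying MongoDB."""

    @classmethod
    def bulk_find(cls, **filter_options):
        return [f"article for {filter_options['author_id']}"]


class StubPost:
    """Stand-in for a document subclass with no matching documents."""

    @classmethod
    def bulk_find(cls, **filter_options):
        return []


grouped = fetch_all_for_author([StubArticle, StubPost], author_id="u1")
for name, items in grouped.items():
    print(f"{name}: {len(items)} document(s)")
```

With the real classes, the list passed in would be the imported document subclasses, and each value would be a list of typed model instances.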

External Dependencies

Dependency | Purpose
pymongo | MongoDB driver; provides collection.find() for executing queries
pydantic | Data validation and serialization; NoSQLBaseDocument extends Pydantic's BaseModel
loguru | Structured logging; used for exception reporting on query failure

Design Notes

  • The method is a classmethod, meaning it is called on the document subclass itself (e.g., ArticleDocument.bulk_find(...)), and the returned list contains instances of that specific subclass.
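  • The use of **filter_options lets callers write simple equality filters as keyword arguments (e.g., author_id=...) instead of building a query dictionary. Operator expressions can still be passed as values, but field names that are not valid Python identifiers, such as dotted paths, must be supplied by unpacking a dictionary.
  • Returning an empty list on exception favors pipeline resilience over strict error propagation. The trade-off is that a failed query is indistinguishable from an empty result unless the caller checks the logs.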
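
Where a caller needs to tell a failed query from an empty result, one option is a strict variant that re-raises instead of returning an empty list. The sketch below is an assumption for illustration, not part of the repository: strict_bulk_find and QueryFailed are hypothetical names, and the query and deserialization steps are passed in as plain callables so the example runs on its own.

```python
class QueryFailed(RuntimeError):
    """Hypothetical error type signalling that the lookup itself failed."""


def strict_bulk_find(find, deserialize, **filter_options) -> list:
    """Like the fail-safe version, but propagate failures instead of returning []."""
    try:
        return [deserialize(raw) for raw in find(filter_options)]
    except Exception as exc:
        # Chain the original exception so the root cause stays visible.
        raise QueryFailed(f"bulk find failed for filter {filter_options!r}") from exc


def broken_find(query):
    """Simulates a database that cannot be reached."""
    raise ConnectionError("database unreachable")


empty = strict_bulk_find(lambda query: [], dict, author_id="u1")

try:
    strict_bulk_find(broken_find, dict, author_id="u1")
    outcome = "no error"
except QueryFailed as err:
    outcome = type(err.__cause__).__name__
```

Here an empty result still comes back as an empty list, while the simulated outage surfaces as QueryFailed with the original ConnectionError attached as its cause.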
