Implementation:PacktPublishing LLM Engineers Handbook NoSQLBaseDocument Save

From Leeroopedia


Type: API Doc
API: NoSQLBaseDocument.save(self) -> T | None; NoSQLBaseDocument.bulk_insert(cls, documents: list[T]) -> bool
Source: llm_engineering/domain/base/nosql.py:L67-76 (save), llm_engineering/domain/base/nosql.py:L96-105 (bulk_insert)
Import: from llm_engineering.domain.base.nosql import NoSQLBaseDocument
Implements: Principle:PacktPublishing_LLM_Engineers_Handbook_Document_Persistence

Overview

The NoSQLBaseDocument base class provides the persistence layer for all domain documents in the Digital Data ETL pipeline. The save() instance method persists a single document to MongoDB, while the bulk_insert() class method persists a list of documents through a single insert_many() call. Both methods serialize documents with to_mongo(), catch write errors, log them, and report failure through their return value instead of raising.

save() Method

Signature

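from loguru import logger
from pymongo.errors import WriteError

# Instance method of NoSQLBaseDocument
def save(self) -> T | None:
    try:
        collection = self._get_collection()
        # to_mongo() produces the dict (keyed by "_id") that MongoDB stores
        collection.insert_one(self.to_mongo())
        return self
    except WriteError:
        # DuplicateKeyError (duplicate _id) subclasses WriteError, so it is handled here too
        logger.exception("Failed to insert document.")
        return None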

Parameters

Parameter | Type | Description
self | NoSQLBaseDocument (subclass instance) | The document instance to persist. Must be a fully initialized Pydantic model with all required fields populated.

Returns

Return value | Condition
self (the document instance) | Successful insertion into MongoDB
None | A pymongo.errors.WriteError occurred (logged via loguru)

Behavior

  1. Calls self._get_collection() to obtain the pymongo Collection object for this document type (determined by the Settings.name attribute on the subclass)
  2. Calls self.to_mongo() to serialize the Pydantic model to a MongoDB-compatible dictionary (mapping id to _id and converting UUID values to strings)
  3. Calls collection.insert_one() to persist the serialized document
  4. Returns self on success, allowing method chaining
  5. On WriteError (e.g., duplicate key, validation failure at the database level), logs the exception and returns None
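The return contract above can be exercised without a database. The sketch below is stdlib-only and hedged: WriteError, InMemoryCollection and ArticleSketch are hypothetical stand-ins for pymongo.errors.WriteError, a pymongo Collection and a Pydantic document, and the collection merely mimics MongoDB's unique _id index. Saving the same instance twice shows the success path returning the instance and the duplicate-key path returning None.

```python
import uuid
from dataclasses import dataclass, field


class WriteError(Exception):
    """Hypothetical stand-in for pymongo.errors.WriteError."""


class InMemoryCollection:
    """Hypothetical collection that enforces a unique _id, as MongoDB does."""

    def __init__(self):
        self.docs = {}

    def insert_one(self, doc: dict) -> None:
        if doc["_id"] in self.docs:
            raise WriteError(f"duplicate key: {doc['_id']}")
        self.docs[doc["_id"]] = doc


COLLECTION = InMemoryCollection()


@dataclass
class ArticleSketch:
    platform: str
    link: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def to_mongo(self) -> dict:
        # Map the Python-side "id" to MongoDB's "_id", stored as a string.
        return {"_id": str(self.id), "platform": self.platform, "link": self.link}

    def save(self):
        try:
            COLLECTION.insert_one(self.to_mongo())
            return self  # success: hand back the same instance
        except WriteError:
            # The documented method logs via loguru's logger.exception(); omitted here.
            return None


article = ArticleSketch(platform="medium", link="https://medium.com/@user/article-title")
first = article.save()   # inserted; returns the instance itself
second = article.save()  # same _id again; the error is swallowed and None is returned
```

Because a failed save() returns None rather than raising, a chained call such as article.save().link would fail with AttributeError on the error path, so callers should check for None first.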

bulk_insert() Method

Signature

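from loguru import logger
from pymongo.errors import BulkWriteError, WriteError

# Class method of NoSQLBaseDocument
@classmethod
def bulk_insert(cls, documents: list[T], **kwargs) -> bool:
    collection = cls._get_collection()
    try:
        collection.insert_many([doc.to_mongo() for doc in documents])
        return True
    except (WriteError, BulkWriteError):
        # insert_many() reports per-document failures as BulkWriteError,
        # which is not a WriteError subclass, so both must be caught
        logger.exception("Failed to insert documents.")
        return False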

Parameters

Parameter | Type | Description
documents | list[T] | A list of document instances to persist in a single batch operation

Returns

Return value | Condition
True | All documents were successfully inserted
False | A pymongo.errors.WriteError occurred (logged via loguru)

Behavior

  1. Calls cls._get_collection() to obtain the target collection
  2. Serializes all documents via list comprehension: [doc.to_mongo() for doc in documents]
  3. Calls collection.insert_many() to insert all documents in one driver call (pymongo splits very large inputs into several server batches as needed)
  4. Returns True on success
  5. On WriteError, logs the exception and returns False

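Note: insert_many() is not atomic. Documents written before a failure stay in the collection. With ordered=True (the default), MongoDB stops at the first failing document; with ordered=False, it attempts every document and reports all failures together. Because bulk_insert() does not pass ordered, it uses the default.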
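To make the ordered semantics concrete, here is a stdlib-only model of how insert_many() behaves when one document in a batch collides with an existing _id. insert_many_sketch and the BulkWriteError stand-in are hypothetical; the real pymongo exception reports what was written through its details dict (for example the nInserted and writeErrors fields) rather than an inserted attribute.

```python
class BulkWriteError(Exception):
    """Hypothetical stand-in for pymongo.errors.BulkWriteError."""

    def __init__(self, inserted):
        super().__init__("batch had write errors")
        self.inserted = inserted  # ids written before/around the failures


def insert_many_sketch(store: dict, docs: list[dict], ordered: bool = True) -> list:
    """Model of insert_many's ordered/unordered behaviour on duplicate _id values."""
    inserted, failed = [], False
    for doc in docs:
        if doc["_id"] in store:
            failed = True
            if ordered:
                break     # ordered=True: stop at the first error
            continue      # ordered=False: skip the bad document and keep going
        store[doc["_id"]] = doc
        inserted.append(doc["_id"])
    if failed:
        # Writes made before the error are not rolled back.
        raise BulkWriteError(inserted)
    return inserted


batch = [{"_id": "a"}, {"_id": "b"}, {"_id": "a"}, {"_id": "c"}]

ordered_store = {}
try:
    insert_many_sketch(ordered_store, batch, ordered=True)
except BulkWriteError as exc:
    ordered_result = exc.inserted    # ['a', 'b']

unordered_store = {}
try:
    insert_many_sketch(unordered_store, batch, ordered=False)
except BulkWriteError as exc:
    unordered_result = exc.inserted  # ['a', 'b', 'c']
```

In both modes the documents written before the error remain in the collection, so bulk_insert() returning False does not necessarily mean nothing was persisted.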

Supporting Methods

_get_collection()

@classmethod
def _get_collection(cls) -> Collection:
    # Returns the pymongo Collection object for this document type
    # Uses cls.Settings.name to determine the collection name
    ...

Routes persistence to the correct MongoDB collection based on the document subclass's Settings.name attribute.

to_mongo()

def to_mongo(self) -> dict:
    # Serializes the Pydantic model to a MongoDB-compatible dict
    # Converts UUID fields to strings
    # Maps 'id' to '_id' for MongoDB's document identifier convention
    ...

Handles the translation between Pydantic's Python types and MongoDB's BSON types.
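The to_mongo() translation can be sketched with a dataclass, using dataclasses.asdict as a stand-in for Pydantic's model_dump(). PostSketch and its fields are hypothetical; the two conversions shown, id to _id and UUID values to strings, are the ones the stub above lists.

```python
import uuid
from dataclasses import asdict, dataclass


@dataclass
class PostSketch:
    id: uuid.UUID
    author_id: uuid.UUID
    platform: str

    def to_mongo(self) -> dict:
        parsed = asdict(self)
        # MongoDB's primary key is "_id"; store the UUID as a string.
        parsed["_id"] = str(parsed.pop("id"))
        # Any other UUID-valued field (e.g. a foreign key) is stringified too.
        for key, value in parsed.items():
            if isinstance(value, uuid.UUID):
                parsed[key] = str(value)
        return parsed


post = PostSketch(id=uuid.UUID(int=1), author_id=uuid.UUID(int=2), platform="linkedin")
doc = post.to_mongo()
```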

from_mongo()

@classmethod
def from_mongo(cls, data: dict) -> T:
    # Deserializes a MongoDB document dict back to a Pydantic model
    # Converts '_id' back to 'id'
    # Reconstructs UUID and datetime fields
    ...

The inverse of to_mongo(), used by find() and get_or_create().
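The reverse direction can be sketched with a hypothetical dataclass, RepositorySketch. The manual uuid.UUID(...) call stands in for the type coercion a Pydantic UUID field would perform during validation, and copying the input keeps the caller's dict intact.

```python
import uuid
from dataclasses import dataclass


@dataclass
class RepositorySketch:
    id: uuid.UUID
    name: str

    @classmethod
    def from_mongo(cls, data: dict) -> "RepositorySketch":
        if not data:
            raise ValueError("Data is empty.")
        data = dict(data)          # copy so the caller's dict is not mutated
        raw_id = data.pop("_id")   # MongoDB's "_id" becomes the Python-side "id"
        return cls(id=uuid.UUID(raw_id), **data)


raw = {"_id": "00000000-0000-0000-0000-00000000000a", "name": "example-repo"}
repo = RepositorySketch.from_mongo(raw)
```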

Document Type Mapping

Document class | Collection name | Used by
UserDocument | users | User resolution step
ArticleDocument | articles | MediumCrawler, CustomArticleCrawler
PostDocument | posts | LinkedInCrawler
RepositoryDocument | repositories | GithubCrawler

Usage Examples

Single Document Save

from loguru import logger

from llm_engineering.domain.documents import ArticleDocument

# `user` is the UserDocument obtained in the user resolution step
article = ArticleDocument(
    platform="medium",
    link="https://medium.com/@user/article-title",
    content="Full article text content...",
    author_id=user.id,
)

saved = article.save()
if saved is None:
    logger.error("Failed to save article")

Bulk Insert

from loguru import logger

from llm_engineering.domain.documents import RepositoryDocument

repo_docs = [
    RepositoryDocument(name="file1.py", content="...", link="..."),
    RepositoryDocument(name="file2.py", content="...", link="..."),
    RepositoryDocument(name="file3.py", content="...", link="..."),
]

success = RepositoryDocument.bulk_insert(repo_docs)
if not success:
    logger.error("Failed to bulk insert repository documents")

External Dependencies

Dependency | Purpose
pymongo | MongoDB driver providing Collection.insert_one() and Collection.insert_many()
pydantic | Base model providing data validation, serialization, and type safety
loguru | Structured exception logging on persistence failures

Error Handling

Both methods catch pymongo.errors.WriteError and:

  1. Log the full exception traceback via logger.exception()
  2. Return a failure indicator (None for save(), False for bulk_insert())
  3. Do not re-raise the exception, allowing the pipeline to continue processing other documents

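Common WriteError causes:

  • Duplicate _id (DuplicateKeyError, a WriteError subclass, raised when the document already exists)
  • Database-level validation failures (for example, a collection with a $jsonSchema validator)

Errors that an except WriteError clause does not catch:

  • Failures inside insert_many(), such as a duplicate _id in the batch, which pymongo reports as BulkWriteError (a sibling of WriteError under OperationFailure, not a subclass)
  • Documents over MongoDB's 16MB BSON size limit, which pymongo rejects client-side with DocumentTooLarge
  • Network or connection errors (ConnectionFailure and its subclasses, such as AutoReconnect)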
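Which failures reach the except WriteError branch is decided by pymongo's exception hierarchy. The sketch below mirrors the relevant part of pymongo.errors with empty stand-in classes so it runs without pymongo installed; the class names and parent classes match pymongo, but the bodies are placeholders. With pymongo available, issubclass(pymongo.errors.BulkWriteError, pymongo.errors.WriteError) gives the same answer: False.

```python
# Stand-ins mirroring pymongo.errors (same names and parents, empty bodies).
class PyMongoError(Exception): ...
class OperationFailure(PyMongoError): ...
class WriteError(OperationFailure): ...
class DuplicateKeyError(WriteError): ...
class BulkWriteError(OperationFailure): ...
class ConnectionFailure(PyMongoError): ...


def caught_by_write_error_handler(exc: Exception) -> bool:
    """Return True if an `except WriteError:` clause would swallow this exception."""
    try:
        raise exc
    except WriteError:
        return True
    except Exception:
        return False


results = {
    name: caught_by_write_error_handler(exc_type())
    for name, exc_type in [
        ("DuplicateKeyError", DuplicateKeyError),  # insert_one with a duplicate _id
        ("BulkWriteError", BulkWriteError),        # failures inside insert_many
        ("ConnectionFailure", ConnectionFailure),  # network problems
    ]
}
# {'DuplicateKeyError': True, 'BulkWriteError': False, 'ConnectionFailure': False}
```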
