Implementation:PacktPublishing LLM Engineers Handbook NoSQLBaseDocument Save
| Aspect | Detail |
|---|---|
| Type | API Doc |
| API | NoSQLBaseDocument.save(self) -> T \| None, NoSQLBaseDocument.bulk_insert(cls, documents: list[T]) -> bool |
| Source | llm_engineering/domain/base/nosql.py:L67-76 (save), llm_engineering/domain/base/nosql.py:L96-105 (bulk_insert) |
| Import | from llm_engineering.domain.base.nosql import NoSQLBaseDocument |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Document_Persistence |
Overview
The NoSQLBaseDocument base class provides the persistence layer for all domain documents in the Digital Data ETL pipeline. The save() instance method persists a single document to MongoDB, while the bulk_insert() class method persists multiple documents in a single batch operation. Both methods serialize documents via to_mongo(), log failures, and return a failure indicator instead of raising.
save() Method
Signature
def save(self) -> T | None:
    try:
        collection = self._get_collection()
        result = collection.insert_one(self.to_mongo())
        return self
    except WriteError:
        logger.exception("Failed to insert document.")
        return None
Parameters
| Parameter | Type | Description |
|---|---|---|
| self | NoSQLBaseDocument (subclass instance) | The document instance to persist. Must be a fully initialized Pydantic model with all required fields populated. |
Returns
| Return Value | Condition |
|---|---|
| self (the document instance) | Successful insertion into MongoDB |
| None | A pymongo.errors.WriteError occurred (logged via loguru) |
Behavior
- Calls self._get_collection() to obtain the pymongo Collection object for this document type (determined by the Settings.name attribute on the subclass)
- Calls self.to_mongo() to serialize the Pydantic model to a MongoDB-compatible dictionary (handling UUID-to-string conversion, datetime serialization, etc.)
- Calls collection.insert_one() to persist the serialized document
- Returns self on success, allowing method chaining
- On WriteError (e.g., duplicate key, validation failure at the database level), logs the exception and returns None
bulk_insert() Method
Signature
@classmethod
def bulk_insert(cls, documents: list[T], **kwargs) -> bool:
    collection = cls._get_collection()
    try:
        collection.insert_many([doc.to_mongo() for doc in documents])
        return True
    except WriteError:
        logger.exception("Failed to insert documents.")
        return False
Parameters
| Parameter | Type | Description |
|---|---|---|
| documents | list[T] | A list of document instances to persist in a single batch operation |
Returns
| Return Value | Condition |
|---|---|
| True | All documents were successfully inserted |
| False | A pymongo.errors.WriteError occurred (logged via loguru) |
Behavior
- Calls cls._get_collection() to obtain the target collection
- Serializes all documents via list comprehension: [doc.to_mongo() for doc in documents]
- Calls collection.insert_many() to insert all documents in a single database operation
- Returns True on success
- On WriteError, logs the exception and returns False
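Note: insert_many is not atomic. Its behavior on failure depends on the ordered parameter: with the default ordered=True, documents are inserted in list order and the operation stops at the first error, leaving the documents inserted before it in the collection; with ordered=False, the server attempts every document and reports all errors at the end. pymongo reports insert_many write failures as pymongo.errors.BulkWriteError, which is not a subclass of WriteError, so such failures may propagate past the handler shown above instead of returning False.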
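To make the effect of the ordered parameter concrete, the sketch below simulates it in plain Python. simulate_insert_many is a hypothetical helper, not a pymongo function; it only mirrors the behavior that an ordered insert stops at the first failure while an unordered one attempts every document.

```python
def simulate_insert_many(store, docs, ordered=True):
    """Hypothetical simulation of insert_many semantics on a dict keyed by _id.

    Returns the list of _id values that were rejected as duplicates.
    """
    rejected = []
    for doc in docs:
        if doc["_id"] in store:
            rejected.append(doc["_id"])
            if ordered:
                # Ordered mode: stop at the first failure; earlier inserts stay.
                break
            continue
        store[doc["_id"]] = doc
    return rejected


batch = [{"_id": "a"}, {"_id": "b"}, {"_id": "a"}, {"_id": "c"}]

ordered_store = {}
ordered_rejected = simulate_insert_many(ordered_store, batch, ordered=True)

unordered_store = {}
unordered_rejected = simulate_insert_many(unordered_store, batch, ordered=False)
```

In the ordered run the document "c" is never attempted, so a caller whose batch fails cannot assume the collection is unchanged.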
Supporting Methods
_get_collection()
@classmethod
def _get_collection(cls) -> Collection:
    # Returns the pymongo Collection object for this document type
    # Uses cls.Settings.name to determine the collection name
    ...
Routes persistence to the correct MongoDB collection based on the document subclass's Settings.name attribute.
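The routing idea can be illustrated with a nested Settings class and a class-level lookup. This is a sketch under assumptions: BaseDoc, get_collection_name, and the subclasses are hypothetical names, and the real method returns a pymongo Collection rather than a string.

```python
class BaseDoc:
    """Hypothetical base class; the real one resolves a pymongo Collection."""

    @classmethod
    def get_collection_name(cls):
        # Look up the nested Settings class declared on the subclass.
        settings = getattr(cls, "Settings", None)
        name = getattr(settings, "name", None)
        if name is None:
            raise AttributeError(f"{cls.__name__} must define Settings.name")
        return name


class ArticleDoc(BaseDoc):
    class Settings:
        name = "articles"


class RepositoryDoc(BaseDoc):
    class Settings:
        name = "repositories"


class Unconfigured(BaseDoc):
    """Subclass without Settings; the lookup raises AttributeError."""
```

Keeping the collection name on the subclass means the base class needs no per-type branching: each new document type only declares its own Settings.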
to_mongo()
def to_mongo(self) -> dict:
    # Serializes the Pydantic model to a MongoDB-compatible dict
    # Converts UUID fields to strings
    # Maps 'id' to '_id' for MongoDB's document identifier convention
    ...
Handles the translation between Pydantic's Python types and MongoDB's BSON types.
from_mongo()
@classmethod
def from_mongo(cls, data: dict) -> T:
    # Deserializes a MongoDB document dict back to a Pydantic model
    # Converts '_id' back to 'id'
    # Reconstructs UUID and datetime fields
    ...
The inverse of to_mongo(), used by find() and get_or_create().
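The round trip between to_mongo() and from_mongo() can be sketched with a standard-library dataclass in place of the Pydantic model. SketchDoc is a hypothetical class; only the id/_id renaming and the UUID-to-string conversion described above are reproduced.

```python
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class SketchDoc:
    title: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def to_mongo(self):
        data = asdict(self)
        # Rename id to _id and store the UUID as a string.
        data["_id"] = str(data.pop("id"))
        return data

    @classmethod
    def from_mongo(cls, data):
        fields = dict(data)  # copy so the caller's dict is not mutated
        # Rename _id back to id and rebuild the UUID.
        fields["id"] = uuid.UUID(fields.pop("_id"))
        return cls(**fields)


original = SketchDoc(title="example")
stored = original.to_mongo()
restored = SketchDoc.from_mongo(stored)
```

The restored instance compares equal to the original, which is the property find() and get_or_create() rely on when they rebuild documents from query results.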
Document Type Mapping
| Document Class | Collection Name | Used By |
|---|---|---|
| UserDocument | users | User resolution step |
| ArticleDocument | articles | MediumCrawler, CustomArticleCrawler |
| PostDocument | posts | LinkedInCrawler |
| RepositoryDocument | repositories | GithubCrawler |
Usage Examples
Single Document Save
from loguru import logger

from llm_engineering.domain.documents import ArticleDocument

# `user` is assumed to be a previously resolved UserDocument.
article = ArticleDocument(
    platform="medium",
    link="https://medium.com/@user/article-title",
    content="Full article text content...",
    author_id=user.id,
)

# save() returns None instead of raising on a WriteError.
saved = article.save()
if saved is None:
    logger.error("Failed to save article")
Bulk Insert
from loguru import logger

from llm_engineering.domain.documents import RepositoryDocument

repo_docs = [
    RepositoryDocument(name="file1.py", content="...", link="..."),
    RepositoryDocument(name="file2.py", content="...", link="..."),
    RepositoryDocument(name="file3.py", content="...", link="..."),
]

# bulk_insert() returns False instead of raising on a WriteError.
success = RepositoryDocument.bulk_insert(repo_docs)
if not success:
    logger.error("Failed to bulk insert repository documents")
External Dependencies
| Dependency | Purpose |
|---|---|
| pymongo | MongoDB driver providing Collection.insert_one() and Collection.insert_many() |
| pydantic | Base model providing data validation, serialization, and type safety |
| loguru | Structured exception logging on persistence failures |
Error Handling
Both methods catch pymongo.errors.WriteError and:
- Log the full exception traceback via logger.exception()
- Return a failure indicator (None for save(), False for bulk_insert())
- Do not re-raise the exception, allowing the pipeline to continue processing other documents
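On the caller side, the non-raising contract lets a pipeline step keep going and report failures at the end. The sketch below assumes only that each document exposes a save() method returning None on failure; save_all and StubDocument are hypothetical names introduced for illustration.

```python
def save_all(documents):
    """Save each document, collecting the ones whose save() returned None."""
    saved, failed = [], []
    for doc in documents:
        # save() signals failure with None instead of raising.
        if doc.save() is None:
            failed.append(doc)
        else:
            saved.append(doc)
    return saved, failed


class StubDocument:
    """Hypothetical stand-in whose save() outcome is fixed at construction."""

    def __init__(self, name, succeeds):
        self.name = name
        self.succeeds = succeeds

    def save(self):
        return self if self.succeeds else None


docs = [StubDocument("a", True), StubDocument("b", False), StubDocument("c", True)]
saved, failed = save_all(docs)
```

Collecting the failed documents rather than only logging them leaves room for a retry or a summary report after the loop.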
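Common WriteError causes:
- Duplicate _id (document already exists); pymongo raises DuplicateKeyError, a WriteError subclass
- Database-level validation failures
Failures that pymongo reports through other exception types are not caught by these handlers and propagate to the caller. Examples include a document exceeding MongoDB's 16 MB size limit (pymongo.errors.DocumentTooLarge) and network or connection errors (pymongo.errors.ConnectionFailure and its subclasses).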
Source References
- save method: llm_engineering/domain/base/nosql.py:L67-76
- bulk_insert method: llm_engineering/domain/base/nosql.py:L96-105
- Full base module: llm_engineering/domain/base/nosql.py
See Also
- Principle:PacktPublishing_LLM_Engineers_Handbook_Document_Persistence -- the principle this implements
- Implementation:PacktPublishing_LLM_Engineers_Handbook_UserDocument_Get_Or_Create -- uses save() internally for user creation
- Implementation:PacktPublishing_LLM_Engineers_Handbook_BaseCrawler_Extract -- crawlers that call save() and bulk_insert()
- Environment:PacktPublishing_LLM_Engineers_Handbook_Docker_MongoDB_Qdrant_Infrastructure