Implementation:PacktPublishing LLM Engineers Handbook UserDocument Get Or Create
| Aspect | Detail |
|---|---|
| Type | API Doc |
| API | UserDocument.get_or_create(first_name: str, last_name: str) -> UserDocument |
| Source | llm_engineering/domain/base/nosql.py:L79-93 (base), steps/etl/get_or_create_user.py:L7-33 (step) |
| Import | from llm_engineering.domain.documents import UserDocument |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_User_Resolution |
Overview
The UserDocument.get_or_create method resolves a user identity by searching for an existing UserDocument in MongoDB by first and last name, or creating and persisting a new one if no match is found. This is the concrete implementation of the User Resolution principle within the Digital Data ETL pipeline.
Signature
@classmethod
def get_or_create(cls, **filter_options) -> T | None:
    # In NoSQLBaseDocument base class
    instance = cls.find(**filter_options)
    if instance:
        return instance
    new_instance = cls(**filter_options)
    new_instance.save()
    return new_instance
When called on UserDocument, the typical invocation is:
user = UserDocument.get_or_create(first_name="Paul", last_name="Iusztin")
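The find-then-create flow can be tried without a database. The following is a minimal sketch, assuming an in-memory dictionary in place of the MongoDB collection; InMemoryUserDocument, _store, and save_calls are hypothetical names used only for illustration and are not part of the llm_engineering package.

```python
import uuid
from typing import Optional


class InMemoryUserDocument:
    """Hypothetical stand-in for UserDocument; a dict replaces the MongoDB collection."""

    _store: dict = {}  # maps document id -> instance (assumption: replaces the collection)
    save_calls = 0     # counts writes so the "no write on match" behavior is observable

    def __init__(self, first_name: str, last_name: str) -> None:
        self.id = uuid.uuid4()
        self.first_name = first_name
        self.last_name = last_name

    @classmethod
    def find(cls, **filter_options) -> Optional["InMemoryUserDocument"]:
        # Return the first stored document whose attributes match every filter.
        for doc in cls._store.values():
            if all(getattr(doc, key, None) == value for key, value in filter_options.items()):
                return doc
        return None

    def save(self) -> "InMemoryUserDocument":
        type(self)._store[self.id] = self
        type(self).save_calls += 1
        return self

    @classmethod
    def get_or_create(cls, **filter_options) -> "InMemoryUserDocument":
        # Same control flow as the method shown above: look up first, create on a miss.
        instance = cls.find(**filter_options)
        if instance:
            return instance
        new_instance = cls(**filter_options)
        new_instance.save()
        return new_instance


first = InMemoryUserDocument.get_or_create(first_name="Paul", last_name="Iusztin")
second = InMemoryUserDocument.get_or_create(first_name="Paul", last_name="Iusztin")
other = InMemoryUserDocument.get_or_create(first_name="Maxime", last_name="Labonne")

print(first is second)                  # the repeated call returns the stored instance
print(InMemoryUserDocument.save_calls)  # one write per distinct user
```

In this sketch the second call with the same names performs no write and returns the instance created by the first call, which is the property the pipeline relies on when it is re-run for the same user.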
Pipeline Step Wrapper
The ETL pipeline wraps this call in a ZenML step:
from zenml import step
from llm_engineering.domain.documents import UserDocument


@step
def get_or_create_user(user_full_name: str) -> UserDocument:
    """ZenML step that resolves user identity from a full name string."""
    first_name, last_name = split_user_full_name(user_full_name)
    user = UserDocument.get_or_create(first_name=first_name, last_name=last_name)
    return user


def split_user_full_name(user_full_name: str) -> tuple[str, str]:
    """Splits a full name into (first_name, last_name)."""
    name_tokens = user_full_name.split(" ")
    first_name = name_tokens[0]
    last_name = name_tokens[-1]
    return first_name, last_name
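Because the helper keeps only the first and last tokens, inputs other than a plain two-part name produce results worth knowing before they become lookup keys. The sketch below repeats the helper exactly as listed above and runs it on a few illustrative inputs; the sample names other than "Paul Iusztin" are made up for the example.

```python
def split_user_full_name(user_full_name: str) -> tuple[str, str]:
    """Same logic as the helper listed above: first token and last token."""
    name_tokens = user_full_name.split(" ")
    return name_tokens[0], name_tokens[-1]


# Illustrative inputs (assumptions, not taken from the pipeline's data).
examples = ["Paul Iusztin", "Jean Luc Picard", "Madonna", "Paul "]
results = {name: split_user_full_name(name) for name in examples}

for name, parts in results.items():
    print(repr(name), "->", parts)
# 'Paul Iusztin'    -> ('Paul', 'Iusztin')
# 'Jean Luc Picard' -> ('Jean', 'Picard')     middle token is dropped
# 'Madonna'         -> ('Madonna', 'Madonna') single token fills both fields
# 'Paul '           -> ('Paul', '')           trailing space yields an empty last name
```

Since get_or_create matches on these two fields, inputs that differ only in middle names or stray spaces may resolve to the same user or to a different one than intended; normalizing the string before the step is one way to avoid that.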
Inputs
| Parameter | Type | Description |
|---|---|---|
| first_name | str | The user's first name, extracted from the full name string via split_user_full_name() |
| last_name | str | The user's last name, extracted from the full name string via split_user_full_name() |
At the pipeline step level, the input is:
| Parameter | Type | Description |
|---|---|---|
| user_full_name | str | Full name string (e.g., "Paul Iusztin"), split internally into first and last name |
Outputs
| Field | Type | Description |
|---|---|---|
| id | UUID4 | Unique identifier for the user document (auto-generated by Pydantic) |
| first_name | str | The resolved first name |
| last_name | str | The resolved last name |
The UserDocument instance is persisted to the MongoDB users collection. If the user already existed, the existing document is returned without modification.
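To illustrate the output shape and why the lookup matters, here is a minimal sketch that uses a standard-library dataclass as a stand-in for the Pydantic model. UserRecord is a hypothetical name, and the uuid4 default factory is an assumption about how the id is generated, based on the UUID4 type in the table above.

```python
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class UserRecord:
    """Hypothetical stand-in for the fields listed in the Outputs table."""

    first_name: str
    last_name: str
    # Assumption: the real model fills id through a default factory in a similar way.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


a = UserRecord("Paul", "Iusztin")
b = UserRecord("Paul", "Iusztin")

print(sorted(asdict(a)))  # ['first_name', 'id', 'last_name']
print(a.id == b.id)       # False: each construction draws a fresh id
```

Two instances built from the same names receive different ids, so constructing a document on every run would create duplicates. Returning the stored document when one exists is what keeps the id stable across pipeline runs.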
Behavior
- The split_user_full_name() helper splits the input string on single space characters, taking the first token as first_name and the last token as last_name; any tokens in between are ignored
- The step then calls UserDocument.get_or_create(first_name=..., last_name=...), a class method inherited from NoSQLBaseDocument
- Internally, cls.find(**filter_options) queries MongoDB for a matching document
- If a match is found, it is returned directly (no write operation)
- If no match is found, a new UserDocument is instantiated with the provided fields and .save() is called to persist it to MongoDB
- The resulting UserDocument is returned as the ZenML step output for downstream consumption
External Dependencies
| Dependency | Purpose |
|---|---|
| pymongo | MongoDB driver for database operations (find, insert) |
| pydantic | Data validation and serialization for document models |
| loguru | Structured logging for error reporting |
| zenml | Pipeline orchestration (step decorator and context) |
Source References
- Base class method: llm_engineering/domain/base/nosql.py:L79-93
- Pipeline step: steps/etl/get_or_create_user.py:L7-33
- Domain model: llm_engineering/domain/documents.py
See Also
- Principle:PacktPublishing_LLM_Engineers_Handbook_User_Resolution -- the principle this implements
- Implementation:PacktPublishing_LLM_Engineers_Handbook_NoSQLBaseDocument_Save -- the underlying save mechanism
- Environment:PacktPublishing_LLM_Engineers_Handbook_Docker_MongoDB_Qdrant_Infrastructure