Implementation:PacktPublishing LLM Engineers Handbook UserDocument Get Or Create

From Leeroopedia


Aspect | Detail
Type | API Doc
API | UserDocument.get_or_create(first_name: str, last_name: str) -> UserDocument
Source | llm_engineering/domain/base/nosql.py:L79-93 (base), steps/etl/get_or_create_user.py:L7-33 (step)
Import | from llm_engineering.domain.documents import UserDocument
Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_User_Resolution

Overview

The UserDocument.get_or_create method resolves a user identity by searching for an existing UserDocument in MongoDB by first and last name, or creating and persisting a new one if no match is found. This is the concrete implementation of the User Resolution principle within the Digital Data ETL pipeline.

Signature

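# Defined on the NoSQLBaseDocument base class (shown simplified);
# T is the concrete subclass type, e.g. UserDocument.
@classmethod
def get_or_create(cls: type[T], **filter_options) -> T:
    # Look up an existing document whose fields match every filter option.
    instance = cls.find(**filter_options)
    if instance:
        return instance
    # No match: build a new document from the same fields and persist it.
    new_instance = cls(**filter_options)
    new_instance.save()
    return new_instance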
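To see the pattern in isolation, here is a minimal, stdlib-only sketch of the same find-then-create logic. It is not the repository's code: FakeBaseDocument and FakeUserDocument are hypothetical stand-ins, a class-level dict of lists replaces the MongoDB collection, and a dataclass with a uuid4 default replaces the Pydantic model. What it illustrates is that get_or_create is written once on the base class and, through the TypeVar T, returns an instance of whichever subclass it is called on.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, ClassVar, TypeVar

T = TypeVar("T", bound="FakeBaseDocument")


class FakeBaseDocument:
    """Hypothetical stand-in for NoSQLBaseDocument; no MongoDB involved."""

    # One list per subclass name, each standing in for a MongoDB collection.
    _collections: ClassVar[dict[str, list[Any]]] = {}

    @classmethod
    def collection(cls) -> list[Any]:
        return cls._collections.setdefault(cls.__name__, [])

    @classmethod
    def find(cls: type[T], **filter_options: Any) -> T | None:
        # Return the first stored document whose fields equal every filter value.
        for doc in cls.collection():
            if all(getattr(doc, key) == value for key, value in filter_options.items()):
                return doc
        return None

    def save(self) -> None:
        # "Insert" the document into its own class's collection.
        type(self).collection().append(self)

    @classmethod
    def get_or_create(cls: type[T], **filter_options: Any) -> T:
        instance = cls.find(**filter_options)
        if instance:
            return instance
        new_instance = cls(**filter_options)
        new_instance.save()
        return new_instance


@dataclass
class FakeUserDocument(FakeBaseDocument):
    """Hypothetical stand-in for the Pydantic UserDocument model."""

    first_name: str
    last_name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


user = FakeUserDocument.get_or_create(first_name="Paul", last_name="Iusztin")
again = FakeUserDocument.get_or_create(first_name="Paul", last_name="Iusztin")
print(user is again, len(FakeUserDocument.collection()))  # True 1
```

The second call finds the stored document and returns it, so only one document is ever saved for a given (first_name, last_name) pair in this single-process sketch.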

When called on UserDocument, the typical invocation is:

user = UserDocument.get_or_create(first_name="Paul", last_name="Iusztin")

Pipeline Step Wrapper

The ETL pipeline wraps this call in a ZenML step:

from zenml import step

from llm_engineering.domain.documents import UserDocument


@step
def get_or_create_user(user_full_name: str) -> UserDocument:
    """ZenML step that resolves user identity from a full name string."""

    first_name, last_name = split_user_full_name(user_full_name)
    user = UserDocument.get_or_create(first_name=first_name, last_name=last_name)

    return user


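def split_user_full_name(user_full_name: str) -> tuple[str, str]:
    """Splits a full name into (first_name, last_name).

    Keeps the first and last space-separated tokens: middle tokens are dropped,
    and a single-token name yields the same value for both fields.
    """

    name_tokens = user_full_name.split(" ")
    first_name = name_tokens[0]
    last_name = name_tokens[-1]

    return first_name, last_name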
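Because the helper keeps only the first and last tokens produced by split(" "), its results for unusual inputs are worth checking before relying on name-based lookups. The sketch below re-implements the same two-token logic in isolation and prints the (first_name, last_name) pair for a few hypothetical input names.

```python
def split_user_full_name(user_full_name: str) -> tuple[str, str]:
    # Same two-token logic as the helper above: split on single spaces,
    # then keep the first and last tokens.
    name_tokens = user_full_name.split(" ")
    return name_tokens[0], name_tokens[-1]


examples = [
    "Paul Iusztin",    # ('Paul', 'Iusztin')
    "Mary Ann Smith",  # ('Mary', 'Smith'): the middle token is dropped
    "Plato",           # ('Plato', 'Plato'): one token fills both fields
    "Paul  Iusztin",   # ('Paul', 'Iusztin'): the empty middle token is dropped
    " Paul Iusztin",   # ('', 'Iusztin'): a leading space yields an empty first name
]
for name in examples:
    print(repr(name), "->", split_user_full_name(name))
```

Two consequences follow. "Mary Ann Smith" and "Mary Smith" resolve to the same user, since only the outer tokens reach the MongoDB lookup. And because split(" ") splits on every single space rather than on runs of whitespace, a stray leading or trailing space produces an empty name; calling str.split() with no argument would ignore such whitespace, which is one possible hardening rather than what the helper above does.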

Inputs

Parameter | Type | Description
first_name | str | The user's first name, extracted from the full name string via split_user_full_name()
last_name | str | The user's last name, extracted from the full name string via split_user_full_name()

At the pipeline step level, the input is:

Parameter | Type | Description
user_full_name | str | Full name string (e.g., "Paul Iusztin"), split internally into first and last name

Outputs

Field | Type | Description
id | UUID4 | Unique identifier for the user document (auto-generated by Pydantic)
first_name | str | The resolved first name
last_name | str | The resolved last name

A newly created UserDocument is persisted to the MongoDB users collection. If the user already existed, the stored document is returned without modification.

Behavior

  1. The split_user_full_name() helper splits the input string on single space characters, taking the first token as first_name and the last token as last_name (any middle tokens are discarded)
  2. UserDocument.get_or_create(first_name=..., last_name=...) invokes the get_or_create class method that UserDocument inherits from NoSQLBaseDocument
  3. Internally, cls.find(**filter_options) queries MongoDB for a matching document
  4. If a match is found, it is returned directly (no write operation)
  5. If no match is found, a new UserDocument is instantiated with the provided fields and .save() is called to persist it to MongoDB
  6. The resulting UserDocument is returned as the ZenML step output for downstream consumption
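The six steps above can be traced end to end in a small, stdlib-only sketch. Everything in it is hypothetical: User stands in for the Pydantic UserDocument, FakeUsersCollection is an in-memory substitute for the MongoDB users collection that counts inserts, and resolve_user plays the role of the ZenML step without any ZenML dependency. Its purpose is to show that a repeat resolution performs no write and returns the document created the first time.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical stand-in for UserDocument."""

    first_name: str
    last_name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class FakeUsersCollection:
    """Hypothetical in-memory substitute for the MongoDB users collection."""

    def __init__(self) -> None:
        self._docs: list[User] = []
        self.write_count = 0

    def find_one(self, first_name: str, last_name: str) -> User | None:
        for doc in self._docs:
            if doc.first_name == first_name and doc.last_name == last_name:
                return doc
        return None

    def insert_one(self, doc: User) -> None:
        self.write_count += 1
        self._docs.append(doc)


def resolve_user(collection: FakeUsersCollection, user_full_name: str) -> User:
    # Step 1: keep the first and last space-separated tokens.
    tokens = user_full_name.split(" ")
    first_name, last_name = tokens[0], tokens[-1]
    # Steps 2-3: query for an existing document with both names.
    existing = collection.find_one(first_name, last_name)
    # Step 4: on a hit, return it without writing.
    if existing is not None:
        return existing
    # Step 5: on a miss, build a new document (fresh uuid4 id) and persist it.
    new_user = User(first_name=first_name, last_name=last_name)
    collection.insert_one(new_user)
    # Step 6: the resolved document is the output.
    return new_user


users = FakeUsersCollection()
first = resolve_user(users, "Paul Iusztin")
second = resolve_user(users, "Paul Iusztin")
print(first.id == second.id, users.write_count)  # True 1
```

Note that find-then-insert is two separate operations, so two runs resolving the same new name at the same moment could both miss and both insert. If that matters, MongoDB can do the lookup and insert in one call, for example with pymongo's Collection.find_one_and_update(filter, {"$setOnInsert": {...}}, upsert=True, return_document=ReturnDocument.AFTER), ideally backed by a unique index on (first_name, last_name); this is a suggested alternative, not what the pipeline does.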

External Dependencies

Dependency | Purpose
pymongo | MongoDB driver for database operations (find, insert)
pydantic | Data validation and serialization for document models
loguru | Structured logging for error reporting
zenml | Pipeline orchestration (step decorator and context)
