
Principle:Spotify Luigi NoSQL Data Targets

From Leeroopedia
Revision as of 17:36, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Spotify_Luigi_NoSQL_Data_Targets.md)


Knowledge Sources
Domains: Database, NoSQL
Last Updated: 2026-02-10 08:00 GMT

Overview

Using NoSQL databases as pipeline data targets and completion markers for flexible-schema data processing.

Description

NoSQL data targets integrate document stores, key-value databases, and other non-relational storage systems as first-class output targets within a data pipeline. Traditional pipeline targets are often files on a filesystem, but many modern data workflows need to write results into NoSQL databases such as MongoDB, Cassandra, DynamoDB, or Redis, where applications that require low-latency access, flexible schemas, or horizontal scalability can consume them. Treating a NoSQL database as a pipeline target means the pipeline can check whether a task's output already exists in the database (completion checking), write structured documents or records as output, and give downstream tasks a reference to the stored data. NoSQL storage thereby participates in the pipeline's dependency resolution and idempotency guarantees.
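
The three responsibilities described above (completion check, write, downstream reference) can be sketched in Python, the language Luigi is written in. This is a minimal illustration under stated assumptions, not Luigi's own API: `InMemoryCollection`, `DocumentTarget`, and `run_if_incomplete` are hypothetical names, and a plain dict stands in for a real document store.

```python
# Hypothetical sketch: a dict stands in for a document-store collection.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class InMemoryCollection:
    """Stand-in for a NoSQL collection, keyed by document id (assumption)."""
    docs: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class DocumentTarget:
    """Points at one document; the class and method names are illustrative."""
    collection: InMemoryCollection
    document_key: str

    def exists(self) -> bool:
        # Completion check: the task is done if its output document is present.
        return self.document_key in self.collection.docs

    def write(self, document: Dict[str, Any]) -> None:
        # Store a copy of the task's output under the target's key.
        self.collection.docs[self.document_key] = dict(document)

    def read(self) -> Optional[Dict[str, Any]]:
        # Downstream tasks use the same reference to fetch the stored data.
        return self.collection.docs.get(self.document_key)


def run_if_incomplete(
    target: DocumentTarget, compute: Callable[[], Dict[str, Any]]
) -> bool:
    """Run `compute` only when the target is missing; return True if it ran."""
    if target.exists():
        return False
    target.write(compute())
    return True


collection = InMemoryCollection()
target = DocumentTarget(collection, "daily_report:2026-02-10")
first_run = run_if_incomplete(target, lambda: {"rows": 3})
second_run = run_if_incomplete(target, lambda: {"rows": 999})  # skipped
```

In this sketch the second call does nothing because the document already exists, which is the behaviour a scheduler relies on when it decides whether a task still needs to run.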

Usage

Use NoSQL data targets when pipeline outputs need to be consumed by applications that read from document stores or key-value databases, when the output data has a flexible or evolving schema that benefits from schemaless storage, or when the pipeline needs to write into NoSQL systems that serve as the operational data layer for web applications or microservices.

Theoretical Basis

NoSQL data targets adapt the target abstraction pattern to non-relational storage systems:

1. Target Identification -- A NoSQL target is identified by a composite key that specifies where the data resides:
   target_id = (connection, database, collection, document_key)
   Unlike file targets identified by a single path, NoSQL targets require multiple coordinates to locate data.
2. Existence Check -- The completion check queries the database for the presence of a specific document or marker:
   IF document with key K exists in collection C THEN task is complete
   This may check for a specific completion marker document or for the presence of expected data records.
3. Write Semantics -- Writing to a NoSQL target involves inserting or upserting documents into the database. Key considerations include:
   * Atomicity -- Individual document writes are typically atomic in document stores, but multi-document writes may not be. The pipeline must handle partial writes, for example by recording a completion marker only after every document has been written.
   * Idempotency -- Writes should use upsert semantics (insert or update) keyed on a deterministic identifier, so that re-running a task produces the same result without duplicating data.
   * Bulk Operations -- For performance, writes are batched into bulk operations rather than individual inserts.
4. Schema Flexibility -- Unlike relational targets, NoSQL targets do not enforce a fixed schema. The pipeline task defines the document structure, and different runs may write documents with evolving structures. This flexibility is both an advantage (adaptability) and a risk (schema drift) that must be managed through validation.
5. Read Access -- Downstream tasks or applications read from the NoSQL target using the database's query interface. The pipeline provides connection information and collection/key references.
6. Connection Management -- Database connections are established using connection strings or client configurations, with connection pooling for efficiency. Authentication credentials are managed securely outside the pipeline definition.
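
The identification, existence-check, and write-semantics points above can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: `TargetId`, `bulk_upsert`, `is_complete`, the `user:<id>` key scheme, and the `REQUIRED_FIELDS` check are hypothetical, and a nested dict replaces a real database client. The marker document is written last so that a failed run is not mistaken for a complete one.

```python
from typing import Any, Dict, Iterable, NamedTuple, Tuple


class TargetId(NamedTuple):
    """Composite coordinates of a NoSQL target (connection is a label here)."""
    connection: str
    database: str
    collection: str
    document_key: str


# Hypothetical store: (connection, database, collection) -> {key: document}.
Store = Dict[Tuple[str, ...], Dict[str, Dict[str, Any]]]

REQUIRED_FIELDS = {"user_id", "score"}  # illustrative guard against schema drift


def validate(doc: Dict[str, Any]) -> None:
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"document missing fields: {sorted(missing)}")


def bulk_upsert(
    store: Store,
    target: TargetId,
    docs: Iterable[Dict[str, Any]],
    batch_size: int = 2,
) -> int:
    """Upsert docs in batches, then write the completion marker last."""
    docs = list(docs)
    for doc in docs:
        validate(doc)  # reject malformed documents before any write happens
    coll = store.setdefault(tuple(target[:3]), {})
    batches = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # dict.update simulates one bulk operation; keys are deterministic,
        # so a re-run overwrites the same entries instead of adding new ones.
        coll.update({f"user:{d['user_id']}": dict(d) for d in batch})
        batches += 1
    # The marker is only reached if every batch above succeeded.
    coll[target.document_key] = {"complete": True, "count": len(docs)}
    return batches


def is_complete(store: Store, target: TargetId) -> bool:
    """Existence check: is the marker document present in the collection?"""
    return target.document_key in store.get(tuple(target[:3]), {})


store: Store = {}
target = TargetId("primary", "analytics", "scores", "_marker:2026-02-10")
rows = [
    {"user_id": 1, "score": 10},
    {"user_id": 2, "score": 7},
    {"user_id": 3, "score": 4},
]
batches = bulk_upsert(store, target, rows)
bulk_upsert(store, target, rows)  # re-run leaves the same documents in place
```

With a real document store the same shape would usually map onto the client's bulk upsert facility; the dict version only demonstrates the ordering and keying decisions.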

The fundamental design principle is storage-agnostic completion semantics: the pipeline's dependency resolution works the same way whether a target is a file, a database record, or a NoSQL document.
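
As a rough illustration of storage-agnostic completion semantics, the sketch below gives a file-backed target and a key-value-backed target the same `exists()` method, so one piece of resolution logic handles both. Luigi targets expose an `exists()` method in the same spirit, but the classes and the `pending` helper here are hypothetical stand-ins, and a dict again replaces a real key-value store.

```python
import os
import tempfile
from typing import Any, Dict, List, Protocol


class Target(Protocol):
    """Anything with an exists() method can serve as a completion check."""
    def exists(self) -> bool: ...


class FileTarget:
    def __init__(self, path: str) -> None:
        self.path = path

    def exists(self) -> bool:
        return os.path.exists(self.path)


class KeyValueTarget:
    """Hypothetical target backed by a dict standing in for a key-value store."""
    def __init__(self, store: Dict[str, Any], key: str) -> None:
        self.store = store
        self.key = key

    def exists(self) -> bool:
        return self.key in self.store


def pending(targets: Dict[str, Target]) -> List[str]:
    """Return the names of tasks whose targets are missing, sorted by name."""
    return sorted(name for name, t in targets.items() if not t.exists())


with tempfile.TemporaryDirectory() as tmp:
    done_path = os.path.join(tmp, "done.csv")
    open(done_path, "w").close()  # simulate a finished file-based task
    targets: Dict[str, Target] = {
        "export_csv": FileTarget(done_path),
        "missing_csv": FileTarget(os.path.join(tmp, "missing.csv")),
        "load_profiles": KeyValueTarget(
            {"profiles:2026-02-10": {}}, "profiles:2026-02-10"
        ),
        "load_scores": KeyValueTarget({}, "scores:2026-02-10"),
    }
    # Evaluated while the temporary directory still exists.
    to_run = pending(targets)
```

The `pending` function never inspects which kind of storage sits behind a target, which is the property the design principle describes.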
