Principle:Ucbepic Docetl Output Management
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Observability |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
A persistence and observability principle covering how pipeline results are saved, how intermediate outputs are checkpointed, and how execution costs are tracked and reported through console logging.
Description
Output Management covers the final phase of pipeline execution: persisting results to disk, maintaining intermediate checkpoints for fault tolerance, and providing visibility into execution progress and costs. In DocETL, this includes:
- Result Persistence: Writing final output as JSON or CSV files
- Intermediate Checkpointing: Saving per-operation results to enable resumption after failures
- Console Logging: Thread-safe console output tracking costs, operation progress, and execution summaries
- Cost Tracking: Aggregating LLM API costs across all operations
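As an illustration of the Result Persistence item, the following is a minimal sketch of writing final output as JSON or CSV. It is not DocETL's actual implementation: the helper name `save_results`, the choice of format by file extension, and the row shape (a list of dictionaries) are all assumptions made for this example.

```python
import csv
import json
from pathlib import Path


def save_results(rows: list[dict], output_path: str) -> Path:
    """Write rows as JSON or CSV, chosen by file extension (hypothetical helper)."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open("w", encoding="utf-8") as f:
            json.dump(rows, f, indent=2)
    elif suffix == ".csv":
        # Union of keys in first-seen order, so rows with differing fields still fit.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        with path.open("w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)  # missing fields are written as empty strings
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path
```

Note that CSV flattens every value to text, so nested structures survive a round trip only in the JSON form.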
Usage
This principle applies whenever a pipeline produces output that needs to be saved, or when the people running the pipeline need visibility into execution progress and cost. It is especially important for long-running pipelines, where intermediate checkpointing means a failure costs only the operation that was in flight rather than all completed work.
Theoretical Basis
Output management combines two persistence layers with two observability layers:
- Checkpointing: Save intermediate results after each operation completes
- Final Output: Write complete pipeline results to the configured output path
- Cost Aggregation: Sum per-operation LLM costs into a total
- Logging: Provide real-time execution feedback via console output
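The four layers above can be sketched together in a small runner. This is an illustrative sketch under stated assumptions, not DocETL's API: the names `run_pipeline` and `log`, the per-operation checkpoint file named `<operation>.json`, and the convention that each operation returns an `(output_rows, cost)` pair are all hypothetical.

```python
import json
import threading
from pathlib import Path
from typing import Callable

# Assumed shape: an operation takes rows and returns (output_rows, llm_cost_in_dollars).
Operation = Callable[[list[dict]], tuple[list[dict], float]]

_print_lock = threading.Lock()


def log(message: str) -> None:
    """Serialize console writes so lines from concurrent threads do not interleave."""
    with _print_lock:
        print(message)


def run_pipeline(
    data: list[dict],
    operations: list[tuple[str, Operation]],
    checkpoint_dir: str,
    output_path: str,
) -> float:
    """Run operations in order, checkpointing each; return the cost incurred in this run."""
    ckpt_dir = Path(checkpoint_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    total_cost = 0.0
    for name, op in operations:
        ckpt = ckpt_dir / f"{name}.json"
        if ckpt.exists():
            # Resume: reuse the saved result and skip re-running the operation.
            data = json.loads(ckpt.read_text(encoding="utf-8"))
            log(f"[{name}] loaded checkpoint ({len(data)} rows)")
            continue
        data, cost = op(data)
        ckpt.write_text(json.dumps(data), encoding="utf-8")  # checkpoint layer
        total_cost += cost  # cost aggregation layer
        log(f"[{name}] done, cost ${cost:.4f}")
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(data, indent=2), encoding="utf-8")  # final output layer
    log(f"Total cost this run: ${total_cost:.4f}")
    return total_cost
```

In this sketch a checkpoint is keyed only by operation name, so a rerun after a failure pays only for operations that had not yet completed. A real system would also need to invalidate checkpoints when an operation's configuration or its input changes, which this example does not attempt.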