Workflow:Ucbepic Docetl Playground Interactive Development
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Interactive_Development, Pipeline_IDE |
| Last Updated | 2026-02-08 03:00 GMT |
Overview
End-to-end process for iteratively building, testing, and refining DocETL pipelines using the DocWrangler interactive web playground.
Description
This workflow covers the visual, interactive approach to pipeline development through DocWrangler, DocETL's web-based IDE. Users upload or connect datasets, create operations via a visual editor with YAML support, configure prompts and schemas, run individual operations or full pipelines in real-time via WebSocket, inspect intermediate results in data tables, and export finalized pipelines as YAML for production use. The playground includes an AI chat assistant for prompt engineering, a prompt improvement dialog with iterative refinement, operation decomposition tools, and bookmarking for annotating outputs. It is built as a Next.js 14 application backed by a FastAPI server.
Usage
Execute this workflow when you are developing a new pipeline and want to iterate rapidly on prompts, schemas, and operation configurations with immediate visual feedback. The playground is ideal for exploratory data analysis, prompt engineering, debugging operation outputs, and building pipelines collaboratively. It is the recommended starting point for new DocETL users before transitioning to YAML files or the Python API for production deployment.
Execution Steps
Step 1: Set Up and Launch Playground
Deploy DocWrangler either via Docker (make docker) or manually (clone the repository, install dependencies, and run make run-ui-dev). Configure two sets of environment variables: the root .env file for the backend pipeline execution engine, and website/.env.local for frontend features such as the AI assistant model and the backend connection. Then navigate to the playground URL (typically localhost:3000/playground).
Key considerations:
- Docker setup is the quickest path; manual setup is needed when developing DocWrangler itself
- Two separate .env files control backend (pipeline execution) and frontend (UI features) respectively
- The backend FastAPI server handles pipeline execution and file management
- Set a namespace to isolate your pipeline state from other users
Step 2: Upload Dataset and Create Operations
Upload a JSON dataset file through the file explorer panel. Create pipeline operations by adding operation cards in the visual editor. For each operation, select the type (map, reduce, filter, resolve, unnest, split, gather, etc.), write the Jinja2 prompt template, and define the output schema. The editor provides syntax highlighting, operation-specific help documentation, and auto-completion.
Key considerations:
- Datasets can be uploaded from local files or fetched from URLs
- The natural language pipeline dialog allows describing your task in plain English to auto-generate operations
- Each operation card shows its type, prompt, schema, and configuration options
- Tutorial pipelines are available as starting templates
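A map operation configured in the editor corresponds to a YAML fragment like the sketch below. The field layout (name, type, prompt, output schema) follows DocETL's operation format; the operation name, prompt text, and schema fields are illustrative:

```yaml
# Hypothetical map operation: extracts themes from each document.
- name: extract_themes
  type: map
  prompt: |
    Extract the main themes from this document:
    {{ input.text }}
    Return a short list of themes.
  output:
    schema:
      themes: "list[string]"
```

The Jinja2 template references input fields via {{ input.field_name }}, and the output schema declares the typed fields the operation adds to each document.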
Step 3: Run and Inspect Operations
Execute individual operations or the complete pipeline via the run button. The system streams execution progress through WebSocket connections, displaying real-time logs and status. After execution, inspect results in the output panel's resizable data table with column sorting, search, and pagination. Use the column dialog to examine individual values in detail.
Key considerations:
- Running individual operations helps isolate and debug issues
- The output table shows all fields including those added by the operation
- Cost and timing information is displayed after each run
- Intermediate results are cached for fast re-execution
Step 4: Iterate on Prompts and Configuration
Refine prompts using the AI-powered prompt improvement dialog, which analyzes sample outputs and suggests enhancements. Use the AI chat panel for conversational guidance on prompt engineering. Enable gleaning, adjust sampling, or modify schemas based on output quality. The operation decomposition tool can split complex operations into simpler sequential steps.
Key considerations:
- The prompt improvement dialog uses iterative feedback loops with sample data
- Bookmarks allow annotating specific output rows for reference during iteration
- Decomposition comparison shows before/after results for split operations
- The should-optimize check evaluates whether an operation would benefit from optimization
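Enabling gleaning adds a validation loop to an operation's configuration. A minimal sketch, assuming an operation named extract_themes (the round count and validation prompt are illustrative):

```yaml
# Hypothetical operation with gleaning: the LLM's output is checked
# against the validation prompt and refined for up to num_rounds rounds.
- name: extract_themes
  type: map
  prompt: |
    Extract the main themes from: {{ input.text }}
  output:
    schema:
      themes: "list[string]"
  gleaning:
    num_rounds: 2
    validation_prompt: |
      Are all extracted themes specific and supported by the text?
      If any theme is vague or unsupported, explain why.
```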
Step 5: Export Pipeline for Production
Once the pipeline produces satisfactory results, export the configuration as a YAML file using the pipeline settings. This YAML file can be run via the CLI (docetl run) or loaded into the Python API for production deployment. The exported configuration includes all operations, prompts, schemas, and pipeline structure.
Key considerations:
- The exported YAML is a complete, standalone pipeline configuration
- Pipeline state is saved to browser localStorage, allowing development sessions to be resumed
- Pipeline configurations can also be restored from previously exported YAML files
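An exported pipeline bundles datasets, operations, and pipeline structure into one standalone file. The skeleton below sketches that shape (field names follow DocETL's YAML format; the dataset path, model, and operation details are illustrative):

```yaml
# Sketch of an exported pipeline configuration.
default_model: gpt-4o-mini
datasets:
  documents:
    type: file
    path: documents.json   # JSON list of objects, as uploaded in the playground
operations:
  - name: extract_themes
    type: map
    prompt: |
      Extract the main themes from: {{ input.text }}
    output:
      schema:
        themes: "list[string]"
pipeline:
  steps:
    - name: theme_extraction
      input: documents
      operations:
        - extract_themes
  output:
    type: file
    path: output.json
```

Running `docetl run pipeline.yaml` against this file executes the same pipeline outside the playground.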