
Workflow:Ucbepic Docetl Playground Interactive Development

From Leeroopedia


Knowledge Sources
Domains LLM_Ops, Interactive_Development, Pipeline_IDE
Last Updated 2026-02-08 03:00 GMT

Overview

The end-to-end process for iteratively building, testing, and refining DocETL pipelines using DocWrangler, the interactive web playground.

Description

This workflow covers the visual, interactive approach to pipeline development through DocWrangler, DocETL's web-based IDE. Users upload or connect datasets, create operations in a visual editor with YAML support, configure prompts and schemas, run individual operations or full pipelines in real time over WebSocket, inspect intermediate results in data tables, and export finalized pipelines as YAML for production use. The playground also includes an AI chat assistant for prompt engineering, a prompt improvement dialog with iterative refinement, operation decomposition tools, and bookmarking for annotating outputs. It is built as a Next.js 14 application backed by a FastAPI server.

Usage

Execute this workflow when you are developing a new pipeline and want to iterate rapidly on prompts, schemas, and operation configurations with immediate visual feedback. The playground is ideal for exploratory data analysis, prompt engineering, debugging operation outputs, and building pipelines collaboratively. It is recommended as the starting point for new DocETL users before transitioning to YAML or Python API for production deployment.

Execution Steps

Step 1: Set Up and Launch Playground

Deploy DocWrangler either via Docker (make docker) or manual setup (clone repo, install dependencies, run make run-ui-dev). Configure environment variables: the root .env file for the backend pipeline execution engine and the website/.env.local file for frontend features (AI assistant model, backend connection). Navigate to the playground URL (typically localhost:3000/playground).

Key considerations:

  • Docker setup is the quickest path; manual setup is needed when developing DocWrangler itself
  • Two separate .env files control backend (pipeline execution) and frontend (UI features) respectively
  • The backend FastAPI server handles pipeline execution and file management
  • Set a namespace to isolate your pipeline state from other users
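As a rough sketch, the two environment files might look like the following. Only an LLM provider key is shown; the exact variable names the frontend expects (for the AI assistant model and backend connection) vary by version, so treat everything here as an illustrative assumption rather than a complete configuration.

```shell
# --- .env (repo root): backend pipeline execution engine ---
# Hypothetical key; any provider supported by the execution engine works.
OPENAI_API_KEY=your-key-here

# --- website/.env.local: frontend (AI assistant, backend connection) ---
# The frontend reads its own copy of the key for the chat assistant.
OPENAI_API_KEY=your-key-here
```

Keeping the two files separate means you can point the frontend's AI assistant at a different model or provider than the one executing the pipeline.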

Step 2: Upload Dataset and Create Operations

Upload a JSON dataset file through the file explorer panel. Create pipeline operations by adding operation cards in the visual editor. For each operation, select the type (map, reduce, filter, resolve, unnest, split, gather, etc.), write the Jinja2 prompt template, and define the output schema. The editor provides syntax highlighting, operation-specific help documentation, and auto-completion.

Key considerations:

  • Datasets can be uploaded from local files or fetched from URLs
  • The natural language pipeline dialog allows describing your task in plain English to auto-generate operations
  • Each operation card shows its type, prompt, schema, and configuration options
  • Tutorial pipelines are available as starting templates
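A single operation card corresponds to one entry in the pipeline's operations list. The sketch below shows what a map operation might look like in the editor's YAML view; the operation name, prompt wording, input field (`input.text`), and schema fields are all illustrative assumptions, not output from a real session.

```yaml
# Illustrative map operation as it might appear in the YAML editor.
- name: extract_key_points
  type: map
  prompt: |
    Summarize the main points of the following document:
    {{ input.text }}
  output:
    schema:
      key_points: "list[string]"
      sentiment: "string"
```

The Jinja2 template references fields of the current document via `input`, and the output schema tells the engine which typed fields the LLM must return for each document.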

Step 3: Run and Inspect Operations

Execute individual operations or the complete pipeline via the run button. The system streams execution progress through WebSocket connections, displaying real-time logs and status. After execution, inspect results in the output panel's resizable data table with column sorting, search, and pagination. Use the column dialog to examine individual values in detail.

Key considerations:

  • Running individual operations helps isolate and debug issues
  • The output table shows all fields including those added by the operation
  • Cost and timing information is displayed after each run
  • Intermediate results are cached for fast re-execution

Step 4: Iterate on Prompts and Configuration

Refine prompts using the AI-powered prompt improvement dialog, which analyzes sample outputs and suggests enhancements. Use the AI chat panel for conversational guidance on prompt engineering. Enable gleaning, adjust sampling, or modify schemas based on output quality. The operation decomposition tool can split complex operations into simpler sequential steps.

Key considerations:

  • The prompt improvement dialog uses iterative feedback loops with sample data
  • Bookmarks allow annotating specific output rows for reference during iteration
  • Decomposition comparison shows before/after results for split operations
  • The should-optimize check evaluates whether an operation would benefit from optimization
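Enabling gleaning adds a validation loop to an operation: after each LLM call, a validation prompt checks the output and triggers up to a fixed number of retry rounds. A minimal sketch, assuming DocETL's documented `gleaning` keys (`num_rounds`, `validation_prompt`); the operation itself and the prompt text are illustrative:

```yaml
# Illustrative operation with gleaning enabled for output validation.
- name: extract_key_points
  type: map
  prompt: |
    Summarize the main points of the following document:
    {{ input.text }}
  output:
    schema:
      key_points: "list[string]"
  gleaning:
    num_rounds: 2
    validation_prompt: |
      Check that every key point is actually stated in the document
      and that none of the main points are missing.
```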

Step 5: Export Pipeline for Production

Once the pipeline produces satisfactory results, export the configuration as a YAML file using the pipeline settings. This YAML file can be run via the CLI (docetl run) or loaded into the Python API for production deployment. The exported configuration includes all operations, prompts, schemas, and pipeline structure.

Key considerations:

  • The exported YAML is a complete, standalone pipeline configuration
  • Save pipeline state to localStorage for resuming development sessions
  • Pipeline configurations can also be restored from previously exported YAML files
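An exported file might resemble the sketch below, which can then be executed with docetl run pipeline.yaml. Dataset names, file paths, the model, and the operation are all hypothetical placeholders; the top-level layout (datasets, operations, pipeline steps, and output) follows DocETL's YAML pipeline format.

```yaml
# Hypothetical exported pipeline.yaml; names, paths, and model are illustrative.
default_model: gpt-4o-mini

datasets:
  reviews:
    type: file
    path: reviews.json

operations:
  - name: extract_themes
    type: map
    prompt: |
      List the themes discussed in this review:
      {{ input.review_text }}
    output:
      schema:
        themes: "list[string]"

pipeline:
  steps:
    - name: theme_extraction
      input: reviews
      operations:
        - extract_themes
  output:
    type: file
    path: themes_output.json
```

Because the file is self-contained, it can be checked into version control and run unchanged in production, independent of the playground.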

Execution Diagram

GitHub URL

Workflow Repository