Workflow:PrefectHQ Prefect AI Data Analyst Agent
| Knowledge Sources | |
|---|---|
| Domains | AI_Agents, Data_Analysis, LLM_Ops |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
End-to-end process for building a resilient AI data analyst agent with Prefect and pydantic-ai. The agent analyses datasets, detects anomalies, and generates structured insights, while Prefect supplies automatic LLM retry handling and durable execution semantics.
Description
This workflow demonstrates how to combine Prefect's orchestration with pydantic-ai's agent framework to create production-ready AI analysis pipelines. An AI agent is equipped with Python tools for calculating statistics, detecting anomalies, and inspecting dataset structure. The agent autonomously decides which analyses to perform based on the dataset. The PrefectAgent wrapper provides durable execution: LLM calls and tool invocations are wrapped as Prefect tasks with configurable retries, timeouts, and exponential backoff. Structured Pydantic output models ensure the AI returns consistent, validated results.
Key outputs:
- A structured DataAnalysis object containing summary, key findings, recommendations, and columns analysed
- Full observability of every LLM call and tool invocation in the Prefect UI
- Idempotent reruns when deployed via flow.serve(): tasks that already completed are skipped on retry
Scope:
- From a prepared dataset (DataFrame) to structured AI-generated insights
- Handles LLM failures, tool errors, and network issues with automatic retries
Usage
Execute this workflow when you need to programmatically analyse datasets using an LLM-powered agent and want resilience against LLM API failures. It is suitable for automated data quality reports, anomaly detection, and generating actionable insights from tabular data without writing custom analysis code for each dataset.
Execution Steps
Step 1: Prepare Dataset
Load or generate the dataset to be analysed. In production, this would involve reading from a database, file, or API. The dataset is represented as a pandas DataFrame and passed as a dependency to the AI agent.
Key considerations:
- Dataset preparation is tracked as a Prefect task for observability
- The DataFrame serves as the agent's runtime context (deps)
- Ensure the dataset fits in memory, because the agent's tools operate directly on the in-memory DataFrame
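The "generate" path of this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow's actual code: the function name `make_sample_dataset`, the column names, and the value ranges are all hypothetical. A plain dict of columns stands in for the DataFrame so the sketch has no third-party dependencies; passing it to `pandas.DataFrame(...)` would produce the DataFrame the workflow expects.

```python
import random


def make_sample_dataset(n_rows: int = 30, seed: int = 0) -> dict[str, list]:
    """Build a small column-oriented dataset with one injected outlier.

    Hypothetical helper: names, columns, and ranges are illustrative only.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce identical data
    data = {
        "day": list(range(1, n_rows + 1)),
        "units_sold": [rng.randint(80, 120) for _ in range(n_rows)],
        "unit_price": [round(rng.uniform(9.0, 11.0), 2) for _ in range(n_rows)],
    }
    # Inject one clearly abnormal value so anomaly detection has something to find.
    data["units_sold"][n_rows // 2] = 900
    return data


dataset = make_sample_dataset()
```

Seeding the generator keeps the dataset reproducible, which makes it easier to compare agent runs against the same input.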
Step 2: Configure AI Agent with Tools
Create the pydantic-ai Agent with registered tools (calculate_statistics, detect_anomalies, get_column_info) and a system prompt directing its analysis strategy. Configure the structured output model (DataAnalysis) with Pydantic validation constraints.
Key considerations:
- Tools are standard Python functions that the AI can call autonomously
- The system prompt guides the agent to start by understanding dataset structure
- Output validation ensures the AI returns exactly the expected schema
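The source names the three tools but does not show their bodies, so the following is only a sketch of what such functions might look like. It assumes a column-oriented dict in place of the DataFrame, uses a z-score rule for anomaly detection, and picks a threshold of 2.0; all three choices are assumptions made for illustration. Registering these functions with the pydantic-ai Agent is not shown here.

```python
import statistics


def get_column_info(data: dict[str, list]) -> dict[str, str]:
    """Map each non-empty column to the Python type name of its first value."""
    return {name: type(values[0]).__name__ for name, values in data.items() if values}


def calculate_statistics(data: dict[str, list], column: str) -> dict[str, float]:
    """Return basic descriptive statistics for one numeric column."""
    values = data[column]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def detect_anomalies(data: dict[str, list], column: str, threshold: float = 2.0) -> list[int]:
    """Return row indices whose absolute z-score exceeds the threshold (assumed rule)."""
    values = data[column]
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # constant column: no value deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


sample = {"units_sold": [100, 102, 98, 101, 99, 100, 500, 103, 97, 100]}
anomalous_rows = detect_anomalies(sample, "units_sold")
```

Keeping each tool a small, pure function with typed arguments makes its behaviour easy to test in isolation before handing it to the agent.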
Step 3: Wrap Agent for Durable Execution
Wrap the pydantic-ai Agent with PrefectAgent to enable durable execution. Configure retry policies: LLM calls get 3 retries with exponential backoff (1s, 2s, 4s) and a 60-second timeout; tool calls get 2 retries with shorter backoff (0.5s, 1s).
Key considerations:
- Each LLM call becomes a Prefect task with independent retry tracking
- Each tool invocation becomes a separate Prefect task
- Timeout prevents runaway LLM calls from blocking the pipeline
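To make the retry schedules above concrete, here is a library-free sketch of exponential backoff. It does not use Prefect or PrefectAgent, whose configuration is not shown in the source; `backoff_delays` and `call_with_retries` are hypothetical helpers, and the timeout is deliberately left out. The sleep function is injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(base_seconds: float, retries: int) -> list[float]:
    """Delay before each retry, doubling every time: base, 2*base, 4*base, ..."""
    return [base_seconds * 2**attempt for attempt in range(retries)]


def call_with_retries(
    fn: Callable[[], T],
    delays: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry once per delay; the last failure propagates."""
    for delay in delays:
        try:
            return fn()
        except Exception:
            sleep(delay)  # wait before the next attempt
    return fn()  # final attempt: any exception is raised to the caller


LLM_DELAYS = backoff_delays(1.0, retries=3)   # schedule described for LLM calls
TOOL_DELAYS = backoff_delays(0.5, retries=2)  # schedule described for tool calls

# Simulate an LLM call that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_llm_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


slept: list[float] = []
result = call_with_retries(flaky_llm_call, LLM_DELAYS, sleep=slept.append)
```

With 3 retries, a call is attempted at most four times in total. In the workflow itself, Prefect tracks each attempt as part of the task run rather than looping in user code.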
Step 4: Run Agent Analysis
Invoke the agent with an analysis prompt and the dataset. The agent autonomously calls tools, processes results, and generates structured findings. All intermediate LLM interactions and tool calls are logged as Prefect task runs.
Key considerations:
- The agent determines which tools to call and in what order
- Failed operations are automatically retried per the configured policies
- The full decision chain is observable in the Prefect UI
Step 5: Return Structured Results
The agent returns a validated DataAnalysis Pydantic model containing a summary, 3-5 key findings, 3-5 recommendations, and the list of columns analysed. Results are displayed and can be persisted or forwarded to downstream systems.
Key considerations:
- Pydantic validation ensures consistent output regardless of LLM variance
- Results can be serialised to JSON for storage or API responses
- When deployed, a retried flow run skips tasks that already completed, which makes reruns idempotent
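The output contract described in this step can be sketched without third-party dependencies. The dataclass below is a stand-in for the Pydantic model, not the model itself: the field names are guesses based on the description above, and the bounds are enforced by hand in `__post_init__`. In Pydantic, the same list bounds would typically be declared with `Field(min_length=3, max_length=5)`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataAnalysis:
    """Stand-in for the structured output model (field names are assumed)."""

    summary: str
    key_findings: list[str]
    recommendations: list[str]
    columns_analysed: list[str]

    def __post_init__(self) -> None:
        if not self.summary.strip():
            raise ValueError("summary must not be empty")
        # Enforce the 3 to 5 item bounds described for findings and recommendations.
        for name in ("key_findings", "recommendations"):
            items = getattr(self, name)
            if not 3 <= len(items) <= 5:
                raise ValueError(f"{name} must contain 3 to 5 items, got {len(items)}")


analysis = DataAnalysis(
    summary="Sales are stable apart from one extreme spike.",
    key_findings=["One outlier in units_sold", "Prices vary little", "No missing days"],
    recommendations=["Verify the spike", "Monitor daily volume", "Add input checks"],
    columns_analysed=["units_sold", "unit_price"],
)

# Serialise to JSON for storage or an API response.
payload = json.dumps(asdict(analysis))
```

Rejecting out-of-bounds output at construction time means downstream consumers never see a partially valid result.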