Workflow:PrefectHQ Prefect AI Data Analyst Agent

Knowledge Sources	Prefect, Prefect Docs, pydantic-ai Docs
Domains	AI_Agents, Data_Analysis, LLM_Ops
Last Updated	2026-02-09 22:00 GMT

Overview

End-to-end process for building a resilient AI data analyst agent using Prefect and pydantic-ai that analyses datasets, detects anomalies, and generates structured insights with automatic LLM retry handling and durable execution semantics.

Description

This workflow shows how to combine Prefect's orchestration with pydantic-ai's agent framework to build production-ready AI analysis pipelines. The agent is given Python tools for calculating statistics, detecting anomalies, and inspecting dataset structure, and it decides for itself which of them to call, and in what order, based on the dataset. The PrefectAgent wrapper provides durable execution: each LLM call and tool invocation runs as a Prefect task with configurable retries, exponential backoff, and timeouts. A structured Pydantic output model validates the agent's final answer, so results always follow the same schema.

Key outputs:

A structured DataAnalysis object containing summary, key findings, recommendations, and columns analysed
Full observability of every LLM call and tool invocation in the Prefect UI
Idempotent reruns when deployed via flow.serve(): tasks that already completed are skipped when a run is retried
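The idempotency point is easiest to see with a toy example. The sketch below is an in-memory illustration of "completed steps are skipped on rerun"; it is not how Prefect implements this (Prefect uses its own persisted task results), and `idempotent`, `load_rows`, `analyse`, and `pipeline` are hypothetical names chosen for illustration.

```python
import hashlib
import json
from functools import wraps
from typing import Any, Callable, Dict


def idempotent(store: Dict[str, Any]) -> Callable:
    """Hypothetical decorator: skip a step whose result for the same inputs is already stored."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Key on the step name plus its inputs, so new inputs still execute.
            payload = json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str)
            key = hashlib.sha256(payload.encode()).hexdigest()
            if key in store:
                return store[key]  # completed on an earlier attempt: reuse, don't re-run
            result = fn(*args, **kwargs)
            store[key] = result  # only successful results are recorded
            return result
        return wrapper
    return decorator


store: Dict[str, Any] = {}
executed = []
llm_down = [True]  # simulate a single LLM outage on the first attempt


@idempotent(store)
def load_rows(n: int) -> list:
    executed.append("load_rows")
    return list(range(n))


@idempotent(store)
def analyse(rows: list) -> int:
    executed.append("analyse")
    if llm_down[0]:
        llm_down[0] = False
        raise ConnectionError("LLM API unavailable")
    return sum(rows)


def pipeline() -> int:
    return analyse(load_rows(4))


try:
    pipeline()      # first run: load_rows succeeds, analyse fails
except ConnectionError:
    pass
print(pipeline())   # rerun: load_rows is skipped, analyse runs again -> 6
print(executed)     # ['load_rows', 'analyse', 'analyse']
```

Because a failed step never writes to the store, only the work that actually failed is repeated on the rerun.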

Scope:

From a prepared dataset (DataFrame) to structured AI-generated insights
Handles LLM failures, tool errors, and network issues with automatic retries

Usage

Execute this workflow when you need to programmatically analyse datasets using an LLM-powered agent and want resilience against LLM API failures. It is suitable for automated data quality reports, anomaly detection, and generating actionable insights from tabular data without writing custom analysis code for each dataset.

Execution Steps

Step 1: Prepare Dataset

Load or generate the dataset to be analysed. In production, this would involve reading from a database, file, or API. The dataset is represented as a pandas DataFrame and passed as a dependency to the AI agent.

Key considerations:

Dataset preparation is tracked as a Prefect task for observability
The DataFrame serves as the agent's runtime context (deps)
Ensure the dataset fits in memory for the agent's tool calls
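A minimal, stdlib-only sketch of the Step 1 preparation, assuming a hypothetical daily sales dataset. To keep it runnable without pandas, it returns a column-oriented dict (column name to list of values) as a stand-in for the DataFrame; the column names, the seed, and the injected outliers are illustrative assumptions, not part of the original workflow. In the workflow itself this function would be decorated with Prefect's `@task` so the preparation step shows up in the UI.

```python
import random
from datetime import date, timedelta


def prepare_dataset(n_rows: int = 30, seed: int = 42) -> dict:
    """Hypothetical Step 1 task: build a small daily sales table.

    Returns column name -> list of values, a stand-in for a pandas DataFrame.
    """
    if n_rows < 21:
        raise ValueError("n_rows must be at least 21 to hold the injected outliers")
    rng = random.Random(seed)  # fixed seed, so every run produces identical data
    start = date(2024, 1, 1)
    data = {
        "date": [start + timedelta(days=i) for i in range(n_rows)],
        "revenue": [round(rng.gauss(1000, 50), 2) for _ in range(n_rows)],
        "orders": [rng.randint(40, 60) for _ in range(n_rows)],
    }
    # Inject two obvious outliers so the anomaly-detection tool has something to find.
    data["revenue"][10] = 5000.0
    data["orders"][20] = 400
    return data


df = prepare_dataset()
print({name: len(values) for name, values in df.items()})
# {'date': 30, 'revenue': 30, 'orders': 30}
```

Keeping generation deterministic means a rerun analyses exactly the same data as the original attempt.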

Step 2: Configure AI Agent with Tools

Create the pydantic-ai Agent with registered tools (calculate_statistics, detect_anomalies, get_column_info) and a system prompt directing its analysis strategy. Configure the structured output model (DataAnalysis) with Pydantic validation constraints.

Key considerations:

Tools are standard Python functions that the AI can call autonomously
The system prompt guides the agent to start by understanding dataset structure
Output validation rejects any response that does not match the expected schema
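The three tool names come from the text above; their signatures, return shapes, and the anomaly method (a modified z-score based on the median absolute deviation) are assumptions made for this sketch. In pydantic-ai, tools that need the dataset are usually registered with the `@agent.tool` decorator and read it from `ctx.deps` on a `RunContext`; the plain functions here take the data directly so they can be run and tested on their own.

```python
import statistics

# Sketch of the three Step 2 tools, operating on a column-oriented dict
# (column name -> list of values) as a stand-in for the pandas DataFrame deps.


def get_column_info(data: dict) -> dict:
    """Describe each column: inferred type, row count, and missing (None) values."""
    info = {}
    for name, values in data.items():
        present = [v for v in values if v is not None]
        info[name] = {
            "type": type(present[0]).__name__ if present else "unknown",
            "rows": len(values),
            "missing": len(values) - len(present),
        }
    return info


def calculate_statistics(data: dict, column: str) -> dict:
    """Summary statistics for one numeric column, ignoring missing values."""
    values = [v for v in data[column] if v is not None]
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def detect_anomalies(data: dict, column: str, threshold: float = 3.5) -> list:
    """Row indices whose modified (median/MAD-based) z-score exceeds threshold.

    Median and MAD are used instead of mean and stdev so that one large
    outlier cannot inflate the spread and mask itself.
    """
    values = data[column]
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad == 0:
        return []  # no spread at all: nothing can be scored as unusual
    # 0.6745 rescales MAD so scores are comparable to standard z-scores for normal data.
    return [
        i for i, v in enumerate(values)
        if v is not None and abs(0.6745 * (v - med) / mad) > threshold
    ]


sales = {"revenue": [100, 102, 98, 101, 99, 500]}
print(detect_anomalies(sales, "revenue"))  # [5]
```

Returning plain dicts and lists keeps each tool result easy for the model to read and easy to log.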

Step 3: Wrap Agent for Durable Execution

Wrap the pydantic-ai Agent with PrefectAgent to enable durable execution. Configure retry policies: LLM calls get 3 retries with exponential backoff (1s, 2s, 4s) and a 60-second timeout; tool calls get 2 retries with shorter backoff (0.5s, 1s).

Key considerations:

Each LLM call becomes a Prefect task with independent retry tracking
Each tool invocation becomes a separate Prefect task
The timeout prevents a hung LLM call from blocking the pipeline indefinitely
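To make the retry policy concrete, here is a stdlib sketch of what "3 retries with exponential backoff and a 60-second timeout" means for a single call. It is not PrefectAgent's implementation: `backoff_delays` and `run_with_retries` are hypothetical helpers, and the injectable `sleep` argument exists only so the schedule can be checked without actually waiting.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, retries: int) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... (one delay per retry)."""
    return [base * 2**i for i in range(retries)]


LLM_DELAYS = backoff_delays(1.0, 3)   # [1.0, 2.0, 4.0], as described for LLM calls
TOOL_DELAYS = backoff_delays(0.5, 2)  # [0.5, 1.0], as described for tool calls
LLM_TIMEOUT_S = 60.0


def run_with_retries(
    fn: Callable[[], T],
    delays: List[float],
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying once per entry in delays; re-raise after the last failure."""
    for attempt in range(len(delays) + 1):
        try:
            if timeout is None:
                return fn()
            # Run the call in a worker thread so we can stop *waiting* after `timeout`.
            # Python cannot kill the thread, so a hung call keeps running in the background.
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            try:
                return pool.submit(fn).result(timeout=timeout)
            finally:
                pool.shutdown(wait=False)
        except Exception:
            if attempt == len(delays):
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

For a plain Prefect task, the same LLM policy can be declared as `@task(retries=3, retry_delay_seconds=[1, 2, 4], timeout_seconds=60)`; how PrefectAgent accepts its model and tool retry settings should be checked against the pydantic-ai Prefect integration docs rather than inferred from this sketch.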

Step 4: Run Agent Analysis

Invoke the agent with an analysis prompt and the dataset. The agent autonomously calls tools, processes results, and generates structured findings. All intermediate LLM interactions and tool calls are logged as Prefect task runs.

Key considerations:

The agent determines which tools to call and in what order
Failed operations are automatically retried per the configured policies
The full decision chain is observable in the Prefect UI
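The loop below is a stdlib sketch of the tool-calling cycle Step 4 relies on: ask the model for the next step, run the chosen tool, record the result, and repeat until the model produces a final answer. `scripted_model` is a hypothetical deterministic stand-in for the LLM so the control flow runs without an API key, and the simplified tools are assumptions; in the real workflow pydantic-ai drives this loop and the model makes the decisions.

```python
import statistics
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical, simplified tools (a column-oriented dict stands in for the DataFrame).
def get_column_info(data: dict) -> Dict[str, str]:
    return {name: type(values[0]).__name__ for name, values in data.items()}


def calculate_statistics(data: dict, column: str) -> Dict[str, float]:
    values = data[column]
    return {"mean": statistics.fmean(values), "max": max(values)}


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_column_info": get_column_info,
    "calculate_statistics": calculate_statistics,
}


def scripted_model(history: List[dict]) -> dict:
    """Deterministic stand-in for the LLM: choose the next step from what has been seen."""
    if not history:
        # Mirrors the system prompt: understand the dataset structure first.
        return {"tool": "get_column_info", "args": {}}
    column_types = history[0]["result"]
    numeric = [c for c, t in column_types.items() if t in ("int", "float")]
    done = {s["args"].get("column") for s in history if s["tool"] == "calculate_statistics"}
    for column in numeric:
        if column not in done:
            return {"tool": "calculate_statistics", "args": {"column": column}}
    return {"final": f"Analysed {len(numeric)} numeric column(s): {', '.join(numeric)}"}


def run_agent(
    model: Callable[[List[dict]], dict], data: dict, max_steps: int = 10
) -> Tuple[str, List[dict]]:
    """Loop: ask the model for a step, execute the chosen tool, record it, repeat."""
    history: List[dict] = []
    for _ in range(max_steps):
        decision = model(history)
        if "final" in decision:
            return decision["final"], history
        result = TOOLS[decision["tool"]](data, **decision["args"])
        # In the workflow, each tool call (and each model call) is recorded as a Prefect task run.
        history.append({"tool": decision["tool"], "args": decision["args"], "result": result})
    raise RuntimeError("agent did not produce a final answer within max_steps")


data = {"region": ["n", "s", "n"], "revenue": [10.0, 12.0, 14.0], "orders": [1, 2, 3]}
answer, steps = run_agent(scripted_model, data)
print([s["tool"] for s in steps])  # ['get_column_info', 'calculate_statistics', 'calculate_statistics']
print(answer)                      # Analysed 2 numeric column(s): revenue, orders
```

The `max_steps` cap plays a similar role to the LLM timeout: it bounds how long an agent that never converges can keep the pipeline busy.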

Step 5: Return Structured Results

The agent returns a validated DataAnalysis Pydantic model containing a summary, 3-5 key findings, 3-5 recommendations, and the list of columns analysed. Results are displayed and can be persisted or forwarded to downstream systems.

Key considerations:

Pydantic validation keeps the output structure consistent even when the LLM's wording varies between runs
Results can be serialised to JSON for storage or API responses
When deployed, completed tasks are skipped on retry (idempotency)
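A stdlib sketch of the Step 5 output contract: a summary, 3-5 key findings, 3-5 recommendations, and the columns analysed, with a JSON round trip for storage. In the workflow DataAnalysis is a Pydantic model (with Pydantic v2, the item counts could be declared as `Field(min_length=3, max_length=5)` on the list fields, and `model_dump_json()` serialises it); here a dataclass with hand-written checks stands in so the example has no dependencies. The field names and example strings are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataAnalysis:
    """Stdlib stand-in for the DataAnalysis output model; field names are assumed."""
    summary: str
    key_findings: List[str]
    recommendations: List[str]
    columns_analysed: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the constraints described in Step 5.
        if not self.summary.strip():
            raise ValueError("summary must not be empty")
        for name in ("key_findings", "recommendations"):
            count = len(getattr(self, name))
            if not 3 <= count <= 5:
                raise ValueError(f"{name} must have 3-5 items, got {count}")


def parse_analysis(raw: str) -> DataAnalysis:
    """Validate raw JSON from the model; unknown keys raise TypeError, bad counts ValueError."""
    return DataAnalysis(**json.loads(raw))


result = DataAnalysis(
    summary="Example summary of the dataset.",
    key_findings=["Example finding 1", "Example finding 2", "Example finding 3"],
    recommendations=["Example action 1", "Example action 2", "Example action 3"],
    columns_analysed=["revenue", "orders"],
)
payload = json.dumps(asdict(result))  # serialise for storage or an API response
assert parse_analysis(payload) == result  # round-trips unchanged
```

Unlike Pydantic, this stand-in does not check element types; it only enforces presence and item counts.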

GitHub URL

Workflow Repository