Workflow:Microsoft Playwright AI agent driven testing

Knowledge Sources	Playwright Playwright Docs
Domains	AI_Testing, LLM_Integration, Test_Automation
Last Updated	2026-02-11 22:00 GMT

Overview

End-to-end process for writing and running AI-agent-driven browser tests, in which a large language model interprets natural language instructions to perform page actions, verify expectations, and extract data.

Description

This workflow covers Playwright's AI agent subsystem, which enables writing tests using natural language descriptions instead of explicit locator and action code. The agent uses an LLM from a provider such as Anthropic or OpenAI to interpret high-level instructions (e.g., "add a todo item"), generate structured browser actions, execute them against the page, and verify outcomes. The system maintains conversation context across steps, captures page snapshots for the LLM to reason about, and supports three primary operations: perform (execute actions), expect (verify conditions), and extract (retrieve data).

Usage

Use this workflow when you want to write tests as high-level business descriptions rather than detailed code, when testing complex UI flows where writing explicit selectors is impractical, or when you want to use AI for exploratory testing. The workflow requires an LLM API key (e.g., an Anthropic API key) configured in the environment.

Execution Steps

Step 1: Configure AI agent provider

Set up the LLM provider by configuring API credentials and model selection. The agent system supports multiple providers (Anthropic, OpenAI, etc.) and requires an API key to be available in the environment or test configuration. Configure agent options including the model name and chat API endpoint.

Key considerations:

Set the appropriate API key environment variable (e.g., ANTHROPIC_API_KEY)
Configure model selection in the Playwright config or test fixtures
The agent system communicates with the LLM via a chat API interface
Token budgets and context windows affect how much page content the agent can process
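
As a minimal sketch of the configuration step, the helper below resolves provider settings from a set of environment variables. Only ANTHROPIC_API_KEY comes from the text above. The AgentOptions shape, the resolveAgentOptions name, the OPENAI_API_KEY and AGENT_MODEL variables, and the placeholder model name are assumptions made for illustration and are not Playwright's actual configuration schema.

```typescript
// Hypothetical option shape; Playwright's real agent options may differ.
interface AgentOptions {
  provider: "anthropic" | "openai";
  apiKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

// Picks the first provider whose API key is present in the given environment.
function resolveAgentOptions(env: Env): AgentOptions {
  const candidates: Array<{ provider: AgentOptions["provider"]; keyVar: string }> = [
    { provider: "anthropic", keyVar: "ANTHROPIC_API_KEY" },
    { provider: "openai", keyVar: "OPENAI_API_KEY" }, // assumed variable name
  ];
  for (const { provider, keyVar } of candidates) {
    const apiKey = env[keyVar];
    if (apiKey) {
      // AGENT_MODEL is a hypothetical override; "example-model" is a placeholder.
      const model = env["AGENT_MODEL"] ?? "example-model";
      return { provider, apiKey, model };
    }
  }
  throw new Error("No LLM API key found; set ANTHROPIC_API_KEY or OPENAI_API_KEY");
}
```

In a Node.js test setup this would typically be called as resolveAgentOptions(process.env), so that a missing key fails fast before any browser is launched.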

Step 2: Create test with agent fixture

Write test files that use the agent fixture provided by Playwright Test. The agent fixture wraps a browser page with AI capabilities. Structure tests using standard test() blocks but replace explicit locator/action code with agent method calls that accept natural language instructions.

Key considerations:

Import and extend test fixtures to include the agent
The agent operates on the current page context
Tests can mix agent calls with traditional Playwright API calls
Custom system prompts can guide the agent's behavior
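
The structure described in this step can be sketched without a browser or an LLM. The Agent interface below is a hypothetical stand-in modeled on the three operations named in this workflow; it is not Playwright's published typings, and the method signatures are assumptions. The recording stub exists only so that the scenario can run in isolation.

```typescript
// Hypothetical agent surface; the real fixture's signatures may differ.
interface Agent {
  perform(instruction: string): Promise<void>;
  expect(expectation: string): Promise<void>;
  extract(query: string): Promise<unknown>;
}

// A test body written against the interface rather than explicit locators.
async function addTodoScenario(agent: Agent): Promise<unknown> {
  await agent.perform('Add a todo item named "Buy milk"');
  await agent.expect('The todo list shows "Buy milk"');
  return agent.extract("The number of todo items, as a number");
}

// Recording stub: logs each call instead of driving a page.
function createRecordingAgent(log: string[]): Agent {
  return {
    async perform(instruction: string): Promise<void> {
      log.push(`perform: ${instruction}`);
    },
    async expect(expectation: string): Promise<void> {
      log.push(`expect: ${expectation}`);
    },
    async extract(query: string): Promise<unknown> {
      log.push(`extract: ${query}`);
      return 1; // canned value standing in for an LLM answer
    },
  };
}
```

Writing the scenario against an interface keeps the natural language steps separate from the fixture wiring, so the same function could be passed a real agent inside a test() block.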

Step 3: Define agent actions with perform

Use agent.perform() to instruct the AI to execute browser actions described in natural language. The agent takes a page snapshot, reasons about the current state, generates structured actions (click, fill, press, etc.), and executes them against the page. Multiple actions may be generated in a single perform call.

Key considerations:

Instructions should be clear and specific about the desired action
The agent captures ARIA snapshots to understand page structure
Actions are validated against Zod schemas before execution
The agent may retry or adjust actions based on page state changes
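
To illustrate the validation step, the sketch below checks LLM output against a small set of action shapes before anything would be executed. The text above says validation uses Zod schemas; this example swaps in hand-written checks to stay dependency-free, and the action field names (target, value, key) are assumptions rather than Playwright's actual action schema.

```typescript
// Hypothetical action shapes covering the click, fill, and press actions.
type Action =
  | { type: "click"; target: string }
  | { type: "fill"; target: string; value: string }
  | { type: "press"; key: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates one raw object and returns a typed action, or throws.
function parseAction(raw: unknown): Action {
  if (!isRecord(raw)) throw new Error("Action must be an object");
  const { type, target, value, key } = raw;
  switch (type) {
    case "click":
      if (typeof target !== "string") throw new Error("click requires a string target");
      return { type: "click", target };
    case "fill":
      if (typeof target !== "string" || typeof value !== "string") {
        throw new Error("fill requires string target and value");
      }
      return { type: "fill", target, value };
    case "press":
      if (typeof key !== "string") throw new Error("press requires a string key");
      return { type: "press", key };
    default:
      throw new Error(`Unknown action type: ${String(type)}`);
  }
}

// Parses the model's JSON output; a single perform call may yield several actions.
function parseActions(llmOutput: string): Action[] {
  const parsed: unknown = JSON.parse(llmOutput);
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array of actions");
  return parsed.map((item) => parseAction(item));
}
```

Rejecting malformed or unknown actions before execution means a bad model response surfaces as a clear validation error instead of an unexpected browser interaction.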

Step 4: Verify expectations with expect

Use agent.expect() to verify page conditions using natural language descriptions. The agent examines the current page state (DOM, ARIA tree, visual appearance) and determines whether the described condition is met. This replaces traditional assertion code with human-readable expectation statements.

Key considerations:

Expectations should describe observable page state
The agent evaluates whether the expectation holds on the live page
Failed expectations produce descriptive error messages from the LLM
Can verify complex visual or structural conditions that are hard to express with locators
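
One way to picture how a failed expectation becomes a descriptive test failure is the sketch below. The Verdict shape and the assertVerdict helper are hypothetical; they assume the LLM's judgment arrives as a pass/fail flag plus a free-text reason, which may not match how Playwright represents it internally.

```typescript
// Hypothetical verdict returned after the LLM inspects the page.
interface Verdict {
  passed: boolean;
  reason: string;
}

// Throws an error that carries both the expectation and the LLM's explanation.
function assertVerdict(expectation: string, verdict: Verdict): void {
  if (!verdict.passed) {
    throw new Error(`Expectation not met: "${expectation}"\nReason: ${verdict.reason}`);
  }
}
```

Including the original expectation text in the message keeps the failure readable in a test report without needing to open the trace.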

Step 5: Extract data with extract

Use agent.extract() to pull structured data from the page based on natural language descriptions. The agent reads the page content and returns the requested information in the specified format. This is useful for data validation, comparing values, or feeding extracted data into subsequent test steps.

Key considerations:

Specify what data to extract and in what format
The agent parses visible page content to fulfill extraction requests
Extracted data can be used in subsequent assertions or test logic
Works well for dynamic content where selectors may be fragile
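
Because extracted data originates from an LLM, it can help to check its shape before using it in test logic. The sketch below assumes the extraction result arrives as an untyped value; the TodoItem type and the asTodoItems helper are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape the test asked the agent to extract.
interface TodoItem {
  title: string;
  completed: boolean;
}

function isTodoItem(value: unknown): value is TodoItem {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.title === "string" && typeof candidate.completed === "boolean";
}

// Narrows an untyped extraction result to TodoItem[], or throws.
function asTodoItems(extracted: unknown): TodoItem[] {
  if (!Array.isArray(extracted) || !extracted.every(isTodoItem)) {
    throw new Error("Extracted data does not match the expected TodoItem[] shape");
  }
  return extracted;
}
```

Once narrowed, the items can feed ordinary assertions, for example counting the entries whose completed flag is false and comparing that count with a value shown elsewhere on the page.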

Step 6: Execute and analyze results

Run the AI-agent-driven tests using the standard Playwright Test runner. Review results including the actions the agent performed, any LLM reasoning traces, and test pass/fail outcomes. Traces capture both the agent's decision-making process and the browser state at each step.

Key considerations:

Agent tests may be slower than conventional Playwright tests due to LLM API latency
Token usage and API costs scale with test complexity and page size
Trace output includes agent action logs for debugging
Non-determinism from the LLM may cause occasional test flakiness
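
The cost consideration above can be made concrete with a small estimate. The sketch below sums per-step token counts and multiplies them by per-million-token prices supplied by the caller. The StepUsage and Pricing shapes are hypothetical, and no real provider prices are assumed; the caller would supply both from their own provider's usage reports and rate card.

```typescript
// Hypothetical per-step usage, e.g. taken from a provider's usage report.
interface StepUsage {
  inputTokens: number;  // prompt plus page snapshot
  outputTokens: number; // generated actions or answers
}

// Prices in the caller's currency per one million tokens.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Total cost = sum over steps of input and output tokens times their unit price.
function estimateCost(steps: StepUsage[], pricing: Pricing): number {
  let total = 0;
  for (const step of steps) {
    total += (step.inputTokens / 1e6) * pricing.inputPerMillion;
    total += (step.outputTokens / 1e6) * pricing.outputPerMillion;
  }
  return total;
}
```

Since each agent call typically sends a fresh page snapshot, input tokens tend to dominate on large pages, which is why both page size and the number of steps drive the total.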
