Implementation:Microsoft Playwright Agent Perform
| Knowledge Sources | |
|---|---|
| Domains | AI_Testing, Browser_Automation, Natural_Language_Processing |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Concrete API for instructing an AI agent to perform multi-step browser tasks described in natural language, provided by the Playwright library.
Description
The agent.perform() method accepts a natural language task string and autonomously executes a sequence of browser actions to accomplish it. The method runs an agentic loop where the LLM:
- Receives the task instruction and current page state (via accessibility snapshot)
- Selects the next browser action from 10 available tools
- Executes the action against the page
- Observes the result and decides whether to continue or report completion
The method is bounded by configurable limits: maxActions (default 10) caps the number of browser actions per call, maxActionRetries (default 3) controls retry attempts on action failures, and maxTokens bounds total token consumption.
A cacheKey option (defaulting to the task text itself) enables response caching. When running in cached mode, the agent replays stored LLM responses instead of making live API calls, enabling fast CI runs without LLM costs.
The method returns a usage summary containing the number of turns (LLM round-trips), input tokens consumed, and output tokens generated.
Usage
Use agent.perform() when:
- You need to execute a multi-step browser workflow described in plain English
- You want the agent to autonomously navigate, click, type, and interact with the page
- You are testing user journeys that involve multiple pages or complex interactions
- You want to avoid writing brittle selector-based automation code
Code Reference
Source Location
- Repository: playwright
- File:
packages/playwright-core/src/client/pageAgent.ts:L44-48(client-side proxy) - File:
packages/playwright-core/src/server/agent/pageAgent.ts:L40-57(server-side implementation)
Signature
agent.perform(
task: string,
options?: {
cacheKey?: string;
maxActions?: number; // default: 10
maxActionRetries?: number; // default: 3
maxTokens?: number;
timeout?: number;
}
): Promise<{
usage: {
turns: number;
inputTokens: number;
outputTokens: number;
};
}>
Import
// perform() is a method on PageAgent, obtained via page.agent()
import { test } from '@playwright/test';
test('example', async ({ agent }) => {
const result = await agent.perform('Click the login button');
console.log(`Used ${result.usage.turns} turns`);
});
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| task | string |
Yes | Natural language description of the browser task to perform |
| options.cacheKey | string |
No | Cache key for storing/replaying LLM responses (defaults to task text) |
| options.maxActions | number |
No | Maximum number of browser actions the agent may execute (default: 10) |
| options.maxActionRetries | number |
No | Maximum retry attempts when an action fails (default: 3) |
| options.maxTokens | number |
No | Maximum total tokens the agent may consume for this task |
| options.timeout | number |
No | Timeout in milliseconds for the entire perform() call |
Outputs
| Name | Type | Description |
|---|---|---|
| usage.turns | number |
Number of LLM round-trips (observation-decision cycles) executed |
| usage.inputTokens | number |
Total input tokens consumed across all LLM calls |
| usage.outputTokens | number |
Total output tokens generated across all LLM calls |
Available Browser Tools
The agent has access to 10 browser action tools during perform():
| Tool Name | Playwright Equivalent | Description |
|---|---|---|
| browser_navigate | page.goto(url) |
Navigate the browser to a specified URL |
| browser_snapshot | accessibility snapshot | Capture the current page state as an accessibility tree |
| browser_click | locator.click() |
Click on a page element identified by the agent |
| browser_drag | locator.drag() |
Drag an element to a target location |
| browser_hover | locator.hover() |
Hover the mouse over a page element |
| browser_select_option | locator.selectOption() |
Select an option from a dropdown menu |
| browser_press_key | keyboard.press() |
Press a keyboard key or key combination |
| browser_type | keyboard.type() |
Type text character by character |
| browser_fill_form | locator.fill() |
Fill a form input field with text |
| browser_set_checked | locator.setChecked() |
Set the checked state of a checkbox or radio button |
Usage Examples
Basic Example
import { test, expect } from '@playwright/test';
test('complete a purchase flow', async ({ agent }) => {
await agent.perform('Navigate to https://shop.example.com');
await agent.perform('Search for "laptop stand" in the search bar');
await agent.perform('Add the first product to the cart');
await agent.perform('Go to the cart and proceed to checkout');
await agent.perform('Fill in the shipping address with test data and place the order');
});
Example with Custom Limits
import { test } from '@playwright/test';
test('complex multi-page workflow', async ({ agent }) => {
const result = await agent.perform(
'Navigate to the admin panel, create a new user with email test@example.com, ' +
'assign them the "editor" role, and verify the user appears in the user list',
{
maxActions: 25,
maxActionRetries: 5,
maxTokens: 80000,
timeout: 120000,
}
);
console.log(`Completed in ${result.usage.turns} turns`);
console.log(`Tokens used: ${result.usage.inputTokens + result.usage.outputTokens}`);
});
Example with Cache Key
import { test } from '@playwright/test';
test('login flow', async ({ agent }) => {
// Using a stable cache key ensures cache hits even if task text changes slightly
await agent.perform('Log in to the application with test credentials', {
cacheKey: 'login-flow-v1',
});
});