Implementation:Microsoft Playwright Agent Perform

Knowledge Sources	Playwright, Playwright AI Testing
Domains	AI_Testing, Browser_Automation, Natural_Language_Processing
Last Updated	2026-02-11 00:00 GMT

Overview

Concrete API for instructing an AI agent to perform multi-step browser tasks described in natural language, provided by the Playwright library.

Description

The agent.perform() method accepts a natural language task string and autonomously executes a sequence of browser actions to accomplish it. The method runs an agentic loop where the LLM:

1. Receives the task instruction and the current page state (via an accessibility snapshot)
2. Selects the next browser action from the 10 available tools
3. Executes the action against the page
4. Observes the result and decides whether to continue or report completion

The method is bounded by configurable limits: maxActions (default 10) caps the number of browser actions per call, maxActionRetries (default 3) controls retry attempts on action failures, and maxTokens bounds total token consumption.
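
The loop and its limits can be sketched in a few lines. This is a minimal illustration, not Playwright's implementation: the `Decision`, `LoopLimits`, and `LoopResult` types and the `runLoop`, `decide`, `execute`, and `snapshot` names are hypothetical, the callbacks are synchronous only to keep the sketch short, and it assumes that maxActionRetries counts retries after the first attempt and that a failed action is simply re-executed.

```typescript
// Hypothetical types for illustration; not Playwright's internal types.
type Decision =
  | { kind: 'action'; tool: string }
  | { kind: 'done' };

interface LoopLimits {
  maxActions: number;       // cap on browser actions per call
  maxActionRetries: number; // assumed: retries allowed after the first attempt
}

interface LoopResult {
  turns: number;      // model round-trips made
  actions: number;    // actions that succeeded
  completed: boolean; // true if the model reported completion
}

// decide() stands in for the LLM call, execute() for running a tool against
// the page (returning false on failure), snapshot() for the page state.
function runLoop(
  decide: (snapshot: string) => Decision,
  execute: (tool: string) => boolean,
  snapshot: () => string,
  limits: LoopLimits = { maxActions: 10, maxActionRetries: 3 },
): LoopResult {
  let turns = 0;
  let actions = 0;
  while (actions < limits.maxActions) {
    const decision = decide(snapshot());
    turns++;
    if (decision.kind === 'done') return { turns, actions, completed: true };

    // Try the action once, then retry up to maxActionRetries more times.
    let ok = false;
    for (let attempt = 0; attempt <= limits.maxActionRetries && !ok; attempt++) {
      ok = execute(decision.tool);
    }
    if (!ok) throw new Error(`Action ${decision.tool} failed after retries`);
    actions++;
  }
  // The action budget ran out before the model reported completion.
  return { turns, actions, completed: false };
}
```

In this sketch, hitting maxActions ends the loop without an error; how the real method reports an exhausted budget is not covered by this page.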

A cacheKey option (defaulting to the task text itself) enables response caching. When running in cached mode, the agent replays stored LLM responses instead of making live API calls, enabling fast CI runs without LLM costs.
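
The key-defaulting and replay behaviour described above can be illustrated with a small in-memory cache. This is a sketch under stated assumptions: `ResponseCache`, `keyFor`, `record`, and `replay` are hypothetical names, and the page does not say how or where Playwright actually stores cached responses.

```typescript
// Hypothetical in-memory cache; the real storage format is not documented here.
type CachedResponses = string[];

class ResponseCache {
  private store = new Map<string, CachedResponses>();

  // Mirrors the documented default: the task text is the key unless one is given.
  static keyFor(task: string, cacheKey?: string): string {
    return cacheKey ?? task;
  }

  // Save the model responses produced during a live run.
  record(key: string, responses: CachedResponses): void {
    this.store.set(key, [...responses]);
  }

  // Returns a copy of the stored responses on a hit;
  // undefined means a live LLM call would be needed.
  replay(key: string): CachedResponses | undefined {
    const hit = this.store.get(key);
    return hit ? [...hit] : undefined;
  }
}
```

With the default key, rewording the task produces a different key and therefore a miss, which is why an explicit cacheKey is useful for tasks whose wording may change.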

The method returns a usage summary containing the number of turns (LLM round-trips), input tokens consumed, and output tokens generated.

Usage

Use agent.perform() when:

- You need to execute a multi-step browser workflow described in plain English
- You want the agent to autonomously navigate, click, type, and interact with the page
- You are testing user journeys that involve multiple pages or complex interactions
- You want to avoid writing brittle selector-based automation code

Code Reference

Source Location

Repository: playwright
File: packages/playwright-core/src/client/pageAgent.ts:L44-48 (client-side proxy)
File: packages/playwright-core/src/server/agent/pageAgent.ts:L40-57 (server-side implementation)

Signature

agent.perform(
  task: string,
  options?: {
    cacheKey?: string;
    maxActions?: number;       // default: 10
    maxActionRetries?: number; // default: 3
    maxTokens?: number;
    timeout?: number;
  }
): Promise<{
  usage: {
    turns: number;
    inputTokens: number;
    outputTokens: number;
  };
}>

Import

// perform() is a method on PageAgent (obtained via page.agent());
// the test below receives a PageAgent through the `agent` fixture.
import { test } from '@playwright/test';

test('example', async ({ agent }) => {
  const result = await agent.perform('Click the login button');
  console.log(`Used ${result.usage.turns} turns`);
});

I/O Contract

Inputs

Name	Type	Required	Description
task	`string`	Yes	Natural language description of the browser task to perform
options.cacheKey	`string`	No	Cache key for storing/replaying LLM responses (defaults to task text)
options.maxActions	`number`	No	Maximum number of browser actions the agent may execute (default: 10)
options.maxActionRetries	`number`	No	Maximum retry attempts when an action fails (default: 3)
options.maxTokens	`number`	No	Maximum total tokens the agent may consume for this task
options.timeout	`number`	No	Timeout in milliseconds for the entire perform() call
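
As a quick illustration of how the documented defaults combine with caller-supplied values, the sketch below resolves an options object. `resolvePerformOptions` is a hypothetical helper written for this page, not part of Playwright; only the defaults listed in the table (cacheKey, maxActions, maxActionRetries) are filled in, and the remaining fields are left undefined because no default is documented for them.

```typescript
// Shape of the options argument, as listed in the inputs table.
interface PerformOptions {
  cacheKey?: string;
  maxActions?: number;
  maxActionRetries?: number;
  maxTokens?: number;
  timeout?: number;
}

// Hypothetical helper: applies the documented defaults to a partial options object.
function resolvePerformOptions(task: string, options: PerformOptions = {}) {
  return {
    cacheKey: options.cacheKey ?? task,              // defaults to the task text
    maxActions: options.maxActions ?? 10,            // documented default
    maxActionRetries: options.maxActionRetries ?? 3, // documented default
    maxTokens: options.maxTokens, // no default documented; left undefined
    timeout: options.timeout,     // no default documented; left undefined
  };
}
```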

Outputs

Name	Type	Description
usage.turns	`number`	Number of LLM round-trips (observation-decision cycles) executed
usage.inputTokens	`number`	Total input tokens consumed across all LLM calls
usage.outputTokens	`number`	Total output tokens generated across all LLM calls
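
When a test issues several perform() calls, the per-call usage summaries can be added up to track the cost of the whole test. The sketch below assumes only the result shape shown in the outputs table; `totalUsage` is a hypothetical helper, not a Playwright API.

```typescript
// Result shape as documented in the outputs table.
interface Usage {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: sums the usage of several perform() results.
function totalUsage(results: { usage: Usage }[]): Usage {
  return results.reduce<Usage>(
    (acc, { usage }) => ({
      turns: acc.turns + usage.turns,
      inputTokens: acc.inputTokens + usage.inputTokens,
      outputTokens: acc.outputTokens + usage.outputTokens,
    }),
    { turns: 0, inputTokens: 0, outputTokens: 0 },
  );
}
```

Collecting the return values of successive `await agent.perform(...)` calls into an array and passing it to such a helper gives a single total per test.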

Available Browser Tools

The agent has access to 10 browser action tools during perform():

Tool Name	Playwright Equivalent	Description
browser_navigate	`page.goto(url)`	Navigate the browser to a specified URL
browser_snapshot	accessibility snapshot	Capture the current page state as an accessibility tree
browser_click	`locator.click()`	Click on a page element identified by the agent
browser_drag	`locator.dragTo()`	Drag an element to a target location
browser_hover	`locator.hover()`	Hover the mouse over a page element
browser_select_option	`locator.selectOption()`	Select an option from a dropdown menu
browser_press_key	`keyboard.press()`	Press a keyboard key or key combination
browser_type	`keyboard.type()`	Type text character by character
browser_fill_form	`locator.fill()`	Fill a form input field with text
browser_set_checked	`locator.setChecked()`	Set the checked state of a checkbox or radio button
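
For tooling that inspects agent traces or logs, the tool names in the table can be captured as a typed list. This is an illustrative sketch: `AGENT_TOOLS`, `AgentTool`, and `isAgentTool` are hypothetical names, and the list simply copies the table above rather than anything exported by Playwright.

```typescript
// Tool names copied from the table above; not an export of Playwright.
const AGENT_TOOLS = [
  'browser_navigate',
  'browser_snapshot',
  'browser_click',
  'browser_drag',
  'browser_hover',
  'browser_select_option',
  'browser_press_key',
  'browser_type',
  'browser_fill_form',
  'browser_set_checked',
] as const;

// Union of the ten literal tool names.
type AgentTool = (typeof AGENT_TOOLS)[number];

// Type guard: narrows an arbitrary string to a known tool name.
function isAgentTool(name: string): name is AgentTool {
  return (AGENT_TOOLS as readonly string[]).includes(name);
}
```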

Usage Examples

Basic Example

import { test, expect } from '@playwright/test';

test('complete a purchase flow', async ({ agent }) => {
  await agent.perform('Navigate to https://shop.example.com');
  await agent.perform('Search for "laptop stand" in the search bar');
  await agent.perform('Add the first product to the cart');
  await agent.perform('Go to the cart and proceed to checkout');
  await agent.perform('Fill in the shipping address with test data and place the order');
});

Example with Custom Limits

import { test } from '@playwright/test';

test('complex multi-page workflow', async ({ agent }) => {
  const result = await agent.perform(
    'Navigate to the admin panel, create a new user with email test@example.com, ' +
    'assign them the "editor" role, and verify the user appears in the user list',
    {
      maxActions: 25,
      maxActionRetries: 5,
      maxTokens: 80000,
      timeout: 120000,
    }
  );

  console.log(`Completed in ${result.usage.turns} turns`);
  console.log(`Tokens used: ${result.usage.inputTokens + result.usage.outputTokens}`);
});

Example with Cache Key

import { test } from '@playwright/test';

test('login flow', async ({ agent }) => {
  // An explicit, stable cache key keeps cache hits working even when the task text is reworded
  await agent.perform('Log in to the application with test credentials', {
    cacheKey: 'login-flow-v1',
  });
});

Related Pages

Implements Principle

Principle:Microsoft_Playwright_Define_Agent_Actions_with_Perform
