Implementation:Microsoft Playwright Agent Perform

From Leeroopedia
Revision as of 11:35, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_Playwright_Agent_Perform.md)
Knowledge Sources
Domains: AI_Testing, Browser_Automation, Natural_Language_Processing
Last Updated: 2026-02-11 00:00 GMT

Overview

agent.perform() is the concrete API, provided by the Playwright library, for instructing an AI agent to carry out multi-step browser tasks described in natural language.

Description

The agent.perform() method accepts a natural language task string and autonomously executes a sequence of browser actions to accomplish it. The method runs an agentic loop where the LLM:

  1. Receives the task instruction and current page state (via accessibility snapshot)
  2. Selects the next browser action from 10 available tools
  3. Executes the action against the page
  4. Observes the result and decides whether to continue or report completion
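The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Playwright's implementation: `Decision`, `LoopDeps`, `LoopResult`, and `runAgentLoop` are hypothetical names, the model and the page are replaced by injected functions, and the calls are synchronous for brevity (the real ones would be asynchronous). Returning `completed: false` when the action cap is reached is also an assumption; the source does not say what happens at the limit.

```typescript
// Hypothetical sketch of the observe-decide-act loop; not Playwright's source.
type Decision =
  | { kind: 'action'; tool: string; args: Record<string, string> }
  | { kind: 'done'; summary: string };

interface LoopDeps {
  // Step 1: capture the current page state (stand-in for an accessibility snapshot).
  snapshot: () => string;
  // Step 2: stand-in for the LLM choosing the next tool or reporting completion.
  decide: (task: string, pageState: string, lastResult: string) => Decision;
  // Step 3: run the chosen action and return a textual result.
  execute: (tool: string, args: Record<string, string>) => string;
}

interface LoopResult {
  turns: number;      // decision round-trips
  actions: number;    // browser actions executed
  completed: boolean; // false if the action cap was reached first (assumption)
}

function runAgentLoop(task: string, deps: LoopDeps, maxActions: number = 10): LoopResult {
  let turns = 0;
  let actions = 0;
  let lastResult = '';
  while (actions < maxActions) {
    const pageState = deps.snapshot();
    const decision = deps.decide(task, pageState, lastResult);
    turns++;
    if (decision.kind === 'done') {
      return { turns, actions, completed: true };
    }
    // Step 4: the result is fed back into the next decision.
    lastResult = deps.execute(decision.tool, decision.args);
    actions++;
  }
  return { turns, actions, completed: false };
}
```

Injecting `snapshot`, `decide`, and `execute` keeps the control flow testable without a browser or a model.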

The method is bounded by configurable limits: maxActions (default 10) caps the number of browser actions per call, maxActionRetries (default 3) controls retry attempts on action failures, and maxTokens bounds total token consumption.
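How these limits might combine can be sketched as follows. The defaults (10 and 3) come from the text above; everything else is an assumption made for illustration: the helper names are hypothetical, an unset `maxTokens` is treated as unbounded, and a retry budget of N is read as one initial attempt plus up to N retries.

```typescript
// Hypothetical helpers; only the default values come from the description above.
interface PerformLimits {
  maxActions: number;
  maxActionRetries: number;
  maxTokens: number | undefined; // undefined is treated as "no cap" (assumption)
}

interface PerformLimitOptions {
  maxActions?: number;
  maxActionRetries?: number;
  maxTokens?: number;
}

function resolveLimits(options: PerformLimitOptions = {}): PerformLimits {
  return {
    maxActions: options.maxActions ?? 10,
    maxActionRetries: options.maxActionRetries ?? 3,
    maxTokens: options.maxTokens,
  };
}

// Assumption: one initial attempt plus up to maxActionRetries retries.
function executeWithRetries<T>(action: () => T, maxActionRetries: number): { value: T; attempts: number } {
  let lastError: unknown = new Error('action was never attempted');
  for (let attempt = 1; attempt <= maxActionRetries + 1; attempt++) {
    try {
      return { value: action(), attempts: attempt };
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}

function withinTokenBudget(usedTokens: number, limits: PerformLimits): boolean {
  return limits.maxTokens === undefined || usedTokens <= limits.maxTokens;
}
```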

A cacheKey option (defaulting to the task text itself) enables response caching. When running in cached mode, the agent replays stored LLM responses instead of making live API calls, enabling fast CI runs without LLM costs.
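The record-and-replay idea can be illustrated with a small in-memory cache. This is a sketch under assumptions, not Playwright's cache: `ResponseCache` and `getResponses` are hypothetical names, responses are modeled as plain strings, and storage is an in-memory object rather than whatever persistence the library uses. The only behavior taken from the text is that the key defaults to the task string.

```typescript
// Hypothetical record/replay cache; names and storage format are assumptions.
type CachedResponses = string[]; // one stored model response per turn

class ResponseCache {
  private entries: Record<string, CachedResponses> = {};

  // The key defaults to the task text, as described above.
  static keyFor(task: string, cacheKey?: string): string {
    return cacheKey ?? task;
  }

  record(key: string, responses: CachedResponses): void {
    this.entries[key] = responses.slice();
  }

  replay(key: string): CachedResponses | undefined {
    const hit = Object.prototype.hasOwnProperty.call(this.entries, key)
      ? this.entries[key]
      : undefined;
    return hit ? hit.slice() : undefined;
  }
}

function getResponses(
  cache: ResponseCache,
  task: string,
  callModel: () => CachedResponses, // stand-in for live LLM calls
  cacheKey?: string,
): { responses: CachedResponses; fromCache: boolean } {
  const key = ResponseCache.keyFor(task, cacheKey);
  const cached = cache.replay(key);
  if (cached) {
    return { responses: cached, fromCache: true }; // replay, no live call
  }
  const live = callModel();
  cache.record(key, live);
  return { responses: live, fromCache: false };
}
```

With an explicit key, rewording the task text still resolves to the same entry; without one, any change to the wording produces a new key and therefore a cache miss.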

The method returns a usage summary containing the number of turns (LLM round-trips), input tokens consumed, and output tokens generated.

Usage

Use agent.perform() when:

  • You need to execute a multi-step browser workflow described in plain English
  • You want the agent to autonomously navigate, click, type, and interact with the page
  • You are testing user journeys that involve multiple pages or complex interactions
  • You want to avoid writing brittle selector-based automation code

Code Reference

Source Location

  • Repository: playwright
  • File: packages/playwright-core/src/client/pageAgent.ts:L44-48 (client-side proxy)
  • File: packages/playwright-core/src/server/agent/pageAgent.ts:L40-57 (server-side implementation)

Signature

agent.perform(
  task: string,
  options?: {
    cacheKey?: string;
    maxActions?: number;       // default: 10
    maxActionRetries?: number; // default: 3
    maxTokens?: number;
    timeout?: number;
  }
): Promise<{
  usage: {
    turns: number;
    inputTokens: number;
    outputTokens: number;
  };
}>

Import

// perform() is a method on PageAgent, obtained via page.agent().
// The examples on this page receive the agent through the `agent` test fixture.
import { test } from '@playwright/test';

test('example', async ({ agent }) => {
  const result = await agent.perform('Click the login button');
  console.log(`Used ${result.usage.turns} turns`);
});

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| task | string | Yes | Natural language description of the browser task to perform |
| options.cacheKey | string | No | Cache key for storing/replaying LLM responses (defaults to task text) |
| options.maxActions | number | No | Maximum number of browser actions the agent may execute (default: 10) |
| options.maxActionRetries | number | No | Maximum retry attempts when an action fails (default: 3) |
| options.maxTokens | number | No | Maximum total tokens the agent may consume for this task |
| options.timeout | number | No | Timeout in milliseconds for the entire perform() call |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| usage.turns | number | Number of LLM round-trips (observation-decision cycles) executed |
| usage.inputTokens | number | Total input tokens consumed across all LLM calls |
| usage.outputTokens | number | Total output tokens generated across all LLM calls |
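When a test issues several perform() calls, the per-call usage objects can be summed to track the overall budget. The sketch below relies only on the result shape documented above; `sumUsage` and the `totalTokens` field are hypothetical additions for illustration.

```typescript
// Shape of the usage object as documented in the Outputs table.
interface Usage {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: adds up usage across several perform() results.
function sumUsage(results: { usage: Usage }[]): Usage & { totalTokens: number } {
  let turns = 0;
  let inputTokens = 0;
  let outputTokens = 0;
  for (const result of results) {
    turns += result.usage.turns;
    inputTokens += result.usage.inputTokens;
    outputTokens += result.usage.outputTokens;
  }
  return { turns, inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}
```

In a test, each awaited perform() result could be pushed into an array and passed to this helper at the end.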

Available Browser Tools

The agent has access to 10 browser action tools during perform():

| Tool Name | Playwright Equivalent | Description |
| --- | --- | --- |
| browser_navigate | page.goto(url) | Navigate the browser to a specified URL |
| browser_snapshot | accessibility snapshot | Capture the current page state as an accessibility tree |
| browser_click | locator.click() | Click on a page element identified by the agent |
| browser_drag | locator.dragTo() | Drag an element to a target location |
| browser_hover | locator.hover() | Hover the mouse over a page element |
| browser_select_option | locator.selectOption() | Select an option from a dropdown menu |
| browser_press_key | keyboard.press() | Press a keyboard key or key combination |
| browser_type | keyboard.type() | Type text character by character |
| browser_fill_form | locator.fill() | Fill a form input field with text |
| browser_set_checked | locator.setChecked() | Set the checked state of a checkbox or radio button |
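Because the tool set is fixed, a tool name chosen by the model can be checked against it before anything is executed. The sketch below encodes the table as a lookup; `BROWSER_TOOLS`, `isBrowserToolName`, and `playwrightEquivalent` are hypothetical names, and this is not how Playwright itself registers tools. The drag entry uses `locator.dragTo(target)`, which is the Playwright locator method for dragging onto another element.

```typescript
// Hypothetical lookup built from the tool table above.
const BROWSER_TOOLS = {
  browser_navigate: 'page.goto(url)',
  browser_snapshot: 'accessibility snapshot',
  browser_click: 'locator.click()',
  browser_drag: 'locator.dragTo(target)',
  browser_hover: 'locator.hover()',
  browser_select_option: 'locator.selectOption()',
  browser_press_key: 'keyboard.press()',
  browser_type: 'keyboard.type()',
  browser_fill_form: 'locator.fill()',
  browser_set_checked: 'locator.setChecked()',
} as const;

type BrowserToolName = keyof typeof BROWSER_TOOLS;

// Type guard: true only for the ten names listed in the table.
function isBrowserToolName(name: string): name is BrowserToolName {
  return Object.prototype.hasOwnProperty.call(BROWSER_TOOLS, name);
}

// Returns the Playwright equivalent, rejecting names outside the fixed set.
function playwrightEquivalent(name: string): string {
  if (!isBrowserToolName(name)) {
    throw new Error(`Unknown browser tool: ${name}`);
  }
  return BROWSER_TOOLS[name];
}
```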

Usage Examples

Basic Example

import { test } from '@playwright/test';

test('complete a purchase flow', async ({ agent }) => {
  await agent.perform('Navigate to https://shop.example.com');
  await agent.perform('Search for "laptop stand" in the search bar');
  await agent.perform('Add the first product to the cart');
  await agent.perform('Go to the cart and proceed to checkout');
  await agent.perform('Fill in the shipping address with test data and place the order');
});

Example with Custom Limits

import { test } from '@playwright/test';

test('complex multi-page workflow', async ({ agent }) => {
  const result = await agent.perform(
    'Navigate to the admin panel, create a new user with email test@example.com, ' +
    'assign them the "editor" role, and verify the user appears in the user list',
    {
      maxActions: 25,
      maxActionRetries: 5,
      maxTokens: 80000,
      timeout: 120000,
    }
  );

  console.log(`Completed in ${result.usage.turns} turns`);
  console.log(`Tokens used: ${result.usage.inputTokens + result.usage.outputTokens}`);
});

Example with Cache Key

import { test } from '@playwright/test';

test('login flow', async ({ agent }) => {
  // Using a stable cache key ensures cache hits even if task text changes slightly
  await agent.perform('Log in to the application with test credentials', {
    cacheKey: 'login-flow-v1',
  });
});

Related Pages

Implements Principle
