Implementation:Microsoft Playwright Page Agent

From Leeroopedia
Revision as of 11:37, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_Playwright_Page_Agent.md)
Knowledge Sources
Domains AI_Testing, Browser_Automation, LLM_Configuration
Last Updated 2026-02-11 00:00 GMT

Overview

Concrete API for creating an AI-powered browser automation agent from a Playwright Page instance, provided by the Playwright library.

Description

The page.agent() method is the primary entry point for AI-agent-driven testing in Playwright. It creates a PageAgent instance that bridges a browser page with an LLM provider, enabling natural language control of the browser. The method accepts configuration for the LLM provider (API type, credentials, model), operational limits (token budgets, action caps, retry policies), an optional response cache, secrets for sensitive data, and a system prompt override.

Internally, page.agent() constructs a client-side PageAgent proxy that communicates with a PageAgentDispatcher on the server side. The server-side dispatcher manages the agentic loop, tool execution, and communication with the LLM provider. This client-server architecture ensures that the agent can operate across Playwright's process boundary (e.g., when using remote browsers).
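
The proxy/dispatcher split described above can be pictured with a minimal in-process sketch. Everything in it is hypothetical: the names SketchDispatcher, SketchAgentProxy, and jsonTransport, the message shapes, and the single string argument per method are illustration only and are not Playwright's actual wire protocol. A JSON round trip stands in for the process boundary.

```typescript
// Hypothetical message shapes; Playwright's real protocol is not shown on this page.
interface SketchRequest { method: 'perform' | 'expect' | 'extract'; task: string; }
interface SketchResponse { ok: boolean; detail: string; }
type SketchTransport = (request: SketchRequest) => Promise<SketchResponse>;

// Server side: in the real system this is where the agentic loop, tool
// execution, and LLM calls would live. Here it only records and acknowledges.
class SketchDispatcher {
  readonly received: SketchRequest[] = [];

  async dispatch(request: SketchRequest): Promise<SketchResponse> {
    this.received.push(request);
    return { ok: true, detail: `handled ${request.method}` };
  }
}

// Client side: a thin proxy that forwards requests and holds no agent logic.
class SketchAgentProxy {
  private readonly send: SketchTransport;

  constructor(send: SketchTransport) {
    this.send = send;
  }

  perform(task: string): Promise<SketchResponse> {
    return this.send({ method: 'perform', task });
  }

  expect(task: string): Promise<SketchResponse> {
    return this.send({ method: 'expect', task });
  }
}

// Simulates a process boundary: only JSON-serializable data crosses it.
function jsonTransport(dispatcher: SketchDispatcher): SketchTransport {
  return async request => {
    const wireRequest = JSON.parse(JSON.stringify(request)) as SketchRequest;
    const response = await dispatcher.dispatch(wireRequest);
    return JSON.parse(JSON.stringify(response)) as SketchResponse;
  };
}
```

Because the proxy depends only on a transport function, the same client code could in principle talk to a dispatcher in another process or on another machine, which is the property the architecture is meant to provide.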

The method resolves provider configuration by merging defaults with the supplied options. If no provider is specified, it falls back to environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) to auto-detect the provider.
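
As a rough illustration of that resolution order, the sketch below merges assumed defaults under an explicit provider and otherwise auto-detects one from the environment. The helper resolveProvider, the 60-second default timeout, and the default model chosen per environment variable are all assumptions made for this example; only the two environment variable names come from the text above.

```typescript
type ProviderApi = 'openai' | 'openai-compatible' | 'anthropic' | 'google';

interface ProviderConfig {
  api: ProviderApi;
  apiKey: string;
  model: string;
  apiEndpoint?: string;
  apiTimeout?: number;
}

type EnvVars = Record<string, string | undefined>;

// Placeholder default, not a documented Playwright value.
const PROVIDER_DEFAULTS = { apiTimeout: 60000 };

// Assumed detection table; the default models are placeholders.
const ENV_FALLBACKS: { envVar: string; api: ProviderApi; defaultModel: string }[] = [
  { envVar: 'OPENAI_API_KEY', api: 'openai', defaultModel: 'gpt-4o' },
  { envVar: 'ANTHROPIC_API_KEY', api: 'anthropic', defaultModel: 'claude-sonnet-4-20250514' },
];

// Hypothetical helper: explicit options win, environment variables are the fallback.
function resolveProvider(supplied: ProviderConfig | undefined, env: EnvVars): ProviderConfig {
  if (supplied) {
    // Defaults go first so the caller's settings override them.
    return { ...PROVIDER_DEFAULTS, ...supplied };
  }
  for (const fallback of ENV_FALLBACKS) {
    const key = env[fallback.envVar];
    if (key) {
      return { ...PROVIDER_DEFAULTS, api: fallback.api, apiKey: key, model: fallback.defaultModel };
    }
  }
  throw new Error('No provider supplied and no known API key found in the environment');
}
```

Passing the environment in as a parameter, rather than reading process.env inside the helper, keeps the sketch easy to exercise with fixed inputs.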

Usage

Use page.agent() when:

  • You want to create an AI agent that can autonomously interact with a web page
  • You need to configure a specific LLM provider and model for browser automation
  • You are building custom test utilities outside the standard test fixture system
  • You require fine-grained control over agent limits, caching, or system prompts

Code Reference

Source Location

  • Repository: playwright
  • File: packages/playwright-core/src/client/page.ts:L836-857
  • File: packages/playwright/src/index.ts:L458-497

Signature

page.agent(options?: {
  provider?: {
    api: "openai" | "openai-compatible" | "anthropic" | "google";
    apiKey: string;
    model: string;
    apiEndpoint?: string;
    apiTimeout?: number;
  };
  cache?: AgentCache;
  limits?: {
    maxTokens?: number;
    maxActions?: number;       // default: 10
    maxActionRetries?: number; // default: 3
  };
  secrets?: { name: string; value: string }[];
  systemPrompt?: string;
}): Promise<PageAgent>
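
The defaults noted in the signature comments can be made explicit with a small helper. This is a sketch only: effectiveLimits is a hypothetical name, and leaving maxTokens undefined when it is not supplied is an assumption, since no default for it is documented here.

```typescript
interface AgentLimits {
  maxTokens?: number;
  maxActions?: number;
  maxActionRetries?: number;
}

// Defaults taken from the comments in the signature above.
const DEFAULT_LIMITS = { maxActions: 10, maxActionRetries: 3 };

// Hypothetical helper that fills in the documented defaults.
function effectiveLimits(limits: AgentLimits = {}) {
  return {
    // Assumption: no token budget is applied when maxTokens is omitted.
    maxTokens: limits.maxTokens,
    maxActions: limits.maxActions ?? DEFAULT_LIMITS.maxActions,
    maxActionRetries: limits.maxActionRetries ?? DEFAULT_LIMITS.maxActionRetries,
  };
}
```

Using ?? rather than || means an explicit 0 is preserved instead of being replaced by the default.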

Import

// PageAgent is obtained from a Page instance, not imported directly
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();
const agent = await page.agent({
  provider: {
    api: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
  },
});

I/O Contract

Inputs

Name Type Required Description
provider.api "openai" | "openai-compatible" | "anthropic" | "google" Yes (or via env) The LLM provider API type to use
provider.apiKey string Yes (or via env) API key for authenticating with the LLM provider
provider.model string Yes (or via env) The specific model identifier (e.g., "gpt-4o", "claude-sonnet-4-20250514")
provider.apiEndpoint string No Custom API endpoint URL for self-hosted or proxy deployments
provider.apiTimeout number No Request timeout in milliseconds for LLM API calls
cache AgentCache No Cache instance for storing and replaying LLM responses
limits.maxTokens number No Maximum total tokens the agent may consume
limits.maxActions number No Maximum browser actions per perform() call (default: 10)
limits.maxActionRetries number No Maximum retries on action failure (default: 3)
secrets { name: string; value: string }[] No Sensitive values (e.g., passwords) passed securely to the agent
systemPrompt string No Custom system prompt to override the default agent instructions
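
To make the interplay of limits.maxActions and limits.maxActionRetries concrete, the following sketch bounds a list of planned actions by an action cap and a per-action retry count. It is an illustration under assumptions, not the dispatcher's actual loop: runWithLimits is hypothetical, actions are modeled as synchronous functions that throw on failure, and stopping quietly at the cap (rather than raising an error) is a guess.

```typescript
interface LoopLimits {
  maxActions: number;
  maxActionRetries: number;
}

// An action is modeled as a function that throws when it fails.
type SketchAction = () => void;

// Hypothetical bounded loop: at most maxActions actions, each retried
// up to maxActionRetries times after its first failure.
function runWithLimits(
  plannedActions: SketchAction[],
  limits: LoopLimits,
): { executed: number; stoppedByCap: boolean } {
  let executed = 0;
  for (const action of plannedActions) {
    if (executed >= limits.maxActions) {
      return { executed, stoppedByCap: true };
    }
    let retriesUsed = 0;
    while (true) {
      try {
        action();
        break;
      } catch (error) {
        // Give up once the retry budget for this action is spent.
        if (retriesUsed >= limits.maxActionRetries) throw error;
        retriesUsed++;
      }
    }
    executed++;
  }
  return { executed, stoppedByCap: false };
}
```

With the default of 3 retries, an action is attempted at most four times in this model: once initially and three more times on failure.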

Outputs

Name Type Description
PageAgent Promise<PageAgent> Resolves to a client-side proxy to the server-side PageAgentDispatcher, exposing perform(), expect(), and extract() methods for AI-driven browser interaction

Usage Examples

Basic Example

import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();

// Create an agent with OpenAI provider
const agent = await page.agent({
  provider: {
    api: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
  },
  limits: {
    maxActions: 15,
    maxTokens: 50000,
  },
});

await agent.perform('Navigate to https://example.com and click the "More information" link');
await agent.expect('The page contains information about IANA');

await browser.close();

Advanced Example with Anthropic Provider

import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();

const agent = await page.agent({
  provider: {
    api: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY,
    model: 'claude-sonnet-4-20250514',
  },
  limits: {
    maxActions: 20,
    maxActionRetries: 5,
    maxTokens: 100000,
  },
  secrets: [
    { name: 'TEST_PASSWORD', value: process.env.TEST_PASSWORD },
  ],
  systemPrompt: 'You are a meticulous QA tester. Always verify actions succeeded before proceeding.',
});

await agent.perform('Log in with username "testuser" and password TEST_PASSWORD');

await browser.close();
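
In the example above the prompt refers to the secret by its name, TEST_PASSWORD, rather than by its value. The helpers below are a hedged sketch of two things a caller might want around that pattern: building the secrets array from environment variables while failing fast on a missing one, and replacing secret values with their names before writing text to a log. Both secretsFromEnv and redactSecrets are hypothetical and are not part of Playwright; how Playwright itself handles secrets internally is not described on this page.

```typescript
interface AgentSecret {
  name: string;
  value: string;
}

// Hypothetical helper: builds the secrets array and rejects missing variables,
// so an undefined value is never passed where a string is required.
function secretsFromEnv(
  secretNames: string[],
  env: Record<string, string | undefined>,
): AgentSecret[] {
  return secretNames.map(secretName => {
    const value = env[secretName];
    if (!value) throw new Error(`Missing environment variable ${secretName}`);
    return { name: secretName, value };
  });
}

// Hypothetical helper: replaces every occurrence of a secret value with its name.
function redactSecrets(text: string, secrets: AgentSecret[]): string {
  return secrets.reduce(
    (redacted, secret) => redacted.split(secret.value).join(secret.name),
    text,
  );
}
```

In a Node.js script the first helper could be called as secretsFromEnv(['TEST_PASSWORD'], process.env) and its result passed as the secrets option.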

Example with Google Gemini Provider

import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();

const agent = await page.agent({
  provider: {
    api: 'google',
    apiKey: process.env.GOOGLE_API_KEY,
    model: 'gemini-2.0-flash',
  },
});

await agent.perform('Search for "Playwright testing" and click the first result');

await browser.close();

Related Pages

Implements Principle