Principle:Microsoft Playwright Define Agent Actions with Perform

Knowledge Sources	Playwright AI Testing LLM Tool Use Patterns
Domains	AI_Testing, Browser_Automation, Natural_Language_Processing
Last Updated	2026-02-11 00:00 GMT

Overview

An AI agent can be instructed in natural language to perform multi-step browser tasks. The agent selects and executes the appropriate browser actions itself, so the test author does not have to program each step explicitly.

Description

Traditional browser automation requires the test author to specify every interaction explicitly: click this selector, type into that field, wait for this element. AI-agent-driven task execution inverts this model. The test author describes what should happen in natural language, and the agent autonomously determines how to accomplish it.

This principle encompasses:

Natural language task specification: The test author writes human-readable instructions like "Fill out the registration form with valid data and submit it" rather than scripting individual clicks and keystrokes.
Autonomous action selection: The LLM observes the current page state (via accessibility snapshots or screenshots), reasons about the task, and selects the appropriate browser action from a defined tool set.
Multi-step execution: A single natural language task may require multiple sequential browser actions. The agent loops: observe the page, decide the next action, execute it, observe the result, and repeat until the task is complete or limits are reached.
Action budgeting: Each task execution is bounded by maximum action counts and token limits to prevent infinite loops and control costs.
Retry logic: When an action fails (e.g., element not found, navigation timeout), the agent can retry with adjusted parameters up to a configurable retry limit.
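
The budgeting and retry limits described above can be modelled as a small options object. The TypeScript sketch below is illustrative only: `PerformLimits`, `DEFAULT_LIMITS`, and `resolveLimits` are hypothetical names, the default values are arbitrary placeholders, and `maxTokens` is an assumed name for the token limit. `maxActions` and `maxActionRetries` follow the names used in the loop pseudocode on this page, not a confirmed API.

```typescript
// Hypothetical limits for one natural-language task; not a confirmed Playwright API.
interface PerformLimits {
  maxActions: number;       // upper bound on browser actions per task
  maxActionRetries: number; // retries allowed after a failed action
  maxTokens: number;        // assumed name for the token budget
}

// Placeholder defaults chosen for illustration only.
const DEFAULT_LIMITS: PerformLimits = {
  maxActions: 10,
  maxActionRetries: 2,
  maxTokens: 50000,
};

// Merge caller overrides with the defaults and reject values that would remove a bound.
function resolveLimits(overrides: Partial<PerformLimits> = {}): PerformLimits {
  const limits: PerformLimits = {
    maxActions: overrides.maxActions ?? DEFAULT_LIMITS.maxActions,
    maxActionRetries: overrides.maxActionRetries ?? DEFAULT_LIMITS.maxActionRetries,
    maxTokens: overrides.maxTokens ?? DEFAULT_LIMITS.maxTokens,
  };
  const isWholeNumber = (n: number): boolean => isFinite(n) && Math.floor(n) === n;
  if (!isWholeNumber(limits.maxActions) || limits.maxActions < 1) {
    throw new RangeError("maxActions must be a positive integer");
  }
  if (!isWholeNumber(limits.maxActionRetries) || limits.maxActionRetries < 0) {
    throw new RangeError("maxActionRetries must be a non-negative integer");
  }
  if (!isWholeNumber(limits.maxTokens) || limits.maxTokens < 1) {
    throw new RangeError("maxTokens must be a positive integer");
  }
  return limits;
}

// Usage: override one limit and keep the placeholder defaults for the rest.
const demoLimits = resolveLimits({ maxActions: 5 });
```

Validating the limits up front keeps a mistyped value (for example, zero actions) from silently disabling the safeguard that the budget is meant to provide.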

Usage

Apply this principle when:

You want to automate browser workflows using natural language instead of explicit selectors
The page structure may change frequently, making selector-based tests brittle
You need to test complex user journeys that span multiple pages and interactions
You want to reduce test maintenance overhead by abstracting away DOM details
You are building exploratory tests that adapt to varying page states

Theoretical Basis

Agent task execution follows a ReAct (Reasoning + Acting) loop:

AgentLoop(task):
  history = []
  retries = 0                              // Consecutive failed actions so far
  for step in 1..maxActions:
    observation = snapshot(page)           // Capture current page state
    history.append(observation)

    action = LLM.decide(task, history)     // LLM selects next action

    if action == TASK_COMPLETE:
      return SUCCESS

    try:
      result = execute(page, action)       // Run browser action
      history.append(result)
      retries = 0                          // A success resets the retry counter
    catch error:
      if retries < maxActionRetries:
        history.append(error)              // Let the LLM see the failure
        retries += 1
        continue
      else:
        return FAILURE

  return EXCEEDED_ACTION_LIMIT
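
The loop above can be sketched in TypeScript as follows. This is a minimal illustration, not the actual implementation: `AgentDeps`, `runAgentLoop`, and the other type names are hypothetical, and the LLM and browser are injected as plain synchronous functions so the example stays self-contained (a real model call and real browser actions would be asynchronous). Resetting the retry counter after a successful action is an assumption of this sketch.

```typescript
// All names here are hypothetical; the LLM and the browser are injected as plain functions.
type Action = { tool: string; args: { [key: string]: string } };
type Decision = Action | "TASK_COMPLETE";
type Outcome = "SUCCESS" | "FAILURE" | "EXCEEDED_ACTION_LIMIT";

interface AgentDeps {
  snapshot: () => string;                                // capture current page state
  decide: (task: string, history: string[]) => Decision; // stands in for the LLM call
  execute: (action: Action) => string;                   // run a browser action; may throw
}

function runAgentLoop(
  task: string,
  deps: AgentDeps,
  maxActions: number,
  maxActionRetries: number
): Outcome {
  const history: string[] = [];
  let retries = 0;
  for (let step = 1; step <= maxActions; step++) {
    history.push("observation: " + deps.snapshot());
    const decision = deps.decide(task, history);
    if (decision === "TASK_COMPLETE") return "SUCCESS";
    try {
      history.push("result: " + deps.execute(decision));
      retries = 0; // assumption: a success resets the retry counter
    } catch (error) {
      if (retries >= maxActionRetries) return "FAILURE";
      retries += 1;
      history.push("error: " + String(error)); // let the model see what went wrong
    }
  }
  return "EXCEEDED_ACTION_LIMIT";
}

// Usage with scripted stand-ins: the first click fails once, then succeeds.
let clickAttempts = 0;
const demoOutcome = runAgentLoop(
  "Submit the form",
  {
    snapshot: () => "button 'Submit'",
    decide: (_task, history) =>
      history.some((entry) => entry.indexOf("result:") === 0)
        ? "TASK_COMPLETE"
        : { tool: "browser_click", args: { element: "Submit" } },
    execute: () => {
      clickAttempts += 1;
      if (clickAttempts === 1) throw new Error("element not found");
      return "clicked";
    },
  },
  5,
  1
);
```

As in the pseudocode, a retried action still consumes one iteration of the loop, so retries count against `maxActions` as well as against `maxActionRetries`.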

Tool set design:

The agent operates with a fixed set of browser action tools. Each tool maps to a fundamental browser interaction:

Tool	Browser Action	Description
browser_navigate	page.goto()	Navigate to a URL
browser_snapshot	accessibility tree	Capture page state for observation
browser_click	locator.click()	Click an element
browser_drag	locator.dragTo()	Drag an element to a target
browser_hover	locator.hover()	Hover over an element
browser_select_option	locator.selectOption()	Select from a dropdown
browser_press_key	keyboard.press()	Press a keyboard key
browser_type	keyboard.type()	Type text character by character
browser_fill_form	locator.fill()	Fill a form field with text
browser_set_checked	locator.setChecked()	Set checkbox/radio state
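
One way to picture the fixed tool set is as a registry that maps each tool name to a handler. The TypeScript sketch below covers only four of the tools in the table and uses a hypothetical synchronous `PageLike` stand-in instead of Playwright's own types; `ToolArgs`, `TOOL_REGISTRY`, `requireArg`, `dispatchTool`, and the argument names (`url`, `target`, `value`, `checked`) are likewise assumptions made for illustration.

```typescript
// Minimal stand-in for the page operations the tools need; not Playwright's types.
interface PageLike {
  goto(url: string): void;
  click(target: string): void;
  fill(target: string, value: string): void;
  setChecked(target: string, checked: boolean): void;
}

type ToolArgs = { [key: string]: string };
type ToolHandler = (page: PageLike, args: ToolArgs) => void;

// Fail loudly when the model omits an argument a tool depends on.
function requireArg(args: ToolArgs, key: string): string {
  const value = args[key];
  if (value === undefined) throw new Error("Missing argument: " + key);
  return value;
}

// Fixed tool set: the model may only pick names that appear in this registry.
const TOOL_REGISTRY: { [tool: string]: ToolHandler } = {
  browser_navigate: (page, args) => page.goto(requireArg(args, "url")),
  browser_click: (page, args) => page.click(requireArg(args, "target")),
  browser_fill_form: (page, args) =>
    page.fill(requireArg(args, "target"), requireArg(args, "value")),
  browser_set_checked: (page, args) =>
    page.setChecked(requireArg(args, "target"), requireArg(args, "checked") === "true"),
};

function dispatchTool(page: PageLike, tool: string, args: ToolArgs): void {
  // Own-property check so inherited names such as "toString" are not treated as tools.
  const handler = Object.prototype.hasOwnProperty.call(TOOL_REGISTRY, tool)
    ? TOOL_REGISTRY[tool]
    : undefined;
  if (!handler) throw new Error("Unknown tool: " + tool);
  handler(page, args);
}

// Usage: a recording page shows which browser interaction each tool call maps to.
const recordedCalls: string[] = [];
const recordingPage: PageLike = {
  goto: (url) => { recordedCalls.push("goto " + url); },
  click: (target) => { recordedCalls.push("click " + target); },
  fill: (target, value) => { recordedCalls.push("fill " + target + "=" + value); },
  setChecked: (target, checked) => { recordedCalls.push("setChecked " + target + "=" + String(checked)); },
};
dispatchTool(recordingPage, "browser_navigate", { url: "https://example.com" });
dispatchTool(recordingPage, "browser_fill_form", { target: "Email", value: "a@example.com" });
```

Keeping the registry closed means an unrecognized tool name from the model becomes an ordinary error that can be fed back into the loop, not an arbitrary call against the page.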

The observation-action cycle:

Each iteration of the loop produces an observation (page snapshot) and consumes it to produce an action. The LLM maintains context through the conversation history, which accumulates observations and action results. This enables the agent to:

Track progress toward the task goal
Recover from unexpected page states
Adapt to dynamic content loading
Handle multi-page workflows

Token economy:

Each observation and each action decision consumes tokens, so the total token budget limits how complex a task the agent can handle. Efficient agents avoid unnecessary observations and choose each action decisively.
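
A token budget of this kind can be tracked with a simple counter. The TypeScript sketch below is an illustration under stated assumptions: `estimateTokens` and `TokenBudget` are hypothetical names, and the estimate of roughly four characters per token is a crude heuristic, not the count a real tokenizer would produce.

```typescript
// Rough heuristic only: real tokenizers differ; ~4 characters per token is an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

class TokenBudget {
  private used = 0;
  private readonly maxTokens: number;

  constructor(maxTokens: number) {
    this.maxTokens = maxTokens;
  }

  // Record a message and report whether the running total is still within the budget.
  charge(text: string): boolean {
    this.used += estimateTokens(text);
    return this.used <= this.maxTokens;
  }

  // Tokens left before the budget is exhausted (never negative).
  remaining(): number {
    return Math.max(0, this.maxTokens - this.used);
  }
}

// Usage: a budget of 5 estimated tokens.
const demoBudget = new TokenBudget(5);
const snapshotFits = demoBudget.charge("12345678");      // 8 characters, estimated as 2 tokens
const followUpFits = demoBudget.charge("1234567890123"); // 13 characters, estimated as 4 tokens
```

Here the first message fits and the second pushes the running total past the budget, which is the point at which an agent loop would stop instead of issuing another model call.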

Related Pages

Implemented By

Implementation:Microsoft_Playwright_Agent_Perform
