Principle:Microsoft Playwright Execute and Analyze Agent Results

Knowledge Sources	Playwright AI Testing, Playwright Test CLI
Domains	AI_Testing, Browser_Automation, Test_Execution
Last Updated	2026-02-11 00:00 GMT

Overview

The action execution layer is the runtime foundation that translates LLM decisions into concrete browser interactions. It executes AI agent-generated browser actions against the page and manages the agentic loop lifecycle, including caching, token budgeting, and retry logic.

Description

Between the high-level API (perform, expect, extract) and the browser itself lies the action execution layer. This layer is responsible for:

Action dispatch: Translating structured action objects (e.g., "click element with ref 42") into actual Playwright API calls (e.g., locator.click()).
Agentic loop management: Orchestrating the observe-decide-act cycle, managing conversation history, and determining when to stop.
Caching: Storing LLM responses keyed by task and page state, and replaying them in subsequent runs to eliminate LLM costs.
Token budgeting: Tracking cumulative token usage across turns and halting execution when the budget is exhausted.
Retry logic: Detecting action failures (e.g., element not found, timeout) and providing error feedback to the LLM for self-correction.
Result analysis: Collecting execution metrics (turns, tokens, actions) and producing structured output for reporting and debugging.

This layer is typically invisible to test authors but is critical for reliability, performance, and cost control in production AI testing systems.

Usage

Apply this principle when:

Designing the runtime architecture for an AI testing framework
Implementing caching strategies to minimize LLM API calls in CI
Building cost monitoring and token budgeting systems
Debugging agent behavior by analyzing action execution logs
Optimizing the agentic loop for faster test execution
Implementing the bridge between LLM tool calls and actual browser automation APIs

Theoretical Basis

The action execution layer operates at the intersection of three subsystems:

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  LLM Layer  │────>│ Action Layer │────>│ Browser Layer │
│             │     │              │     │               │
│ - Reasoning │     │ - Dispatch   │     │ - Playwright  │
│ - Tool use  │     │ - Retry      │     │ - Page API    │
│ - History   │     │ - Caching    │     │ - Locators    │
│             │     │ - Budgeting  │     │               │
└─────────────┘     └──────────────┘     └───────────────┘

Action dispatch model:

Each action type maps to a specific Playwright API call. The dispatcher pattern ensures a clean separation between the LLM's abstract tool calls and the concrete browser API:

ActionDispatch(action):
  switch action.type:
    case "navigate":     page.goto(action.url)
    case "click":        locator(action.ref).click()
    case "drag":         locator(action.ref).dragTo(target)  // target: locator of the drop element
    case "hover":        locator(action.ref).hover()
    case "fill":         locator(action.ref).fill(action.value)
    case "select_option": locator(action.ref).selectOption(action.value)
    case "press_key":    keyboard.press(action.key)
    case "type":         keyboard.type(action.text)
    case "set_checked":  locator(action.ref).setChecked(action.checked)
    case "snapshot":     captureAccessibilityTree(page)
    // ... assertion and extraction actions
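
The dispatch table above can be sketched in TypeScript as a switch over a discriminated union. This is a minimal sketch, not the actual implementation: the AgentAction shapes, the refToSelector helper, and the data-agent-ref attribute are hypothetical names chosen for illustration, and only a subset of the action types is covered. PageLike and LocatorLike declare just the methods the sketch calls (goto, locator, click, hover, fill, setChecked, keyboard.press), all of which exist on Playwright's Page and Locator, so a real Page is expected to fit this shape.

```typescript
// Structural subset of the Playwright Page/Locator surface used by this sketch.
interface LocatorLike {
  click(): Promise<void>;
  hover(): Promise<void>;
  fill(value: string): Promise<void>;
  setChecked(checked: boolean): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): LocatorLike;
  keyboard: { press(key: string): Promise<void> };
}

// Hypothetical action shapes; field names are assumptions for illustration.
type AgentAction =
  | { type: "navigate"; url: string }
  | { type: "click"; ref: string }
  | { type: "hover"; ref: string }
  | { type: "fill"; ref: string; value: string }
  | { type: "set_checked"; ref: string; checked: boolean }
  | { type: "press_key"; key: string };

// Hypothetical mapping from a snapshot ref to a selector string.
function refToSelector(ref: string): string {
  return `[data-agent-ref="${ref}"]`;
}

// Translate one structured action into exactly one browser call.
async function dispatchAction(page: PageLike, action: AgentAction): Promise<void> {
  switch (action.type) {
    case "navigate":
      await page.goto(action.url);
      return;
    case "click":
      await page.locator(refToSelector(action.ref)).click();
      return;
    case "hover":
      await page.locator(refToSelector(action.ref)).hover();
      return;
    case "fill":
      await page.locator(refToSelector(action.ref)).fill(action.value);
      return;
    case "set_checked":
      await page.locator(refToSelector(action.ref)).setChecked(action.checked);
      return;
    case "press_key":
      await page.keyboard.press(action.key);
      return;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is missing.
      const unhandled: never = action;
      throw new Error(`Unsupported action: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Because the dispatcher depends only on the structural interfaces, it can be exercised with a recording fake instead of a real browser, which keeps the LLM-facing action vocabulary decoupled from the browser API.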

Execution modes:

The runtime supports three execution modes, controlled via the --run-agents CLI flag:

ExecutionMode:
  'none':    Cache-only mode
             - Only execute if LLM response exists in cache
             - Fail or skip if cache miss
             - Zero LLM cost

  'missing': Incremental mode
             - Check cache first
             - Call LLM only for uncached tasks
             - Optimal for CI with evolving test suites

  'all':     Full mode
             - Always call LLM regardless of cache
             - Regenerates cache on pass
             - Used for cache refresh or initial setup
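
The three modes reduce to a small decision table over the mode and whether the cache holds a response. The sketch below illustrates that table only; the planExecution function and the ExecutionPlan labels are hypothetical names, and whether a cache miss in 'none' mode fails or skips the test is left to the caller, as in the description above.

```typescript
type RunAgentsMode = "none" | "missing" | "all";

// Hypothetical labels for what the runtime does next.
type ExecutionPlan = "replay-cache" | "call-llm" | "fail-or-skip";

function planExecution(mode: RunAgentsMode, cacheHit: boolean): ExecutionPlan {
  switch (mode) {
    case "none":
      // Cache-only: never contact the LLM.
      return cacheHit ? "replay-cache" : "fail-or-skip";
    case "missing":
      // Incremental: contact the LLM only for uncached tasks.
      return cacheHit ? "replay-cache" : "call-llm";
    case "all":
      // Full: always contact the LLM, even when a cached response exists.
      return "call-llm";
  }
}
```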

Context and history management:

The agentic loop maintains a Context object that tracks:

Page reference: The Playwright Page instance being controlled
Agent parameters: Provider config, limits, secrets, system prompt
Conversation history: Accumulated observations and actions for LLM context
Token budget: Running total of tokens consumed, compared against maxTokens

Context = {
  page:        Page              // Active browser page
  agentParams: AgentParams       // Provider, limits, secrets
  history:     Message[]         // LLM conversation history
  tokenBudget: {
    used:      number            // Tokens consumed so far
    limit:     number            // Maximum allowed tokens
  }
}
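
Token budgeting over the Context can be sketched as follows. This is an illustration under stated assumptions: the recordTurn, isExhausted, and simulateLoop helpers and the TurnUsage shape are hypothetical, and the sketch assumes the budget is checked before each turn, so the final total can overshoot the limit by at most one turn's usage.

```typescript
interface TokenBudget {
  used: number;  // Tokens consumed so far
  limit: number; // Maximum allowed tokens
}

// Hypothetical per-turn usage as reported by an LLM provider.
interface TurnUsage {
  input: number;
  output: number;
}

// Return a new budget with this turn's tokens added.
function recordTurn(budget: TokenBudget, usage: TurnUsage): TokenBudget {
  return { limit: budget.limit, used: budget.used + usage.input + usage.output };
}

function isExhausted(budget: TokenBudget): boolean {
  return budget.used >= budget.limit;
}

// Replay a sequence of turn costs, stopping before any turn that would start
// with an exhausted budget. `halted` is true only if turns were left unrun.
function simulateLoop(
  turns: TurnUsage[],
  limit: number,
): { turnsRun: number; budget: TokenBudget; halted: boolean } {
  let budget: TokenBudget = { used: 0, limit };
  let turnsRun = 0;
  for (const usage of turns) {
    if (isExhausted(budget)) {
      return { turnsRun, budget, halted: true };
    }
    budget = recordTurn(budget, usage);
    turnsRun++;
  }
  return { turnsRun, budget, halted: false };
}
```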

Retry strategy:

When an action fails, the retry logic follows this pattern:

RetryLogic(action, maxRetries):
  for attempt in 1..maxRetries:
    try:
      execute(action)
      return SUCCESS
    catch error:
      feedback = formatError(error)
      history.append(feedback)

      // Ask LLM for corrected action, unless no attempts remain
      if attempt < maxRetries:
        correctedAction = LLM.decide(history)
        action = correctedAction

  return FAILURE("Exceeded max retries")
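
A TypeScript rendering of this retry pattern might look like the sketch below. The execute and askLlm callbacks, the RetryOutcome type, and the feedback message format are hypothetical stand-ins for the action runner and the LLM client. The sketch skips the LLM call after the last failed attempt, since a corrected action could no longer be executed.

```typescript
type RetryOutcome =
  | { status: "success"; attempts: number }
  | { status: "failure"; attempts: number; reason: string };

async function runWithRetry<A>(
  initialAction: A,
  maxRetries: number,
  execute: (action: A) => Promise<void>,        // hypothetical action runner
  askLlm: (history: string[]) => Promise<A>,    // hypothetical LLM client
  history: string[],
): Promise<RetryOutcome> {
  let action = initialAction;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await execute(action);
      return { status: "success", attempts: attempt };
    } catch (error) {
      // Feed the failure back into the conversation so the LLM can self-correct.
      const message = error instanceof Error ? error.message : String(error);
      history.push(`Action failed (attempt ${attempt}): ${message}`);
      if (attempt < maxRetries) {
        action = await askLlm(history);
      }
    }
  }
  return { status: "failure", attempts: maxRetries, reason: "Exceeded max retries" };
}
```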

Cache architecture:

The cache stores LLM responses indexed by a composite key:

// An explicit cacheKey, when provided, takes precedence over the task text
CacheKey = hash(cacheKey || taskText, pageSnapshot)

CacheEntry = {
  key:      CacheKey
  response: LLMResponse
  metadata: {
    model:     string
    timestamp: number
    tokens:    { input: number, output: number }
  }
}
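
The composite key and cache entry can be sketched as below. Everything here is illustrative: fnv1aHex is a small non-cryptographic hash (an FNV-1a variant over UTF-16 code units) standing in for whatever hash a real implementation uses, and computeCacheKey, CacheKeyInput, and ResponseCache are hypothetical names. The response is modeled as a plain string for simplicity.

```typescript
// Stand-in hash: 32-bit FNV-1a variant over UTF-16 code units, as 8 hex digits.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

interface CacheKeyInput {
  taskText: string;
  pageSnapshot: string;
  cacheKey?: string; // explicit key; takes precedence over taskText when set
}

function computeCacheKey(input: CacheKeyInput): string {
  const identity = input.cacheKey || input.taskText;
  // JSON-encode the pair so the boundary between the two parts is unambiguous.
  return fnv1aHex(JSON.stringify([identity, input.pageSnapshot]));
}

interface CacheEntry {
  key: string;
  response: string;
  metadata: {
    model: string;
    timestamp: number;
    tokens: { input: number; output: number };
  };
}

// In-memory store; a real cache would persist entries to files for replay.
class ResponseCache {
  private entries = new Map<string, CacheEntry>();

  get(key: string): CacheEntry | undefined {
    return this.entries.get(key);
  }

  set(entry: CacheEntry): void {
    this.entries.set(entry.key, entry);
  }
}
```

Including the page snapshot in the key means a cached response is replayed only when the page looks the same as when the response was recorded; a changed page produces a different key and therefore a cache miss.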

Caching is particularly important for CI pipelines where:

Tests run frequently and consistently
LLM costs accumulate rapidly
Deterministic test results are preferred
Network latency to LLM APIs adds to test duration

Output artifacts:

After execution, the system produces:

agent-usage.json: Per-test usage statistics (turns, tokens, actions)
Cache files: Persisted LLM responses for future replay
Trace files: Detailed execution traces for debugging
Test results: Standard pass/fail outcomes integrated with the test runner
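
Result analysis over the usage statistics can be sketched as a simple aggregation. The schema of agent-usage.json is not specified here, so the TestUsage record shape below is an assumption made purely for illustration, as are the summarizeUsage function and its output fields.

```typescript
// Assumed per-test record shape; the real agent-usage.json schema may differ.
interface TestUsage {
  test: string;
  turns: number;
  actions: number;
  tokens: { input: number; output: number };
}

interface UsageSummary {
  tests: number;
  turns: number;
  actions: number;
  totalTokens: number;
  mostExpensiveTest: string | null; // test with the highest token total
}

function summarizeUsage(records: TestUsage[]): UsageSummary {
  const summary: UsageSummary = {
    tests: records.length,
    turns: 0,
    actions: 0,
    totalTokens: 0,
    mostExpensiveTest: null,
  };
  let maxTokens = -1;
  for (const record of records) {
    const tokens = record.tokens.input + record.tokens.output;
    summary.turns += record.turns;
    summary.actions += record.actions;
    summary.totalTokens += tokens;
    if (tokens > maxTokens) {
      maxTokens = tokens;
      summary.mostExpensiveTest = record.test;
    }
  }
  return summary;
}
```

A summary like this makes it easy to spot which tests dominate LLM cost and are therefore the best candidates for caching or a tighter token budget.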

Related Pages

Implemented By

Implementation:Microsoft_Playwright_ActionRunner_RunAction
