Principle:Microsoft Playwright Execute and Analyze Agent Results
| Knowledge Sources | |
|---|---|
| Domains | AI_Testing, Browser_Automation, Test_Execution |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
The action execution layer runs AI agent-generated browser actions against the page and manages the lifecycle of the agentic loop, including caching, token budgeting, and retry logic. It is the runtime foundation that translates LLM decisions into concrete browser interactions.
Description
Between the high-level API (perform, expect, extract) and the browser itself lies the action execution layer. This layer is responsible for:
- Action dispatch: Translating structured action objects (e.g., "click element with ref 42") into actual Playwright API calls (e.g., locator.click()).
- Agentic loop management: Orchestrating the observe-decide-act cycle, managing conversation history, and determining when to stop.
- Caching: Storing LLM responses keyed by task and page state, and replaying them in subsequent runs so that cache hits incur no LLM cost.
- Token budgeting: Tracking cumulative token usage across turns and halting execution when the budget is exhausted.
- Retry logic: Detecting action failures (e.g., element not found, timeout) and providing error feedback to the LLM for self-correction.
- Result analysis: Collecting execution metrics (turns, tokens, actions) and producing structured output for reporting and debugging.
This layer is typically invisible to test authors but is critical for reliability, performance, and cost control in production AI testing systems.
Usage
Apply this principle when:
- Designing the runtime architecture for an AI testing framework
- Implementing caching strategies to minimize LLM API calls in CI
- Building cost monitoring and token budgeting systems
- Debugging agent behavior by analyzing action execution logs
- Optimizing the agentic loop for faster test execution
- Implementing the bridge between LLM tool calls and actual browser automation APIs
Theoretical Basis
The action execution layer operates at the intersection of three subsystems:
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  LLM Layer  │────>│ Action Layer │────>│ Browser Layer │
│             │     │              │     │               │
│ - Reasoning │     │ - Dispatch   │     │ - Playwright  │
│ - Tool use  │     │ - Retry      │     │ - Page API    │
│ - History   │     │ - Caching    │     │ - Locators    │
│             │     │ - Budgeting  │     │               │
└─────────────┘     └──────────────┘     └───────────────┘
Action dispatch model:
Each action type maps to a specific Playwright API call. The dispatcher pattern ensures a clean separation between the LLM's abstract tool calls and the concrete browser API:
ActionDispatch(action):
    switch action.type:
        case "navigate":      page.goto(action.url)
        case "click":         locator(action.ref).click()
        case "drag":          locator(action.ref).dragTo(target)   // Playwright's Locator method is dragTo
        case "hover":         locator(action.ref).hover()
        case "fill":          locator(action.ref).fill(action.value)
        case "select_option": locator(action.ref).selectOption(action.value)
        case "press_key":     keyboard.press(action.key)
        case "type":          keyboard.type(action.text)
        case "set_checked":   locator(action.ref).setChecked(action.checked)
        case "snapshot":      captureAccessibilityTree(page)
        // ... assertion and extraction actions
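The dispatcher pattern above can be sketched in TypeScript as a discriminated union plus an exhaustive switch. This is a minimal illustration, not the framework's actual implementation: the `Action` shapes, the `PageLike` and `LocatorLike` interfaces, and the `resolveRef` helper are assumptions introduced here. They cover only a subset of the action types and are narrow structural stand-ins for the Playwright methods of the same name.

```typescript
// Minimal structural stand-ins for the Playwright objects the dispatcher touches.
// These interfaces are assumptions for this sketch, not Playwright's own types.
interface LocatorLike {
  click(): Promise<void>;
  hover(): Promise<void>;
  fill(value: string): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  keyboard: { press(key: string): Promise<void> };
}

// Hypothetical action shapes; the real tool schema may differ.
type Action =
  | { type: "navigate"; url: string }
  | { type: "click"; ref: string }
  | { type: "hover"; ref: string }
  | { type: "fill"; ref: string; value: string }
  | { type: "press_key"; key: string };

// resolveRef is a hypothetical lookup from a snapshot ref to a locator.
async function dispatch(
  page: PageLike,
  resolveRef: (ref: string) => LocatorLike,
  action: Action,
): Promise<void> {
  switch (action.type) {
    case "navigate":
      await page.goto(action.url);
      return;
    case "click":
      await resolveRef(action.ref).click();
      return;
    case "hover":
      await resolveRef(action.ref).hover();
      return;
    case "fill":
      await resolveRef(action.ref).fill(action.value);
      return;
    case "press_key":
      await page.keyboard.press(action.key);
      return;
    default: {
      // Exhaustiveness check: a new Action variant without a case fails to compile.
      const unreachable: never = action;
      throw new Error(`Unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the LLM-facing action schema as a closed union means the compiler flags any action type the dispatcher forgets to handle, which is one way to preserve the separation between abstract tool calls and the concrete browser API.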
Execution modes:
The runtime supports three execution modes, controlled via the --run-agents CLI flag:
ExecutionMode:
    'none': Cache-only mode
        - Only execute if LLM response exists in cache
        - Fail or skip if cache miss
        - Zero LLM cost
    'missing': Incremental mode
        - Check cache first
        - Call LLM only for uncached tasks
        - Optimal for CI with evolving test suites
    'all': Full mode
        - Always call LLM regardless of cache
        - Regenerates cache on pass
        - Used for cache refresh or initial setup
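The three modes reduce to a small decision table over the mode and whether the cache holds a response for the task. The sketch below illustrates that table only; the `planExecution` function and the `Plan` labels are hypothetical names chosen for this example.

```typescript
type ExecutionMode = "none" | "missing" | "all";

// Hypothetical labels for what the runtime does next.
type Plan = "replay-cache" | "call-llm" | "fail-or-skip";

function planExecution(mode: ExecutionMode, cacheHit: boolean): Plan {
  switch (mode) {
    case "none":
      // Cache-only: never call the LLM, so a miss cannot be executed.
      return cacheHit ? "replay-cache" : "fail-or-skip";
    case "missing":
      // Incremental: the LLM is called only for uncached tasks.
      return cacheHit ? "replay-cache" : "call-llm";
    case "all":
      // Full: the cache is ignored and regenerated on a passing run.
      return "call-llm";
  }
}
```

For example, `planExecution("missing", false)` yields `"call-llm"`, while the same cache miss under `"none"` yields `"fail-or-skip"`.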
Context and history management:
The agentic loop maintains a Context object that tracks:
- Page reference: The Playwright Page instance being controlled
- Agent parameters: Provider config, limits, secrets, system prompt
- Conversation history: Accumulated observations and actions for LLM context
- Token budget: Running total of tokens consumed, compared against maxTokens
Context = {
    page: Page                  // Active browser page
    agentParams: AgentParams    // Provider, limits, secrets
    history: Message[]          // LLM conversation history
    tokenBudget: {
        used: number            // Tokens consumed so far
        limit: number           // Maximum allowed tokens
    }
}
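As a rough illustration of how the `tokenBudget` field might be maintained across turns, the sketch below adds each turn's usage to the running total and checks it against the limit. The `recordUsage` and `isExhausted` helpers are hypothetical, and treating `used >= limit` as exhaustion is an assumption; the source does not say whether the boundary is inclusive.

```typescript
interface TokenBudget {
  used: number;  // Tokens consumed so far
  limit: number; // Maximum allowed tokens (maxTokens)
}

interface TurnUsage {
  input: number;
  output: number;
}

// Returns a new budget with this turn's input and output tokens added.
function recordUsage(budget: TokenBudget, usage: TurnUsage): TokenBudget {
  return { ...budget, used: budget.used + usage.input + usage.output };
}

// Assumption: the loop halts once the total reaches the limit.
function isExhausted(budget: TokenBudget): boolean {
  return budget.used >= budget.limit;
}

// Example: two turns against a 1000-token limit.
let budget: TokenBudget = { used: 0, limit: 1000 };
budget = recordUsage(budget, { input: 600, output: 150 }); // used is now 750
const haltAfterFirstTurn = isExhausted(budget);            // false
budget = recordUsage(budget, { input: 300, output: 50 });  // used is now 1100
const haltAfterSecondTurn = isExhausted(budget);           // true
```

Checking the budget after each turn, rather than before each LLM call, means the limit can be overshot by one turn's worth of tokens; a stricter runtime could estimate the next request's size up front.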
Retry strategy:
When an action fails, the retry logic follows this pattern:
RetryLogic(action, maxRetries):
    for attempt in 1..maxRetries:
        try:
            execute(action)
            return SUCCESS
        catch error:
            feedback = formatError(error)
            history.append(feedback)
            // Ask LLM for corrected action
            correctedAction = LLM.decide(history)
            action = correctedAction
    return FAILURE("Exceeded max retries")
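A TypeScript rendering of this loop might look like the sketch below. The `execute` and `decide` callbacks stand in for the browser dispatcher and the LLM call, and the `AgentAction`, `Message`, and `Outcome` types are hypothetical. One deliberate difference from the pseudocode: the sketch does not ask the LLM for a correction after the final failed attempt, since that correction would never be executed.

```typescript
// Hypothetical shapes for this sketch.
type AgentAction = { type: string; ref?: string };
type Message = { role: "tool" | "assistant"; content: string };
type Outcome = { ok: true; attempts: number } | { ok: false; error: string };

async function executeWithRetry(
  action: AgentAction,
  maxRetries: number,
  history: Message[],
  execute: (action: AgentAction) => Promise<void>,
  decide: (history: Message[]) => Promise<AgentAction>,
): Promise<Outcome> {
  let current = action;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await execute(current);
      return { ok: true, attempts: attempt };
    } catch (error) {
      // Feed the failure back into the conversation so the LLM can self-correct.
      const reason = error instanceof Error ? error.message : String(error);
      history.push({ role: "tool", content: `Action "${current.type}" failed: ${reason}` });
      // Skip the LLM call when no attempts remain to run the correction.
      if (attempt < maxRetries) {
        current = await decide(history);
      }
    }
  }
  return { ok: false, error: "Exceeded max retries" };
}
```

Because every failure is appended to `history`, each corrected action is chosen with knowledge of all earlier errors, not just the most recent one.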
Cache architecture:
The cache stores LLM responses indexed by a composite key:
CacheKey = hash(cacheKey || taskText, pageSnapshot)
CacheEntry = {
    key: CacheKey
    response: LLMResponse
    metadata: {
        model: string
        timestamp: number
        tokens: { input: number, output: number }
    }
}
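The composite key can be illustrated with a small in-memory cache. This is a sketch under stated assumptions: the hash here is a 32-bit FNV-1a variant over UTF-16 code units, chosen only to keep the example dependency-free (the source does not say which hash is used, and a real system would likely prefer a cryptographic digest), and `makeCacheKey` and `ResponseCache` are hypothetical names.

```typescript
// Non-cryptographic 32-bit FNV-1a over UTF-16 code units; a stand-in hash for this sketch.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

interface CacheEntry<R> {
  key: string;
  response: R;
  metadata: {
    model: string;
    timestamp: number;
    tokens: { input: number; output: number };
  };
}

// Mirrors hash(cacheKey || taskText, pageSnapshot): an explicit cacheKey,
// when provided, takes the place of the task text.
function makeCacheKey(taskText: string, pageSnapshot: string, cacheKey?: string): string {
  // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
  return fnv1a(`${cacheKey || taskText}\u0000${pageSnapshot}`);
}

class ResponseCache<R> {
  private entries = new Map<string, CacheEntry<R>>();

  get(key: string): CacheEntry<R> | undefined {
    return this.entries.get(key);
  }

  set(entry: CacheEntry<R>): void {
    this.entries.set(entry.key, entry);
  }
}
```

Including the page snapshot in the key means a cached response is replayed only when the page looks the same as it did when the response was recorded; any change to the snapshot produces a miss.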
Caching is particularly important for CI pipelines where:
- Tests run frequently and consistently
- LLM costs accumulate rapidly
- Deterministic test results are preferred
- Network latency to LLM APIs adds to test duration
Output artifacts:
After execution, the system produces:
- agent-usage.json: Per-test usage statistics (turns, tokens, actions)
- Cache files: Persisted LLM responses for future replay
- Trace files: Detailed execution traces for debugging
- Test results: Standard pass/fail outcomes integrated with the test runner