Principle:Microsoft Playwright Record Interactions
| Knowledge Sources | |
|---|---|
| Domains | Testing, Code_Generation, Event_Capture |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Capturing user interactions (clicks, fills, navigations, selections) in real-time and converting them to structured action representations is the core mechanism that enables automated test generation from live browser sessions.
Description
User interaction recording is the heart of any record-and-replay test generation system. The recorder must observe every meaningful user action in the browser, classify it by type, identify the target element using robust selectors, capture any associated values, and emit a structured representation that downstream code generators can consume.
The recording process involves several interleaved concerns:
- Event interception: The recorder injects scripts into every frame of every page in the browser context. These scripts listen for DOM events (click, input, change, keydown, submit, etc.) and report them back to the server-side recorder engine.
- Action classification: Raw DOM events are mapped to high-level action types: click, fill, check, uncheck, select, press, navigate, closePage, openPage, and others. Multiple low-level events may collapse into a single high-level action (e.g., a run of input events on the same text field becomes a single "fill" action, while the keydown events for the individual characters are not recorded separately).
- Selector generation: For each target element, the recorder generates multiple candidate selectors (test ID, role-based, CSS, text, XPath) and selects the most stable and readable one. The preferred selector strategy prioritizes test IDs and ARIA roles over fragile CSS paths.
- Signal processing: Actions may trigger secondary effects such as navigation, popup creation, dialog appearance, or file download. These signals are captured alongside the primary action and attached to the action record, ensuring the generated test includes appropriate waiters and handlers.
- Action deduplication and collapsing: Consecutive actions of the same type on the same target may be collapsed. For example, multiple individual character inputs are collapsed into a single "fill" action with the complete text value.
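The classification and collapsing concerns above can be sketched in a few lines. This is a minimal illustration rather than the recorder's actual code: the `RawEvent` and `RecordedAction` shapes, the `SPECIAL_KEYS` list, and the function names are all assumptions made for the example.

```typescript
// Hypothetical event and action shapes, for illustration only.
type RawEvent =
  | { kind: 'click'; selector: string }
  | { kind: 'input'; selector: string; value: string }
  | { kind: 'keydown'; selector: string; key: string };

type RecordedAction =
  | { name: 'click'; selector: string }
  | { name: 'fill'; selector: string; text: string }
  | { name: 'press'; selector: string; key: string };

// Keys recorded as explicit "press" actions. Printable keys are dropped
// because the input event that follows already carries the typed text.
const SPECIAL_KEYS: string[] = ['Enter', 'Tab', 'Escape'];

// Classification: map one raw event to a high-level action, or null to ignore it.
function classifyEvent(ev: RawEvent): RecordedAction | null {
  if (ev.kind === 'click') return { name: 'click', selector: ev.selector };
  if (ev.kind === 'input') return { name: 'fill', selector: ev.selector, text: ev.value };
  // Remaining case: keydown.
  return SPECIAL_KEYS.indexOf(ev.key) >= 0
    ? { name: 'press', selector: ev.selector, key: ev.key }
    : null;
}

// Collapsing: consecutive fills on the same target keep only the latest
// one, since each input event reports the field's complete value.
function collapseFills(actions: RecordedAction[]): RecordedAction[] {
  const out: RecordedAction[] = [];
  for (const action of actions) {
    const last = out[out.length - 1]; // undefined while out is empty
    if (last && last.name === 'fill' && action.name === 'fill' && last.selector === action.selector) {
      out[out.length - 1] = action;
    } else {
      out.push(action);
    }
  }
  return out;
}

function recordActions(events: RawEvent[]): RecordedAction[] {
  const classified = events
    .map(classifyEvent)
    .filter((a): a is RecordedAction => a !== null);
  return collapseFills(classified);
}
```

Typing "ada" into one field and pressing Enter produces three input events and four keydown events, but `recordActions` reduces them to one fill followed by one press.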
The recorder operates in two distinct execution contexts:
- In-page context: JavaScript injected into the web page observes DOM events and communicates via page bindings.
- Server-side context: The Recorder class processes incoming action reports, applies signal processing, and emits finalized action records.
This dual-context architecture is necessary because DOM events are observable only from within the page, whereas the recording logic needs the full browser context (multiple pages, frames, popups), which is available only server-side.
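The split between the two contexts can be simulated without a browser. In the sketch below a plain callback stands in for the page binding; `ActionReport`, `InPageObserver`, and `ServerRecorder` are hypothetical names for this example, not Playwright classes. In Playwright's public API the analogous building blocks are `browserContext.addInitScript()` for injecting the page-side script and `browserContext.exposeBinding()` for the callback channel; how the recorder wires them internally is not shown here.

```typescript
// Message sent from a page to the recorder. Hypothetical shape.
interface ActionReport {
  frameUrl: string;
  actionName: string;
  selector: string;
}

// Stand-in for a page binding: the only channel from page code to the server.
type ReportBinding = (report: ActionReport) => void;

// In-page half: sees DOM events of a single frame, knows nothing about
// other frames or pages, and forwards what it observes through the binding.
class InPageObserver {
  private frameUrl: string;
  private report: ReportBinding;

  constructor(frameUrl: string, report: ReportBinding) {
    this.frameUrl = frameUrl;
    this.report = report;
  }

  onDomClick(selector: string): void {
    this.report({ frameUrl: this.frameUrl, actionName: 'click', selector: selector });
  }
}

// Server-side half: receives reports from every frame of every page and
// keeps the single ordered log that later stages work on.
class ServerRecorder {
  private log: ActionReport[] = [];

  // The callback that would be exposed to each page.
  binding(): ReportBinding {
    return (report) => {
      this.log.push(report);
    };
  }

  actions(): ActionReport[] {
    return this.log.slice(); // defensive copy
  }
}

const recorder = new ServerRecorder();
const topFrame = new InPageObserver('https://example.test/', recorder.binding());
const widgetFrame = new InPageObserver('https://example.test/widget', recorder.binding());
topFrame.onDomClick('#open');
widgetFrame.onDomClick('#confirm');
```

Neither observer can see the other's events, yet the recorder ends up with both reports in arrival order, which is the property the server-side context provides.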
Usage
Apply this principle when:
- Building test recorders: Any tool that generates tests from user behavior must implement interaction recording.
- Creating analytics capture systems: Session replay tools use similar techniques to record user journeys for analysis.
- Implementing macro recorders: Browser automation macros rely on the same event capture and replay pipeline.
- Debugging user-reported issues: Recording the exact sequence of actions that led to a bug enables reliable reproduction.
Key considerations:
- Selector stability: Prefer selectors that survive DOM refactoring (test IDs, ARIA roles) over those tied to structure (nth-child, deep CSS paths).
- Action granularity: Record at the semantic level (fill, click, select) rather than the event level (mousedown, mouseup, keypress) to produce readable tests.
- Frame awareness: Recording must work across iframes and shadow DOM trees, not just the top-level document.
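The selector-stability consideration amounts to ranking candidates by how likely they are to survive refactoring. The sketch below assumes a hypothetical `ElementFacts` input and made-up cost values; only the locator method names in the output strings (`getByTestId`, `getByRole`, `getByText`, `locator`) correspond to Playwright's public API.

```typescript
// Facts the page-side script might collect about a target element.
// Hypothetical shape for this example.
interface ElementFacts {
  testId?: string;
  role?: string;
  accessibleName?: string;
  text?: string;
  cssPath: string; // structural fallback, always available
}

interface SelectorCandidate {
  expression: string;
  cost: number; // lower means more stable and readable (illustrative values)
}

function selectorCandidates(el: ElementFacts): SelectorCandidate[] {
  const out: SelectorCandidate[] = [];
  if (el.testId) {
    out.push({ expression: 'getByTestId(' + JSON.stringify(el.testId) + ')', cost: 1 });
  }
  if (el.role && el.accessibleName) {
    out.push({
      expression: 'getByRole(' + JSON.stringify(el.role) + ', { name: ' + JSON.stringify(el.accessibleName) + ' })',
      cost: 10,
    });
  }
  if (el.text) {
    out.push({ expression: 'getByText(' + JSON.stringify(el.text) + ')', cost: 50 });
  }
  // Structural CSS is the last resort; deeper paths are penalized further.
  const depth = el.cssPath.split('>').length;
  out.push({ expression: 'locator(' + JSON.stringify(el.cssPath) + ')', cost: 100 + 10 * depth });
  return out;
}

function bestSelector(el: ElementFacts): string {
  // The CSS fallback guarantees at least one candidate, so reduce is safe.
  return selectorCandidates(el).reduce((best, c) => (c.cost < best.cost ? c : best)).expression;
}
```

An element that has both a test ID and a role resolves to the test ID; removing the test ID falls back to the role-based locator, and the CSS path is used only when nothing else is known.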
Theoretical Basis
The interaction recording pipeline can be modeled as a stream processing system:
DOM_EVENT_STREAM
  -> FILTER(relevant events only: click, input, change, keydown, navigation)
  -> CLASSIFY(event -> ActionType)
  -> ENRICH(action + selector + value + frame_info)
  -> COLLAPSE(merge consecutive same-type actions on same target)
  -> SIGNAL_PROCESS(attach navigation, popup, dialog, download signals)
  -> EMIT(ActionInContext record)
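Each stage can be treated as a transformation over the stream, which keeps the pipeline testable and extensible. FILTER, CLASSIFY, and ENRICH look at one event at a time, while COLLAPSE and SIGNAL_PROCESS must also consult the preceding action. New action types can be added by extending the CLASSIFY stage, and new selector strategies by extending the ENRICH stage.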
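As a rough illustration of the extensibility claim, the sketch below composes stages as plain functions and adds a new action type by appending one classification rule. The `Stage`, `ClassifyRule`, and record types are assumptions for this example; filtering is folded into classification here (events that no rule recognizes are dropped), and the `'#' + targetId` selector is a placeholder for real selector generation.

```typescript
// A stage turns one batch of items into another. Hypothetical helper types.
type Stage<A, B> = (input: A[]) => B[];

interface DomEventRecord {
  type: string;
  targetId: string;
  frameUrl: string;
  value?: string;
  checked?: boolean;
}
interface ClassifiedEvent { actionName: string; source: DomEventRecord; }
interface EnrichedAction { actionName: string; selector: string; value: string | null; frameUrl: string; }

// A rule returns an action name for events it recognizes, otherwise null.
type ClassifyRule = (e: DomEventRecord) => string | null;

const baseRules: ClassifyRule[] = [
  (e) => (e.type === 'click' ? 'click' : null),
  (e) => (e.type === 'input' ? 'fill' : null),
];

// FILTER + CLASSIFY: the first matching rule wins; unmatched events are dropped.
function makeClassify(rules: ClassifyRule[]): Stage<DomEventRecord, ClassifiedEvent> {
  return (events) => {
    const out: ClassifiedEvent[] = [];
    for (const e of events) {
      for (const rule of rules) {
        const actionName = rule(e);
        if (actionName !== null) {
          out.push({ actionName: actionName, source: e });
          break;
        }
      }
    }
    return out;
  };
}

// ENRICH: attach a selector, the value, and frame information.
const enrich: Stage<ClassifiedEvent, EnrichedAction> = (items) =>
  items.map((c): EnrichedAction => ({
    actionName: c.actionName,
    selector: '#' + c.source.targetId, // placeholder selector strategy
    value: c.source.value !== undefined ? c.source.value : null,
    frameUrl: c.source.frameUrl,
  }));

function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return (input) => second(first(input));
}

const basePipeline = pipe(makeClassify(baseRules), enrich);

// Extending CLASSIFY: one extra rule adds check/uncheck; ENRICH is untouched.
const checkboxRule: ClassifyRule = (e) => {
  if (e.type !== 'change' || e.checked === undefined) return null;
  return e.checked ? 'check' : 'uncheck';
};
const extendedPipeline = pipe(makeClassify(baseRules.concat([checkboxRule])), enrich);
```

Because each stage only depends on the output type of the one before it, the same `enrich` function serves both pipelines, and each stage can be unit-tested with hand-written input batches.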
The ActionInContext record is the universal intermediate representation:
ActionInContext {
  frame: FrameDescription   // which frame the action occurred in
  action: Action            // type, selector, value, modifiers, position
  timestamp: number         // when the action occurred
  startTime: number         // for duration tracking
  signals: Signal[]         // navigation, popup, dialog, download
}
This representation is intentionally language-agnostic, serving as the contract between the recorder and the code generators.
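To make the "contract" idea concrete, the sketch below renders one record into two target languages. The record follows the fields listed above, but the contents of `FrameDescription` and `ActionDetails` are simplified assumptions (a single `pageAlias`, only click and fill), and only the popup signal is handled. The emitted method names (`locator`, `click`, `fill`, `waitForEvent('popup')`, `expect_popup`) are from Playwright's public JavaScript and Python APIs; the generator functions themselves are hypothetical.

```typescript
type SignalKind = 'navigation' | 'popup' | 'dialog' | 'download';
interface RecordedSignal { kind: SignalKind; }
interface FrameDescription { pageAlias: string; }  // simplified assumption
interface ActionDetails { name: 'click' | 'fill'; selector: string; value?: string; }

interface ActionInContext {
  frame: FrameDescription;
  action: ActionDetails;
  timestamp: number;
  startTime: number;
  signals: RecordedSignal[];
}

function hasSignal(rec: ActionInContext, kind: SignalKind): boolean {
  return rec.signals.some((s) => s.kind === kind);
}

// The method call is spelled the same way in both target languages.
// JSON.stringify is used as a simple quoting strategy for string literals.
function callSuffix(a: ActionDetails): string {
  return a.name === 'fill' ? '.fill(' + JSON.stringify(a.value || '') + ')' : '.click()';
}

function toJavaScript(rec: ActionInContext): string {
  const page = rec.frame.pageAlias;
  const stmt = 'await ' + page + '.locator(' + JSON.stringify(rec.action.selector) + ')' + callSuffix(rec.action) + ';';
  if (!hasSignal(rec, 'popup')) return stmt;
  // A popup signal turns into a waiter wrapped around the triggering action.
  return [
    'const popupPromise = ' + page + ".waitForEvent('popup');",
    stmt,
    'const popup = await popupPromise;',
  ].join('\n');
}

function toPython(rec: ActionInContext): string {
  const page = rec.frame.pageAlias;
  const stmt = page + '.locator(' + JSON.stringify(rec.action.selector) + ')' + callSuffix(rec.action);
  if (!hasSignal(rec, 'popup')) return stmt;
  return [
    'with ' + page + '.expect_popup() as popup_info:',
    '    ' + stmt,
    'popup = popup_info.value',
  ].join('\n');
}
```

Both generators read the same record and never see DOM events, so adding another target language means writing one more function of this shape without changing the recorder.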