Implementation:Microsoft Playwright Agent Extract
| Knowledge Sources | |
|---|---|
| Domains | AI_Testing, Browser_Automation, Data_Extraction |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Concrete API for extracting structured, schema-validated data from web pages using natural language queries powered by an AI agent, provided by the Playwright library.
Description
The agent.extract() method accepts a natural language query describing the data to extract and a Zod schema defining the expected output structure. The LLM reads the current page content and produces structured output that conforms to the provided schema.
Critically, extract() uses no browser tools. At the server side (pageAgent.ts:L86), an empty tool array ([]) is passed to the agentic loop. The LLM is explicitly instructed "Do not perform any actions, just extract." The only mechanism for returning data is a built-in report_result tool whose schema is derived from the user-provided Zod schema. This ensures that:
- Extraction is completely side-effect-free (no clicks, no navigation)
- The output is guaranteed to match the Zod schema (validated at runtime)
- The result is fully typed via TypeScript's
InferZodSchema<Schema>
The method returns an object containing both the typed result and usage statistics (turns, input tokens, output tokens).
Usage
Use agent.extract() when:
- You need to pull structured data from a page for validation or comparison
- You want type-safe extraction results guaranteed by a Zod schema
- You need to extract tabular data, lists, or complex nested data structures
- You want to avoid writing CSS selectors or XPath queries for data retrieval
Code Reference
Source Location
- Repository: playwright
- File:
packages/playwright-core/src/client/pageAgent.ts:L50-54(client-side proxy) - File:
packages/playwright-core/src/server/agent/pageAgent.ts:L77-88(server-side implementation)
Signature
agent.extract<Schema extends ZodSchema>(
query: string,
schema: Schema
): Promise<{
result: InferZodSchema<Schema>;
usage: {
turns: number;
inputTokens: number;
outputTokens: number;
};
}>
Import
import { test } from '@playwright/test';
import { z } from 'zod';
test('extract data', async ({ agent }) => {
const { result } = await agent.extract(
'Get the page heading',
z.object({ heading: z.string() })
);
console.log(result.heading);
});
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | string |
Yes | Natural language description of the data to extract from the page |
| schema | ZodSchema |
Yes | Zod schema defining the expected shape of the extracted data |
Outputs
| Name | Type | Description |
|---|---|---|
| result | InferZodSchema<Schema> |
The extracted data, typed according to the provided Zod schema |
| usage.turns | number |
Number of LLM round-trips executed during extraction |
| usage.inputTokens | number |
Total input tokens consumed |
| usage.outputTokens | number |
Total output tokens generated |
Usage Examples
Basic Example
import { test, expect } from '@playwright/test';
import { z } from 'zod';
test('extract product details', async ({ agent }) => {
await agent.perform('Navigate to https://shop.example.com/products/laptop-stand');
const { result } = await agent.extract(
'Extract the product name, price, and availability status',
z.object({
name: z.string(),
price: z.number(),
inStock: z.boolean(),
})
);
expect(result.name).toBe('Laptop Stand');
expect(result.price).toBeGreaterThan(0);
expect(result.inStock).toBe(true);
});
Extract a List of Items
import { test, expect } from '@playwright/test';
import { z } from 'zod';
test('extract product catalog', async ({ agent }) => {
await agent.perform('Navigate to the product catalog page');
const { result } = await agent.extract(
'Extract all products with their names, prices, and ratings',
z.object({
products: z.array(z.object({
name: z.string(),
price: z.number(),
rating: z.number().min(0).max(5),
})),
})
);
expect(result.products.length).toBeGreaterThan(0);
for (const product of result.products) {
expect(product.price).toBeGreaterThan(0);
expect(product.rating).toBeGreaterThanOrEqual(0);
}
});
Extract with Nested Schema
import { test } from '@playwright/test';
import { z } from 'zod';
test('extract order summary', async ({ agent }) => {
await agent.perform('Navigate to the order confirmation page');
const { result, usage } = await agent.extract(
'Extract the complete order summary including items, shipping address, and total',
z.object({
orderNumber: z.string(),
items: z.array(z.object({
name: z.string(),
quantity: z.number(),
unitPrice: z.number(),
})),
shippingAddress: z.object({
street: z.string(),
city: z.string(),
state: z.string(),
zip: z.string(),
}),
total: z.number(),
})
);
console.log(`Order ${result.orderNumber}: $${result.total}`);
console.log(`Extraction used ${usage.inputTokens + usage.outputTokens} tokens`);
});
Extract with Enum Values
import { test, expect } from '@playwright/test';
import { z } from 'zod';
test('extract user profile status', async ({ agent }) => {
await agent.perform('Navigate to the user profile page');
const { result } = await agent.extract(
'Extract the user profile information including their subscription tier',
z.object({
username: z.string(),
email: z.string().email(),
subscriptionTier: z.enum(['free', 'basic', 'premium', 'enterprise']),
isVerified: z.boolean(),
})
);
expect(result.subscriptionTier).toBe('premium');
expect(result.isVerified).toBe(true);
});