Implementation:Microsoft Playwright Agent Extract

Knowledge Sources	Playwright Playwright AI Testing Zod Schema Library
Domains	AI_Testing, Browser_Automation, Data_Extraction
Last Updated	2026-02-11 00:00 GMT

Overview

Concrete API for extracting structured, schema-validated data from web pages using natural language queries powered by an AI agent, provided by the Playwright library.

Description

The agent.extract() method accepts a natural language query describing the data to extract and a Zod schema defining the expected output structure. The LLM reads the current page content and produces structured output that conforms to the provided schema.

Critically, extract() uses no browser tools. At the server side (pageAgent.ts:L86), an empty tool array ([]) is passed to the agentic loop. The LLM is explicitly instructed "Do not perform any actions, just extract." The only mechanism for returning data is a built-in report_result tool whose schema is derived from the user-provided Zod schema. This ensures that:

Extraction is completely side-effect-free (no clicks, no navigation)
The output is guaranteed to match the Zod schema (validated at runtime)
The result is fully typed via TypeScript's InferZodSchema<Schema>

The method returns an object containing both the typed result and usage statistics (turns, input tokens, output tokens).

Usage

Use agent.extract() when:

You need to pull structured data from a page for validation or comparison
You want type-safe extraction results guaranteed by a Zod schema
You need to extract tabular data, lists, or complex nested data structures
You want to avoid writing CSS selectors or XPath queries for data retrieval

Code Reference

Source Location

Repository: playwright
File: packages/playwright-core/src/client/pageAgent.ts:L50-54 (client-side proxy)
File: packages/playwright-core/src/server/agent/pageAgent.ts:L77-88 (server-side implementation)

Signature

agent.extract<Schema extends ZodSchema>(
  query: string,
  schema: Schema
): Promise<{
  result: InferZodSchema<Schema>;
  usage: {
    turns: number;
    inputTokens: number;
    outputTokens: number;
  };
}>

Import

import { test } from '@playwright/test';
import { z } from 'zod';

test('extract data', async ({ agent }) => {
  const { result } = await agent.extract(
    'Get the page heading',
    z.object({ heading: z.string() })
  );
  console.log(result.heading);
});

I/O Contract

Inputs

Name	Type	Required	Description
query	`string`	Yes	Natural language description of the data to extract from the page
schema	`ZodSchema`	Yes	Zod schema defining the expected shape of the extracted data

Outputs

Name	Type	Description
result	`InferZodSchema<Schema>`	The extracted data, typed according to the provided Zod schema
usage.turns	`number`	Number of LLM round-trips executed during extraction
usage.inputTokens	`number`	Total input tokens consumed
usage.outputTokens	`number`	Total output tokens generated

Usage Examples

Basic Example

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract product details', async ({ agent }) => {
  await agent.perform('Navigate to https://shop.example.com/products/laptop-stand');

  const { result } = await agent.extract(
    'Extract the product name, price, and availability status',
    z.object({
      name: z.string(),
      price: z.number(),
      inStock: z.boolean(),
    })
  );

  expect(result.name).toBe('Laptop Stand');
  expect(result.price).toBeGreaterThan(0);
  expect(result.inStock).toBe(true);
});

Extract a List of Items

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract product catalog', async ({ agent }) => {
  await agent.perform('Navigate to the product catalog page');

  const { result } = await agent.extract(
    'Extract all products with their names, prices, and ratings',
    z.object({
      products: z.array(z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number().min(0).max(5),
      })),
    })
  );

  expect(result.products.length).toBeGreaterThan(0);
  for (const product of result.products) {
    expect(product.price).toBeGreaterThan(0);
    expect(product.rating).toBeGreaterThanOrEqual(0);
  }
});

Extract with Nested Schema

import { test } from '@playwright/test';
import { z } from 'zod';

test('extract order summary', async ({ agent }) => {
  await agent.perform('Navigate to the order confirmation page');

  const { result, usage } = await agent.extract(
    'Extract the complete order summary including items, shipping address, and total',
    z.object({
      orderNumber: z.string(),
      items: z.array(z.object({
        name: z.string(),
        quantity: z.number(),
        unitPrice: z.number(),
      })),
      shippingAddress: z.object({
        street: z.string(),
        city: z.string(),
        state: z.string(),
        zip: z.string(),
      }),
      total: z.number(),
    })
  );

  console.log(`Order ${result.orderNumber}: $${result.total}`);
  console.log(`Extraction used ${usage.inputTokens + usage.outputTokens} tokens`);
});

Extract with Enum Values

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract user profile status', async ({ agent }) => {
  await agent.perform('Navigate to the user profile page');

  const { result } = await agent.extract(
    'Extract the user profile information including their subscription tier',
    z.object({
      username: z.string(),
      email: z.string().email(),
      subscriptionTier: z.enum(['free', 'basic', 'premium', 'enterprise']),
      isVerified: z.boolean(),
    })
  );

  expect(result.subscriptionTier).toBe('premium');
  expect(result.isVerified).toBe(true);
});

Related Pages

Implements Principle

Principle:Microsoft_Playwright_Extract_Data_with_Agent

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment