

Implementation:Microsoft Playwright Agent Extract

From Leeroopedia
Knowledge Sources
Domains AI_Testing, Browser_Automation, Data_Extraction
Last Updated 2026-02-11 00:00 GMT

Overview

Concrete Playwright API for extracting structured, schema-validated data from a web page using a natural language query that an AI agent answers.

Description

The agent.extract() method accepts a natural language query describing the data to extract and a Zod schema defining the expected output structure. The LLM reads the current page content and produces structured output that conforms to the provided schema.

Critically, extract() exposes no browser tools to the model. On the server side (pageAgent.ts:L86), an empty tool array ([]) is passed to the agentic loop, and the LLM is explicitly instructed "Do not perform any actions, just extract." The only way the model can return data is a built-in report_result tool whose parameter schema is derived from the user-provided Zod schema. This ensures that:

  • Extraction is completely side-effect-free (no clicks, no navigation)
  • The output is guaranteed to match the Zod schema (validated at runtime)
  • The result is statically typed as InferZodSchema<Schema>, so TypeScript derives its shape from the schema you pass

The method returns an object containing both the typed result and usage statistics (turns, input tokens, output tokens).
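The extraction mechanism described above can be illustrated with a small, dependency-free sketch. It is not Playwright's implementation: the Validator and ModelStep types, the runExtractLoop function, and the scripted fakeModel are hypothetical stand-ins for the real LLM client and Zod schema. It only shows the pattern: no browser tools are offered, the model's single exit is a report_result call, the payload is validated at runtime before it is returned, and the number of turns is counted along the way.

```typescript
// Hypothetical sketch of a side-effect-free extraction loop. None of these
// names are Playwright APIs, and a hand-written validator stands in for Zod.

// A validator returns the typed value or throws if the input does not match.
type Validator<T> = (input: unknown) => T;

// One model turn: either plain text or a call to the only tool, report_result.
type ModelStep =
  | { kind: 'text'; text: string }
  | { kind: 'tool_call'; tool: 'report_result'; args: unknown };

// The model sees the prompt and the names of the tools it may call.
type Model = (prompt: string, toolNames: string[]) => ModelStep;

function runExtractLoop<T>(
  model: Model,
  query: string,
  validate: Validator<T>,
  maxTurns = 5,
): { result: T; turns: number } {
  // Browser tools: none. The built-in report_result tool is the only exit.
  const browserTools: string[] = [];
  const tools = [...browserTools, 'report_result'];
  const prompt = `${query}\nDo not perform any actions, just extract.`;
  for (let turn = 1; turn <= maxTurns; turn++) {
    const step = model(prompt, tools);
    if (step.kind === 'tool_call') {
      // Runtime validation: a payload that does not match the schema throws.
      return { result: validate(step.args), turns: turn };
    }
  }
  throw new Error(`no report_result call within ${maxTurns} turns`);
}

// Stand-in for z.object({ heading: z.string() }).
const headingValidator: Validator<{ heading: string }> = (input) => {
  const heading = (input as { heading?: unknown } | null)?.heading;
  if (typeof heading !== 'string') {
    throw new Error('report_result payload does not match schema');
  }
  return { heading };
};

// Scripted fake model: replies with text on turn 1, reports a result on turn 2.
let modelCalls = 0;
const fakeModel: Model = (): ModelStep => {
  modelCalls += 1;
  return modelCalls === 1
    ? { kind: 'text', text: 'Reading the page content...' }
    : { kind: 'tool_call', tool: 'report_result', args: { heading: 'Welcome' } };
};

const outcome = runExtractLoop(fakeModel, 'Get the page heading', headingValidator);
console.log(outcome.result.heading, outcome.turns); // Welcome 2
```

Because the tool list never contains an action, the worst a misbehaving model can do in this pattern is fail validation or run out of turns; it cannot change the page.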

Usage

Use agent.extract() when:

  • You need to pull structured data from a page for validation or comparison
  • You want type-safe extraction results guaranteed by a Zod schema
  • You need to extract tabular data, lists, or complex nested data structures
  • You want to avoid writing CSS selectors or XPath queries for data retrieval

Code Reference

Source Location

  • Repository: playwright
  • File: packages/playwright-core/src/client/pageAgent.ts:L50-54 (client-side proxy)
  • File: packages/playwright-core/src/server/agent/pageAgent.ts:L77-88 (server-side implementation)

Signature

agent.extract<Schema extends ZodSchema>(
  query: string,
  schema: Schema
): Promise<{
  result: InferZodSchema<Schema>;
  usage: {
    turns: number;
    inputTokens: number;
    outputTokens: number;
  };
}>
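How a single generic parameter produces a typed result can be shown with a miniature schema library. This is a hypothetical stand-in, not Zod or Playwright: Schema, str, num, bool, obj, Infer, and extractWith are illustrative names, and Infer<S> plays the role that InferZodSchema<Schema> plays in the signature. The point is that the schema value carries its type, so the return type follows from the argument with no explicit annotation.

```typescript
// Hypothetical miniature of schema-driven typing; this is not the Zod API.
// A schema pairs a static type T with a runtime parse() that checks it.
interface Schema<T> {
  parse(input: unknown): T;
}

const str = (): Schema<string> => ({
  parse(input) {
    if (typeof input !== 'string') throw new Error('expected string');
    return input;
  },
});

const num = (): Schema<number> => ({
  parse(input) {
    if (typeof input !== 'number') throw new Error('expected number');
    return input;
  },
});

const bool = (): Schema<boolean> => ({
  parse(input) {
    if (typeof input !== 'boolean') throw new Error('expected boolean');
    return input;
  },
});

// Infer<S> recovers T from Schema<T>, playing the role of InferZodSchema<Schema>.
type Infer<S> = S extends Schema<infer T> ? T : never;

type ObjectOf<Shape> = { [K in keyof Shape]: Infer<Shape[K]> };

// obj() derives the object's static type from the types of its field schemas.
function obj<Shape extends Record<string, Schema<unknown>>>(shape: Shape): Schema<ObjectOf<Shape>> {
  const fields: Record<string, Schema<unknown>> = shape;
  return {
    parse(input) {
      if (typeof input !== 'object' || input === null) throw new Error('expected object');
      const record = input as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const key of Object.keys(fields)) {
        out[key] = fields[key].parse(record[key]);
      }
      // Every field was checked above, so the cast matches the runtime shape.
      return out as unknown as ObjectOf<Shape>;
    },
  };
}

// Shaped like agent.extract(): the result type is computed from the schema type.
function extractWith<S extends Schema<unknown>>(modelOutput: unknown, schema: S): Infer<S> {
  return schema.parse(modelOutput) as Infer<S>;
}

const product = extractWith(
  { name: 'Laptop Stand', price: 49.99, inStock: true },
  obj({ name: str(), price: num(), inStock: bool() }),
);
// Inferred type: { name: string; price: number; inStock: boolean }
console.log(product.name.toUpperCase(), product.price.toFixed(2));
```

In the real API, the modelOutput argument corresponds to the arguments the LLM passes to report_result; here it is supplied directly so the typing can be seen in isolation.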

Import

import { test } from '@playwright/test';
import { z } from 'zod';

test('extract data', async ({ agent }) => {
  const { result } = await agent.extract(
    'Get the page heading',
    z.object({ heading: z.string() })
  );
  console.log(result.heading);
});

I/O Contract

Inputs

Name    Type       Required  Description
query   string     Yes       Natural language description of the data to extract from the page
schema  ZodSchema  Yes       Zod schema defining the expected shape of the extracted data

Outputs

Name                Type                    Description
result              InferZodSchema<Schema>  The extracted data, typed according to the provided Zod schema
usage.turns         number                  Number of LLM round-trips executed during extraction
usage.inputTokens   number                  Total input tokens consumed
usage.outputTokens  number                  Total output tokens generated
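Because every call reports its own usage, a test that calls agent.extract() several times can sum the figures to track its total LLM cost. The sketch below assumes only the usage shape listed above; ExtractUsage, addUsage, and the sample numbers are hypothetical and stand in for values collected from real extract() calls.

```typescript
// Shape of the usage object returned alongside each extraction result.
interface ExtractUsage {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

const emptyUsage: ExtractUsage = { turns: 0, inputTokens: 0, outputTokens: 0 };

// Hypothetical helper: combine the usage of two calls field by field.
function addUsage(a: ExtractUsage, b: ExtractUsage): ExtractUsage {
  return {
    turns: a.turns + b.turns,
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
  };
}

// Illustrative numbers only; real values come from each extract() call.
const perCall: ExtractUsage[] = [
  { turns: 1, inputTokens: 1200, outputTokens: 80 },
  { turns: 2, inputTokens: 2600, outputTokens: 150 },
];

const total = perCall.reduce(addUsage, emptyUsage);
const totalTokens = total.inputTokens + total.outputTokens;
console.log(`turns=${total.turns} tokens=${totalTokens}`); // turns=3 tokens=4030
```

Summing field by field keeps input and output tokens separate, which matters when a provider prices them differently.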

Usage Examples

Basic Example

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract product details', async ({ agent }) => {
  await agent.perform('Navigate to https://shop.example.com/products/laptop-stand');

  const { result } = await agent.extract(
    'Extract the product name, price, and availability status',
    z.object({
      name: z.string(),
      price: z.number(),
      inStock: z.boolean(),
    })
  );

  expect(result.name).toBe('Laptop Stand');
  expect(result.price).toBeGreaterThan(0);
  expect(result.inStock).toBe(true);
});

Extract a List of Items

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract product catalog', async ({ agent }) => {
  await agent.perform('Navigate to the product catalog page');

  const { result } = await agent.extract(
    'Extract all products with their names, prices, and ratings',
    z.object({
      products: z.array(z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number().min(0).max(5),
      })),
    })
  );

  expect(result.products.length).toBeGreaterThan(0);
  for (const product of result.products) {
    expect(product.price).toBeGreaterThan(0);
    expect(product.rating).toBeGreaterThanOrEqual(0);
  }
});

Extract with Nested Schema

import { test } from '@playwright/test';
import { z } from 'zod';

test('extract order summary', async ({ agent }) => {
  await agent.perform('Navigate to the order confirmation page');

  const { result, usage } = await agent.extract(
    'Extract the complete order summary including items, shipping address, and total',
    z.object({
      orderNumber: z.string(),
      items: z.array(z.object({
        name: z.string(),
        quantity: z.number(),
        unitPrice: z.number(),
      })),
      shippingAddress: z.object({
        street: z.string(),
        city: z.string(),
        state: z.string(),
        zip: z.string(),
      }),
      total: z.number(),
    })
  );

  console.log(`Order ${result.orderNumber}: $${result.total}`);
  console.log(`Extraction used ${usage.inputTokens + usage.outputTokens} tokens`);
});

Extract with Enum Values

import { test, expect } from '@playwright/test';
import { z } from 'zod';

test('extract user profile status', async ({ agent }) => {
  await agent.perform('Navigate to the user profile page');

  const { result } = await agent.extract(
    'Extract the user profile information including their subscription tier',
    z.object({
      username: z.string(),
      email: z.string().email(),
      subscriptionTier: z.enum(['free', 'basic', 'premium', 'enterprise']),
      isVerified: z.boolean(),
    })
  );

  expect(result.subscriptionTier).toBe('premium');
  expect(result.isVerified).toBe(true);
});

Related Pages

Implements Principle
