Principle:Puppeteer Puppeteer Data Extraction

From Leeroopedia
Knowledge Sources
Domains Browser_Automation, Web_Scraping, Data_Processing
Last Updated 2026-02-11 23:00 GMT

Overview

A technique that executes JavaScript functions within the browser context to extract structured data from the DOM and transfer it to the Node.js environment.

Description

Data Extraction bridges the gap between the browser's JavaScript runtime (where DOM elements live) and the Node.js process (where Puppeteer runs). Since DOM objects cannot be directly serialized across this boundary, Puppeteer provides methods to execute JavaScript functions inside the browser and return their serializable results to Node.js.

Key methods:

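  • evaluate(): Execute a function in the browser and return the result. The function is sent as source text, so it cannot see variables from the Node.js scope; values it needs must be passed as arguments. Arguments and return values cross the boundary by value, so they must be serializable (in practice, JSON-compatible data).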
  • $$eval(): Query all elements matching a selector, pass them to a function, and return the result. Useful for extracting data from multiple elements in a single call.
  • $eval(): Query a single element and pass it to a function.
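
The three methods above can be compared side by side. The following is a minimal sketch, not a definitive implementation: it assumes `page` is a Puppeteer Page that has already navigated, and the names `readTitle`, `linkToRecord`, `linksToRecords`, and `extractSummary`, as well as the `"a"` selector, are hypothetical and chosen only for illustration.

```javascript
// Page functions: Puppeteer sends these to the browser as source text,
// so they must not reference variables from the surrounding Node.js scope.
const readTitle = () => document.title;

const linkToRecord = (a) => ({ text: a.textContent.trim(), href: a.href });

// Deliberately repeats the mapping instead of calling linkToRecord,
// which would not exist inside the browser.
const linksToRecords = (anchors) =>
  anchors.map((a) => ({ text: a.textContent.trim(), href: a.href }));

// `page` is assumed to be a Puppeteer Page that has already navigated.
async function extractSummary(page) {
  const title = await page.evaluate(readTitle); // no selector, arbitrary in-page code
  const firstLink = await page.$eval("a", linkToRecord); // first match; rejects if none
  const allLinks = await page.$$eval("a", linksToRecords); // all matches as an array
  return { title, firstLink, allLinks };
}
```

Each page function returns plain objects of strings, so the results can cross back to Node.js without any DOM reference leaking through.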

The serialization boundary means:

  • DOM elements, functions, and Symbols cannot be returned directly
  • Return values must be JSON-serializable (strings, numbers, booleans, null, and plain objects or arrays built from them)
  • For DOM references, use evaluateHandle() to get a JSHandle or ElementHandle
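
The constraints above suggest two complementary patterns: copy the fields you need into a plain object inside the browser, and keep a handle when a live reference is required. The sketch below illustrates both under stated assumptions: `page` is assumed to be a Puppeteer Page, and `describeElement` and `describeBody` are hypothetical helper names, not part of Puppeteer.

```javascript
// Runs in the browser: copies the needed fields into a plain object
// instead of returning the element itself, which could not be serialized.
const describeElement = (el) => ({
  tag: el.tagName.toLowerCase(),
  id: el.id,
  text: el.textContent.trim(),
});

// `page` is assumed to be a Puppeteer Page.
async function describeBody(page) {
  // evaluateHandle() returns a handle to the in-page object rather than a copy.
  const bodyHandle = await page.evaluateHandle(() => document.body);
  try {
    // A handle can be passed back as an argument; inside the browser
    // the page function receives the real element.
    return await page.evaluate(describeElement, bodyHandle);
  } finally {
    await bodyHandle.dispose(); // release the in-page reference
  }
}
```

Only the small object built by `describeElement` is transferred to Node.js; the element stays in the browser and is reachable solely through the handle until it is disposed.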

Usage

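Run data extraction only after the page has loaded and dynamic content has settled, for example after waitForSelector() has resolved for the elements of interest. Prefer $$eval() when extracting data from multiple elements matching a pattern (e.g., all table rows, all search results), $eval() when exactly one element is expected, and evaluate() for general-purpose JavaScript execution.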
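
As an illustration of this guidance, the sketch below waits for table rows to appear and then extracts them in a single $$eval() call. It assumes `page` is a Puppeteer Page; the function names `rowsToCells` and `extractRows`, the default selector, and the timeout value are placeholders rather than recommended settings.

```javascript
// Runs in the browser: turns each matched <tr> into an array of trimmed cell texts.
const rowsToCells = (rows) =>
  rows.map((row) =>
    Array.from(row.querySelectorAll("td"), (cell) => cell.textContent.trim())
  );

// `page` is assumed to be a Puppeteer Page; selector and timeout are placeholders.
async function extractRows(page, rowSelector = "table tbody tr", timeoutMs = 10000) {
  // Wait until at least one row exists before extracting.
  await page.waitForSelector(rowSelector, { timeout: timeoutMs });
  // One round trip: all rows are mapped inside the browser and returned as plain arrays.
  return page.$$eval(rowSelector, rowsToCells);
}
```

Waiting for the first row does not guarantee that every row has rendered; pages that load rows incrementally may need a stricter readiness condition.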

Theoretical Basis

# Data extraction across the serialization boundary
Node.js Process                 Browser Process
─────────────                   ───────────────
1. Serialize function            →
2. Serialize arguments           →
                                3. Deserialize and execute function
                                4. Serialize return value
                                ←  5. Transfer result
6. Deserialize result
7. Return to caller
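
The seven steps can be imitated in plain Node.js without a browser. The sketch below is a simulation under loud assumptions: `simulateEvaluate` and `browserSide` are hypothetical names, and a JSON round trip stands in for the real transport, which it approximates but does not reproduce.

```javascript
// Simulated browser side: receives only strings, as a separate process would.
function browserSide(fnSource, argsJson) {
  // 3. Deserialize and execute the function. It is rebuilt from source text,
  //    so it has no access to the caller's local scope.
  const fn = new Function(`return (${fnSource});`)();
  const args = JSON.parse(argsJson);
  const result = fn(...args);
  // 4. Serialize the return value (JSON here is an approximation).
  return JSON.stringify(result);
}

// Simulated Node.js side.
function simulateEvaluate(fn, ...args) {
  const fnSource = fn.toString(); // 1. Serialize function
  const argsJson = JSON.stringify(args); // 2. Serialize arguments
  const resultJson = browserSide(fnSource, argsJson); // 3-5. Execute and transfer
  // 6. Deserialize result; a value with no JSON form arrives as undefined.
  return resultJson === undefined ? undefined : JSON.parse(resultJson);
} // 7. Return to caller

const sum = simulateEvaluate((a, b) => a + b, 2, 3); // 5
const stripped = simulateEvaluate(() => ({ n: 1, f: () => 1 })); // { n: 1 }
```

In this simulation the function-valued property `f` is silently dropped during step 4, which mirrors why functions and DOM references cannot be returned by value.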
