Implementation:Ucbepic Docetl ScraperRoute

From Leeroopedia


Knowledge Sources
Domains Frontend, React_UI
Last Updated 2026-02-08 00:00 GMT

Overview

The Next.js API route that powers the AI-driven web scraper agent in the DocETL frontend.

Description

This file implements a Next.js API route at /api/scraper that provides an AI agent capable of writing and executing Python scraping code in a sandboxed Modal environment. It uses Azure OpenAI with tool-calling to iteratively search the web, write scraping scripts, execute them in cloud sandboxes, and collect structured datasets. The route also integrates with Supabase for usage tracking and session management.
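The iterative tool-calling pattern described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the type names, the `runAgentLoop` helper, the `search` and `runCode` tools, and the scripted stand-in for the model are illustrative assumptions, not code from the route. To keep it short and testable, the model and tools are plain synchronous functions, whereas the real route is asynchronous, calls Azure OpenAI, and runs code in Modal.

```typescript
// Hypothetical sketch of a tool-calling loop; no names here come from the DocETL source.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelTurn =
  | { kind: "tool"; call: ToolCall }
  | { kind: "final"; text: string };

type Tool = (args: Record<string, string>) => string;
type Model = (transcript: string[]) => ModelTurn;

function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  userQuery: string,
  maxSteps = 5,
): { answer: string | null; transcript: string[] } {
  const transcript: string[] = [`user: ${userQuery}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);
    if (turn.kind === "final") {
      // The model is done: record the answer and stop.
      transcript.push(`assistant: ${turn.text}`);
      return { answer: turn.text, transcript };
    }
    // The model asked for a tool: run it and feed the result back.
    const tool = tools[turn.call.tool];
    const result = tool
      ? tool(turn.call.args)
      : `error: unknown tool "${turn.call.tool}"`;
    transcript.push(`tool(${turn.call.tool}): ${result}`);
  }
  // Step budget exhausted without a final answer.
  return { answer: null, transcript };
}

// Scripted stand-in for the LLM: search first, then run code, then answer.
const scriptedModel: Model = (transcript) => {
  const toolTurns = transcript.filter((line) => line.startsWith("tool(")).length;
  if (toolTurns === 0) {
    return { kind: "tool", call: { tool: "search", args: { q: "product prices" } } };
  }
  if (toolTurns === 1) {
    return { kind: "tool", call: { tool: "runCode", args: { code: "print(1)" } } };
  }
  return { kind: "final", text: "done" };
};

const demoTools: Record<string, Tool> = {
  search: (args) => `results for ${args.q}`,
  runCode: (args) => `executed ${args.code}`,
};

const demo = runAgentLoop(scriptedModel, demoTools, "scrape prices");
console.log(demo.transcript.join("\n"));
```

The `maxSteps` cap is a design choice worth keeping in any such loop: without it, a model that never emits a final answer would keep invoking tools indefinitely.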

Usage

This route is called by the ScraperPage component when the user interacts with the scraper chat interface. It streams AI responses and tool invocations back to the client using the Vercel AI SDK.

Code Reference

Source Location

Signature

export async function POST(req: Request): Promise<Response>

// Key internal functions:
async function initializeModal(): Promise<void>
async function executeModalSandbox(code: string, sessionId: string): Promise<{
  success: boolean;
  output: string;
  error: string | null;
  stdout?: string;
  stderr?: string;
}>
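As an illustration of how a caller might consume the result shape shown in the signature, here is a small sketch. The `SandboxResult` interface mirrors the return type above; the `summarizeSandboxResult` helper, its `maxChars` parameter, and the truncation marker are hypothetical and are not taken from the route's source.

```typescript
// Mirrors the return type of executeModalSandbox from the signature above.
interface SandboxResult {
  success: boolean;
  output: string;
  error: string | null;
  stdout?: string;
  stderr?: string;
}

// Hypothetical helper: condense a sandbox result into a short string,
// e.g. to pass back to the model as a tool result.
function summarizeSandboxResult(result: SandboxResult, maxChars = 2000): string {
  const clip = (s: string): string =>
    s.length > maxChars ? s.slice(0, maxChars) + "\n[truncated]" : s;
  if (result.success) {
    return `OK\n${clip(result.output)}`;
  }
  // Prefer stderr when present, then the error field, then a generic fallback.
  const detail = result.stderr || result.error || "unknown error";
  return `FAILED\n${clip(detail)}`;
}

console.log(summarizeSandboxResult({ success: true, output: "42 rows scraped", error: null }));
```

Clipping long output is one way to keep a verbose script from consuming the model's context window; the limit used here is arbitrary.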

Import

// This is an API route, not directly imported. Called via:
fetch("/api/scraper", { method: "POST", body: JSON.stringify({ messages, userQuery, schema, sessionId }) })

I/O Contract

Inputs (Props)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| messages | CoreMessage[] | Yes | Chat message history |
| userQuery | string | Yes | The user's scraping task description |
| schema | string | No | Optional schema for the scraped dataset |
| sessionId | string | No | Session identifier for persistent sandbox storage |
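The input contract above could be checked at the start of a handler along the following lines. This is a sketch under assumptions: the `ScraperRequestBody` interface and `parseScraperBody` function are hypothetical names, `messages` is typed loosely as `unknown[]` rather than `CoreMessage[]` to keep the example self-contained, and the source does not say whether the route performs any such validation.

```typescript
// Hypothetical request-body shape, inferred from the Inputs table.
interface ScraperRequestBody {
  messages: unknown[];
  userQuery: string;
  schema?: string;
  sessionId?: string;
}

type ParseResult =
  | { ok: true; body: ScraperRequestBody }
  | { ok: false; error: string };

// Type guard for the optional string fields.
function isOptionalString(v: unknown): v is string | undefined {
  return v === undefined || typeof v === "string";
}

function parseScraperBody(raw: unknown): ParseResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { messages, userQuery, schema, sessionId } = raw as Record<string, unknown>;
  if (!Array.isArray(messages)) {
    return { ok: false, error: "messages must be an array" };
  }
  if (typeof userQuery !== "string" || userQuery.trim() === "") {
    return { ok: false, error: "userQuery must be a non-empty string" };
  }
  if (!isOptionalString(schema)) {
    return { ok: false, error: "schema must be a string when provided" };
  }
  if (!isOptionalString(sessionId)) {
    return { ok: false, error: "sessionId must be a string when provided" };
  }
  return { ok: true, body: { messages, userQuery, schema, sessionId } };
}

const parsed = parseScraperBody({
  messages: [],
  userQuery: "Scrape all product prices from example.com",
  sessionId: "abc123",
});
console.log(parsed.ok);
```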

Outputs

| Name | Type | Description |
| --- | --- | --- |
| stream | ReadableStream | Streamed AI text response with tool call results |
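Because the output is a byte stream, a client that reads it manually has to decode chunks incrementally, since a multi-byte UTF-8 character can be split across two chunks. The sketch below shows only that decoding step, using the standard `TextDecoder` in streaming mode. The `decodeChunks` helper is hypothetical, and the exact wire format of the stream depends on the Vercel AI SDK version, so the decoded text may contain protocol framing rather than plain prose. The `useChat` hook normally handles all of this.

```typescript
// Hypothetical helper: decode a sequence of byte chunks into text,
// tolerating multi-byte characters that straddle chunk boundaries.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    // stream: true buffers an incomplete trailing character until the next chunk.
    text += decoder.decode(chunk, { stream: true });
  }
  // Final call with no input flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// The euro sign is three bytes in UTF-8; split the payload in the middle of it.
const payload = new TextEncoder().encode("price: 5 €");
const splitChunks = [
  payload.slice(0, payload.length - 2),
  payload.slice(payload.length - 2),
];
console.log(decodeChunks(splitChunks));
```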

Usage Examples

const { messages, handleSubmit } = useChat({
  api: "/api/scraper",
  body: {
    userQuery: "Scrape all product prices from example.com",
    schema: '{"name": "string", "price": "number"}',
    sessionId: "abc123",
  },
});
