Implementation:Ucbepic Docetl ScraperRoute
| Knowledge Sources | |
|---|---|
| Domains | Frontend, React_UI |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A Next.js API route that powers the AI-driven web scraper agent in the DocETL frontend.
Description
This file implements the Next.js API route /api/scraper. The route exposes an AI agent that writes Python scraping code and executes it in a sandboxed Modal environment. Using Azure OpenAI with tool calling, the agent iteratively searches the web, writes scraping scripts, runs them in cloud sandboxes, and collects structured datasets. The route also integrates with Supabase for usage tracking and session management.
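The iterative tool-calling pattern described above can be sketched as follows. This is a minimal illustration, not the code in route.ts: `runAgentLoop`, `Model`, `Sandbox`, `ToolCall`, and the `maxSteps` limit are hypothetical names introduced here, and the model and sandbox are passed in as plain functions so the loop can be exercised without Azure OpenAI or Modal.

```typescript
// Hypothetical types for illustration; none of these names come from route.ts.
type ToolCall =
  | { name: "run_code"; code: string }
  | { name: "finish"; summary: string };

type SandboxResult = { success: boolean; output: string; error: string | null };

// The model sees the transcript so far and picks the next action.
type Model = (history: string[]) => Promise<ToolCall>;
// The sandbox runs code for a given session and reports what happened.
type Sandbox = (code: string, sessionId: string) => Promise<SandboxResult>;

async function runAgentLoop(
  model: Model,
  sandbox: Sandbox,
  userQuery: string,
  sessionId: string,
  maxSteps = 5, // assumed safety limit so a confused model cannot loop forever
): Promise<{ summary: string; steps: number }> {
  const history: string[] = [`user: ${userQuery}`];

  for (let step = 1; step <= maxSteps; step++) {
    const call = await model(history);

    // The model signals completion with a "finish" tool call.
    if (call.name === "finish") {
      return { summary: call.summary, steps: step };
    }

    // Otherwise execute the generated code and feed the result back.
    const result = await sandbox(call.code, sessionId);
    history.push(
      result.success ? `tool: ${result.output}` : `tool error: ${result.error}`,
    );
  }

  return { summary: "step limit reached", steps: maxSteps };
}
```

Feeding errors back into the history, rather than aborting, is what lets an agent of this kind repair a failing script on the next step.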
Usage
This route is called by the ScraperPage component when the user interacts with the scraper chat interface. It streams AI responses and tool invocations back to the client using the Vercel AI SDK.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: website/src/app/api/scraper/route.ts
- Lines: 1-829
Signature
export async function POST(req: Request): Promise<Response>

// Key internal functions:
async function initializeModal(): Promise<void>
async function executeModalSandbox(code: string, sessionId: string): Promise<{
  success: boolean;
  output: string;
  error: string | null;
  stdout?: string;
  stderr?: string;
}>
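To show how a caller might consume the result shape returned by executeModalSandbox, here is a small sketch. The `SandboxResult` interface mirrors the return type in the signature above; `formatSandboxResult` and its `maxChars` parameter are hypothetical and are not taken from route.ts.

```typescript
// Mirrors the return type of executeModalSandbox in the signature above.
interface SandboxResult {
  success: boolean;
  output: string;
  error: string | null;
  stdout?: string;
  stderr?: string;
}

// Hypothetical helper: condense a sandbox result into a bounded string
// that could be handed back to the model as a tool result.
function formatSandboxResult(result: SandboxResult, maxChars = 2000): string {
  // Clip long output so one noisy script cannot flood the context window.
  const clip = (s: string): string =>
    s.length > maxChars ? s.slice(0, maxChars) + "\n[truncated]" : s;

  if (result.success) {
    return `OK\n${clip(result.output)}`;
  }

  // Prefer stderr when present, then the error field, then a fallback.
  const detail = result.stderr || result.error || "unknown error";
  return `FAILED\n${clip(detail)}`;
}
```

Clipping the output is a design choice in this sketch, on the assumption that scraping scripts can print large amounts of text.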
Import
// This is an API route, not directly imported. Called via:
fetch("/api/scraper", { method: "POST", body: JSON.stringify({ messages, userQuery, schema, sessionId }) })
I/O Contract
Inputs (Request Body)
| Name | Type | Required | Description |
|---|---|---|---|
| messages | CoreMessage[] | Yes | Chat message history |
| userQuery | string | Yes | The user's scraping task description |
| schema | string | No | Optional schema for the scraped dataset |
| sessionId | string | No | Session identifier for persistent sandbox storage |
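The contract in the table above could be checked before any model call is made. The sketch below is one way to do that and is not the validation used in route.ts: `parseScraperRequest` and `ScraperRequestBody` are hypothetical names, and `CoreMessage` is simplified to a role/content pair as an assumption.

```typescript
// Shape of the POST body as documented in the inputs table.
// CoreMessage is simplified to a role/content pair (an assumption).
interface ScraperRequestBody {
  messages: { role: string; content: unknown }[];
  userQuery: string;
  schema?: string;
  sessionId?: string;
}

type ParseResult =
  | { ok: true; value: ScraperRequestBody }
  | { ok: false; errors: string[] };

// Hypothetical helper: returns the typed body, or a list of problems.
function parseScraperRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }

  const b = body as Record<string, unknown>;
  const errors: string[] = [];

  // Required fields.
  if (!Array.isArray(b.messages)) {
    errors.push("messages must be an array");
  }
  if (typeof b.userQuery !== "string" || b.userQuery.trim() === "") {
    errors.push("userQuery must be a non-empty string");
  }

  // Optional fields: only checked when present.
  if (b.schema !== undefined && typeof b.schema !== "string") {
    errors.push("schema must be a string when provided");
  }
  if (b.sessionId !== undefined && typeof b.sessionId !== "string") {
    errors.push("sessionId must be a string when provided");
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }

  return {
    ok: true,
    value: {
      messages: b.messages as ScraperRequestBody["messages"],
      userQuery: b.userQuery as string,
      schema: b.schema as string | undefined,
      sessionId: b.sessionId as string | undefined,
    },
  };
}
```

A route handler could call this on the parsed JSON body and return a 400 response listing `errors` when `ok` is false.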
Outputs
| Name | Type | Description |
|---|---|---|
| stream | ReadableStream (body of the returned Response) | Streamed AI text response with tool call results |
Usage Examples
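// Import path depends on the Vercel AI SDK version: "ai/react" or "@ai-sdk/react".
import { useChat } from "ai/react";

// Call inside a React client component; extra body fields are sent with each request.
const { messages, handleSubmit } = useChat({
  api: "/api/scraper",
  body: {
    userQuery: "Scrape all product prices from example.com",
    schema: '{"name": "string", "price": "number"}',
    sessionId: "abc123",
  },
});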