Principle: Langfuse Export Job Creation
| Knowledge Sources | |
|---|---|
| Domains | Batch Export, Job Scheduling, Data Pipeline |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Export Job Creation is the principle of initiating a long-running data export by persisting a job record and delegating the actual work to an asynchronous queue, thereby decoupling the user-facing request from the resource-intensive export processing.
Description
When a user requests a bulk data export in an LLM engineering platform, the operation may involve streaming millions of rows from an analytics database, transforming them into a specific file format, and uploading the result to blob storage. Performing this synchronously within an HTTP request-response cycle would lead to timeouts, memory pressure on the web server, and a poor user experience.
Export Job Creation addresses this by splitting the operation into two phases:
- Record Phase: A lightweight mutation creates a persistent job record in the primary database (PostgreSQL) with an initial status of QUEUED. This record serves as the single source of truth for the export's lifecycle. The record captures the user's query parameters, desired output format, project context, and the identity of the requesting user.
- Dispatch Phase: After the record is persisted, a message is enqueued onto a durable job queue (such as BullMQ backed by Redis). The message contains only the identifiers needed for the worker to retrieve full job details: the export record ID and the project ID. The queue provides durability, retry semantics, and rate limiting.
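As a sketch of the two phases above (type and field names here are illustrative assumptions, not Langfuse's actual schema), the dispatch payload is deliberately thin: it carries only identifiers, and the worker re-reads the full record from the database, so the queue never holds stale parameters:

```typescript
// Illustrative types; field names are assumptions, not Langfuse's schema.
interface BatchExportRecord {
  id: string;
  projectId: string;
  userId: string;
  status: "QUEUED" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";
  name: string;
  format: "CSV" | "JSON" | "JSONL";
  query: unknown;
}

// The dispatch message contains only the identifiers the worker needs
// to look up the full record.
interface BatchExportQueuePayload {
  batchExportId: string;
  projectId: string;
}

function buildQueuePayload(record: BatchExportRecord): BatchExportQueuePayload {
  return { batchExportId: record.id, projectId: record.projectId };
}
```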
This two-phase approach provides several guarantees:
- Idempotency: The export record ID can be used as a deduplication key on the queue, preventing duplicate processing if the same job is enqueued more than once.
- Auditability: An audit log entry is created alongside the job record, tracking who initiated the export and when.
- RBAC enforcement: Authorization checks are performed at the creation boundary, before any work is dispatched. The user must hold the `batchExports:create` scope for the target project.
- Failure isolation: If the worker fails, the web server is unaffected. The job record's status can be updated to FAILED with diagnostic information.
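The idempotency guarantee can be illustrated with a minimal in-memory stand-in for a deduplicating queue (BullMQ exposes a similar mechanism via the `jobId` option on `Queue.add`; the class below is a sketch, not BullMQ's API):

```typescript
// Minimal in-memory stand-in for a durable queue that deduplicates by job ID.
// Purely illustrative; a real queue (e.g. BullMQ) would add persistence,
// retries, and rate limiting on top of this behavior.
class DedupQueue<T> {
  private jobs = new Map<string, T>();

  // Returns true if the job was newly enqueued, false if deduplicated.
  enqueue(jobId: string, payload: T): boolean {
    if (this.jobs.has(jobId)) return false;
    this.jobs.set(jobId, payload);
    return true;
  }

  size(): number {
    return this.jobs.size;
  }
}
```

Using the export record ID as `jobId` means a retried enqueue of the same export is a no-op rather than a duplicate job.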
Usage
Use Export Job Creation whenever:
- A data retrieval operation is expected to take longer than a typical HTTP timeout (30 seconds or more).
- The result is a file artifact (CSV, JSON, JSONL) that must be uploaded to external storage.
- You need to track the lifecycle of the operation (queued, processing, completed, failed, cancelled).
- Multiple users may initiate exports concurrently, requiring queue-based backpressure.
Theoretical Basis
The pattern follows the Command Query Responsibility Segregation (CQRS) and Asynchronous Command patterns. The theoretical workflow is:
1. User submits export request with parameters:
{ projectId, query: { tableName, filter, orderBy }, format, name }
2. Authorization check:
ASSERT user.hasScope("batchExports:create", projectId)
3. Persist job record:
record = INSERT INTO batchExport
(projectId, userId, status="QUEUED", name, format, query)
RETURN record.id
4. Create audit trail:
INSERT INTO auditLog
(resourceType="batchExport", resourceId=record.id, action="create")
5. Enqueue asynchronous job:
ENQUEUE(queue="batch-export", payload={batchExportId: record.id, projectId})
-- Use record.id as deduplication key
6. Return success to user immediately
-- The user can poll the job status via a separate query endpoint
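The six steps above can be sketched end to end. Everything here (`hasScope`, the `Deps` interface, the in-memory storage and queue) is an illustrative stand-in, not Langfuse's actual API:

```typescript
// Illustrative end-to-end sketch of the creation mutation; the dependencies
// are injected so storage and queue can be stubbed. All names are assumptions.
type Scope = "batchExports:create";

interface CreateExportInput {
  projectId: string;
  userId: string;
  name: string;
  format: string;
  query: { tableName: string; filter?: unknown; orderBy?: unknown };
}

interface Deps {
  hasScope(userId: string, scope: Scope, projectId: string): boolean;
  insertExport(input: CreateExportInput): { id: string; status: "QUEUED" };
  insertAuditLog(entry: { resourceType: string; resourceId: string; action: string }): void;
  enqueue(jobId: string, payload: { batchExportId: string; projectId: string }): void;
}

function createExportJob(input: CreateExportInput, deps: Deps): { id: string } {
  // Step 2: authorization at the creation boundary, before any work is dispatched.
  if (!deps.hasScope(input.userId, "batchExports:create", input.projectId)) {
    throw new Error("Forbidden: missing batchExports:create scope");
  }
  // Step 3: persist the job record as the single source of truth.
  const record = deps.insertExport(input);
  // Step 4: audit trail alongside the record.
  deps.insertAuditLog({ resourceType: "batchExport", resourceId: record.id, action: "create" });
  // Step 5: dispatch, using the record ID as the deduplication key.
  deps.enqueue(record.id, { batchExportId: record.id, projectId: input.projectId });
  // Step 6: return immediately; the caller polls status via a separate endpoint.
  return { id: record.id };
}
```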
The deduplication key (step 5) ensures that even if the enqueue operation is retried due to transient failures, the worker will process the job at most once per unique export record. The status field on the record acts as a state machine with transitions: QUEUED -> PROCESSING -> COMPLETED | FAILED, with CANCELLED as a terminal state reachable from QUEUED.
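The status state machine can be written out as a transition table (a sketch; Langfuse's actual representation of the enum and transitions may differ):

```typescript
// Transition table for the export status state machine described above.
// COMPLETED, FAILED, and CANCELLED are terminal; CANCELLED is only
// reachable from QUEUED.
type ExportStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";

const TRANSITIONS: Record<ExportStatus, ExportStatus[]> = {
  QUEUED: ["PROCESSING", "CANCELLED"],
  PROCESSING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: [],
  CANCELLED: [],
};

function canTransition(from: ExportStatus, to: ExportStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A worker picking up a job would move it QUEUED to PROCESSING, then to COMPLETED or FAILED; any other transition is rejected.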