Heuristic:Langfuse Langfuse Ingestion Date Boundary Delay
| Knowledge Sources | |
|---|---|
| Domains | Ingestion, Data_Integrity |
| Last Updated | 2026-02-14 06:00 GMT |
Overview
Adds a configurable delay (default 15 seconds) to ingestion queue processing near UTC midnight (23:45-00:15) to prevent duplicate events from out-of-order processing across ClickHouse date partition boundaries.
Description
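ClickHouse partitions data by date. When events arrive near the UTC midnight boundary, out-of-order processing can cause the same event to be written to two different date partitions. ClickHouse merges, and therefore deduplicates, rows only within a single partition, so background merges never collapse the two copies and the event appears twice. This heuristic delays processing during the roughly 30-minute window around midnight. Out-of-order events for the same record then have time to arrive before the job runs, instead of being written separately on either side of the boundary.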
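To make the failure mode concrete, here is a minimal sketch that models one partition per UTC calendar day. Rows are deduplicated by id only inside a partition, which mirrors the rule that merges never span partitions. The day-granularity partition key and the helpers `partitionKey` and `countRowsAfterMerge` are hypothetical illustrations, not Langfuse or ClickHouse APIs.

```typescript
// Hypothetical model: one partition per UTC calendar day (illustrative assumption).
type StoredRow = { id: string; timestampMs: number };

// Partition key "YYYY-MM-DD", computed in UTC.
function partitionKey(timestampMs: number): string {
  return new Date(timestampMs).toISOString().slice(0, 10);
}

// Deduplicate by id within each partition only, as a merge would;
// rows with the same id in different partitions are never collapsed.
function countRowsAfterMerge(rows: StoredRow[]): number {
  const seen = new Set<string>();
  for (const row of rows) {
    seen.add(`${partitionKey(row.timestampMs)}|${row.id}`);
  }
  return seen.size;
}

// The same event written twice with timestamps 20 ms apart that straddle midnight.
const beforeMidnight = Date.UTC(2026, 1, 13, 23, 59, 59, 990);
const afterMidnight = Date.UTC(2026, 1, 14, 0, 0, 0, 10);

console.log(partitionKey(beforeMidnight)); // "2026-02-13"
console.log(partitionKey(afterMidnight)); // "2026-02-14"
console.log(
  countRowsAfterMerge([
    { id: "evt-1", timestampMs: beforeMidnight },
    { id: "evt-1", timestampMs: afterMidnight },
  ]),
); // 2: one logical event, two stored rows
```

The same two writes made a few milliseconds apart on the same side of midnight would collapse to a single row in this model. That asymmetry is why the risk concentrates around the date boundary.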
Usage
This heuristic is applied automatically by the ingestion pipeline. It activates when the current UTC time is between 23:45 and 00:15. The check compares whole minutes, so the window stays open until 00:15:59. The delay value is controlled by the LANGFUSE_INGESTION_QUEUE_DELAY_MS environment variable (default: 15,000 ms, i.e. 15 seconds). Outside this window, non-OTel sources still receive a minimum 5-second delay, while OTel sources receive none.
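The selection rule above can be sketched as a pure function. This is a hedged reconstruction from this page's prose and excerpt, not Langfuse's actual implementation. The names `computeIngestionDelayMs`, `isDateBoundaryWindow`, `readConfiguredDelayMs`, and the `IngestionSource` type are hypothetical. Passing the configured delay in as a parameter is an assumption. How the 5-second minimum interacts with a configured value below 5 seconds is not specified here, so the sketch does not model it.

```typescript
// Hypothetical source labels; the page only distinguishes OTel from non-OTel.
type IngestionSource = "otel" | "non-otel";

const DEFAULT_BOUNDARY_DELAY_MS = 15_000; // documented default of LANGFUSE_INGESTION_QUEUE_DELAY_MS
const NON_OTEL_DELAY_MS = 5_000; // documented delay for non-OTel sources outside the window

// True from 23:45:00.000 through 00:15:59.999 UTC, matching the
// `minutes >= 45` / `minutes <= 15` comparisons in the excerpt.
function isDateBoundaryWindow(now: Date): boolean {
  const hours = now.getUTCHours();
  const minutes = now.getUTCMinutes();
  return (hours === 23 && minutes >= 45) || (hours === 0 && minutes <= 15);
}

function computeIngestionDelayMs(
  now: Date,
  source: IngestionSource,
  configuredDelayMs: number = DEFAULT_BOUNDARY_DELAY_MS,
): number {
  if (isDateBoundaryWindow(now)) {
    return configuredDelayMs; // every source waits near midnight
  }
  return source === "otel" ? 0 : NON_OTEL_DELAY_MS;
}

// Hypothetical parsing of the documented variable, falling back to the default
// when it is unset, empty, negative, or not a number.
function readConfiguredDelayMs(env: Record<string, string | undefined>): number {
  const raw = env["LANGFUSE_INGESTION_QUEUE_DELAY_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_BOUNDARY_DELAY_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_BOUNDARY_DELAY_MS;
}

const nearMidnight = new Date(Date.UTC(2026, 1, 14, 0, 15, 30));
const midday = new Date(Date.UTC(2026, 1, 14, 12, 0, 0));
console.log(computeIngestionDelayMs(nearMidnight, "otel")); // 15000
console.log(computeIngestionDelayMs(midday, "otel")); // 0
console.log(computeIngestionDelayMs(midday, "non-otel")); // 5000
```

Taking `now` as an argument instead of reading the clock inside the function keeps the window logic deterministic and easy to test at the exact edges, such as 23:44:59 and 00:16:00.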
The Insight (Rule of Thumb)
- Action: Add a processing delay to ingestion events near UTC midnight boundaries.
- Value: 15 seconds by default during 23:45-00:15 UTC; outside that window, a 5-second minimum for non-OTel sources.
- Trade-off: Slightly higher latency for events processed near midnight, in exchange for avoiding duplicate records across date partitions.
- Scope: Applies to all ingestion queue jobs; OTel sources get zero delay outside the boundary window.
Reasoning
ClickHouse uses date-based partitioning for efficient data management. Suppose an event's timestamp falls on the boundary between two dates (e.g., 23:59:59.999 UTC) and processing happens asynchronously. Timestamp precision loss or out-of-order delivery can then cause the same event body to be processed twice into different date partitions. The 15-second delay gives the events for a given date window time to arrive before processing begins, so they are handled together instead of being split across the boundary. The cost is slightly higher end-to-end latency.
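The timing argument can be illustrated with a small simulation. It rests on three illustrative assumptions, none taken from Langfuse: each arriving event enqueues a job that runs `delayMs` later; that job writes one row whose timestamp comes from the most recently arrived event for the same entity; and rows deduplicate only within a UTC-day partition. The names `ArrivingEvent`, `dayPartition`, and `simulateStoredRows` are hypothetical.

```typescript
// Hypothetical model of one entity's events arriving out of order.
type ArrivingEvent = { entityId: string; timestampMs: number; arrivalMs: number };

// One partition per UTC day (illustrative assumption).
const dayPartition = (ms: number): string => new Date(ms).toISOString().slice(0, 10);

// Each event enqueues a job that runs at arrivalMs + delayMs. The job sees every
// event for its entity that has arrived by then and writes one row using the
// timestamp of the most recently arrived event. Rows with the same entity id
// collapse only when they land in the same partition. Returns the stored row count.
function simulateStoredRows(events: ArrivingEvent[], delayMs: number): number {
  const storedKeys = new Set<string>();
  for (const trigger of events) {
    const runAtMs = trigger.arrivalMs + delayMs;
    const visible = events
      .filter((e) => e.entityId === trigger.entityId && e.arrivalMs <= runAtMs)
      .sort((a, b) => a.arrivalMs - b.arrivalMs);
    const latest = visible[visible.length - 1]; // never undefined: the trigger itself is visible
    storedKeys.add(`${dayPartition(latest.timestampMs)}|${trigger.entityId}`);
  }
  return storedKeys.size;
}

// A record first seen just before midnight, then corrected to just after it,
// with the correction arriving 2 seconds later.
const gapEvents: ArrivingEvent[] = [
  { entityId: "trace-1", timestampMs: Date.UTC(2026, 1, 13, 23, 59, 59, 990), arrivalMs: 0 },
  { entityId: "trace-1", timestampMs: Date.UTC(2026, 1, 14, 0, 0, 0, 10), arrivalMs: 2_000 },
];

console.log(simulateStoredRows(gapEvents, 0)); // 2: duplicate across partitions
console.log(simulateStoredRows(gapEvents, 15_000)); // 1: both events seen together
```

In this model the delay only helps when it is at least as long as the gap between arrivals. With a 1-second delay the example above still yields two rows. That is why the delay reduces the risk of duplicates rather than ruling it out for arbitrarily late events.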
The comment in the source code states: "We need the delay around date boundaries to avoid duplicates for out-of-order processing of events."
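```ts
// From packages/shared/src/server/ingestion/processEventBatch.ts (excerpt)
// `hours` and `minutes` hold the current UTC hour and minute;
// `env` holds the server's environment configuration.
// Between 23:45 UTC and 00:15 UTC (date boundary), add delay
if ((hours === 23 && minutes >= 45) || (hours === 0 && minutes <= 15)) {
  return env.LANGFUSE_INGESTION_QUEUE_DELAY_MS; // Default: 15,000ms
}
```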