Heuristic:Langfuse Langfuse Ingestion Date Boundary Delay

From Leeroopedia
Knowledge Sources
Domains Ingestion, Data_Integrity
Last Updated 2026-02-14 06:00 GMT

Overview

Adds a configurable delay (default 15 seconds) to ingestion queue processing near UTC midnight (23:45-00:15) to prevent duplicate events from out-of-order processing across ClickHouse date partition boundaries.

Description

ClickHouse partitions data by date. When events arrive near the UTC midnight boundary, out-of-order processing can cause the same event to be written to two different date partitions, resulting in duplicates. This heuristic delays processing during the roughly 30-minute window around midnight (23:45 to 00:15 UTC) so that late or out-of-order events have time to arrive before the queue job is processed.

Usage

This heuristic is applied automatically by the ingestion pipeline. It activates when the current UTC time is between 23:45 and 00:15. The delay inside this window is controlled by the LANGFUSE_INGESTION_QUEUE_DELAY_MS environment variable (default: 15,000 ms, i.e. 15 seconds). Outside the window, non-OTel sources still receive a 5-second delay, while OTel sources receive none.
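As an illustration of how the environment variable could be resolved to a delay value, here is a minimal sketch. The helper name `readIngestionQueueDelayMs`, the explicit `env` parameter, and the validation rules are assumptions made for this example; only the variable name and the 15,000 ms default come from the description above.

```typescript
// Hypothetical helper, not taken from the Langfuse codebase.
const DEFAULT_INGESTION_QUEUE_DELAY_MS = 15000; // documented default: 15 seconds

function readIngestionQueueDelayMs(
  env: Record<string, string | undefined>,
): number {
  const raw = env["LANGFUSE_INGESTION_QUEUE_DELAY_MS"];
  // Fall back to the default when the variable is unset or blank.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_INGESTION_QUEUE_DELAY_MS;
  }
  const parsed = Number(raw);
  // Assumption: reject non-numeric, fractional, or negative values instead of guessing.
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`Invalid LANGFUSE_INGESTION_QUEUE_DELAY_MS: ${raw}`);
  }
  return parsed;
}
```

In a Node.js process this would typically be called as `readIngestionQueueDelayMs(process.env)`; passing the environment in as a parameter keeps the sketch easy to test.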

The Insight (Rule of Thumb)

  • Action: Add a processing delay to ingestion events near UTC midnight boundaries.
  • Value: 15 seconds default delay during 23:45-00:15 UTC; outside that window, 5 seconds for non-OTel sources.
  • Trade-off: Slightly increased latency for events processed near midnight in exchange for avoiding duplicate records.
  • Scope: Applies to all ingestion queue jobs; OTel sources get zero delay outside the boundary window.
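The rule of thumb above can be sketched as a single decision function. This is an illustrative reconstruction from the values listed on this page, not the project's actual function: the name `selectIngestionDelayMs`, the `"api"` / `"otel"` source labels, and the parameter layout are assumptions.

```typescript
// Hypothetical labels: "otel" for OTel sources, "api" for everything else.
type IngestionSource = "api" | "otel";

const NON_OTEL_BASE_DELAY_MS = 5000; // 5 seconds outside the boundary window

function selectIngestionDelayMs(
  now: Date,
  source: IngestionSource,
  boundaryDelayMs: number = 15000, // stands in for LANGFUSE_INGESTION_QUEUE_DELAY_MS
): number {
  const hours = now.getUTCHours();
  const minutes = now.getUTCMinutes();
  const nearDateBoundary =
    (hours === 23 && minutes >= 45) || (hours === 0 && minutes <= 15);

  // Inside the window every source gets the configured boundary delay.
  if (nearDateBoundary) {
    return boundaryDelayMs;
  }
  // Outside the window only non-OTel sources are delayed.
  return source === "otel" ? 0 : NON_OTEL_BASE_DELAY_MS;
}

// Usage with fixed timestamps:
const atBoundary = selectIngestionDelayMs(new Date("2026-02-13T23:50:00Z"), "otel"); // 15000
const middayOtel = selectIngestionDelayMs(new Date("2026-02-14T12:00:00Z"), "otel"); // 0
const middayApi = selectIngestionDelayMs(new Date("2026-02-14T12:00:00Z"), "api"); // 5000
```

Taking `now` as a parameter rather than reading the clock inside the function is a design choice for this sketch; it makes the boundary behaviour reproducible in tests.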

Reasoning

ClickHouse uses date-based partitioning for efficient data management. When an event timestamp falls on the boundary between two dates (e.g., 23:59:59.999 UTC) and processing happens asynchronously, there is a risk that the same event body is processed twice into different date partitions due to timestamp precision loss or out-of-order delivery. The 15-second delay gives late or reordered events for a given date window time to arrive before processing begins, which reduces that risk at the cost of slightly increased end-to-end latency.
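To make the boundary problem concrete, the sketch below shows how two timestamps one millisecond apart can map to different partitions. It assumes, purely for illustration, that the partition key is the UTC calendar date of the event timestamp; `utcPartitionDate` is a hypothetical helper, not a ClickHouse or Langfuse API.

```typescript
// Assumption: the partition key is the UTC calendar date of the timestamp.
function utcPartitionDate(timestamp: Date): string {
  // toISOString() is always in UTC; the first 10 characters are "YYYY-MM-DD".
  return timestamp.toISOString().slice(0, 10);
}

const justBeforeMidnight = new Date("2026-02-13T23:59:59.999Z");
const justAfterMidnight = new Date(justBeforeMidnight.getTime() + 1); // one millisecond later

const partitionBefore = utcPartitionDate(justBeforeMidnight); // "2026-02-13"
const partitionAfter = utcPartitionDate(justAfterMidnight); // "2026-02-14"

// Two copies of one event whose timestamps differ by a rounding step
// would be routed to different partitions in this model.
const crossesBoundary = partitionBefore !== partitionAfter; // true
```

Under this model, any rounding or reordering that shifts a timestamp across midnight changes the target partition, which is the situation the delay is meant to make less likely.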

The comment in the source code states: "We need the delay around date boundaries to avoid duplicates for out-of-order processing of events."

// From packages/shared/src/server/ingestion/processEventBatch.ts
// `hours` and `minutes` are the UTC hour and minute of the current time (defined outside this excerpt).
// Between 23:45 UTC and 00:15 UTC (date boundary), add delay
if ((hours === 23 && minutes >= 45) || (hours === 0 && minutes <= 15)) {
  return env.LANGFUSE_INGESTION_QUEUE_DELAY_MS; // Default: 15,000 ms
}
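The exact edges of the window follow from the minute-level comparison in the condition above. The sketch below wraps the same condition in a hypothetical predicate, `isNearUtcDateBoundary`, and probes it at the endpoints; the function name and the probe timestamps are assumptions for this example.

```typescript
// Hypothetical wrapper around the condition quoted above.
function isNearUtcDateBoundary(now: Date): boolean {
  const hours = now.getUTCHours();
  const minutes = now.getUTCMinutes();
  return (hours === 23 && minutes >= 45) || (hours === 0 && minutes <= 15);
}

const probes = [
  "2026-02-13T23:44:59.999Z", // last millisecond before the window
  "2026-02-13T23:45:00.000Z", // window opens
  "2026-02-14T00:15:59.999Z", // still inside: the minute value is 15
  "2026-02-14T00:16:00.000Z", // window has closed
];

const inWindow = probes.map((iso) => isNearUtcDateBoundary(new Date(iso)));
// inWindow is [false, true, true, false]
```

Because `minutes <= 15` covers the whole of minute 00:15, the window effectively runs from 23:45:00 up to, but not including, 00:16:00 UTC, which is 31 minutes rather than exactly 30.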
