Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Heuristic:Helicone Helicone ClickHouse ReplacingMergeTree FINAL

From Leeroopedia
Knowledge Sources
Domains Analytics_Storage, ClickHouse
Last Updated 2026-02-14 06:00 GMT

Overview

Use `ReplacingMergeTree(updated_at)` for mutable request/response data and always query with `FINAL` to force deduplication before aggregation.

Description

ClickHouse's `ReplacingMergeTree` engine handles deduplication by keeping only the latest row (by `updated_at` timestamp) for each unique primary key combination. However, deduplication only occurs during background merges, which are asynchronous. Without the `FINAL` keyword in queries, aggregations may count duplicate rows that have not yet been merged. The `FINAL` keyword forces an on-the-fly merge at query time, guaranteeing accurate results at the cost of query performance.

Helicone uses this pattern for the `request_response_rmt` table, which stores mutable request/response metadata (properties, scores, status). Updates are implemented as full-row inserts with a new `updated_at` timestamp, not SQL UPDATEs.

Usage

Apply this heuristic when querying ClickHouse tables that use ReplacingMergeTree for accurate aggregations (cost sums, request counts, spend totals). Omitting `FINAL` from queries can produce inflated numbers due to unmerged duplicate rows.

The Insight (Rule of Thumb)

  • Action: Always append `FINAL` to `SELECT` queries on `ReplacingMergeTree` tables when accuracy matters. For updates, insert a complete copy of the row with a new `updated_at` timestamp.
  • Value: `FINAL` forces deduplication at query time. The `updated_at` column determines which row version to keep.
  • Trade-off: `FINAL` adds query latency proportional to the number of unmerged parts. For high-throughput dashboards, consider materialized views that pre-aggregate.

Reasoning

Table schema from `clickhouse/migrations/schema_41_request_response_replacing_merge_tree.sql:34`:

ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(request_created_at)
PRIMARY KEY (organization_id, provider, model, user_id, request_created_at, request_id)
ORDER BY (organization_id, provider, model, user_id, request_created_at, request_id)

Query with FINAL from `valhalla/jawn/src/managers/creditsManager.ts:129-132`:

SELECT spend / ${COST_PRECISION_MULTIPLIER / 100} as spend_cents
FROM organization_ptb_spend_mv FINAL
WHERE organization_id = {val_0 : String}

Update pattern (insert-to-update) from `valhalla/jawn/src/lib/stores/request/VersionedRequestStore.ts`:

// 1. Read existing row (with ORDER BY updated_at DESC LIMIT 1)
const rowContents = await clickhouseDb.dbQuery<RequestResponseRMT>(`
  SELECT *
  FROM request_response_rmt
  WHERE request_id = {val_0: UUID}
  ORDER BY updated_at DESC
  LIMIT 1
`);

// 2. Insert modified copy (ReplacingMergeTree keeps the latest updated_at)
await clickhouseDb.dbInsertClickhouse("request_response_rmt", [{
  ...existingRow,
  properties: newProperties,
  updated_at: new Date().toISOString(),
}]);

The VersionedRequestStore also contains a TODO comment indicating this manual version control may be temporary: `// TODO Use final instead of hand rolling deduplication`.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment