Heuristic:Helicone Helicone ClickHouse ReplacingMergeTree FINAL
| Knowledge Sources | |
|---|---|
| Domains | Analytics_Storage, ClickHouse |
| Last Updated | 2026-02-14 06:00 GMT |
Overview
Use `ReplacingMergeTree(updated_at)` for mutable request/response data and always query with `FINAL` to force deduplication before aggregation.
Description
ClickHouse's `ReplacingMergeTree` engine handles deduplication by keeping only the latest row (by `updated_at` timestamp) for each unique primary key combination. However, deduplication only occurs during background merges, which are asynchronous. Without the `FINAL` keyword in queries, aggregations may count duplicate rows that have not yet been merged. The `FINAL` keyword forces an on-the-fly merge at query time, guaranteeing accurate results at the cost of query performance.
Helicone uses this pattern for the `request_response_rmt` table, which stores mutable request/response metadata (properties, scores, status). Updates are implemented as full-row inserts with a new `updated_at` timestamp, not SQL UPDATEs.
Usage
Apply this heuristic when querying ClickHouse tables that use ReplacingMergeTree for accurate aggregations (cost sums, request counts, spend totals). Omitting `FINAL` from queries can produce inflated numbers due to unmerged duplicate rows.
The Insight (Rule of Thumb)
- Action: Always append `FINAL` to `SELECT` queries on `ReplacingMergeTree` tables when accuracy matters. For updates, insert a complete copy of the row with a new `updated_at` timestamp.
- Value: `FINAL` forces deduplication at query time. The `updated_at` column determines which row version to keep.
- Trade-off: `FINAL` adds query latency proportional to the number of unmerged parts. For high-throughput dashboards, consider materialized views that pre-aggregate.
Reasoning
Table schema from `clickhouse/migrations/schema_41_request_response_replacing_merge_tree.sql:34`:
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(request_created_at)
PRIMARY KEY (organization_id, provider, model, user_id, request_created_at, request_id)
ORDER BY (organization_id, provider, model, user_id, request_created_at, request_id)
Query with FINAL from `valhalla/jawn/src/managers/creditsManager.ts:129-132`:
SELECT spend / ${COST_PRECISION_MULTIPLIER / 100} as spend_cents
FROM organization_ptb_spend_mv FINAL
WHERE organization_id = {val_0 : String}
Update pattern (insert-to-update) from `valhalla/jawn/src/lib/stores/request/VersionedRequestStore.ts`:
// 1. Read existing row (with ORDER BY updated_at DESC LIMIT 1)
const rowContents = await clickhouseDb.dbQuery<RequestResponseRMT>(`
SELECT *
FROM request_response_rmt
WHERE request_id = {val_0: UUID}
ORDER BY updated_at DESC
LIMIT 1
`);
// 2. Insert modified copy (ReplacingMergeTree keeps the latest updated_at)
await clickhouseDb.dbInsertClickhouse("request_response_rmt", [{
...existingRow,
properties: newProperties,
updated_at: new Date().toISOString(),
}]);
The VersionedRequestStore also contains a TODO comment indicating this manual version control may be temporary: `// TODO Use final instead of hand rolling deduplication`.