Environment:Langfuse Langfuse ClickHouse Analytics

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Analytics_Database
Last Updated 2026-02-14 06:00 GMT

Overview

ClickHouse analytics database (versions 24.3 through 25.8) for high-volume storage of traces, observations, scores, and events, with configurable connection pooling, async inserts, and cluster support.

Description

ClickHouse serves as the analytics database for Langfuse, storing all high-volume tracing data including traces, observations (spans/generations), scores, and events. It is accessed via the @clickhouse/client npm package (v1.13.0) with a singleton connection pattern. The system supports read-only replicas, cluster mode, and configurable deletion strategies (alter_update, lightweight_update, lightweight_update_force). Migrations use shell scripts in packages/shared/clickhouse/scripts/.

Usage

Use this environment for all Langfuse deployments. ClickHouse is mandatory for storing and querying trace data, observation metrics, scores, and dashboard analytics. It works alongside PostgreSQL which handles relational/configuration data.

System Requirements

Category | Requirement | Notes
Database | ClickHouse 24.3+ | Development uses 25.8; production tested with 24.3+
Disk | 50GB+ SSD | High IOPS required; scales with trace volume
RAM | 4GB+ | ClickHouse is memory-intensive for GROUP BY operations
Network | TCP ports 8123 (HTTP), 9000 (native) | HTTP used by Node.js client

Dependencies

System Packages

  • clickhouse-server >= 24.3 (via Docker: clickhouse/clickhouse-server)

Node.js Packages

  • @clickhouse/client = 1.13.0

Credentials

The following environment variables must be set:

  • CLICKHOUSE_URL: (Required) ClickHouse HTTP endpoint (e.g., http://localhost:8123)
  • CLICKHOUSE_USER: (Required) ClickHouse auth user
  • CLICKHOUSE_PASSWORD: (Required) ClickHouse auth password
  • CLICKHOUSE_DB: Database name (default: default)
  • CLICKHOUSE_CLUSTER_NAME: Cluster name (default: default)
  • CLICKHOUSE_CLUSTER_ENABLED: Enable cluster mode (default: true)
  • CLICKHOUSE_READ_ONLY_URL: Read-only replica URL for legacy tables (optional)
  • CLICKHOUSE_EVENTS_READ_ONLY_URL: Read-only replica URL for events table (optional)
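
The variables above can be read into a typed configuration object before any client is created. The following is a minimal sketch, not the Langfuse implementation: the names ClickhouseConnectionConfig, requireVar, and loadClickhouseConnectionConfig are hypothetical, and the rule that only the literal string "false" disables cluster mode is an assumption.

```typescript
// Hypothetical helper, not taken from the Langfuse codebase.
interface ClickhouseConnectionConfig {
  url: string;
  user: string;
  password: string;
  database: string;
  clusterName: string;
  clusterEnabled: boolean;
  readOnlyUrl?: string;
  eventsReadOnlyUrl?: string;
}

type RawEnv = Record<string, string | undefined>;

// Returns the value of a required variable or throws a descriptive error.
function requireVar(raw: RawEnv, name: string): string {
  const value = raw[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Applies the defaults listed in the Credentials section.
function loadClickhouseConnectionConfig(raw: RawEnv): ClickhouseConnectionConfig {
  return {
    url: requireVar(raw, "CLICKHOUSE_URL"),
    user: requireVar(raw, "CLICKHOUSE_USER"),
    password: requireVar(raw, "CLICKHOUSE_PASSWORD"),
    database: raw["CLICKHOUSE_DB"] ?? "default",
    clusterName: raw["CLICKHOUSE_CLUSTER_NAME"] ?? "default",
    // Assumption: only the literal string "false" turns cluster mode off.
    clusterEnabled: (raw["CLICKHOUSE_CLUSTER_ENABLED"] ?? "true") !== "false",
    readOnlyUrl: raw["CLICKHOUSE_READ_ONLY_URL"],
    eventsReadOnlyUrl: raw["CLICKHOUSE_EVENTS_READ_ONLY_URL"],
  };
}
```

In a Node.js process this would typically be called once at startup as loadClickhouseConnectionConfig(process.env), so that a missing required variable fails fast rather than surfacing later as a connection error.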

Performance Tuning

  • CLICKHOUSE_KEEP_ALIVE_IDLE_SOCKET_TTL: Idle socket timeout in ms (default: 9000)
  • CLICKHOUSE_MAX_OPEN_CONNECTIONS: Connection pool size (default: 25)
  • CLICKHOUSE_MAX_BYTES_BEFORE_EXTERNAL_GROUP_BY: Memory threshold above which GROUP BY spills to disk (default: 32,000,000,000 bytes, ~32 GB)
  • CLICKHOUSE_ASYNC_INSERT_MAX_DATA_SIZE: Maximum size of buffered async-insert data before it is flushed (optional)
  • CLICKHOUSE_ASYNC_INSERT_BUSY_TIMEOUT_MS: Maximum time in ms that async-insert data is buffered before it is flushed (optional)
  • CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE: Deletion strategy (default: alter_update)
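
One way to picture how these tuning variables fit together is to map them onto a client options object. The sketch below builds a plain object with the same option names that appear in the client initialization excerpt (keep_alive, max_open_connections, clickhouse_settings). It is illustrative only: buildTuning, parseCount, and parseDeleteMode are hypothetical names, and passing the GROUP BY threshold and the async-insert values through clickhouse_settings is an assumption about the wiring, not something this page confirms.

```typescript
// Hypothetical helpers; the defaults are the ones listed above.
type DeleteMode = "alter_update" | "lightweight_update" | "lightweight_update_force";
const DELETE_MODES: readonly string[] = [
  "alter_update",
  "lightweight_update",
  "lightweight_update_force",
];

type EnvSource = Record<string, string | undefined>;

// Parses a non-negative integer, falling back to a default when unset.
function parseCount(value: string | undefined, fallback: number): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`Expected a non-negative integer, got "${value}"`);
  }
  return n;
}

// Validates the deletion strategy against the three documented modes.
function parseDeleteMode(value: string | undefined): DeleteMode {
  const mode = value ?? "alter_update";
  if (!DELETE_MODES.includes(mode)) {
    throw new Error(`Unknown CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE: "${mode}"`);
  }
  return mode as DeleteMode;
}

function buildTuning(env: EnvSource) {
  // Assumption: server-side limits are forwarded as ClickHouse settings.
  const settings: Record<string, string | number> = {
    async_insert: 1,
    wait_for_async_insert: 1,
    max_bytes_before_external_group_by: String(
      parseCount(env["CLICKHOUSE_MAX_BYTES_BEFORE_EXTERNAL_GROUP_BY"], 32_000_000_000),
    ),
  };
  const maxDataSize = env["CLICKHOUSE_ASYNC_INSERT_MAX_DATA_SIZE"];
  if (maxDataSize !== undefined && maxDataSize !== "") {
    settings["async_insert_max_data_size"] = maxDataSize;
  }
  const busyTimeout = env["CLICKHOUSE_ASYNC_INSERT_BUSY_TIMEOUT_MS"];
  if (busyTimeout !== undefined && busyTimeout !== "") {
    settings["async_insert_busy_timeout_ms"] = parseCount(busyTimeout, 0);
  }
  return {
    clientOptions: {
      keep_alive: {
        idle_socket_ttl: parseCount(env["CLICKHOUSE_KEEP_ALIVE_IDLE_SOCKET_TTL"], 9000),
      },
      max_open_connections: parseCount(env["CLICKHOUSE_MAX_OPEN_CONNECTIONS"], 25),
      clickhouse_settings: settings,
    },
    deleteMode: parseDeleteMode(env["CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE"]),
  };
}
```

The optional async-insert values are only added when set, so ClickHouse server defaults apply otherwise. The large byte count is emitted as a string to avoid any precision concerns with 64-bit settings.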

Quick Install

# Start ClickHouse via Docker Compose
pnpm run infra:dev:up

# Apply all pending ClickHouse migrations
cd packages/shared
bash clickhouse/scripts/up.sh

# (Optional) Create dev-only experimental tables
bash clickhouse/scripts/dev-tables.sh

Code Evidence

ClickHouse client initialization from packages/shared/src/server/clickhouse/client.ts:

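import { createClient } from "@clickhouse/client";

// Excerpt: connection options (URL, credentials, database) are not shown.
const client = createClient({
  keep_alive: {
    idle_socket_ttl: env.CLICKHOUSE_KEEP_ALIVE_IDLE_SOCKET_TTL, // default: 9000 ms
  },
  max_open_connections: env.CLICKHOUSE_MAX_OPEN_CONNECTIONS, // default: 25
  clickhouse_settings: {
    async_insert: 1, // buffer inserts server-side and write them in batches
    wait_for_async_insert: 1, // acknowledge an insert only after its buffer is flushed
  },
});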

Deletion timeout and query retry limit from packages/shared/src/env.ts:

LANGFUSE_CLICKHOUSE_DELETION_TIMEOUT_MS: z.coerce.number().default(600_000), // 10 minutes
LANGFUSE_CLICKHOUSE_QUERY_MAX_ATTEMPTS: z.coerce.number().default(3),

Common Errors

Error Message | Cause | Solution
Connection refused on port 8123 | ClickHouse not running | Run pnpm run infra:dev:up
MEMORY_LIMIT_EXCEEDED | Query exceeds memory limit | Lower CLICKHOUSE_MAX_BYTES_BEFORE_EXTERNAL_GROUP_BY so GROUP BY spills to disk sooner, or optimize the query
Socket hang up | Connection timeout on long queries | Retried automatically up to LANGFUSE_CLICKHOUSE_QUERY_MAX_ATTEMPTS (default: 3)
READONLY | Write attempted against a read-only replica | Check that CLICKHOUSE_URL points to the primary node
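
The automatic retry for "Socket hang up" can be pictured with a small wrapper. This is a simplified sketch rather than the Langfuse retry logic: withRetries and isSocketHangUp are hypothetical names, the error is matched by message text as an assumption, and no backoff between attempts is modeled.

```typescript
// Hypothetical helper: treats only "socket hang up" errors as retryable.
function isSocketHangUp(error: unknown): boolean {
  return error instanceof Error && /socket hang up/i.test(error.message);
}

// Runs `run` up to `maxAttempts` times, retrying only on socket hang ups.
// The default of 3 mirrors LANGFUSE_CLICKHOUSE_QUERY_MAX_ATTEMPTS.
async function withRetries<T>(
  run: () => Promise<T>,
  maxAttempts: number = 3,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run();
    } catch (error) {
      lastError = error;
      // Non-retryable errors (e.g. READONLY) are surfaced immediately.
      if (!isSocketHangUp(error)) throw error;
    }
  }
  // All attempts failed with a retryable error; report the last one.
  throw lastError;
}
```

A caller would wrap a query function, for example withRetries(() => runQuery(sql)), where runQuery stands in for whatever function issues the ClickHouse request.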

Compatibility Notes

  • Cluster Mode: Controlled by CLICKHOUSE_CLUSTER_ENABLED. When enabled, DDL and data operations target the cluster.
  • Read Replicas: Separate URLs for read-only access to legacy tables (CLICKHOUSE_READ_ONLY_URL) and events table (CLICKHOUSE_EVENTS_READ_ONLY_URL).
  • Deletion Strategies: Three modes available via CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE: alter_update (default, safest), lightweight_update, and lightweight_update_force.
  • Long Queries: Queries exceeding 30 seconds automatically enable HTTP progress headers at 10-second intervals.
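
The long-query behavior can be sketched as a function that chooses ClickHouse settings from a request timeout. The setting names send_progress_in_http_headers and http_headers_progress_interval_ms are ClickHouse HTTP-interface settings. The function name progressSettings and the idea that the 30-second threshold is checked against a configured request timeout are assumptions made for illustration.

```typescript
// Thresholds taken from the Long Queries note above.
const LONG_QUERY_THRESHOLD_MS = 30_000;
const PROGRESS_INTERVAL_MS = 10_000;

// Hypothetical helper: returns extra ClickHouse settings for long queries.
// Assumption: "exceeding 30 seconds" is decided from the request timeout.
function progressSettings(requestTimeoutMs: number): Record<string, string | number> {
  if (requestTimeoutMs <= LONG_QUERY_THRESHOLD_MS) {
    return {};
  }
  return {
    // Ask the server to stream progress as HTTP headers, which keeps
    // the connection active while the query is still running.
    send_progress_in_http_headers: 1,
    http_headers_progress_interval_ms: String(PROGRESS_INTERVAL_MS),
  };
}
```

The returned object is meant to be merged into the per-query settings, so short queries carry no extra header overhead.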
