Environment:Wandb Weave Trace Server Infrastructure
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Backend, Database |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Server-side infrastructure environment for the Weave trace server: ClickHouse, optional Kafka, and cloud object storage (AWS S3, Azure Blob Storage, or Google Cloud Storage).
Description
This environment defines the infrastructure requirements for running the Weave trace server component. ClickHouse is required as the primary data store. Kafka is optional and is needed only for event streaming and online evaluation. File storage supports Bring-Your-Own-Bucket (BYOB) via AWS S3, Azure Blob Storage, or Google Cloud Storage. The trace server's dependencies also include `ddtrace` for APM, `litellm` for LLM scoring support, and OpenTelemetry libraries for trace ingestion.
Usage
Use this environment when self-hosting the Weave trace server or when deploying the server-side components. This is not required for SDK-only usage that connects to the hosted Wandb cloud service. It is the prerequisite for the build and publish implementations in the SDK Release workflow.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Production server deployment |
| ClickHouse | ClickHouse server | Default: `localhost:8123`, configurable via `WF_CLICKHOUSE_HOST` and `WF_CLICKHOUSE_PORT` |
| Kafka | Apache Kafka (optional) | Required only for online evaluation and event streaming |
| Network | Accessible storage endpoints | S3, Azure Blob, or GCS for BYOB file storage |
| Disk | SSD recommended | High IOPS for ClickHouse and file caching |
Dependencies
System Packages
- ClickHouse server
- Apache Kafka (optional)
Python Packages
- `ddtrace` >= 2.7.0
- `boto3` >= 1.34.0 (BYOB S3)
- `azure-storage-blob` >= 12.24.0, < 12.26.0 (BYOB Azure)
- `google-cloud-storage` >= 2.7.0 (BYOB GCP)
- `litellm` >= 1.36.1 (LLM scoring support)
- `opentelemetry-proto` >= 1.12.0
- `opentelemetry-semantic-conventions-ai` >= 0.4.3
- `openinference-semantic-conventions` >= 0.1.17
- `emoji` >= 2.12.1
Credentials
ClickHouse
- `WF_CLICKHOUSE_HOST`: ClickHouse hostname (default: `localhost`)
- `WF_CLICKHOUSE_PORT`: ClickHouse port (default: `8123`)
- `WF_CLICKHOUSE_USER`: ClickHouse username (default: `default`)
- `WF_CLICKHOUSE_PASS`: ClickHouse password (default: empty)
- `WF_CLICKHOUSE_DATABASE`: Database name (default: `default`)
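As a minimal sketch of how these variables and their documented defaults fit together, the snippet below reads them using only the standard library. `ClickHouseSettings` and `load_clickhouse_settings` are hypothetical names introduced for illustration, not the trace server's own settings class, and the plain-HTTP URL is an assumption based on the default port `8123`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClickHouseSettings:
    """Hypothetical container; not the trace server's own settings class."""

    host: str
    port: int
    user: str
    password: str
    database: str

    def http_url(self) -> str:
        # Assumption: a plain-HTTP endpoint, as on the default port 8123.
        return f"http://{self.host}:{self.port}"


def load_clickhouse_settings(env: Mapping[str, str] = os.environ) -> ClickHouseSettings:
    """Read the WF_CLICKHOUSE_* variables, falling back to the documented defaults."""
    return ClickHouseSettings(
        host=env.get("WF_CLICKHOUSE_HOST", "localhost"),
        port=int(env.get("WF_CLICKHOUSE_PORT", "8123")),
        user=env.get("WF_CLICKHOUSE_USER", "default"),
        password=env.get("WF_CLICKHOUSE_PASS", ""),
        database=env.get("WF_CLICKHOUSE_DATABASE", "default"),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a candidate configuration before exporting it.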
Kafka
- `KAFKA_BROKER_HOST`: Kafka broker hostname (default: `localhost`)
- `KAFKA_BROKER_PORT`: Kafka broker port (default: `9092`)
- `KAFKA_CLIENT_USER`: Kafka authentication username (optional)
- `KAFKA_CLIENT_PASSWORD`: Kafka authentication password (optional)
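The Kafka variables can be combined into a client configuration along these lines. This is a sketch under assumptions: `kafka_client_config` is a hypothetical helper, the key names follow librdkafka's configuration properties, and no security protocol or SASL mechanism is set because the source does not say which one the trace server uses.

```python
import os
from typing import Mapping


def kafka_client_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build a librdkafka-style config dict from the KAFKA_* variables."""
    host = env.get("KAFKA_BROKER_HOST", "localhost")
    port = env.get("KAFKA_BROKER_PORT", "9092")
    config = {"bootstrap.servers": f"{host}:{port}"}

    # Credentials are optional; include them only when both are present.
    user = env.get("KAFKA_CLIENT_USER")
    password = env.get("KAFKA_CLIENT_PASSWORD")
    if user and password:
        config["sasl.username"] = user
        config["sasl.password"] = password
    return config
```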
AWS S3 (BYOB)
- `WF_FILE_STORAGE_URI`: S3 bucket URI
- `WF_FILE_STORAGE_AWS_ACCESS_KEY_ID`: AWS access key
- `WF_FILE_STORAGE_AWS_SECRET_ACCESS_KEY`: AWS secret key
- `WF_FILE_STORAGE_AWS_SESSION_TOKEN`: AWS session token (optional)
- `WF_FILE_STORAGE_AWS_KMS_KEY`: KMS encryption key (optional)
- `WF_FILE_STORAGE_AWS_REGION`: AWS region
Azure Blob (BYOB)
- `WF_FILE_STORAGE_AZURE_CONNECTION_STRING`: Azure connection string
- `WF_FILE_STORAGE_AZURE_ACCESS_KEY`: Azure access key
- `WF_FILE_STORAGE_AZURE_ACCOUNT_URL`: Azure account URL
GCP (BYOB)
- `WF_FILE_STORAGE_GCP_CREDENTIALS_JSON_B64`: Base64-encoded GCP credentials JSON
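Because the GCP variable carries a Base64-encoded JSON document, it can help to round-trip the value locally before deploying. The sketch below assumes standard (not URL-safe) Base64; both helper names are hypothetical.

```python
import base64
import json
from typing import Any


def encode_gcp_credentials(info: dict[str, Any]) -> str:
    """Serialize a credentials dict to the Base64 form the variable expects."""
    return base64.b64encode(json.dumps(info).encode("utf-8")).decode("ascii")


def decode_gcp_credentials(b64_value: str) -> dict[str, Any]:
    """Decode the variable's value and confirm it holds a JSON object."""
    # validate=True rejects characters outside the standard Base64 alphabet.
    raw = base64.b64decode(b64_value, validate=True)
    info = json.loads(raw)
    if not isinstance(info, dict):
        raise ValueError("expected a JSON object in the decoded credentials")
    return info
```

A value that fails `decode_gcp_credentials` locally would most likely fail on the server as well, so this catches truncated or double-encoded values early.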
Feature Flags
- `WEAVE_ENABLE_ONLINE_EVAL`: Enable online evaluation worker (default: `false`)
- `WF_SCORING_WORKER_BATCH_SIZE`: Scoring worker batch size (default: `100`)
- `WF_SCORING_WORKER_BATCH_TIMEOUT`: Scoring worker batch timeout in seconds (default: `5`)
- `WF_FILE_STORAGE_PROJECT_ALLOW_LIST`: Comma-separated list of project IDs allowed to use BYOB storage
- `WF_FILE_STORAGE_PROJECT_RAMP_PCT`: BYOB rollout percentage (0-100)
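One way an allow list and a ramp percentage can work together is sketched below. This illustrates the general technique (deterministic hash bucketing), not the trace server's actual logic: how the server combines the two settings and how it buckets projects are assumptions, and `parse_allow_list` and `byob_enabled` are hypothetical names.

```python
import hashlib


def parse_allow_list(raw: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def byob_enabled(project_id: str, allow_list: list[str], ramp_pct: int) -> bool:
    """Return True if the project is allow-listed or falls inside the ramp."""
    if project_id in allow_list:
        return True
    # Hash the project ID into a stable bucket in [0, 100) so that the same
    # project always gets the same answer for a given percentage.
    digest = hashlib.sha256(project_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < ramp_pct
```

With this scheme, raising the percentage only ever adds projects, so a project that was enabled at 10 percent stays enabled at 50 percent.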
Quick Install
```bash
# Install Weave with trace server dependencies
pip install "weave[trace_server]"
```
Code Evidence
ClickHouse configuration from `weave/trace_server/environment.py`:
```python
wf_clickhouse_host: str = "localhost"
wf_clickhouse_port: int = 8123
wf_clickhouse_user: str = "default"
wf_clickhouse_pass: str = ""
wf_clickhouse_database: str = "default"
```
Kafka configuration from `weave/trace_server/environment.py`:
```python
kafka_broker_host: str = "localhost"
kafka_broker_port: int = 9092
```
BYOB storage configuration from `weave/trace_server/environment.py`:
```python
wf_file_storage_uri: str | None = None  # S3/Azure/GCP URI
wf_file_storage_project_allow_list: list[str] = []
wf_file_storage_project_ramp_pct: int = 0
```
Trace server dependencies from `pyproject.toml:71-87`:
```toml
trace_server = [
    "ddtrace>=2.7.0",
    "boto3>=1.34.0",
    "azure-storage-blob>=12.24.0,<12.26.0",
    "google-cloud-storage>=2.7.0",
    "litellm>=1.36.1",
    "opentelemetry-proto>=1.12.0",
    "opentelemetry-semantic-conventions-ai>=0.4.3",
    "openinference-semantic-conventions>=0.1.17",
    "emoji>=2.12.1",
]
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ClickHouse connection refused | ClickHouse not running or wrong host/port | Verify `WF_CLICKHOUSE_HOST` and `WF_CLICKHOUSE_PORT` |
| Kafka broker unavailable | Kafka not running or wrong broker config | Verify `KAFKA_BROKER_HOST` and `KAFKA_BROKER_PORT` |
| S3 access denied | Invalid AWS credentials or insufficient bucket permissions | Verify `WF_FILE_STORAGE_AWS_*` environment variables |
| Azure authentication error | Invalid connection string or access key | Verify `WF_FILE_STORAGE_AZURE_*` environment variables |
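For the two connection errors in the table, a quick TCP reachability check can separate "service not running or wrong host/port" from other causes. The sketch below uses only the standard library; `port_open` is a hypothetical helper, and it tests reachability only, so it says nothing about credentials.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Defaults documented above: ClickHouse on 8123, Kafka on 9092.
    for name, port in (("ClickHouse", 8123), ("Kafka", 9092)):
        status = "reachable" if port_open("localhost", port) else "unreachable"
        print(f"{name} localhost:{port} is {status}")
```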
Compatibility Notes
- ClickHouse Replication: Set `WF_CLICKHOUSE_REPLICATED=true` for replicated setups; requires `WF_CLICKHOUSE_REPLICATED_PATH` and `WF_CLICKHOUSE_REPLICATED_CLUSTER`.
- Distributed Tables: Set `WF_CLICKHOUSE_USE_DISTRIBUTED_TABLES=true` for sharded ClickHouse clusters.
- Resource Limits: Use `WF_CLICKHOUSE_MAX_MEMORY_USAGE` and `WF_CLICKHOUSE_MAX_EXECUTION_TIME` to cap ClickHouse memory usage and query execution time.
- BYOB Rollout: Use `WF_FILE_STORAGE_PROJECT_RAMP_PCT` (0-100) for gradual rollout of BYOB storage.
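The replication note above implies a consistency check: when replication is switched on, the path and cluster variables must also be set. A minimal pre-flight sketch follows. `missing_replication_settings` is a hypothetical helper, and treating only a case-insensitive `true` as enabled is an assumption about how the server parses the flag.

```python
import os
from typing import Mapping


def missing_replication_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """List required replication variables that are unset or empty."""
    # Assumption: only a case-insensitive "true" enables replication.
    if env.get("WF_CLICKHOUSE_REPLICATED", "false").strip().lower() != "true":
        return []
    required = (
        "WF_CLICKHOUSE_REPLICATED_PATH",
        "WF_CLICKHOUSE_REPLICATED_CLUSTER",
    )
    return [name for name in required if not env.get(name)]
```

An empty result means either that replication is off or that both companion variables are present.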