Workflow: Helicone LLM Request Proxy Logging
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Observability, Proxy |
| Last Updated | 2026-02-14 06:00 GMT |
Overview
End-to-end process for intercepting LLM API requests through the Helicone proxy, forwarding them to any supported provider, and logging the full request/response lifecycle into the observability pipeline.
Description
This workflow describes the core data flow of the Helicone platform. A client application sends an LLM request to a Helicone proxy worker (deployed as a Cloudflare Worker) instead of directly to the LLM provider. The proxy transparently forwards the request, captures the response, and asynchronously logs the full interaction through a queue into the Jawn backend. Jawn processes the log by normalizing the provider format, calculating costs, storing request/response bodies in object storage, and inserting analytics data into ClickHouse. The logged data is then viewable through the Next.js web dashboard with filtering, search, and analytics capabilities.
The workflow supports all major LLM providers (OpenAI, Anthropic, Google, AWS Bedrock, Azure, and 25+ others) through a single proxy endpoint. It handles both streaming and non-streaming responses, caching, rate limiting, session tracking, and custom property tagging via Helicone-specific HTTP headers.
Usage
Execute this workflow whenever you need to observe, monitor, or debug LLM API calls from any application. This is the primary integration path for Helicone users: change the base URL of your LLM client to point to the Helicone proxy (or AI Gateway), add your Helicone API key, and all requests are automatically intercepted and logged without any other code changes.
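As a sketch of this integration path, the snippet below builds the client configuration for routing OpenAI-style requests through the Helicone proxy. The `heliconeProxyConfig` helper is illustrative (not part of any SDK); the base URL and header names follow Helicone's documented conventions, and the key values are placeholders.

```typescript
// Minimal sketch: point an OpenAI-compatible client at the Helicone proxy.
// Only the base URL and one extra header change; the rest of the client
// code stays as-is.

interface ProxyConfig {
  baseURL: string;
  headers: Record<string, string>;
}

function heliconeProxyConfig(
  heliconeApiKey: string,
  providerApiKey: string
): ProxyConfig {
  return {
    // Requests go to the Helicone proxy instead of api.openai.com
    baseURL: "https://oai.helicone.ai/v1",
    headers: {
      // Provider auth is passed through to the upstream provider unchanged
      Authorization: `Bearer ${providerApiKey}`,
      // Helicone auth enables interception and logging (see Step 1)
      "Helicone-Auth": `Bearer ${heliconeApiKey}`,
    },
  };
}

const cfg = heliconeProxyConfig("<helicone-key>", "<provider-key>");
```

Any OpenAI SDK that accepts a custom base URL and default headers can consume this configuration directly.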
Execution Steps
Step 1: Client Request Interception
The client application sends an LLM API request to a Helicone proxy endpoint instead of directly to the provider. The Cloudflare Worker receives the request, wraps it in a RequestWrapper that extracts Helicone-specific headers (authentication, target URL, session ID, custom properties, cache control, rate limit settings), and determines which router to use based on the worker type and hostname.
Key considerations:
- The proxy supports multiple worker types: OPENAI_PROXY, ANTHROPIC_PROXY, GATEWAY_API, AI_GATEWAY_API, and HELICONE_API
- Authentication is provided via the Helicone-Auth header with a Bearer token
- For the universal gateway, the target provider is specified via the Helicone-Target-Url header
- The AI Gateway uses model-based routing with automatic provider selection
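The header extraction performed by the RequestWrapper can be sketched as follows. The field names and the `HeliconeContext` shape are illustrative assumptions, not the actual worker types; the pattern of pulling known `Helicone-*` headers plus collecting `Helicone-Property-*` prefixed headers into a property map reflects the behavior described above.

```typescript
// Illustrative sketch of Helicone header extraction on an incoming request.
interface HeliconeContext {
  auth?: string;
  targetUrl?: string;
  sessionId?: string;
  properties: Record<string, string>;
}

function extractHeliconeHeaders(headers: Headers): HeliconeContext {
  const properties: Record<string, string> = {};
  headers.forEach((value, key) => {
    // Header names are case-insensitive; Headers normalizes to lowercase
    if (key.startsWith("helicone-property-")) {
      properties[key.slice("helicone-property-".length)] = value;
    }
  });
  return {
    auth: headers.get("helicone-auth") ?? undefined,
    targetUrl: headers.get("helicone-target-url") ?? undefined,
    sessionId: headers.get("helicone-session-id") ?? undefined,
    properties,
  };
}
```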
Step 2: Request Routing and Validation
The router factory selects the appropriate handler based on the worker type. The handler validates the Helicone API key against the database, checks rate limits if configured, and looks up cached responses if caching is enabled. If a valid cached response exists, it is returned immediately without forwarding to the provider.
Key considerations:
- Rate limiting uses bucket-based token tracking
- Cache keys are derived from the request body hash and configured cache parameters
- The AI Gateway routing resolves model names to specific provider endpoints using the model registry
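A hypothetical cache-key derivation along these lines is shown below. The exact key composition in Helicone may differ; the principle shown — a stable hash over the request body plus the configured cache parameters — is what the step describes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key derivation: hash of the request body combined
// with cache parameters, so different cache configurations (or an optional
// per-user seed) never collide.
function cacheKey(body: string, bucketMaxSize: number, seed = ""): string {
  return createHash("sha256")
    .update(seed)
    .update(String(bucketMaxSize))
    .update(body)
    .digest("hex");
}
```

Because the key is deterministic, two identical requests under the same cache settings resolve to the same stored response.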
Step 3: Provider Communication
The ProxyForwarder constructs the outbound request to the LLM provider. For the AI Gateway, this involves transforming the request format if the target provider uses a different API format (e.g., converting OpenAI format to Anthropic format). The request is forwarded to the provider, and the response is streamed back to the client in real time.
Key considerations:
- Streaming responses are processed chunk-by-chunk through body processors
- The transform layer handles cross-provider format conversion (OpenAI to Anthropic, OpenAI to Google, etc.)
- Response metadata (status code, headers, timing) is captured during forwarding
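The cross-provider transform can be illustrated with a simplified OpenAI-to-Anthropic conversion. The interfaces below capture only a fragment of each format, and the default `max_tokens` value is an assumption; the real transform layer handles many more fields.

```typescript
// Simplified cross-provider transform: OpenAI chat format to the shape of
// Anthropic's Messages API. Field handling is illustrative, not exhaustive.
interface OpenAIChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  max_tokens?: number;
}

interface AnthropicRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  max_tokens: number;
}

function toAnthropic(req: OpenAIChatRequest): AnthropicRequest {
  // Anthropic takes the system prompt as a top-level field, not a message
  const system = req.messages.find((m) => m.role === "system")?.content;
  return {
    model: req.model,
    system,
    messages: req.messages
      .filter((m) => m.role !== "system")
      .map((m) => ({
        role: m.role as "user" | "assistant",
        content: m.content,
      })),
    // Anthropic requires max_tokens; fall back to a default when absent
    max_tokens: req.max_tokens ?? 1024,
  };
}
```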
Step 4: Asynchronous Log Queuing
After the response is fully delivered to the client, the worker asynchronously queues a log message containing the full request/response data. The message is sent to an Upstash Redis queue (or Kafka in some deployments) with the complete request body, response body, timing metadata, Helicone headers, and authentication context.
Key considerations:
- Logging is asynchronous to avoid adding latency to the client response
- The message format includes request/response bodies, timing data, and authentication context
- Failed log deliveries are retried with exponential backoff
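The retry behavior can be sketched as below. The `enqueue` callback stands in for the actual Redis or Kafka producer call, and the attempt count and base delay are illustrative defaults, not Helicone's configured values.

```typescript
// Sketch of fire-and-forget log delivery with exponential backoff.
// Failure here never affects the response already sent to the client.
async function queueLogWithRetry(
  enqueue: (msg: object) => Promise<void>,
  msg: object,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await enqueue(msg);
      return true;
    } catch {
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  return false; // exhausted retries; surface to error monitoring
}
```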
Step 5: Backend Log Processing
The Jawn backend (Express.js server) consumes messages from the queue and processes each log entry. Processing involves: authenticating the organization, extracting token usage from the provider-specific response format using the appropriate UsageProcessor, normalizing the response using the LLM Mapper, calculating the request cost using the cost registry, and storing the full request/response bodies in MinIO (S3-compatible object storage).
Key considerations:
- Each provider has a dedicated UsageProcessor for extracting token counts from different response formats
- The LLM Mapper normalizes responses into a unified LlmSchema for consistent storage and display
- Cost calculation uses the model registry with tiered pricing and cache token multipliers
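A minimal cost calculation in this spirit is shown below. The `ModelPricing` shape, rates, and the single cache-read multiplier are simplifying assumptions; real registry entries carry tiered pricing and more token categories.

```typescript
// Hypothetical per-model pricing entry; real registry entries are richer.
interface ModelPricing {
  promptPerMTok: number; // USD per 1M prompt tokens
  completionPerMTok: number; // USD per 1M completion tokens
  cacheReadMultiplier: number; // e.g. 0.1 => cached reads cost 10% of prompt rate
}

function requestCostUSD(
  p: ModelPricing,
  promptTokens: number,
  cachedTokens: number,
  completionTokens: number
): number {
  // Cached prompt tokens are billed at a discounted rate
  const freshPrompt = promptTokens - cachedTokens;
  return (
    (freshPrompt * p.promptPerMTok +
      cachedTokens * p.promptPerMTok * p.cacheReadMultiplier +
      completionTokens * p.completionPerMTok) /
    1_000_000
  );
}
```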
Step 6: Analytics Storage
The processed log entry metadata is inserted into both PostgreSQL (for application data, user lookups, and API key management) and ClickHouse (for high-performance analytics queries). ClickHouse stores the denormalized request/response metadata table that powers the dashboard analytics, including model, provider, token counts, cost, latency, status, and custom properties.
Key considerations:
- ClickHouse uses a cost precision multiplier of 1 billion to avoid floating-point issues
- Custom properties are stored as JSONB and indexed for efficient filtering
- PostgreSQL handles relational data like organizations, API keys, and user accounts
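The cost precision multiplier mentioned above amounts to fixed-point storage: dollar amounts are scaled to integers before insertion and scaled back on read, so ClickHouse aggregations (sums over millions of rows) never accumulate floating-point drift. A sketch, with function names of our own choosing:

```typescript
// Costs are stored in ClickHouse as integers scaled by 1e9 (the cost
// precision multiplier), avoiding floating-point error in aggregations.
const COST_PRECISION_MULTIPLIER = 1_000_000_000;

function costToStored(usd: number): number {
  return Math.round(usd * COST_PRECISION_MULTIPLIER);
}

function storedToCost(stored: number): number {
  return stored / COST_PRECISION_MULTIPLIER;
}
```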
Step 7: Dashboard Visualization
Users access the Next.js web dashboard to view, filter, search, and analyze their logged LLM requests. The dashboard queries the Jawn API, which compiles user-defined filters into ClickHouse SQL using the filter system. Results are displayed with the LLM Mapper for human-readable formatting of request/response content across all provider formats.
Key considerations:
- The filter system supports complex queries across request metadata, properties, sessions, and custom fields
- Dashboard exports are available in Excel format
- The playground allows re-running requests with modified parameters
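A toy version of the filter-to-SQL compilation is sketched below. The column names, operators, and `Filter` union are illustrative only; Helicone's actual filter system supports a far richer tree of conditions. Note the use of parameter placeholders rather than string interpolation, which keeps the generated clause safe from SQL injection.

```typescript
// Toy compilation of user-defined filters into a ClickHouse WHERE clause.
// Column names and operators are illustrative, not the real schema.
type Filter =
  | { column: "model" | "status" | "provider"; op: "=" | "!="; value: string }
  | { column: "latency_ms" | "cost"; op: ">" | "<"; value: number };

function compileFilters(filters: Filter[]): {
  sql: string;
  params: (string | number)[];
} {
  // An empty filter set matches everything
  if (filters.length === 0) return { sql: "1 = 1", params: [] };
  const clauses = filters.map((f) => `${f.column} ${f.op} ?`);
  return { sql: clauses.join(" AND "), params: filters.map((f) => f.value) };
}
```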