Workflow: W&B Weave LLM Integration Tracing
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Observability, Integrations |
| Last Updated | 2026-02-14 11:00 GMT |
Overview
End-to-end process for automatically tracing LLM provider API calls (OpenAI, Anthropic, Google, Cohere, and 20+ others) through Weave's integration patching system.
Description
This workflow covers how Weave's integration system transparently intercepts calls to popular LLM provider SDKs and logs them as traced operations. The system uses a two-phase patching strategy: implicit patching (automatic via import hooks) and explicit patching (manual function calls). When patched, every API call to a supported provider is automatically captured with full request and response details, token usage statistics, latency measurements, and streaming chunk accumulation. No changes to existing application code are required beyond initializing Weave.
Usage
Execute this workflow when you are using one or more LLM provider SDKs (such as OpenAI, Anthropic, Google GenAI, Cohere, Mistral, Groq, AWS Bedrock, or others) and want to capture all API interactions without modifying your existing calling code. This is useful for debugging prompt behavior, monitoring token costs, analyzing latency, and auditing LLM usage in production.
Execution Steps
Step 1: Initialize Weave
Initialize the Weave client with a project name. By default, this enables implicit patching, which registers an import hook that intercepts future imports of supported LLM libraries and patches their API methods automatically.
Key considerations:
- Implicit patching is enabled by default; it can be disabled via the implicitly_patch_integrations setting
- If the LLM SDK is already imported before weave.init(), the system checks sys.modules and patches retroactively
- The import hook uses Python's sys.meta_path mechanism; it runs only at import time, so it adds no cost until a supported library is actually imported
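The two patching paths can be illustrated with a runnable toy. This is not Weave's actual implementation: `fake_sdk`, `PatchOnImport`, and `init_tracing` are stand-ins invented for this sketch, and the real hook would wrap a provider SDK's genuine loader rather than fabricating the module itself.

```python
# Toy sketch of the two patching paths: a sys.meta_path hook that patches
# a target module as it is imported, and a retroactive scan of sys.modules
# for modules imported before init. "fake_sdk" stands in for a provider SDK.
import importlib.abc
import importlib.machinery
import sys

CALL_LOG = []

def traced(fn):
    """Wrap an SDK method so every call is recorded before delegating."""
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CALL_LOG.append({"args": args, "kwargs": kwargs, "result": result})
        return result
    return wrapper

def patch_module(module):
    module.complete = traced(module.complete)

class PatchOnImport(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy finder/loader: builds a stand-in SDK module, then patches it.
    A real hook would delegate to the module's genuine loader instead."""
    def find_spec(self, name, path, target=None):
        if name == "fake_sdk":
            return importlib.machinery.ModuleSpec(name, self)
        return None  # all other imports proceed normally

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.complete = lambda prompt: f"echo:{prompt}"
        patch_module(module)  # patch immediately after load

def init_tracing():
    """Mimics init: patch already-imported targets, hook future imports."""
    if "fake_sdk" in sys.modules:             # retroactive path
        patch_module(sys.modules["fake_sdk"])
    sys.meta_path.insert(0, PatchOnImport())  # implicit path

init_tracing()
import fake_sdk          # the import hook fires here
fake_sdk.complete("hi")  # the wrapper records the call transparently
print(len(CALL_LOG))     # 1 call captured
```

The key property the sketch preserves is that the caller's code is unchanged: `fake_sdk.complete` is used exactly as the unpatched SDK would be.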
Step 2: Configure Integration Settings
Optionally configure integration-specific settings to control what gets traced. Each integration supports settings for enabling or disabling specific operations, controlling what metadata is captured, and customizing how calls are displayed in the UI.
Key considerations:
- Default settings trace all operations for all enabled integrations
- Explicit patching allows per-integration settings: weave.integrations.patch_openai(settings)
- Settings control whether to capture full request/response bodies, token usage, or just metadata
- Multiple integrations can be configured independently
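A minimal sketch of how settings might gate what a patched method records. The field names (`enabled`, `capture_io`, `capture_usage`) and the `IntegrationSettings` class are illustrative assumptions, not Weave's actual settings schema:

```python
# Hypothetical settings object gating what a patched call records.
from dataclasses import dataclass

@dataclass
class IntegrationSettings:
    enabled: bool = True        # trace this integration at all?
    capture_io: bool = True     # full request/response bodies
    capture_usage: bool = True  # token usage statistics

def patch_with_settings(fn, settings, log):
    """Wrap fn so each call is logged according to the settings."""
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        if not settings.enabled:
            return result
        record = {"op": fn.__name__}
        if settings.capture_io:
            record["inputs"] = kwargs
            record["output"] = result.get("text")
        if settings.capture_usage:
            record["usage"] = result.get("usage")
        log.append(record)
        return result
    return wrapper

def fake_completion(**kwargs):
    """Stand-in for a provider SDK method."""
    return {"text": "ok", "usage": {"total_tokens": 7}}

log = []
traced = patch_with_settings(
    fake_completion, IntegrationSettings(capture_io=False), log)
traced(prompt="hello")
print(log[0])  # {'op': 'fake_completion', 'usage': {'total_tokens': 7}}
```

With `capture_io=False`, the trace retains usage metadata but omits the request and response bodies, matching the metadata-only option described above.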
Step 3: Make LLM API Calls
Use the LLM provider SDK normally. The patched methods intercept each call, create a Weave trace record, execute the original API call, and log the result. Both synchronous and streaming calls are supported.
Key considerations:
- Standard (non-streaming) calls log the complete request and response as a single trace entry
- Streaming calls use an accumulator to collect chunks and log the final aggregated result
- Async variants are supported alongside their sync counterparts
- Tool-calling and function-calling patterns are captured with full argument details
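The streaming case can be sketched with a small accumulator. This is an illustrative toy, not Weave's implementation: chunks flow through to the caller unchanged, and one aggregated trace entry is emitted when the stream is exhausted.

```python
# Toy streaming accumulator: yield each chunk to the caller as it
# arrives, then log the fully accumulated result as a single entry.
def traced_stream(chunks, log):
    pieces = []
    for chunk in chunks:
        pieces.append(chunk)
        yield chunk                  # caller sees each chunk live
    log.append("".join(pieces))      # one trace entry for the whole response

log = []
stream = traced_stream(iter(["Hel", "lo ", "world"]), log)
consumed = "".join(stream)           # caller behavior is unchanged
print(consumed)                      # Hello world
print(log)                           # ['Hello world']
```

Real provider chunks are structured objects rather than strings, so the actual accumulators merge deltas field by field, but the wrap-yield-aggregate shape is the same.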
Step 4: Review Traced Calls
Examine the captured traces in the Weave UI. Each LLM call appears as a traced operation with the provider name, model identifier, input messages, output response, token usage breakdown, and latency. Calls made within traced functions appear as children in the call tree.
Key considerations:
- Token usage (prompt tokens, completion tokens, total tokens) is extracted from provider responses
- Custom display names show the model name for quick identification
- Streaming responses show the fully accumulated result, not individual chunks
- Integration traces nest seamlessly within @weave.op-decorated function call trees
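Token extraction can be sketched as a small normalizer. The field names below mirror OpenAI-style chat responses as an assumed example; other providers use different schemas, which the real integrations map individually:

```python
# Hedged sketch: normalize a provider-style usage block into the
# prompt/completion/total breakdown shown in the trace UI.
def extract_usage(response: dict) -> dict:
    usage = response.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # fall back to summing if the provider omits the total
        "total_tokens": usage.get("total_tokens", prompt + completion),
    }

resp = {
    "model": "gpt-4o",
    "usage": {"prompt_tokens": 12, "completion_tokens": 30,
              "total_tokens": 42},
}
print(extract_usage(resp))
# {'prompt_tokens': 12, 'completion_tokens': 30, 'total_tokens': 42}
```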
Step 5: Undo Patching
Optionally remove the integration patches when they are no longer needed. Each patcher provides an undo_patch() method that restores the original SDK methods. This is primarily used in testing contexts.
Key considerations:
- In production, patching typically remains active for the lifetime of the process
- In tests, use fixture-based setup and teardown for clean isolation
- Undoing patches restores the original SDK behavior completely
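The patch/undo lifecycle can be sketched with a minimal patcher. The class and method names here (`Patcher`, `attempt_patch`, `FakeClient`) are illustrative stand-ins chosen for this sketch, not Weave's actual classes:

```python
# Minimal patcher sketch: remember the original attribute, replace it
# with a wrapped version, and restore it on undo.
class Patcher:
    def __init__(self, owner, attr, wrap):
        self.owner, self.attr, self.wrap = owner, attr, wrap
        self.original = None

    def attempt_patch(self):
        self.original = getattr(self.owner, self.attr)
        setattr(self.owner, self.attr, self.wrap(self.original))

    def undo_patch(self):
        if self.original is not None:
            setattr(self.owner, self.attr, self.original)  # full restore
            self.original = None

class FakeClient:
    """Stand-in for a provider SDK client."""
    def complete(self, prompt):
        return prompt.upper()

calls = []
def record(fn):
    def wrapper(self, prompt):
        calls.append(prompt)
        return fn(self, prompt)
    return wrapper

patcher = Patcher(FakeClient, "complete", record)
patcher.attempt_patch()
FakeClient().complete("hi")    # traced
patcher.undo_patch()
FakeClient().complete("bye")   # original behavior, nothing recorded
print(calls)                   # ['hi']
```

In a test suite, `attempt_patch()` and `undo_patch()` map naturally onto fixture setup and teardown, giving each test a cleanly patched or unpatched SDK.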