Workflow:Truera Trulens Custom App Instrumentation And Evaluation
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Evaluation, Instrumentation |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
End-to-end process for instrumenting a custom Python LLM application with TruLens OTEL decorators, wrapping it with TruApp, and evaluating it with feedback functions.
Description
This workflow covers how to add observability to a custom Python class that does not use a supported framework (LangChain, LlamaIndex, LangGraph). It uses the @instrument decorator from TruLens to annotate methods with OpenTelemetry span types and attributes. The instrumented class is then wrapped with TruApp, which records traces during execution. Feedback functions are configured using Selectors that point to the decorated span attributes, enabling evaluation of retrieval quality, generation quality, and custom metrics.
Usage
Execute this workflow when you have a custom Python application (not built on LangChain, LlamaIndex, or LangGraph) that performs LLM operations such as retrieval and generation, and you want to add tracing and evaluation. This is the recommended path for any bespoke RAG pipeline, agent, or LLM wrapper class where you control the source code and need fine-grained control over which methods produce trace spans.
Execution Steps
Step 1: Initialize TruLens Session
Create a TruSession instance to manage database connections and trace collection. This must be done before any instrumented code executes so that the OTEL trace provider is properly configured.
Key considerations:
- TruSession is a singleton per process
- For production use, configure a database connector (for example PostgreSQL or Snowflake)
- The session sets up the OTEL BatchSpanProcessor automatically
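A minimal sketch of session setup, assuming the `trulens-core` package is installed. The environment variable `TRULENS_DATABASE_URL` and both helper names are hypothetical and exist only for this illustration. The `database_url` argument and the local SQLite fallback reflect TruLens behaviour as understood here and should be checked against the installed version.

```python
import os

# Assumption: without an explicit URL, TruLens uses a local SQLite file.
# The value is spelled out here so the choice is visible.
DEFAULT_SQLITE_URL = "sqlite:///default.sqlite"


def resolve_database_url(env=None):
    """Pick a SQLAlchemy-style URL from the environment (hypothetical helper).

    TRULENS_DATABASE_URL is an illustrative variable name, not one that
    TruLens reads on its own.
    """
    env = os.environ if env is None else env
    return env.get("TRULENS_DATABASE_URL") or DEFAULT_SQLITE_URL


def make_session(env=None):
    """Create the TruSession; call this before any instrumented code runs."""
    # Imported lazily so this module still loads where TruLens is absent.
    from trulens.core import TruSession

    return TruSession(database_url=resolve_database_url(env))
```

Calling `make_session()` once at process start-up keeps the singleton behaviour explicit: later code can reuse the returned object instead of constructing a new session.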
Step 2: Instrument Application Methods
Add the @instrument decorator to key methods in your custom class. Each decorator specifies a span type (RETRIEVAL, GENERATION, RECORD_ROOT, TOOL, AGENT, etc.) and maps method parameters and return values to span attributes using the SpanAttributes semantic conventions.
What happens:
- The RECORD_ROOT span type marks the application entry point and captures overall input/output
- RETRIEVAL spans capture query text and retrieved contexts
- GENERATION spans capture LLM completion calls
- Span attributes enable Selectors to extract data for feedback evaluation
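The span types above might be applied to a small RAG class as follows. This is a sketch, assuming the import paths `trulens.core.otel.instrument` and `trulens.otel.semconv.trace` used by recent TruLens releases. The `RAG` class, its in-memory corpus, and the string-formatting stand-in for an LLM call are hypothetical placeholders, so no model or vector store is needed.

```python
def build_rag_app(corpus):
    """Return an instance of an instrumented toy RAG class.

    The class is defined inside a function only to keep the TruLens imports
    lazy; in an application the decorators would sit on a module-level class.
    """
    from trulens.core.otel.instrument import instrument
    from trulens.otel.semconv.trace import SpanAttributes

    class RAG:
        def __init__(self, docs):
            self.docs = list(docs)

        @instrument(
            span_type=SpanAttributes.SpanType.RETRIEVAL,
            attributes={
                # span attribute -> parameter name, or "return"
                SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
                SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
            },
        )
        def retrieve(self, query: str) -> list:
            # Placeholder retrieval: keep documents sharing a word with the query.
            words = set(query.lower().split())
            return [d for d in self.docs if words & set(d.lower().split())]

        @instrument(span_type=SpanAttributes.SpanType.GENERATION)
        def generate(self, query: str, contexts: list) -> str:
            # Placeholder for an LLM completion call.
            return f"Answer to {query!r} based on {len(contexts)} context(s)."

        @instrument(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            attributes={
                SpanAttributes.RECORD_ROOT.INPUT: "query",
                SpanAttributes.RECORD_ROOT.OUTPUT: "return",
            },
        )
        def query(self, query: str) -> str:
            # Entry point: the nested calls become child spans of this one.
            return self.generate(query, self.retrieve(query))

    return RAG(corpus)
```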
Key considerations:
- Every instrumented app must have exactly one RECORD_ROOT span for on_input()/on_output() shortcuts to work
- Map a method parameter to a span attribute by giving the parameter's name as a string
- Use "return" as the attribute value to capture the method return value
- Nested method calls produce parent-child span relationships automatically
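To make the mapping rules concrete without depending on TruLens, the following is a toy stand-in written only with the standard library. It is not TruLens's implementation and emits no OpenTelemetry data; `toy_instrument`, `SPANS`, and the attribute names are hypothetical. It shows how a parameter name given as a string, or the literal "return", can be resolved to a value at call time, and how nested calls yield parent-child links.

```python
import functools
import inspect

SPANS = []    # finished spans, stored as plain dicts
_STACK = []   # names of the spans that are currently open


def toy_instrument(span_type, attributes=None):
    """Toy decorator: resolve {attribute: parameter-name-or-"return"}."""
    attributes = attributes or {}

    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # The innermost open span, if any, is this span's parent.
            parent = _STACK[-1] if _STACK else None
            _STACK.append(fn.__name__)
            try:
                result = fn(*args, **kwargs)
            finally:
                _STACK.pop()
            # "return" selects the return value; any other string is
            # looked up among the bound call arguments.
            values = {
                attr: result if source == "return" else bound.arguments[source]
                for attr, source in attributes.items()
            }
            SPANS.append(
                {"name": fn.__name__, "type": span_type,
                 "parent": parent, "attributes": values}
            )
            return result

        return wrapper

    return decorator


@toy_instrument("RETRIEVAL", {"query_text": "query", "contexts": "return"})
def retrieve(query):
    return [f"doc about {query}"]


@toy_instrument("RECORD_ROOT", {"input": "query", "output": "return"})
def answer(query):
    return f"{len(retrieve(query))} context(s) found"


answer("otel")
```

After the call, `SPANS` holds the `retrieve` span first, because it finishes first, with `answer` recorded as its parent; the `answer` span has no parent.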
Step 3: Configure Feedback Provider and Functions
Set up a feedback provider and define feedback functions with explicit Selectors that point to the span types and attributes you decorated in Step 2. Since this is a custom app, you have full control over which spans and attributes are selected for evaluation.
Key considerations:
- Use Selector objects with span_type and span_attribute to target specific decorated methods
- For standard patterns, the on_input()/on_output() shortcuts work as long as the RECORD_ROOT span maps its input and output attributes
- A custom feedback function can be any callable that returns a float between 0.0 and 1.0
- Aggregation methods (np.mean, np.min) control how per-chunk scores are combined
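The considerations above can be sketched with a custom metric. `keyword_overlap`, `score_chunks`, and `build_overlap_feedback` are hypothetical names. The first two functions use only the standard library (with `statistics.mean` and `min` standing in for `np.mean` and `np.min`) to show per-chunk scoring and aggregation. The third assumes the `Feedback` and `Selector` classes of recent TruLens releases, including the dictionary form of `.on()` and the `collect_list` flag; verify these against the installed version.

```python
from statistics import mean


def keyword_overlap(question: str, context: str) -> float:
    """Hypothetical custom metric: share of question words found in context."""
    q_words = set(question.lower().split())
    if not q_words:
        return 0.0
    c_words = set(context.lower().split())
    return len(q_words & c_words) / len(q_words)


def score_chunks(question, contexts, aggregate=mean):
    """Score each chunk separately, then combine the per-chunk scores."""
    scores = [keyword_overlap(question, c) for c in contexts]
    return aggregate(scores) if scores else 0.0


def build_overlap_feedback():
    """Point the metric at RETRIEVAL span attributes (needs TruLens, NumPy)."""
    import numpy as np
    from trulens.core import Feedback
    from trulens.core.feedback.selector import Selector
    from trulens.otel.semconv.trace import SpanAttributes

    return (
        Feedback(keyword_overlap, name="Keyword Overlap")
        .on({
            "question": Selector(
                span_type=SpanAttributes.SpanType.RETRIEVAL,
                span_attribute=SpanAttributes.RETRIEVAL.QUERY_TEXT,
            )
        })
        .on({
            "context": Selector(
                span_type=SpanAttributes.SpanType.RETRIEVAL,
                span_attribute=SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS,
                collect_list=False,  # evaluate one retrieved chunk at a time
            )
        })
        .aggregate(np.mean)
    )
```

Choosing `min` instead of `mean` makes the combined score reflect the weakest chunk, which is stricter when a single irrelevant context is considered a failure.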
Step 4: Wrap Application With TruApp
Wrap the custom class instance with TruApp, passing the app_name, app_version, and list of feedback functions. TruApp works with any Python object whose methods have been decorated with @instrument.
Key considerations:
- Pass the class instance (not the class itself) to TruApp
- app_name and app_version enable experiment tracking and comparison
- TruApp does not auto-instrument; only @instrument-decorated methods produce spans
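A small wrapper sketch for this step, assuming `TruApp` is importable from `trulens.apps.app` as in recent releases. The helper name `wrap_app` and the default `app_name` and `app_version` values are hypothetical. The class check is an illustrative guard for the instance-versus-class consideration, not something TruLens is known to enforce.

```python
import inspect


def wrap_app(app, feedbacks, app_name="custom_rag", app_version="v1"):
    """Wrap an instrumented object with TruApp (names are illustrative)."""
    if inspect.isclass(app):
        # Guard for a common slip: TruApp needs RAG(), not RAG.
        raise TypeError(f"expected an instance, got the class {app.__name__}")
    from trulens.apps.app import TruApp  # lazy import; needs TruLens installed

    return TruApp(
        app,
        app_name=app_name,
        app_version=app_version,
        feedbacks=list(feedbacks),
    )
```

Bumping `app_version` for each code change (for example "v1", then "v2") keeps the runs separable when results are compared later.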
Step 5: Record and Evaluate
Execute the application within the TruApp recording context. Each invocation produces a complete trace with the spans defined by your @instrument decorators. Feedback functions evaluate the trace asynchronously.
Key considerations:
- Use the context manager (with tru_app as recording:) and invoke the application inside the block
- Both sync and async methods are supported
- Traces and feedback results are persisted to the configured database
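The recording pattern can be sketched as a small helper. `record_queries` is a hypothetical name, and `query` is assumed to be the method decorated as RECORD_ROOT on the application instance. The helper relies only on the wrapper behaving as a context manager, so it does not import TruLens itself.

```python
def record_queries(tru_app, app, questions):
    """Run each question inside the recording context and return the answers.

    `tru_app` is the TruApp wrapper; `app` is the same instrumented instance
    that was passed to it. Calls go to `app`, not to the wrapper.
    """
    answers = []
    with tru_app as recording:  # spans emitted inside this block are recorded
        for question in questions:
            answers.append(app.query(question))
    return answers, recording
```

Returning `recording` alongside the answers lets the caller inspect or wait on the recorded results after the block has exited.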
Step 6: Review Results
Retrieve feedback results programmatically or launch the TruLens dashboard. The trace viewer shows the span hierarchy matching your @instrument structure, and feedback scores are displayed alongside each record.
Key considerations:
- Use retrieve_feedback_results() to wait for evaluation completion
- The dashboard span tree reflects your custom instrumentation hierarchy
- Compare versions to measure the impact of code changes
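A sketch of the review step, assuming `TruSession.get_leaderboard()` and `run_dashboard` from `trulens.dashboard` as found in recent TruLens releases. The helper name `review` and its `launch_dashboard` flag are hypothetical.

```python
def review(session, launch_dashboard=False):
    """Return the leaderboard; optionally start the dashboard as well."""
    # Aggregated feedback scores, grouped by app name and version.
    leaderboard = session.get_leaderboard()
    if launch_dashboard:
        # Lazy import: the dashboard ships as a separate package.
        from trulens.dashboard import run_dashboard

        run_dashboard(session)
    return leaderboard
```

Comparing leaderboard rows for two `app_version` values is one way to measure the effect of a code change between runs.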