Workflow:Truera Trulens Custom App Instrumentation And Evaluation

Knowledge Sources	TruLens, TruLens Instrumentation Guide, OTEL Semantic Conventions
Domains	LLM_Ops, Evaluation, Instrumentation
Last Updated	2026-02-14 08:00 GMT

Overview

End-to-end process for instrumenting a custom Python LLM application with TruLens OTEL decorators, wrapping it with TruApp, and evaluating it with feedback functions.

Description

This workflow covers how to add observability to a custom Python class that does not use a supported framework (LangChain, LlamaIndex, LangGraph). It uses the @instrument decorator from TruLens to annotate methods with OpenTelemetry span types and attributes. The instrumented class is then wrapped with TruApp, which records traces during execution. Feedback functions are configured using Selectors that point to the decorated span attributes, enabling evaluation of retrieval quality, generation quality, and custom metrics.

Usage

Execute this workflow when you have a custom Python application (not built on LangChain, LlamaIndex, or LangGraph) that performs LLM operations such as retrieval and generation, and you want to add tracing and evaluation. This is the recommended path for any bespoke RAG pipeline, agent, or LLM wrapper class where you control the source code and need fine-grained control over which methods produce trace spans.

Execution Steps

Step 1: Initialize TruLens Session

Create a TruSession instance to manage database connections and trace collection. This must be done before any instrumented code executes so that the OTEL trace provider is properly configured.

Key considerations:

TruSession is a singleton per process
Configure a database connector for production use (for example PostgreSQL or Snowflake)
The session sets up the OTEL BatchSpanProcessor automatically
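
A minimal sketch of Step 1, assuming TruLens 1.x import paths. The environment variable TRULENS_DATABASE_URL and both helper names are hypothetical, chosen only for illustration; the DefaultDBConnector import path and its database_url keyword are assumptions that should be checked against the installed version.

```python
import os


def database_url_from_env(env=None):
    """Return the database URL, or None to fall back to the TruLens default.

    TRULENS_DATABASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("TRULENS_DATABASE_URL") or None


def make_session(database_url=None):
    """Create the TruSession; call once, before any instrumented code runs."""
    # Imported lazily so this module still loads where TruLens is not installed.
    from trulens.core import TruSession

    if database_url is None:
        return TruSession()  # default connector (local SQLite file)

    # Assumed import path and keyword, based on TruLens 1.x; check your version.
    from trulens.core.database.connector.default import DefaultDBConnector

    return TruSession(connector=DefaultDBConnector(database_url=database_url))
```

A typical call would be session = make_session(database_url_from_env()) at program start. Some older releases reportedly also required setting TRULENS_OTEL_TRACING=1 in the environment before creating the session; treat that as version-dependent.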

Step 2: Instrument Application Methods

Add the @instrument decorator to key methods in your custom class. Each decorator specifies a span type (RETRIEVAL, GENERATION, RECORD_ROOT, TOOL, AGENT, etc.) and maps method parameters and return values to span attributes using the SpanAttributes semantic conventions.

What happens:

The RECORD_ROOT span type marks the application entry point and captures overall input/output
RETRIEVAL spans capture query text and retrieved contexts
GENERATION spans capture LLM completion calls
Span attributes enable Selectors to extract data for feedback evaluation

Key considerations:

Every instrumented app must have exactly one RECORD_ROOT span for on_input()/on_output() shortcuts to work
In the attributes mapping, each key is a span attribute name and each value is the name of a method parameter, given as a string
Use "return" as the value to capture the method's return value instead of a parameter
Nested method calls produce parent-child span relationships automatically
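
The decorator usage described above can be sketched as follows. SimpleRAG, its keyword-overlap retrieval, and the canned generate method are hypothetical stand-ins for a real pipeline. The import paths are assumed from TruLens 1.x. The except branch defines placeholder names so the sketch also runs where TruLens is absent; its string values are not the real semantic-convention values and no spans are produced in that case.

```python
try:
    from trulens.core.otel.instrument import instrument
    from trulens.otel.semconv.trace import SpanAttributes
except ImportError:
    # Placeholder stand-ins (not the real TruLens values): no tracing happens.
    from types import SimpleNamespace as NS

    SpanAttributes = NS(
        SpanType=NS(RECORD_ROOT="record_root", RETRIEVAL="retrieval",
                    GENERATION="generation"),
        RECORD_ROOT=NS(INPUT="input", OUTPUT="output"),
        RETRIEVAL=NS(QUERY_TEXT="query_text",
                     RETRIEVED_CONTEXTS="retrieved_contexts"),
    )

    def instrument(**_kwargs):
        return lambda fn: fn


class SimpleRAG:
    """Hypothetical custom app: keyword retrieval plus a canned 'generation'."""

    def __init__(self, documents):
        self.documents = list(documents)

    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            # span attribute name -> parameter name (or "return")
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    def retrieve(self, query: str) -> list:
        words = set(query.lower().split())
        return [d for d in self.documents if words & set(d.lower().split())]

    @instrument(span_type=SpanAttributes.SpanType.GENERATION)
    def generate(self, query: str, contexts: list) -> str:
        # A real app would call an LLM here.
        if not contexts:
            return "I don't know."
        return f"Based on {len(contexts)} document(s): {contexts[0]}"

    @instrument(
        span_type=SpanAttributes.SpanType.RECORD_ROOT,
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        },
    )
    def query(self, query: str) -> str:
        # Calls made here become child spans of the RECORD_ROOT span.
        return self.generate(query, self.retrieve(query))
```

Only query carries RECORD_ROOT, so each call to it yields one root span with retrieve and generate nested beneath it.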

Step 3: Configure Feedback Provider and Functions

Set up a feedback provider and define feedback functions with explicit Selectors that point to the span types and attributes you decorated in Step 2. Since this is a custom app, you have full control over which spans and attributes are selected for evaluation.

Key considerations:

Use Selector objects with span_type and span_attribute to target specific decorated methods
For standard patterns, on_input()/on_output() shortcuts work if RECORD_ROOT is properly defined
Custom feedback functions can be any callable that returns a float between 0.0 and 1.0
Aggregation methods (np.mean, np.min) control how per-chunk scores are combined
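
As an illustration of these points, the sketch below defines one custom feedback function and a builder that targets the RETRIEVAL and RECORD_ROOT spans. The conciseness metric, MAX_WORDS, and build_feedbacks are hypothetical names. The Feedback and Selector calls follow the TruLens 1.x OTEL quickstart as best understood here, and the provider is assumed to expose context_relevance and relevance methods; verify both against the installed version.

```python
MAX_WORDS = 50  # hypothetical threshold for the custom metric below


def conciseness(output: str) -> float:
    """Custom feedback: 1.0 up to MAX_WORDS, falling linearly to 0.0 at 2x."""
    n = len(output.split())
    if n <= MAX_WORDS:
        return 1.0
    return max(0.0, 1.0 - (n - MAX_WORDS) / MAX_WORDS)


def build_feedbacks(provider):
    """Build feedback functions bound to the spans decorated in Step 2."""
    # Lazy imports: assumed TruLens 1.x paths.
    import numpy as np
    from trulens.core import Feedback
    from trulens.core.feedback.selector import Selector
    from trulens.otel.semconv.trace import SpanAttributes

    retrieval = SpanAttributes.SpanType.RETRIEVAL
    f_context_relevance = (
        Feedback(provider.context_relevance, name="Context Relevance")
        .on({"question": Selector(
            span_type=retrieval,
            span_attribute=SpanAttributes.RETRIEVAL.QUERY_TEXT,
        )})
        .on({"context": Selector(
            span_type=retrieval,
            span_attribute=SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS,
            collect_list=False,  # score each retrieved chunk separately
        )})
        .aggregate(np.mean)  # combine per-chunk scores into one value
    )
    # Shortcuts that rely on the RECORD_ROOT input/output attributes.
    f_answer_relevance = (
        Feedback(provider.relevance, name="Answer Relevance")
        .on_input()
        .on_output()
    )
    f_conciseness = Feedback(conciseness, name="Conciseness").on(
        {"output": Selector.select_record_output()}
    )
    return [f_context_relevance, f_answer_relevance, f_conciseness]
```

The provider argument would typically be an LLM-backed provider instance such as the OpenAI provider from trulens.providers.openai, which needs its own package and API credentials.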

Step 4: Wrap Application With TruApp

Wrap the custom class instance with TruApp, passing the app_name, app_version, and list of feedback functions. TruApp works with any Python object whose methods have been decorated with @instrument.

Key considerations:

Pass the class instance (not the class itself) to TruApp
app_name and app_version enable experiment tracking and comparison
TruApp does not auto-instrument; only @instrument-decorated methods produce spans
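
A small sketch of the wrapping step, assuming the TruApp import path from TruLens 1.x. The helper wrap_with_truapp is hypothetical; it exists only to make the instance-versus-class rule explicit before handing the object to TruApp.

```python
import inspect


def wrap_with_truapp(app, app_name, app_version, feedbacks=()):
    """Wrap an instrumented app instance with TruApp."""
    if inspect.isclass(app):
        # A common mistake: passing MyRAG instead of MyRAG().
        raise TypeError("pass an instance of the app, not the class itself")

    # Lazy import: assumed TruLens 1.x path.
    from trulens.apps.app import TruApp

    return TruApp(
        app,
        app_name=app_name,        # groups runs of the same application
        app_version=app_version,  # distinguishes experiments for comparison
        feedbacks=list(feedbacks),
    )
```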

Step 5: Record and Evaluate

Execute the application within the TruApp recording context. Each invocation produces a complete trace with the spans defined by your @instrument decorators. Feedback functions evaluate the trace asynchronously.

Key considerations:

Use the TruApp object as a context manager (with tru_app as recording:) and call the application inside the block
Both sync and async methods are supported
Traces and feedback results are persisted to the configured database
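
The recording pattern can be sketched as below. run_recorded is a hypothetical helper, and app.query stands for whichever method carries the RECORD_ROOT decorator in your class; the function relies only on tru_app behaving as a context manager.

```python
def run_recorded(tru_app, app, questions):
    """Invoke the app once per question inside a single recording context."""
    answers = []
    with tru_app as recording:
        for question in questions:
            # Each call to the RECORD_ROOT method produces one trace.
            answers.append(app.query(question))
    # The recording handle is returned so results can be fetched later.
    return recording, answers
```

Calls made outside the with block run normally but are not recorded.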

Step 6: Review Results

Retrieve feedback results programmatically or launch the TruLens dashboard. The trace viewer shows the span hierarchy matching your @instrument structure, and feedback scores are displayed alongside each record.

Key considerations:

Use retrieve_feedback_results() to wait for evaluation completion
The dashboard span tree reflects your custom instrumentation hierarchy
Compare versions to measure the impact of code changes
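
A hedged sketch of the review step. review_results is a hypothetical helper. It assumes retrieve_feedback_results() is available on the recording object returned by the context manager, that the session exposes get_leaderboard(), and that run_dashboard can be imported from trulens.dashboard; all three should be confirmed for the installed TruLens version.

```python
def review_results(session, recording=None, launch_dashboard=False):
    """Wait for pending feedback, then return the per-version summary."""
    if recording is not None:
        # Assumed to block until feedback for this recording has finished.
        recording.retrieve_feedback_results()

    # Aggregated scores, expected to be grouped by app name and version.
    leaderboard = session.get_leaderboard()

    if launch_dashboard:
        # Lazy import: the dashboard is a separate, optional package.
        from trulens.dashboard import run_dashboard

        run_dashboard(session)

    return leaderboard
```

Running the same questions against two app_version values and comparing their rows in the returned summary is one way to measure the effect of a code change.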
