Implementation:Truera Trulens Feedback Tool Selection
| Knowledge Sources | |
|---|---|
| Domains | Agent_Evaluation, LLM_Evaluation |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
A concrete feedback function for evaluating the quality of tool selection in agentic LLM traces using rubric-based LLM judging, provided by the trulens-feedback library.
Description
The tool_selection_with_cot_reasons method on LLMProvider evaluates the quality of tool selection decisions in an agent's execution trace. It uses a rubric-based prompt to instruct an LLM judge to analyze the full trace and rate how well the agent selected tools.
The method supports trace compression to reduce token usage — large traces are compressed to preserve essential information while removing redundant data.
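The library's compression internals are not reproduced here; the sketch below only illustrates the general idea of bounding a trace's size while keeping identifying fields. All names in it (compress_trace, max_chars, the span keys) are invented for illustration and are not trulens API.

```python
# Illustrative only: a toy trace compressor in the spirit described above.
# This is NOT the trulens implementation; every name here is invented.

def compress_trace(spans: list[dict], max_chars: int = 200) -> list[dict]:
    """Keep essential fields of each span; truncate oversized payloads."""
    compressed = []
    for span in spans:
        # Preserve identifying fields so the judge can follow the trace.
        kept = {k: span[k] for k in ("name", "tool", "status") if k in span}
        payload = str(span.get("output", ""))
        # Truncate long outputs so the LLM judge sees a bounded trace.
        kept["output"] = (
            payload if len(payload) <= max_chars
            else payload[:max_chars] + "...[truncated]"
        )
        compressed.append(kept)
    return compressed

spans = [
    {"name": "call_search", "tool": "web_search", "status": "ok", "output": "x" * 1000},
    {"name": "final_answer", "status": "ok", "output": "Paris"},
]
small = compress_trace(spans)
```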
Usage
Use this feedback method when evaluating LangGraph agents or other agentic workflows. Bind it to a trace-level Selector to pass the full execution trace for evaluation. It is the primary component of the Agent GPA metric suite.
Code Reference
Source Location
- Repository: trulens
- File: src/feedback/trulens/feedback/llm_provider.py
- Lines: L3523-3606
Signature
```python
class LLMProvider:
    def tool_selection_with_cot_reasons(
        self,
        trace: Union[Trace, str],
        criteria: Optional[str] = None,
        additional_instructions: Optional[str] = None,
        examples: Optional[List[Tuple[Dict[str, str], int]]] = None,
        min_score_val: int = 0,
        max_score_val: int = 3,
        temperature: float = 0.0,
        enable_trace_compression: bool = True,
        **kwargs,
    ) -> Tuple[float, Dict]:
        """Evaluate tool selection quality in an agentic trace.

        Args:
            trace: The trace to evaluate (Trace object or JSON string).
            criteria: Custom evaluation criteria (overrides the default rubric).
            additional_instructions: Extra instructions for the judge.
            examples: Few-shot examples for evaluation.
            min_score_val: Minimum score (default: 0).
            max_score_val: Maximum score (default: 3).
            temperature: LLM temperature (default: 0.0).
            enable_trace_compression: Compress the trace to reduce tokens (default: True).

        Returns:
            Tuple of (normalized_score, reasoning_dict) where the score is 0.0-1.0.
        """
```
Import
```python
from trulens.providers.openai import OpenAI

provider = OpenAI()
# The method is accessed as: provider.tool_selection_with_cot_reasons
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| trace | Union[Trace, str] | Yes | Full agent execution trace (Trace object or JSON string) |
| criteria | str | No | Custom evaluation criteria |
| additional_instructions | str | No | Extra instructions for the LLM judge |
| enable_trace_compression | bool | No | Compress trace data (default: True) |
| min_score_val | int | No | Minimum score value (default: 0) |
| max_score_val | int | No | Maximum score value (default: 3) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Tuple[float, Dict] | (normalized_score between 0.0-1.0, dict with reasoning/evidence) |
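The returned score is normalized from the rubric's raw range (min_score_val to max_score_val, 0 to 3 by default) onto 0.0-1.0. A minimal sketch of that mapping, assuming a simple linear normalization (the helper name is illustrative, not trulens API):

```python
# Illustrative sketch of the score normalization implied by the I/O contract:
# a raw rubric score in [min_score_val, max_score_val] maps linearly to [0.0, 1.0].

def normalize_score(raw: int, min_score_val: int = 0, max_score_val: int = 3) -> float:
    """Linearly map a raw rubric score onto the 0.0-1.0 range."""
    return (raw - min_score_val) / (max_score_val - min_score_val)

# With the default 0-3 rubric, a raw score of 2 normalizes to ~0.667.
```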
Usage Examples
Agent Tool Selection Metric
```python
from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Define the tool selection metric with a trace-level selector.
f_tool_selection = Feedback(
    provider.tool_selection_with_cot_reasons,
    name="tool_selection",
).on(**{
    "trace": Selector(trace_level=True),
})
```
Full Agent GPA Metrics
```python
from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Tool Selection (trace-level)
f_tool_selection = Feedback(
    provider.tool_selection_with_cot_reasons,
    name="tool_selection",
).on(trace=Selector(trace_level=True))

# Answer Relevance (input-output)
f_relevance = Feedback(
    provider.relevance,
    name="answer_relevance",
).on_input_output()

# Groundedness (context-output)
f_groundedness = Feedback(
    provider.groundedness_measure_with_cot_reasons,
    name="groundedness",
).on(
    source=Selector.select_context(collect_list=True),
    statement=Selector.select_record_output(),
)
```
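TruLens records each feedback result separately. If you want a single headline number across the Agent GPA metrics, a plain average of the normalized scores is one option; the aggregation below is an illustrative convention, not part of the trulens API.

```python
# Illustrative aggregation of per-metric feedback scores into one number.
# trulens reports each metric separately; this unweighted mean is just a convention.

def agent_gpa(scores: dict[str, float]) -> float:
    """Average the normalized (0.0-1.0) scores of all configured metrics."""
    return sum(scores.values()) / len(scores)

scores = {"tool_selection": 1.0, "answer_relevance": 0.8, "groundedness": 0.9}
gpa = agent_gpa(scores)  # ~0.9
```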
Related Pages
Implements Principle
Requires Environment
- Environment:Truera_Trulens_Python_Core_Environment
- Environment:Truera_Trulens_OpenAI_Provider_Environment