
Implementation:Truera Trulens Feedback Tool Selection

From Leeroopedia
Knowledge Sources
Domains Agent_Evaluation, LLM_Evaluation
Last Updated 2026-02-14 08:00 GMT

Overview

Concrete tool for evaluating the quality of tool selection in agentic LLM traces using rubric-based LLM judging, provided by the trulens-feedback library.

Description

The tool_selection_with_cot_reasons method on LLMProvider evaluates the quality of tool selection decisions in an agent's execution trace. It uses a rubric-based prompt to instruct an LLM judge to analyze the full trace and rate how well the agent selected tools.

The method supports trace compression to reduce token usage — large traces are compressed to preserve essential information while removing redundant data.
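The library's compression internals are not reproduced on this page, but the idea can be sketched. The following is an illustrative sketch only: the function name `compress_trace`, the span-dict shape, and the pruned field names are assumptions for illustration, not the trulens-feedback implementation.

```python
# Illustrative sketch of trace compression: keep essential span fields,
# drop bulky payloads before sending the trace to the LLM judge.
# The field names below are hypothetical, not trulens internals.
BULKY_KEYS = {"raw_response", "embeddings", "http_headers"}

def compress_trace(spans: list[dict]) -> list[dict]:
    """Preserve essential span information while removing redundant data."""
    compressed = []
    for span in spans:
        kept = {k: v for k, v in span.items() if k not in BULKY_KEYS}
        compressed.append(kept)
    return compressed

spans = [
    {
        "name": "web_search",
        "input": "capital of France",
        "output": "Paris",
        "raw_response": "x" * 10_000,  # bulky payload the judge does not need
    },
]
small = compress_trace(spans)
```

The compressed trace keeps the tool name and its input/output, which is what a rubric judge needs to rate tool selection, while discarding the large raw payload.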

Usage

Use this feedback method when evaluating LangGraph agents or other agentic workflows. Bind it to a trace-level Selector to pass the full execution trace for evaluation. It is the primary component of the Agent GPA metric suite.

Code Reference

Source Location

  • Repository: trulens
  • File: src/feedback/trulens/feedback/llm_provider.py
  • Lines: L3523-3606

Signature

class LLMProvider:
    def tool_selection_with_cot_reasons(
        self,
        trace: Union[Trace, str],
        criteria: Optional[str] = None,
        additional_instructions: Optional[str] = None,
        examples: Optional[List[Tuple[Dict[str, str], int]]] = None,
        min_score_val: int = 0,
        max_score_val: int = 3,
        temperature: float = 0.0,
        enable_trace_compression: bool = True,
        **kwargs,
    ) -> Tuple[float, Dict]:
        """Evaluate tool selection quality in an agentic trace.

        Args:
            trace: The trace to evaluate (Trace object or JSON string).
            criteria: Custom evaluation criteria (overrides default rubric).
            additional_instructions: Extra instructions for the judge.
            examples: Few-shot examples for evaluation.
            min_score_val: Minimum score (default: 0).
            max_score_val: Maximum score (default: 3).
            temperature: LLM temperature (default: 0.0).
            enable_trace_compression: Compress trace to reduce tokens (default: True).

        Returns:
            Tuple of (normalized_score, reasoning_dict) where score is 0.0-1.0.
        """

Import

from trulens.providers.openai import OpenAI

provider = OpenAI()
# Method accessed as: provider.tool_selection_with_cot_reasons

I/O Contract

Inputs

Name | Type | Required | Description
trace | Union[Trace, str] | Yes | Full agent execution trace (Trace object or JSON string)
criteria | str | No | Custom evaluation criteria
additional_instructions | str | No | Extra instructions for the LLM judge
enable_trace_compression | bool | No | Compress trace data (default: True)
min_score_val | int | No | Minimum score value (default: 0)
max_score_val | int | No | Maximum score value (default: 3)

Outputs

Name | Type | Description
return | Tuple[float, Dict] | (normalized_score between 0.0 and 1.0, dict with reasoning/evidence)
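The `examples` parameter in the signature expects `List[Tuple[Dict[str, str], int]]`: each tuple pairs a dict of example fields with the raw rubric score that example should receive. The dict keys used below (`trace`, `reason`) are illustrative assumptions; consult the library's prompt templates for the exact keys it expects.

```python
from typing import Dict, List, Tuple

# Hypothetical few-shot examples matching the declared parameter type:
# each tuple is (example fields, raw score on the default 0-3 rubric).
examples: List[Tuple[Dict[str, str], int]] = [
    (
        {
            "trace": '{"spans": [{"tool": "web_search", "query": "capital of France"}]}',
            "reason": "Agent chose the right tool for a factual web lookup.",
        },
        3,
    ),
    (
        {
            "trace": '{"spans": [{"tool": "calculator", "query": "capital of France"}]}',
            "reason": "A calculator is the wrong tool for a factual lookup.",
        },
        0,
    ),
]
```

These would be passed as `provider.tool_selection_with_cot_reasons(trace, examples=examples)` to steer the judge toward the desired scoring behavior.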

Usage Examples

Agent Tool Selection Metric

from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Define tool selection metric with trace-level selector
f_tool_selection = Feedback(
    provider.tool_selection_with_cot_reasons,
    name="tool_selection"
).on(**{
    "trace": Selector(trace_level=True),
})

Full Agent GPA Metrics

from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Tool Selection (trace-level)
f_tool_selection = Feedback(
    provider.tool_selection_with_cot_reasons,
    name="tool_selection"
).on(trace=Selector(trace_level=True))

# Answer Relevance (input-output)
f_relevance = Feedback(
    provider.relevance,
    name="answer_relevance"
).on_input_output()

# Groundedness (context-output)
f_groundedness = Feedback(
    provider.groundedness_measure_with_cot_reasons,
    name="groundedness"
).on(
    source=Selector.select_context(collect_list=True),
    statement=Selector.select_record_output()
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
