Implementation:Confident ai Deepeval StepEfficiencyMetric

Last Updated 2026-02-14 09:00 GMT

Overview

Concrete evaluation metric class that measures whether an AI agent completes tasks without unnecessary steps. The StepEfficiencyMetric analyzes the agent's execution trace to identify redundant operations, circular reasoning, and wasted computation, and scores the overall efficiency of the agent's action sequence.

Description

The StepEfficiencyMetric evaluates the efficiency of the agent's execution path. It examines the trace of steps taken by the agent and uses an LLM-as-judge approach to assess whether each step was necessary and whether the overall execution was reasonably efficient.
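To make the LLM-as-judge step concrete, the sketch below renders a trace into a prompt and parses a JSON verdict. It is not deepeval's implementation: the trace format, the prompt wording, the JSON schema, and the function names build_judge_prompt and judge_efficiency are all hypothetical, and a stub function stands in for a real judge model.

```python
import json
from typing import Callable

# Hypothetical trace format: each step is a dict with a "type" and a short "detail".
Trace = list[dict[str, str]]


def build_judge_prompt(task: str, trace: Trace) -> str:
    """Render the task and numbered trace steps into one judge prompt (illustrative wording)."""
    lines = [f"Task: {task}", "Execution trace:"]
    for i, step in enumerate(trace, start=1):
        lines.append(f"{i}. [{step['type']}] {step['detail']}")
    lines.append(
        "Judge whether each step was necessary. Reply with JSON: "
        '{"score": <float between 0 and 1>, "reason": "<explanation>"}'
    )
    return "\n".join(lines)


def judge_efficiency(task: str, trace: Trace, llm: Callable[[str], str]) -> tuple[float, str]:
    """Send the prompt to any text-in/text-out judge and parse its JSON verdict."""
    verdict = json.loads(llm(build_judge_prompt(task, trace)))
    score = min(max(float(verdict["score"]), 0.0), 1.0)  # clamp to [0, 1]
    return score, verdict["reason"]


# Stub judge standing in for a real LLM call (hypothetical fixed response).
def fake_llm(prompt: str) -> str:
    return '{"score": 0.67, "reason": "Step 2 repeats the search from step 1."}'


trace = [
    {"type": "tool", "detail": "search_flights(dest='Tokyo')"},
    {"type": "tool", "detail": "search_flights(dest='Tokyo')"},
    {"type": "llm", "detail": "summarize itinerary"},
]
score, reason = judge_efficiency("Plan a trip to Tokyo", trace, fake_llm)
print(score, reason)
```

Keeping the judge behind a plain callable makes the parsing logic testable without network access; a real setup would swap fake_llm for a call to the chosen judge model.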

Key capabilities:

  • Trace-based analysis: examines the full sequence of agent steps, including LLM calls, tool invocations, and reasoning steps.
  • Redundancy detection: identifies unnecessary or repeated operations in the execution trace.
  • Efficiency scoring: produces a continuous score reflecting the proportion of useful steps relative to total steps.
  • Reason generation: provides human-readable explanations of inefficiencies found in the trace.
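The redundancy-detection and efficiency-scoring ideas in the list above can be illustrated with a deterministic toy heuristic. This is a stand-in under loud assumptions, not how StepEfficiencyMetric computes its score: the Step class is hypothetical, and only exact repeats (same kind, name, and arguments) count as redundant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical trace step; frozen so identical steps hash equally."""
    kind: str  # e.g. "llm", "tool", "reasoning"
    name: str  # tool or model name
    args: str  # serialized arguments


def find_redundant(trace: list[Step]) -> list[int]:
    """Return indices of steps that exactly repeat an earlier step."""
    seen: set[Step] = set()
    redundant: list[int] = []
    for i, step in enumerate(trace):
        if step in seen:
            redundant.append(i)
        else:
            seen.add(step)
    return redundant


def efficiency_score(trace: list[Step]) -> float:
    """Proportion of non-redundant steps; an empty trace counts as fully efficient."""
    if not trace:
        return 1.0
    return 1 - len(find_redundant(trace)) / len(trace)


trace = [
    Step("tool", "search_flights", "Tokyo"),
    Step("tool", "search_flights", "Tokyo"),  # exact repeat of step 0
    Step("tool", "search_hotels", "Tokyo"),
    Step("llm", "gpt-4o", "summarize"),
]
print(find_redundant(trace), efficiency_score(trace))  # [1] 0.75
```

An exact-match rule misses near-duplicate calls and circular reasoning that never repeats a step verbatim, which is the kind of judgment the metric delegates to an LLM.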

Usage

Import the metric from deepeval.metrics, then instantiate it for agent evaluation as shown in the usage examples:

```python
from deepeval.metrics import StepEfficiencyMetric
```

Code Reference

Source Location

  • Repository: confident-ai/deepeval
  • File: deepeval/metrics/step_efficiency/step_efficiency.py (lines 22-225)

Signature

```python
class StepEfficiencyMetric(BaseMetric):
    def __init__(
        self,
        threshold: float = 0.5,
        model: Optional[str] = None,
        include_reason: bool = True,
        async_mode: bool = True,
        strict_mode: bool = False,
        verbose_mode: bool = False,
    ):
        ...
```

Import

```python
from deepeval.metrics import StepEfficiencyMetric
```

Parent Class

  • BaseMetric

I/O Contract

Inputs (Constructor Parameters)

  • threshold (float, default 0.5): Minimum score (0 to 1) for the evaluation to pass.
  • model (Optional[str], default None): LLM model to use as the evaluation judge. Falls back to the default judge model if not specified.
  • include_reason (bool, default True): Whether to generate a human-readable reason for the score.
  • async_mode (bool, default True): Whether to run evaluation asynchronously.
  • strict_mode (bool, default False): When enabled, scores are binarized to 0 or 1 based on the threshold.
  • verbose_mode (bool, default False): When enabled, prints detailed evaluation information during execution.
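A minimal sketch of the strict_mode behavior as the list above describes it, assuming scores at or above the threshold become 1 and everything else becomes 0. The helper name apply_strict_mode is hypothetical, and the library's exact rule may differ.

```python
def apply_strict_mode(score: float, threshold: float, strict_mode: bool) -> float:
    """Binarize the score against the threshold when strict_mode is on; otherwise pass it through."""
    if not strict_mode:
        return score
    return 1.0 if score >= threshold else 0.0


print(apply_strict_mode(0.62, threshold=0.5, strict_mode=False))  # 0.62
print(apply_strict_mode(0.62, threshold=0.7, strict_mode=True))   # 0.0
```

Strict mode trades nuance for a clear pass/fail signal, which can be easier to gate on in CI but hides how close a failing run came to the threshold.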

Outputs

  • score (float): A value between 0 and 1 indicating the efficiency of the agent's execution path.
  • reason (Optional[str]): Human-readable explanation identifying inefficiencies (when include_reason=True).
  • success (bool): Whether the score meets or exceeds the threshold.
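The output contract can be modeled as a small immutable record. EfficiencyResult and make_result are hypothetical names used only for illustration; the sketch encodes two rules stated above: success means the score meets or exceeds the threshold, and reason is only populated when include_reason=True.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EfficiencyResult:
    """Hypothetical container mirroring the output contract."""
    score: float
    reason: Optional[str]
    success: bool


def make_result(score: float, reason: str, threshold: float = 0.5,
                include_reason: bool = True) -> EfficiencyResult:
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return EfficiencyResult(
        score=score,
        reason=reason if include_reason else None,  # reason kept only when requested
        success=score >= threshold,                 # "meets or exceeds" the threshold
    )


r = make_result(0.5, "One duplicated search call.", threshold=0.5)
print(r.success)  # True: a score equal to the threshold passes
```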

Usage Examples

Example 1: Basic Step Efficiency Evaluation

Create a metric with default settings (0.5 is already the default threshold; it is passed explicitly here for clarity).

```python
from deepeval.metrics import StepEfficiencyMetric

metric = StepEfficiencyMetric(threshold=0.5)
```

Example 2: Custom Model and Threshold

Use a specific judge model with a higher efficiency threshold.

```python
from deepeval.metrics import StepEfficiencyMetric

metric = StepEfficiencyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True,
)
```
  • The model="gpt-4o" parameter selects the LLM that acts as the evaluation judge.
  • The threshold=0.7 raises the passing bar above the default of 0.5, so the agent's trace must score at least 0.7 to pass.

Example 3: Combined with Framework Instrumentation

Use with a LangChain callback handler for automatic efficiency evaluation.

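```python
from deepeval.metrics import StepEfficiencyMetric, TaskCompletionMetric
from deepeval.integrations.langchain import CallbackHandler

# Attach both metrics to one callback handler so each traced run is scored on both.
handler = CallbackHandler(
    metrics=[TaskCompletionMetric(), StepEfficiencyMetric()],
    name="my-agent",
)

# `agent` is assumed to be an existing LangChain agent built elsewhere.
agent.invoke({"input": "Plan a trip to Tokyo"}, config={"callbacks": [handler]})
```
  • Task completion and step efficiency are evaluated on the same run, so the results show both whether the agent achieved its goal and how economically it got there.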
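Running both metrics matters because an agent can finish its task while still wasting steps. The sketch below shows one hypothetical way to combine per-metric scores into a single pass/fail view, where a run passes only if every metric meets its threshold. The summarize function is an illustrative assumption, not deepeval's reporting, and the scores are made-up inputs rather than measured results.

```python
def summarize(results: dict[str, tuple[float, float]]) -> dict[str, object]:
    """results maps metric name -> (score, threshold); the run passes only if every metric passes."""
    failed = [name for name, (score, threshold) in results.items() if score < threshold]
    return {"passed": not failed, "failed_metrics": failed}


# Illustrative scores: the task was completed, but the trace was wasteful.
report = summarize({
    "TaskCompletionMetric": (0.9, 0.5),
    "StepEfficiencyMetric": (0.4, 0.5),
})
print(report)  # {'passed': False, 'failed_metrics': ['StepEfficiencyMetric']}
```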
