Implementation:Confident ai Deepeval StepEfficiencyMetric

**Metadata**
Knowledge Sources	DeepEval
Domains	LLM_Evaluation AI_Agents
Last Updated	2026-02-14 09:00 GMT

Overview

Concrete evaluation metric class that measures whether an AI agent completes tasks with minimal unnecessary steps. The StepEfficiencyMetric analyzes the agent's execution trace to identify redundant operations, circular reasoning, and wasted computation, scoring the overall efficiency of the agent's action sequence.

Description

The StepEfficiencyMetric evaluates the efficiency of the agent's execution path. It examines the trace of steps taken by the agent and uses an LLM-as-judge approach to assess whether each step was necessary and whether the overall execution was reasonably efficient.

Key capabilities:

Trace-based analysis -- examines the full sequence of agent steps including LLM calls, tool invocations, and reasoning steps.
Redundancy detection -- identifies unnecessary or repeated operations in the execution trace.
Efficiency scoring -- produces a continuous score reflecting the proportion of useful steps relative to total steps.
Reason generation -- provides human-readable explanations of inefficiencies found in the trace.
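
To make redundancy detection and proportion-based scoring concrete, the following is a minimal sketch that uses a deterministic heuristic: a step counts as redundant when it repeats an earlier tool call with identical arguments. This is not how StepEfficiencyMetric works internally, since the metric relies on an LLM judge rather than exact matching. The `Step` type and the `efficiency_ratio` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical trace step: a tool name plus its arguments."""
    tool: str
    args: Tuple[Tuple[str, str], ...] = ()


def efficiency_ratio(steps: List[Step]) -> Tuple[float, List[int]]:
    """Return (useful_steps / total_steps, indices of redundant steps).

    A step is treated as redundant if an identical step (same tool, same
    arguments) already appeared earlier in the trace.
    """
    if not steps:
        return 1.0, []  # an empty trace wastes nothing
    seen = set()
    redundant = []
    for index, step in enumerate(steps):
        if step in seen:
            redundant.append(index)
        else:
            seen.add(step)
    useful = len(steps) - len(redundant)
    return useful / len(steps), redundant


trace = [
    Step("search_flights", (("to", "Tokyo"),)),
    Step("search_hotels", (("city", "Tokyo"),)),
    Step("search_flights", (("to", "Tokyo"),)),  # exact repeat of step 0
    Step("book_hotel", (("city", "Tokyo"),)),
]
score, redundant_indices = efficiency_ratio(trace)
print(score, redundant_indices)  # 0.75 [2]
```

An exact-match heuristic like this misses semantically redundant steps (for example, two differently worded searches for the same fact), which is one reason an LLM judge is a plausible choice for this kind of evaluation.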

Usage

Import the metric class before instantiating it for agent evaluation:

from deepeval.metrics import StepEfficiencyMetric

Code Reference

Source Location

Repository: confident-ai/deepeval
File: deepeval/metrics/step_efficiency/step_efficiency.py (lines 22--225)

Signature

class StepEfficiencyMetric(BaseMetric):
    def __init__(
        self,
        threshold: float = 0.5,
        model: Optional[str] = None,
        include_reason: bool = True,
        async_mode: bool = True,
        strict_mode: bool = False,
        verbose_mode: bool = False,
    ):
        ...

Import

from deepeval.metrics import StepEfficiencyMetric

Parent Class

BaseMetric

I/O Contract

Inputs (Constructor Parameters)

**Input Contract**
Name	Type	Default	Description
`threshold`	float	`0.5`	Minimum score (0--1) for the evaluation to pass.
`model`	Optional[str]	`None`	LLM model to use as the evaluation judge. Falls back to default if not specified.
`include_reason`	bool	`True`	Whether to generate a human-readable reason for the score.
`async_mode`	bool	`True`	Whether to run evaluation asynchronously.
`strict_mode`	bool	`False`	When enabled, the score is binary: 1 for a perfect score and 0 otherwise, and the threshold is overridden to 1.
`verbose_mode`	bool	`False`	When enabled, prints detailed evaluation information during execution.

Outputs

**Output Contract**
Name	Type	Description
score	float	A value between 0 and 1 indicating the efficiency of the agent's execution path.
reason	Optional[str]	Human-readable explanation identifying inefficiencies (when `include_reason=True`).
success	bool	Whether the score meets or exceeds the threshold.
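
The relationship between `score`, `threshold`, `strict_mode`, and `success` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `finalize_score` is a hypothetical helper, and it assumes that strict mode treats only a perfect score as passing.

```python
from typing import Tuple


def finalize_score(raw_score: float, threshold: float = 0.5,
                   strict_mode: bool = False) -> Tuple[float, bool]:
    """Hypothetical helper: map a raw judge score to (score, success)."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must be between 0 and 1")
    # Assumption: strict mode raises the passing bar to a perfect score.
    effective_threshold = 1.0 if strict_mode else threshold
    score = raw_score
    if strict_mode and raw_score < effective_threshold:
        score = 0.0  # anything short of perfect collapses to 0
    return score, score >= effective_threshold


print(finalize_score(0.8))                    # (0.8, True)
print(finalize_score(0.8, threshold=0.9))     # (0.8, False)
print(finalize_score(0.8, strict_mode=True))  # (0.0, False)
print(finalize_score(1.0, strict_mode=True))  # (1.0, True)
```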

Usage Examples

Example 1: Basic Step Efficiency Evaluation

Create a metric with the default settings. The threshold of 0.5 is the default and is passed explicitly here only for clarity.

from deepeval.metrics import StepEfficiencyMetric

metric = StepEfficiencyMetric(threshold=0.5)

Example 2: Custom Model and Threshold

Use a specific judge model with a higher efficiency threshold.

from deepeval.metrics import StepEfficiencyMetric

metric = StepEfficiencyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True,
)

The `model="gpt-4o"` argument selects the LLM used as the evaluation judge.
The `threshold=0.7` argument raises the passing bar above the 0.5 default, so the agent needs a higher efficiency score to pass.

Example 3: Combined with Framework Instrumentation

Use with a LangChain callback handler for automatic efficiency evaluation.

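from deepeval.metrics import StepEfficiencyMetric, TaskCompletionMetric
from deepeval.integrations.langchain import CallbackHandler

# Attach both metrics to the handler so they are evaluated on the same trace.
handler = CallbackHandler(
    metrics=[TaskCompletionMetric(), StepEfficiencyMetric()],
    name="my-agent",
)

# `agent` is assumed to be a LangChain agent or runnable constructed elsewhere.
agent.invoke({"input": "Plan a trip to Tokyo"}, config={"callbacks": [handler]})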

Both task completion and step efficiency are evaluated together, providing a comprehensive view of agent quality.
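
As a rough illustration of why the two metrics complement each other, the sketch below summarizes a case where an agent completes its task but does so inefficiently. `MetricResult` and `summarize` are hypothetical names that mirror the score and success outputs described above; they are not part of DeepEval, and the scores are made-up example values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricResult:
    """Hypothetical container mirroring a metric's name, score, and success."""
    name: str
    score: float
    success: bool


def summarize(results: List[MetricResult]) -> str:
    """Return "pass" if every metric passed, else list the failing metrics."""
    failed = [r.name for r in results if not r.success]
    if not failed:
        return "pass"
    return "fail: " + ", ".join(failed)


results = [
    MetricResult("Task Completion", score=0.9, success=True),
    MetricResult("Step Efficiency", score=0.4, success=False),
]
print(summarize(results))  # fail: Step Efficiency
```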

Related Pages

Principle:Confident_ai_Deepeval_Step_Efficiency_Evaluation
