
Implementation:Confident ai Deepeval GEval

From Leeroopedia

Overview

GEval is an API class in the deepeval library that implements the G-Eval methodology for LLM-based evaluation with custom criteria and evaluation steps. It allows practitioners to define flexible, domain-specific metrics by specifying natural language criteria and optional step-by-step evaluation instructions that guide the judge LLM through chain-of-thought scoring.
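
For intuition, the original G-Eval paper scores by weighting each candidate rating by the probability the judge LLM assigns to it, rather than taking a single sampled number. A minimal pure-Python sketch of that idea (illustrative probabilities only, not deepeval's internal code):

```python
def weighted_geval_score(rating_probs: dict[int, float]) -> float:
    """Probability-weighted score: sum of rating * P(rating).

    rating_probs maps each candidate rating (e.g. 1-5) to the
    probability the judge LLM assigned to that rating token.
    """
    total = sum(rating_probs.values())
    return sum(r * p for r, p in rating_probs.items()) / total

# Hypothetical judge-token probabilities for a 1-5 rating scale.
probs = {1: 0.05, 2: 0.10, 3: 0.25, 4: 0.40, 5: 0.20}
score = weighted_geval_score(probs)   # fine-grained, not just "4"
normalized = (score - 1) / (5 - 1)    # map the 1-5 scale onto 0-1
```

Weighting by token probabilities yields a continuous score, which is why GEval results land anywhere in the 0-1 range rather than on a few discrete values.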

This page is an API Doc implementation.

Source

Import

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

Constructor Signature

GEval(
    name: str,
    evaluation_params: List[LLMTestCaseParams],
    criteria: Optional[str] = None,
    evaluation_steps: Optional[List[str]] = None,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    threshold: float = 0.5,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

  • name (str, required): A descriptive name for the metric (e.g., "Coherence", "Helpfulness").
  • evaluation_params (List[LLMTestCaseParams], required): Specifies which fields of the test case to include in the evaluation prompt. Common values: LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT.
  • criteria (Optional[str], default None): Natural language description of what the metric should evaluate (e.g., "Determine whether the output is coherent and logically structured").
  • evaluation_steps (Optional[List[str]], default None): Ordered list of step-by-step instructions for the judge LLM to follow during evaluation. If not provided, the judge derives steps from the criteria.
  • model (Optional[Union[str, DeepEvalBaseLLM]], default None): The judge LLM to use for evaluation. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. Defaults to the framework default model.
  • threshold (float, default 0.5): The minimum score (0.0 to 1.0) for a test case to be considered passing.
  • strict_mode (bool, default False): When enabled, the metric score is set to 0 if the raw score falls below the threshold (binary pass/fail).
  • async_mode (bool, default True): Whether to run the evaluation asynchronously.
  • verbose_mode (bool, default None): Whether to print detailed evaluation logs. Inherits from global config if not specified.
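
The interplay of threshold and strict_mode can be sketched as follows (a hypothetical helper illustrating the behavior described above, not deepeval's actual implementation):

```python
def apply_strict_mode(raw_score: float, threshold: float, strict_mode: bool) -> float:
    """Return the final metric score given a raw 0-1 judge score.

    With strict_mode enabled, any score below the threshold collapses
    to 0, turning the metric into a binary pass/fail signal.
    """
    if strict_mode and raw_score < threshold:
        return 0.0
    return raw_score

# Without strict_mode the raw score passes through unchanged.
lenient = apply_strict_mode(0.62, threshold=0.5, strict_mode=False)  # 0.62
# With strict_mode a sub-threshold score is zeroed out.
strict = apply_strict_mode(0.40, threshold=0.5, strict_mode=True)    # 0.0
```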

Input / Output

  • Inputs: Metric configuration parameters as described above.
  • Outputs: A configured GEval metric object that can be passed to evaluate() or assert_test() functions.

Example

Basic Usage: Coherence Metric

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

coherence_metric = GEval(
    name="Coherence",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT],
    criteria="Determine if the output is coherent and logically structured",
    threshold=0.5
)

Advanced Usage: Custom Evaluation Steps

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

helpfulness_metric = GEval(
    name="Helpfulness",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT
    ],
    criteria="Evaluate whether the response is helpful for the user's query",
    evaluation_steps=[
        "Identify the user's intent from the input",
        "Check if the output directly addresses the user's intent",
        "Assess whether the output provides actionable information",
        "Evaluate the completeness of the response"
    ],
    model="gpt-4o",
    threshold=0.7,
    strict_mode=False
)
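
The criteria and evaluation_steps above are ultimately woven into the prompt sent to the judge LLM. A simplified sketch of how such a prompt might be assembled (a hypothetical template for illustration, not deepeval's actual internal prompt):

```python
def build_judge_prompt(name, criteria, evaluation_steps, fields):
    """Assemble a chain-of-thought judge prompt from metric config.

    fields maps the selected evaluation_params (e.g. "input",
    "actual_output") to the corresponding test case values.
    """
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(evaluation_steps, 1))
    context = "\n".join(f"{k.upper()}:\n{v}" for k, v in fields.items())
    return (
        f"You are evaluating '{name}'.\n"
        f"Criteria: {criteria}\n\n"
        f"Follow these steps:\n{steps}\n\n"
        f"{context}\n\n"
        "Think step by step, then output a score from 0 to 10."
    )

prompt = build_judge_prompt(
    name="Helpfulness",
    criteria="Evaluate whether the response is helpful for the user's query",
    evaluation_steps=[
        "Identify the user's intent from the input",
        "Check if the output directly addresses the user's intent",
    ],
    fields={
        "input": "How do I reset my password?",
        "actual_output": "Go to Settings and choose 'Reset password'.",
    },
)
```

This also shows why evaluation_params matter: only the fields you list are included in the judge's context, so omitting INPUT, for example, forces the judge to rate the output in isolation.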
