
Implementation:Confident ai Deepeval GEval

From Leeroopedia

Overview

GEval is an API class in the deepeval library that implements the G-Eval methodology for LLM-based evaluation with custom criteria and evaluation steps. It allows practitioners to define flexible, domain-specific metrics by specifying natural language criteria and optional step-by-step evaluation instructions that guide the judge LLM through chain-of-thought scoring.
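
For intuition, the original G-Eval paper scores by weighting each candidate rating by the probability the judge LLM assigns to it, rather than taking a single sampled number. A minimal pure-Python sketch of that idea (illustrative probabilities only, not deepeval's internal code):

```python
def weighted_geval_score(rating_probs: dict[int, float]) -> float:
    """Probability-weighted score: sum of rating * P(rating).

    rating_probs maps each candidate rating (e.g. 1-5) to the
    probability the judge LLM assigned to that rating token.
    """
    total = sum(rating_probs.values())
    return sum(r * p for r, p in rating_probs.items()) / total

# Hypothetical judge-token probabilities for a 1-5 rating scale.
probs = {1: 0.05, 2: 0.10, 3: 0.25, 4: 0.40, 5: 0.20}
score = weighted_geval_score(probs)   # fine-grained, not just "4"
normalized = (score - 1) / (5 - 1)    # map the 1-5 scale onto 0-1
```

Weighting by token probabilities yields a continuous score, which is why GEval results land anywhere in the 0-1 range rather than on a few discrete values.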

This page is an API Doc implementation.

Source

Import

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

Constructor Signature

GEval(
    name: str,
    evaluation_params: List[LLMTestCaseParams],
    criteria: Optional[str] = None,
    evaluation_steps: Optional[List[str]] = None,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    threshold: float = 0.5,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

  • name (str, required): A descriptive name for the metric (e.g., "Coherence", "Helpfulness").
  • evaluation_params (List[LLMTestCaseParams], required): Specifies which fields of the test case to include in the evaluation prompt. Common values: LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT.
  • criteria (Optional[str], default None): Natural language description of what the metric should evaluate (e.g., "Determine whether the output is coherent and logically structured").
  • evaluation_steps (Optional[List[str]], default None): Ordered list of step-by-step instructions for the judge LLM to follow during evaluation. If not provided, the judge derives steps from the criteria.
  • model (Optional[Union[str, DeepEvalBaseLLM]], default None): The judge LLM to use for evaluation. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. Defaults to the framework default model.
  • threshold (float, default 0.5): The minimum score (0.0 to 1.0) for a test case to be considered passing.
  • strict_mode (bool, default False): When enabled, the metric score is set to 0 if the raw score falls below the threshold (binary pass/fail).
  • async_mode (bool, default True): Whether to run the evaluation asynchronously.
  • verbose_mode (bool, default None): Whether to print detailed evaluation logs. Inherits from global config if not specified.
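
The interplay of threshold and strict_mode can be sketched as follows (a hypothetical helper illustrating the behavior described above, not deepeval's actual implementation):

```python
def apply_strict_mode(raw_score: float, threshold: float, strict_mode: bool) -> float:
    """Return the final metric score given a raw 0-1 judge score.

    With strict_mode enabled, any score below the threshold collapses
    to 0, turning the metric into a binary pass/fail signal.
    """
    if strict_mode and raw_score < threshold:
        return 0.0
    return raw_score

# Without strict_mode the raw score passes through unchanged.
lenient = apply_strict_mode(0.62, threshold=0.5, strict_mode=False)  # 0.62
# With strict_mode a sub-threshold score is zeroed out.
strict = apply_strict_mode(0.40, threshold=0.5, strict_mode=True)    # 0.0
```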

Input / Output

  • Inputs: Metric configuration parameters as described above.
  • Outputs: A configured GEval metric object that can be passed to evaluate() or assert_test() functions.

Example

Basic Usage: Coherence Metric

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

coherence_metric = GEval(
    name="Coherence",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT],
    criteria="Determine if the output is coherent and logically structured",
    threshold=0.5
)

Advanced Usage: Custom Evaluation Steps

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

helpfulness_metric = GEval(
    name="Helpfulness",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT
    ],
    criteria="Evaluate whether the response is helpful for the user's query",
    evaluation_steps=[
        "Identify the user's intent from the input",
        "Check if the output directly addresses the user's intent",
        "Assess whether the output provides actionable information",
        "Evaluate the completeness of the response"
    ],
    model="gpt-4o",
    threshold=0.7,
    strict_mode=False
)
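
The criteria and evaluation_steps above are ultimately woven into the prompt sent to the judge LLM. A simplified sketch of how such a prompt might be assembled (a hypothetical template for illustration, not deepeval's actual internal prompt):

```python
def build_judge_prompt(name, criteria, evaluation_steps, fields):
    """Assemble a chain-of-thought judge prompt from metric config.

    fields maps the selected evaluation_params (e.g. "input",
    "actual_output") to the corresponding test case values.
    """
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(evaluation_steps, 1))
    context = "\n".join(f"{k.upper()}:\n{v}" for k, v in fields.items())
    return (
        f"You are evaluating '{name}'.\n"
        f"Criteria: {criteria}\n\n"
        f"Follow these steps:\n{steps}\n\n"
        f"{context}\n\n"
        "Think step by step, then output a score from 0 to 10."
    )

prompt = build_judge_prompt(
    name="Helpfulness",
    criteria="Evaluate whether the response is helpful for the user's query",
    evaluation_steps=[
        "Identify the user's intent from the input",
        "Check if the output directly addresses the user's intent",
    ],
    fields={
        "input": "How do I reset my password?",
        "actual_output": "Go to Settings and choose 'Reset password'.",
    },
)
```

This also shows why evaluation_params matter: only the fields you list are included in the judge's context, so omitting INPUT, for example, forces the judge to rate the output in isolation.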
