Implementation: Confident AI Deepeval GEval
Overview
GEval is an API class in the deepeval library that implements the G-Eval methodology for LLM-based evaluation with custom criteria and evaluation steps. It allows practitioners to define flexible, domain-specific metrics by specifying natural language criteria and optional step-by-step evaluation instructions that guide the judge LLM through chain-of-thought scoring.
This is an API Doc implementation.
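The G-Eval methodology referenced above scores outputs by probability-weighted summation over the judge LLM's candidate scores rather than taking a single sampled score. A minimal sketch of that aggregation step, assuming a hypothetical `weighted_score` helper and made-up token probabilities (this is not a deepeval API):

```python
def weighted_score(score_probs):
    """Probability-weighted summation of candidate scores, the scoring
    trick from the G-Eval paper: the expected score under the judge's
    token probabilities, rather than a single sampled score.

    score_probs: mapping of candidate integer score -> token probability
    (illustrative values, not real model output).
    """
    total = sum(score_probs.values())
    return sum(s * p for s, p in score_probs.items()) / total

# Illustrative judge probabilities over a 1-5 rubric:
probs = {3: 0.1, 4: 0.6, 5: 0.3}
print(weighted_score(probs))  # ~4.2
```

This weighting smooths out the coarse integer rubric, which is why G-Eval scores correlate better with human judgments than raw single-sample scores.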
Source
- Repository: Confident AI Deepeval
- File: `deepeval/metrics/g_eval/g_eval.py`, lines 43-80
- Class: `GEval`
Import
```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams
```
Constructor Signature
```python
GEval(
    name: str,
    evaluation_params: List[LLMTestCaseParams],
    criteria: Optional[str] = None,
    evaluation_steps: Optional[List[str]] = None,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    threshold: float = 0.5,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | `str` | Yes | -- | A descriptive name for the metric (e.g., "Coherence", "Helpfulness"). |
| `evaluation_params` | `List[LLMTestCaseParams]` | Yes | -- | Specifies which fields of the test case to include in the evaluation prompt. Common values: `LLMTestCaseParams.INPUT`, `LLMTestCaseParams.ACTUAL_OUTPUT`, `LLMTestCaseParams.EXPECTED_OUTPUT`. |
| `criteria` | `Optional[str]` | No | `None` | Natural language description of what the metric should evaluate (e.g., "Determine whether the output is coherent and logically structured"). |
| `evaluation_steps` | `Optional[List[str]]` | No | `None` | Ordered list of step-by-step instructions for the judge LLM to follow during evaluation. If not provided, the judge derives steps from the criteria. |
| `model` | `Optional[Union[str, DeepEvalBaseLLM]]` | No | `None` | The judge LLM to use for evaluation. Accepts a model name string (e.g., "gpt-4o") or a custom `DeepEvalBaseLLM` instance. Defaults to the framework default model. |
| `threshold` | `float` | No | `0.5` | The minimum score (0.0 to 1.0) for a test case to be considered passing. |
| `strict_mode` | `bool` | No | `False` | When enabled, the metric score is set to 0 if the raw score falls below the threshold (binary pass/fail). |
| `async_mode` | `bool` | No | `True` | Whether to run the evaluation asynchronously. |
| `verbose_mode` | `bool` | No | `None` | Whether to print detailed evaluation logs. Inherits from the global config if not specified. |
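The interaction between `threshold` and `strict_mode` can be sketched as a small standalone helper. This is a hypothetical function mirroring the behavior described in the table, not deepeval's internal code:

```python
def apply_threshold(raw_score: float, threshold: float = 0.5,
                    strict_mode: bool = False) -> tuple:
    """Sketch of the documented threshold/strict_mode behavior:
    a test case passes when the score meets the threshold; in
    strict_mode a failing score is zeroed out (binary fail)."""
    passed = raw_score >= threshold
    score = raw_score if (passed or not strict_mode) else 0.0
    return score, passed

print(apply_threshold(0.62, threshold=0.7))                    # (0.62, False)
print(apply_threshold(0.62, threshold=0.7, strict_mode=True))  # (0.0, False)
print(apply_threshold(0.8, threshold=0.7, strict_mode=True))   # (0.8, True)
```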
Input / Output
- Inputs: Metric configuration parameters as described above.
- Outputs: A configured `GEval` metric object that can be passed to the `evaluate()` or `assert_test()` functions.
Example
Basic Usage: Coherence Metric
```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

coherence_metric = GEval(
    name="Coherence",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT],
    criteria="Determine if the output is coherent and logically structured",
    threshold=0.5
)
```
Advanced Usage: Custom Evaluation Steps
```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

helpfulness_metric = GEval(
    name="Helpfulness",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT
    ],
    criteria="Evaluate whether the response is helpful for the user's query",
    evaluation_steps=[
        "Identify the user's intent from the input",
        "Check if the output directly addresses the user's intent",
        "Assess whether the output provides actionable information",
        "Evaluate the completeness of the response"
    ],
    model="gpt-4o",
    threshold=0.7,
    strict_mode=False
)
```
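For intuition, the configuration above can be pictured as assembling a judge prompt from `criteria`, `evaluation_steps`, and the test-case fields selected by `evaluation_params`. The template and field names below are illustrative assumptions, not deepeval's actual prompt:

```python
def build_judge_prompt(criteria, steps, fields):
    """Hypothetical sketch of judge-prompt assembly: criteria first,
    then numbered evaluation steps, then the selected test-case fields."""
    lines = [f"Evaluation criteria: {criteria}", "Follow these steps:"]
    lines += [f"{i}. {s}" for i, s in enumerate(steps, 1)]
    lines.append("Test case:")
    lines += [f"{k}: {v}" for k, v in fields.items()]
    lines.append("Give a score from 0 to 10.")
    return "\n".join(lines)

prompt = build_judge_prompt(
    "Evaluate whether the response is helpful for the user's query",
    ["Identify the user's intent from the input",
     "Check if the output directly addresses the user's intent"],
    {"input": "How do I reset my password?",
     "actual_output": "Go to Settings > Security > Reset Password."},
)
print(prompt)
```

The judge's numeric answer is then normalized to the 0.0-1.0 range that `threshold` is compared against.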
Related Pages
- Environment:Confident_ai_Deepeval_Python_3_9_Runtime
- Environment:Confident_ai_Deepeval_LLM_Provider_Credentials
- Heuristic:Confident_ai_Deepeval_Timeout_and_Retry_Tuning