Implementation:Run llama Llama index GuidelineEvaluator

Knowledge Sources	Run_llama_Llama_index
Domains	Evaluation, Guideline
Last Updated	2026-02-11 19:00 GMT

Overview

Evaluates whether a query-response pair adheres to a set of user-defined or default guidelines, returning a pass/fail result with structured feedback.

Description

The GuidelineEvaluator is a concrete implementation of BaseEvaluator that assesses whether a generated response follows specified quality guidelines. Unlike scoring-based evaluators, this evaluator produces a binary passing result (True/False) along with detailed feedback.

The evaluation workflow is:

1. The LLM is prompted with the query, response, and guidelines using a configurable eval_template.
2. The LLM output is parsed through a PydanticOutputParser that extracts an EvaluationData object containing a passing boolean and a feedback string.
3. The result is converted to an EvaluationResult with score set to 1.0 if passing or 0.0 if not.
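
The parsing and conversion steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: the EvaluationData class here is a plain dataclass standing in for the Pydantic model, parse_evaluation and to_score are hypothetical helpers, the LLM is assumed to return a JSON object, and the raw output string is made up.

```python
import json
from dataclasses import dataclass


@dataclass
class EvaluationData:
    """Stand-in for the parsed LLM output (hypothetical, not the library class)."""

    passing: bool
    feedback: str


def parse_evaluation(raw_output: str) -> EvaluationData:
    """Parse a JSON object emitted by the LLM into an EvaluationData instance."""
    payload = json.loads(raw_output)
    return EvaluationData(
        passing=bool(payload["passing"]),
        feedback=str(payload["feedback"]),
    )


def to_score(data: EvaluationData) -> float:
    """Map the binary verdict to the score described above: 1.0 or 0.0."""
    return 1.0 if data.passing else 0.0


# Made-up LLM output, for illustration only.
raw = '{"passing": false, "feedback": "The response gives no specific date."}'
data = parse_evaluation(raw)
score = to_score(data)
```

Keeping the verdict boolean and deriving the score from it means downstream code can use either field without the two ever disagreeing.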

The default guidelines instruct the LLM to check that:

The response fully answers the query.
The response avoids being vague or ambiguous.
The response is specific and uses statistics or numbers when possible.

Custom guidelines can be provided as a string during initialization. The eval_template defaults to a prompt that presents the query, response, and guidelines, then asks for constructive criticism. The output parser can also be customized by passing a different PydanticOutputParser instance.
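
As a rough sketch of how custom guidelines could replace the defaults inside a prompt, the snippet below fills a template with plain string formatting. Everything in it is an assumption made for illustration: DEFAULT_GUIDELINES paraphrases the list above rather than quoting the library, EVAL_TEMPLATE is a hypothetical template whose wording may differ from the real default, and build_prompt is not a library function.

```python
from typing import Optional

# Paraphrase of the default guidelines listed above (not the library's exact text).
DEFAULT_GUIDELINES = (
    "The response should fully answer the query.\n"
    "The response should avoid being vague or ambiguous.\n"
    "The response should be specific and use statistics or numbers when possible."
)

# Hypothetical template; the real default prompt wording may differ.
EVAL_TEMPLATE = (
    "Query:\n{query}\n\n"
    "Response:\n{response}\n\n"
    "Guidelines:\n{guidelines}\n\n"
    "Give constructive criticism and say whether the response passes."
)


def build_prompt(query: str, response: str, guidelines: Optional[str] = None) -> str:
    """Fill the template, falling back to the default guidelines when none are given."""
    return EVAL_TEMPLATE.format(
        query=query,
        response=response,
        guidelines=guidelines or DEFAULT_GUIDELINES,
    )


default_prompt = build_prompt("When was Python released?", "In 1991.")
custom_prompt = build_prompt(
    "When was Python released?",
    "In 1991.",
    guidelines="The response must cite at least one source.",
)
```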

The evaluator only considers the query and response parameters; the contexts parameter is ignored.
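
The argument handling described here can be pictured with a small stand-in coroutine. It is a sketch only: aevaluate_sketch is a hypothetical function that calls no LLM, and rejecting a missing query or response with a ValueError is an assumption about how required inputs might be enforced.

```python
import asyncio
from typing import Any, Optional, Sequence


async def aevaluate_sketch(
    query: Optional[str] = None,
    response: Optional[str] = None,
    contexts: Optional[Sequence[str]] = None,
    sleep_time_in_seconds: int = 0,
    **kwargs: Any,
) -> dict:
    """Hypothetical stand-in showing argument handling only; no LLM is called."""
    del contexts, kwargs  # accepted for interface compatibility, never used
    if query is None or response is None:
        # Assumption: missing inputs are rejected rather than evaluated.
        raise ValueError("query and response must be provided")
    await asyncio.sleep(sleep_time_in_seconds)  # optional rate-limit delay
    return {"query": query, "response": response}


# Passing contexts makes no difference to the outcome.
with_ctx = asyncio.run(aevaluate_sketch("q", "r", contexts=["ignored"]))
without_ctx = asyncio.run(aevaluate_sketch("q", "r"))
```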

Usage

Use this evaluator when you need to enforce specific quality standards on LLM responses, such as specificity, completeness, or adherence to a style guide. Because the contexts parameter is ignored, guidelines are judged against the query and response text only, not against retrieved sources. It is useful in production pipelines where binary pass/fail gating is needed rather than continuous scoring.
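
A minimal sketch of pass/fail gating, assuming each evaluation has already been reduced to a response, a passing flag, and a feedback string. GateInput and gate are hypothetical names introduced for this illustration, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GateInput:
    """Hypothetical record holding the fields a gate needs from an evaluation result."""

    response: str
    passing: bool
    feedback: str


def gate(results: List[GateInput]) -> Tuple[List[str], List[str]]:
    """Split results into accepted responses and feedback for the rejected ones."""
    accepted = [r.response for r in results if r.passing]
    rejected_feedback = [r.feedback for r in results if not r.passing]
    return accepted, rejected_feedback


results = [
    GateInput("Python 0.9.0 was released in February 1991.", True, "Specific and complete."),
    GateInput("Python is fairly old.", False, "Too vague; no date given."),
]
accepted, rejected_feedback = gate(results)
```

The rejected feedback can be logged or fed back into a retry step, while only accepted responses continue down the pipeline.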

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/evaluation/guideline.py

Signature

class GuidelineEvaluator(BaseEvaluator):
    def __init__(
        self,
        llm: Optional[LLM] = None,
        guidelines: Optional[str] = None,
        eval_template: Optional[Union[str, BasePromptTemplate]] = None,
        output_parser: Optional[PydanticOutputParser] = None,
    ) -> None: ...

    async def aevaluate(
        self,
        query: Optional[str] = None,
        response: Optional[str] = None,
        contexts: Optional[Sequence[str]] = None,
        sleep_time_in_seconds: int = 0,
        **kwargs: Any,
    ) -> EvaluationResult: ...

Import

from llama_index.core.evaluation.guideline import GuidelineEvaluator

I/O Contract

Inputs

Name	Type	Required	Description
llm	Optional[LLM]	No	The LLM to use for evaluation. Defaults to Settings.llm.
guidelines	Optional[str]	No	Custom guidelines for evaluating the response. Defaults to built-in guidelines about specificity and completeness.
eval_template	Optional[Union[str, BasePromptTemplate]]	No	Custom evaluation prompt template. Defaults to the built-in template.
output_parser	Optional[PydanticOutputParser]	No	Custom output parser. Defaults to a PydanticOutputParser for EvaluationData.
query	str	Yes (aevaluate)	The user query to evaluate against.
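response	str	Yes (aevaluate)	The generated response to evaluate.
contexts	Optional[Sequence[str]]	No (aevaluate)	Accepted for interface compatibility but ignored by this evaluator.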
sleep_time_in_seconds	int	No (aevaluate)	Delay before evaluation for rate limiting. Defaults to 0.

Outputs

Name	Type	Description
result	EvaluationResult	Contains the query, response, passing (bool), score (1.0 or 0.0), and feedback from the LLM.

Usage Examples

import asyncio

from llama_index.core.evaluation.guideline import GuidelineEvaluator
from llama_index.llms.openai import OpenAI  # provided by the llama-index-llms-openai package

# Create an evaluator with custom guidelines
evaluator = GuidelineEvaluator(
    llm=OpenAI(model="gpt-4"),
    guidelines=(
        "The response must include specific dates or timeframes.\n"
        "The response must cite at least one source.\n"
        "The response must not exceed 200 words.\n"
    ),
)


async def main() -> None:
    # Evaluate a single query-response pair
    result = await evaluator.aevaluate(
        query="When was Python first released?",
        response="Python was first released on February 20, 1991 by Guido van Rossum.",
    )

    print(f"Passing: {result.passing}")    # True or False
    print(f"Score: {result.score}")        # 1.0 or 0.0
    print(f"Feedback: {result.feedback}")  # Detailed critique


# aevaluate is a coroutine, so run it inside an event loop
asyncio.run(main())

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
