Implementation:Run llama Llama index AnswerRelevancyEvaluator

Knowledge Sources	Run_llama_Llama_index
Domains	Evaluation, Relevancy
Last Updated	2026-02-11 19:00 GMT

Overview

Evaluates whether a generated response is relevant to the user query by using an LLM to score the response on subject matter match and focus alignment.

Description

The AnswerRelevancyEvaluator is a concrete implementation of BaseEvaluator that assesses how well a response addresses a given query. It uses a customizable prompt template that instructs the LLM to evaluate the response on two criteria:

Does the response match the subject matter of the query?
Does the response address the focus or perspective of the query?

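Each criterion is worth 1 point, for a maximum raw score of 2. The default score_threshold is 2.0, and the final score is computed by dividing the raw score by this threshold. With the default threshold this yields a normalized value between 0.0 and 1.0; a custom threshold lower than the maximum raw score can produce values above 1.0.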
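
The normalization described above is a plain division, which can be sketched as follows. This is an illustrative sketch, not the library's code: the helper name normalize_score is hypothetical, and the check for a positive threshold is an added assumption.

```python
def normalize_score(raw_score: float, score_threshold: float = 2.0) -> float:
    """Divide the raw rubric score (0, 1, or 2) by the threshold."""
    # Assumption for this sketch: reject non-positive thresholds up front.
    if score_threshold <= 0:
        raise ValueError("score_threshold must be positive")
    return raw_score / score_threshold


# One point per satisfied criterion, so the raw score is 0, 1, or 2.
for raw in (0, 1, 2):
    print(raw, normalize_score(raw))
```

With the default threshold, a response that satisfies only one of the two criteria ends up with a normalized score of 0.5.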

The LLM output is parsed using a configurable parser_function. The default parser uses a regex pattern to extract a [RESULT] tag followed by a digit, along with the preceding text as feedback. If parsing fails and raise_error is True, a ValueError is raised; otherwise, the result is marked as invalid with an explanation.
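
A parser with the shape expected by parser_function (a string in, a (score, feedback) tuple out) might look like the sketch below. It follows the description above, but the exact pattern and the name parse_result are assumptions made for illustration, not a copy of the built-in parser.

```python
import re
from typing import Optional, Tuple

# Any text, then "[RESULT]", optional whitespace, and a single digit.
RESULT_PATTERN = re.compile(r"([\s\S]+)\[RESULT\]\s*(\d)")


def parse_result(output_str: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (score, feedback), or (None, None) when no tag is found."""
    match = RESULT_PATTERN.search(output_str)
    if match is None:
        return None, None
    feedback, digit = match.groups()
    return float(digit), feedback.strip()


print(parse_result("The answer stays on topic.\n[RESULT] 2"))
print(parse_result("No tag in this output."))
```

Returning (None, None) on failure lets the caller decide whether to raise an error or mark the result as invalid, which matches the raise_error behavior described above.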

The evaluator only considers the query and response parameters; the contexts parameter is ignored. It supports an optional sleep_time_in_seconds parameter for rate limiting when making multiple evaluation calls.
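
The sketch below shows how sleep_time_in_seconds could be passed through when evaluating several query/response pairs one after another. StubEvaluator is a hypothetical stand-in that accepts the same keyword arguments as aevaluate and returns a plain float instead of an EvaluationResult, so the example runs without an LLM; evaluate_pairs is likewise an illustrative helper, not part of the library.

```python
import asyncio
from typing import Any, List, Optional, Sequence, Tuple


class StubEvaluator:
    """Hypothetical stand-in that mimics the aevaluate keyword arguments."""

    async def aevaluate(
        self,
        query: Optional[str] = None,
        response: Optional[str] = None,
        contexts: Optional[Sequence[str]] = None,  # ignored, as in the text
        sleep_time_in_seconds: int = 0,
        **kwargs: Any,
    ) -> float:
        # Wait first, then "evaluate": the delay spaces out successive calls.
        await asyncio.sleep(sleep_time_in_seconds)
        return 1.0 if query and response else 0.0


async def evaluate_pairs(
    evaluator: Any,
    pairs: Sequence[Tuple[str, str]],
    sleep_time_in_seconds: int = 0,
) -> List[Any]:
    """Evaluate pairs sequentially so the per-call delay acts as a rate limit."""
    results = []
    for query, response in pairs:
        result = await evaluator.aevaluate(
            query=query,
            response=response,
            sleep_time_in_seconds=sleep_time_in_seconds,
        )
        results.append(result)
    return results


pairs = [("What is the capital of France?", "Paris."), ("Who wrote Hamlet?", "")]
print(asyncio.run(evaluate_pairs(StubEvaluator(), pairs)))
```

Running the calls sequentially is a deliberate choice here: launching them concurrently with asyncio.gather would make every call sleep at the same time and then hit the LLM together, which defeats the purpose of the delay.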

Usage

Use this evaluator when you need to assess whether LLM-generated answers are relevant to user queries. It is commonly used in RAG evaluation pipelines to measure response quality independently of the retrieved context. It works with any LLM that supports the apredict interface.

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/evaluation/answer_relevancy.py

Signature

class AnswerRelevancyEvaluator(BaseEvaluator):
    def __init__(
        self,
        llm: Optional[LLM] = None,
        raise_error: bool = False,
        eval_template: str | BasePromptTemplate | None = None,
        score_threshold: float = 2.0,
        parser_function: Callable[
            [str], Tuple[Optional[float], Optional[str]]
        ] = _default_parser_function,
    ) -> None: ...

    async def aevaluate(
        self,
        query: str | None = None,
        response: str | None = None,
        contexts: Sequence[str] | None = None,
        sleep_time_in_seconds: int = 0,
        **kwargs: Any,
    ) -> EvaluationResult: ...

Import

from llama_index.core.evaluation.answer_relevancy import AnswerRelevancyEvaluator

I/O Contract

Inputs

Name	Type	Required	Description
llm	Optional[LLM]	No	The LLM to use for evaluation. Defaults to Settings.llm.
raise_error	bool	No	Whether to raise a ValueError on unparseable output. Defaults to False.
eval_template	str or BasePromptTemplate or None	No	Custom evaluation prompt template. Defaults to the built-in template.
score_threshold	float	No	The maximum raw score used for normalization. Defaults to 2.0.
parser_function	Callable	No	Function to parse LLM output into (score, feedback). Defaults to regex-based parser.
query	str	Yes (aevaluate)	The user query to evaluate against.
response	str	Yes (aevaluate)	The generated response to evaluate.
sleep_time_in_seconds	int	No (aevaluate)	Delay before evaluation for rate limiting. Defaults to 0.

Outputs

Name	Type	Description
result	EvaluationResult	Contains the query, response, normalized score (0.0-1.0), raw LLM feedback, and invalid_result/invalid_reason if parsing failed.
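
When aggregating many results, entries flagged with invalid_result usually need to be excluded. The sketch below illustrates one way to do that. ResultLike is a hypothetical dataclass that mirrors only the score, invalid_result, and invalid_reason fields named above, and mean_valid_score is an illustrative helper rather than a library function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ResultLike:
    """Hypothetical stand-in carrying a subset of EvaluationResult fields."""

    score: Optional[float] = None
    invalid_result: bool = False
    invalid_reason: Optional[str] = None


def mean_valid_score(results: Iterable[ResultLike]) -> Optional[float]:
    """Average the scores of results that parsed successfully."""
    scores = [
        r.score
        for r in results
        if not r.invalid_result and r.score is not None
    ]
    return sum(scores) / len(scores) if scores else None


batch = [
    ResultLike(score=1.0),
    ResultLike(score=0.5),
    ResultLike(invalid_result=True, invalid_reason="Unable to parse output."),
]
print(mean_valid_score(batch))
```

Returning None for an empty or fully invalid batch keeps "no usable data" distinct from a genuine average of 0.0.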

Usage Examples

import asyncio

from llama_index.core.evaluation.answer_relevancy import AnswerRelevancyEvaluator
# The OpenAI LLM ships in the separate llama-index-llms-openai package.
from llama_index.llms.openai import OpenAI


async def main() -> None:
    # Create the evaluator
    evaluator = AnswerRelevancyEvaluator(
        llm=OpenAI(model="gpt-4"),
        score_threshold=2.0,
    )

    # Evaluate a response (aevaluate is a coroutine, so it must be awaited)
    result = await evaluator.aevaluate(
        query="What is the capital of France?",
        response="Paris is the capital and largest city of France.",
    )

    print(f"Score: {result.score}")       # e.g., 1.0 (normalized)
    print(f"Feedback: {result.feedback}")  # Raw LLM feedback


asyncio.run(main())

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
