Implementation:Vibrantlabsai Ragas AnswerRelevancyV2

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

Evaluates the relevancy of a generated answer to the original question by generating questions from the response and comparing them to the original question using cosine similarity of embeddings.

Description

The AnswerRelevancy metric (V2 collections implementation) measures how relevant a generated response is to the user's original question. It uses a reverse question generation approach:

1. The metric generates multiple questions from the response text using an LLM (controlled by the strictness parameter, default 3).
2. Each generated question is embedded alongside the original user input using a provided embeddings model.
3. Cosine similarity is computed between the original question embedding and each generated question embedding.
4. The final score is the mean cosine similarity across all generated questions.
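Steps 3 and 4 can be sketched in plain Python. This is a minimal illustration and not the Ragas implementation: the vectors are hand-written stand-ins for real embeddings, and `cosine_similarity` and `relevancy_score` are hypothetical helper names chosen for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector has no similarity to anything
    return dot / (norm_a * norm_b)


def relevancy_score(
    question_vec: Sequence[float],
    generated_vecs: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity between the original question and each generated question."""
    if not generated_vecs:
        return 0.0  # nothing was generated, so there is nothing to compare
    sims = [cosine_similarity(question_vec, g) for g in generated_vecs]
    return sum(sims) / len(sims)


# Toy 2-D "embeddings": one identical, one orthogonal, one in between
original = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = relevancy_score(original, generated)
print(round(score, 4))
```

With strictness set to 3, three generated questions contribute to the mean, as in the toy list above.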

Additionally, the metric detects noncommittal or evasive answers. If all generated questions are flagged as noncommittal, the score is forced to 0.0 regardless of the cosine similarity. This prevents vague or hedging answers from scoring well.
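The noncommittal override can be pictured as a gate applied after the similarity is averaged. The sketch below is an assumption-laden illustration: `apply_noncommittal_gate` is a hypothetical name, and the flags are supplied by hand rather than produced by an LLM.

```python
from typing import Sequence


def apply_noncommittal_gate(
    mean_similarity: float,
    noncommittal_flags: Sequence[bool],
) -> float:
    """Return 0.0 when every generation was flagged noncommittal, else the similarity."""
    # Assumption: an empty flag list means nothing was generated, scored as 0.0.
    if not noncommittal_flags:
        return 0.0
    all_noncommittal = all(noncommittal_flags)
    return 0.0 if all_noncommittal else mean_similarity


# Every generation flagged: the score is forced to zero
gated = apply_noncommittal_gate(0.91, [True, True, True])

# At least one generation not flagged: the similarity passes through unchanged
mixed = apply_noncommittal_gate(0.91, [True, False, True])

print(gated, mixed)
```

Because the gate requires all flags to be set, a single committal generation is enough to keep the similarity-based score.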

The metric uses AnswerRelevancePrompt with structured input/output classes (AnswerRelevanceInput and AnswerRelevanceOutput) to communicate with the LLM. It requires both an InstructorBaseRagasLLM for question generation and a BaseRagasEmbedding for semantic comparison. Only modern instructor-based components are supported; legacy wrappers are rejected.

Usage

Use this metric to evaluate whether an LLM's answer actually addresses the question that was asked. A high score indicates the response is topically relevant to the input query. A low score suggests the response diverges from what was asked, or the response is evasive/noncommittal.

This is the V2 collections version, which replaces the legacy V1 implementation and uses modern instructor LLMs with structured output together with modern embeddings. It provides automatic validation and a pure async API.
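Since scoring is exposed as a coroutine, several question/answer pairs can be evaluated concurrently. The sketch below assumes only an `ascore(user_input=..., response=...)` coroutine that returns an object with a `value` attribute. `StubMetric`, `StubResult`, and `score_all` are hypothetical names so the snippet runs without API keys, and the word-overlap score is a placeholder, not the metric's logic.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    value: float


class StubMetric:
    """Hypothetical stand-in exposing an ascore coroutine."""

    async def ascore(self, user_input: str, response: str) -> StubResult:
        await asyncio.sleep(0)  # yield control, as a real network call would
        # Placeholder score: fraction of question words that reappear in the response
        q_words = set(user_input.lower().split())
        r_words = set(response.lower().split())
        if not q_words:
            return StubResult(value=0.0)
        return StubResult(value=len(q_words & r_words) / len(q_words))


async def score_all(metric, pairs):
    """Schedule one ascore call per (question, answer) pair and await them together."""
    tasks = [metric.ascore(user_input=q, response=r) for q, r in pairs]
    return await asyncio.gather(*tasks)


pairs = [
    ("capital of France", "Paris is the capital of France"),
    ("explain photosynthesis", "I am not sure"),
]
results = asyncio.run(score_all(StubMetric(), pairs))
values = [r.value for r in results]
print(values)
```

`asyncio.gather` returns results in the same order as the input pairs, so scores can be matched back to their samples by index.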

Code Reference

Source Location

Signature

class AnswerRelevancy(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        embeddings: "BaseRagasEmbedding",
        name: str = "answer_relevancy",
        strictness: int = 3,
        **kwargs,
    ): ...

    async def ascore(self, user_input: str, response: str) -> MetricResult: ...

Import

from ragas.metrics.collections import AnswerRelevancy

I/O Contract

Constructor Parameters

Name | Type | Required | Description
llm | InstructorBaseRagasLLM | Yes | Modern instructor-based LLM used for generating questions from the response
embeddings | BaseRagasEmbedding | Yes | Embeddings model used for computing cosine similarity between questions
name | str | No | Metric name (default: "answer_relevancy")
strictness | int | No | Number of questions to generate from the response (default: 3). Higher values increase robustness but also the number of LLM calls

Inputs

Name | Type | Required | Description
user_input | str | Yes | The original question posed by the user. Must be non-empty
response | str | Yes | The generated response to evaluate for relevancy. Must be non-empty
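The non-empty requirement on both inputs can be checked before any LLM call is made. The following is a minimal sketch of such a pre-check: `validate_inputs` is a hypothetical helper, and both the use of `ValueError` and the treatment of whitespace-only strings as empty are assumptions rather than documented library behavior.

```python
def validate_inputs(user_input: str, response: str) -> None:
    """Raise ValueError if either field is empty or whitespace-only."""
    for name, value in (("user_input", user_input), ("response", response)):
        # Assumption: whitespace-only text counts as empty
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")


# Valid inputs pass silently
validate_inputs("What is the capital of France?", "Paris.")

# A blank response is rejected before any scoring happens
try:
    validate_inputs("What is the capital of France?", "   ")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Running a check like this up front avoids spending LLM and embedding calls on samples that cannot be scored.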

Outputs

Name | Type | Description
score | MetricResult (float value) | Relevancy score between 0.0 and 1.0. Higher is better. 0.0 if all responses are noncommittal or no questions could be generated

Usage Examples

Basic Usage

import asyncio

import openai
from ragas.llms.base import llm_factory
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import AnswerRelevancy

# Setup dependencies (AsyncOpenAI reads OPENAI_API_KEY from the environment)
client = openai.AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)
embeddings = embedding_factory("openai", model="text-embedding-ada-002", client=client)

# Create metric instance
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=3)


async def main():
    # Single evaluation; ascore is a coroutine and must be awaited
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="Paris is the capital of France.",
    )
    print(f"Answer Relevancy: {result.value}")


# In a notebook or other running event loop, call `await main()` instead
asyncio.run(main())

Custom Strictness

import asyncio
from ragas.metrics.collections import AnswerRelevancy

# Reuses the llm and embeddings objects created in Basic Usage.
# Higher strictness for more robust evaluation (5 generated questions)
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=5)


async def main():
    result = await metric.ascore(
        user_input="Explain photosynthesis.",
        response="Photosynthesis is the process by which plants convert sunlight into energy.",
    )
    print(f"Answer Relevancy: {result.value}")


asyncio.run(main())
