Implementation:Vibrantlabsai Ragas AnswerRelevancyV2

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

Evaluates the relevancy of a generated answer to the original question by generating questions from the response and comparing them to the original question using cosine similarity of embeddings.

Description

The AnswerRelevancy metric (V2 collections implementation) measures how relevant a generated response is to the user's original question. It uses a reverse question generation approach:

1. The metric generates multiple questions from the response text using an LLM (the number is controlled by the strictness parameter, default 3).
2. Each generated question is embedded, alongside the original user input, using the provided embeddings model.
3. Cosine similarity is computed between the original question embedding and each generated question embedding.
4. The final score is the mean cosine similarity across all generated questions.
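The scoring in steps 3 and 4 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the embeddings are already available as plain float vectors; mean_cosine_similarity is a hypothetical helper, not a Ragas function.

```python
import numpy as np


def mean_cosine_similarity(question_vec, generated_vecs) -> float:
    """Mean cosine similarity between the original question's embedding and
    each generated question's embedding (hypothetical helper)."""
    q = np.asarray(question_vec, dtype=float).reshape(-1)
    g = np.asarray(generated_vecs, dtype=float)
    if g.size == 0:
        # No generated questions: fall back to 0.0, matching the output contract
        return 0.0
    g = np.atleast_2d(g)  # shape: (num_generated_questions, embedding_dim)
    # Per-row cosine: dot product divided by the product of the two norms
    # (assumes no embedding is the zero vector)
    sims = (g @ q) / (np.linalg.norm(g, axis=1) * np.linalg.norm(q))
    return float(sims.mean())


# One generated question points the same way as the original, one is orthogonal
print(mean_cosine_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

Averaging over several generated questions is what makes a higher strictness more robust: a single odd paraphrase from the LLM has less influence on the final score.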

Additionally, the metric detects noncommittal or evasive answers. If all generated questions are flagged as noncommittal, the score is forced to 0.0 regardless of the cosine similarity. This prevents vague or hedging answers from scoring well.
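The noncommittal override can be expressed as a multiplicative gate on the mean similarity. This is a sketch under the assumption that the LLM returns one boolean (or 0/1) flag per generated question; gated_score is a hypothetical name, and treating an empty flag list as "all noncommittal" is a choice made here so that the score is 0.0 when no questions were generated.

```python
from typing import Sequence, Union


def gated_score(mean_similarity: float, noncommittal_flags: Sequence[Union[bool, int]]) -> float:
    """Zero the score only when every generated question was flagged noncommittal
    (hypothetical helper)."""
    # all() of an empty sequence is True, so no questions also yields 0.0
    all_noncommittal = all(bool(flag) for flag in noncommittal_flags)
    # Multiply by 0 or 1: a single committal question keeps the similarity score
    return mean_similarity * int(not all_noncommittal)


print(gated_score(0.9, [True, True, True]))   # 0.0
print(gated_score(0.9, [True, False, True]))  # 0.9
```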

The metric uses AnswerRelevancePrompt with structured input/output classes (AnswerRelevanceInput and AnswerRelevanceOutput) to communicate with the LLM. It requires both an InstructorBaseRagasLLM for question generation and a BaseRagasEmbedding for semantic comparison. Only modern instructor-based components are supported; legacy wrappers are rejected.
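The structured exchange with the LLM can be approximated with plain dataclasses. The field names below (response, question, noncommittal), the Sketch classes, and the one-call-per-question loop are assumptions for illustration; the real AnswerRelevanceInput and AnswerRelevanceOutput classes and the InstructorBaseRagasLLM interface may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class AnswerRelevanceInputSketch:
    # Hypothetical stand-in for AnswerRelevanceInput: only the response goes to the LLM
    response: str


@dataclass
class AnswerRelevanceOutputSketch:
    # Hypothetical stand-in for AnswerRelevanceOutput
    question: str       # question the LLM infers from the response
    noncommittal: bool  # True if the response looks evasive or vague


LLMFn = Callable[[AnswerRelevanceInputSketch], Awaitable[AnswerRelevanceOutputSketch]]


async def generate_questions(
    llm: LLMFn, response: str, strictness: int = 3
) -> List[AnswerRelevanceOutputSketch]:
    prompt_input = AnswerRelevanceInputSketch(response=response)
    # Assumption: one structured LLM call per generated question, run sequentially
    return [await llm(prompt_input) for _ in range(strictness)]


# Usage with a stub LLM (stands in for a real instructor-based LLM call)
async def stub_llm(prompt_input: AnswerRelevanceInputSketch) -> AnswerRelevanceOutputSketch:
    return AnswerRelevanceOutputSketch(question="What is the capital of France?", noncommittal=False)


outputs = asyncio.run(generate_questions(stub_llm, "Paris is the capital of France."))
print([o.question for o in outputs])
```

Typed output objects like these are what make the noncommittal flag easy to aggregate: each call yields a question to embed and a flag to collect.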

Usage

Use this metric to evaluate whether an LLM's answer actually addresses the question that was asked. A high score indicates that the response is topically relevant to the input query. A low score suggests that the response diverges from what was asked or is evasive or noncommittal.

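This is the V2 collections version, which uses modern instructor LLMs with structured output and modern embeddings and replaces the legacy V1 implementation. The V2 version provides automatic validation of its components (legacy LLM and embedding wrappers are rejected) and a pure async API: scoring is done through the ascore coroutine.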
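Because ascore is a coroutine, several evaluations can be batched with asyncio.gather. A minimal sketch, assuming metric is an AnswerRelevancy instance (or any object with the same ascore signature whose result exposes .value, as in the usage examples); score_many is a hypothetical helper, not part of Ragas.

```python
import asyncio
from typing import Any, List, Sequence, Tuple


async def score_many(metric: Any, pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Run metric.ascore concurrently over (user_input, response) pairs
    (hypothetical helper). Results come back in the same order as `pairs`."""
    results = await asyncio.gather(
        *(metric.ascore(user_input=q, response=r) for q, r in pairs)
    )
    # Assumes each MetricResult carries the score on `.value`
    return [res.value for res in results]


# scores = asyncio.run(score_many(metric, [
#     ("What is the capital of France?", "Paris is the capital of France."),
#     ("Explain photosynthesis.", "It depends."),
# ]))
```

Each ascore call issues strictness LLM requests plus embedding requests, so running many pairs at once multiplies load on the provider; batching or a semaphore may be needed to stay within rate limits.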

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/collections/answer_relevancy/metric.py

Signature

class AnswerRelevancy(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        embeddings: "BaseRagasEmbedding",
        name: str = "answer_relevancy",
        strictness: int = 3,
        **kwargs,
    ): ...

    async def ascore(self, user_input: str, response: str) -> MetricResult: ...

Import

from ragas.metrics.collections import AnswerRelevancy

I/O Contract

Constructor Parameters

Name	Type	Required	Description
llm	InstructorBaseRagasLLM	Yes	Modern instructor-based LLM used for generating questions from the response
embeddings	BaseRagasEmbedding	Yes	Embeddings model used for computing cosine similarity between questions
name	str	No	Metric name (default: "answer_relevancy")
strictness	int	No	Number of questions to generate from the response (default: 3). Higher values make the score more robust but require more LLM calls

Inputs

Name	Type	Required	Description
user_input	str	Yes	The original question posed by the user. Must be non-empty
response	str	Yes	The generated response to evaluate for relevancy. Must be non-empty
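Both inputs must be non-empty. A caller-side pre-check can be sketched as follows; check_answer_relevancy_inputs is a hypothetical helper, and treating whitespace-only strings as empty is an assumption rather than documented library behavior.

```python
def check_answer_relevancy_inputs(user_input: str, response: str) -> None:
    """Raise ValueError if either input violates the non-empty contract
    (hypothetical helper; whitespace-only strings are treated as empty)."""
    for name, value in (("user_input", user_input), ("response", response)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")


check_answer_relevancy_inputs("What is the capital of France?", "Paris is the capital of France.")
```

Checking inputs before calling ascore avoids spending LLM calls on rows that cannot be scored.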

Outputs

Name	Type	Description
score	MetricResult (float value)	Relevancy score between 0.0 and 1.0; higher is better. 0.0 if all generated questions are flagged noncommittal or if no questions could be generated

Usage Examples

Basic Usage

import asyncio

import openai
from ragas.llms.base import llm_factory
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import AnswerRelevancy

# Setup dependencies (the OpenAI client reads OPENAI_API_KEY from the environment)
client = openai.AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)
embeddings = embedding_factory("openai", model="text-embedding-ada-002", client=client)

# Create metric instance
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=3)


async def main():
    # Single evaluation; ascore is a coroutine, so it must be awaited
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="Paris is the capital of France.",
    )
    print(f"Answer Relevancy: {result.value}")


# In a notebook, use `await main()` instead
asyncio.run(main())

Custom Strictness

import asyncio

from ragas.metrics.collections import AnswerRelevancy

# Reuses `llm` and `embeddings` from Basic Usage
# Higher strictness for more robust evaluation (5 generated questions)
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=5)


async def main():
    result = await metric.ascore(
        user_input="Explain photosynthesis.",
        response="Photosynthesis is the process by which plants convert sunlight into energy.",
    )
    print(f"Answer Relevancy: {result.value}")


asyncio.run(main())

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
