Implementation:Vibrantlabsai Ragas AnswerRelevancyV2
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Evaluates the relevancy of a generated answer to the original question by generating questions from the response and comparing them to the original question using cosine similarity of embeddings.
Description
The AnswerRelevancy metric (V2 collections implementation) measures how relevant a generated response is to the user's original question. It uses a reverse question generation approach:
1. The metric generates multiple questions from the response text using an LLM (controlled by the strictness parameter, default 3).
2. Each generated question is embedded alongside the original user input using a provided embeddings model.
3. Cosine similarity is computed between the original question embedding and each generated question embedding.
4. The final score is the mean cosine similarity across all generated questions.
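The arithmetic in steps 3 and 4 can be sketched in plain Python. This is a minimal illustration, not the ragas source: the helper names cosine_similarity and mean_relevancy are hypothetical, and the embeddings are assumed to be plain lists of floats rather than whatever type a BaseRagasEmbedding returns.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def mean_relevancy(
    question_vec: Sequence[float], generated_vecs: Sequence[Sequence[float]]
) -> float:
    """Mean cosine similarity between the original question and each generated question."""
    if not generated_vecs:
        return 0.0  # no generated questions -> score 0.0
    sims = [cosine_similarity(question_vec, g) for g in generated_vecs]
    return sum(sims) / len(sims)


# Toy 3-dimensional "embeddings" standing in for real model output.
original = [1.0, 0.0, 0.0]
generated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(round(mean_relevancy(original, generated), 4))
```

Here the three similarities are 1.0, 0.0 and 1/√2, so a response whose generated questions drift away from the original question pulls the mean down.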
Additionally, the metric detects noncommittal or evasive answers. If all generated questions are flagged as noncommittal, the score is forced to 0.0 regardless of the cosine similarity. This prevents vague or hedging answers from scoring well.
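The four steps and the noncommittal rule combine into one per-sample formula. Here q is the original question, g_1, ..., g_N are the N generated questions (N is controlled by strictness), E(·) is the embeddings model, and c_i ∈ {0, 1} is the noncommittal flag of the i-th generated question. The product of the flags equals 1 only when every question is noncommittal, which forces the score to 0; the score is also defined as 0 when N = 0.

$$
\text{score} = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{E(q) \cdot E(g_i)}{\lVert E(q) \rVert \, \lVert E(g_i) \rVert} \right) \cdot \left( 1 - \prod_{i=1}^{N} c_i \right)
$$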
The metric uses AnswerRelevancePrompt with structured input/output classes (AnswerRelevanceInput and AnswerRelevanceOutput) to communicate with the LLM. It requires both an InstructorBaseRagasLLM for question generation and a BaseRagasEmbedding for semantic comparison. Only modern instructor-based components are supported; legacy wrappers are rejected.
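The question-generation half of the metric can be sketched with stand-in classes. This sketch assumes the structured output carries a generated question and an integer noncommittal flag, and it uses standard-library dataclasses in place of the real AnswerRelevanceInput and AnswerRelevanceOutput classes, whose exact fields may differ. StubLLM and generate_questions are hypothetical stand-ins for calls to an InstructorBaseRagasLLM, not ragas APIs.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RelevanceInput:
    """Hypothetical stand-in for AnswerRelevanceInput."""
    response: str


@dataclass
class RelevanceOutput:
    """Hypothetical stand-in for AnswerRelevanceOutput."""
    question: str
    noncommittal: int  # assumption: 1 if the response is evasive, else 0


class StubLLM:
    """Fake structured-output LLM that flags hedging phrases as noncommittal."""

    async def agenerate(self, data: RelevanceInput) -> RelevanceOutput:
        hedges = ("i don't know", "i'm not sure", "it depends")
        evasive = any(h in data.response.lower() for h in hedges)
        # A real LLM would write a question inferred from the response text.
        return RelevanceOutput(
            question="What is the capital of France?", noncommittal=int(evasive)
        )


async def generate_questions(
    llm: StubLLM, response: str, strictness: int = 3
) -> Tuple[List[str], List[int]]:
    """Query the LLM `strictness` times and collect questions plus noncommittal flags."""
    questions: List[str] = []
    flags: List[int] = []
    for _ in range(strictness):
        out = await llm.agenerate(RelevanceInput(response=response))
        if out.question:  # skip empty generations
            questions.append(out.question)
            flags.append(out.noncommittal)
    return questions, flags


qs, flags = asyncio.run(
    generate_questions(StubLLM(), "Paris is the capital of France.")
)
print(len(qs), flags)
```

The collected flags are what the noncommittal rule inspects; the collected questions are what get embedded and compared against the user input.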
Usage
Use this metric to evaluate whether an LLM's answer actually addresses the question that was asked. A high score indicates the response is topically relevant to the input query. A low score suggests the response diverges from what was asked, or the response is evasive/noncommittal.
This is the V2 collections version which uses modern instructor LLMs with structured output and modern embeddings, replacing the legacy V1 implementation. The V2 version provides automatic validation and a pure async API.
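Because ascore is a coroutine, several samples can be scored concurrently with asyncio.gather. The sketch below assumes only that the metric object exposes ascore(user_input=..., response=...) returning an object with a value attribute, as in the signature in the Code Reference section. score_batch and FakeMetric are hypothetical; FakeMetric scores by word overlap and stands in for a configured AnswerRelevancy so the example runs without API keys.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakeResult:
    value: float


class FakeMetric:
    """Stand-in for AnswerRelevancy: word overlap instead of LLM + embeddings."""

    async def ascore(self, user_input: str, response: str) -> FakeResult:
        q = set(user_input.lower().split())
        r = set(response.lower().split())
        return FakeResult(value=len(q & r) / len(q) if q else 0.0)


async def score_batch(metric, samples: Sequence[Tuple[str, str]]) -> List[float]:
    """Score (user_input, response) pairs concurrently; results keep input order."""
    results = await asyncio.gather(
        *(metric.ascore(user_input=q, response=r) for q, r in samples)
    )
    return [res.value for res in results]


samples = [
    ("what is the capital of france", "paris is the capital of france"),
    ("explain photosynthesis", "i am not sure"),
]
print(asyncio.run(score_batch(FakeMetric(), samples)))
```

asyncio.gather returns results in the order the coroutines were passed, so scores line up with samples. With a real LLM backend you may want to cap concurrency (for example with an asyncio.Semaphore) to stay within API rate limits.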
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/collections/answer_relevancy/metric.py
Signature
class AnswerRelevancy(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        embeddings: "BaseRagasEmbedding",
        name: str = "answer_relevancy",
        strictness: int = 3,
        **kwargs,
    ): ...

    async def ascore(self, user_input: str, response: str) -> MetricResult: ...
Import
from ragas.metrics.collections import AnswerRelevancy
I/O Contract
Constructor Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| llm | InstructorBaseRagasLLM | Yes | Modern instructor-based LLM used for generating questions from the response |
| embeddings | BaseRagasEmbedding | Yes | Embeddings model used for computing cosine similarity between questions |
| name | str | No | Metric name (default: "answer_relevancy") |
| strictness | int | No | Number of questions to generate from the response (default: 3). Higher values make the score more robust but increase the number of LLM calls |
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The original question posed by the user. Must be non-empty |
| response | str | Yes | The generated response to evaluate for relevancy. Must be non-empty |
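Both inputs must be non-empty. A caller-side guard along these lines can reject bad rows before any LLM calls are made. check_inputs is a hypothetical helper, and treating whitespace-only strings as empty is an assumption, not necessarily what ragas does internally.

```python
def check_inputs(user_input: str, response: str) -> None:
    """Raise ValueError if either required input is missing or blank."""
    for name, value in (("user_input", user_input), ("response", response)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")


# Passes silently for a valid pair.
check_inputs("What is the capital of France?", "Paris is the capital of France.")
```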
Outputs
| Name | Type | Description |
|---|---|---|
| score | MetricResult (float value) | Relevancy score, typically between 0.0 and 1.0; higher is better. 0.0 if all generated questions are flagged as noncommittal or if no questions could be generated |
Usage Examples
Basic Usage
import asyncio

import openai
from ragas.llms.base import llm_factory
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import AnswerRelevancy

# Setup dependencies (AsyncOpenAI reads OPENAI_API_KEY from the environment)
client = openai.AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)
embeddings = embedding_factory("openai", model="text-embedding-ada-002", client=client)

# Create metric instance
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=3)


async def main() -> None:
    # Single evaluation; ascore is a coroutine, so it must be awaited
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="Paris is the capital of France.",
    )
    print(f"Answer Relevancy: {result.value}")


# In a script, run the coroutine with asyncio.run; in Jupyter, use `await main()` instead
asyncio.run(main())
Custom Strictness
import asyncio
from ragas.metrics.collections import AnswerRelevancy

# Reuses `llm` and `embeddings` from the Basic Usage example.
# Higher strictness for more robust evaluation (5 generated questions)
metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=5)

async def main() -> None:
    result = await metric.ascore(
        user_input="Explain photosynthesis.",
        response="Photosynthesis is the process by which plants convert sunlight into energy.",
    )
    print(f"Answer Relevancy: {result.value}")

asyncio.run(main())