Implementation:Vibrantlabsai Ragas AnswerRelevance
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
AnswerRelevancy (also known as ResponseRelevancy) scores how relevant a generated answer is to the original question by generating reverse questions from the answer and measuring their cosine similarity to the original question.
Description
The AnswerRelevancy metric evaluates the relevance of an answer with a reverse-question approach. Rather than comparing the answer to the question directly, it uses an LLM to generate questions that the answer could plausibly be responding to, then compares those generated questions against the original question using the cosine similarity of their embedding vectors.
The core idea is that a highly relevant answer should yield generated questions that closely resemble the original question. Answers that are incomplete, or that contain redundant or unnecessary information, yield generated questions that diverge from the original and therefore score lower.
The metric also incorporates a noncommittal detection mechanism. If the answer is determined to be evasive, vague, or ambiguous (e.g., "I don't know" or "I'm not sure"), the answer is flagged as noncommittal and the score is set to zero regardless of the cosine similarity.
The strictness parameter controls how many questions are generated per answer (default 3). The final score is the mean cosine similarity across all generated questions, multiplied by a binary indicator that is 0 when the answer is flagged as noncommittal and 1 otherwise.
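The scoring rule described above can be written compactly. The symbols are notation introduced here for illustration, not names from the source: $q$ is the original question, $g_i$ the $i$-th generated question, $E(\cdot)$ the embedding function, $N$ the strictness, and $c \in \{0, 1\}$ an indicator that is 0 when the answer is judged noncommittal.

```latex
\text{score} = c \cdot \frac{1}{N} \sum_{i=1}^{N}
  \frac{E(g_i) \cdot E(q)}{\lVert E(g_i) \rVert \, \lVert E(q) \rVert}
```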
The class hierarchy consists of ResponseRelevancy (the base implementation) and AnswerRelevancy (a subclass alias). The pre-instantiated answer_relevancy singleton is available for convenience.
Usage
Use this metric when you want to evaluate whether an LLM's response actually addresses the question that was asked, without requiring a reference answer. It is particularly effective for detecting off-topic responses, overly generic answers, or responses that dodge the question.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_answer_relevance.py
Signature
@dataclass
class ResponseRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
    name: str = "answer_relevancy"
    output_type = MetricOutputType.CONTINUOUS
    question_generation: PydanticPrompt = ResponseRelevancePrompt()
    strictness: int = 3

class AnswerRelevancy(ResponseRelevancy):
    ...
Import
from ragas.metrics import AnswerRelevancy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The original question or prompt from the user |
| response | str | Yes | The generated answer to evaluate for relevance |
| strictness | int | No | Number of questions generated per answer for similarity comparison (default 3, ideal range 3-5) |
Outputs
| Name | Type | Description |
|---|---|---|
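| score | float | Relevance score, normally between 0.0 and 1.0 (mean cosine similarity of generated questions to original question, zeroed out if answer is noncommittal). Cosine similarity itself can be negative, so the upper and lower bounds depend on the embedding model |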
Internal Components
ResponseRelevancePrompt
The ResponseRelevancePrompt is a PydanticPrompt that takes a ResponseRelevanceInput (containing the response text) and produces a ResponseRelevanceOutput with a generated question and a noncommittal flag (0 or 1).
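As a rough illustration of the data shapes involved, the sketch below models the prompt's input and output with plain dataclasses. The field names (`response`, `question`, `noncommittal`) are assumptions inferred from the description above, and the `...Sketch` classes are hypothetical stand-ins, not the library's Pydantic models.

```python
from dataclasses import dataclass


@dataclass
class ResponseRelevanceInputSketch:
    """Hypothetical stand-in for the prompt input: the answer text only."""

    response: str


@dataclass
class ResponseRelevanceOutputSketch:
    """Hypothetical stand-in for the prompt output."""

    question: str      # question the answer could plausibly be responding to
    noncommittal: int  # 1 if the answer is evasive or vague, otherwise 0

    def __post_init__(self) -> None:
        # The flag is described as binary, so reject anything else.
        if self.noncommittal not in (0, 1):
            raise ValueError("noncommittal must be 0 or 1")


prompt_input = ResponseRelevanceInputSketch(
    response="Albert Einstein was born in Germany."
)
committal = ResponseRelevanceOutputSketch(
    question="Where was Albert Einstein born?", noncommittal=0
)
evasive = ResponseRelevanceOutputSketch(
    question="Where was Albert Einstein born?", noncommittal=1
)
```

With a strictness of 3, the metric would collect three such outputs for a single response, one per generated question.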
Similarity Calculation
The calculate_similarity method computes cosine similarity between the original question embedding and all generated question embeddings:
def calculate_similarity(self, question: str, generated_questions: list[str]):
    # Requires `import numpy as np` at module level.
    question_vec = np.asarray(self.embeddings.embed_query(question)).reshape(1, -1)
    gen_question_vec = np.asarray(
        self.embeddings.embed_documents(generated_questions)
    ).reshape(len(generated_questions), -1)
    # Product of vector norms, one value per generated question.
    norm = np.linalg.norm(gen_question_vec, axis=1) * np.linalg.norm(
        question_vec, axis=1
    )
    # Dot product divided by the norms gives the cosine similarity.
    return np.dot(gen_question_vec, question_vec.T).reshape(-1) / norm
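To see the array shapes in action, the following sketch runs the same computation as a free function against a toy embedder. `ToyEmbeddings` and its letter-count vectors are made up for illustration; only the `embed_query` and `embed_documents` method names come from the snippet above.

```python
import numpy as np


class ToyEmbeddings:
    """Hypothetical embedder: maps text to a 26-dim letter-count vector."""

    def embed_query(self, text: str) -> list[float]:
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]


def calculate_similarity(
    embeddings, question: str, generated_questions: list[str]
) -> np.ndarray:
    # Shape (1, d): the original question as a single row vector.
    question_vec = np.asarray(embeddings.embed_query(question)).reshape(1, -1)
    # Shape (n, d): one row per generated question.
    gen_question_vec = np.asarray(
        embeddings.embed_documents(generated_questions)
    ).reshape(len(generated_questions), -1)
    # Shape (n,): product of the two norms for each generated question.
    norm = np.linalg.norm(gen_question_vec, axis=1) * np.linalg.norm(
        question_vec, axis=1
    )
    # The dot products have shape (n, 1); flatten to (n,) before dividing.
    return np.dot(gen_question_vec, question_vec.T).reshape(-1) / norm


sims = calculate_similarity(
    ToyEmbeddings(),
    "Where was Einstein born?",
    ["Where was Einstein born?", "What is photosynthesis?"],
)
```

Here the first generated question is identical to the original, so its similarity is 1 (up to floating-point error), while the unrelated second question scores lower. In this sketch, a text that embeds to an all-zero vector would make `norm` zero and produce `nan`.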
Score Computation
The final score multiplies mean cosine similarity by a noncommittal indicator:
cosine_sim = self.calculate_similarity(question, gen_questions)
score = cosine_sim.mean() * int(not all_noncommittal)
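A worked example may help make the arithmetic concrete. The sketch below assumes the similarities have already been computed and that `all_noncommittal` is true only when every generation was flagged noncommittal (an assumption suggested by the variable name). `relevancy_score` is a hypothetical helper, not part of Ragas.

```python
from statistics import fmean


def relevancy_score(cosine_sims: list[float], noncommittal_flags: list[int]) -> float:
    """Mean cosine similarity, zeroed when every generation is noncommittal."""
    # Assumption: the score is zeroed only if all flags are 1.
    all_noncommittal = all(flag == 1 for flag in noncommittal_flags)
    return fmean(cosine_sims) * int(not all_noncommittal)


# Three generated questions (strictness 3) with illustrative similarities.
committal_score = relevancy_score([0.75, 0.5, 1.0], [0, 0, 0])
evasive_score = relevancy_score([0.75, 0.5, 1.0], [1, 1, 1])
```

The same similarities give a mean of 0.75 for the committal answer and 0.0 for the evasive one, since `int(not all_noncommittal)` collapses to 0.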
Usage Examples
Basic Usage
from ragas.metrics import AnswerRelevancy
from ragas import evaluate
from datasets import Dataset

data = {
    "user_input": ["Where was Albert Einstein born?"],
    "response": ["Albert Einstein was born in Germany."],
}
dataset = Dataset.from_dict(data)

# The metric needs an LLM and an embedding model; this call assumes they
# are available to evaluate(), either passed explicitly or via its defaults.
results = evaluate(dataset, metrics=[AnswerRelevancy()])
print(results)
Custom Strictness
from ragas.metrics import AnswerRelevancy
from ragas.dataset_schema import SingleTurnSample

# Use higher strictness for more robust evaluation
relevancy = AnswerRelevancy()
relevancy.strictness = 5

sample = SingleTurnSample(
    user_input="What is photosynthesis?",
    response="Photosynthesis is the process by which plants convert sunlight into energy.",
)

# To score the sample, the metric's LLM and embeddings must be configured first.
# Then, inside an async context:
# score = await relevancy.single_turn_ascore(sample)