Implementation:Explodinggradients Ragas Collections AnswerRelevancy Metric
| Field | Value |
|---|---|
| source | Repo |
| domains | Metrics, Evaluation |
| last_updated | 2026-02-10 00:00 GMT |
Overview
AnswerRelevancy is a v2 class-based metric that evaluates how relevant a response is to the original question using a dual-component design combining LLM-based question generation with embedding cosine similarity.
Description
AnswerRelevancy extends BaseMetric and requires both a modern InstructorBaseRagasLLM and a BaseRagasEmbedding. The evaluation algorithm works as follows:
- For each of strictness iterations (default 3), the LLM generates a synthetic question from the response, along with a noncommittal flag indicating whether the response is evasive.
- The original question and all generated questions are embedded using the embeddings model.
- Cosine similarity is computed between the original question vector and each generated question vector.
- The final score is the mean cosine similarity, set to 0.0 if every generation was flagged as noncommittal.
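Written as a formula, let q be the original question, \hat q_1, ..., \hat q_N the N = strictness generated questions, E(·) the embeddings model, and c_i ∈ {0, 1} the noncommittal flag of the i-th generation. Treating the flag as binary is an assumption made for notation. The score described in the steps above is then:

$$
\text{AnswerRelevancy} = \Bigl(1 - \prod_{i=1}^{N} c_i\Bigr) \cdot \frac{1}{N} \sum_{i=1}^{N} \frac{E(q) \cdot E(\hat q_i)}{\lVert E(q) \rVert \, \lVert E(\hat q_i) \rVert}
$$

The product term equals 1 only when every generation is flagged, which is what sends the score to 0.0; a single committal generation leaves the mean cosine similarity unchanged.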
This approach measures whether the response actually addresses the question: a relevant answer should allow an LLM to reconstruct the original question from the answer alone. The metric uses a structured prompt, AnswerRelevancePrompt, with the Pydantic models AnswerRelevanceInput and AnswerRelevanceOutput as its input and output types.
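The prompt contract can be illustrated with two small Pydantic models. This is a sketch only: the classes below are local stand-ins, and their field names (response, question, noncommittal) and the int type of the flag are assumptions for illustration, not copied from the ragas source.

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical stand-ins, NOT imported from ragas; field names are assumed.
class AnswerRelevanceInput(BaseModel):
    """What the prompt receives: the response under evaluation."""
    response: str = Field(..., description="The response to generate a question for")


class AnswerRelevanceOutput(BaseModel):
    """What the LLM must return: one question plus an evasiveness flag."""
    question: str = Field(..., description="A question the response answers")
    noncommittal: int = Field(..., description="1 if the response is evasive or vague, else 0")


# A structured reply is validated by the model instead of parsed by hand.
raw = {"question": "What is the capital of France?", "noncommittal": 0}
parsed = AnswerRelevanceOutput(**raw)
print(parsed.question, parsed.noncommittal)

# A reply missing the flag is rejected rather than silently accepted.
try:
    AnswerRelevanceOutput(question="Where is Paris?")
except ValidationError as err:
    print("rejected with", len(err.errors()), "error(s)")
```

Validating the LLM output against a schema is the design point here: a malformed generation fails loudly instead of producing a question with no flag.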
Usage
Instantiate with the required llm and embeddings parameters and an optional strictness (the number of questions to generate). Then await ascore(user_input=..., response=...), which is asynchronous and returns a MetricResult. The base class validates that both components are modern implementations.
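To make the control flow concrete without network access, the sketch below reimplements the four algorithm steps in plain Python with injected async callables. The names answer_relevancy_sketch, generate and embed are hypothetical and not part of the ragas API; the real class calls its InstructorBaseRagasLLM and BaseRagasEmbedding instead and may batch or order its calls differently.

```python
import asyncio
import math
from typing import Awaitable, Callable, List, Sequence, Tuple

# Hypothetical callable types standing in for the real LLM and embeddings objects.
GenerateFn = Callable[[str], Awaitable[Tuple[str, int]]]       # response -> (question, noncommittal)
EmbedFn = Callable[[List[str]], Awaitable[List[List[float]]]]  # texts -> vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


async def answer_relevancy_sketch(
    user_input: str,
    response: str,
    generate: GenerateFn,
    embed: EmbedFn,
    strictness: int = 3,
) -> float:
    if strictness < 1:
        raise ValueError("strictness must be at least 1")

    # Step 1: ask the LLM `strictness` times for a question the response answers.
    questions: List[str] = []
    flags: List[int] = []
    for _ in range(strictness):
        question, noncommittal = await generate(response)
        questions.append(question)
        flags.append(noncommittal)

    # Step 2: embed the original question together with the generated ones.
    vectors = await embed([user_input, *questions])
    original, generated = vectors[0], vectors[1:]

    # Step 3: mean cosine similarity between the original and each generated question.
    mean_sim = sum(cosine(original, g) for g in generated) / len(generated)

    # Step 4: zero the score only when every generation was flagged noncommittal.
    return 0.0 if all(flags) else mean_sim


if __name__ == "__main__":
    async def fake_generate(response: str) -> Tuple[str, int]:
        # Pretend the LLM recovers the original question and is not evasive.
        return "What is the capital of France?", 0

    async def fake_embed(texts: List[str]) -> List[List[float]]:
        # Toy embedding: a constant vector, so every pair scores 1.0.
        return [[1.0, 0.0] for _ in texts]

    score = asyncio.run(answer_relevancy_sketch(
        "What is the capital of France?", "Paris is the capital of France.",
        fake_generate, fake_embed))
    print(score)  # prints 1.0
```

Injecting the two callables keeps the scoring logic testable in isolation; swapping in real clients only changes what generate and embed do.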
Code Reference
| Property | Value |
|---|---|
| Source Location | src/ragas/metrics/collections/answer_relevancy/metric.py (lines 1-157) |
| Signature | class AnswerRelevancy(BaseMetric) |
| Import | from ragas.metrics.collections import AnswerRelevancy |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The original question |
| response | str | Yes | The response text to evaluate |
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | InstructorBaseRagasLLM | (required) | Modern instructor-based LLM for question generation |
| embeddings | BaseRagasEmbedding | (required) | Modern embeddings model for semantic comparison |
| name | str | "answer_relevancy" | Metric name |
| strictness | int | 3 | Number of questions to generate per evaluation |
Outputs
| Field | Type | Description |
|---|---|---|
| MetricResult.value | float | Answer relevancy score in the range 0.0-1.0 (higher is better) |
Usage Examples
```python
import asyncio

import openai
from ragas.embeddings.base import embedding_factory
from ragas.llms.base import llm_factory
from ragas.metrics.collections import AnswerRelevancy


async def main() -> None:
    # Setup dependencies: one async OpenAI client shared by the LLM and the embeddings
    client = openai.AsyncOpenAI()
    llm = llm_factory("gpt-4o-mini", client=client)
    embeddings = embedding_factory(
        "openai", model="text-embedding-ada-002",
        client=client, interface="modern"
    )

    # Create metric (strictness=3 generates three questions per evaluation)
    metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=3)

    # Single evaluation
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="Paris is the capital of France."
    )
    print(f"Answer Relevancy: {result.value}")

    # Higher strictness for more robust evaluation (more LLM calls per score)
    strict_metric = AnswerRelevancy(llm=llm, embeddings=embeddings, strictness=5)
    result = await strict_metric.ascore(
        user_input="Explain quantum computing.",
        response="Quantum computing uses qubits that can exist in superposition."
    )
    print(f"Strict Answer Relevancy: {result.value}")


# ascore is a coroutine; outside a notebook, drive it with asyncio.run
if __name__ == "__main__":
    asyncio.run(main())
```
Related Pages
- Explodinggradients_Ragas_Collections_BaseMetric_Class -- Base class for all v2 metrics
- Explodinggradients_Ragas_Collections_SemanticSimilarity_Metric -- Pure embedding-based similarity metric
- Explodinggradients_Ragas_Collections_ContextRelevance_Metric -- Context relevance evaluation
- Explodinggradients_Ragas_Collections_ResponseGroundedness_Metric -- Response groundedness evaluation