Implementation:Vibrantlabsai Ragas AnswerSimilarity
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
AnswerSimilarity measures the semantic similarity between a generated response and a reference answer using embedding-based cosine similarity or cross-encoder scoring.
Description
The AnswerSimilarity metric (which extends SemanticSimilarity) evaluates how semantically close a generated response is to a reference answer. It works by computing cosine similarity between the embedding vectors of the two texts.
The base class SemanticSimilarity supports two modes of operation:
- Standard embeddings mode: Both the reference and response are embedded independently, and the cosine similarity between the resulting vectors is computed. The implementation normalizes each embedding vector and computes their dot product. It supports both modern (BaseRagasEmbedding with aembed_text) and legacy (BaseRagasEmbeddings with embed_text) embedding interfaces.
- Cross-encoder mode: When HuggingfaceEmbeddings are provided and the model is a cross-encoder, the text pair is scored by the cross-encoder itself instead of by comparing two independently computed embeddings. However, the async path (ascore) raises NotImplementedError for cross-encoder models, meaning cross-encoders are only supported through the synchronous interface.
An optional threshold parameter can be set to convert the continuous similarity score into a binary result (1 if similarity >= threshold, 0 otherwise).
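The thresholding rule can be sketched in isolation. The helper name `apply_threshold` is hypothetical and not part of Ragas; the sketch only mirrors the rule stated above, assuming an unset threshold is represented as `None`.

```python
from typing import Optional


def apply_threshold(similarity: float, threshold: Optional[float] = None) -> float:
    """Return the raw similarity, or 1.0/0.0 when a threshold is given.

    Hypothetical helper for illustration; not a Ragas function.
    """
    if threshold is None:
        # No threshold: the continuous score passes through unchanged.
        return similarity
    # Threshold set: collapse the score to a binary result.
    return 1.0 if similarity >= threshold else 0.0


print(apply_threshold(0.83))       # continuous score
print(apply_threshold(0.83, 0.7))  # meets the threshold
print(apply_threshold(0.55, 0.7))  # falls below the threshold
```

Note that the comparison is inclusive, so a similarity exactly equal to the threshold counts as a pass.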
The metric is based on the Semantic Answer Similarity (SAS) paper: arxiv.org/pdf/2108.06130.pdf.
AnswerSimilarity is a thin subclass of SemanticSimilarity that simply sets the default metric name to "answer_similarity".
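The "thin subclass" pattern can be illustrated with stand-in dataclasses. The class names ending in `Sketch` are hypothetical and carry only two fields; they are not the Ragas classes, which also inherit from the metric base classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemanticSimilaritySketch:
    """Stand-in for the base metric; holds only the fields relevant here."""

    name: str = "semantic_similarity"
    threshold: Optional[float] = None


@dataclass
class AnswerSimilaritySketch(SemanticSimilaritySketch):
    """Overrides only the default name; everything else is inherited."""

    name: str = "answer_similarity"


base = SemanticSimilaritySketch()
sub = AnswerSimilaritySketch(threshold=0.7)
print(base.name, sub.name, sub.threshold)
```

Because a dataclass subclass may redeclare a field with a new default, the subclass needs no methods of its own.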
Usage
Use this metric when you want a quick, embedding-based comparison between a generated answer and a reference answer. It is useful as a standalone metric or as a component in composite metrics (e.g., AnswerCorrectness uses it as its semantic similarity component). It does not require an LLM, only an embedding model.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_answer_similarity.py
Signature
@dataclass
class SemanticSimilarity(MetricWithEmbeddings, SingleTurnMetric):
    name: str = "semantic_similarity"
    output_type = MetricOutputType.CONTINUOUS
    is_cross_encoder: bool = False
    threshold: t.Optional[float] = None

@dataclass
class AnswerSimilarity(SemanticSimilarity):
    name: str = "answer_similarity"
Import
from ragas.metrics import AnswerSimilarity
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| reference | str | Yes | The ground truth or reference answer |
| response | str | Yes | The generated answer to compare against the reference |
| threshold | float | No | Set on the metric instance rather than per sample. If set, converts the continuous score to binary (1 if score >= threshold, 0 otherwise) |
Outputs
| Name | Type | Description |
|---|---|---|
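| score | float | Cosine similarity between the reference and response embeddings. Cosine similarity is bounded by -1.0 and 1.0 in general and typically falls between 0.0 and 1.0 for text embeddings; the result is a binary 0/1 value if a threshold is set |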
Internal Components
Embedding Computation
The metric handles empty strings by replacing them with a single space to avoid embedding errors. It supports two embedding API interfaces:
# Modern interface (BaseRagasEmbedding)
if hasattr(self.embeddings, "aembed_text"):
    embedding_1 = np.array(await self.embeddings.aembed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.aembed_text(answer))
else:
    # Legacy interface (BaseRagasEmbeddings)
    embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.embed_text(answer))
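The empty-string guard and the interface dispatch can be tried outside Ragas with stub embedders. Everything in this sketch is hypothetical: `ModernStub`, `LegacyStub`, and `embed_pair` are illustration names, and the stubs return made-up vectors based on text length rather than real embeddings.

```python
import asyncio
from typing import List


class ModernStub:
    """Hypothetical embedder exposing the modern method name."""

    async def aembed_text(self, text: str) -> List[float]:
        return [float(len(text)), 1.0]


class LegacyStub:
    """Hypothetical embedder exposing only the legacy method name."""

    async def embed_text(self, text: str) -> List[float]:
        return [float(len(text)), 0.0]


async def embed_pair(embeddings, reference: str, response: str):
    # Empty strings are swapped for a single space before embedding.
    reference = reference or " "
    response = response or " "
    # Prefer the modern method when present, otherwise fall back to legacy.
    if hasattr(embeddings, "aembed_text"):
        embed = embeddings.aembed_text
    else:
        embed = embeddings.embed_text
    return await embed(reference), await embed(response)


modern = asyncio.run(embed_pair(ModernStub(), "", "abc"))
legacy = asyncio.run(embed_pair(LegacyStub(), "ab", ""))
print(modern)
print(legacy)
```

Checking for the method with `hasattr` keeps the call site independent of the concrete embedding class, at the cost of relying on method names rather than types.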
Cosine Similarity
After obtaining embeddings, the vectors are normalized and their dot product is computed:
norms_1 = np.linalg.norm(embedding_1, keepdims=True)
norms_2 = np.linalg.norm(embedding_2, keepdims=True)
embedding_1_normalized = embedding_1 / norms_1
embedding_2_normalized = embedding_2 / norms_2
similarity = embedding_1_normalized @ embedding_2_normalized.T
score = similarity.flatten()
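A small worked example shows how the normalize-then-dot computation behaves on hand-picked vectors. The function name `cosine_similarity` and the input vectors are illustrative assumptions; the arithmetic follows the same steps as the snippet above.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two 1-D vectors via normalize-then-dot."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale each vector to unit length.
    a_normalized = a / np.linalg.norm(a, keepdims=True)
    b_normalized = b / np.linalg.norm(b, keepdims=True)
    # The dot product of unit vectors is the cosine of the angle between them.
    return float(a_normalized @ b_normalized.T)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, so the result depends only on direction and not on vector length. A zero vector would cause a division by zero in this sketch, which is one reason to avoid embedding empty input.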
Usage Examples
Basic Usage
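from datasets import Dataset

from ragas import evaluate
from ragas.metrics import AnswerSimilarity

# One sample: a generated response and the reference answer to compare it with
data = {
    "response": ["The sun is powered by nuclear fusion."],
    "reference": [
        "The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium."
    ],
}
dataset = Dataset.from_dict(data)

# The metric needs an embedding model; this call relies on whichever
# embeddings evaluate() is configured to use in your environment.
results = evaluate(dataset, metrics=[AnswerSimilarity()])
print(results)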
With Threshold
from ragas.metrics import AnswerSimilarity
# Binary mode: score is 1 if similarity >= 0.7, else 0
similarity_binary = AnswerSimilarity(threshold=0.7)