Implementation:Vibrantlabsai Ragas AnswerSimilarity

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

AnswerSimilarity measures the semantic similarity between a generated response and a reference answer using embedding-based cosine similarity or cross-encoder scoring.

Description

The AnswerSimilarity metric (which extends SemanticSimilarity) evaluates how semantically close a generated response is to a reference answer. In its default mode it computes the cosine similarity between the embedding vectors of the two texts.

The base class SemanticSimilarity supports two modes of operation:

Standard embeddings mode: Both the reference and response are embedded independently, and the cosine similarity between the resulting vectors is computed. The implementation normalizes each embedding vector and computes their dot product. It supports both modern (BaseRagasEmbedding with aembed_text) and legacy (BaseRagasEmbeddings with embed_text) embedding interfaces.
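The quantity computed in standard embeddings mode is the ordinary cosine similarity. Writing r for the reference embedding and a for the response embedding (symbols introduced only for this illustration), normalizing each vector first and then taking the dot product gives the same value as dividing the dot product by the product of the norms:

```latex
\cos(\mathbf{r}, \mathbf{a})
  = \frac{\mathbf{r} \cdot \mathbf{a}}{\lVert \mathbf{r} \rVert \, \lVert \mathbf{a} \rVert}
  = \frac{\mathbf{r}}{\lVert \mathbf{r} \rVert} \cdot \frac{\mathbf{a}}{\lVert \mathbf{a} \rVert},
\qquad
\text{e.g. } \mathbf{r} = (1, 0),\ \mathbf{a} = (1, 1)
  \;\Rightarrow\; \cos(\mathbf{r}, \mathbf{a}) = \frac{1}{1 \cdot \sqrt{2}} \approx 0.707
```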

Cross-encoder mode: When the embeddings object is a HuggingfaceEmbeddings instance whose model is a cross-encoder, the metric treats it as such (tracked by the is_cross_encoder field) and scores the reference/response pair jointly instead of comparing two independent embeddings. The async path (ascore), however, raises NotImplementedError for cross-encoder models, so cross-encoders are only supported through the synchronous interface.
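The async restriction can be illustrated with a small stand-in. ToyAsyncScorer below is hypothetical and reproduces only the guard described above; its non-cross-encoder branch returns a placeholder value where the real metric would compare embeddings.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyAsyncScorer:
    """Hypothetical stand-in that reproduces only the async cross-encoder guard."""

    is_cross_encoder: bool = False

    async def ascore(self, reference: str, response: str) -> float:
        if self.is_cross_encoder:
            # Mirrors the described behavior: no async path for cross-encoders.
            raise NotImplementedError("async scoring is not implemented for cross-encoders")
        # Placeholder for the embedding comparison done in standard mode.
        return 1.0 if reference == response else 0.0


async def main() -> None:
    print(await ToyAsyncScorer().ascore("Paris", "Paris"))  # 1.0
    try:
        await ToyAsyncScorer(is_cross_encoder=True).ascore("Paris", "Paris")
    except NotImplementedError as exc:
        print(f"NotImplementedError: {exc}")


asyncio.run(main())
```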

An optional threshold parameter can be set to convert the continuous similarity score into a binary result (1 if similarity >= threshold, 0 otherwise).
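A minimal sketch of the threshold rule follows. apply_threshold is a hypothetical helper, not part of the ragas API; it treats threshold=None as continuous mode.

```python
from typing import Optional


def apply_threshold(score: float, threshold: Optional[float]) -> float:
    """Map a continuous similarity to 1/0 when a threshold is given (hypothetical helper)."""
    if threshold is None:
        return score  # continuous mode: return the raw similarity
    return 1.0 if score >= threshold else 0.0  # binary mode


print(apply_threshold(0.83, None))  # 0.83
print(apply_threshold(0.83, 0.7))   # 1.0
print(apply_threshold(0.65, 0.7))   # 0.0
```

Because the comparison is >=, a score exactly equal to the threshold maps to 1.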

The metric is based on the Semantic Answer Similarity (SAS) paper: arxiv.org/pdf/2108.06130.pdf.

AnswerSimilarity is a thin subclass of SemanticSimilarity that simply sets the default metric name to "answer_similarity".
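The subclassing pattern amounts to re-declaring one dataclass field with a new default. The two classes below are hypothetical stand-ins that keep only the name and threshold fields from the signature; they are not the ragas classes themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseToySimilarity:
    """Hypothetical stand-in for SemanticSimilarity, keeping only two of its fields."""

    name: str = "semantic_similarity"
    threshold: Optional[float] = None


@dataclass
class ToyAnswerSimilarity(BaseToySimilarity):
    # Re-declaring an inherited field with a new default is the only change;
    # the field keeps its original position and all behavior is inherited.
    name: str = "answer_similarity"


print(BaseToySimilarity().name)            # semantic_similarity
print(ToyAnswerSimilarity().name)          # answer_similarity
print(ToyAnswerSimilarity(threshold=0.7))  # ToyAnswerSimilarity(name='answer_similarity', threshold=0.7)
```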

Usage

Use this metric when you want a quick, embedding-based comparison between a generated answer and a reference answer. It is useful as a standalone metric or as a component in composite metrics (e.g., AnswerCorrectness uses it as its semantic similarity component). It does not require an LLM, only an embedding model.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_answer_similarity.py

Signature

@dataclass
class SemanticSimilarity(MetricWithEmbeddings, SingleTurnMetric):
    name: str = "semantic_similarity"
    output_type = MetricOutputType.CONTINUOUS
    is_cross_encoder: bool = False
    threshold: t.Optional[float] = None

@dataclass
class AnswerSimilarity(SemanticSimilarity):
    name: str = "answer_similarity"

Import

from ragas.metrics import AnswerSimilarity

I/O Contract

Inputs

Name	Type	Required	Description
reference	str	Yes	The ground truth or reference answer
response	str	Yes	The generated answer to compare against the reference
threshold	float	No	Constructor argument (e.g. AnswerSimilarity(threshold=0.7)), not a dataset column. If set, converts the continuous score to binary (1 if score >= threshold, 0 otherwise)

Outputs

Name	Type	Description
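score	float	Cosine similarity between the reference and response embeddings (or binary 0/1 if a threshold is set). No clipping is applied, so the value lies in [-1.0, 1.0]; with typical text embedding models it usually falls between 0.0 and 1.0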

Internal Components

Embedding Computation

The metric handles empty strings by replacing them with a single space to avoid embedding errors. It supports two embedding API interfaces:

# Modern interface (BaseRagasEmbedding)
if hasattr(self.embeddings, "aembed_text"):
    embedding_1 = np.array(await self.embeddings.aembed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.aembed_text(answer))
else:
    # Legacy interface (BaseRagasEmbeddings)
    embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.embed_text(answer))
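The interface dispatch and the empty-string handling can be exercised without a real model. The two embedding classes and embed_pair below are hypothetical stand-ins; following the snippet above, the legacy embed_text is treated as a coroutine and awaited.

```python
import asyncio
from typing import List


class ModernToyEmbedding:
    """Hypothetical object exposing the modern async method name, aembed_text."""

    async def aembed_text(self, text: str) -> List[float]:
        return [float(len(text)), 1.0]


class LegacyToyEmbeddings:
    """Hypothetical object exposing only an async embed_text, like the legacy path."""

    async def embed_text(self, text: str) -> List[float]:
        return [1.0, float(len(text))]


async def embed_pair(embeddings, reference: str, response: str):
    # Empty strings are replaced by a single space before embedding.
    reference = reference or " "
    response = response or " "
    if hasattr(embeddings, "aembed_text"):
        # Modern interface: selected whenever aembed_text exists
        return await embeddings.aembed_text(reference), await embeddings.aembed_text(response)
    # Legacy interface: embed_text is awaited
    return await embeddings.embed_text(reference), await embeddings.embed_text(response)


print(asyncio.run(embed_pair(ModernToyEmbedding(), "", "abc")))  # ([1.0, 1.0], [3.0, 1.0])
print(asyncio.run(embed_pair(LegacyToyEmbeddings(), "ab", "")))  # ([1.0, 2.0], [1.0, 1.0])
```

Because the check is a hasattr test, an object that exposes both methods always takes the modern path.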

Cosine Similarity

After obtaining embeddings, the vectors are normalized and their dot product is computed:

norms_1 = np.linalg.norm(embedding_1, keepdims=True)
norms_2 = np.linalg.norm(embedding_2, keepdims=True)
embedding_1_normalized = embedding_1 / norms_1
embedding_2_normalized = embedding_2 / norms_2
similarity = embedding_1_normalized @ embedding_2_normalized.T
score = similarity.flatten()
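The same normalize-then-dot computation can be written without NumPy, which makes each step visible. cosine_similarity is a hypothetical helper written for this page; it assumes neither vector is all zeros (here that would raise ZeroDivisionError, whereas NumPy would produce nan).

```python
import math
from typing import Sequence


def cosine_similarity(embedding_1: Sequence[float], embedding_2: Sequence[float]) -> float:
    """Normalize both vectors, then take their dot product (mirrors the NumPy steps)."""
    norm_1 = math.sqrt(sum(x * x for x in embedding_1))
    norm_2 = math.sqrt(sum(x * x for x in embedding_2))
    normalized_1 = [x / norm_1 for x in embedding_1]
    normalized_2 = [x / norm_2 for x in embedding_2]
    return sum(a * b for a, b in zip(normalized_1, normalized_2))


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))   # approximately 0.96
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
```

Opposite vectors give -1.0, the lower end of the cosine range.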

Usage Examples

Basic Usage

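from ragas.metrics import AnswerSimilarity
from ragas import evaluate
from datasets import Dataset

# Each row needs a "response" and a "reference" column
data = {
    "response": ["The sun is powered by nuclear fusion."],
    "reference": [
        "The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium."
    ],
}
dataset = Dataset.from_dict(data)

# No embedding model is passed here, so evaluate() falls back to its default one;
# pass embeddings=... to evaluate() to choose a specific model
results = evaluate(dataset, metrics=[AnswerSimilarity()])
print(results)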

With Threshold

from ragas.metrics import AnswerSimilarity

# Binary mode: score is 1 if similarity >= 0.7, else 0
similarity_binary = AnswerSimilarity(threshold=0.7)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
