Implementation:Vibrantlabsai Ragas AnswerSimilarity

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

AnswerSimilarity measures the semantic similarity between a generated response and a reference answer using embedding-based cosine similarity or cross-encoder scoring.

Description

The AnswerSimilarity metric (which extends SemanticSimilarity) evaluates how semantically close a generated response is to a reference answer. It works by computing cosine similarity between the embedding vectors of the two texts.

The base class SemanticSimilarity supports two modes of operation:

  • Standard embeddings mode: Both the reference and response are embedded independently, and the cosine similarity between the resulting vectors is computed. The implementation normalizes each embedding vector and computes their dot product. It supports both modern (BaseRagasEmbedding with aembed_text) and legacy (BaseRagasEmbeddings with embed_text) embedding interfaces.
  • Cross-encoder mode: When HuggingfaceEmbeddings are provided and the model is a cross-encoder, a different scoring approach is used. However, the async path (ascore) raises NotImplementedError for cross-encoder models, meaning cross-encoders are only supported through the synchronous interface.

An optional threshold parameter can be set to convert the continuous similarity score into a binary result (1 if similarity >= threshold, 0 otherwise).
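
The thresholding rule can be sketched in a few lines. The helper name `apply_threshold` is hypothetical and not part of Ragas; it only illustrates the comparison described above, and it assumes the binary result is returned as a float.

```python
from typing import Optional


def apply_threshold(score: float, threshold: Optional[float] = None) -> float:
    """Return the raw score, or 1.0/0.0 when a threshold is given.

    Hypothetical helper for illustration; not part of the Ragas API.
    """
    if threshold is None:
        return score  # continuous mode: pass the similarity through unchanged
    return 1.0 if score >= threshold else 0.0  # binary mode


print(apply_threshold(0.82))                 # continuous: 0.82
print(apply_threshold(0.82, threshold=0.7))  # binary: 1.0
print(apply_threshold(0.65, threshold=0.7))  # binary: 0.0
```

Note that a score exactly equal to the threshold maps to 1, because the comparison is inclusive.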

The metric is based on the Semantic Answer Similarity (SAS) paper: arxiv.org/pdf/2108.06130.pdf.

AnswerSimilarity is a thin subclass of SemanticSimilarity that simply sets the default metric name to "answer_similarity".

Usage

Use this metric when you want a quick, embedding-based comparison between a generated answer and a reference answer. It is useful as a standalone metric or as a component in composite metrics (e.g., AnswerCorrectness uses it as its semantic similarity component). It does not require an LLM, only an embedding model.

Code Reference

Source Location

Signature

@dataclass
class SemanticSimilarity(MetricWithEmbeddings, SingleTurnMetric):
    name: str = "semantic_similarity"
    output_type = MetricOutputType.CONTINUOUS
    is_cross_encoder: bool = False
    threshold: t.Optional[float] = None

@dataclass
class AnswerSimilarity(SemanticSimilarity):
    name: str = "answer_similarity"

Import

from ragas.metrics import AnswerSimilarity

I/O Contract

Inputs

Name | Type | Required | Description
reference | str | Yes | The ground truth or reference answer
response | str | Yes | The generated answer to compare against the reference
threshold | float | No | If set, converts the continuous score to binary (1 if score >= threshold, 0 otherwise)

Outputs

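Name | Type | Description
score | float | Cosine similarity between the reference and response embeddings, typically between 0.0 and 1.0 for common embedding models, although cosine similarity can in principle be as low as -1.0 (or binary 0/1 if a threshold is set)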

Internal Components

Embedding Computation

The metric handles empty strings by replacing them with a single space to avoid embedding errors. It supports two embedding API interfaces:

# Modern interface (BaseRagasEmbedding)
if hasattr(self.embeddings, "aembed_text"):
    embedding_1 = np.array(await self.embeddings.aembed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.aembed_text(answer))
else:
    # Legacy interface (BaseRagasEmbeddings)
    embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.embed_text(answer))
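
The dispatch and the empty-string handling can be tried in isolation with stand-in objects. This is a minimal sketch: `ModernStub`, `LegacyStub`, and the `embed` helper are hypothetical names invented for illustration, and the stubs return toy vectors rather than real embeddings.

```python
import asyncio
from typing import List


class ModernStub:
    """Hypothetical stand-in exposing the modern `aembed_text` coroutine."""

    async def aembed_text(self, text: str) -> List[float]:
        return [float(len(text)), 1.0]  # toy vector, not a real embedding


class LegacyStub:
    """Hypothetical stand-in exposing only the legacy `embed_text` coroutine."""

    async def embed_text(self, text: str) -> List[float]:
        return [float(len(text)), 0.0]  # toy vector, not a real embedding


async def embed(embeddings, text: str) -> List[float]:
    text = text or " "  # replace an empty string with a single space
    if hasattr(embeddings, "aembed_text"):
        return await embeddings.aembed_text(text)  # modern interface
    return await embeddings.embed_text(text)  # legacy interface


modern_vec = asyncio.run(embed(ModernStub(), "fusion"))
legacy_vec = asyncio.run(embed(LegacyStub(), ""))
print(modern_vec)  # [6.0, 1.0]
print(legacy_vec)  # [1.0, 0.0]
```

Checking for the method with `hasattr` lets one code path serve both interfaces without requiring a shared base class.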

Cosine Similarity

After obtaining embeddings, the vectors are normalized and their dot product is computed:

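# L2 norm of each embedding (keepdims=True returns an array rather than a scalar)
norms_1 = np.linalg.norm(embedding_1, keepdims=True)
norms_2 = np.linalg.norm(embedding_2, keepdims=True)
# Scale each embedding to unit length
embedding_1_normalized = embedding_1 / norms_1
embedding_2_normalized = embedding_2 / norms_2
# The dot product of two unit vectors is their cosine similarity
similarity = embedding_1_normalized @ embedding_2_normalized.T
score = similarity.flatten()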
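
As a worked illustration of the same normalize-then-dot-product computation, the following sketch uses only the Python standard library instead of NumPy. The function name `cosine_similarity` and the toy vectors are assumptions made for this example; they are not taken from the Ragas source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalize both vectors to unit length, then take their dot product."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    a_unit = [x / norm_a for x in a]
    b_unit = [x / norm_b for x in b]
    return sum(x * y for x, y in zip(a_unit, b_unit))


same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # unrelated directions
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])       # opposing directions

print(round(same_direction, 6))  # 1.0
print(orthogonal)                # 0.0
print(opposite)                  # -1.0
```

The third case shows that cosine similarity can be negative in principle, even though text embeddings rarely produce such values. A zero vector would cause a division by zero here; the sketch does not guard against it.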

Usage Examples

Basic Usage

from ragas.metrics import AnswerSimilarity
from ragas import evaluate
from datasets import Dataset

data = {
    "response": ["The sun is powered by nuclear fusion."],
    "reference": [
        "The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium."
    ],
}
dataset = Dataset.from_dict(data)

results = evaluate(dataset, metrics=[AnswerSimilarity()])
print(results)

With Threshold

from ragas.metrics import AnswerSimilarity

# Binary mode: score is 1 if similarity >= 0.7, else 0
similarity_binary = AnswerSimilarity(threshold=0.7)
