
Implementation:Vibrantlabsai Ragas SemanticSimilarityV2

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

SemanticSimilarity is a class-based v2 metric that evaluates the semantic similarity between reference and response texts by computing the cosine similarity of their embedding vectors.

Description

The SemanticSimilarity metric measures how semantically close a generated response is to a reference text using vector embeddings. It inherits from BaseMetric and requires a BaseRagasEmbedding instance to function.

The algorithm is based on the Semantic Answer Similarity (SAS) approach described in this paper and works as follows:

  1. Both the reference and response texts are embedded using the provided embeddings model via the embed_text() method.
  2. The resulting embedding vectors are converted to NumPy arrays.
  3. Each embedding is L2-normalized (divided by its Euclidean norm).
  4. The cosine similarity is computed as the dot product of the two normalized vectors: embedding_1_normalized @ embedding_2_normalized.T.
  5. The resulting similarity value is flattened to a scalar.
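The five steps above can be sketched with plain NumPy. This is a minimal illustration, not the library's code: the function name `cosine_similarity` and the hard-coded vectors are assumptions made for the example, and in the real metric the vectors would come from the embeddings model.

```python
import numpy as np


def cosine_similarity(embedding_1, embedding_2) -> float:
    """Hypothetical helper mirroring steps 2-5 of the description."""
    # Step 2: convert to NumPy arrays, shaped as row vectors (1, d).
    a = np.asarray(embedding_1, dtype=float).reshape(1, -1)
    b = np.asarray(embedding_2, dtype=float).reshape(1, -1)

    # Step 3: L2-normalize each vector (divide by its Euclidean norm).
    # Note: an all-zero vector would produce NaN here.
    a_normalized = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_normalized = b / np.linalg.norm(b, axis=1, keepdims=True)

    # Step 4: dot product of the normalized vectors, shape (1, 1).
    similarity = a_normalized @ b_normalized.T

    # Step 5: flatten the (1, 1) result to a plain Python float.
    return float(similarity.flatten()[0])


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because both vectors are normalized first, the result depends only on their direction, not their magnitude, which is why the first pair scores as identical despite the different lengths.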

An optional threshold parameter enables binary classification: when set, any similarity score at or above the threshold returns 1.0 (similar), and any score below returns 0.0 (dissimilar). When threshold is None (the default), the raw cosine similarity value is returned.
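The threshold rule can be illustrated as a small standalone function. The name `apply_threshold` is hypothetical and only mirrors the behavior described above; it is a sketch, not the metric's internal code.

```python
from typing import Optional


def apply_threshold(score: float, threshold: Optional[float] = None) -> float:
    """Hypothetical helper: binarize a similarity score when a threshold is set."""
    if threshold is None:
        # Default behavior: return the raw cosine similarity unchanged.
        return score
    # Scores at or above the threshold count as similar.
    return 1.0 if score >= threshold else 0.0


raw = apply_threshold(0.42)             # no threshold, raw score is returned
at_boundary = apply_threshold(0.8, 0.8)  # boundary value counts as similar
below = apply_threshold(0.79, 0.8)       # below the threshold
```

Note that the comparison is inclusive, so a score exactly equal to the threshold is classified as similar.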

Empty or None inputs are replaced with a single space character to prevent embedding errors.
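A minimal sketch of that guard, assuming a hypothetical helper named `sanitize_text` (the actual implementation may inline this logic differently):

```python
from typing import Optional


def sanitize_text(text: Optional[str]) -> str:
    """Hypothetical helper: replace None or empty input with a single space."""
    # Both None and "" are falsy, so either one falls back to " ".
    return text if text else " "


from_none = sanitize_text(None)
from_empty = sanitize_text("")
unchanged = sanitize_text("Paris is the capital of France.")
```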

Usage

Use SemanticSimilarity when you need to evaluate whether a generated response captures the same meaning as a reference text, regardless of exact wording. This is useful for evaluating paraphrasing quality, answer correctness in question-answering systems, and general text generation fidelity. Unlike lexical metrics (BLEU, ROUGE), this metric captures semantic equivalence even when different words or phrasings are used. It requires an embeddings model to be configured.
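To see why lexical overlap can understate similarity for paraphrases, the sketch below computes a simple token-level Jaccard overlap. This is a deliberately crude stand-in for lexical metrics, not an implementation of BLEU or ROUGE, and the helper name `token_jaccard` is an assumption made for illustration.

```python
import re


def token_jaccard(text_a: str, text_b: str) -> float:
    """Hypothetical helper: share of distinct lowercase tokens the texts have in common."""
    tokens_a = set(re.findall(r"[a-z0-9']+", text_a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9']+", text_b.lower()))
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Two sentences with nearly the same meaning but few shared words.
overlap = token_jaccard(
    "The weather is sunny today.",
    "It is a bright and sunny day.",
)
```

Only "is" and "sunny" are shared out of ten distinct tokens, so the overlap is 0.2 even though the sentences are close paraphrases. An embedding-based score is intended to capture that closeness.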

Code Reference

Source Location

Signature

class SemanticSimilarity(BaseMetric):
    embeddings: "BaseRagasEmbedding"

    def __init__(
        self,
        embeddings: "BaseRagasEmbedding",
        name: str = "semantic_similarity",
        threshold: t.Optional[float] = None,
        **kwargs,
    ):

Import

from ragas.metrics.collections import SemanticSimilarity

I/O Contract

Inputs

Name | Type | Required | Description
embeddings | BaseRagasEmbedding | Yes | An embeddings model instance with an embed_text() method (validated at initialization)
reference | str | Yes | The reference/ground truth text
response | str | Yes | The response text to evaluate against the reference
threshold | float | No | Optional threshold for binary classification. When set, scores >= threshold return 1.0, otherwise 0.0 (default: None)

Outputs

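Name | Type | Description
result | MetricResult | A MetricResult object whose value attribute contains the cosine similarity score (or binary 0.0/1.0 if threshold is set). Scores typically fall between 0.0 and 1.0 for common embedding models, although cosine similarity itself ranges from -1.0 to 1.0.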

Usage Examples

Basic Usage

from openai import AsyncOpenAI
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import SemanticSimilarity

# Setup embeddings
client = AsyncOpenAI()
embeddings = embedding_factory(
    "openai",
    model="text-embedding-ada-002",
    client=client,
    interface="modern"
)

# Create metric instance
metric = SemanticSimilarity(embeddings=embeddings)

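# Evaluate semantic similarity.
# Top-level `await` assumes a notebook or other async context; in a plain
# script, wrap the call in an async function and run it with asyncio.run().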
result = await metric.ascore(
    reference="Paris is the capital of France.",
    response="The capital of France is Paris."
)
print(f"Semantic Similarity: {result.value}")

Binary Classification with Threshold

from ragas.metrics.collections import SemanticSimilarity

# Use threshold for binary pass/fail classification
# (reuses the `embeddings` instance created in Basic Usage)
metric = SemanticSimilarity(embeddings=embeddings, threshold=0.8)

result = await metric.ascore(
    reference="The weather is sunny today.",
    response="It is a bright and sunny day."
)
print(f"Similar (>= 0.8): {result.value}")  # 1.0 or 0.0

Batch Evaluation

from ragas.metrics.collections import SemanticSimilarity

# Reuses the `embeddings` instance created in Basic Usage
metric = SemanticSimilarity(embeddings=embeddings)

results = await metric.abatch_score([
    {"reference": "Machine learning is a subset of AI.",
     "response": "ML is part of artificial intelligence."},
    {"reference": "The sky is blue.",
     "response": "Water boils at 100 degrees Celsius."},
])

for i, result in enumerate(results):
    print(f"Sample {i}: Similarity = {result.value}")
