Implementation:Vibrantlabsai Ragas SemanticSimilarityV2
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
SemanticSimilarity is a class-based v2 metric that evaluates the semantic similarity between reference and response texts by computing the cosine similarity of their embedding vectors.
Description
The SemanticSimilarity metric measures how semantically close a generated response is to a reference text using vector embeddings. It inherits from BaseMetric and requires a BaseRagasEmbedding instance to function.
The algorithm is based on the Semantic Answer Similarity (SAS) approach described in this paper and works as follows:
- Both the reference and response texts are embedded using the provided embeddings model via the embed_text() method.
- The resulting embedding vectors are converted to NumPy arrays.
- Each embedding is L2-normalized (divided by its Euclidean norm).
- The cosine similarity is computed as the dot product of the two normalized vectors: embedding_1_normalized @ embedding_2_normalized.T.
- The resulting similarity value is flattened to a scalar.
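The normalization and dot-product steps above can be sketched with plain NumPy. This is a minimal illustration rather than the library's code: the helper name cosine_similarity and the toy 3-dimensional vectors are assumptions made for the example, and a real embeddings model would supply the vectors.

```python
import numpy as np


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two 1-D embedding vectors (hypothetical helper)."""
    # Convert the raw embeddings to NumPy arrays.
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    # L2-normalize each vector (divide by its Euclidean norm).
    a_normalized = a / np.linalg.norm(a)
    b_normalized = b / np.linalg.norm(b)
    # Dot product of the unit vectors, then flatten to a plain Python float.
    similarity = a_normalized @ b_normalized.T
    return float(np.asarray(similarity).flatten()[0])


# Toy vectors standing in for embeddings; real models return far longer vectors.
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(parallel, orthogonal)
```

Vectors pointing in the same direction score close to 1.0 regardless of their length, while orthogonal vectors score 0.0. This is why the normalization step matters: only direction, not magnitude, affects the result.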
An optional threshold parameter enables binary classification: when set, any similarity score at or above the threshold returns 1.0 (similar), and any score below returns 0.0 (dissimilar). When threshold is None (the default), the raw cosine similarity value is returned.
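The thresholding rule can be illustrated with a small standalone function. The name apply_threshold is hypothetical and not part of the library. It mirrors only the behavior described above, under the assumption that the comparison is inclusive (at or above the threshold).

```python
from typing import Optional


def apply_threshold(similarity: float, threshold: Optional[float] = None) -> float:
    """Return the raw score, or a binary 1.0/0.0 when a threshold is given."""
    if threshold is None:
        # No threshold configured: pass the raw cosine similarity through.
        return float(similarity)
    # Inclusive comparison: a score exactly at the threshold counts as similar.
    return 1.0 if similarity >= threshold else 0.0


print(apply_threshold(0.85, threshold=0.8))
print(apply_threshold(0.79, threshold=0.8))
print(apply_threshold(0.42))
```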
Empty or None inputs are replaced with a single space character to prevent embedding errors.
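A minimal sketch of this guard, assuming it amounts to a simple falsy check. The helper name sanitize_text is hypothetical, and the library may implement the replacement differently.

```python
from typing import Optional


def sanitize_text(text: Optional[str]) -> str:
    """Return a single space for None or empty input; otherwise the text unchanged."""
    # None and "" are both falsy, so both fall back to " ".
    return text if text else " "


print(repr(sanitize_text(None)))
print(repr(sanitize_text("")))
print(repr(sanitize_text("Paris")))
```

In this sketch, whitespace-only strings are passed through unchanged, since they are not empty.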
Usage
Use SemanticSimilarity when you need to evaluate whether a generated response captures the same meaning as a reference text, regardless of exact wording. This is useful for evaluating paraphrasing quality, answer correctness in question-answering systems, and general text generation fidelity. Unlike lexical metrics (BLEU, ROUGE), this metric captures semantic equivalence even when different words or phrasings are used. It requires an embeddings model to be configured.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/collections/_semantic_similarity.py
Signature
class SemanticSimilarity(BaseMetric):
    embeddings: "BaseRagasEmbedding"

    def __init__(
        self,
        embeddings: "BaseRagasEmbedding",
        name: str = "semantic_similarity",
        threshold: t.Optional[float] = None,
        **kwargs,
    ):
Import
from ragas.metrics.collections import SemanticSimilarity
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| embeddings | BaseRagasEmbedding | Yes | An embeddings model instance with an embed_text() method (validated at initialization) |
| reference | str | Yes | The reference/ground truth text |
| response | str | Yes | The response text to evaluate against the reference |
| threshold | float | No | Optional threshold for binary classification. When set, scores >= threshold return 1.0, otherwise 0.0 (default: None) |
Outputs
| Name | Type | Description |
|---|---|---|
| result | MetricResult | A MetricResult object with a value attribute containing the cosine similarity score between 0.0 and 1.0 (or binary 0.0/1.0 if threshold is set) |
Usage Examples
Basic Usage
import asyncio

from openai import AsyncOpenAI
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import SemanticSimilarity

# Set up the embeddings model (AsyncOpenAI reads OPENAI_API_KEY from the environment)
client = AsyncOpenAI()
embeddings = embedding_factory(
    "openai",
    model="text-embedding-ada-002",
    client=client,
    interface="modern",
)

# Create the metric instance
metric = SemanticSimilarity(embeddings=embeddings)


async def main():
    # Evaluate semantic similarity between a reference and a response
    result = await metric.ascore(
        reference="Paris is the capital of France.",
        response="The capital of France is Paris.",
    )
    print(f"Semantic Similarity: {result.value}")


# ascore is a coroutine, so it needs an event loop when run as a script
asyncio.run(main())
Binary Classification with Threshold
import asyncio
from ragas.metrics.collections import SemanticSimilarity

# Binary pass/fail classification; `embeddings` is configured as in Basic Usage
metric = SemanticSimilarity(embeddings=embeddings, threshold=0.8)

async def main():
    result = await metric.ascore(
        reference="The weather is sunny today.",
        response="It is a bright and sunny day.",
    )
    print(f"Similar (>= 0.8): {result.value}")  # 1.0 or 0.0

asyncio.run(main())
Batch Evaluation
import asyncio
from ragas.metrics.collections import SemanticSimilarity

# `embeddings` is configured as in Basic Usage
metric = SemanticSimilarity(embeddings=embeddings)

async def main():
    # Each dict supplies the keyword arguments for one sample
    results = await metric.abatch_score([
        {"reference": "Machine learning is a subset of AI.",
         "response": "ML is part of artificial intelligence."},
        {"reference": "The sky is blue.",
         "response": "Water boils at 100 degrees Celsius."},
    ])
    for i, result in enumerate(results):
        print(f"Sample {i}: Similarity = {result.value}")

asyncio.run(main())