Implementation:Run llama Llama index SemanticSimilarityEvaluator

Knowledge Sources	Run_llama_Llama_index
Domains	Evaluation, Similarity
Last Updated	2026-02-11 19:00 GMT

Overview

Evaluates the quality of a generated response by computing the embedding similarity between the response and a reference answer, without requiring an LLM judge.

Description

The SemanticSimilarityEvaluator is a concrete implementation of BaseEvaluator that measures response quality by comparing the semantic similarity of the generated response to a known reference answer. It is inspired by the paper "Semantic Answer Similarity for Evaluating Question Answering Models" (https://arxiv.org/pdf/2108.06130.pdf).

The evaluator works as follows:

It embeds both the response and reference strings using the configured embedding model (BaseEmbedding).
It computes a similarity score between the two embedding vectors using a configurable similarity function.
It marks the result as passing when the similarity score is greater than or equal to similarity_threshold.
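The three steps can be sketched in plain Python. This is a minimal, offline illustration and not the library's code: SketchResult stands in for EvaluationResult, toy_embed is a hypothetical letter-count "embedding" used in place of a real BaseEmbedding model, and the zero-vector guard in cosine_similarity is an assumption added for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SketchResult:
    """Hypothetical stand-in for EvaluationResult (score, passing, feedback)."""

    score: float
    passing: bool
    feedback: str


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 for a zero vector (guard added in this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> List[float]:
    """Hypothetical letter-count embedding; a real setup would call an embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def evaluate_similarity(
    response: str,
    reference: str,
    embed: Callable[[str], List[float]] = toy_embed,
    similarity_threshold: float = 0.8,
) -> SketchResult:
    # Step 1: embed both strings with the same embedding function.
    response_vec = embed(response)
    reference_vec = embed(reference)
    # Step 2: compare the two vectors.
    score = cosine_similarity(response_vec, reference_vec)
    # Step 3: pass when the score meets or exceeds the threshold.
    passing = score >= similarity_threshold
    return SketchResult(score, passing, f"Similarity score: {score}")
```

For example, evaluate_similarity("abc", "abc") scores about 1.0 and passes, while evaluate_similarity("aaa", "zzz") shares no letters, scores 0.0, and fails.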

The similarity function can be configured in two ways:

By providing a similarity_mode (a SimilarityMode enum value), which selects how the built-in similarity function compares the two vectors. If neither option is given, SimilarityMode.DEFAULT (cosine similarity) is used.
By providing a custom similarity_fn callable that accepts two embedding vectors and returns a float. similarity_mode and similarity_fn are mutually exclusive: passing both raises a ValueError.
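The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: SketchSimilarityMode is a hypothetical mirror of SimilarityMode whose members (cosine, dot product, negated Euclidean distance) are assumed, and rescaled_cosine is a hypothetical example of a custom similarity_fn.

```python
import math
from enum import Enum
from typing import Callable, List, Optional

Vector = List[float]
SimilarityFn = Callable[[Vector, Vector], float]


class SketchSimilarityMode(str, Enum):
    """Hypothetical mirror of SimilarityMode; DEFAULT is assumed to mean cosine."""

    DEFAULT = "cosine"
    DOT_PRODUCT = "dot_product"
    EUCLIDEAN = "euclidean"


def builtin_similarity(a: Vector, b: Vector, mode: SketchSimilarityMode) -> float:
    """Score two vectors under the given mode; higher always means more similar."""
    if mode is SketchSimilarityMode.DOT_PRODUCT:
        return float(sum(x * y for x, y in zip(a, b)))
    if mode is SketchSimilarityMode.EUCLIDEAN:
        # Negated distance, so identical vectors score 0.0 and others score below it.
        return -math.dist(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def resolve_similarity_fn(
    similarity_fn: Optional[SimilarityFn] = None,
    similarity_mode: Optional[SketchSimilarityMode] = None,
) -> SimilarityFn:
    """Pick the scoring function; the two options are mutually exclusive."""
    if similarity_fn is None:
        mode = similarity_mode or SketchSimilarityMode.DEFAULT
        return lambda a, b: builtin_similarity(a, b, mode)
    if similarity_mode is not None:
        raise ValueError("Cannot specify both similarity_fn and similarity_mode")
    return similarity_fn


def rescaled_cosine(a: Vector, b: Vector) -> float:
    """Example custom similarity_fn: cosine mapped from [-1, 1] to [0, 1]."""
    return (builtin_similarity(a, b, SketchSimilarityMode.DEFAULT) + 1.0) / 2.0
```

In this sketch only cosine is bounded to [-1, 1]; dot-product and negated Euclidean scores live on other scales, so a threshold such as 0.8 would need to be rechosen for those modes.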

Unlike LLM-based evaluators, this evaluator does not require an LLM and has no prompts (_get_prompts returns an empty dict). It ignores the query and contexts parameters, requiring only response and reference.
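The input contract can be illustrated with a small hypothetical helper, select_inputs, which is not part of LlamaIndex. It accepts the same keyword arguments as aevaluate, discards query, contexts, and extra keywords, and assumes (for this sketch) that a missing response or reference is rejected with a ValueError.

```python
from typing import Any, Optional, Sequence, Tuple


def select_inputs(
    query: Optional[str] = None,
    response: Optional[str] = None,
    contexts: Optional[Sequence[str]] = None,
    reference: Optional[str] = None,
    **kwargs: Any,
) -> Tuple[str, str]:
    """Hypothetical helper: keep only the two inputs that similarity scoring needs."""
    # query, contexts, and extra keyword arguments are accepted for interface
    # compatibility with other evaluators, then discarded.
    del query, contexts, kwargs
    if response is None or reference is None:
        raise ValueError("Must specify both response and reference")
    return response, reference
```

Keeping the full BaseEvaluator-style signature lets this evaluator be swapped into pipelines that always pass query and contexts, even though it never reads them.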

Usage

Use this evaluator when you need a fast, deterministic, and cost-effective way to evaluate response quality against reference answers. It is particularly useful for regression testing, large-scale evaluation benchmarks where LLM judge calls would be prohibitively expensive, or as a complementary metric alongside LLM-based evaluators.
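For regression testing, a common pattern is to score a fixed set of (response, reference) pairs and track the pass rate. The sketch below is hypothetical: RegressionReport, run_regression, and exact_match are illustrative names, and exact_match is a deliberately simple stand-in scorer (not embedding similarity) so the example runs offline. The commented line shows how the real evaluator might be plugged in, assuming the synchronous evaluate() wrapper forwards reference= to aevaluate.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RegressionReport:
    """Hypothetical summary of a batch run."""

    scores: List[float]
    pass_rate: float


def run_regression(
    pairs: Sequence[Tuple[str, str]],
    score_pair: Callable[[str, str], float],
    similarity_threshold: float = 0.8,
) -> RegressionReport:
    """Score each (response, reference) pair and report the fraction that pass."""
    scores = [score_pair(response, reference) for response, reference in pairs]
    passed = sum(1 for score in scores if score >= similarity_threshold)
    pass_rate = passed / len(scores) if scores else 0.0
    return RegressionReport(scores, pass_rate)


# With LlamaIndex, score_pair could wrap the evaluator (assumption):
#   score_pair = lambda resp, ref: evaluator.evaluate(response=resp, reference=ref).score
# Here a hypothetical exact-match scorer stands in so the sketch runs offline.
def exact_match(response: str, reference: str) -> float:
    return 1.0 if response.strip() == reference.strip() else 0.0


report = run_regression(
    [("Paris", "Paris"), ("Lyon", "Paris")],
    score_pair=exact_match,
)
```

Here one of the two pairs passes, so report.pass_rate is 0.5; a CI job could fail whenever the pass rate drops below a chosen floor.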

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/evaluation/semantic_similarity.py

Signature

class SemanticSimilarityEvaluator(BaseEvaluator):
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        similarity_fn: Optional[Callable[..., float]] = None,
        similarity_mode: Optional[SimilarityMode] = None,
        similarity_threshold: float = 0.8,
    ) -> None: ...

    async def aevaluate(
        self,
        query: Optional[str] = None,
        response: Optional[str] = None,
        contexts: Optional[Sequence[str]] = None,
        reference: Optional[str] = None,
        **kwargs: Any,
    ) -> EvaluationResult: ...

Import

from llama_index.core.evaluation.semantic_similarity import SemanticSimilarityEvaluator

I/O Contract

Inputs

Name	Type	Required	Description
embed_model	Optional[BaseEmbedding]	No	The embedding model to use. Defaults to Settings.embed_model.
similarity_fn	Optional[Callable[..., float]]	No	Custom similarity function. Mutually exclusive with similarity_mode.
similarity_mode	Optional[SimilarityMode]	No	Similarity computation mode (SimilarityMode.DEFAULT is cosine; dot product and Euclidean modes also exist). Mutually exclusive with similarity_fn. Defaults to SimilarityMode.DEFAULT.
similarity_threshold	float	No	Minimum similarity score to pass. Defaults to 0.8.
response	str	Yes (aevaluate)	The generated response to evaluate.
reference	str	Yes (aevaluate)	The reference answer to compare against.

Outputs

Name	Type	Description
result	EvaluationResult	Contains the similarity score (float), passing (bool based on threshold), and feedback string with the similarity score.

Usage Examples

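import asyncio

from llama_index.core.evaluation.semantic_similarity import SemanticSimilarityEvaluator

# Create the evaluator; embed_model falls back to Settings.embed_model,
# so configure an embedding model (or its API key) before running
evaluator = SemanticSimilarityEvaluator(
    similarity_threshold=0.8,
)


async def main() -> None:
    # Evaluate response against reference (query and contexts are not needed)
    result = await evaluator.aevaluate(
        response="Paris is the capital of France.",
        reference="The capital city of France is Paris.",
    )

    print(f"Score: {result.score}")       # e.g., 0.95
    print(f"Passing: {result.passing}")    # True when score >= 0.8
    print(f"Feedback: {result.feedback}")  # e.g., "Similarity score: 0.95"


# aevaluate is a coroutine, so run it in an event loop
# (inside a notebook, `await main()` works instead)
asyncio.run(main())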

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
