Implementation:Evidentlyai Evidently Legacy Semantic Similarity Feature

Knowledge Sources	Evidentlyai_Evidently
Domains	ML Monitoring, NLP, Sentence Embeddings
Last Updated	2026-02-14 12:00 GMT

Overview

Provides a generated feature that computes the semantic similarity between two text columns using sentence transformer embeddings and a cosine similarity rescaled to the range [0, 1].

Description

The SemanticSimilarityFeature class extends GeneratedFeature to produce a numerical feature representing the semantic similarity between two text columns. The similarity is computed using the following steps:

Both columns are encoded into dense vector embeddings using a SentenceTransformer model (default: "all-MiniLM-L6-v2").
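For each row, a normalized cosine similarity is computed between the two embedding vectors using the formula 1 - ((1 - cosine_similarity) / 2), which is algebraically equal to (1 + cosine_similarity) / 2. This maps the cosine similarity from [-1, 1] to [0, 1], where 1 means the two embeddings point in the same direction (closest in meaning) and 0 means they point in opposite directions (maximum dissimilarity).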
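
The scoring formula above can be sketched in plain Python. This is a minimal illustration rather than the library's own code; the function name normalized_cosine_similarity and the toy two-dimensional vectors are assumptions chosen for the example.

```python
import math


def normalized_cosine_similarity(left, right):
    """Rescale cosine similarity from [-1, 1] to [0, 1] (illustrative helper)."""
    dot = sum(a * b for a, b in zip(left, right))
    norm_left = math.sqrt(sum(a * a for a in left))
    norm_right = math.sqrt(sum(b * b for b in right))
    cosine_similarity = dot / (norm_left * norm_right)
    # Same expression as in the text; algebraically equal to (1 + cos) / 2.
    return 1 - ((1 - cosine_similarity) / 2)


# Toy vectors standing in for sentence embeddings.
same = normalized_cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = normalized_cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = normalized_cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

With these inputs, vectors pointing the same way score 1, orthogonal vectors score 0.5, and opposite vectors score 0. A zero-length vector would cause a division by zero in this sketch, so it assumes every embedding has a non-zero norm.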

NaN values in the text columns are filled with empty strings before encoding. The feature type is ColumnType.Numerical.

The feature column name is the two column names joined by a pipe character (|), and the default display name follows the pattern "Semantic Similarity for {col1} {col2}.".
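
To make the data flow concrete, the sketch below mirrors the behavior described above: fill NaN with empty strings, encode both columns, score each row, and return a single column named with the pipe-joined column names. It is an illustration under stated assumptions, not the Evidently implementation. The functions toy_encode and similarity_frame are hypothetical, and toy_encode is a deliberately crude stand-in for a SentenceTransformer model so that the example runs without downloading one.

```python
import numpy as np
import pandas as pd


def toy_encode(texts):
    """Hypothetical stand-in for a sentence encoder: one 2-d vector per text."""
    # The +1 offsets keep the vector non-zero even for an empty string.
    return np.array([[len(t) + 1.0, t.count(" ") + 1.0] for t in texts])


def similarity_frame(data, columns, encode=toy_encode):
    """Fill NaN with "", encode both columns, and score row by row."""
    left = encode(data[columns[0]].fillna(""))
    right = encode(data[columns[1]].fillna(""))
    scores = [
        1 - (1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))) / 2
        for x, y in zip(left, right)
    ]
    # Reuse the input index so the result aligns with the source rows.
    return pd.DataFrame({"|".join(columns): pd.Series(scores, index=data.index)})


df = pd.DataFrame(
    {"question": ["what is drift", None], "answer": ["what is drift", "no idea"]},
    index=[10, 11],
)
result = similarity_frame(df, ["question", "answer"])
```

Here result has one column, "question|answer", and keeps the index [10, 11] of the input. The missing question in the second row is treated as an empty string instead of raising an error. Scores from the toy encoder carry no semantic meaning; only the shape of the pipeline is meant to carry over.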

Usage

Use this feature to measure how semantically similar two pieces of text are. Common use cases include comparing questions and answers, comparing original and paraphrased text, measuring response relevance to input queries, or detecting semantic drift between reference and production text.

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/legacy/features/semantic_similarity_feature.py

Signature

class SemanticSimilarityFeature(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:SemanticSimilarityFeature"

    __feature_type__: ClassVar = ColumnType.Numerical
    columns: List[str]
    model: str = "all-MiniLM-L6-v2"

    def generate_feature(self, data: pd.DataFrame, data_definition: DataDefinition) -> pd.DataFrame: ...
    def _feature_name(self): ...
    def _as_column(self) -> ColumnName: ...

Import

from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature

I/O Contract

Inputs

Name	Type	Required	Description
columns	List[str]	Yes	A list of exactly two column names containing the text to compare
model	str	No	SentenceTransformer model name (default: "all-MiniLM-L6-v2")

Outputs

Name	Type	Description
return	pd.DataFrame	A single-column DataFrame with float values in [0, 1] representing normalized cosine similarity between the two text columns

Usage Examples

from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature

# Compare question and answer similarity with default model
similarity_feature = SemanticSimilarityFeature(
    columns=["question", "answer"]
)

# Use a different sentence transformer model
similarity_feature = SemanticSimilarityFeature(
    columns=["original_text", "generated_text"],
    model="all-mpnet-base-v2"
)
