Implementation:Evidentlyai Evidently Legacy Semantic Similarity Feature

From Leeroopedia
Knowledge Sources
Domains: ML Monitoring, NLP, Sentence Embeddings
Last Updated: 2026-02-14 12:00 GMT

Overview

Provides a generated feature that computes the semantic similarity between two text columns using sentence transformer embeddings and a cosine similarity rescaled to the range [0, 1].

Description

The SemanticSimilarityFeature class extends GeneratedFeature to produce a numerical feature representing the semantic similarity between two text columns. The similarity is computed using the following steps:

  1. Both columns are encoded into dense vector embeddings using a SentenceTransformer model (default: "all-MiniLM-L6-v2").
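  2. For each row, a normalized cosine score is computed between the two embedding vectors using the formula 1 - ((1 - cosine_similarity) / 2), which simplifies to (1 + cosine_similarity) / 2. This rescales cosine similarity from [-1, 1] to [0, 1], so higher values mean greater similarity: 1 indicates embeddings pointing in the same direction (closest meaning), 0.5 indicates orthogonal embeddings, and 0 indicates opposite embeddings (maximum dissimilarity).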

NaN values in the text columns are filled with empty strings before encoding. The feature type is ColumnType.Numerical.
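The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the names normalized_cosine_score, fill_missing, semantic_similarity, and toy_encode are hypothetical, the encoder is passed in as a callable so the sketch runs without downloading a model, and raising on a zero vector is a choice made here rather than documented behavior.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def normalized_cosine_score(left: Vector, right: Vector) -> float:
    """Rescale cosine similarity from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(left, right))
    norm_left = math.sqrt(sum(a * a for a in left))
    norm_right = math.sqrt(sum(b * b for b in right))
    if norm_left == 0.0 or norm_right == 0.0:
        # Assumption of this sketch: fail loudly instead of returning NaN.
        raise ValueError("cosine similarity is undefined for a zero vector")
    cosine_similarity = dot / (norm_left * norm_right)
    # Formula from the description above; equal to (1 + cosine_similarity) / 2.
    return 1 - ((1 - cosine_similarity) / 2)


def fill_missing(texts: Sequence[Optional[str]]) -> List[str]:
    """Replace None and float NaN entries with empty strings before encoding."""
    return [
        "" if t is None or (isinstance(t, float) and math.isnan(t)) else t
        for t in texts
    ]


def semantic_similarity(
    left_texts: Sequence[Optional[str]],
    right_texts: Sequence[Optional[str]],
    encode: Callable[[List[str]], Sequence[Vector]],
) -> List[float]:
    """Encode both columns, then score each row pair."""
    left_vectors = encode(fill_missing(left_texts))
    right_vectors = encode(fill_missing(right_texts))
    return [
        normalized_cosine_score(left, right)
        for left, right in zip(left_vectors, right_vectors)
    ]


def toy_encode(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a sentence embedding model. The constant first
    # component keeps the vector for an empty string from being all zeros.
    return [[1.0, float(t.count("a")), float(t.count("b"))] for t in texts]


scores = semantic_similarity(["ab", None], ["ab", "b"], toy_encode)
```

With a real model, the encode argument could be the encode method of a SentenceTransformer instance loaded with the configured model name; toy_encode exists only to keep the sketch self-contained.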

The feature column name is the two column names joined by a pipe character (|), and the default display name follows the pattern "Semantic Similarity for {col1} {col2}.".
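The naming rules above can be illustrated with two small helpers. These are hypothetical functions written for this page, assuming the column name and display name are built exactly as described; they are not the class's own methods.

```python
from typing import List


def feature_column_name(columns: List[str]) -> str:
    # Column names joined by a pipe character, as described above.
    return "|".join(columns)


def default_display_name(columns: List[str]) -> str:
    # Follows the pattern "Semantic Similarity for {col1} {col2}."
    return f"Semantic Similarity for {' '.join(columns)}."


column_name = feature_column_name(["question", "answer"])
display_name = default_display_name(["question", "answer"])
```

Because the pipe is the separator, source column names that themselves contain a pipe would make the generated name ambiguous to split back apart.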

Usage

Use this feature to measure how semantically similar two pieces of text are. Common use cases include comparing questions and answers, comparing original and paraphrased text, measuring response relevance to input queries, or detecting semantic drift between reference and production text.

Code Reference

Source Location

Signature

class SemanticSimilarityFeature(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:SemanticSimilarityFeature"

    __feature_type__: ClassVar = ColumnType.Numerical
    columns: List[str]
    model: str = "all-MiniLM-L6-v2"

    def generate_feature(self, data: pd.DataFrame, data_definition: DataDefinition) -> pd.DataFrame: ...
    def _feature_name(self): ...
    def _as_column(self) -> ColumnName: ...

Import

from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature

I/O Contract

Inputs

Name | Type | Required | Description
columns | List[str] | Yes | A list of exactly two column names containing the text to compare
model | str | No | SentenceTransformer model name (default: "all-MiniLM-L6-v2")

Outputs

Name | Type | Description
return | pd.DataFrame | A single-column DataFrame with float values in [0, 1] representing normalized cosine similarity between the two text columns

Usage Examples

from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature

# Compare question and answer similarity with default model
similarity_feature = SemanticSimilarityFeature(
    columns=["question", "answer"]
)

# Use a different sentence transformer model
similarity_feature = SemanticSimilarityFeature(
    columns=["original_text", "generated_text"],
    model="all-mpnet-base-v2"
)
