Implementation:Evidentlyai Evidently Legacy Semantic Similarity Feature
| Knowledge Sources | |
|---|---|
| Domains | ML Monitoring, NLP, Sentence Embeddings |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Provides a generated feature that computes the semantic similarity between two text columns using sentence transformer embeddings and a cosine similarity score rescaled to the range [0, 1].
Description
The SemanticSimilarityFeature class extends GeneratedFeature to produce a numerical feature representing the semantic similarity between two text columns. The similarity is computed using the following steps:
- Both columns are encoded into dense vector embeddings using a SentenceTransformer model (default: "all-MiniLM-L6-v2").
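- For each row, a normalized cosine score is computed between the two embedding vectors using the formula 1 - ((1 - cosine_similarity) / 2), which is equivalent to (1 + cosine_similarity) / 2. This maps cosine similarity from [-1, 1] to [0, 1]: a value of 1 means the two embeddings point in the same direction (the texts are treated as identical in meaning), 0.5 means they are orthogonal, and 0 means they point in opposite directions (maximum dissimilarity).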
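The per-row formula above can be illustrated with a minimal sketch. It is written in plain Python for clarity and is not the library's implementation; the function name `normalized_cosine_score` is hypothetical, and the short vectors stand in for real sentence embeddings.

```python
import math
from typing import Sequence


def normalized_cosine_score(left: Sequence[float], right: Sequence[float]) -> float:
    """Rescale cosine similarity from [-1, 1] to [0, 1].

    Hypothetical helper for illustration only; it applies the formula
    1 - ((1 - cosine_similarity) / 2) described in the text.
    """
    dot = sum(a * b for a, b in zip(left, right))
    norm_left = math.sqrt(sum(a * a for a in left))
    norm_right = math.sqrt(sum(b * b for b in right))
    # Note: a zero-length vector would cause a division by zero here.
    cosine_similarity = dot / (norm_left * norm_right)
    return 1 - ((1 - cosine_similarity) / 2)


# Toy 2-dimensional "embeddings" (real embeddings have hundreds of dimensions).
same_direction = normalized_cosine_score([1.0, 0.0], [1.0, 0.0])
orthogonal = normalized_cosine_score([1.0, 0.0], [0.0, 1.0])
opposite = normalized_cosine_score([1.0, 0.0], [-1.0, 0.0])
```

In this sketch, vectors pointing the same way score 1, orthogonal vectors score 0.5, and opposite vectors score 0, which matches the interpretation of the output range given above.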
NaN values in the text columns are filled with empty strings before encoding. The feature type is ColumnType.Numerical.
The feature column name is the two column names joined by a pipe character (|), and the default display name follows the pattern "Semantic Similarity for {col1} {col2}.".
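As a rough illustration of the naming rules just described, the sketch below reproduces the two patterns with plain string operations. The helper names `feature_column_name` and `default_display_name` are hypothetical and exist only for this example; they are not part of the Evidently API.

```python
from typing import List


def feature_column_name(columns: List[str]) -> str:
    # Hypothetical helper: join the two column names with a pipe character.
    return "|".join(columns)


def default_display_name(columns: List[str]) -> str:
    # Hypothetical helper: follow the pattern
    # "Semantic Similarity for {col1} {col2}." from the description.
    return f"Semantic Similarity for {' '.join(columns)}."


columns = ["question", "answer"]
column_name = feature_column_name(columns)    # "question|answer"
display_name = default_display_name(columns)  # "Semantic Similarity for question answer."
```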
Usage
Use this feature to measure how semantically similar two pieces of text are. Common use cases include comparing questions and answers, comparing original and paraphrased text, measuring response relevance to input queries, or detecting semantic drift between reference and production text.
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/legacy/features/semantic_similarity_feature.py
Signature
class SemanticSimilarityFeature(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:SemanticSimilarityFeature"

    __feature_type__: ClassVar = ColumnType.Numerical
    columns: List[str]
    model: str = "all-MiniLM-L6-v2"

    def generate_feature(self, data: pd.DataFrame, data_definition: DataDefinition) -> pd.DataFrame: ...
    def _feature_name(self): ...
    def _as_column(self) -> ColumnName: ...
Import
from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| columns | List[str] | Yes | A list of exactly two column names containing the text to compare |
| model | str | No | SentenceTransformer model name (default: "all-MiniLM-L6-v2") |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pd.DataFrame | A single-column DataFrame with float values in [0, 1] representing normalized cosine similarity between the two text columns |
Usage Examples
from evidently.legacy.features.semantic_similarity_feature import SemanticSimilarityFeature
# Compare question and answer similarity with default model
similarity_feature = SemanticSimilarityFeature(
    columns=["question", "answer"],
)

# Use a different sentence transformer model
similarity_feature = SemanticSimilarityFeature(
    columns=["original_text", "generated_text"],
    model="all-mpnet-base-v2",
)