Implementation:Evidentlyai Evidently Legacy BERTScore Feature

Knowledge Sources	Evidentlyai_Evidently
Domains	ML Monitoring, NLP, Text Similarity
Last Updated	2026-02-14 12:00 GMT

Overview

Computes a BERTScore F1 value (the harmonic mean of token-level precision and recall) between two text columns using BERT embeddings, with optional IDF weighting enabled by the tfidf_weighted flag.

Description

The BERTScoreFeature class extends GeneratedFeature to compute a BERTScore-based similarity metric between two text columns. BERTScore measures the semantic similarity between text pairs by comparing their contextual BERT embeddings at the token level.

The implementation follows these steps:

Tokenization and Embedding: Both columns are tokenized using a BertTokenizer and embedded via a BertModel (defaulting to "bert-base-uncased"). The last hidden state is extracted as token-level embeddings.

IDF Score Computation: The compute_idf_scores() method computes inverse document frequency (IDF) scores across all sentences in both columns combined. It counts document frequency for each unique token (including special [CLS] and [SEP] tokens) and applies -log(count/M) where M is the total number of sentences. IDF arrays are padded to uniform length for batch processing.

Similarity Calculation: The max_similarity() method computes a cosine similarity matrix between token embeddings of two sentences and returns the maximum similarity for each token in the first set.

Score Aggregation: The calculate_scores() method computes recall and precision from the max similarities. When tfidf_weighted is True, each max similarity is multiplied by its token's IDF value and the weighted sum is divided by the sum of the IDF values. When it is False, the simple mean of the max similarities is used. The final score is F1, the harmonic mean of precision (P) and recall (R): 2PR / (P + R).
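The similarity and aggregation steps above can be sketched in plain Python. This is a minimal illustration rather than the library's code: it uses lists of floats instead of tensors, the function names cosine, aggregate, and bert_score_f1 are hypothetical, and the mapping of precision to the candidate side and recall to the reference side is an assumption based on the usual BERTScore convention.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_similarity(emb1: Sequence[Vector], emb2: Sequence[Vector]) -> List[float]:
    """For each token vector in emb1, the best cosine match among emb2."""
    return [max(cosine(u, v) for v in emb2) for u in emb1]


def aggregate(sims: Sequence[float], idf: Optional[Sequence[float]] = None) -> float:
    """Plain mean of sims, or an IDF-weighted mean when idf is given."""
    if idf is None:
        return sum(sims) / len(sims)
    return sum(s * w for s, w in zip(sims, idf)) / sum(idf)


def bert_score_f1(
    cand: Sequence[Vector],
    ref: Sequence[Vector],
    idf_cand: Optional[Sequence[float]] = None,
    idf_ref: Optional[Sequence[float]] = None,
) -> float:
    """F1 of precision (candidate side) and recall (reference side)."""
    precision = aggregate(max_similarity(cand, ref), idf_cand)
    recall = aggregate(max_similarity(ref, cand), idf_ref)
    if precision + recall == 0:
        return 0.0  # assumption: treat the degenerate case as zero
    return 2 * precision * recall / (precision + recall)


# Toy 2-dimensional "embeddings": two candidate tokens, one reference token.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]

print(bert_score_f1(candidate, reference))  # approximately 0.667
```

In this toy case only one of the two candidate tokens matches the reference, so precision is 0.5 while recall is 1.0. Passing idf_cand=[1.0, 3.0] puts more weight on the unmatched token and lowers the score to 0.4.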

The feature produces a single numerical column named by joining the input column names with a pipe character.
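The tokenization and embedding step can be sketched with the Hugging Face transformers library. This is an assumption-laden illustration, not the feature's actual code: the helper name embed_texts is hypothetical, the padding and truncation settings are guesses, and the real implementation may batch or pad differently. Imports happen inside the function so that defining it does not require torch or transformers to be installed.

```python
from typing import List


def embed_texts(texts: List[str], model_name: str = "bert-base-uncased"):
    """Return (token embeddings, attention mask) for a batch of texts.

    Hypothetical helper: tokenizes with BertTokenizer, runs BertModel,
    and takes the last hidden state as token-level embeddings.
    """
    # Lazy imports: only needed when the function is actually called.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()  # inference mode, disables dropout

    # Pad to the longest text in the batch so the result is one tensor.
    encoded = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model(**encoded)

    # Shape: (batch size, sequence length, hidden size)
    return output.last_hidden_state, encoded["attention_mask"]
```

Calling embed_texts(["a short sentence"]) downloads the pretrained weights on first use. The attention mask is returned alongside the embeddings because padded positions should be ignored when similarities are computed.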

Usage

Use this feature when you need to measure semantic similarity between two text columns, such as comparing generated responses against reference answers, evaluating translation quality, or monitoring text generation drift. The TF-IDF weighting option down-weights common tokens and up-weights rare, informative tokens in the similarity computation.
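To make the effect of the weighting concrete, the following sketch applies the -log(count/M) formula from the Description to a handful of tokenized sentences. It is a simplified illustration under stated assumptions: the function name idf_scores is hypothetical, tokens are plain strings rather than BertTokenizer ids, and the padding of IDF arrays is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_scores(tokenized_sentences: List[List[str]]) -> Dict[str, float]:
    """IDF per token: -log(document_frequency / M), M = number of sentences."""
    m = len(tokenized_sentences)
    doc_freq: Counter = Counter()
    for tokens in tokenized_sentences:
        doc_freq.update(set(tokens))  # count each token once per sentence
    return {tok: -math.log(count / m) for tok, count in doc_freq.items()}


# Sentences from both columns are pooled before counting.
sentences = [
    ["[CLS]", "the", "cat", "sat", "[SEP]"],
    ["[CLS]", "the", "dog", "ran", "[SEP]"],
    ["[CLS]", "the", "cat", "ran", "[SEP]"],
    ["[CLS]", "a", "bird", "sang", "[SEP]"],
]

idf = idf_scores(sentences)
for token in ["[CLS]", "the", "cat", "bird"]:
    print(token, round(idf[token], 3))
```

Tokens that occur in every sentence, such as [CLS] and [SEP] here, receive an IDF of zero under this formula, while a token seen in only one of the four sentences receives log(4). In the weighted score, a mismatch on a rare token therefore costs more than a mismatch on a common one.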

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/legacy/features/BERTScore_feature.py

Signature

class BERTScoreFeature(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:BERTScoreFeature"

    __feature_type__: ClassVar = ColumnType.Numerical
    columns: List[str]
    model: str = "bert-base-uncased"
    tfidf_weighted: bool = False

    def generate_feature(self, data: pd.DataFrame, data_definition: DataDefinition) -> pd.DataFrame: ...
    def compute_idf_scores(self, col1: pd.Series, col2: pd.Series, tokenizer) -> tuple: ...
    def max_similarity(self, embeddings1, embeddings2): ...
    def calculate_scores(self, emb1, emb2, idf_scores, index): ...
    def _feature_name(self): ...
    def _as_column(self) -> ColumnName: ...

Import

from evidently.legacy.features.BERTScore_feature import BERTScoreFeature

I/O Contract

Inputs

Name	Type	Required	Description
columns	List[str]	Yes	A list of exactly two column names: the candidate text column and the reference text column.
model	str	No	The pretrained BERT model identifier. Defaults to "bert-base-uncased".
tfidf_weighted	bool	No	Whether to weight token similarities by IDF scores. Defaults to False.
data	pd.DataFrame	Yes (at generation time)	The input DataFrame containing the text columns specified in columns.
data_definition	DataDefinition	Yes (at generation time)	The data definition describing the dataset schema.

Outputs

Name	Type	Description
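return	pd.DataFrame	A DataFrame with a single numerical column of BERTScore F1 values, named by joining the two input column names with a pipe character (e.g. "col1|col2").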

Usage Examples

from evidently.legacy.features.BERTScore_feature import BERTScoreFeature

# Basic BERTScore between two text columns
bert_feature = BERTScoreFeature(
    columns=["generated_answer", "reference_answer"],
)

# With TF-IDF weighting and a specific model
bert_feature_weighted = BERTScoreFeature(
    columns=["prediction", "ground_truth"],
    model="bert-base-uncased",
    tfidf_weighted=True,
)

# Generate features from a DataFrame
# result_df = bert_feature.generate_feature(data=my_dataframe, data_definition=my_data_def)

Related Pages

Environment:Evidentlyai_Evidently_Python_Core_Environment
Evidentlyai_Evidently_Legacy_Generated_Features - Base class GeneratedFeature that BERTScoreFeature extends.
Evidentlyai_Evidently_Legacy_Exact_Match_Feature - A simpler text comparison feature for exact string matching.
