Principle:Evidentlyai Evidently Text Quality Reporting
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Analysis, Data_Quality |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A preset-based reporting mechanism that summarizes computed text descriptor statistics across a dataset.
Description
Text Quality Reporting uses the TextEvals preset to generate summary statistics for text descriptor columns. After descriptors compute row-level features (sentiment, text length, etc.), TextEvals aggregates these into dataset-level statistics: mean, standard deviation, min, max, quantiles, and value distributions.
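As a rough illustration of the aggregation step, the sketch below summarizes two descriptor columns with plain pandas. The column names (`Length`, `Sentiment`), the values, and the `summarize` helper are hypothetical. This approximates the kind of statistics listed above and is not TextEvals' actual implementation, whose exact output may differ.

```python
import pandas as pd

# Hypothetical row-level descriptor values, one row per text
descriptors = pd.DataFrame({
    "Length": [12, 40, 7, 25],
    "Sentiment": [0.8, -0.2, 0.0, 0.5],
})

def summarize(column: pd.Series) -> dict:
    """Aggregate one descriptor column into dataset-level statistics."""
    return {
        "mean": column.mean(),
        "std": column.std(),  # sample std (ddof=1), the pandas default
        "min": column.min(),
        "max": column.max(),
        "quantiles": column.quantile([0.25, 0.5, 0.75]).to_dict(),
        # value distribution: count of each distinct value
        "distribution": column.value_counts().sort_index().to_dict(),
    }

summary = {name: summarize(descriptors[name]) for name in descriptors.columns}
print(summary["Length"]["mean"], summary["Length"]["max"])
```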
This bridges the gap between row-level descriptor computation and dataset-level quality assessment.
Usage
Use after computing text descriptors via Dataset.from_pandas(descriptors=[...]). Include TextEvals in a Report to get aggregated statistics.
Theoretical Basis
Text quality reporting follows the compute-then-aggregate pattern:
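```python
import numpy as np

# Hypothetical placeholder scorer; a real descriptor would use a sentiment model
def compute_sentiment(text: str) -> float:
    return 1.0 if "good" in text.lower() else -1.0

text_column = ["Good answer", "Wrong answer", "good enough"]
dataset = {}
# Step 1: Row-level computation (descriptors)
dataset["sentiment"] = np.array([compute_sentiment(row) for row in text_column])
# Step 2: Dataset-level aggregation (TextEvals)
stats = {
    "mean_sentiment": dataset["sentiment"].mean(),
    "std_sentiment": dataset["sentiment"].std(),
    "sentiment_distribution": np.histogram(dataset["sentiment"], bins=2),
}
```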
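The same two-step pattern generalizes to any number of descriptors. The sketch below is a plain-Python illustration, not Evidently's implementation: the `DESCRIPTORS` mapping and the `compute_then_aggregate` function are hypothetical, each descriptor maps one text to one number, and aggregation uses the standard `statistics` module.

```python
import statistics
from typing import Callable

# Hypothetical descriptors: each maps one text to one numeric feature
DESCRIPTORS: dict[str, Callable[[str], float]] = {
    "length": lambda text: float(len(text)),
    "word_count": lambda text: float(len(text.split())),
}

def compute_then_aggregate(texts: list[str]) -> dict[str, dict[str, float]]:
    # Step 1: row-level, one column of values per descriptor
    columns = {name: [fn(t) for t in texts] for name, fn in DESCRIPTORS.items()}
    # Step 2: dataset-level, collapse each column into summary statistics
    return {
        name: {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }

report = compute_then_aggregate(["short", "a bit longer text", "mid size"])
print(report["word_count"])
```

Adding a descriptor only requires a new entry in the mapping; the aggregation step stays unchanged.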