Implementation:Interpretml Interpret Powerlift TaskMeasures

Knowledge Sources	Interpretml Interpret
Domains	Benchmarking, Statistics, Data_Analysis
Last Updated	2026-02-07 12:00 GMT

Overview

Collection of statistical measure functions that compute dataset characteristics (entropy, class statistics, regression statistics, and data statistics) for Powerlift benchmarking tasks.

Description

This module provides four functions used to compute and populate statistical metadata for benchmarking datasets:

entropy() -- Computes the Shannon entropy of a label distribution. Supports a configurable logarithmic base and normalized entropy (entropy divided by the maximum possible entropy for the observed number of classes). Returns 0 when there is at most one label or only one distinct class.

class_stats() -- Computes classification-specific statistics for a target series: number of classes, normalized entropy, minimum class count, and maximum class count. Results are written into a provided meta dictionary.

regression_stats() -- Computes regression-specific statistics for a response series: number of classes (set to 0), minimum value, average value, and maximum value. Results are written into a provided meta dictionary.

data_stats() -- Computes feature-level statistics for a DataFrame: number of samples, number of features, maximum number of unique values in any continuous feature, maximum number of categories in any categorical feature, total number of categories across all categorical features, percentage of categorical features, and percentage of special values (NaN or empty strings in categorical features; NaN or zero in continuous features). Results are written into a provided meta dictionary.

The three *_stats functions write into a caller-supplied metadata dictionary (entropy() returns its value directly). That dictionary is later stored in the Task model's meta field and in individual Task columns.
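The entropy() behavior described above can be sketched in plain Python. This is an illustrative reimplementation, not Powerlift's source: entropy_sketch is a hypothetical name, and labels are assumed to be hashable so that collections.Counter can tally them.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def entropy_sketch(
    labels: Iterable[Hashable], base: Optional[float] = None, normalized: bool = False
) -> float:
    """Shannon entropy of the empirical label distribution (illustrative only)."""
    labels = list(labels)
    n_labels = len(labels)
    if n_labels <= 1:
        return 0.0  # at most one label: nothing to be uncertain about
    counts = Counter(labels)  # occurrences of each distinct class
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0  # a single class has zero entropy
    b = math.e if base is None else base  # default: natural log (nats)
    ent = -sum((c / n_labels) * math.log(c / n_labels, b) for c in counts.values())
    if normalized:
        ent /= math.log(n_classes, b)  # divide by the maximum entropy for n_classes
    return ent


labels = [0, 0, 1, 1, 2, 2, 2]
ent = entropy_sketch(labels)
norm_ent = entropy_sketch(labels, normalized=True)
```

Counter keeps the sketch dependency-free; tallying with numpy.unique(labels, return_counts=True) would serve equally well for array inputs.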

Usage

Use these functions when registering new datasets with Powerlift. They are called during task creation to compute the statistical profile of each dataset, which is stored alongside the data for filtering and analysis of benchmark results.
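A minimal pandas-only sketch of the target-side step during task creation follows, assuming the caller routes classification targets to class_stats and regression targets to regression_stats. Everything here is hypothetical illustration rather than Powerlift code: the function names, the problem_type argument, and the regression key names response_min, response_avg, and response_max (this page does not list the real ones). The classification keys match the Outputs table.

```python
import numpy as np
import pandas as pd


def class_stats_sketch(y: pd.Series, meta: dict) -> None:
    counts = y.value_counts()  # occurrences of each distinct class (NaN dropped)
    n_classes = len(counts)
    if n_classes > 1:
        probs = counts / counts.sum()
        ent = float(-(probs * np.log(probs)).sum())  # Shannon entropy in nats
        norm_ent = ent / float(np.log(n_classes))    # divide by the maximum, log(K)
    else:
        norm_ent = 0.0  # degenerate target, mirrors entropy() returning 0
    meta["n_classes"] = n_classes
    meta["class_normalized_entropy"] = norm_ent
    meta["min_class_count"] = int(counts.min())
    meta["max_class_count"] = int(counts.max())


def regression_stats_sketch(y: pd.Series, meta: dict) -> None:
    meta["n_classes"] = 0                   # documented: regression sets n_classes to 0
    meta["response_min"] = float(y.min())   # hypothetical key name
    meta["response_avg"] = float(y.mean())  # hypothetical key name
    meta["response_max"] = float(y.max())   # hypothetical key name


def target_stats_sketch(y: pd.Series, meta: dict, problem_type: str) -> dict:
    """Hypothetical dispatcher: route the target to the matching statistics."""
    if problem_type == "classification":
        class_stats_sketch(y, meta)
    elif problem_type == "regression":
        regression_stats_sketch(y, meta)
    else:
        raise ValueError(f"unknown problem_type: {problem_type!r}")
    return meta


cls_meta = target_stats_sketch(pd.Series([0, 0, 1, 1, 2, 2, 2]), {}, "classification")
reg_meta = target_stats_sketch(pd.Series([1.0, 2.0, 6.0]), {}, "regression")
```

Passing the same dictionary through both the target step and data_stats is what lets a single meta object accumulate the full statistical profile of a dataset.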

Code Reference

Source Location

Repository: Interpretml_Interpret
File: python/powerlift/powerlift/measures/task_measures.py

Signature

def entropy(
    labels: Iterable, base: Optional[Number] = None, normalized: bool = False
) -> Number:
    ...

def class_stats(y: pd.Series, meta):
    ...

def regression_stats(y: pd.Series, meta):
    ...

def data_stats(X: pd.DataFrame, categorical_mask: Iterable[bool], meta):
    ...

Import

from powerlift.measures.task_measures import entropy, class_stats, regression_stats, data_stats

I/O Contract

Inputs

Name	Type	Required	Description
labels	Iterable	Yes	Label array for entropy computation
base	Number	No	Logarithmic base for entropy (default: None, meaning the natural logarithm, base e)
normalized	bool	No	Whether to return normalized entropy (default: False)
y	pd.Series	Yes	Target/response series for class_stats or regression_stats
X	pd.DataFrame	Yes	Feature DataFrame for data_stats
categorical_mask	Iterable[bool]	Yes	Boolean mask indicating which columns are categorical (for data_stats)
meta	dict	Yes	Mutable dictionary to populate with computed statistics

Outputs

Name	Type	Description
entropy() return value	Number	Shannon entropy (or normalized entropy when normalized=True)
meta["n_classes"]	int	Number of unique classes (set by class_stats; set to 0 by regression_stats)
meta["class_normalized_entropy"]	float	Normalized entropy of class distribution (set by class_stats)
meta["min_class_count"]	int	Minimum class count (set by class_stats)
meta["max_class_count"]	int	Maximum class count (set by class_stats)
meta["n_samples"]	int	Number of samples (set by data_stats)
meta["n_features"]	int	Number of features (set by data_stats)
meta["max_unique_continuous"]	int	Maximum unique values in any continuous feature (set by data_stats)
meta["max_categories"]	int	Maximum categories in any categorical feature (set by data_stats)
meta["total_categories"]	int	Total categories across all categorical features (set by data_stats)
meta["percent_categorical"]	float	Proportion of features that are categorical (set by data_stats)
meta["percent_special_values"]	float	Proportion of cells with special values (set by data_stats)
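To make the data_stats keys in the table concrete, here is a pandas sketch. It follows the Description (special values are NaN or empty strings in categorical columns, NaN or zero in continuous columns), but several details are assumptions rather than documented behavior: missing values are excluded when counting unique values or categories, an empty string counts as a category, and percent_special_values uses every cell of X as the denominator. data_stats_sketch is a hypothetical name.

```python
from typing import Iterable

import numpy as np
import pandas as pd


def data_stats_sketch(X: pd.DataFrame, categorical_mask: Iterable[bool], meta: dict) -> None:
    """Fill meta with the data_stats keys from the Outputs table (illustrative only)."""
    mask = list(categorical_mask)
    n_samples, n_features = X.shape
    max_unique_continuous = 0
    max_categories = 0
    total_categories = 0
    n_special = 0

    for col, is_categorical in zip(X.columns, mask):
        values = X[col]
        n_unique = int(values.nunique(dropna=True))  # assumption: NaN is not counted
        if is_categorical:
            max_categories = max(max_categories, n_unique)
            total_categories += n_unique
            # special categorical values: missing or empty string
            n_special += int(values.isna().sum() + (values == "").sum())
        else:
            max_unique_continuous = max(max_unique_continuous, n_unique)
            # special continuous values: missing or exactly zero
            n_special += int(values.isna().sum() + (values == 0).sum())

    meta["n_samples"] = n_samples
    meta["n_features"] = n_features
    meta["max_unique_continuous"] = max_unique_continuous
    meta["max_categories"] = max_categories
    meta["total_categories"] = total_categories
    meta["percent_categorical"] = sum(mask) / n_features if n_features else 0.0
    # assumption: the denominator is every cell in X
    meta["percent_special_values"] = n_special / X.size if X.size else 0.0


X = pd.DataFrame({"age": [25, 0, np.nan], "color": ["red", "", "red"]})
meta = {}
data_stats_sketch(X, [False, True], meta)
```

In this toy frame, three of the six cells are special (one NaN and one zero in age, one empty string in color), so percent_special_values comes out as 0.5 under the whole-frame assumption.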

Usage Examples

import pandas as pd
from powerlift.measures.task_measures import entropy, class_stats, data_stats

# Compute entropy of a label distribution
labels = [0, 0, 1, 1, 2, 2, 2]
ent = entropy(labels)
norm_ent = entropy(labels, normalized=True)

# Compute class statistics
y = pd.Series([0, 0, 1, 1, 2, 2, 2])
meta = {}
class_stats(y, meta)
# meta == {"n_classes": 3, "class_normalized_entropy": ..., "min_class_count": 2, "max_class_count": 3}

# Compute data statistics
X = pd.DataFrame({"age": [25, 30, 35], "color": ["red", "blue", "red"]})
categorical_mask = [False, True]
data_stats(X, categorical_mask, meta)
# meta now also includes n_samples, n_features, max_unique_continuous, etc.
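For reference, here is the entropy definition from the Description written out, assuming K is the number of distinct classes observed, p_i their empirical proportions, and b the logarithmic base (e by default). Applied to the example target above, it gives the value the class_normalized_entropy comment leaves elided, under that assumed definition.

```latex
H_b(p) = -\sum_{i=1}^{K} p_i \log_b p_i,
\qquad
\tilde{H}(p) = \frac{H_b(p)}{\log_b K}

% Example y = [0, 0, 1, 1, 2, 2, 2]: K = 3, p = (2/7, 2/7, 3/7), b = e
H_e(p) = 2 \cdot \tfrac{2}{7} \ln\tfrac{7}{2} + \tfrac{3}{7} \ln\tfrac{7}{3}
       \approx 0.7159 + 0.3631 = 1.0790 \text{ nats}

\tilde{H}(p) = \frac{1.0790}{\ln 3} \approx \frac{1.0790}{1.0986} \approx 0.9821
```

Because the normalization divides by log_b K, the normalized value does not depend on the base b, and a perfectly balanced class distribution scores exactly 1.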

Related Pages

Interpretml_Interpret_Powerlift_Schema -- Task model where computed statistics are stored
Interpretml_Interpret_Powerlift_RunTrials -- Trial runner that accesses task metadata during execution
