
Implementation:Interpretml Interpret Powerlift TaskMeasures

From Leeroopedia


Knowledge Sources
Domains: Benchmarking, Statistics, Data_Analysis
Last Updated: 2026-02-07 12:00 GMT

Overview

Collection of statistical measure functions that compute dataset characteristics (entropy, class statistics, regression statistics, and data statistics) for Powerlift benchmarking tasks.

Description

This module provides four functions used to compute and populate statistical metadata for benchmarking datasets:

  • entropy() -- Computes the Shannon entropy of a label distribution. Supports a configurable logarithmic base (natural log by default) and normalized entropy (entropy divided by the maximum possible entropy for the observed number of classes). Returns 0 when there is at most one label or at most one distinct class.
  • class_stats() -- Computes classification statistics for a target series: number of classes, normalized entropy of the class distribution, minimum class count, and maximum class count. Results are written into the provided meta dictionary.
  • regression_stats() -- Computes regression statistics for a response series: number of classes (always set to 0), minimum value, mean value, and maximum value. Results are written into the provided meta dictionary.
  • data_stats() -- Computes feature-level statistics for a DataFrame: number of samples, number of features, the largest number of unique values in any continuous feature, the largest number of categories in any categorical feature, the total number of categories, the percentage of features that are categorical, and the percentage of special values (NaN or empty strings in categorical features; NaN or zero in continuous features). Results are written into the provided meta dictionary.
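As a rough illustration of the entropy() behaviour described above, the following sketch reimplements it in plain Python. It is not powerlift's source: entropy_sketch is a hypothetical name, and the only behaviour assumed is what the description states (natural log unless a base is given, normalization by log_base(k) for k distinct classes, and 0 when there is at most one label or one class).

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def entropy_sketch(
    labels: Iterable[Hashable], base: Optional[float] = None, normalized: bool = False
) -> float:
    """Hypothetical sketch of Shannon entropy for a label distribution."""
    labels = list(labels)
    n_labels = len(labels)
    if n_labels <= 1:
        return 0.0  # zero or one label carries no uncertainty

    counts = Counter(labels)  # occurrences of each distinct label
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0  # every label is identical

    log_base = math.e if base is None else base  # natural log by default
    ent = -sum(
        (c / n_labels) * math.log(c / n_labels, log_base) for c in counts.values()
    )

    if normalized:
        # Divide by the maximum possible entropy for n_classes, log_base(n_classes)
        ent /= math.log(n_classes, log_base)
    return ent
```

Because normalization divides by the largest entropy achievable with k classes, the normalized value lies in [0, 1] and does not depend on the chosen base.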

The three *_stats functions write their results into a caller-supplied metadata dictionary (entropy() simply returns a value). That dictionary is later stored in the Task model's meta field and in individual Task columns.
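The fill-a-caller-supplied-dictionary pattern can be sketched with a hypothetical class_stats_sketch that writes the four class_stats keys listed in the Outputs table. This is an assumption-laden illustration, not the library code: it counts classes with pandas value_counts (which ignores NaN targets) and inlines the normalized-entropy calculation.

```python
import math
from typing import Any, Dict

import pandas as pd


def class_stats_sketch(y: pd.Series, meta: Dict[str, Any]) -> None:
    """Hypothetical sketch: write classification statistics into meta in place."""
    counts = y.value_counts()  # occurrences of each class (NaN dropped)
    n = int(counts.sum())
    n_classes = int(counts.size)

    meta["n_classes"] = n_classes
    if n <= 1 or n_classes <= 1:
        meta["class_normalized_entropy"] = 0.0  # degenerate distribution
    else:
        probs = counts / n
        ent = -sum(p * math.log(p) for p in probs)  # Shannon entropy in nats
        meta["class_normalized_entropy"] = ent / math.log(n_classes)
    meta["min_class_count"] = int(counts.min())
    meta["max_class_count"] = int(counts.max())
```

The function returns None; callers pass one dict to several *_stats calls and read every key from it afterwards.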

Usage

Use these functions when registering new datasets with Powerlift. They are called during task creation to compute the statistical profile of each dataset, which is stored alongside the data for filtering and analysis of benchmark results.

Code Reference

Source Location

Signature

from numbers import Number
from typing import Iterable, Optional

import pandas as pd

def entropy(
    labels: Iterable, base: Optional[Number] = None, normalized: bool = False
) -> Number:
    ...

def class_stats(y: pd.Series, meta):
    ...

def regression_stats(y: pd.Series, meta):
    ...

def data_stats(X: pd.DataFrame, categorical_mask: Iterable[bool], meta):
    ...

Import

from powerlift.measures.task_measures import entropy, class_stats, regression_stats, data_stats

I/O Contract

Inputs

Name | Type | Required | Description
labels | Iterable | Yes | Label array for entropy computation
base | Number | No | Logarithmic base for entropy (defaults to e, i.e. the natural log)
normalized | bool | No | Whether to return normalized entropy (default: False)
y | pd.Series | Yes | Target/response series for class_stats or regression_stats
X | pd.DataFrame | Yes | Feature DataFrame for data_stats
categorical_mask | Iterable[bool] | Yes | Boolean mask indicating which columns of X are categorical (for data_stats)
meta | dict | Yes | Mutable dictionary that class_stats, regression_stats, and data_stats populate with computed statistics
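The categorical_mask input has to be supplied by the caller. One simple way to build it, assuming that every non-numeric column should be treated as categorical, is sketched below; infer_categorical_mask is a hypothetical helper, and powerlift's own dataset loaders may decide categoricality differently.

```python
from typing import List

import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_categorical_mask(X: pd.DataFrame) -> List[bool]:
    """Hypothetical helper: mark every non-numeric column as categorical."""
    # One entry per column, in column order, as data_stats expects
    return [not is_numeric_dtype(dtype) for dtype in X.dtypes]
```

Note that pandas counts bool columns as numeric, so this heuristic marks them as continuous; pass an explicit mask if that is not what you want.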

Outputs

Name | Type | Description
entropy() return value | Number | Shannon entropy, or normalized entropy when normalized=True
meta["n_classes"] | int | Number of unique classes (set by class_stats; regression_stats sets it to 0)
meta["class_normalized_entropy"] | float | Normalized entropy of the class distribution (set by class_stats)
meta["min_class_count"] | int | Minimum class count (set by class_stats)
meta["max_class_count"] | int | Maximum class count (set by class_stats)
meta["n_samples"] | int | Number of samples (set by data_stats)
meta["n_features"] | int | Number of features (set by data_stats)
meta["max_unique_continuous"] | int | Maximum number of unique values in any continuous feature (set by data_stats)
meta["max_categories"] | int | Maximum number of categories in any categorical feature (set by data_stats)
meta["total_categories"] | int | Total number of categories across all categorical features (set by data_stats)
meta["percent_categorical"] | float | Proportion of features that are categorical (set by data_stats)
meta["percent_special_values"] | float | Proportion of cells with special values (set by data_stats)
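To make the data_stats rows above concrete, here is a sketch that computes each key from a DataFrame and mask. data_stats_sketch is hypothetical and several details are assumptions rather than documented behaviour: unique counts ignore NaN, the two percent_* keys are stored as proportions in [0, 1] (as the table describes them), and percent_special_values is measured over all n_samples * n_features cells.

```python
from typing import Any, Dict, Iterable

import pandas as pd


def data_stats_sketch(
    X: pd.DataFrame, categorical_mask: Iterable[bool], meta: Dict[str, Any]
) -> None:
    """Hypothetical sketch: write feature-level statistics into meta in place."""
    mask = list(categorical_mask)  # assumed: one entry per column, in column order
    n_samples, n_features = X.shape
    meta["n_samples"] = n_samples
    meta["n_features"] = n_features

    max_unique_continuous = 0
    max_categories = 0
    total_categories = 0
    n_special = 0
    for i, is_categorical in enumerate(mask):
        col = X.iloc[:, i]
        n_unique = int(col.nunique(dropna=True))  # distinct non-NaN values
        if is_categorical:
            max_categories = max(max_categories, n_unique)
            total_categories += n_unique
            # Assumed special values for categorical columns: NaN or empty string
            n_special += int((col.isna() | (col == "")).sum())
        else:
            max_unique_continuous = max(max_unique_continuous, n_unique)
            # Assumed special values for continuous columns: NaN or zero
            n_special += int((col.isna() | (col == 0)).sum())

    meta["max_unique_continuous"] = max_unique_continuous
    meta["max_categories"] = max_categories
    meta["total_categories"] = total_categories
    # Assumption: both percent_* keys hold proportions in [0, 1]
    meta["percent_categorical"] = sum(mask) / n_features if n_features else 0.0
    n_cells = n_samples * n_features
    meta["percent_special_values"] = n_special / n_cells if n_cells else 0.0
```

If the real function reports percentages on a 0 to 100 scale instead, only the last two assignments would change.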

Usage Examples

import pandas as pd
from powerlift.measures.task_measures import entropy, class_stats, data_stats

# Compute entropy of a label distribution
labels = [0, 0, 1, 1, 2, 2, 2]
ent = entropy(labels)
norm_ent = entropy(labels, normalized=True)

# Compute class statistics
y = pd.Series([0, 0, 1, 1, 2, 2, 2])
meta = {}
class_stats(y, meta)
# meta == {"n_classes": 3, "class_normalized_entropy": ..., "min_class_count": 2, "max_class_count": 3}

# Compute data statistics
X = pd.DataFrame({"age": [25, 30, 35], "color": ["red", "blue", "red"]})
categorical_mask = [False, True]
data_stats(X, categorical_mask, meta)
# meta now also includes n_samples, n_features, max_unique_continuous, etc.
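For the labels used above (two 0s, two 1s, three 2s), the expected entropy values can be worked out by hand from the definition in the Description. This is a hand calculation, not output captured from powerlift; assuming class_stats uses the normalized entropy as described, the same 0.9821 is what it would store in class_normalized_entropy for the example y.

```latex
\begin{aligned}
p_0 = p_1 &= \tfrac{2}{7}, \qquad p_2 = \tfrac{3}{7} \\
H &= -\sum_k p_k \ln p_k
   = 2 \cdot \tfrac{2}{7} \ln \tfrac{7}{2} + \tfrac{3}{7} \ln \tfrac{7}{3}
   \approx 0.7159 + 0.3631 = 1.0790 \ \text{nats} \\
H_{\text{norm}} &= \frac{H}{\ln 3} \approx \frac{1.0790}{1.0986} \approx 0.9821
\end{aligned}
```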
