Workflow:Scikit learn contrib Imbalanced learn Imbalanced Model Evaluation

Knowledge Sources	imbalanced-learn, imbalanced-learn Metrics Docs, scikit-learn Metrics
Domains	Machine_Learning, Model_Evaluation, Imbalanced_Learning
Last Updated	2026-02-09 03:00 GMT

Overview

End-to-end process for evaluating classifiers trained on imbalanced datasets using specialized metrics that account for class distribution skew, including geometric mean, index balanced accuracy, and the imbalanced classification report.

Description

This workflow provides an evaluation framework for classifiers operating on imbalanced data. Standard accuracy is misleading when classes are skewed because a model can achieve high accuracy by simply predicting the majority class. Imbalanced-learn offers metrics that give equal weight to per-class performance: geometric_mean_score (geometric mean of per-class recalls), sensitivity_score (true positive rate, i.e. recall) and specificity_score (true negative rate), make_index_balanced_accuracy (a decorator factory that reweights a base metric by the gap between sensitivity and specificity), and classification_report_imbalanced (an extended classification report including specificity, geometric mean, and index balanced accuracy per class).

The workflow covers training a model with resampling, computing predictions, and generating a multi-metric evaluation that captures both overall and per-class performance.

Usage

Execute this workflow after training any classifier on an imbalanced dataset to obtain a meaningful assessment of model quality. This is essential when standard accuracy or F1-score does not adequately reflect minority-class performance. The imbalanced classification report is particularly useful for comparing different resampling strategies or ensemble methods.

Execution Steps

Step 1: Dataset and Model Setup

Generate or load an imbalanced classification dataset and split it into training and testing sets with stratified sampling. Build a classification pipeline using imblearn's Pipeline with a scaler, a SMOTE sampler, and a classifier. This provides the trained model whose predictions will be evaluated.

Key considerations:

Use stratified splitting so that the test set reflects the original class distribution
The evaluation metrics work with any classifier, not just resampled pipelines
Prepare both y_test (ground truth) and y_pred (predictions) for metric computation

Step 2: Generate Predictions

Train the pipeline on the training set and generate predictions on the held-out test set. Resampling happens only during training; predictions are made on the original imbalanced test data. This mirrors real-world deployment where incoming data follows the natural distribution.

Key considerations:

Predictions reflect the model's behavior on naturally distributed data
For probability-based metrics, also generate predict_proba outputs
Store both predictions and ground truth for comprehensive evaluation

Step 3: Compute Geometric Mean Score

Calculate geometric_mean_score, which computes the geometric mean of per-class recall values. This metric ranges from 0 to 1, where 1 indicates perfect classification across all classes. It penalizes models that perform well on some classes but poorly on others, making it more informative than accuracy for imbalanced problems.

Key considerations:

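The geometric mean is 0 (not undefined) whenever any class has zero recall; the correction parameter substitutes a small value for zero recalls so that such models can still be ranked
The average parameter defaults to 'multiclass' (geometric mean over all per-class recalls); None, 'binary', 'micro', 'macro', 'weighted', and 'samples' are also accepted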
A high geometric mean indicates balanced performance across all classes

Step 4: Compute Index Balanced Accuracy

Apply make_index_balanced_accuracy to create a weighted version of a base metric. It returns a decorator; applying that decorator to a scoring function yields a new function that multiplies the base score M by (1 + alpha * dominance), where dominance is sensitivity minus specificity (true positive rate minus true negative rate). The alpha parameter controls how strongly the dominance term adjusts the score.

Key considerations:

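make_index_balanced_accuracy(alpha=..., squared=...) returns a decorator, which is then applied to a label-based scoring function taking y_true and y_pred; metrics that require scores or probabilities are not supported
The alpha parameter (default 0.1) scales the dominance term: higher alpha means a stronger adjustment
The squared parameter (default True) squares the base metric before the weighting is applied
The decorated function is called like the original metric, e.g. scorer(y_true, y_pred)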

Step 5: Generate Imbalanced Classification Report

Produce a comprehensive per-class report using classification_report_imbalanced. This extends sklearn's classification_report with additional columns for specificity (true negative rate), geometric mean, and index balanced accuracy per class. The report provides a complete picture of model performance broken down by each class.

Key considerations:

Output format mirrors sklearn's classification_report for familiarity
Includes precision, recall, specificity, f1, geometric mean, IBA, and support per class (columns pre, rec, spe, f1, geo, iba, sup)
A final avg / total row gives the support-weighted average of each metric
Use this report to identify which classes the model struggles with
