Workflow:Scikit learn contrib Imbalanced learn Imbalanced Model Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Evaluation, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
End-to-end process for evaluating classifiers trained on imbalanced datasets using specialized metrics that account for class distribution skew, including geometric mean, index balanced accuracy, and the imbalanced classification report.
Description
This workflow provides an evaluation framework for classifiers operating on imbalanced data. Standard accuracy is misleading when classes are skewed because a model can achieve high accuracy by simply predicting the majority class. Imbalanced-learn offers metrics that weight per-class performance equally: geometric_mean_score (geometric mean of per-class recalls), sensitivity_score (recall, the true positive rate) and specificity_score (the true negative rate), make_index_balanced_accuracy (a decorator factory that reweights a base metric by how far sensitivity and specificity diverge), and classification_report_imbalanced (an extended classification report that adds specificity, geometric mean, and index balanced accuracy per class).
The workflow covers training a model with resampling, computing predictions, and generating a multi-metric evaluation that captures both overall and per-class performance.
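The claim that accuracy can mislead on skewed data can be checked by hand. The sketch below uses a hypothetical 95/5 label split and a degenerate predictor that always outputs the majority class; it relies only on NumPy, and the numbers are illustrative rather than taken from any real dataset.

```python
import numpy as np

# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = np.array([0] * 95 + [1] * 5)
# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = float(np.mean(y_true == y_pred))

# Per-class recall: fraction of each class's samples that were predicted correctly.
recalls = [float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)]
# Geometric mean of the per-class recalls.
g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))

print(f"accuracy={accuracy:.2f}  recalls={recalls}  g_mean={g_mean:.2f}")
# accuracy is 0.95 while the geometric mean is 0.00
```

Accuracy rewards the predictor for the 95 easy samples, whereas the geometric mean collapses to zero because the minority class is never recovered.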
Usage
Execute this workflow after training any classifier on an imbalanced dataset to obtain a meaningful assessment of model quality. This is essential when standard accuracy or F1-score does not adequately reflect minority-class performance. The imbalanced classification report is particularly useful for comparing different resampling strategies or ensemble methods.
Execution Steps
Step 1: Dataset and Model Setup
Generate or load an imbalanced classification dataset and split it into training and testing sets with stratified sampling. Build a classification pipeline using imblearn's Pipeline with a scaler, a SMOTE sampler, and a classifier. This provides the trained model whose predictions will be evaluated.
Key considerations:
- Use stratified splitting so that the test set reflects the original class distribution
- The evaluation metrics work with any classifier, not just resampled pipelines
- Prepare both y_test (ground truth) and y_pred (predictions) for metric computation
Step 2: Generate Predictions
Train the pipeline on the training set and generate predictions on the held-out test set. Resampling happens only during training; predictions are made on the original imbalanced test data. This mirrors real-world deployment where incoming data follows the natural distribution.
Key considerations:
- Predictions reflect the model's behavior on naturally distributed data
- For probability-based metrics, also generate predict_proba outputs
- Store both predictions and ground truth for comprehensive evaluation
Step 3: Compute Geometric Mean Score
Calculate geometric_mean_score, which with its default setting (average="multiclass") computes the geometric mean of the per-class recall values. This metric ranges from 0 to 1, where 1 indicates perfect classification across all classes. It penalizes models that perform well on some classes but poorly on others, making it more informative than accuracy for imbalanced problems.
Key considerations:
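- The geometric mean is 0 whenever any class has zero recall; the correction parameter substitutes a small value for zero recalls so the remaining classes still contribute
- Supports multi-class problems: the default average="multiclass" uses all per-class recalls, while the other averaging modes (for example micro, macro, and weighted) take the square root of sensitivity times specificity computed under that averaging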
- A high geometric mean indicates balanced performance across all classes
Step 4: Compute Index Balanced Accuracy
Apply make_index_balanced_accuracy to derive an imbalance-aware version of a base metric M. The adjusted score is (1 + alpha * dominance) * M, where dominance is sensitivity minus specificity and so measures how lopsided the per-class performance is. The alpha parameter sets how strongly the dominance term affects the result.
Key considerations:
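- make_index_balanced_accuracy wraps an existing scoring function that takes true labels and predicted labels (not scores or probabilities)
- The alpha parameter (default 0.1) weights the dominance term: a larger alpha makes the score more sensitive to the gap between sensitivity and specificity
- The squared parameter (default True) squares the base metric before the dominance weighting is applied
- Returns a decorator; applying it to a scoring function yields a new callable that can be used like any sklearn metric function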
Step 5: Generate Imbalanced Classification Report
Produce a comprehensive per-class report using classification_report_imbalanced. This extends sklearn's classification_report with additional columns for specificity (true negative rate), geometric mean, and index balanced accuracy per class. The report provides a complete picture of model performance broken down by each class.
Key considerations:
- Output format mirrors sklearn's classification_report for familiarity
- Includes precision, recall, specificity, f1, geometric mean, IBA, and support per class (columns pre, rec, spe, f1, geo, iba, sup)
- A final "avg / total" row reports the support-weighted average of each metric and the total support
- Use this report to identify which classes the model struggles with