Principle:Scikit learn contrib Imbalanced learn Balanced Batch Generation
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
A training data generation strategy that produces class-balanced mini-batches for deep learning by resampling the dataset before each epoch, ensuring the neural network sees balanced classes during gradient updates.
Description
Balanced batch generation addresses class imbalance in neural network training at the data loading level. Before each epoch, the full dataset is resampled (typically via under-sampling) to create a balanced index. Mini-batches are then drawn from this balanced index. This ensures every gradient update step sees approximately equal representation from all classes, without modifying the model architecture or loss function.
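As a rough illustration of the resampling step, the sketch below builds a balanced index by random under-sampling using only the Python standard library. The function name `balanced_index` and the toy 95:5 label list are assumptions made for this example; it is not how imbalanced-learn implements the step.

```python
import random
from collections import Counter, defaultdict


def balanced_index(y, seed=None):
    """Return shuffled row indices with every class cut down to the rarest class's count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(rows) for rows in by_class.values())
    index = []
    for rows in by_class.values():
        # Random under-sampling: keep n_min rows per class, without replacement.
        index.extend(rng.sample(rows, n_min))
    rng.shuffle(index)  # the index is grouped by class until it is shuffled
    return index


# Toy labels (hypothetical): 95 majority rows (0) and 5 minority rows (1).
y = [0] * 95 + [1] * 5
index = balanced_index(y, seed=0)
balanced_counts = Counter(y[i] for i in index)
print(balanced_counts)  # 5 rows of each class
```

Mini-batches sliced from `index` are balanced only on average: a single batch can still contain slightly more rows of one class than the other.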
This approach plugs into Keras/TensorFlow training through the keras.utils.Sequence (or PyDataset) API, so the batch generator can be passed directly to model.fit().
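The sketch below mimics the shape of that API with a plain Python class: `__len__` reports the number of batches per epoch, `__getitem__` returns one batch, and `on_epoch_end` redraws the balanced index. The class name `BalancedBatchSketch` and its constructor arguments are hypothetical, and the class deliberately does not subclass anything from Keras so that it runs without Keras installed; in a real training loop one would presumably derive from the Keras base class instead.

```python
import random
from collections import defaultdict


class BalancedBatchSketch:
    """Sequence-style container: len() gives batches per epoch, obj[i] gives batch i."""

    def __init__(self, X, y, batch_size=4, seed=None):
        self.X, self.y = X, y
        self.batch_size = batch_size
        self._rng = random.Random(seed)
        self.on_epoch_end()  # build the first balanced index

    def on_epoch_end(self):
        # Redraw the under-sampled index so each epoch sees a new majority subset.
        by_class = defaultdict(list)
        for i, label in enumerate(self.y):
            by_class[label].append(i)
        n_min = min(len(rows) for rows in by_class.values())
        self.indices = [i for rows in by_class.values()
                        for i in self._rng.sample(rows, n_min)]
        self._rng.shuffle(self.indices)

    def __len__(self):
        # Full batches only; a trailing partial batch is dropped.
        return len(self.indices) // self.batch_size

    def __getitem__(self, batch_no):
        if not 0 <= batch_no < len(self):
            raise IndexError(batch_no)
        start = batch_no * self.batch_size
        rows = self.indices[start:start + self.batch_size]
        return [self.X[i] for i in rows], [self.y[i] for i in rows]


# Toy data (hypothetical): 20 rows, 16 of class 0 and 4 of class 1.
X = [[float(i)] for i in range(20)]
y = [0] * 16 + [1] * 4
batches = BalancedBatchSketch(X, y, batch_size=4, seed=0)
for batch_X, batch_y in batches:  # iteration stops at the IndexError
    print(batch_y)
```

Dropping the trailing partial batch is a design choice made for this sketch; it keeps every batch the same size at the cost of occasionally skipping a few rows per epoch.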
Usage
Use this principle when you train neural networks (Keras/TensorFlow) on imbalanced data and want balanced mini-batches without resorting to class-weighted loss functions.
Theoretical Basis
At each epoch:
- Apply a sampler (default: RandomUnderSampler) to get balanced indices
- Shuffle the balanced indices
- Partition indices into mini-batches of size batch_size
- Each __getitem__ call returns one mini-batch, which is approximately (not exactly) class-balanced
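# Abstract balanced batch generation (NOT the real implementation); X and y are NumPy arrays
import numpy as np
from imblearn.under_sampling import RandomUnderSampler

def generate_epoch(X, y, batch_size=32, sampler=None, random_state=None):
    sampler = sampler or RandomUnderSampler(random_state=random_state)
    # fit_resample returns (X_res, y_res); the kept row indices are stored on the fitted sampler
    sampler.fit_resample(X, y)
    balanced_indices = np.random.default_rng(random_state).permutation(sampler.sample_indices_)
    for i in range(0, len(balanced_indices), batch_size):
        batch_idx = balanced_indices[i:i + batch_size]
        yield X[batch_idx], y[batch_idx]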