
Principle:Scikit learn contrib Imbalanced learn Balanced Batch Generation

From Leeroopedia
Revision as of 17:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Scikit_learn_contrib_Imbalanced_learn_Balanced_Batch_Generation.md)


Knowledge Sources
Domains: Deep_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated: 2026-02-09 03:00 GMT

Overview

A training data generation strategy that produces class-balanced mini-batches for deep learning by resampling the dataset before each epoch, ensuring the neural network sees balanced classes during gradient updates.

Description

Balanced batch generation addresses class imbalance in neural network training at the data-loading level. Before each epoch, the full dataset is resampled (typically via under-sampling) to produce a balanced set of sample indices, which is then shuffled. Mini-batches are drawn from this shuffled index, so every gradient update sees approximately equal representation from all classes. Neither the model architecture nor the loss function needs to change.
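
The resampling step can be illustrated with a minimal sketch of random under-sampling in plain Python. The helper name balanced_indices, the seed parameter, and the toy labels are assumptions made for illustration; this is not imbalanced-learn's code, only the idea of cutting every class down to the minority-class size and shuffling the result.

```python
import random
from collections import Counter, defaultdict


def balanced_indices(y, seed=0):
    """Hypothetical helper: under-sample every class to the minority size."""
    rng = random.Random(seed)

    # Group sample positions by class label.
    by_class = defaultdict(list)
    for position, label in enumerate(y):
        by_class[label].append(position)

    # Every class keeps as many samples as the rarest class has.
    n_min = min(len(positions) for positions in by_class.values())

    selected = []
    for positions in by_class.values():
        selected.extend(rng.sample(positions, n_min))  # without replacement

    # Shuffle so that classes are interleaved rather than packed together.
    rng.shuffle(selected)
    return selected


# Toy labels (assumed): 90 samples of class 0 and 10 of class 1.
y = [0] * 90 + [1] * 10
idx = balanced_indices(y)
counts = Counter(y[i] for i in idx)
```

With these toy labels the balanced index holds 20 positions, 10 per class. Individual mini-batches cut from it are balanced only approximately, because the shuffle decides which classes land in each slice.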

This approach integrates with Keras/TensorFlow training loops via the Sequence (or PyDataset) API, allowing use with model.fit().
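
To show the shape of that integration without depending on Keras, the sketch below uses a plain class that mimics the Sequence protocol (__len__, __getitem__, and the on_epoch_end hook). The class name BalancedBatchSequence, its parameters, and the toy data are hypothetical; a real generator would subclass the Keras base class, and the library's actual implementation may differ.

```python
import random
from collections import defaultdict


class BalancedBatchSequence:
    """Hypothetical stand-in for a Keras Sequence subclass."""

    def __init__(self, X, y, batch_size=4, seed=0):
        self.X, self.y, self.batch_size = X, y, batch_size
        self._rng = random.Random(seed)
        self.on_epoch_end()  # build the balanced index for the first epoch

    def on_epoch_end(self):
        # Re-draw an under-sampled, shuffled index for the next epoch.
        by_class = defaultdict(list)
        for position, label in enumerate(self.y):
            by_class[label].append(position)
        n_min = min(len(p) for p in by_class.values())
        self.indices = [
            i for p in by_class.values() for i in self._rng.sample(p, n_min)
        ]
        self._rng.shuffle(self.indices)

    def __len__(self):
        # Number of full batches per epoch.
        return len(self.indices) // self.batch_size

    def __getitem__(self, batch):
        start = batch * self.batch_size
        sel = self.indices[start:start + self.batch_size]
        return [self.X[i] for i in sel], [self.y[i] for i in sel]


# Toy data (assumed): 100 one-feature samples, 90/10 class split.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
seq = BalancedBatchSequence(X, y, batch_size=4)

for epoch in range(2):
    for b in range(len(seq)):
        x_batch, y_batch = seq[b]  # a training step would consume this batch
    seq.on_epoch_end()  # a Keras training loop calls this hook itself
```

Keeping the resampling inside on_epoch_end means the majority-class samples that were discarded in one epoch can still be drawn in a later one, so under-sampling does not permanently throw data away.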

Usage

Use this principle when training Keras/TensorFlow neural networks on imbalanced data, in cases where balanced mini-batches are preferred over a class-weighted loss function.

Theoretical Basis

At each epoch:

  1. Apply a sampler (default: RandomUnderSampler) to get balanced indices
  2. Shuffle the balanced indices
  3. Partition indices into mini-batches of size batch_size
  4. Each __getitem__ call returns one balanced batch
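
# Abstract sketch of balanced batch generation (NOT the real implementation).
# Assumes X and y are NumPy arrays and that the sampler exposes
# sample_indices_ after fitting, as RandomUnderSampler does.
import numpy as np

def generate_epoch(X, y, sampler, batch_size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    sampler.fit_resample(X, y)  # returns resampled data; the indices live on the sampler
    balanced_indices = sampler.sample_indices_.copy()
    rng.shuffle(balanced_indices)  # the sampler may group indices by class
    for i in range(0, len(balanced_indices), batch_size):
        batch_idx = balanced_indices[i:i + batch_size]
        yield X[batch_idx], y[batch_idx]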

Related Pages

Implemented By
