Principle:Scikit learn contrib Imbalanced learn Package Initialization

From Leeroopedia


Principle: Scikit-learn-contrib Imbalanced-learn Package Initialization

Theory: The package initialization module serves as the entry point to the imbalanced-learn library. It follows several design patterns that are common in large Python scientific computing packages, ensuring a clean public API while managing heavyweight optional dependencies gracefully.

Design Pattern 1: Conditional Import Guarding

The build process injects the __IMBLEARN_SETUP__ flag into Python's builtins namespace. When the flag is present and truthy, __init__.py skips all subpackage imports and instead writes a short message to stderr:

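import sys

try:
    # A bare reference raises NameError unless the build process
    # has already defined the flag in builtins.
    __IMBLEARN_SETUP__  # type: ignore
except NameError:
    __IMBLEARN_SETUP__ = False

if __IMBLEARN_SETUP__:
    # Build-time import: report it and skip the subpackage imports.
    sys.stderr.write("Partial import of imblearn during the build process.\n")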

Rationale: During package installation, the setup script may need to import package metadata (such as the version string) before the rest of the package is importable, for example before its dependencies are installed or, as in scikit-learn, before compiled extensions are built. The guard allows a partial import that satisfies the build system without triggering import errors from modules that are not yet available. The pattern is inherited from scikit-learn, which uses the analogous __SKLEARN_SETUP__ flag.
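The lookup behind this guard can be tried in isolation. The following is a minimal sketch, assuming the build script sets the flag as an attribute of the builtins module; the flag name __DEMO_SETUP__ and the helper is_build_time_import are hypothetical and chosen so they do not clash with the real flag:

```python
import builtins


def is_build_time_import():
    """Return True when the hypothetical build flag is set in builtins."""
    try:
        # Name lookup falls through locals -> globals -> builtins, so a name
        # set on the builtins module is visible from any module.
        return bool(__DEMO_SETUP__)  # noqa: F821
    except NameError:
        # Nothing defined the flag: this is a regular import.
        return False


before = is_build_time_import()

# What a build script could do before importing the package (assumption).
builtins.__DEMO_SETUP__ = True
during = is_build_time_import()

# Clean up so the flag does not leak into later code.
del builtins.__DEMO_SETUP__
after = is_build_time_import()

print(before, during, after)  # False True False
```

Because the flag lives in builtins rather than in the package itself, the build script can set it before the package's __init__.py runs at all.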

Design Pattern 2: Lazy Module Loading

Heavy optional dependencies, specifically the keras subpackage, which transitively pulls in TensorFlow, are wrapped in a custom LazyLoader class. The loader is a subclass of types.ModuleType that defers the actual importlib.import_module call until the first attribute lookup that the placeholder cannot resolve itself (handled by __getattr__) or the first dir() call (handled by __dir__):

keras = LazyLoader("keras", globals(), "imblearn.keras")

Rationale: TensorFlow and Keras are large dependencies with significant import-time overhead (often several seconds). Most imbalanced-learn users rely only on the traditional machine learning samplers and never touch the deep learning integration. By lazy-loading the keras module, the library avoids penalizing the common case. The pattern is adapted from TensorFlow's own lazy_loader.py.

Mechanism: When an attribute such as imblearn.keras.SomeClass is accessed for the first time, LazyLoader.__getattr__ fires and does the following:

  1. Calls importlib.import_module("imblearn.keras") to perform the real import.
  2. Replaces the LazyLoader placeholder in the parent module's globals with the real module object.
  3. Updates the LazyLoader instance's own __dict__ so that any stale references still resolve attributes efficiently (since __getattr__ is only called when normal attribute lookup fails).
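The three steps above can be sketched as follows. This is a simplified illustration rather than the library's exact code: the constructor signature mirrors the LazyLoader("keras", globals(), "imblearn.keras") call shown earlier, while the _load helper name, the explicit namespace dict (standing in for a parent module's globals), and the choice of the standard-library colorsys module as the lazy target are assumptions made so the example runs without Keras or TensorFlow installed.

```python
import importlib
import sys
import types


class LazyLoader(types.ModuleType):
    """Placeholder module that imports its target on first attribute access."""

    def __init__(self, local_name, parent_module_globals, name):
        super().__init__(name)
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals

    def _load(self):
        # Step 1: perform the real import.
        module = importlib.import_module(self.__name__)
        # Step 2: swap the placeholder in the parent's globals for the module.
        self._parent_module_globals[self._local_name] = module
        # Step 3: copy the module's attributes so stale references to this
        # placeholder resolve through normal lookup from now on.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only reached when normal attribute lookup fails.
        return getattr(self._load(), item)

    def __dir__(self):
        return dir(self._load())


# A plain dict stands in for the parent module's globals (assumption).
namespace = {}
namespace["colorsys"] = LazyLoader("colorsys", namespace, "colorsys")
placeholder = namespace["colorsys"]
was_lazy = isinstance(placeholder, LazyLoader)

# The first attribute access triggers the real import.
value = placeholder.rgb_to_hsv(1.0, 0.0, 0.0)

# The namespace now holds the real module, and the stale placeholder
# carries a copy of its attributes.
replaced = namespace["colorsys"] is sys.modules["colorsys"]
cached = "rgb_to_hsv" in vars(placeholder)
print(was_lazy, replaced, cached)  # True True True
```

Step 3 matters for code that stored a reference to the placeholder before the first access: without it, every attribute lookup on that reference would go through __getattr__ and import_module again.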

Design Pattern 3: Explicit __all__ for Public API Control

The module declares an explicit __all__ list that enumerates exactly which names are part of the public API:

__all__ = [
    "combine",
    "ensemble",
    "exceptions",
    "keras",
    "metrics",
    "model_selection",
    "over_sampling",
    "tensorflow",
    "under_sampling",
    "utils",
    "pipeline",
    "FunctionSampler",
    "__version__",
]

Rationale: The list controls what from imblearn import * exports and documents the intended public surface. Names such as LazyLoader, show_versions, and the imported standard library modules (importlib, sys, types) are not listed, so a star import does not pull them in. They remain reachable as attributes of the package (for example, imblearn.show_versions), because __all__ affects only star imports.
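The effect of __all__ on a star import can be checked with a throwaway module. In this minimal sketch, the module name demo_pkg and its contents are hypothetical and built in memory purely for illustration:

```python
import sys
import types

# Build a throwaway module in memory; "demo_pkg" is a hypothetical name.
demo = types.ModuleType("demo_pkg")
exec(
    "import sys\n"
    "__version__ = '0.0'\n"
    "def sampler(): return 'public'\n"
    "def show_versions(): return 'not star-exported'\n"
    "__all__ = ['sampler', '__version__']\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# Run a star import into a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_pkg import *", namespace)
star_names = sorted(name for name in namespace if name != "__builtins__")
print(star_names)  # ['__version__', 'sampler']

# Names left out of __all__ are still reachable as module attributes.
print(demo.show_versions())  # not star-exported
```

Without the __all__ line, the star import would copy every name that does not start with an underscore, including the imported sys module and show_versions, and would skip __version__.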

Summary

| Pattern | Mechanism | Benefit |
| --- | --- | --- |
| Build-time guard | __IMBLEARN_SETUP__ flag | Safe partial import during installation |
| Lazy loading | Custom LazyLoader(types.ModuleType) | Avoids multi-second TensorFlow import overhead for non-DL users |
| Explicit __all__ | List of public names | Clean, predictable public API surface |
