Principle: Scikit-learn-contrib Imbalanced-learn Package Initialization
Theory: The package initialization module serves as the entry point to the imbalanced-learn library. It follows several design patterns that are common in large Python scientific computing packages, ensuring a clean public API while managing heavyweight optional dependencies gracefully.
Design Pattern 1: Conditional Import Guarding
The __IMBLEARN_SETUP__ flag is injected into Python's __builtins__ by the build process. When present and truthy, the __init__.py skips all subpackage imports and instead writes a short message to stderr:
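import sys

try:
    # Injected into builtins by the build process; a bare reference
    # raises NameError when the flag was never set.
    __IMBLEARN_SETUP__  # type: ignore
except NameError:
    __IMBLEARN_SETUP__ = False

if __IMBLEARN_SETUP__:
    sys.stderr.write("Partial import of imblearn during the build process.\n")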
Rationale: During package installation, the setup script may need to import the package metadata (such as the version string) before compiled extensions are available. The guard allows a partial import that satisfies the build system without triggering import errors from missing compiled modules. This pattern was inherited from scikit-learn itself, which uses the analogous __SKLEARN_SETUP__ flag.
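To make the name resolution concrete, the following is a minimal sketch of how a flag set on the builtins module becomes visible to a bare name reference. The helper in_setup_mode is hypothetical and exists only for this illustration; the assumption that a setup script sets the flag through the builtins module follows the description above and is not taken from imbalanced-learn's build code.

```python
import builtins


def in_setup_mode():
    """Hypothetical helper: report whether the build-time flag is visible."""
    try:
        # A bare global name falls back to the builtins namespace,
        # so a flag set on the builtins module is found here.
        return bool(__IMBLEARN_SETUP__)  # type: ignore  # noqa: F821
    except NameError:
        # The flag was never injected: this is a normal import.
        return False


before = in_setup_mode()

# Assumed behaviour of a setup script: set the flag before importing the package.
builtins.__IMBLEARN_SETUP__ = True
during = in_setup_mode()

# Clean up so the flag does not leak into later imports.
del builtins.__IMBLEARN_SETUP__
after = in_setup_mode()

print(before, during, after)
```

Because the flag lives in builtins rather than in an environment variable or a module attribute, the package's __init__.py can test for it without importing anything else first.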
Design Pattern 2: Lazy Module Loading
Heavy optional dependencies (specifically the keras subpackage, which transitively pulls in TensorFlow) are wrapped in a custom LazyLoader class. The loader is a subclass of types.ModuleType that defers the actual importlib.import_module call until the first attribute access or dir() call on the placeholder, handled by __getattr__ and __dir__ respectively:
keras = LazyLoader("keras", globals(), "imblearn.keras")
Rationale: TensorFlow and Keras are large dependencies with significant import-time overhead (often several seconds). Most imbalanced-learn users rely only on the traditional machine learning samplers and never touch the deep learning integration. By lazy-loading the keras module, the library avoids penalizing the common case. The pattern is adapted from TensorFlow's own lazy_loader.py.
Mechanism: When imblearn.keras.SomeClass is accessed, the LazyLoader.__getattr__ method fires, which:
- Calls importlib.import_module("imblearn.keras") to perform the real import.
- Replaces the LazyLoader placeholder in the parent module's globals with the real module object.
- Updates the LazyLoader instance's own __dict__ so that any stale references still resolve attributes efficiently (since __getattr__ is only called when normal attribute lookup fails).
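The three steps above can be sketched as a small, self-contained class. This is an illustration modelled on the description in this section, not a copy of the imbalanced-learn source. The private attribute names and the _load helper are assumptions, and the standard-library module colorsys stands in for imblearn.keras so the example runs without TensorFlow installed.

```python
import importlib
import sys
import types


class LazyLoader(types.ModuleType):
    """Sketch of a module placeholder that imports its target on first use."""

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        super().__init__(name)

    def _load(self):
        # Step 1: perform the real import.
        module = importlib.import_module(self.__name__)
        # Step 2: replace the placeholder in the parent namespace.
        self._parent_module_globals[self._local_name] = module
        # Step 3: copy attributes so stale references skip __getattr__ next time.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only reached when normal attribute lookup fails.
        return getattr(self._load(), item)

    def __dir__(self):
        return dir(self._load())


# A plain dict stands in for the package's globals() in this demo.
namespace = {}
namespace["colorsys"] = LazyLoader("colorsys", namespace, "colorsys")

placeholder = namespace["colorsys"]
was_lazy = isinstance(placeholder, LazyLoader)

# First attribute access triggers the real import.
hsv = placeholder.rgb_to_hsv(1.0, 0.0, 0.0)

replaced = namespace["colorsys"] is sys.modules["colorsys"]
cached_on_placeholder = "rgb_to_hsv" in placeholder.__dict__
```

After the first access, code that looks the name up in the parent namespace gets the real module, while code that still holds the old placeholder finds attributes directly in its __dict__.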
Design Pattern 3: Explicit __all__ for Public API Control
The module declares an explicit __all__ list that enumerates exactly which names are part of the public API:
__all__ = [
    "combine",
    "ensemble",
    "exceptions",
    "keras",
    "metrics",
    "model_selection",
    "over_sampling",
    "tensorflow",
    "under_sampling",
    "utils",
    "pipeline",
    "FunctionSampler",
    "__version__",
]
Rationale: This controls what is exported by from imblearn import * and serves as documentation of the intended public surface. Internal helpers like LazyLoader, show_versions, and the imported standard library modules (importlib, sys, types) are deliberately excluded.
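The effect of __all__ on a star import can be shown with a throwaway module. Everything in this sketch is hypothetical: demo_pkg, public_helper, and internal_helper are made-up names used only to illustrate the mechanism, and the module is built in memory rather than read from imbalanced-learn.

```python
import sys
import types

# Source of a hypothetical module that imports helpers but exports only two names.
source = '''
import types

__all__ = ["public_helper", "__version__"]
__version__ = "0.0.0"


def public_helper():
    return "public"


def internal_helper():
    return "internal"
'''

# Build the module in memory and register it so it can be imported by name.
demo = types.ModuleType("demo_pkg")
exec(source, demo.__dict__)
sys.modules["demo_pkg"] = demo

# Run the star import inside an isolated namespace and record what arrived.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")

# Names left out of __all__ are still reachable as attributes.
still_reachable = demo.internal_helper()

# Remove the demo module again.
del sys.modules["demo_pkg"]

print(exported)
```

Two details are worth noting. Without __all__, a star import would also pull in the imported types module, and it would skip __version__ because names that start with an underscore are only exported when listed explicitly. Leaving a name out of __all__ does not hide it: it stays accessible through ordinary attribute access on the module.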
Summary
| Pattern | Mechanism | Benefit |
|---|---|---|
| Build-time guard | __IMBLEARN_SETUP__ flag | Safe partial import during installation |
| Lazy loading | Custom LazyLoader(types.ModuleType) | Avoids multi-second TensorFlow import overhead for non-DL users |
| Explicit __all__ | List of public names | Clean, predictable public API surface |