Heuristic: Scikit-learn Feature Scaling Numerical Stability
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Numerical_Stability |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Numerical stability techniques used internally by scikit-learn: StandardScaler's handling of near-constant features, two-pass centering for precision, and proper penalty scaling in LogisticRegression solvers.
Description
StandardScaler and related preprocessing transformers handle several numerical edge cases internally. Near-constant features (where scale approaches machine epsilon) are automatically set to scale=1.0 to avoid division by near-zero values. When centering data, a two-pass algorithm is used to correct floating-point precision errors in the mean. Additionally, LogisticRegression solvers must properly scale the penalty term with `n_samples` because the loss function uses a sum (not mean) of pointwise losses.
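The near-constant guard can be sketched in pure NumPy. This mirrors the behavior described above rather than calling scikit-learn directly; `safe_standardize` is an illustrative name, not a library function:

```python
import numpy as np

def safe_standardize(X):
    """Standardize columns, guarding near-constant features (NumPy sketch)."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    # Near-zero scales are replaced with 1.0 so the feature passes through
    # centered but not amplified (mirrors the guard described above).
    eps = np.finfo(X.dtype).eps
    scale = np.where(scale < 10 * eps, 1.0, scale)
    return (X - mean) / scale

X = np.array([[1.0, 5.0],
              [1.0, 7.0],
              [1.0, 9.0]])  # first column is constant
Xs = safe_standardize(X)    # first column becomes all zeros, not NaN/inf
```

Without the `np.where` guard, the constant first column would be divided by a scale of 0.0, producing NaNs.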
Usage
Apply this heuristic when encountering UserWarnings about numerical issues during StandardScaler fitting, or when LogisticRegression produces unexpected results with different sample sizes. Relevant to StandardScaler_Init, LogisticRegression_Fit, and Pipeline_Fit_Predict.
The Insight (Rule of Thumb)
- Action: Always scale features before using gradient-based solvers (lbfgs, sag, saga). StandardScaler handles edge cases automatically.
- Value: The near-constant threshold is `10 * eps`, where eps is the machine epsilon for the dtype (eps ≈ 2.2e-16 for float64, giving a threshold of ≈ 2.2e-15).
- Trade-off: StandardScaler replaces near-zero scales with 1.0 (feature unchanged), which preserves constant features rather than amplifying noise.
- Solver note: The regularization strength `C` in LogisticRegression does not scale the way many users expect: the objective is `C * sum(loss) + penalty`, not `mean(loss) + 1/C * penalty`, so the effective regularization strength depends on `n_samples`.
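A quick numeric illustration of the solver note, using hypothetical per-sample loss values rather than a real fit: because the objective is `C * sum(loss) + penalty`, the penalty's relative weight shrinks as the dataset grows unless `C` is rescaled.

```python
import numpy as np

# Hypothetical identical per-sample losses at two dataset sizes
loss_small = np.full(100, 0.3)
loss_big = np.full(10_000, 0.3)
C, penalty = 1.0, 5.0

# Ratio of penalty to data term: drops 100x with 100x more samples
r_small = penalty / (C * loss_small.sum())
r_big = penalty / (C * loss_big.sum())

# Dividing C by the sample-size factor restores the original balance
r_big_rescaled = penalty / ((C / 100) * loss_big.sum())
```

This is why the same `C` can give noticeably different amounts of regularization on subsamples versus the full dataset.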
Reasoning
Without the near-constant feature guard, dividing by a very small scale value would amplify floating-point noise into large feature values, causing optimizer instability. The two-pass centering corrects for the loss of precision that occurs when subtracting two nearly equal floating-point numbers (catastrophic cancellation). The penalty scaling note is critical: users often expect `C` to be sample-size invariant, but because sklearn uses `sum` rather than `mean`, the effective regularization strength changes with dataset size.
Code Evidence
Near-constant feature handling from `sklearn/preprocessing/_data.py:99-131`:
```python
# Features with scale close to machine epsilon are set to 1.0
constant_mask = scale < 10 * xp.finfo(scale.dtype).eps
```
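The same mask can be reproduced with plain NumPy (`xp` in the sklearn source is the array-API namespace; here it is simply `np`):

```python
import numpy as np

scale = np.array([1.0, 3e-16, 0.0, 2.5])
eps = np.finfo(scale.dtype).eps            # ≈ 2.2e-16 for float64
constant_mask = scale < 10 * eps           # [False, True, True, False]
scale = np.where(constant_mask, 1.0, scale)  # [1.0, 1.0, 1.0, 2.5]
```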
Two-pass centering from `sklearn/preprocessing/_data.py:269-295`:
```python
# If mean centering has precision issues, subtract mean again
# after initial centering to correct floating-point errors
```
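A minimal demonstration of why the second pass helps, using a large offset to force catastrophic cancellation (a NumPy sketch, not sklearn's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Offset is huge relative to the spread, so X - mean(X) subtracts two
# nearly equal floats and loses precision in the low-order bits.
X = rng.normal(loc=1e8, scale=1.0, size=(10_000, 1))

Xc = X - X.mean(axis=0)      # first pass: mean of Xc is not exactly 0
residual = Xc.mean(axis=0)   # leftover error from lost precision
Xc -= residual               # second pass subtracts the residual mean
```

After the second pass, the column mean of `Xc` is zero to near machine precision.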
Penalty scaling from `sklearn/linear_model/_logistic.py:309-320`:
```python
# All solvers relying on LinearModelLoss need to scale penalty
# with n_samples because the objective is:
#     C * sum(pointwise_loss) + penalty
# NOT:
#     mean(pointwise_loss) + 1/C * penalty
sw_sum = n_samples  # if sample_weight is None
```
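The two forms in the comment differ only by a positive factor of `C * sw_sum`, which is exactly why the penalty must be rescaled with the sample count. A small algebraic check (function names are illustrative, not sklearn internals):

```python
import numpy as np

def objective_sum_form(C, losses, penalty_val):
    # sklearn-style form: C * sum(loss) + penalty
    return C * np.sum(losses) + penalty_val

def objective_mean_form(C, losses, penalty_val, sw_sum):
    # mean form: mean(loss) + penalty / (C * sw_sum)
    return np.mean(losses) + penalty_val / (C * sw_sum)

losses = np.array([0.2, 0.5, 0.1])  # hypothetical pointwise losses
C, pen = 0.7, 3.0
sw_sum = len(losses)

# Multiplying the mean form by C * sw_sum recovers the sum form exactly,
# so both have the same minimizer -- provided the penalty is scaled.
lhs = objective_sum_form(C, losses, pen)
rhs = (C * sw_sum) * objective_mean_form(C, losses, pen, sw_sum)
```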
QuantileTransformer auto-adjustment from `sklearn/preprocessing/_data.py:2884-2888`:
```python
# If n_quantiles > n_samples, automatically set n_quantiles = n_samples
# and issue a warning informing user of this adjustment
```
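The adjustment amounts to a clamp-and-warn, which can be sketched as follows (`resolve_n_quantiles` is an illustrative name, not the sklearn function):

```python
import warnings

def resolve_n_quantiles(n_quantiles, n_samples):
    """Sketch of the adjustment: clamp n_quantiles to n_samples, warn once."""
    if n_quantiles > n_samples:
        warnings.warn(
            f"n_quantiles ({n_quantiles}) is greater than n_samples "
            f"({n_samples}); n_quantiles is set to n_samples."
        )
        return n_samples
    return n_quantiles
```

More quantiles than samples cannot add resolution, since the empirical quantile function is determined by at most `n_samples` distinct points.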