Heuristic:Haifengl Smile BFGS Convergence Tuning

Knowledge Sources	Smile BFGS numerical stability patterns in smile.math.BFGS
Domains	Optimization, Numerical_Methods
Last Updated	2026-02-08 22:00 GMT

Overview

Tuning strategies for BFGS and L-BFGS-B optimizer convergence, including epsilon control and fallback behaviors for numerical instability.

Description

Smile's BFGS implementation includes three optimizer variants (BFGS, L-BFGS, L-BFGS-B) with configurable convergence parameters. The optimizer uses epsilon-based convergence criteria for both function values and x values, and includes fallback behaviors for handling numerical edge cases like NaN/Infinity from line search failures or roundoff errors. Understanding these internal thresholds and fallback mechanisms is critical for diagnosing optimization failures.

Usage

Use this heuristic when debugging optimization convergence issues, tuning BFGS parameters, or encountering NaN loss or line search failures. It applies to any Smile workflow that calls `BFGS.minimize()`, `BFGS.lbfgs()`, or `BFGS.lbfgsb()`, including manifold learning, Gaussian Process regression, and MLP training.

The Insight (Rule of Thumb)

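Action: Set the `smile.bfgs.epsilon` system property to control convergence sensitivity. The value is read once into a `static final` field, so set it before the `BFGS` class is first loaded, for example with `-Dsmile.bfgs.epsilon=1E-10` on the JVM command line.
Value: Default is `1E-8`. Convergence tolerances TOLX and TOLF are `4 * EPSILON`, so `4E-8` by default. Increase for noisy functions; decrease for high-precision needs.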
Trade-off: Smaller epsilon requires more iterations but gives tighter convergence. Larger epsilon converges faster but may stop prematurely.
Fallback: When line search produces bad values (NaN, Infinity, or f(x) increase), L-BFGS-B returns the previous good x rather than failing. Watch for `"bad x produced by line search"` or `"bad f(x) produced by line search"` log warnings.
Max iterations: If the optimizer reaches max iterations without convergence, it logs a warning and returns the best result found. This is not necessarily an error.
Descent direction: A `"search direction is not a descent direction"` warning indicates roundoff problems. Consider rescaling inputs or using a different starting point.
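
The timing of the property matters because a `static final` field is initialized only once. The following is a minimal sketch of that behavior, not Smile code: `EpsilonPropertyDemo` and its nested `Tolerances` class are hypothetical stand-ins that copy the property-reading pattern from the BFGS excerpt, so the example runs with the JDK alone.

```java
public class EpsilonPropertyDemo {
    /** Hypothetical stand-in that mirrors how the BFGS excerpt reads its epsilon. */
    static class Tolerances {
        static final double EPSILON = Double.parseDouble(
            System.getProperty("smile.bfgs.epsilon", "1E-8"));
        static final double TOLX = 4 * EPSILON;
    }

    /** Sets the property, then reads the field; the first read triggers class initialization. */
    static double epsilonAfterEarlySet(String value) {
        System.setProperty("smile.bfgs.epsilon", value);
        return Tolerances.EPSILON;
    }

    public static void main(String[] args) {
        double eps = epsilonAfterEarlySet("1E-6");
        System.out.println("EPSILON = " + eps + ", TOLX = " + Tolerances.TOLX);

        // A later change is ignored: the static field was fixed at initialization.
        System.setProperty("smile.bfgs.epsilon", "1E-12");
        System.out.println("EPSILON after late change = " + Tolerances.EPSILON);
    }
}
```

In a real application the same reasoning suggests passing the property on the command line, or setting it at the very start of `main`, before any Smile optimizer class is touched.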

Reasoning

The BFGS convergence criteria are derived from machine epsilon considerations. The default epsilon of `1E-8` lies between machine epsilon for double (about `2.2E-16`) and its square root (about `1.5E-8`), as the source comment states, balancing numerical precision against practical convergence. The TOLX and TOLF multipliers of `4 * EPSILON` provide margin for roundoff accumulation.
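
The bounds can be checked directly with the JDK. This small sketch uses `Math.ulp(1.0)` as machine epsilon for double; the class and method names (`EpsilonBounds`, `isBetweenEpsAndSqrtEps`) are illustrative and not part of Smile.

```java
public class EpsilonBounds {
    /** Machine epsilon for double: the gap between 1.0 and the next representable value. */
    static double machineEpsilon() {
        return Math.ulp(1.0);
    }

    /** True if value lies strictly between machine epsilon and its square root. */
    static boolean isBetweenEpsAndSqrtEps(double value) {
        double eps = machineEpsilon();
        return value > eps && value < Math.sqrt(eps);
    }

    public static void main(String[] args) {
        System.out.println("machine epsilon = " + machineEpsilon());
        System.out.println("sqrt(epsilon)   = " + Math.sqrt(machineEpsilon()));
        System.out.println("1E-8 in range?  " + isBetweenEpsAndSqrtEps(1E-8));
    }
}
```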

The fallback behavior in L-BFGS-B (returning previous good x when line search fails) prevents the optimizer from diverging due to numerical instability, which is especially important for ill-conditioned problems. The `STPMX = 100.0` constant limits the maximum step length to prevent overshooting.

Code evidence from `base/src/main/java/smile/math/BFGS.java:88-95`:

/** A number close to zero, between machine epsilon and its square root. */
private static final double EPSILON = Double.parseDouble(
    System.getProperty("smile.bfgs.epsilon", "1E-8"));
/** The convergence criterion on x values. */
private static final double TOLX = 4 * EPSILON;
/** The convergence criterion on function value. */
private static final double TOLF = 4 * EPSILON;
/** The scaled maximum step length allowed in line searches. */
private static final double STPMX = 100.0;
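
To make the role of TOLX concrete, here is a minimal sketch of a convergence test on x values. It assumes the common convention of measuring the largest relative change between successive iterates, with each `|x_i|` floored at 1; whether Smile uses exactly this form is an assumption, and `ConvergenceCheck` is a hypothetical class.

```java
public class ConvergenceCheck {
    static final double EPSILON = 1E-8;      // default from the excerpt above
    static final double TOLX = 4 * EPSILON;  // tolerance on x values

    /** Largest relative change between two iterates, with |x_i| floored at 1. */
    static double maxRelativeChange(double[] xOld, double[] xNew) {
        double test = 0.0;
        for (int i = 0; i < xNew.length; i++) {
            double change = Math.abs(xNew[i] - xOld[i]) / Math.max(Math.abs(xNew[i]), 1.0);
            if (change > test) {
                test = change;
            }
        }
        return test;
    }

    /** Assumed stopping rule: stop when the relative change drops below TOLX. */
    static boolean convergedOnX(double[] xOld, double[] xNew) {
        return maxRelativeChange(xOld, xNew) < TOLX;
    }
}
```

Under this rule, a larger epsilon widens the band in which two iterates count as "the same point", which is why it stops earlier on noisy objectives.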

Line search fallback from `base/src/main/java/smile/math/BFGS.java:662-671`:

logger.warn("L-BFGS-B: bad x produced by line search, return previous good x");
// ...
logger.warn("L-BFGS-B: bad f(x) produced by line search, return previous good x");
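
The guard behind these warnings can be sketched as follows. This is an illustration of the described fallback (reject a candidate that is non-finite or increases f, and keep the previous point), not the code in `BFGS.java`; `LineSearchGuard` and its methods are hypothetical.

```java
import java.util.function.ToDoubleFunction;

public class LineSearchGuard {
    /** True if every component of x is a finite number. */
    static boolean allFinite(double[] x) {
        for (double v : x) {
            if (Double.isNaN(v) || Double.isInfinite(v)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the candidate only if it is finite and does not increase f;
     * otherwise returns the previous good point.
     */
    static double[] acceptOrFallback(ToDoubleFunction<double[]> f,
                                     double[] previous, double[] candidate) {
        if (!allFinite(candidate)) {
            return previous; // corresponds to the "bad x" case
        }
        double fNew = f.applyAsDouble(candidate);
        boolean badValue = Double.isNaN(fNew) || Double.isInfinite(fNew);
        if (badValue || fNew > f.applyAsDouble(previous)) {
            return previous; // corresponds to the "bad f(x)" case
        }
        return candidate;
    }
}
```

When the fallback fires, the returned point is still usable, but it is worth checking the objective for overflow or undefined regions near that point.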

Gradient epsilon is also configurable from `base/src/main/java/smile/util/function/DifferentiableMultivariateFunction.java:27`:

double EPSILON = Double.parseDouble(
    System.getProperty("smile.gradient.epsilon", "1E-8"));
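
A gradient epsilon of this kind typically sets the step size of a finite-difference approximation. The sketch below shows a forward-difference gradient with a step proportional to `|x_i|`; it assumes that usage and is not copied from Smile. `FiniteDifferenceGradient` and its `gradient` method are hypothetical names.

```java
import java.util.function.ToDoubleFunction;

public class FiniteDifferenceGradient {
    /** Forward-difference gradient of f at x using a relative step of size epsilon. */
    static double[] gradient(ToDoubleFunction<double[]> f, double[] x, double epsilon) {
        double[] g = new double[x.length];
        double[] xh = x.clone();
        double fx = f.applyAsDouble(x);
        for (int i = 0; i < x.length; i++) {
            double xi = x[i];
            double h = epsilon * Math.abs(xi);
            if (h == 0.0) {
                h = epsilon; // fall back to an absolute step at the origin
            }
            xh[i] = xi + h;
            h = xh[i] - xi;  // use the step actually representable in floating point
            g[i] = (f.applyAsDouble(xh) - fx) / h;
            xh[i] = xi;      // restore the coordinate before moving on
        }
        return g;
    }
}
```

A step near the square root of machine epsilon roughly balances truncation error against roundoff in a forward difference, which is consistent with the `1E-8` default.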

Related Pages

Implementation:Haifengl_Smile_Decomposition_Solvers
