
Heuristic:Avhz RustQuant Learning Rate Tuning

From Leeroopedia



Knowledge Sources
Domains Optimization, Calibration
Last Updated 2026-02-07 19:00 GMT

Overview

Guidance on tuning the learning rate and convergence tolerance for RustQuant's gradient descent optimizer, with `f64::EPSILON.sqrt()` as the default tolerance.

Description

RustQuant's `GradientDescent` optimizer uses a fixed learning rate with convergence checked via the Euclidean norm of the gradient against a tolerance threshold. The default tolerance is `f64::EPSILON.sqrt()` (approximately `1.49e-8`), which balances machine precision limits with practical convergence. The test suite reveals that learning rate selection depends heavily on the objective function landscape: well-behaved convex functions tolerate `lr = 0.1`, while ill-conditioned functions like Rosenbrock require `lr = 0.001` with 10x more iterations.

Usage

Apply this heuristic when using `GradientDescent::new()` for model calibration workflows. Start with the recommended defaults and adjust based on observed convergence behavior.

The Insight (Rule of Thumb)

  • Default Tolerance: Leave tolerance as `None` to use `f64::EPSILON.sqrt()` (~1.49e-8). This is near-optimal for double-precision floating point.
  • Well-conditioned functions (x^2, Booth): Use `learning_rate = 0.1`, `max_iterations = 1000`.
  • Ill-conditioned functions (Rosenbrock): Use `learning_rate = 0.001`, `max_iterations = 10000`.
  • Trade-off: Larger learning rate converges faster but risks oscillation or divergence. Smaller learning rate is more stable but may require many more iterations.
  • Diagnostic: Enable `verbose = true` to monitor gradient norm and function value per iteration. If oscillations occur, reduce the learning rate by 10x.
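The trade-off above can be sketched with a minimal standalone loop (a hypothetical re-implementation for illustration, not RustQuant's own code): fixed-step descent that stops when the Euclidean gradient norm drops below the tolerance. On the well-conditioned f(x) = x², `lr = 0.1` converges well inside the 1000-iteration budget.

```rust
/// Euclidean norm of a gradient vector (mirrors the stationarity check).
fn grad_norm(g: &[f64]) -> f64 {
    g.iter().map(|&v| v * v).sum::<f64>().sqrt()
}

/// Fixed-step gradient descent; returns (final point, iterations used).
/// `grad` maps a point to its gradient. Standalone sketch for illustration.
fn descend(
    mut x: Vec<f64>,
    grad: impl Fn(&[f64]) -> Vec<f64>,
    lr: f64,
    max_iters: usize,
    tol: f64,
) -> (Vec<f64>, usize) {
    for k in 0..max_iters {
        let g = grad(&x);
        // Stop once the gradient norm falls below the tolerance.
        if grad_norm(&g) < tol {
            return (x, k);
        }
        // Fixed-step update: x <- x - lr * grad.
        for (xi, gi) in x.iter_mut().zip(&g) {
            *xi -= lr * gi;
        }
    }
    (x, max_iters)
}

fn main() {
    // f(x) = x^2 has gradient 2x: well-conditioned, lr = 0.1 suffices.
    let (x, iters) = descend(vec![10.0], |p| vec![2.0 * p[0]], 0.1, 1000, 1e-6);
    println!("x = {:e} after {} iterations", x[0], iters);
}
```

Each step multiplies x by (1 - 2·lr) = 0.8, so the gradient norm shrinks geometrically and the tolerance is reached in well under 100 iterations.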

Reasoning

The convergence tolerance `f64::EPSILON.sqrt()` is the standard choice for gradient-based methods because:

  • Gradient computations via autodiff introduce rounding errors of order `f64::EPSILON`
  • The gradient norm threshold should be above the noise floor but below meaningful signal
  • `sqrt(EPSILON) ~ 1.49e-8` sits at the geometric mean between machine precision and unity

The Rosenbrock function is a classic test for optimizers because its minimum lies in a narrow curved valley where the gradient is nearly perpendicular to the valley direction. The 100x ratio between the `0.1` and `0.001` learning rates directly reflects the condition number difference between well-conditioned and ill-conditioned problems.
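The contrast can be reproduced with a standalone loop (a hypothetical sketch independent of RustQuant; the start point `(0, 5)` matches the test suite's). With `lr = 0.1` the iterates overshoot the valley walls and blow up within a handful of steps, while `lr = 0.001` stays stable and crawls toward the minimum at (1, 1).

```rust
/// Gradient of the Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
fn rosenbrock_grad(p: &[f64]) -> [f64; 2] {
    let (x, y) = (p[0], p[1]);
    [
        -2.0 * (1.0 - x) - 400.0 * x * (y - x * x),
        200.0 * (y - x * x),
    ]
}

/// Fixed-step gradient descent from (0, 5); returns the final point.
fn run(lr: f64, iters: usize) -> [f64; 2] {
    let mut p = [0.0, 5.0];
    for _ in 0..iters {
        let g = rosenbrock_grad(&p);
        p[0] -= lr * g[0];
        p[1] -= lr * g[1];
        if !p[0].is_finite() || !p[1].is_finite() {
            break; // the iterates have diverged
        }
    }
    p
}

fn main() {
    // lr = 0.1 overshoots the narrow curved valley and diverges to inf/NaN.
    println!("lr = 0.1   -> {:?}", run(0.1, 100));
    // lr = 0.001 stays stable and approaches (1, 1) slowly.
    println!("lr = 0.001 -> {:?}", run(0.001, 10_000));
}
```

The small step is needed because the Hessian's largest eigenvalue near the minimum is roughly 1000, so stability requires a learning rate below about 2/1000; this is the condition-number effect described above.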

Code Evidence

Default tolerance fallback from `gradient_descent.rs:160`:

let tolerance = self.tolerance.unwrap_or(f64::EPSILON.sqrt());

Stationarity check from `gradient_descent.rs:137-139`:

fn is_stationary(gradient: &[f64], tol: f64) -> bool {
    gradient.iter().map(|&x| x * x).sum::<f64>().sqrt() < tol
}
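This predicate can be exercised standalone against the default tolerance (a hypothetical demo copying the function above; the inputs are not from the RustQuant test suite):

```rust
// Copied from the evidence above: gradient norm below tolerance = stationary.
fn is_stationary(gradient: &[f64], tol: f64) -> bool {
    gradient.iter().map(|&x| x * x).sum::<f64>().sqrt() < tol
}

fn main() {
    let tol = f64::EPSILON.sqrt(); // the default, ~1.49e-8
    // Norm sqrt(2) * 1e-9 ~ 1.41e-9 is below the tolerance: stationary.
    println!("{}", is_stationary(&[1e-9, 1e-9], tol)); // prints "true"
    // Norm 1e-6 is well above the tolerance: keep iterating.
    println!("{}", is_stationary(&[1e-6, 0.0], tol)); // prints "false"
}
```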

Well-conditioned test (lr=0.1, 1000 iters) from `gradient_descent.rs:262`:

let gd = GradientDescent::new(0.1, 1000, Some(0.000_001));
let result = gd.optimize(f, &[10.0], false);

Ill-conditioned test (lr=0.001, 10000 iters) from `gradient_descent.rs:309`:

let gd = GradientDescent::new(0.001, 10000, Some(0.000_001));
let result = gd.optimize(f, &[0.0, 5.0], false);
