Implementation:Scikit learn Scikit learn SAG Solver
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Stochastic Optimization |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for solving Ridge and Logistic Regression optimization problems using the Stochastic Average Gradient (SAG/SAGA) algorithm provided by scikit-learn.
Description
The sag_solver function implements the Stochastic Average Gradient (SAG) and SAGA optimization algorithms for fitting Ridge regression and Logistic Regression models. SAG stores the most recent gradient computed for each sample and steps along the average of these stored gradients, so each iteration evaluates only one sample gradient while, for strongly convex objectives, still achieving a linear convergence rate. SAGA is a variant that supports non-strongly convex composite objectives and non-smooth penalties such as L1. The module also provides the get_auto_step_size helper, which computes a default step size from the maximum squared sum of features over samples, the regularization parameter, and the loss.
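To make the "average of stored gradients" idea concrete, here is a minimal NumPy sketch of SAG for a ridge objective without an intercept. It is an illustration only, not the scikit-learn implementation: the function name sag_ridge, the 1/L step size, and the synthetic data are all assumptions chosen for this example.

```python
import numpy as np


def sag_ridge(X, y, alpha=1.0, n_epochs=50, seed=0):
    """Toy SAG for (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2 + 0.5 * (alpha/n) * ||w||^2."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    alpha_scaled = alpha / n_samples

    # Assumed step size: 1 / L, with L an upper bound on the per-sample smoothness.
    L = (X ** 2).sum(axis=1).max() + alpha_scaled
    step = 1.0 / L

    w = np.zeros(n_features)
    grad_memory = np.zeros((n_samples, n_features))  # last loss gradient per sample
    grad_sum = np.zeros(n_features)                  # sum of all stored gradients
    seen = np.zeros(n_samples, dtype=bool)
    n_seen = 0

    for _ in range(n_epochs * n_samples):
        i = rng.integers(n_samples)
        if not seen[i]:
            seen[i] = True
            n_seen += 1
        # Only one per-sample gradient is computed in each iteration.
        g_i = (X[i] @ w - y[i]) * X[i]
        # Swap the stale gradient of sample i for the fresh one in the running sum.
        grad_sum += g_i - grad_memory[i]
        grad_memory[i] = g_i
        # Step along the average stored gradient plus the exact L2 gradient.
        w -= step * (grad_sum / n_seen + alpha_scaled * w)
    return w


# Synthetic, well-scaled data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=200)

w_sag = sag_ridge(X, y, alpha=1.0)
# Closed-form ridge solution of the same objective, for comparison.
w_closed = np.linalg.solve(X.T @ X + 1.0 * np.eye(5), X.T @ y)
print("max abs difference:", np.abs(w_sag - w_closed).max())
```

The memory cost is one stored gradient per sample, which is the price SAG pays for its low per-iteration compute; for linear models an implementation can store a single scalar per sample instead of a full vector.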
Usage
The SAG solver is used internally by Ridge and LogisticRegression when solver='sag' or solver='saga' is specified. It is especially efficient for large datasets where computing the full gradient at each step would be expensive. SAG/SAGA are preferred when you have a large number of samples and want faster convergence than vanilla SGD while keeping the per-iteration cost low. Fast convergence is only guaranteed when features are on approximately the same scale, so standardizing the data first (for example with StandardScaler) is usually advisable.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/linear_model/_sag.py
Signature
def sag_solver(
    X,
    y,
    sample_weight=None,
    loss="log",
    alpha=1.0,
    beta=0.0,
    max_iter=1000,
    tol=0.001,
    verbose=0,
    random_state=None,
    check_input=True,
    max_squared_sum=None,
    warm_start_mem=None,
    is_saga=False,
):
def get_auto_step_size(
    max_squared_sum, alpha_scaled, loss, fit_intercept,
    n_samples=None, is_saga=False,
):
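As a rough illustration of how the arguments of get_auto_step_size fit together, the sketch below calls it for a squared loss, once for SAG and once for SAGA. It assumes the helper is importable from the private sklearn.linear_model._sag module with the signature shown above, and that alpha_scaled is the regularization strength divided by the number of samples; both points may differ between scikit-learn versions. The data is synthetic.

```python
import numpy as np
from sklearn.linear_model._sag import get_auto_step_size  # private helper, may change

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
n_samples = X.shape[0]

# Largest squared row norm of X: max over samples of sum(x_i ** 2).
max_squared_sum = (X ** 2).sum(axis=1).max()

# Assumption: the helper expects alpha already divided by n_samples.
alpha_scaled = 1.0 / n_samples

sag_step = get_auto_step_size(max_squared_sum, alpha_scaled, "squared", False)
saga_step = get_auto_step_size(
    max_squared_sum, alpha_scaled, "squared", False,
    n_samples=n_samples, is_saga=True,
)
print("SAG step size: ", sag_step)
print("SAGA step size:", saga_step)
```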
Import
from sklearn.linear_model._sag import sag_solver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | ndarray | Yes | Training data of shape (n_samples, n_features) |
| y | ndarray | Yes | Target values of shape (n_samples,) or (n_samples, n_classes) |
| sample_weight | ndarray | No | Weight assigned to each sample (default=None) |
| loss | str | No | Loss function: 'log', 'squared', or 'multinomial' (default='log') |
| alpha | float | No | L2 regularization strength (default=1.0) |
| beta | float | No | L1 regularization strength; only applied when is_saga=True (default=0.0) |
| max_iter | int | No | Maximum number of passes over the data (default=1000) |
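| tol | float | No | Stopping tolerance on the weights: iteration stops when max(change in weights) / max(weights) < tol (default=0.001) |
| verbose | int | No | Verbosity level (default=0) |
| random_state | int, RandomState or None | No | Random seed for sample selection (default=None) |
| check_input | bool | No | If False, the input arrays X and y are not checked (default=True) |
| max_squared_sum | float | No | Maximum squared sum of X over samples, used for step size computation; computed from X when None (default=None) |
| warm_start_mem | dict | No | Initialization state from a previous call (coefficients and gradient memory) used for warm start (default=None) |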
| is_saga | bool | No | Whether to use SAGA variant instead of SAG (default=False) |
Outputs
| Name | Type | Description |
|---|---|---|
| coef_ | ndarray | Optimized weight coefficients |
| n_iter_ | int | Number of full passes over the samples actually performed |
| warm_start_mem | dict | Gradient memory dict that can be used for warm starting |
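The contract above can be exercised by calling sag_solver directly. The sketch below is only an illustration under stated assumptions: it imports from the private _sag module (not a stable public API), uses synthetic data from make_regression, fits without an intercept, and assumes the returned warm_start_mem dict can be passed back unchanged to a second call.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model._sag import sag_solver  # private module, may change

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# First call: returns the coefficients, the number of passes, and the solver state.
coef, n_iter, mem = sag_solver(
    X, y, loss="squared", alpha=1.0, max_iter=100, tol=1e-4, random_state=0
)
print("coef shape:", coef.shape, "passes:", n_iter)

# Second call: reuse the returned state as a warm start.
coef_ws, n_iter_ws, mem_ws = sag_solver(
    X, y, loss="squared", alpha=1.0, max_iter=100, tol=1e-4,
    random_state=0, warm_start_mem=mem,
)
print("passes after warm start:", n_iter_ws)
```

In application code the public estimators (Ridge, LogisticRegression) are the safer entry point, since they handle intercepts, input validation, and regularization scaling on the caller's behalf.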
Usage Examples
Basic Usage
# SAG solver is typically used internally via Ridge or LogisticRegression.
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=42)
model = Ridge(alpha=1.0, solver="sag", random_state=42)
model.fit(X, y)
print("Score:", model.score(X, y))
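The same pattern applies to classification. The following sketch fits LogisticRegression with solver="saga" inside a pipeline that standardizes the features first, since SAG/SAGA converge fastest on features of similar scale. The dataset and the max_iter value are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Scale features, then fit logistic regression with the SAGA solver.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=200, random_state=0),
)
clf.fit(X, y)

accuracy = clf.score(X, y)
print("Training accuracy:", accuracy)
print("Passes used:", clf.named_steps["logisticregression"].n_iter_)
```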