Implementation:Scikit learn Scikit learn BenchSAGA
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Benchmarking |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool, provided in the scikit-learn repository, for benchmarking the SAGA solver on binary and multinomial logistic regression.
Description
This benchmark script compares scikit-learn's SAGA solver against Lightning's SAGA implementation and Liblinear for logistic regression. It demonstrates the gain from fitting a single multinomial logistic regression model rather than using a one-vs-rest strategy. The script evaluates the solvers on several datasets, including 20 newsgroups, RCV1, digits, and iris, and measures log loss and training time in both binary and multinomial settings with L1 and L2 penalties.
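The multinomial versus one-vs-rest comparison described above can be sketched on a small dataset. This is a minimal illustration rather than the benchmark's own code: it assumes a recent scikit-learn release in which `LogisticRegression` with `solver="saga"` fits a multinomial model on multiclass targets, and it builds the one-vs-rest baseline with `OneVsRestClassifier`. The dataset choice, scaling step, and `max_iter=200` are illustrative assumptions.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Small multiclass dataset (iris is one of the datasets the benchmark uses).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Assumption: standardize features, since SAGA converges faster on scaled data.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Multinomial: a single softmax model over all classes.
multinomial = LogisticRegression(solver="saga", max_iter=200, random_state=42)

# One-vs-rest: one binary SAGA model per class, wrapped explicitly.
one_vs_rest = OneVsRestClassifier(
    LogisticRegression(solver="saga", max_iter=200, random_state=42)
)

results = {}
with warnings.catch_warnings():
    # Short runs may not fully converge; silence that warning for the sketch.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for name, model in [("multinomial", multinomial), ("one-vs-rest", one_vs_rest)]:
        model.fit(X_train, y_train)
        results[name] = log_loss(y_test, model.predict_proba(X_test))

for name, loss in results.items():
    print(f"{name}: test log loss = {loss:.4f}")
```

The numbers depend on the dataset and iteration budget, so a single run on iris should not be read as a general result; the benchmark script repeats the comparison on larger datasets.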
Usage
Use this benchmark to evaluate the performance advantage of SAGA's native multinomial logistic regression over one-vs-rest approaches, and to compare solver performance across different dataset sizes and penalty types.
Code Reference
Source Location
- Repository: scikit-learn
- File: benchmarks/bench_saga.py
Signature
def fit_single(
    solver,
    X,
    y,
    penalty="l2",
    single_target=True,
    C=1,
    max_iter=10,
    skip_slow=False,
    dtype=np.float64,
)
Import
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| solver | str | Yes | Solver name: 'sag', 'saga', 'liblinear', or 'lightning' |
| X | array-like | Yes | Feature matrix |
| y | array-like | Yes | Target labels |
| penalty | str | No | Regularization type: 'l1' or 'l2' (default: 'l2') |
| single_target | bool | No | Whether to use binary classification (default: True) |
| C | float | No | Inverse regularization strength (default: 1) |
| max_iter | int | No | Maximum number of iterations (default: 10) |
| skip_slow | bool | No | Whether to skip slow solver and penalty combinations (default: False) |
| dtype | numpy dtype | No | Data type for arrays (default: np.float64) |
Outputs
| Name | Type | Description |
|---|---|---|
| train_score | float | Log loss on training data |
| test_score | float | Log loss on test data |
| duration | float | Wall clock time of fit |
| Plot | matplotlib figure | Bar plots comparing solver performance |
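The I/O contract above can be approximated with a short sketch. `fit_single_sketch` is a hypothetical, simplified stand-in, not the function from `benchmarks/bench_saga.py`: it covers only the solvers built into scikit-learn (no Lightning), omits `skip_slow`, and its binarization rule for `single_target` (first class versus the rest) is an assumption made for illustration.

```python
import time
import warnings

import numpy as np
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split


def fit_single_sketch(solver, X, y, penalty="l2", single_target=True,
                      C=1, max_iter=10, dtype=np.float64):
    """Hypothetical stand-in: returns (train_score, test_score, duration)."""
    X = np.asarray(X, dtype=dtype)
    y = np.asarray(y)
    if single_target:
        # Assumption: reduce to a binary problem, first class vs. the rest.
        y = (y == np.unique(y)[0]).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42, stratify=y
    )
    clf = LogisticRegression(solver=solver, penalty=penalty, C=C,
                             max_iter=max_iter, random_state=42)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    duration = time.perf_counter() - start  # wall-clock time of the fit only
    train_score = log_loss(y_train, clf.predict_proba(X_train))
    test_score = log_loss(y_test, clf.predict_proba(X_test))
    return train_score, test_score, duration


X, y = load_digits(return_X_y=True)
results = {}
with warnings.catch_warnings():
    # max_iter=10 is deliberately small, so convergence warnings are expected.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for solver in ("saga", "liblinear"):
        for penalty in ("l1", "l2"):
            results[(solver, penalty)] = fit_single_sketch(
                solver, X, y, penalty=penalty
            )

for (solver, penalty), (train, test, secs) in results.items():
    print(f"{solver:9s} {penalty}: train={train:.4f} test={test:.4f} time={secs:.3f}s")
```

Timing only the `fit` call keeps data conversion and scoring out of the measured duration, which is the quantity the Outputs table describes. The loop stays in the binary setting because Liblinear does not fit a multinomial model.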
Usage Examples
Basic Usage
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the digits dataset, one of the datasets the benchmark evaluates.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit logistic regression with the SAGA solver and an L2 penalty.
clf = LogisticRegression(solver='saga', penalty='l2', max_iter=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))