Implementation:Scikit learn Scikit learn BenchRCV1LogregConvergence
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Benchmarking |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for benchmarking the convergence of logistic regression solvers on the RCV1 dataset; the script ships with the scikit-learn repository.
Description
This benchmark script compares how quickly several logistic regression solvers converge on the RCV1 text classification dataset. It evaluates scikit-learn's LogisticRegression (with the SAG, SAGA, and liblinear solvers) and SGDClassifier, plus the solvers from the third-party Lightning library when it is installed. Individual fits are cached with joblib, so repeated runs do not refit configurations that were already computed. For each iteration budget, the script records the training loss, training and test accuracy, and fit time.
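The caching idea can be sketched with `joblib.Memory`. This is a minimal illustration, not the script's code: synthetic `make_classification` data stands in for RCV1, and `fit_once` and `n_real_fits` are hypothetical names. The data is read from module scope instead of being passed as an argument, so joblib hashes only the small arguments; this assumes the data never changes between calls.

```python
import tempfile
import time
import warnings

from joblib import Memory
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for RCV1 (assumption: any binary dataset works here).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Cache results on disk in a throwaway directory.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)
n_real_fits = 0  # hypothetical counter: how often the body actually runs


@memory.cache
def fit_once(solver, n_iter):
    """Fit one configuration and return (training accuracy, fit seconds).

    X and y come from module scope, so joblib hashes only `solver` and
    `n_iter`. This assumes the data never changes between calls.
    """
    global n_real_fits
    n_real_fits += 1
    clf = LogisticRegression(solver=solver, max_iter=n_iter)
    with warnings.catch_warnings():
        # Small iteration budgets are expected to stop before convergence.
        warnings.simplefilter("ignore", ConvergenceWarning)
        start = time.perf_counter()
        clf.fit(X, y)
        elapsed = time.perf_counter() - start
    return clf.score(X, y), elapsed


first = fit_once("lbfgs", 10)   # runs the fit and stores the result on disk
second = fit_once("lbfgs", 10)  # same arguments: loaded from the cache
```

Because the second call hits the cache, its recorded fit time is the one stored by the first call, which is what makes re-running an interrupted sweep cheap.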
Usage
Use this benchmark to evaluate which logistic regression solver converges fastest on large-scale sparse text classification problems, and to compare the convergence profiles of different optimization algorithms.
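One way to build such a convergence profile is to refit each solver with an increasing iteration budget and record a loss after every fit. The sketch below assumes a small synthetic dataset in place of RCV1, a hypothetical list of budgets, and scikit-learn's unregularized `log_loss` as a simple convergence measure; it is an illustration of the idea, not the benchmark's own loop.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic stand-in for RCV1 (assumption: small, dense, binary).
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

solvers = ["liblinear", "sag", "saga"]
iter_budgets = [1, 5, 20, 100]  # hypothetical iteration budgets

curves = {}
with warnings.catch_warnings():
    # Small budgets are expected to stop before convergence.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for solver in solvers:
        curves[solver] = []
        for n_iter in iter_budgets:
            # A tiny tol keeps solvers from stopping early on their own.
            clf = LogisticRegression(
                solver=solver, tol=1e-10, max_iter=n_iter, random_state=42
            )
            clf.fit(X, y)
            loss = log_loss(y, clf.predict_proba(X))
            curves[solver].append((n_iter, loss))

for solver, curve in curves.items():
    print(solver, [round(loss, 4) for _, loss in curve])
```

Plotting each solver's losses against `iter_budgets` (or against measured fit time) gives curves of the kind the benchmark draws.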
Code Reference
Source Location
- Repository: scikit-learn
- File: benchmarks/bench_rcv1_logreg_convergence.py
Signature
def get_loss(w, intercept, myX, myy, C)
def bench_one(name, clf_type, clf_params, n_iter)
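A function with the `get_loss` signature above plausibly computes the L2-regularized logistic objective, averaged over samples. The sketch below is a reconstruction under two stated assumptions, not the script's code: labels `myy` are encoded as -1/+1, and the penalty is scaled as ||w||^2 / (2 C n_samples). Under those assumptions the value equals LogisticRegression's objective, 1/2 ||w||^2 + C * sum_i log(1 + exp(-y_i (x_i . w + b))), divided by C * n_samples.

```python
import numpy as np
import scipy.sparse as sp


def get_loss(w, intercept, myX, myy, C):
    """Sketch: mean logistic loss plus L2 penalty ||w||^2 / (2 * C * n_samples).

    Assumes myy holds labels in {-1, +1}; myX may be dense or scipy.sparse.
    """
    n_samples = myX.shape[0]
    w = np.ravel(w)  # coef_ has shape (1, n_features) for binary problems
    margins = myy * (myX @ w + intercept)
    # np.logaddexp(0, -m) equals log(1 + exp(-m)) without overflow for large |m|
    data_term = np.mean(np.logaddexp(0.0, -margins))
    penalty = w @ w / (2.0 * C * n_samples)
    return float(data_term + penalty)


# With w = 0 and no intercept, every margin is 0, so the loss is log(2).
X_demo = sp.csr_matrix(np.eye(4, 3))
y_demo = np.array([1.0, -1.0, 1.0, -1.0])
print(get_loss(np.zeros(3), 0.0, X_demo, y_demo, C=1.0))
```

Accepting `w` in either shape and `intercept` as a scalar or a length-1 array lets the function take `clf.coef_` and `clf.intercept_` from a fitted binary classifier directly.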
Import
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.datasets import fetch_rcv1
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Identifier for the solver being benchmarked |
| clf_type | class | Yes | Classifier class (LogisticRegression, SGDClassifier, etc.) |
| clf_params | dict | Yes | Parameters for the classifier constructor |
| n_iter | int | Yes | Iteration budget for the fit (maximum number of iterations or epochs) |
Outputs
| Name | Type | Description |
|---|---|---|
| train_loss | float | L2-regularized logistic loss (objective value) on training data |
| train_score | float | Accuracy on training data |
| test_score | float | Accuracy on test data |
| duration | float | Wall-clock time of the fit, in seconds |
| Plot | matplotlib figure | Convergence curves for all solvers |
Usage Examples
Basic Usage
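from sklearn.datasets import fetch_rcv1
from sklearn.linear_model import LogisticRegression
# RCV1 targets form a sparse multilabel matrix, so pick one topic ("CCAT") as a binary target
rcv1 = fetch_rcv1()
ccat_idx = list(rcv1.target_names).index("CCAT")
X = rcv1.data
y = rcv1.target[:, ccat_idx].toarray().ravel()
clf = LogisticRegression(solver='saga', max_iter=100, random_state=42)
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))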