Environment:Microsoft Onnxruntime Sklearn Conversion Environment

Field	Value
sources	docs/python/examples/plot_train_convert_predict.py, docs/python/requirements.txt
domains	scikit-learn, model-conversion, onnx, inference, validation
last_updated	2026-02-10

Overview

Python environment for training scikit-learn models, converting them to ONNX format with skl2onnx, and validating predictions using onnxruntime.

Description

The Sklearn Conversion Environment supports the complete workflow of training a machine learning model with scikit-learn, converting it to the ONNX format, and validating its predictions through ONNX Runtime. The conversion step uses the skl2onnx library, whose convert_sklearn() function translates trained scikit-learn estimators (such as LogisticRegression, RandomForestClassifier, and GradientBoostingRegressor) into ONNX graphs. Input types are declared with FloatTensorType, which defines the expected tensor shape and data type. Once converted, the ONNX model is loaded into an InferenceSession for prediction, enabling side-by-side validation against the original scikit-learn model.

This workflow is demonstrated in plot_train_convert_predict.py, which serves as both a runnable example and a Sphinx-Gallery documentation page. Building the documentation additionally requires sphinx, matplotlib, and related packages, as listed in docs/python/requirements.txt.

Usage

Use this environment whenever you need to:

Convert a trained scikit-learn model to ONNX format for deployment.
Validate that the ONNX-converted model produces outputs matching the original scikit-learn model.
Deploy scikit-learn models into ONNX Runtime-based inference pipelines.
Generate documentation or tutorials for the sklearn-to-ONNX conversion workflow.

System Requirements

Requirement	Minimum	Recommended
Python	3.10	3.12
Operating System	Linux, Windows, macOS	Any
RAM	2 GB	8 GB (for larger datasets)
Disk	500 MB	1 GB (with documentation build)

Dependencies

System Packages

No additional system packages are required beyond a standard Python installation. All dependencies are Python packages.

Python Packages

Package	Version Constraint	Purpose
scikit-learn	(latest)	Model training (e.g., `LogisticRegression`, `RandomForestClassifier`)
skl2onnx	(latest)	Conversion of sklearn models to ONNX via `convert_sklearn()`
numpy	>= 1.21.6	Data array construction and manipulation
onnxruntime	1.25.0	Inference on converted ONNX models
onnx	(latest)	ONNX model format library (dependency of skl2onnx)

Documentation Build Dependencies (docs/python/requirements.txt)

Package	Purpose
sphinx	Documentation generator
matplotlib	Plot generation for documentation examples
sphinx-gallery	Auto-generates documentation pages from example scripts
numpydoc	NumPy-style docstring rendering in Sphinx

Credentials

No credentials, API keys, or environment variables are required for this environment.

Quick Install

pip install scikit-learn skl2onnx onnxruntime numpy

For documentation build:

pip install -r docs/python/requirements.txt

Verify installation:

python -c "import sklearn; import skl2onnx; import onnxruntime; print('All packages loaded successfully')"
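
When an import succeeds but behaviour still differs from what the package table suggests, it can help to print the installed versions as well. The following is a minimal sketch using only the standard library; the helper name `report_versions` and the `PACKAGES` list are illustrative and not part of the example script.

```python
from importlib import metadata

# Distribution names as published on PyPI, taken from the package table.
PACKAGES = ["scikit-learn", "skl2onnx", "onnxruntime", "onnx", "numpy"]


def report_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Note that the lookup is by distribution name, so a variant build installed under a different name (for example a GPU build of ONNX Runtime) would be reported as not installed even though `import onnxruntime` works.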

Code Evidence

Sklearn model training and conversion (plot_train_convert_predict.py)

# docs/python/examples/plot_train_convert_predict.py
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a scikit-learn model
model = LogisticRegression()
model.fit(X_train, y_train)

# Define the input type for conversion
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]

# Convert the trained model to ONNX format
onnx_model = convert_sklearn(model, initial_types=initial_type)

This snippet from the example script demonstrates the core three-step workflow: train a scikit-learn estimator, define input types using FloatTensorType, and convert to ONNX with convert_sklearn().

ONNX Runtime validation inference (plot_train_convert_predict.py)

# docs/python/examples/plot_train_convert_predict.py
import numpy
import onnxruntime as rt

# Load the converted ONNX model into an InferenceSession
session = rt.InferenceSession(onnx_model.SerializeToString())

# Run inference using the ONNX Runtime session
input_name = session.get_inputs()[0].name
pred_onnx = session.run(None, {input_name: X_test.astype(numpy.float32)})

After conversion, the ONNX model is loaded into an InferenceSession and used for prediction, allowing direct comparison against model.predict(X_test) from scikit-learn.

FloatTensorType for input specification (plot_train_convert_predict.py)

# docs/python/examples/plot_train_convert_predict.py
from skl2onnx.common.data_types import FloatTensorType

# FloatTensorType([None, n_features]) specifies:
#   - None: dynamic batch dimension
#   - n_features: fixed feature count matching training data
initial_type = [('float_input', FloatTensorType([None, 4]))]

Common Errors

Error	Cause	Solution
`ModuleNotFoundError: No module named 'skl2onnx'`	skl2onnx not installed	Run `pip install skl2onnx`
`RuntimeError: Unsupported sklearn operator`	The sklearn estimator type is not supported by skl2onnx	Check the skl2onnx supported operators list and update skl2onnx to the latest version
`InvalidArgument: input tensor data type mismatch`	Input data passed as float64 instead of float32	Cast input: `X_test.astype(numpy.float32)`
`RuntimeError: shape mismatch`	Number of features in input does not match the model	Ensure `FloatTensorType([None, n_features])` matches training data dimensions
`ValueError: initial_types cannot be None`	`convert_sklearn()` called without specifying input types	Always provide `initial_types` parameter with `FloatTensorType` definitions
`ImportError: cannot import name 'FloatTensorType'`	Outdated version of skl2onnx	Upgrade: `pip install --upgrade skl2onnx`
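
The data type and shape errors in the table can often be caught before the data reaches the InferenceSession. The helper below is a hypothetical sketch (the name `prepare_input` is not part of onnxruntime or skl2onnx) that casts the input to float32 and checks the feature count against the value used in FloatTensorType.

```python
import numpy


def prepare_input(X, n_features):
    """Return X as a contiguous float32 array of shape (n_samples, n_features)."""
    arr = numpy.ascontiguousarray(X, dtype=numpy.float32)
    if arr.ndim == 1:
        # Treat a flat vector as a single sample.
        arr = arr.reshape(1, -1)
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(
            f"expected shape (n_samples, {n_features}), got {arr.shape}"
        )
    return arr


# A Python list of float64 values becomes a (1, 4) float32 batch.
batch = prepare_input([[5.1, 3.5, 1.4, 0.2]], n_features=4)
print(batch.dtype, batch.shape)
```

The returned array can be passed directly as the value in the feed dictionary given to session.run().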

Compatibility Notes

scikit-learn versions: The skl2onnx converter supports scikit-learn 1.0 and later. Some newer estimators or parameters may require updating skl2onnx to the latest version.
ONNX opset versions: The default target opset for conversion depends on the skl2onnx version. You can specify a target opset explicitly: convert_sklearn(model, initial_types=initial_type, target_opset=18).
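Data types: A model converted with FloatTensorType inputs expects float32 tensors. Cast NumPy arrays to float32 before running inference, even though scikit-learn typically computes in float64; small numerical differences between the two sets of outputs are therefore expected.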
Pipelines: Complete scikit-learn Pipeline objects (including preprocessors like StandardScaler, OneHotEncoder) can be converted as a single unit.
Custom estimators: Custom sklearn-compatible estimators require registering a custom converter with skl2onnx before conversion.
Cross-platform: The converted ONNX model is platform-independent and can be deployed on any OS or runtime that supports ONNX Runtime.
