Environment:Microsoft Onnxruntime Sklearn Conversion Environment
| Field | Value |
|---|---|
| sources | docs/python/examples/plot_train_convert_predict.py, docs/python/requirements.txt |
| domains | scikit-learn, model-conversion, onnx, inference, validation |
| last_updated | 2026-02-10 |
Overview
Python environment for training scikit-learn models, converting them to ONNX format with skl2onnx, and validating predictions using onnxruntime.
Description
The Sklearn Conversion Environment supports the complete workflow of training a machine learning model with scikit-learn, converting it to the ONNX format, and validating it with inference through ONNX Runtime. Conversion uses the skl2onnx library, whose convert_sklearn() function translates trained scikit-learn estimators (such as LogisticRegression, RandomForestClassifier, and GradientBoostingRegressor) into ONNX graphs. Input types are declared with FloatTensorType, which defines the expected tensor shape and data type. Once converted, the ONNX model is loaded into an InferenceSession for prediction, enabling side-by-side validation against the original scikit-learn model. This workflow is demonstrated in plot_train_convert_predict.py, which serves as both a runnable example and a Sphinx-gallery documentation page. Building the documentation additionally requires sphinx, matplotlib, and related packages, as listed in docs/python/requirements.txt.
Usage
Use this environment whenever you need to:
- Convert a trained scikit-learn model to ONNX format for deployment.
- Validate that the ONNX-converted model produces outputs matching the original scikit-learn model.
- Deploy scikit-learn models into ONNX Runtime-based inference pipelines.
- Generate documentation or tutorials for the sklearn-to-ONNX conversion workflow.
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.12 |
| Operating System | Linux, Windows, macOS | Any |
| RAM | 2 GB | 8 GB (for larger datasets) |
| Disk | 500 MB | 1 GB (with documentation build) |
Dependencies
System Packages
No additional system packages are required beyond a standard Python installation. All dependencies are Python packages.
Python Packages
| Package | Version Constraint | Purpose |
|---|---|---|
| scikit-learn | (latest) | Model training (e.g., LogisticRegression, RandomForest) |
| skl2onnx | (latest) | Conversion of sklearn models to ONNX via convert_sklearn() |
| numpy | >= 1.21.6 | Data array construction and manipulation |
| onnxruntime | 1.25.0 | Inference on converted ONNX models |
| onnx | (latest) | ONNX model format library (dependency of skl2onnx) |
Documentation Build Dependencies (docs/python/requirements.txt)
| Package | Purpose |
|---|---|
| sphinx | Documentation generator |
| matplotlib | Plot generation for documentation examples |
| sphinx-gallery | Auto-generates documentation pages from example scripts |
| numpydoc | NumPy-style docstring rendering in Sphinx |
Credentials
No credentials, API keys, or environment variables are required for this environment.
Quick Install
pip install scikit-learn skl2onnx onnxruntime numpy
For documentation build:
pip install -r docs/python/requirements.txt
Verify installation:
python -c "import sklearn; import skl2onnx; import onnxruntime; print('All packages loaded successfully')"
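The one-line import check confirms that the packages load but not which versions are present. A minimal sketch of a version report follows, using only the standard library. The function name installed_versions and the REQUIRED tuple are illustrative and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (assumed to match the table above).
REQUIRED = ("scikit-learn", "skl2onnx", "onnxruntime", "onnx", "numpy")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a small report; missing packages are flagged rather than raising.
for name, ver in installed_versions().items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Because a missing package is reported as None instead of raising, the same helper can be reused in a CI step that decides whether to install or fail.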
Code Evidence
Sklearn model training and conversion (plot_train_convert_predict.py)
# docs/python/examples/plot_train_convert_predict.py
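from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Load a dataset and hold out a test split
# (Iris is assumed here; it has four features per sample)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Train a scikit-learn model
model = LogisticRegression()
model.fit(X_train, y_train)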
# Define the input type for conversion
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
# Convert the trained model to ONNX format
onnx_model = convert_sklearn(model, initial_types=initial_type)
This snippet from the example script demonstrates the core three-step workflow: train a scikit-learn estimator, define input types using FloatTensorType, and convert to ONNX with convert_sklearn().
ONNX Runtime validation inference (plot_train_convert_predict.py)
# docs/python/examples/plot_train_convert_predict.py
import numpy
import onnxruntime as rt
# Load the converted ONNX model into an InferenceSession
session = rt.InferenceSession(onnx_model.SerializeToString())
# Run inference using the ONNX Runtime session
# (session.run returns a list of outputs; for a classifier, pred_onnx[0] holds the predicted labels)
input_name = session.get_inputs()[0].name
pred_onnx = session.run(None, {input_name: X_test.astype(numpy.float32)})
After conversion, the ONNX model is loaded into an InferenceSession and used for prediction, allowing direct comparison against model.predict(X_test) from scikit-learn.
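The comparison itself can be sketched as a small helper that reports how often the two label vectors agree and, optionally, how far the probability arrays differ. This is an illustrative sketch and not code from the example script. The name compare_predictions and the default tolerance atol=1e-5 are assumptions, and the tolerance is there because ONNX Runtime computes in float32 while scikit-learn usually works in float64.

```python
import numpy as np


def compare_predictions(sk_labels, onnx_labels, sk_proba=None, onnx_proba=None, atol=1e-5):
    """Summarize agreement between scikit-learn and ONNX Runtime outputs."""
    sk_labels = np.asarray(sk_labels)
    onnx_labels = np.asarray(onnx_labels)
    # Fraction of samples on which both models predict the same label.
    report = {"label_agreement": float(np.mean(sk_labels == onnx_labels))}
    if sk_proba is not None and onnx_proba is not None:
        # Compare probabilities in float64 so the subtraction adds no extra rounding.
        diff = np.abs(
            np.asarray(sk_proba, dtype=np.float64) - np.asarray(onnx_proba, dtype=np.float64)
        )
        report["max_abs_proba_diff"] = float(diff.max())
        report["proba_close"] = bool(diff.max() <= atol)
    return report
```

With the names used in the snippets above, a call would look like compare_predictions(model.predict(X_test), pred_onnx[0]). Probabilities are left optional because, with skl2onnx's default options, a classifier's second output is typically a list of per-sample dictionaries rather than an array and would need converting first.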
FloatTensorType for input specification (plot_train_convert_predict.py)
# docs/python/examples/plot_train_convert_predict.py
from skl2onnx.common.data_types import FloatTensorType
# FloatTensorType([None, n_features]) specifies:
# - None: dynamic batch dimension
# - n_features: fixed feature count matching training data
initial_type = [('float_input', FloatTensorType([None, 4]))]
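To make the meaning of None in the declared shape concrete, here is a minimal sketch of the matching rule: a None dimension accepts any size, while an integer dimension must match exactly. The helper shape_is_compatible is purely illustrative and is not a function of skl2onnx or ONNX Runtime.

```python
from typing import Optional, Sequence


def shape_is_compatible(declared: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Check an input shape against a declared shape where None means 'any size'."""
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


declared = [None, 4]  # same shape as FloatTensorType([None, 4])
print(shape_is_compatible(declared, (1, 4)))    # True: a single sample
print(shape_is_compatible(declared, (150, 4)))  # True: any batch size is accepted
print(shape_is_compatible(declared, (150, 3)))  # False: feature count differs
```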
Common Errors
| Error | Cause | Solution |
|---|---|---|
| ModuleNotFoundError: No module named 'skl2onnx' | skl2onnx not installed | Run pip install skl2onnx |
| RuntimeError: Unsupported sklearn operator | The sklearn estimator type is not supported by skl2onnx | Check the skl2onnx supported operators list and update skl2onnx to the latest version |
| InvalidArgument: input tensor data type mismatch | Input data passed as float64 instead of float32 | Cast input: X_test.astype(numpy.float32) |
| RuntimeError: shape mismatch | Number of features in input does not match the model | Ensure FloatTensorType([None, n_features]) matches training data dimensions |
| ValueError: initial_types cannot be None | convert_sklearn() called without specifying input types | Always provide initial_types parameter with FloatTensorType definitions |
| ImportError: cannot import name 'FloatTensorType' | Outdated version of skl2onnx | Upgrade: pip install --upgrade skl2onnx |
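The data type and shape errors in the table can both be caught before the input reaches the session. A minimal sketch of such a pre-flight check follows. The helper name prepare_input is hypothetical, and the check assumes a single two-dimensional float input as in the examples above.

```python
import numpy as np


def prepare_input(X, n_features):
    """Validate the feature count and return a contiguous float32 array."""
    arr = np.asarray(X)
    if arr.ndim == 1:
        # Treat a flat vector as a single sample.
        arr = arr.reshape(1, -1)
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(f"expected shape (batch, {n_features}), got {arr.shape}")
    # Cast to float32, the element type declared by FloatTensorType.
    return np.ascontiguousarray(arr, dtype=np.float32)
```

Calling prepare_input(X_test, X_train.shape[1]) before session.run turns a feature-count mismatch into a readable ValueError on the Python side instead of a runtime error inside the session.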
Compatibility Notes
- scikit-learn versions: The skl2onnx converter supports scikit-learn 1.0 and later. Some newer estimators or parameters may require updating skl2onnx to the latest version.
- ONNX opset versions: The default target opset for conversion depends on the skl2onnx version. You can specify a target opset explicitly: convert_sklearn(model, initial_types=initial_type, target_opset=18).
- Data types: A model converted with FloatTensorType expects float32 input. Always cast NumPy arrays to float32 before running inference, even if scikit-learn used float64 internally.
- Pipelines: Complete scikit-learn Pipeline objects (including preprocessors like StandardScaler and OneHotEncoder) can be converted as a single unit.
- Custom estimators: Custom sklearn-compatible estimators require registering a custom converter with skl2onnx before conversion.
- Cross-platform: The converted ONNX model is platform-independent and can be deployed on any OS or runtime that supports ONNX Runtime.
Related Pages
- Implementation:Microsoft_Onnxruntime_Sklearn_Model_Training
- Implementation:Microsoft_Onnxruntime_FloatTensorType_Init
- Implementation:Microsoft_Onnxruntime_Convert_Sklearn
- Implementation:Microsoft_Onnxruntime_InferenceSession_Run_For_Validation
- Implementation:Microsoft_Onnxruntime_InferenceSession_Run_For_Prediction