Implementation:SeldonIO Seldon core Sklearn Pipeline Train And Serialize

Property	Value
Implementation Name	Sklearn_Pipeline_Train_And_Serialize
Type	API Doc
Overview	Concrete tool for training a scikit-learn pipeline and serializing it with joblib.
Implements Principle	SeldonIO_Seldon_core_Model_Artifact_Preparation
Workflow	Model_Deployment
Domains	MLOps, Model_Serialization
Source	`samples/scripts/models/iris/train.py:L1-25`
External Dependencies	sklearn, joblib, mlserver_sklearn
Last Updated	2026-02-13 00:00 GMT

Description

This implementation demonstrates how to train a scikit-learn pipeline and serialize it with joblib for deployment on Seldon Core 2 with MLServer. The training script wraps a LogisticRegression classifier in a single-step Pipeline, fits it on the Iris dataset, and writes the fitted pipeline to model.joblib in the current working directory. That artifact, together with a model-settings.json configuration, can then be uploaded to a model storage location and deployed through the Seldon Model CRD.

Code Reference

Source: samples/scripts/models/iris/train.py:L1-25

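import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn import datasets


def main():
    # X and y are module-level globals set in the __main__ block below.
    clf = LogisticRegression(solver="liblinear", multi_class="ovr")
    # Wrap the classifier in a single-step Pipeline so the whole object is serialized.
    p = Pipeline([("clf", clf)])
    p.fit(X, y)
    # Write the fitted pipeline to the current working directory.
    filename_p = "model.joblib"
    joblib.dump(p, filename_p)


if __name__ == "__main__":
    # Load the Iris features and integer class labels.
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    main()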

model-settings.json:

{
  "name": "iris",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib",
    "version": "v0.1.0"
  }
}
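The settings file can also be generated from Python rather than written by hand. The following is a minimal sketch using only the standard library. The helper names `write_settings` and `resolve_artifact` are hypothetical, and the sketch assumes that a relative `uri` is resolved against the folder that holds model-settings.json.

```python
import json
import os
import tempfile

# Same content as the model-settings.json shown above.
SETTINGS = {
    "name": "iris",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {"uri": "./model.joblib", "version": "v0.1.0"},
}


def write_settings(model_dir, settings):
    """Write model-settings.json into model_dir and return its path (hypothetical helper)."""
    path = os.path.join(model_dir, "model-settings.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)
    return path


def resolve_artifact(model_dir, settings):
    """Resolve the relative uri against model_dir (assumed resolution rule)."""
    return os.path.normpath(os.path.join(model_dir, settings["parameters"]["uri"]))


with tempfile.TemporaryDirectory() as model_dir:
    settings_path = write_settings(model_dir, SETTINGS)
    with open(settings_path, encoding="utf-8") as f:
        loaded = json.load(f)
    # Path of the artifact relative to the model folder.
    relative = os.path.relpath(resolve_artifact(model_dir, loaded), model_dir)

print(relative)
```

Keeping the settings in one dictionary makes it easier to change `name` or `version` in step with the artifact.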

Key Parameters

Parameter	Value	Description
`solver`	`"liblinear"`	Optimization algorithm for LogisticRegression; suitable for small datasets
`multi_class`	`"ovr"`	One-vs-rest strategy for multi-class classification; the argument is deprecated in scikit-learn 1.5 and later
`implementation`	`"mlserver_sklearn.SKLearnModel"`	MLServer runtime class for scikit-learn models
`uri`	`"./model.joblib"`	Relative path to the serialized model artifact
`version`	`"v0.1.0"`	Model version identifier for tracking
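Because recent scikit-learn releases deprecate the `multi_class` argument, the same one-vs-rest scheme may need to be expressed differently on newer versions. The sketch below swaps in `OneVsRestClassifier` around a binary liblinear LogisticRegression. This is a substitute technique, not the code in train.py, and the function name `train_and_save` is hypothetical. It writes to a temporary directory and reloads the artifact to confirm the round trip.

```python
import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


def train_and_save(X, y, path):
    """Fit a one-vs-rest logistic regression pipeline and dump it to path."""
    # OneVsRestClassifier fits one binary liblinear model per class,
    # standing in for LogisticRegression(multi_class="ovr").
    clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
    pipeline = Pipeline([("clf", clf)])
    pipeline.fit(X, y)
    joblib.dump(pipeline, path)
    return pipeline


iris = datasets.load_iris()
with tempfile.TemporaryDirectory() as tmp:
    artifact_path = os.path.join(tmp, "model.joblib")
    fitted = train_and_save(iris.data, iris.target, artifact_path)
    reloaded = joblib.load(artifact_path)

# The reloaded pipeline should predict exactly what the in-memory one does.
same_predictions = bool(
    (fitted.predict(iris.data) == reloaded.predict(iris.data)).all()
)
print(same_predictions)
```

Passing `X`, `y`, and the output path as arguments also avoids the reliance on module-level globals in the original script.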

I/O Contract

Inputs

Input	Type	Description
Raw training data	sklearn Iris dataset	X shape [n_samples, 4] (sepal length, sepal width, petal length, petal width), y shape [n_samples] (target class: 0, 1, or 2)

Outputs

Output	Type	Description
`model.joblib`	Serialized artifact	Joblib-serialized sklearn Pipeline containing the fitted LogisticRegression classifier
`model-settings.json`	Configuration file	MLServer model settings specifying runtime implementation and artifact URI
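The input side of this contract can be checked directly against the dataset bundled with scikit-learn. A minimal sketch, using only `load_iris` and its documented attributes:

```python
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Four numeric features per sample, one integer label per sample.
feature_names = list(iris.feature_names)
classes = sorted(set(y.tolist()))

print(X.shape, y.shape)
print(feature_names)
print(classes)
```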

Usage Examples

Training and Serializing the Model

# Run the training script to produce model.joblib
python samples/scripts/models/iris/train.py

# Verify the artifact was created
ls -la model.joblib

Uploading to Remote Storage

# Upload model artifact and settings to GCS
# (requires write access to the target bucket; substitute your own bucket path if needed)
gsutil cp model.joblib gs://seldon-models/mlserver/iris/
gsutil cp model-settings.json gs://seldon-models/mlserver/iris/

Loading the Serialized Model Locally

import joblib

# Load the serialized pipeline
pipeline = joblib.load("model.joblib")

# Verify predictions work
predictions = pipeline.predict([[5.1, 3.5, 1.4, 0.2]])
print(predictions)  # [0]
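A slightly fuller local check can also map the predicted class index to its species name and inspect the class probabilities. The sketch below is self-contained, so it trains a stand-in pipeline in a temporary directory instead of reading the model.joblib produced by train.py. The stand-in uses `OneVsRestClassifier`, which is an assumption made for compatibility with newer scikit-learn releases, not the original estimator configuration.

```python
import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    # Stand-in for the training script: fit and dump a one-vs-rest pipeline.
    model = Pipeline(
        [("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear")))]
    )
    joblib.dump(model.fit(iris.data, iris.target), path)
    pipeline = joblib.load(path)

# Same sample as in the example above (the first row of the Iris dataset).
sample = [[5.1, 3.5, 1.4, 0.2]]
class_index = int(pipeline.predict(sample)[0])
class_name = str(iris.target_names[class_index])
probabilities = pipeline.predict_proba(sample)[0]

print(class_index, class_name)
print(probabilities.round(3))
```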

Knowledge Sources

Repository: https://github.com/SeldonIO/seldon-core
Documentation: https://mlserver.readthedocs.io
