

Implementation:SeldonIO Seldon core Sklearn Pipeline Train And Serialize

From Leeroopedia
Property | Value
Implementation Name | Sklearn_Pipeline_Train_And_Serialize
Type | API Doc
Overview | Concrete tool for training a scikit-learn pipeline and serializing it with joblib
Implements Principle | SeldonIO_Seldon_core_Model_Artifact_Preparation
Workflow | Model_Deployment
Domains | MLOps, Model_Serialization
Source | samples/scripts/models/iris/train.py:L1-25
External Dependencies | sklearn, joblib, mlserver_sklearn
Last Updated | 2026-02-13 00:00 GMT

Description

This implementation demonstrates how to train a scikit-learn pipeline and serialize it using joblib for deployment on Seldon Core 2 with MLServer. The training script creates a LogisticRegression classifier wrapped in a Pipeline, fits it on the Iris dataset, and persists the fitted pipeline to a .joblib file. The resulting artifact, combined with a model-settings.json configuration, is ready for upload to a model storage location and deployment via the Seldon Model CRD.

Code Reference

Source: samples/scripts/models/iris/train.py:L1-25

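import joblib
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def main(X, y):
    # liblinear fits one binary classifier per class (one-vs-rest).
    # multi_class is deprecated since scikit-learn 1.5 and redundant with liblinear.
    clf = LogisticRegression(solver="liblinear", multi_class="ovr")
    p = Pipeline([("clf", clf)])
    p.fit(X, y)
    # Persist the fitted pipeline to the current working directory;
    # mlserver_sklearn.SKLearnModel loads this file at serving time.
    filename_p = "model.joblib"
    joblib.dump(p, filename_p)


if __name__ == "__main__":
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    main(X, y)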

model-settings.json:

{
  "name": "iris",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib",
    "version": "v0.1.0"
  }
}
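train.py only writes model.joblib; model-settings.json is supplied separately. A minimal sketch of generating it next to the artifact, reusing the field values shown above, follows. The helper name write_model_settings and its arguments are hypothetical, not part of the Seldon sample.

```python
import json
from pathlib import Path


def write_model_settings(model_dir, name="iris", version="v0.1.0"):
    """Hypothetical helper: write an MLServer model-settings.json into model_dir."""
    settings = {
        "name": name,
        "implementation": "mlserver_sklearn.SKLearnModel",
        "parameters": {
            # Same relative URI as the example above; keep model.joblib
            # in the same folder as this settings file.
            "uri": "./model.joblib",
            "version": version,
        },
    }
    path = Path(model_dir) / "model-settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return path
```

Calling write_model_settings(".") after running train.py leaves both files in one folder, ready to upload together.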

Key Parameters

Parameter | Value | Description
solver | "liblinear" | Optimization algorithm for LogisticRegression; a good choice for small datasets such as Iris, and supports only one-vs-rest for multi-class problems
multi_class | "ovr" | One-vs-rest strategy for multi-class classification; deprecated since scikit-learn 1.5 and redundant when solver is "liblinear"
implementation | "mlserver_sklearn.SKLearnModel" | MLServer runtime class for scikit-learn models
uri | "./model.joblib" | Relative path to the serialized model artifact
version | "v0.1.0" | Model version identifier for tracking
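Because multi_class is deprecated, one way to keep the one-vs-rest behaviour explicit is to wrap a binary liblinear model in sklearn.multiclass.OneVsRestClassifier. The sketch below is a swapped-in alternative, not what train.py does; the function name build_ovr_pipeline is hypothetical, and its coefficients may differ slightly from the original artifact.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


def build_ovr_pipeline():
    # One binary liblinear classifier per class, wrapped explicitly
    # instead of passing multi_class="ovr" to LogisticRegression.
    clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
    return Pipeline([("clf", clf)])


if __name__ == "__main__":
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    pipe = build_ovr_pipeline().fit(X, y)
    # One fitted binary estimator per Iris class.
    print(len(pipe.named_steps["clf"].estimators_))  # 3
```

The step is still named "clf", so the pipeline layout stays the same as in train.py.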

I/O Contract

Inputs

Input | Type | Description
Raw training data | sklearn Iris dataset | X with shape [n_samples, 4] (sepal length, sepal width, petal length, petal width); y with shape [n_samples] (target class 0, 1, or 2). For Iris, n_samples = 150.

Outputs

Output | Type | Description
model.joblib | Serialized artifact | Joblib-serialized sklearn Pipeline containing the fitted LogisticRegression classifier
model-settings.json | Configuration file | MLServer model settings specifying the runtime implementation and artifact URI; supplied alongside model.joblib, not generated by train.py
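The contract above can be checked mechanically before upload. This is a minimal sketch; check_io_contract is a hypothetical helper, and it assumes model.joblib was produced by the training script.

```python
import joblib
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline


def check_io_contract(artifact_path="model.joblib"):
    """Hypothetical helper: assert the data and artifact match the I/O contract."""
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    # Inputs: four feature columns, one integer label (0, 1 or 2) per row.
    assert X.shape[1] == 4 and X.shape[0] == y.shape[0]
    assert set(np.unique(y)) == {0, 1, 2}
    # Output: a fitted Pipeline whose classifier step is named "clf".
    pipe = joblib.load(artifact_path)
    assert isinstance(pipe, Pipeline)
    assert "clf" in pipe.named_steps
    # The model accepts a [n, 4] batch and returns one label per row.
    assert pipe.predict(X[:5]).shape == (5,)
    return pipe
```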

Usage Examples

Training and Serializing the Model

# Run the training script; it writes model.joblib to the current working directory
python samples/scripts/models/iris/train.py

# Verify the artifact was created
ls -la model.joblib

Uploading to Remote Storage

# Upload the model artifact and settings to GCS (substitute a bucket you can write to)
gsutil cp model.joblib gs://seldon-models/mlserver/iris/
gsutil cp model-settings.json gs://seldon-models/mlserver/iris/
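Once both files are in the same storage folder, a Seldon Model resource points at that folder. The sketch below builds such a manifest as a Python dict, assuming the Seldon Core 2 Model shape (apiVersion mlops.seldon.io/v1alpha1, spec.storageUri, spec.requirements); the function name iris_model_manifest is hypothetical, and the output should be checked against the Seldon Core 2 documentation before use.

```python
import json


def iris_model_manifest(storage_uri, name="iris"):
    """Hypothetical helper: a Seldon Core 2 Model resource for the uploaded folder."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            # Folder holding model.joblib and model-settings.json.
            "storageUri": storage_uri,
            # Matched against server capabilities so the model lands on
            # a server that provides the sklearn runtime.
            "requirements": ["sklearn"],
        },
    }


if __name__ == "__main__":
    manifest = iris_model_manifest("gs://seldon-models/mlserver/iris")
    # kubectl apply -f accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```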

Loading the Serialized Model Locally

import joblib

# Load the serialized pipeline
pipeline = joblib.load("model.joblib")

# Verify predictions work
predictions = pipeline.predict([[5.1, 3.5, 1.4, 0.2]])
print(predictions)  # [0]
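A stricter local check confirms that serialization is lossless: dump a fitted pipeline, reload it, and compare predictions on the same inputs. The helper roundtrip_matches is hypothetical, and the pipeline here approximates train.py without the deprecated multi_class argument.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def roundtrip_matches(pipe, X):
    """Dump pipe with joblib, reload it, and compare predictions on X."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.joblib")
        joblib.dump(pipe, path)
        restored = joblib.load(path)
    return np.array_equal(pipe.predict(X), restored.predict(X))


if __name__ == "__main__":
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    pipe = Pipeline([("clf", LogisticRegression(solver="liblinear"))]).fit(X, y)
    print(roundtrip_matches(pipe, X))  # True
```

Joblib artifacts should be loaded with the same scikit-learn version that produced them, so the serving image's sklearn version is worth pinning to the training environment.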
