Implementation:SeldonIO Seldon core Alibi Detect Training

From Leeroopedia
Revision as of 13:50, 16 February 2026 by Admin (auto-imported from implementations/SeldonIO_Seldon_core_Alibi_Detect_Training.md)
| Property | Value |
|---|---|
| Implementation Name | Alibi Detect Training |
| Type | API Doc |
| Overview | Concrete tools for training drift and outlier detectors provided by the alibi-detect library |
| Domains | MLOps, Statistical_Testing, Anomaly_Detection |
| Implements Principle | SeldonIO_Seldon_core_Drift_And_Outlier_Detection_Training |
| Source | samples/examples/income_classifier/train.py:L127-279 |
| External Dependencies | alibi_detect (TabularDrift, OutlierVAE, save_detector), sklearn, tensorflow, joblib, numpy |
| Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/alibi-detect), Paper (https://arxiv.org/abs/2311.01096) |
| Last Updated | 2026-02-13 00:00 GMT |

Code Reference

Drift Detector Training (lines 126-148)

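from alibi_detect.cd import TabularDrift
from alibi_detect.utils.saving import save_detector

# None tells the detector to infer each categorical feature's categories from x_ref
categories_per_feature = {f: None for f in category_map}
cd = TabularDrift(x_ref, p_val=.05, categories_per_feature=categories_per_feature)

# labels is not defined in this excerpt; a two-entry list is assumed here
labels = ['No!', 'Yes!']
preds = cd.predict(x_h0)
print('Drift? {}'.format(labels[preds['data']['is_drift']]))

save_detector(cd, "./drift-detector")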
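
The `p_val` setting is easier to read with the decision rule spelled out. With alibi-detect's default Bonferroni correction, `TabularDrift` runs one test per feature, compares each p-value to `p_val` divided by the number of features, and reports drift if any feature falls below that threshold. The sketch below reproduces only that final step in plain Python; the function name `bonferroni_drift` and the p-values are made up for illustration and are not part of the library or of train.py.

```python
def bonferroni_drift(p_values, p_val=0.05):
    """Flag drift if any per-feature p-value is below p_val / n_features."""
    n_features = len(p_values)
    threshold = p_val / n_features  # Bonferroni-corrected per-feature threshold
    drifted = [i for i, p in enumerate(p_values) if p < threshold]
    return {
        "is_drift": int(bool(drifted)),
        "threshold": threshold,
        "drifted_features": drifted,
    }


# Hypothetical per-feature p-values for a batch with four features
feature_p_values = [0.40, 0.03, 0.72, 0.0008]
result = bonferroni_drift(feature_p_values, p_val=0.05)
print(result)
```

Note that the second feature (p = 0.03) would count as significant on its own at 0.05, but not after the correction, which is why the per-feature threshold matters when reading the detector's output.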

Classifier Training (lines 156-185)

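from joblib import dump
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Continuous/ordinal features: impute missing values, then standardize
ordinal_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]
)
# Categorical features: impute missing values, then one-hot encode
categorical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)
# sparse_threshold=0 forces a dense output array
preprocessor = ColumnTransformer(
    transformers=[
        ('num', ordinal_transformer, ordinal_features),
        ('cat', categorical_transformer, categorical_features)
    ],
    sparse_threshold=0
)
clf = RandomForestClassifier(n_estimators=50)
train_pipeline = Pipeline(
    steps=[('preprocessor', preprocessor), ('classifier', clf)]
)
train_pipeline.fit(x_ref, y_ref)
# The ./classifier directory must already exist
dump(train_pipeline, './classifier/model.joblib')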

Outlier Detector Training (lines 206-279)

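import tensorflow as tf
from tensorflow.keras.layers import Dense, InputLayer
from alibi_detect.od import OutlierVAE
from alibi_detect.utils.saving import save_detector

n_features = X_train.shape[1]
latent_dim = 2

# Encoder: n_features -> 25 -> 10 -> 5
encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(n_features,)),
    Dense(25, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.relu),
    Dense(5, activation=tf.nn.relu)
])

# Decoder: latent_dim -> 5 -> 10 -> 25 -> n_features
decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim,)),
    Dense(5, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.relu),
    Dense(25, activation=tf.nn.relu),
    Dense(n_features, activation=None)
])

# threshold=None leaves the outlier threshold unset; it has to be set
# before predictions can flag outliers (e.g. via od.infer_threshold)
od = OutlierVAE(
    threshold=None,
    score_type='mse',
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    latent_dim=latent_dim,
    samples=5
)
od.fit(X_train, loss_fn=tf.keras.losses.mse, epochs=5, verbose=True)
save_detector(od, "./outlier-detector")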
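
Because the detector is constructed with `threshold=None`, no outlier threshold exists after this step; alibi-detect provides an `infer_threshold` method that derives one from a percentile of the outlier scores on a batch of data. The sketch below illustrates that idea in plain Python under simplifying assumptions: the score is taken to be the per-instance mean squared error between an input and its reconstruction, and the helper names (`mse_scores`, `percentile`) and the toy data are hypothetical. The real detector also averages over several latent samples, which is omitted here.

```python
def mse_scores(X, X_recon):
    """Per-instance mean squared error between inputs and their reconstructions."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
        for x, r in zip(X, X_recon)
    ]


def percentile(values, perc):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


# Toy inputs and hypothetical reconstructions (not produced by a real VAE)
X = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0], [2.0, 2.0]]
X_recon = [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0], [0.0, 0.0]]

scores = mse_scores(X, X_recon)
threshold = percentile(scores, 75)  # treat the top quarter of scores as outliers
is_outlier = [int(s > threshold) for s in scores]
print(scores, threshold, is_outlier)
```

The percentile plays the role of an expected inlier fraction, so it should reflect how many outliers the calibration batch is believed to contain.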

Key Parameters

| Component | Parameter | Value | Description |
|---|---|---|---|
| TabularDrift | x_ref | training data | Reference distribution for drift comparison |
| TabularDrift | p_val | 0.05 | Significance threshold for the drift hypothesis test |
| TabularDrift | categories_per_feature | dict | Maps feature indices to category info (None = infer categories from x_ref) |
| OutlierVAE | threshold | None | Outlier threshold (None = not set at construction; must be set before flagging outliers, e.g. with infer_threshold) |
| OutlierVAE | score_type | 'mse' | Anomaly scoring method (mean squared error) |
| OutlierVAE | encoder_net | Sequential | Custom encoder architecture (n_features -> 25 -> 10 -> 5) |
| OutlierVAE | decoder_net | Sequential | Custom decoder architecture (latent_dim -> 5 -> 10 -> 25 -> n_features) |
| OutlierVAE | latent_dim | 2 | Dimensionality of the VAE latent space |
| OutlierVAE | samples | 5 | Number of latent samples drawn to evaluate each instance |
| OutlierVAE.fit | epochs | 5 | Training epochs for the VAE |

I/O Contract

Inputs

| Input | Type | Description |
|---|---|---|
| x_ref | numpy.ndarray | Reference training data (Adult Census dataset features) |
| y_ref | numpy.ndarray | Reference training labels |
| X_train | numpy.ndarray | Preprocessed training features for outlier detector |
| category_map | dict | Mapping of feature indices to categorical feature metadata |
| ordinal_features | list | Indices of continuous/ordinal features |
| categorical_features | list | Indices of categorical features |

Outputs

| Output | Format | Description |
|---|---|---|
| drift-detector/ | Directory (alibi-detect format) | Serialized TabularDrift detector |
| outlier-detector/ | Directory (alibi-detect format) | Serialized OutlierVAE detector |
| classifier/model.joblib | Joblib file | Serialized sklearn RandomForest pipeline |
| preprocessor/model.joblib | Joblib file | Serialized sklearn ColumnTransformer pipeline |
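
A quick way to confirm that a training run produced the outputs listed above is to check the layout on disk. The following is a minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical, and the check looks only at whether each path exists with the expected kind (file or directory), not at whether its contents load correctly. The demonstration runs against a temporary directory rather than a real training output.

```python
import tempfile
from pathlib import Path

# Layout taken from the Outputs table: two joblib files, two detector directories
EXPECTED_ARTIFACTS = {
    "classifier/model.joblib": "file",
    "preprocessor/model.joblib": "file",
    "drift-detector": "dir",
    "outlier-detector": "dir",
}


def missing_artifacts(root):
    """Return expected artifact paths that are absent, or of the wrong kind, under root."""
    root = Path(root)
    missing = []
    for rel_path, kind in EXPECTED_ARTIFACTS.items():
        path = root / rel_path
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(rel_path)
    return missing


# Demonstration on a throwaway directory that holds only part of the layout
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "classifier").mkdir()
    (Path(tmp) / "classifier" / "model.joblib").write_bytes(b"")
    (Path(tmp) / "drift-detector").mkdir()
    demo_missing = missing_artifacts(tmp)

print(demo_missing)
```

After a real run, calling `missing_artifacts("samples/examples/income_classifier")` would be expected to return an empty list.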

Usage Examples

Full Training Pipeline

# Run the training script to produce all four artifacts
cd samples/examples/income_classifier
python train.py

This produces four artifacts (two joblib files and two detector directories):

./classifier/model.joblib         # sklearn RandomForestClassifier pipeline
./preprocessor/model.joblib       # sklearn ColumnTransformer preprocessor
./drift-detector/                 # alibi-detect TabularDrift detector
./outlier-detector/               # alibi-detect OutlierVAE detector

Verifying Drift Detector

from alibi_detect.utils.saving import load_detector

cd = load_detector("./drift-detector")
preds = cd.predict(x_test_batch)
print('Drift detected:', preds['data']['is_drift'])
print('p-values (one per feature):', preds['data']['p_val'])
