Implementation:SeldonIO Seldon core Alibi Detect Training

From Leeroopedia
Property Value
Implementation Name Alibi Detect Training
Type API Doc
Overview Concrete tools for training drift and outlier detectors provided by the alibi-detect library
Domains MLOps, Statistical_Testing, Anomaly_Detection
Implements Principle SeldonIO_Seldon_core_Drift_And_Outlier_Detection_Training
Source samples/examples/income_classifier/train.py:L127-279
External Dependencies alibi_detect (TabularDrift, OutlierVAE, save_detector), sklearn, tensorflow, joblib, numpy
Knowledge Sources Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/alibi-detect), Paper (https://arxiv.org/abs/2311.01096)
Last Updated 2026-02-13 00:00 GMT

Code Reference

Drift Detector Training (lines 126-148)

categories_per_feature = {f: None for f in list(category_map.keys())}
cd = TabularDrift(x_ref, p_val=.05, categories_per_feature=categories_per_feature)

preds = cd.predict(x_h0)
print('Drift? {}'.format(labels[preds['data']['is_drift']]))

from alibi_detect.utils.saving import save_detector
save_detector(cd, "./drift-detector")
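Under the hood, TabularDrift runs a per-feature two-sample test, a Kolmogorov-Smirnov test for numerical columns and a chi-squared test for categorical ones, then aggregates with a Bonferroni correction by default. A minimal pure-scipy sketch of that logic on synthetic data (feature names and sizes here are illustrative, not from the Adult Census script):

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

rng = np.random.default_rng(0)
x_ref_num = rng.normal(0, 1, 500)    # reference numerical feature
x_h0_num = rng.normal(0, 1, 500)     # held-out sample from the same distribution
x_ref_cat = rng.integers(0, 3, 500)  # reference categorical feature (3 categories)
x_h0_cat = rng.integers(0, 3, 500)

# Numerical feature: two-sample Kolmogorov-Smirnov test
p_num = ks_2samp(x_ref_num, x_h0_num).pvalue

# Categorical feature: chi-squared test on the 2 x n_categories contingency table
table = np.stack([np.bincount(x_ref_cat, minlength=3),
                  np.bincount(x_h0_cat, minlength=3)])
_, p_cat, _, _ = chi2_contingency(table)

# Bonferroni correction: each feature is tested at p_val / n_features
n_features = 2
drift = any(p < 0.05 / n_features for p in (p_num, p_cat))
```

Because x_h0 is drawn from the same distribution as x_ref, the drift flag should stay False for most seeds.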

Classifier Training (lines 156-185)

ordinal_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]
)
categorical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', ordinal_transformer, ordinal_features),
        ('cat', categorical_transformer, categorical_features)
    ],
    sparse_threshold=0
)
clf = RandomForestClassifier(n_estimators=50)
train_pipeline = Pipeline(
    steps=[('preprocessor', preprocessor), ('classifier', clf)]
)
train_pipeline.fit(x_ref, y_ref)
dump(train_pipeline, './classifier/model.joblib')
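Because preprocessing and the forest are bundled in one Pipeline, the serialized artifact accepts raw feature columns at inference time. A self-contained round-trip sketch on synthetic data (the two-column layout is illustrative, not the Adult Census schema):

```python
import numpy as np
from joblib import dump, load
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200),        # ordinal feature (column 0)
                     rng.integers(0, 4, 200)])    # ordinal-encoded categorical (column 1)
y = (X[:, 0] > 0).astype(int)

preprocessor = ColumnTransformer(
    transformers=[
        ('num', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('scaler', StandardScaler())]), [0]),
        ('cat', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('onehot', OneHotEncoder(handle_unknown='ignore'))]), [1]),
    ],
    sparse_threshold=0,
)
pipe = Pipeline([('preprocessor', preprocessor),
                 ('classifier', RandomForestClassifier(n_estimators=50, random_state=0))])
pipe.fit(X, y)

dump(pipe, 'model.joblib')               # same serialization call as the training script
preds = load('model.joblib').predict(X)  # load and predict on raw columns
print(preds.shape)                       # (200,)
```

handle_unknown='ignore' matters at serving time: categories never seen during training are encoded as all-zeros instead of raising.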

Outlier Detector Training (lines 206-279)

n_features = X_train.shape[1]
latent_dim = 2

encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(n_features,)),
    Dense(25, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.relu),
    Dense(5, activation=tf.nn.relu)
])

decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim,)),
    Dense(5, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.relu),
    Dense(25, activation=tf.nn.relu),
    Dense(n_features, activation=None)
])

od = OutlierVAE(
    threshold=None,
    score_type='mse',
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    latent_dim=latent_dim,
    samples=5
)
od.fit(X_train, loss_fn=tf.keras.losses.mse, epochs=5, verbose=True)
save_detector(od, "./outlier-detector")
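With threshold=None the detector can score instances but not flag outliers until a threshold is calibrated; alibi-detect exposes infer_threshold for this, which sets the threshold to a percentile of the scores on (assumed mostly-inlier) data. The percentile logic can be sketched in plain numpy; the real detector computes per-instance scores from VAE reconstruction MSE, for which a synthetic score array stands in here:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.exponential(scale=1.0, size=1000)  # stand-in for per-instance MSE scores

# infer_threshold-style calibration: choose the score value below which
# 95% of the calibration data falls
threshold = np.percentile(scores, 95.0)

is_outlier = (scores > threshold).astype(int)   # 1 = outlier, 0 = inlier
print(is_outlier.mean())                        # ~0.05 by construction
```

On the real detector this corresponds to calling something like od.infer_threshold(X_train, threshold_perc=95.) before save_detector, so the saved artifact carries a usable threshold.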

Key Parameters

Component Parameter Value Description
TabularDrift x_ref training data Reference distribution for drift comparison
TabularDrift p_val 0.05 Significance threshold for drift hypothesis test
TabularDrift categories_per_feature dict Maps feature indices to category info (None = infer)
OutlierVAE threshold None Outlier threshold (None = auto-calibrate)
OutlierVAE score_type 'mse' Anomaly scoring method (mean squared error)
OutlierVAE encoder_net Sequential Custom encoder architecture (n_features -> 25 -> 10 -> 5)
OutlierVAE decoder_net Sequential Custom decoder architecture (latent_dim -> 5 -> 10 -> 25 -> n_features)
OutlierVAE latent_dim 2 Dimensionality of VAE latent space
OutlierVAE samples 5 Number of latent samples for reconstruction
OutlierVAE.fit epochs 5 Training epochs for the VAE
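The categories_per_feature dict in the table above tells TabularDrift which columns to treat as categorical; mapping every index to None asks the detector to infer each category set from x_ref. A toy construction, where the category_map shape follows the alibi dataset convention but the values are illustrative:

```python
# category_map maps column index -> list of category names (alibi dataset convention)
category_map = {1: ['Private', 'Self-emp', 'Gov'], 3: ['Male', 'Female']}

# None = let TabularDrift infer the observed categories from x_ref;
# only the keys (column indices) matter for routing to the chi-squared test
categories_per_feature = {f: None for f in category_map}
print(categories_per_feature)  # {1: None, 3: None}
```

Columns absent from the dict fall through to the Kolmogorov-Smirnov test for numerical features.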

I/O Contract

Inputs

Input Type Description
x_ref numpy.ndarray Reference training data (Adult Census dataset features)
y_ref numpy.ndarray Reference training labels
X_train numpy.ndarray Preprocessed training features for outlier detector
category_map dict Mapping of feature indices to categorical feature metadata
ordinal_features list Indices of continuous/ordinal features
categorical_features list Indices of categorical features

Outputs

Output Format Description
drift-detector/ Directory (alibi-detect format) Serialized TabularDrift detector
outlier-detector/ Directory (alibi-detect format) Serialized OutlierVAE detector
classifier/model.joblib Joblib file Serialized sklearn RandomForest pipeline
preprocessor/model.joblib Joblib file Serialized sklearn ColumnTransformer pipeline

Usage Examples

Full Training Pipeline

# Run the training script to produce all four artifacts
cd samples/examples/income_classifier
python train.py

This produces four artifacts (two joblib files and two detector directories):

./classifier/model.joblib         # sklearn RandomForestClassifier pipeline
./preprocessor/model.joblib       # sklearn ColumnTransformer preprocessor
./drift-detector/                 # alibi-detect TabularDrift detector
./outlier-detector/               # alibi-detect OutlierVAE detector

Verifying Drift Detector

from alibi_detect.utils.saving import load_detector

cd = load_detector("./drift-detector")
preds = cd.predict(x_test_batch)
print('Drift detected:', preds['data']['is_drift'])
print('p-value:', preds['data']['p_val'])
