Implementation:SeldonIO Seldon core Alibi Detect Training
| Property | Value |
|---|---|
| Implementation Name | Alibi Detect Training |
| Type | API Doc |
| Overview | Concrete tools for training drift and outlier detectors provided by the alibi-detect library |
| Domains | MLOps, Statistical_Testing, Anomaly_Detection |
| Implements Principle | SeldonIO_Seldon_core_Drift_And_Outlier_Detection_Training |
| Source | samples/examples/income_classifier/train.py:L127-279 |
| External Dependencies | alibi_detect (TabularDrift, OutlierVAE, save_detector), sklearn, tensorflow, joblib, numpy |
| Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/alibi-detect), Paper (https://arxiv.org/abs/2311.01096) |
| Last Updated | 2026-02-13 00:00 GMT |
Code Reference
Drift Detector Training (lines 126-148)
categories_per_feature = {f: None for f in list(category_map.keys())}
cd = TabularDrift(x_ref, p_val=.05, categories_per_feature=categories_per_feature)
preds = cd.predict(x_h0)
print('Drift? {}'.format(labels[preds['data']['is_drift']]))
from alibi_detect.utils.saving import save_detector
save_detector(cd, "./drift-detector")
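To build intuition for what `TabularDrift` does with `p_val=.05`, the sketch below reproduces the feature-wise two-sample testing idea on synthetic data: a Kolmogorov-Smirnov test per numeric feature, with the significance threshold Bonferroni-corrected across features. This is a minimal illustration, not the alibi-detect implementation (which also handles categorical features via chi-squared tests); all data and the `feature_wise_drift` helper are invented for the example.

```python
import numpy as np
from scipy import stats

# Synthetic reference data and a clearly shifted batch
rng = np.random.default_rng(0)
x_ref = rng.normal(0, 1, size=(500, 3))   # reference distribution
x_shifted = x_ref + 2.0                   # drifted batch

def feature_wise_drift(x_ref, x, p_val=0.05):
    """K-S test per feature; flag drift if any corrected p-value is significant."""
    n_features = x_ref.shape[1]
    threshold = p_val / n_features        # Bonferroni correction across features
    p_vals = np.array([
        stats.ks_2samp(x_ref[:, f], x[:, f]).pvalue for f in range(n_features)
    ])
    return int((p_vals < threshold).any()), p_vals

print(feature_wise_drift(x_ref, x_ref)[0])      # -> 0 (identical data, no drift)
print(feature_wise_drift(x_ref, x_shifted)[0])  # -> 1 (drift detected)
```

The Bonferroni step is why a single `p_val` suffices even with many features: each per-feature test is held to `p_val / n_features`, keeping the overall false-alarm rate near `p_val`.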
Classifier Training (lines 156-185)
ordinal_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
]
)
categorical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='median')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
]
)
preprocessor = ColumnTransformer(
transformers=[
('num', ordinal_transformer, ordinal_features),
('cat', categorical_transformer, categorical_features)
],
sparse_threshold=0
)
clf = RandomForestClassifier(n_estimators=50)
train_pipeline = Pipeline(
steps=[('preprocessor', preprocessor), ('classifier', clf)]
)
train_pipeline.fit(x_ref, y_ref)
dump(train_pipeline, './classifier/model.joblib')
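The pipeline above is a standard sklearn `ColumnTransformer` + `RandomForestClassifier` stack. As a sanity check on the serialization step, the following self-contained sketch rebuilds the same structure on invented toy data (two continuous columns, one integer-coded categorical column) and verifies that a joblib round trip preserves predictions; the data, column indices, and temp path are all hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
# Toy tabular data: columns 0-1 continuous, column 2 categorical (3 levels)
X = np.column_stack([rng.normal(size=200), rng.normal(size=200),
                     rng.integers(0, 3, size=200)])
y = ((X[:, 0] + (X[:, 2] == 1)) > 0.5).astype(int)

ordinal_features, categorical_features = [0, 1], [2]
preprocessor = ColumnTransformer(
    transformers=[
        ('num', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('scaler', StandardScaler())]), ordinal_features),
        ('cat', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('onehot', OneHotEncoder(handle_unknown='ignore'))]),
         categorical_features),
    ],
    sparse_threshold=0,  # force a dense output array
)
pipe = Pipeline([('preprocessor', preprocessor),
                 ('classifier', RandomForestClassifier(n_estimators=10,
                                                       random_state=0))])
pipe.fit(X, y)

# Round-trip through joblib, as train.py does with ./classifier/model.joblib
path = os.path.join(tempfile.mkdtemp(), 'model.joblib')
dump(pipe, path)
restored = load(path)
match = (restored.predict(X) == pipe.predict(X)).all()
print(match)  # -> True
```

Serializing the whole pipeline (preprocessor plus classifier) means inference-time callers feed raw feature arrays and never have to replicate the imputation/scaling/encoding logic.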
Outlier Detector Training (lines 206-279)
n_features = X_train.shape[1]
latent_dim = 2
encoder_net = tf.keras.Sequential([
InputLayer(input_shape=(n_features,)),
Dense(25, activation=tf.nn.relu),
Dense(10, activation=tf.nn.relu),
Dense(5, activation=tf.nn.relu)
])
decoder_net = tf.keras.Sequential([
InputLayer(input_shape=(latent_dim,)),
Dense(5, activation=tf.nn.relu),
Dense(10, activation=tf.nn.relu),
Dense(25, activation=tf.nn.relu),
Dense(n_features, activation=None)
])
od = OutlierVAE(
threshold=None,
score_type='mse',
encoder_net=encoder_net,
decoder_net=decoder_net,
latent_dim=latent_dim,
samples=5
)
od.fit(X_train, loss_fn=tf.keras.losses.mse, epochs=5, verbose=True)
save_detector(od, "./outlier-detector")
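Because the detector is constructed with `threshold=None`, a threshold must be calibrated before `predict` can label outliers (alibi-detect exposes this as `infer_threshold`). The numpy sketch below illustrates the underlying idea without the VAE: score "normal" data, then set the threshold at a chosen percentile of those scores. The gamma-distributed `train_scores` are a synthetic stand-in for per-sample MSE reconstruction errors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for per-sample VAE reconstruction errors on normal training data
train_scores = rng.gamma(2.0, 0.05, size=1000)

# Percentile-based calibration: scores above the 95th percentile of
# normal data are labelled outliers (~5% false-positive rate by design)
threshold = np.percentile(train_scores, 95)

def is_outlier(scores, threshold):
    return (scores > threshold).astype(int)

new_scores = np.array([threshold * 0.5, threshold * 3.0])
print(is_outlier(new_scores, threshold))  # -> [0 1]
```

The choice of percentile trades off sensitivity against false alarms; alibi-detect's `infer_threshold` takes an analogous `threshold_perc` argument.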
Key Parameters
| Component | Parameter | Value | Description |
|---|---|---|---|
| TabularDrift | x_ref | training data | Reference distribution for drift comparison |
| TabularDrift | p_val | 0.05 | Significance threshold for drift hypothesis test |
| TabularDrift | categories_per_feature | dict | Maps feature indices to category info (None = infer) |
| OutlierVAE | threshold | None | Outlier threshold (None = auto-calibrate) |
| OutlierVAE | score_type | 'mse' | Anomaly scoring method (mean squared error) |
| OutlierVAE | encoder_net | Sequential | Custom encoder architecture (n_features -> 25 -> 10 -> 5) |
| OutlierVAE | decoder_net | Sequential | Custom decoder architecture (latent_dim -> 5 -> 10 -> 25 -> n_features) |
| OutlierVAE | latent_dim | 2 | Dimensionality of VAE latent space |
| OutlierVAE | samples | 5 | Number of latent samples for reconstruction |
| OutlierVAE.fit | epochs | 5 | Training epochs for the VAE |
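The `categories_per_feature` row deserves a note: `TabularDrift` expects a dict keyed by feature index, where `None` tells the detector to infer the category set from `x_ref`. A minimal sketch, assuming a hypothetical `category_map` of the shape produced by alibi's Adult dataset loader:

```python
# Hypothetical category_map: feature index -> list of category names
category_map = {1: ['Private', 'Public'], 3: ['US', 'Other']}

# None per feature = let TabularDrift infer categories from x_ref;
# an int (number of categories) or explicit list also works when known up front
categories_per_feature = {f: None for f in category_map}
print(categories_per_feature)  # -> {1: None, 3: None}
```

Features absent from the dict are treated as numeric and get a K-S test instead of a chi-squared test.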
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| x_ref | numpy.ndarray | Reference training data (Adult Census dataset features) |
| y_ref | numpy.ndarray | Reference training labels |
| X_train | numpy.ndarray | Preprocessed training features for outlier detector |
| category_map | dict | Mapping of feature indices to categorical feature metadata |
| ordinal_features | list | Indices of continuous/ordinal features |
| categorical_features | list | Indices of categorical features |
Outputs
| Output | Format | Description |
|---|---|---|
| drift-detector/ | Directory (alibi-detect format) | Serialized TabularDrift detector |
| outlier-detector/ | Directory (alibi-detect format) | Serialized OutlierVAE detector |
| classifier/model.joblib | Joblib file | Serialized sklearn RandomForest pipeline |
| preprocessor/model.joblib | Joblib file | Serialized sklearn ColumnTransformer pipeline |
Usage Examples
Full Training Pipeline
# Run the training script to produce all four artifacts
cd samples/examples/income_classifier
python train.py
This produces four serialized artifacts:
./classifier/model.joblib # sklearn RandomForestClassifier pipeline
./preprocessor/model.joblib # sklearn ColumnTransformer preprocessor
./drift-detector/ # alibi-detect TabularDrift detector
./outlier-detector/ # alibi-detect OutlierVAE detector
Verifying Drift Detector
from alibi_detect.utils.saving import load_detector
cd = load_detector("./drift-detector")
preds = cd.predict(x_test_batch)
print('Drift detected:', preds['data']['is_drift'])
print('p-value:', preds['data']['p_val'])
Related Pages
- SeldonIO_Seldon_core_Drift_And_Outlier_Detection_Training (principle) - Statistical methods underpinning drift and outlier detection training
- SeldonIO_Seldon_core_Seldon_Model_Load_For_Monitoring (next step) - Deploying trained detector artifacts as Seldon models
- SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring (pipeline usage) - Composing deployed detectors into a monitoring pipeline
- Environment:SeldonIO_Seldon_core_Python_ML_Dependencies_Environment (environment) - Python ML dependencies required to run the training script