Implementation:Kubeflow Pipelines XGBoost Cross Format Predict
Kubeflow Pipelines, Machine_Learning, Validation. Last Updated: 2026-02-13
Overview
Pattern for cross-format prediction using XGBoost predict components on mismatched data/model format pairs.
Description
Wrapper Doc. The pattern calls xgboost_predict_on_parquet_op with a CSV-trained model and xgboost_predict_on_csv_op with a Parquet-trained model to check that prediction does not depend on the format the model was trained from. Deliberately crossing the data format and the model's training format exposes format-specific assumptions in the data loading or prediction steps.
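The crossing rule can be sketched without any pipeline machinery. The following is a minimal illustration, assuming two format labels, `csv` and `parquet`; the helper name `cross_format_pairs` is hypothetical and does not appear in the sample. It lists every (data format, model training format) pair in which the two differ, which are the combinations this pattern exercises.

```python
from itertools import product

# Assumed labels for the two formats covered by this pattern.
FORMATS = ("csv", "parquet")


def cross_format_pairs(formats=FORMATS):
    """Return (data_format, model_format) pairs where the two formats differ."""
    return [
        (data_format, model_format)
        for data_format, model_format in product(formats, repeat=2)
        if data_format != model_format
    ]


pairs = cross_format_pairs()
```

With two formats this yields exactly two pairs, matching the two invocations shown under Usage Examples.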
Code Reference
Source: samples/core/XGBoost/xgboost_sample.py (L19-21 loading, L66-76 invocations)
Import:
from kfp import components
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| data | CSV or Parquet | Yes | Input data in the opposite format from the model's training data (the usage examples pass the training datasets) |
| model | XGBoostModel | Yes | Model trained on the opposite format (CSV-trained or Parquet-trained) |
| label_column_name / label_column | str / int | Yes | Label column identifier: label_column_name (str, column name) for Parquet data, label_column (int, column index) for CSV data |
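Because the CSV component identifies the label by position and the Parquet component by name, it can help to confirm that both refer to the same column. The sketch below assumes the CSV carries a header row; the helper `label_name_from_index` and the sample rows are hypothetical, and only the `tips` label name and index 0 come from the usage examples.

```python
import csv
import io


def label_name_from_index(csv_text, label_column):
    """Return the header name at position `label_column` in a CSV with a header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[label_column]


# Made-up rows for illustration; only the 'tips' label name is taken from the examples.
sample_csv = "tips,trip_seconds,trip_miles\n2.0,600,1.5\n0.0,300,0.7\n"
name = label_name_from_index(sample_csv, 0)
```

If the name resolved from the CSV index differs from the `label_column_name` passed to the Parquet component, the two predictions are not reading the same label.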
Outputs
| Name | Type | Description |
|---|---|---|
| prediction | Predictions | Prediction results validating cross-format compatibility |
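If a follow-up check compares two prediction outputs that are expected to match, a tolerance-based comparison is one option. This is a sketch only: it assumes the predictions have already been exported as plain lists of floats, one per row, and the helper `predictions_agree` and its tolerances are hypothetical. The source does not state that the sample performs such a comparison.

```python
import math


def predictions_agree(left, right, rel_tol=1e-6, abs_tol=1e-9):
    """True if both lists have equal length and match element-wise within tolerance."""
    if len(left) != len(right):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(left, right)
    )
```

A length mismatch is treated as a failure rather than truncated, since a differing row count usually points to a data loading difference between formats.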
Usage Examples
# Cross-prediction 1: Parquet data + CSV-trained model
xgboost_predict_on_parquet_op(
    data=training_data_parquet,
    model=model_trained_on_csv,
    label_column_name='tips',
).set_memory_limit('1Gi')

# Cross-prediction 2: CSV data + Parquet-trained model
xgboost_predict_on_csv_op(
    data=training_data_csv,
    model=model_trained_on_parquet,
    label_column=0,
).set_memory_limit('1Gi')