Implementation:Kubeflow Pipelines XGBoost Cross Format Predict

Kubeflow Pipelines Machine_Learning Validation Last Updated: 2026-02-13

Overview

Pattern for cross-format prediction using XGBoost predict components on mismatched data/model format pairs.

Description

Wrapper document for two existing components. The sample runs xgboost_predict_on_parquet_op with a CSV-trained model and xgboost_predict_on_csv_op with a Parquet-trained model to check that a trained model can be used regardless of the format its training data came in. Deliberately crossing the data format and the model's training format exposes format-specific assumptions in the data loading or prediction steps, for example a column order or label position that only holds for one format.
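The property being checked can be illustrated without a cluster. The following is a minimal sketch, not code from the sample: it assumes a small hypothetical table with `tips` as the first column and uses a plain name-to-values mapping as a stand-in for Parquet. The helper names `split_by_index`, `split_by_name`, and `to_columns` are invented for illustration. It shows that selecting the label by index (CSV-style) and by name (columnar-style) should yield the same features and labels.

```python
import csv
import io

# Hypothetical table; column names and values are illustrative only.
CSV_TEXT = """tips,trip_seconds,trip_miles
2.5,600,3.1
0.0,300,1.2
4.0,900,5.8
"""


def split_by_index(csv_text, label_column):
    """CSV-style access: the label is identified by its position."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:]  # skip the header row
    labels = [float(r[label_column]) for r in body]
    features = [
        [float(v) for i, v in enumerate(r) if i != label_column] for r in body
    ]
    return features, labels


def to_columns(csv_text):
    """Build a name -> values mapping (an assumed stand-in for Parquet)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def split_by_name(columns, label_column_name):
    """Columnar-style access: the label is identified by its column name."""
    labels = list(columns[label_column_name])
    feature_names = [n for n in columns if n != label_column_name]
    features = [
        [columns[n][i] for n in feature_names] for i in range(len(labels))
    ]
    return features, labels


features_csv, labels_csv = split_by_index(CSV_TEXT, label_column=0)
features_parquet, labels_parquet = split_by_name(to_columns(CSV_TEXT), "tips")

# Both access paths should hand identical inputs to a model.
same_inputs = features_csv == features_parquet and labels_csv == labels_parquet
print(same_inputs)
```

If the two loaders disagreed, for instance because one of them kept the label among the features, a model trained through one path would see differently shaped input through the other. That is the kind of mismatch the cross-format runs are meant to surface.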

Code Reference

Source: samples/core/XGBoost/xgboost_sample.py (L19-21 loading, L66-76 invocations)

Import:

from kfp import components

I/O Contract

Inputs

Parameter	Type	Required	Description
data	CSV or Parquet	Yes	Data to predict on, in the opposite format from the model's training data (the examples reuse the training dataset)
model	XGBoostModel	Yes	Model trained on the opposite format (CSV-trained or Parquet-trained)
label_column_name / label_column	str / int	Yes	Label column identifier: label_column_name (name) for the Parquet op, label_column (index) for the CSV op
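The label-column row of the contract can be sketched as a small helper. This is an illustration only: `label_argument` and `COLUMN_ORDER` are hypothetical names, and the column order is an assumption rather than something stated on this page. The keyword names `label_column_name` and `label_column` are the ones used by the two predict ops.

```python
def label_argument(data_format, label_name, column_order):
    """Return the keyword argument that identifies the label column."""
    if data_format == "parquet":
        # The Parquet op identifies the label by column name.
        return {"label_column_name": label_name}
    if data_format == "csv":
        # The CSV op identifies the label by zero-based column index.
        return {"label_column": column_order.index(label_name)}
    raise ValueError(f"unsupported data format: {data_format!r}")


# Assumed column order, with the label first.
COLUMN_ORDER = ["tips", "trip_seconds", "trip_miles"]

parquet_kwargs = label_argument("parquet", "tips", COLUMN_ORDER)
csv_kwargs = label_argument("csv", "tips", COLUMN_ORDER)

print(parquet_kwargs)  # {'label_column_name': 'tips'}
print(csv_kwargs)      # {'label_column': 0}
```

Deriving the index from the name in one place keeps the two identifiers pointing at the same column when the dataset is converted between formats.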

Outputs

Name	Type	Description
prediction	Predictions	Prediction results validating cross-format compatibility

Usage Examples

# Cross-prediction 1: Parquet data + CSV-trained model (label given by column name)
xgboost_predict_on_parquet_op(
    data=training_data_parquet,
    model=model_trained_on_csv,
    label_column_name='tips',
).set_memory_limit('1Gi')

# Cross-prediction 2: CSV data + Parquet-trained model (label given by column index)
xgboost_predict_on_csv_op(
    data=training_data_csv,
    model=model_trained_on_parquet,
    label_column=0,
).set_memory_limit('1Gi')
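In the calls above the step outputs are not captured, so a failing step is the only signal. If one also wanted to compare predictions from the same model on the same rows delivered in the two formats, a check along these lines could be used. This is a sketch under assumptions: it assumes predictions are available as text with one number per line, which this page does not confirm. The helpers `parse_predictions` and `predictions_agree` and the sample values are hypothetical.

```python
import math


def parse_predictions(text):
    """Parse one floating-point prediction per non-empty line (assumed layout)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def predictions_agree(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Return True when both lists have equal length and close values."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Hypothetical outputs of one model on the same rows in two formats.
from_csv_data = "1.25\n0.50\n3.75\n"
from_parquet_data = "1.2500001\n0.5\n3.75\n"

agree = predictions_agree(
    parse_predictions(from_csv_data),
    parse_predictions(from_parquet_data),
)
print(agree)
```

A tolerance is used instead of exact equality because values may pick up small rounding differences when serialized through different formats.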
