Implementation:Kubeflow Pipelines XGBoost Cross Format Predict

From Leeroopedia

Kubeflow Pipelines | Machine_Learning | Validation | Last Updated: 2026-02-13

Overview

A pattern for cross-format prediction: running the XGBoost predict components on deliberately mismatched data-format and model-format pairs.

Description

Wrapper Doc. The pattern calls xgboost_predict_on_parquet_op with a CSV-trained model and xgboost_predict_on_csv_op with a Parquet-trained model to check that prediction does not depend on the format the model was trained from. Deliberately crossing the data format with the model's training format exposes format-specific assumptions in the data-loading or prediction code that a same-format run would not reveal.

Code Reference

Source: samples/core/XGBoost/xgboost_sample.py (lines 19-21: component loading; lines 66-76: cross-format invocations)

Import:

from kfp import components

I/O Contract

Inputs

Parameter | Type | Required | Description
data | CSV or Parquet | Yes | Data to predict on, in the opposite format from the model's training data
model | XGBoostModel | Yes | Model trained on the opposite format (CSV-trained or Parquet-trained)
label_column_name / label_column | str / int | Yes | Label column identifier: a column name (label_column_name) for Parquet, a column index (label_column) for CSV
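The two label parameters must refer to the same column for the cross-format runs to be comparable. The following is a minimal sketch of that correspondence using only the standard library. The helper `label_index_for` and the two-column sample are hypothetical illustrations and not part of the Kubeflow sample; only the label name 'tips' comes from this page.

```python
import csv
import io

# Hypothetical two-column sample; only the 'tips' label name comes from this page.
SAMPLE_CSV = "tips,fare\n1.5,10.0\n0.0,7.25\n"


def label_index_for(csv_text: str, label_column_name: str) -> int:
    """Return the positional index of a named column in a CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # list.index raises ValueError if the name is absent, which surfaces
    # a name/index mismatch early instead of silently using the wrong column.
    return header.index(label_column_name)


# The index to pass as label_column when the Parquet side uses
# label_column_name='tips' on the same columns.
label_column = label_index_for(SAMPLE_CSV, "tips")
print(label_column)
```

With 'tips' as the first column, the index is 0, which matches the label_column=0 used for the CSV component in the usage example.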

Outputs

Name | Type | Description
prediction | Predictions | Prediction results validating cross-format compatibility
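The sample itself only runs the cross-format predictions; it does not compare them. One way to turn the outputs into an explicit check is sketched below, under the assumption that each Predictions artifact is a text file with one floating-point value per line. The helpers `parse_predictions` and `predictions_agree` and the tolerance values are hypothetical and not part of the Kubeflow components.

```python
import math
from typing import List


def parse_predictions(text: str) -> List[float]:
    """Parse predictions stored as one float per line (assumed layout)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def predictions_agree(
    a: List[float],
    b: List[float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-6,
) -> bool:
    """Return True if both prediction lists have equal length and
    every pair of values matches within the given tolerances."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Illustrative values only; real inputs would be the contents of the
# prediction outputs from a same-format run and a cross-format run.
same_format = parse_predictions("1.0\n2.5\n")
cross_format = parse_predictions("1.0\n2.5\n")
print(predictions_agree(same_format, cross_format))
```

A tolerance-based comparison is used rather than exact equality because CSV and Parquet may round-trip floating-point feature values slightly differently.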

Usage Examples

# Cross-prediction 1: Parquet data + CSV-trained model
xgboost_predict_on_parquet_op(
    data=training_data_parquet,
    model=model_trained_on_csv,
    label_column_name='tips',
).set_memory_limit('1Gi')

# Cross-prediction 2: CSV data + Parquet-trained model
xgboost_predict_on_csv_op(
    data=training_data_csv,
    model=model_trained_on_parquet,
    label_column=0,
).set_memory_limit('1Gi')
