Implementation:Kubeflow Pipelines XGBoost Train On Parquet Op

Kubeflow Pipelines XGBoost Machine_Learning Data_Engineering Last Updated: 2026-02-13

Overview

Reusable KFP component for training XGBoost models on Parquet-formatted datasets.

Description

Wrapper Doc. This component mirrors the CSV training component but reads Parquet data and identifies the label column by its string name rather than by a positional integer index. Because the lookup is by name, the interface keeps working when columns are reordered or new columns are added to the schema.
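The difference between positional and name-based label lookup can be illustrated with a minimal, library-free sketch. The column names other than `tips`, the row values, and both helper functions are hypothetical and chosen for illustration only; they are not part of the component.

```python
# Hypothetical schemas: the same data with columns in a different order.
columns_v1 = ["tips", "trip_miles", "fare"]
columns_v2 = ["fare", "tips", "trip_miles"]

# One illustrative row laid out according to each schema.
row_v1 = [2.5, 3.1, 12.0]
row_v2 = [12.0, 2.5, 3.1]


def label_by_index(row, label_column=0):
    """Positional lookup, as an index-based (CSV-style) interface would do."""
    return row[label_column]


def label_by_name(columns, row, label_column_name):
    """Name-based lookup, as a Parquet-style interface would do."""
    return row[columns.index(label_column_name)]


# The positional lookup silently picks a different column after reordering.
index_v1 = label_by_index(row_v1)
index_v2 = label_by_index(row_v2)

# The name-based lookup returns the same label in both layouts.
name_v1 = label_by_name(columns_v1, row_v1, "tips")
name_v2 = label_by_name(columns_v2, row_v2, "tips")

print(index_v1, index_v2)
print(name_v1, name_v2)
```

In this sketch the index-based lookup returns the fare instead of the tip once the columns are reordered, while the name-based lookup is unaffected.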

Code Reference

Source: samples/core/XGBoost/xgboost_sample.py (L16-18 loading, L52-57 invocation)

Signature:

xgboost_train_on_parquet_op(
    training_data: Parquet,
    label_column_name: str,   # Column name (not index)
    objective: str,
    num_iterations: int,
) -> outputs['model']

Import:

from kfp import components  # the op itself is loaded through this module (see Source above, L16-18)

I/O Contract

Inputs

Parameter	Type	Required	Description
training_data	Parquet	Yes	Parquet-formatted training dataset
label_column_name	str	Yes	Name of the label column in the Parquet schema
objective	str	Yes	XGBoost training objective (e.g., `reg:squarederror`)
num_iterations	int	Yes	Number of boosting rounds
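Since all four inputs are required, it may be useful to check arguments before submitting a pipeline run. The following is a minimal sketch of such a pre-flight check; the function `check_train_inputs` and its `schema_columns` argument are hypothetical helpers, not part of the component or of the KFP SDK, and the checks are assumptions derived only from the table above.

```python
def check_train_inputs(schema_columns, label_column_name, objective, num_iterations):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    errors = []
    # The label must be one of the column names in the Parquet schema.
    if label_column_name not in schema_columns:
        errors.append(f"label column {label_column_name!r} not in schema")
    # The objective is passed through as a string, e.g. 'reg:squarederror'.
    if not isinstance(objective, str) or not objective:
        errors.append("objective must be a non-empty string")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (
        isinstance(num_iterations, bool)
        or not isinstance(num_iterations, int)
        or num_iterations < 1
    ):
        errors.append("num_iterations must be a positive integer")
    return errors


# Illustrative calls; the 'fare' column name is an assumption.
ok = check_train_inputs(["tips", "fare"], "tips", "reg:squarederror", 200)
bad = check_train_inputs(["tips", "fare"], "tip", "", 0)
print(ok)
print(bad)
```

The check does not validate that the objective is one XGBoost recognizes; that is left to the training step itself.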

Outputs

Name	Type	Description
model	XGBoostModel	Trained XGBoost model artifact

Usage Examples

model_trained_on_parquet = xgboost_train_on_parquet_op(
    training_data=training_data_parquet,
    label_column_name='tips',
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']
