Implementation:Kubeflow Pipelines XGBoost Train On Parquet Op
Kubeflow Pipelines XGBoost Machine_Learning Data_Engineering Last Updated: 2026-02-13
Overview
Reusable KFP component for training XGBoost models on Parquet-formatted datasets.
Description
Wrapper Doc. This component is similar to the CSV training component, but it accepts Parquet data and identifies columns by name instead of by index. The label column is specified by its string name rather than by a positional integer, which makes the interface more robust to schema evolution and column reordering.
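To illustrate why selecting the label by name holds up better than selecting it by position, here is a minimal sketch in plain Python. It is not the component's implementation: the helpers `label_by_index` and `label_by_name`, the column names other than `tips`, and the sample rows are all hypothetical.

```python
from typing import Any, List, Sequence


def label_by_index(rows: Sequence[Sequence[Any]], label_index: int) -> List[Any]:
    """Hypothetical helper: pick the label values by column position."""
    return [row[label_index] for row in rows]


def label_by_name(
    columns: Sequence[str], rows: Sequence[Sequence[Any]], label_name: str
) -> List[Any]:
    """Hypothetical helper: pick the label values by column name."""
    position = list(columns).index(label_name)  # raises ValueError if the name is absent
    return [row[position] for row in rows]


# Original schema: the label 'tips' sits at position 1.
columns_v1 = ["fare", "tips"]
rows_v1 = [[10.0, 2.0], [20.0, 3.5]]

# Evolved schema: a new leading column shifts every position by one.
columns_v2 = ["trip_seconds", "fare", "tips"]
rows_v2 = [[600, 10.0, 2.0], [900, 20.0, 3.5]]

# The name-based lookup returns the same labels under both schemas.
labels_v1 = label_by_name(columns_v1, rows_v1, "tips")
labels_v2 = label_by_name(columns_v2, rows_v2, "tips")

# The index-based lookup silently returns the 'fare' column after the change.
stale_labels = label_by_index(rows_v2, 1)
```

With the evolved schema, the positional lookup does not fail; it quietly trains against the wrong column, which is the failure mode a name-based interface avoids.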
Code Reference
Source: samples/core/XGBoost/xgboost_sample.py (L16-18 loading, L52-57 invocation)
Signature:
xgboost_train_on_parquet_op(
    training_data: Parquet,
    label_column_name: str,  # Column name (not index)
    objective: str,
    num_iterations: int,
) -> outputs['model']
Import:
from kfp import components
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| training_data | Parquet | Yes | Parquet-formatted training dataset |
| label_column_name | str | Yes | Name of the label column in the Parquet schema |
| objective | str | Yes | XGBoost training objective (e.g., reg:squarederror) |
| num_iterations | int | Yes | Number of boosting rounds |
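The scalar inputs in the table above can be checked before the component is invoked. The following is a hedged sketch only: `TrainOnParquetArgs` and its `validate` method are hypothetical helpers that are not part of KFP or of this component, and the specific rules (non-empty strings, at least one boosting round) are assumptions rather than documented constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainOnParquetArgs:
    """Hypothetical container for the component's scalar inputs."""

    label_column_name: str
    objective: str
    num_iterations: int

    def validate(self) -> None:
        # The label is addressed by name, so it must be a non-empty string.
        if not isinstance(self.label_column_name, str) or not self.label_column_name:
            raise ValueError("label_column_name must be a non-empty string")
        # Assumed rule: the objective is passed through as a non-empty string.
        if not isinstance(self.objective, str) or not self.objective:
            raise ValueError("objective must be a non-empty string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if (
            isinstance(self.num_iterations, bool)
            or not isinstance(self.num_iterations, int)
            or self.num_iterations < 1
        ):
            raise ValueError("num_iterations must be a positive integer")


args = TrainOnParquetArgs(
    label_column_name="tips",
    objective="reg:squarederror",
    num_iterations=200,
)
args.validate()  # returns None when all checks pass
```

Catching a positional integer passed as `label_column_name` at this point gives a clearer error than a failure inside the training step.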
Outputs
| Name | Type | Description |
|---|---|---|
| model | XGBoostModel | Trained XGBoost model artifact |
Usage Examples
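# training_data_parquet is assumed to be the Parquet output of an upstream pipeline step.
model_trained_on_parquet = xgboost_train_on_parquet_op(
    training_data=training_data_parquet,
    label_column_name='tips',  # label selected by column name, not index
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']  # limit the training step to 1 GiB of memory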