Implementation:Kubeflow Pipelines XGBoost Train On Parquet Op

From Leeroopedia

Kubeflow Pipelines XGBoost Machine_Learning Data_Engineering Last Updated: 2026-02-13

Overview

Reusable KFP component for training XGBoost models on Parquet-formatted datasets.

Description

Wrapper Doc. This component is similar to the CSV training component, but it accepts Parquet data and identifies the label column by its string name rather than by a positional integer index. Because the lookup is by name, the interface is more robust to schema evolution and column reordering.
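The robustness claim above can be illustrated with a minimal, library-free sketch. It is not the component's code: the helper names and the `fare` and `trip_miles` columns are hypothetical, and only the `tips` label name comes from the usage example on this page. The same rows are laid out in two column orders; lookup by name returns the same labels for both, while lookup by a fixed index silently returns a feature column after reordering.

```python
from typing import List, Sequence


def labels_by_name(columns: Sequence[str], rows: Sequence[Sequence[float]],
                   label_column_name: str) -> List[float]:
    """Resolve the label position from the schema, then extract it from each row."""
    idx = list(columns).index(label_column_name)
    return [row[idx] for row in rows]


def labels_by_index(rows: Sequence[Sequence[float]],
                    label_column_index: int) -> List[float]:
    """Extract the label using a fixed position, as an index-based interface would."""
    return [row[label_column_index] for row in rows]


# Hypothetical schema, version 1: the label is the first column.
columns_v1 = ["tips", "fare", "trip_miles"]
rows_v1 = [[2.0, 10.5, 3.1], [0.0, 6.0, 1.2]]

# Same data after a (hypothetical) column reordering: the label is now last.
columns_v2 = ["fare", "trip_miles", "tips"]
rows_v2 = [[10.5, 3.1, 2.0], [6.0, 1.2, 0.0]]

by_name_v1 = labels_by_name(columns_v1, rows_v1, "tips")
by_name_v2 = labels_by_name(columns_v2, rows_v2, "tips")
by_index_v1 = labels_by_index(rows_v1, 0)
by_index_v2 = labels_by_index(rows_v2, 0)  # now reads "fare", not "tips"
```

Here `by_name_v1` and `by_name_v2` are identical, whereas `by_index_v2` no longer contains the label values.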

Code Reference

Source: samples/core/XGBoost/xgboost_sample.py (L16-18 loading, L52-57 invocation)

Signature:

xgboost_train_on_parquet_op(
    training_data: Parquet,
    label_column_name: str,   # Column name (not index)
    objective: str,
    num_iterations: int,
) -> outputs['model']

Import:

from kfp import components
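The import alone does not define `xgboost_train_on_parquet_op`; the code reference above mentions a separate loading step. A minimal sketch of that step follows, assuming the component is loaded from a `component.yaml` URL with `components.load_component_from_url`. The wrapper function name is hypothetical, and the URL is not given on this page, so it is left as a parameter rather than filled in.

```python
from typing import Any, Callable


def load_train_on_parquet_op(component_url: str) -> Callable[..., Any]:
    """Load a component factory from the component.yaml URL supplied by the caller."""
    # Imported lazily so this module can be imported without kfp installed.
    from kfp import components

    return components.load_component_from_url(component_url)


# Hypothetical usage; the URL below is a placeholder, not the real location:
# xgboost_train_on_parquet_op = load_train_on_parquet_op("<URL of component.yaml>")
```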

I/O Contract

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| training_data | Parquet | Yes | Parquet-formatted training dataset |
| label_column_name | str | Yes | Name of the label column in the Parquet schema |
| objective | str | Yes | XGBoost training objective (e.g., reg:squarederror) |
| num_iterations | int | Yes | Number of boosting rounds |

Outputs

| Name | Type | Description |
|---|---|---|
| model | XGBoostModel | Trained XGBoost model artifact |
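To make the I/O contract concrete, here is a rough sketch of what a training step with these inputs and this output could look like. It is an illustration under stated assumptions, not the component's actual implementation: the function name, the explicit file-path parameters, and the use of `pandas.read_parquet` are assumptions, and the booster parameters are limited to the `objective` input.

```python
def train_on_parquet(
    training_data_path: str,
    label_column_name: str,
    objective: str,
    num_iterations: int,
    model_path: str,
) -> None:
    """Train an XGBoost model on a Parquet file and save it to model_path."""
    # Imported lazily; both packages are assumed to be installed at call time.
    import pandas
    import xgboost

    df = pandas.read_parquet(training_data_path)
    if label_column_name not in df.columns:
        raise ValueError(f"Label column {label_column_name!r} not found in the data")

    # Select the label by name; every remaining column is treated as a feature.
    labels = df[label_column_name]
    features = df.drop(columns=[label_column_name])

    dtrain = xgboost.DMatrix(data=features, label=labels)
    booster = xgboost.train(
        params={"objective": objective},
        dtrain=dtrain,
        num_boost_round=num_iterations,  # number of boosting rounds
    )
    booster.save_model(model_path)
```

Inside a pipeline, KFP supplies the input and output artifact locations, so a caller passes only `training_data`, `label_column_name`, `objective`, and `num_iterations`.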

Usage Examples

model_trained_on_parquet = xgboost_train_on_parquet_op(
    training_data=training_data_parquet,
    label_column_name='tips',
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']
