
Implementation:Kubeflow Pipelines XGBoost Train On Parquet Op

From Leeroopedia

Categories: Kubeflow Pipelines, XGBoost, Machine Learning, Data Engineering. Last updated: 2026-02-13

Overview

Reusable KFP component for training XGBoost models on Parquet-formatted datasets.

Description

Wrapper Doc. This component mirrors the CSV training component, but it reads Parquet data and identifies the label column by name rather than by index. Because the label is addressed by its string name instead of a positional integer, the interface is more robust to schema evolution and column reordering.
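The robustness claim can be illustrated without KFP. The sketch below is a plain-Python stand-in, not the component's code: `split_by_name` and `split_by_index` are hypothetical helpers, and a dict of lists stands in for a Parquet table (key order models column order). After an upstream step reorders the columns, name-based selection still returns the label, while index-based selection silently returns a different column.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Parquet table: column name -> column values.
# Python dicts preserve insertion order, so key order models column order.
Table = Dict[str, List[float]]


def split_by_name(table: Table, label_column_name: str) -> Tuple[Table, List[float]]:
    """Pick the label by column name (the Parquet component's interface)."""
    features = {n: c for n, c in table.items() if n != label_column_name}
    return features, table[label_column_name]


def split_by_index(table: Table, label_column: int) -> Tuple[Table, List[float]]:
    """Pick the label by position (the CSV component's interface)."""
    label_name = list(table)[label_column]
    features = {n: c for n, c in table.items() if n != label_name}
    return features, table[label_name]


# Toy values for illustration only.
original: Table = {"tips": [1.0, 2.5], "trip_miles": [3.2, 7.9], "fare": [10.0, 22.0]}
# Same data after an upstream step reordered the columns.
reordered: Table = {"fare": [10.0, 22.0], "trip_miles": [3.2, 7.9], "tips": [1.0, 2.5]}

_, y_by_name = split_by_name(reordered, "tips")  # still the "tips" column
_, y_by_index = split_by_index(reordered, 0)     # now the "fare" column
print(y_by_name, y_by_index)
```

On the original layout both helpers agree; only the positional one breaks after the reorder, and it does so without raising an error.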

Code Reference

Source: samples/core/XGBoost/xgboost_sample.py (lines 16-18: component loading; lines 52-57: invocation)

Signature:

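xgboost_train_on_parquet_op(
    training_data: Parquet,       # Parquet-formatted training dataset
    label_column_name: str,       # label column name (not index)
    objective: str,               # XGBoost objective, e.g. 'reg:squarederror'
    num_iterations: int,          # number of boosting rounds
)  # returns a pipeline task; the trained model is task.outputs['model']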

Import:

from kfp import components  # the sample loads this op with components.load_component_from_url(...)

I/O Contract

Inputs

Parameter         | Type    | Required | Description
------------------|---------|----------|-----------------------------------------------------
training_data     | Parquet | Yes      | Parquet-formatted training dataset
label_column_name | str     | Yes      | Name of the label column in the Parquet schema
objective         | str     | Yes      | XGBoost training objective (e.g., reg:squarederror)
num_iterations    | int     | Yes      | Number of boosting rounds
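Because the label is addressed by name, a misspelled label_column_name is typically only caught when the training step runs. A caller could check arguments against the Inputs table before submitting the pipeline. The sketch below uses a hypothetical helper, validate_train_args, which is not part of KFP or the component; it assumes the schema's column names are already available, for example from pyarrow.parquet.read_schema(path).names.

```python
from typing import Sequence


def validate_train_args(
    schema_columns: Sequence[str],
    label_column_name: str,
    objective: str,
    num_iterations: int,
) -> None:
    """Hypothetical pre-submission check mirroring the Inputs table."""
    if label_column_name not in schema_columns:
        raise ValueError(
            f"label column {label_column_name!r} not found; "
            f"available columns: {list(schema_columns)}"
        )
    if not isinstance(objective, str) or not objective:
        raise ValueError("objective must be a non-empty string such as 'reg:squarederror'")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_iterations, bool) or not isinstance(num_iterations, int) or num_iterations < 1:
        raise ValueError("num_iterations must be a positive integer")


# Hypothetical column list standing in for a real Parquet schema.
columns = ["trip_seconds", "trip_miles", "fare", "tips"]
validate_train_args(columns, "tips", "reg:squarederror", 200)  # passes silently
try:
    validate_train_args(columns, "tip", "reg:squarederror", 200)  # misspelled label
except ValueError as err:
    print(err)
```

The check is deliberately shallow: it does not verify that the objective string is one XGBoost recognizes, only that the arguments have the shape the table describes.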

Outputs

Name  | Type         | Description
------|--------------|--------------------------------
model | XGBoostModel | Trained XGBoost model artifact

Usage Examples

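# Runs inside a pipeline function; training_data_parquet is the Parquet
# output of an upstream task.
model_trained_on_parquet = xgboost_train_on_parquet_op(
    training_data=training_data_parquet,
    label_column_name='tips',        # label selected by column name
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']  # cap the container's memory at 1 GiB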
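For reference, the two training arguments in this call map onto the standard XGBoost formulation (general XGBoost background, not something this component adds). Assuming the default gbtree booster with one tree per round, num_iterations=200 sets the number of trees K in the additive model, and objective='reg:squarederror' selects the squared-error loss, whose gradient g and Hessian h drive each boosting round. The symbol b (the global base score) is notation introduced here.

```latex
\hat{y}_i = b + \sum_{k=1}^{K} f_k(\mathbf{x}_i), \qquad K = \texttt{num\_iterations} = 200

\ell(y_i, \hat{y}_i) = \tfrac{1}{2}\,(\hat{y}_i - y_i)^2, \qquad
g_i = \frac{\partial \ell}{\partial \hat{y}_i} = \hat{y}_i - y_i, \qquad
h_i = \frac{\partial^2 \ell}{\partial \hat{y}_i^2} = 1
```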
