Implementation:Kubeflow Pipelines XGBoost Train On CSV Op

Sources	Domains	Last Updated
Kubeflow Pipelines, XGBoost	Machine_Learning, Gradient_Boosting	2026-02-13

Overview

Reusable KFP component for training XGBoost models on CSV-formatted datasets, loaded from a remote YAML component definition.

Description

This page documents a wrapper component. xgboost_train_on_csv_op is not defined locally; it is loaded via components.load_component_from_url() from a remote YAML spec, which wraps XGBoost's training API in a containerized component. The component accepts CSV training data, a zero-based label column index, an objective function, and a number of boosting iterations, and it outputs a serialized model artifact. It also supports incremental training: passing an existing model through the optional starting_model parameter continues training from that model instead of starting from scratch.

External Reference

The component YAML is hosted on GitHub and is referenced by a specific commit SHA, rather than a branch name, so that the loaded definition stays reproducible.
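One way to guard the reproducibility property is to check that a component URL references a full commit SHA and not a moving branch. The sketch below is only an illustration: is_pinned_to_commit is a hypothetical helper, not part of kfp, and the SHA in the sample URL is a made-up placeholder, not the commit used by the sample.

```python
import re
from urllib.parse import urlparse

# A full Git commit SHA is 40 lowercase hexadecimal characters.
_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def is_pinned_to_commit(url: str) -> bool:
    """Return True if a raw.githubusercontent.com URL references a full commit SHA.

    Hypothetical helper for illustration; it is not a kfp API.
    """
    parsed = urlparse(url)
    if parsed.netloc != "raw.githubusercontent.com":
        return False
    # Path layout: /<owner>/<repo>/<ref>/<path to file>
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 4:
        return False
    return bool(_SHA_RE.match(parts[2]))


# Placeholder URLs: the SHA is invented purely to exercise the check.
pinned_url = (
    "https://raw.githubusercontent.com/kubeflow/pipelines/"
    "0123456789abcdef0123456789abcdef01234567"
    "/components/XGBoost/Train/component.yaml"
)
floating_url = (
    "https://raw.githubusercontent.com/kubeflow/pipelines/"
    "master/components/XGBoost/Train/component.yaml"
)

pinned_ok = is_pinned_to_commit(pinned_url)
floating_ok = is_pinned_to_commit(floating_url)
```

A check like this could run before load_component_from_url() so that a pipeline fails fast if someone swaps the pinned reference for a branch.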

Usage

Use this component within a KFP pipeline to train an XGBoost model on CSV data. Load the component once at module level, then call it inside the pipeline function.

Code Reference

Source Location: Repository: kubeflow/pipelines, File: samples/core/XGBoost/xgboost_sample.py (L10-12 loading, L35-40 invocation)

Signature:

xgboost_train_on_csv_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/.../components/XGBoost/Train/component.yaml'
)
# Invocation (schematic: the type annotations document the inputs and are not literal Python):
xgboost_train_on_csv_op(
    training_data: CSV,      # Input CSV dataset
    label_column: int,        # Zero-based index of target column
    objective: str,           # XGBoost objective (e.g., 'reg:squarederror')
    num_iterations: int,      # Number of boosting rounds
    starting_model: XGBoostModel = None,  # Optional: existing model for incremental training
) -> outputs['model']  # Serialized XGBoost model artifact

Import:

from kfp import components

I/O Contract

Inputs:

Name	Type	Required	Description
training_data	CSV	Yes	Training dataset
label_column	int	Yes	Target column index (zero-based)
objective	str	Yes	XGBoost objective (e.g., reg:squarederror)
num_iterations	int	Yes	Number of boosting rounds
starting_model	XGBoostModel	No	Existing model for incremental training
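
To make the zero-based label_column convention concrete, the following sketch separates a label column from the feature columns of a small CSV using only the standard library. It is not the component's implementation; split_label_column, the sample column names, and the assumption that the CSV has a header row are all hypothetical.

```python
import csv
import io


def split_label_column(csv_text, label_column, has_header=True):
    """Split CSV text into (features, labels) using a zero-based label index.

    Hypothetical helper for illustration; whether the real component expects
    a header row is an assumption here.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        rows = rows[1:]  # drop the header row
    labels = [float(row[label_column]) for row in rows]
    features = [
        [float(value) for index, value in enumerate(row) if index != label_column]
        for row in rows
    ]
    return features, labels


# Made-up sample data: the first column (index 0) is the training target.
sample_csv = "tips,fare,miles\n1.5,10.0,2.0\n0.0,7.5,1.2\n"
features, labels = split_label_column(sample_csv, label_column=0)
```

With label_column=0, as in the usage examples on this page, the first CSV column is treated as the target and every remaining column as a feature.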

Outputs:

Name	Type	Description
model	XGBoostModel	Serialized model artifact

Usage Examples

Example 1 -- Initial training (xgboost_sample.py):

model_trained_on_csv = xgboost_train_on_csv_op(
    training_data=training_data_csv,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']

Example 2 -- Incremental training (train_until_good.py):

model = xgboost_train_on_csv_op(
    training_data=training_data,
    starting_model=starting_model,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=50,
).outputs['model']
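
The incremental pattern in Example 2 amounts to feeding one step's model output into the next step's starting_model input. The sketch below shows that data flow with a stand-in so it runs without kfp or a cluster: fake_train_op and train_in_stages are hypothetical names, and the stub only counts boosting rounds rather than training anything.

```python
from types import SimpleNamespace


def fake_train_op(training_data, label_column, objective, num_iterations,
                  starting_model=None):
    """Hypothetical stand-in for xgboost_train_on_csv_op.

    It mimics only the call shape and the outputs['model'] access, and
    records the cumulative number of boosting rounds.
    """
    previous_rounds = starting_model["rounds"] if starting_model else 0
    model = {"objective": objective, "rounds": previous_rounds + num_iterations}
    return SimpleNamespace(outputs={"model": model})


def train_in_stages(train_op, training_data, stages, rounds_per_stage):
    """Chain training steps, passing each model on as the next starting_model."""
    model = None
    for _ in range(stages):
        kwargs = dict(
            training_data=training_data,
            label_column=0,
            objective="reg:squarederror",
            num_iterations=rounds_per_stage,
        )
        if model is not None:
            # Omit starting_model on the first stage, since it is optional.
            kwargs["starting_model"] = model
        model = train_op(**kwargs).outputs["model"]
    return model


final_model = train_in_stages(fake_train_op, "training.csv", stages=3,
                              rounds_per_stage=50)
```

Inside a real pipeline function, a Python loop like this would most likely be unrolled into a fixed number of steps when the pipeline is compiled, so a data-dependent stopping condition needs a different mechanism than a plain loop.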
