Implementation:Kubeflow Pipelines XGBoost Train On CSV Op

From Leeroopedia
Sources                      Domains                              Last Updated
Kubeflow Pipelines, XGBoost  Machine_Learning, Gradient_Boosting  2026-02-13

Overview

Reusable KFP component for training XGBoost models on CSV-formatted datasets, loaded from a remote YAML component definition.

Description

This is a Wrapper Doc: xgboost_train_on_csv_op is not defined locally but is loaded with components.load_component_from_url() from a remote YAML spec. The component wraps XGBoost's training API in a container. It accepts CSV training data, a zero-based label column index, an objective function, and the number of boosting iterations, and it outputs a serialized model artifact. An optional starting_model input supports incremental training that continues from an existing model.

External Reference

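The component YAML is hosted on GitHub in the kubeflow/pipelines repository. The URL references a specific commit SHA rather than a branch, so the loaded component definition stays the same between pipeline builds.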
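
The pinning described above can be sketched as a small helper that only accepts a full commit SHA. This is a minimal illustration, not code from the sample: pinned_component_url and PLACEHOLDER_SHA are hypothetical names, the placeholder is not a real commit, and the URL layout is assumed to follow the usual raw.githubusercontent.com pattern of owner/repo/ref/path.

```python
import re

RAW_BASE = "https://raw.githubusercontent.com"
# Path portion taken from the component URL shown on this page.
COMPONENT_PATH = "components/XGBoost/Train/component.yaml"


def pinned_component_url(commit_sha: str,
                         repo: str = "kubeflow/pipelines",
                         path: str = COMPONENT_PATH) -> str:
    """Build a raw GitHub URL that is pinned to one commit (hypothetical helper)."""
    # Reject branch names and short SHAs: only a full 40-character hex SHA
    # identifies one immutable revision of the file.
    if not re.fullmatch(r"[0-9a-f]{40}", commit_sha):
        raise ValueError(f"expected a full 40-character commit SHA, got {commit_sha!r}")
    return f"{RAW_BASE}/{repo}/{commit_sha}/{path}"


# Placeholder only: substitute the SHA used by your pipeline.
PLACEHOLDER_SHA = "0" * 40
url = pinned_component_url(PLACEHOLDER_SHA)
print(url)
```

The resulting string is what would be passed to components.load_component_from_url(). Rejecting a branch name such as master avoids silently picking up a changed component definition.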

Usage

Use this component within a KFP pipeline to train an XGBoost model on CSV data. Load the component once at module level, then call it inside the pipeline function.

Code Reference

Source Location: Repository: kubeflow/pipelines, File: samples/core/XGBoost/xgboost_sample.py (lines 10-12: component loading; lines 35-40: invocation)

Signature:

xgboost_train_on_csv_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/.../components/XGBoost/Train/component.yaml'
)
# Invocation (schematic signature showing argument types, not executable Python):
xgboost_train_on_csv_op(
    training_data: CSV,      # Input CSV dataset
    label_column: int,        # Zero-based index of target column
    objective: str,           # XGBoost objective (e.g., 'reg:squarederror')
    num_iterations: int,      # Number of boosting rounds
    starting_model: XGBoostModel = None,  # Optional: existing model for incremental training
) -> outputs['model']  # Serialized XGBoost model artifact

Import:

from kfp import components

I/O Contract

Inputs:

Name            Type          Required  Description
training_data   CSV           Yes       Training dataset
label_column    int           Yes       Target column index (zero-based)
objective       str           Yes       XGBoost objective (e.g., reg:squarederror)
num_iterations  int           Yes       Number of boosting rounds
starting_model  XGBoostModel  No        Existing model for incremental training
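
To make the zero-based label_column convention concrete, the sketch below splits a small CSV into labels and features using only the standard library. It illustrates the indexing convention and is not the component's actual implementation. split_label and the column names tips, fare, and miles are hypothetical, and treating the first row as a header is an assumption.

```python
import csv
import io
from typing import List, Tuple


def split_label(csv_text: str,
                label_column: int = 0,
                has_header: bool = True) -> Tuple[List[float], List[List[float]]]:
    """Separate the target column (zero-based index) from the feature columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]  # assumption: the first row holds column names
    labels: List[float] = []
    features: List[List[float]] = []
    for row in rows:
        labels.append(float(row[label_column]))
        # Every column except the label column is treated as a feature.
        features.append([float(v) for i, v in enumerate(row) if i != label_column])
    return labels, features


# Hypothetical data: with label_column=0 the first column ("tips") is the target.
sample = "tips,fare,miles\n2.0,10.5,3.1\n0.0,7.25,1.8\n"
labels, features = split_label(sample, label_column=0)
print(labels)    # [2.0, 0.0]
print(features)  # [[10.5, 3.1], [7.25, 1.8]]
```

With label_column=0, as in the usage examples on this page, the first CSV column is the training target and the remaining columns are features.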

Outputs:

Name   Type          Description
model  XGBoostModel  Serialized model artifact

Usage Examples

Example 1 -- Initial training (xgboost_sample.py):

model_trained_on_csv = xgboost_train_on_csv_op(
    training_data=training_data_csv,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']

Example 2 -- Incremental training (train_until_good.py):

model = xgboost_train_on_csv_op(
    training_data=training_data,
    starting_model=starting_model,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=50,
).outputs['model']
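
The effect of passing starting_model can be illustrated with a toy additive model. This is not XGBoost and not the component's code: each "boosting round" here just adds a fraction of the mean residual. The names train and predict and the learning_rate value are assumptions made for the sketch. The point is only that continuing from an existing model adds rounds on top of it rather than starting over.

```python
from typing import List, Optional, Sequence


def predict(model: Sequence[float]) -> float:
    """A toy model is a list of additive updates; its prediction is their sum."""
    return sum(model)


def train(labels: Sequence[float],
          num_iterations: int,
          starting_model: Optional[Sequence[float]] = None,
          learning_rate: float = 0.3) -> List[float]:
    """Toy stand-in for boosting under squared error (not XGBoost)."""
    # Continue from the existing rounds if a starting model is given.
    model = list(starting_model) if starting_model is not None else []
    for _ in range(num_iterations):
        current = predict(model)
        mean_residual = sum(y - current for y in labels) / len(labels)
        # Each round corrects a fraction of the remaining error.
        model.append(learning_rate * mean_residual)
    return model


labels = [1.0, 2.0, 3.0, 6.0]  # hypothetical targets, mean 3.0
first = train(labels, num_iterations=5)
continued = train(labels, num_iterations=5, starting_model=first)
from_scratch = train(labels, num_iterations=10)

# Five rounds plus five more from the starting model match ten rounds at once.
print(continued == from_scratch)
```

In the toy model the continued run has ten rounds in total and predicts closer to the label mean than the five-round starting model, which is the behavior an iterate-until-good-enough pipeline relies on.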
