Implementation:Kubeflow Pipelines XGBoost Train On CSV Op
| Sources | Domains | Last Updated |
|---|---|---|
| Kubeflow Pipelines, XGBoost | Machine_Learning, Gradient_Boosting | 2026-02-13 |
Overview
Reusable Kubeflow Pipelines (KFP) component for training XGBoost models on CSV-formatted datasets, loaded from a remote YAML component definition.
Description
This is a Wrapper Doc. xgboost_train_on_csv_op is created by calling components.load_component_from_url() on a remote YAML spec. It wraps XGBoost's training API in a containerized component that accepts CSV training data, a zero-based label column index, an objective function, and a number of boosting iterations, and outputs a serialized model artifact. An optional starting_model parameter supports incremental training from an existing model.
External Reference
The component YAML is hosted on GitHub, and the URL is pinned to a specific commit SHA for reproducibility.
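Pinning to a commit matters because a URL that points at a branch name can change underneath an existing pipeline. As a minimal sketch, the check below tests whether a raw.githubusercontent.com URL uses a full 40-character commit SHA as its ref. The helper name is_pinned_to_commit and both example URLs are hypothetical placeholders, not part of KFP, and the check assumes the usual /owner/repo/ref/path layout.

```python
import re
from urllib.parse import urlparse

# A full Git SHA-1 commit hash is 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")


def is_pinned_to_commit(url: str) -> bool:
    """Return True if a raw.githubusercontent.com URL references a full commit SHA.

    Hypothetical helper; assumes the path layout /<owner>/<repo>/<ref>/<file path>.
    """
    parsed = urlparse(url)
    if parsed.netloc != "raw.githubusercontent.com":
        return False
    parts = parsed.path.strip("/").split("/")
    # Need at least owner, repo, ref, and one file path segment.
    if len(parts) < 4:
        return False
    return _FULL_SHA.fullmatch(parts[2]) is not None


# Placeholder URLs for illustration only (not real component locations).
base = "https://raw.githubusercontent.com/example-org/example-repo/"
pinned = base + "0" * 40 + "/component.yaml"
floating = base + "master/component.yaml"

print(is_pinned_to_commit(pinned))    # True
print(is_pinned_to_commit(floating))  # False
```

A check like this could run before the component is loaded, so that an unpinned URL is caught during review rather than at pipeline run time.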
Usage
Use this component within a KFP pipeline to train an XGBoost model on CSV data. Load the component at module level, then call it inside the pipeline function.
Code Reference
Source Location: Repository: kubeflow/pipelines, File: samples/core/XGBoost/xgboost_sample.py (L10-12 loading, L35-40 invocation)
Signature:
xgboost_train_on_csv_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/.../components/XGBoost/Train/component.yaml'
)
# Invocation (schematic signature, not literal Python):
xgboost_train_on_csv_op(
    training_data: CSV,                   # Input CSV dataset
    label_column: int,                    # Zero-based index of target column
    objective: str,                       # XGBoost objective (e.g., 'reg:squarederror')
    num_iterations: int,                  # Number of boosting rounds
    starting_model: XGBoostModel = None,  # Optional: existing model for incremental training
) -> outputs['model']                     # Serialized XGBoost model artifact
Import:
from kfp import components
I/O Contract
Inputs:
| Name | Type | Required | Description |
|---|---|---|---|
| training_data | CSV | Yes | Training dataset |
| label_column | int | Yes | Target column index (zero-based) |
| objective | str | Yes | XGBoost objective (e.g., reg:squarederror) |
| num_iterations | int | Yes | Number of boosting rounds |
| starting_model | XGBoostModel | No | Existing model for incremental training |
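To make the zero-based label_column convention concrete, the sketch below separates one CSV column as the label and keeps the remaining columns as features. It uses only the standard library and assumes a header row and all-numeric values. It illustrates the indexing convention only and is not the component's actual implementation. The function name split_features_and_label and the sample data are hypothetical.

```python
import csv
import io


def split_features_and_label(csv_text, label_column):
    """Split CSV text into (feature_names, feature_rows, labels).

    Hypothetical helper: assumes a header row and numeric cells.
    label_column is a zero-based column index.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if not 0 <= label_column < len(header):
        raise IndexError(f"label_column {label_column} is out of range")

    feature_names = [name for i, name in enumerate(header) if i != label_column]
    features, labels = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [float(cell) for cell in row]
        labels.append(values[label_column])
        # Keep every column except the label column.
        features.append(values[:label_column] + values[label_column + 1:])
    return feature_names, features, labels


sample = "target,feature_a,feature_b\n2.0,10.5,3.1\n0.0,7.25,1.8\n"
names, X, y = split_features_and_label(sample, label_column=0)
print(names)  # ['feature_a', 'feature_b']
print(X)      # [[10.5, 3.1], [7.25, 1.8]]
print(y)      # [2.0, 0.0]
```

With label_column=0, as in the usage examples on this page, the first column is treated as the target and everything else as input features.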
Outputs:
| Name | Type | Description |
|---|---|---|
| model | XGBoostModel | Serialized model artifact |
Usage Examples
Example 1 -- Initial training (xgboost_sample.py):
model_trained_on_csv = xgboost_train_on_csv_op(
    training_data=training_data_csv,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=200,
).set_memory_limit('1Gi').outputs['model']
Example 2 -- Incremental training (train_until_good.py):
model = xgboost_train_on_csv_op(
    training_data=training_data,
    starting_model=starting_model,
    label_column=0,
    objective='reg:squarederror',
    num_iterations=50,
).outputs['model']
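The idea behind starting_model is that boosting is additive, so training can resume from an existing model instead of restarting from zero. The toy below illustrates this with a deliberately simplified booster for squared error, in which every round adds a shrunken constant (the mean residual). It is not XGBoost and not the component's code. The function boost, its learning_rate parameter, the dict-based model, and the data are all hypothetical. It shows that 200 rounds followed by 50 more from the saved state match 250 rounds run in one go.

```python
import math


def boost(labels, num_iterations, starting_model=None, learning_rate=0.1):
    """Toy additive booster for squared error (illustration only).

    The 'model' is a dict holding the current constant prediction and the
    number of rounds applied so far. Passing starting_model resumes training.
    """
    if starting_model is None:
        prediction, rounds = 0.0, 0
    else:
        prediction = starting_model["prediction"]
        rounds = starting_model["rounds"]

    for _ in range(num_iterations):
        # For squared error, the residual plays the role of the negative gradient.
        mean_residual = sum(y - prediction for y in labels) / len(labels)
        prediction += learning_rate * mean_residual

    return {"prediction": prediction, "rounds": rounds + num_iterations}


labels = [3.0, 5.0, 10.0]

initial = boost(labels, num_iterations=200)                         # like Example 1
resumed = boost(labels, num_iterations=50, starting_model=initial)  # like Example 2
from_scratch = boost(labels, num_iterations=250)

print(resumed["rounds"])  # 250
print(math.isclose(resumed["prediction"], from_scratch["prediction"]))  # True
```

In a real pipeline the saved state is the serialized model artifact passed between component invocations rather than an in-memory dict.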