Principle: Kubeflow Pipelines XGBoost Model Training
| Sources | Domains | Last Updated |
|---|---|---|
| XGBoost Documentation, XGBoost | Machine_Learning, Gradient_Boosting | 2026-02-13 |
Overview
A supervised learning technique that trains gradient-boosted decision tree models on tabular data to produce predictions for regression or classification tasks.
Description
XGBoost (eXtreme Gradient Boosting) is an optimized implementation of gradient boosting. It iteratively adds decision trees, each one fit to correct the residual errors of the current ensemble. Key training parameters include the objective function (e.g., `reg:squarederror` for regression), the number of boosting rounds, and the name of the label column. In pipeline contexts, XGBoost training is wrapped as a reusable component that accepts CSV data as input and produces a serialized model file as its output artifact.
Usage
Use when training models on structured/tabular data for regression or classification. XGBoost is particularly effective for medium-sized datasets with well-defined features.
Theoretical Basis
Gradient boosting works by sequentially fitting trees to the negative gradient of the loss function. XGBoost adds regularization (L1/L2) and uses a second-order Taylor expansion of the loss for split finding.
Objective = training loss + regularization, i.e. Obj = Σ_i loss(y_i, ŷ_i) + Σ_k Ω(f_k), where Ω(f) = γT + (λ/2)·Σ_j w_j² penalizes the number of leaves T and the leaf weights w_j.
Pseudocode:
- Initialize the prediction (e.g., to a constant base score)
- For each boosting iteration:
  - Compute first- and second-order gradients of the loss at the current predictions
  - Fit a decision tree to the gradients
  - Update the ensemble prediction with the new tree's output
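The loop above can be sketched in NumPy using single-split trees (stumps) and the squared-error loss, for which the first-order gradient is prediction − label and the second-order gradient is constant 1. The gain and leaf-weight expressions follow XGBoost's second-order derivation (leaf weight −G/(H+λ), gain from the G²/(H+λ) leaf scores); the learning rate `lr` and L2 penalty `lam` are illustrative parameter choices.

```python
import numpy as np

def fit_stump(x, grad, hess, lam=1.0):
    """Best single threshold split by XGBoost's second-order gain formula."""
    order = np.argsort(x)
    xs, g, h = x[order], grad[order], hess[order]
    G, H = g.sum(), h.sum()
    gl, hl = np.cumsum(g)[:-1], np.cumsum(h)[:-1]   # left-child sums per split
    gr, hr = G - gl, H - hl                          # right-child sums
    gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
    best = int(np.argmax(gain))
    thr = (xs[best] + xs[best + 1]) / 2.0
    # Newton-step leaf weights: w = -G_leaf / (H_leaf + lam)
    return thr, -gl[best] / (hl[best] + lam), -gr[best] / (hr[best] + lam)

def boost(x, y, rounds=40, lr=0.3):
    """Gradient boosting on one feature with stumps and squared-error loss."""
    pred = np.zeros_like(y)
    stumps = []
    for _ in range(rounds):
        grad = pred - y              # first-order gradient of 0.5*(pred - y)^2
        hess = np.ones_like(y)       # second-order gradient is constant 1
        thr, wl, wr = fit_stump(x, grad, hess)
        pred = pred + lr * np.where(x < thr, wl, wr)  # update the ensemble
        stumps.append((thr, lr * wl, lr * wr))
    return pred, stumps

# A step-shaped target that a sequence of stumps can fit closely.
x = np.linspace(0.0, 1.0, 200)
y = np.where(x < 0.4, 1.0, -1.0)
pred, stumps = boost(x, y)
```

Real XGBoost grows full trees with many splits per round and adds the γT leaf penalty and other refinements, but each round follows exactly this gradient/Hessian → tree → update cycle.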