Principle:Scikit learn Scikit learn Decision Tree Learning
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Interpretable Models |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Decision tree learning constructs a tree-structured model that makes predictions by recursively partitioning the feature space along axis-aligned splits, yielding an interpretable sequence of if-then rules.
Description
Decision trees learn a hierarchy of binary decisions, each based on a single feature threshold, to partition the input space into regions with homogeneous target values. They solve both classification and regression problems while producing models that are inherently interpretable and easy to visualize. Trees handle non-linear relationships and feature interactions without requiring feature scaling, and in principle they support mixed feature types (numerical and categorical); scikit-learn's tree estimators expect numeric input, so categorical features must be encoded first. However, individual trees are prone to overfitting and high variance, which motivates their use as base learners in ensemble methods (Random Forests, Gradient Boosting). Decision trees form one of the most fundamental and widely used model families in machine learning.
Usage
Use decision trees when interpretability is paramount and the audience needs to understand the reasoning behind predictions. Use them as a baseline model for classification or regression before trying more complex methods. Apply pre-pruning (max depth, min samples per leaf) or post-pruning to control overfitting. Decision trees are also the foundation for ensemble methods that aggregate many trees to improve predictive performance. Use tree visualization and export capabilities to communicate model logic to stakeholders.
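As a minimal sketch of this workflow, the example below fits a pre-pruned classifier and exports its rules as text. The Iris dataset, the 70/30 split, and the specific values `max_depth=3` and `min_samples_leaf=5` are illustrative assumptions, not recommendations from this page.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data: the Iris dataset with a stratified hold-out split.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Pre-pruning: cap the depth and require a minimum number of samples per leaf.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
clf.fit(X_train, y_train)

# Held-out accuracy serves as a baseline for more complex models.
test_accuracy = clf.score(X_test, y_test)

# Export the learned if-then rules as plain text for stakeholders.
rules = export_text(clf, feature_names=list(iris.feature_names))

print(f"test accuracy: {test_accuracy:.3f}")
print(rules)
```

The printed rules list one threshold test per line, indented by depth, which is usually easier to share than a plot when the tree is shallow.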
Theoretical Basis
Tree Construction (CART algorithm): The tree is built by recursively selecting the best split at each node:
- For each candidate feature $j$ and threshold $t$, partition the data into left ($Q_L$, samples with $x_j \le t$) and right ($Q_R$, samples with $x_j > t$) subsets.
- Evaluate the quality of the split using an impurity criterion.
- Select the split that maximizes the reduction in impurity.
- Repeat recursively on each subset until a stopping criterion is met.
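The split search described in the steps above can be sketched in plain Python. This is an illustrative brute-force version that assumes Gini impurity as the criterion and midpoints between consecutive distinct values as candidate thresholds; the helper names `gini` and `best_split` are hypothetical, and scikit-learn's actual implementation is optimized differently.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature index, threshold, impurity decrease) of the best split.

    Returns None if no feature has two distinct values to split on.
    """
    n = len(y)
    parent_impurity = gini(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            # Impurity decrease: parent impurity minus size-weighted child impurity.
            decrease = (
                parent_impurity
                - len(left) / n * gini(left)
                - len(right) / n * gini(right)
            )
            if best is None or decrease > best[2]:
                best = (j, t, decrease)
    return best


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[2.0, 7.0], [3.0, 1.0], [6.0, 8.0], [7.0, 2.0]]
y = ["a", "a", "b", "b"]
split = best_split(X, y)
print(split)
```

A full tree builder would call `best_split` on each resulting subset until a stopping criterion (for example a maximum depth or a pure node) is reached.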
Classification Impurity Measures:
Gini impurity:
$$G(m) = \sum_{k} p_{mk} (1 - p_{mk}) = 1 - \sum_{k} p_{mk}^2$$
where $p_{mk}$ is the proportion of class $k$ in node $m$. Gini impurity is zero when all samples belong to one class.
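Entropy (Information Gain):
$$H(m) = -\sum_{k} p_{mk} \log_2 p_{mk}$$
where $p_{mk}$ is the proportion of class $k$ in node $m$. The information gain of a split of node $m$ into children $m_L$ and $m_R$ is:
$$IG = H(m) - \frac{N_L}{N_m} H(m_L) - \frac{N_R}{N_m} H(m_R)$$
where $N_m$, $N_L$, and $N_R$ are the numbers of samples in the parent, left child, and right child.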
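To make the classification impurity measures concrete, here is a small sketch that computes Gini impurity, entropy (in bits), and information gain for a hand-made split. The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion of each class present in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits; absent classes contribute nothing."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    return (
        entropy(parent)
        - len(left) / n * entropy(left)
        - len(right) / n * entropy(right)
    )


# A balanced parent node split into two mostly pure children.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"]
right = ["yes"] + ["no"] * 3

print(gini(parent), entropy(parent))
print(gini(left), information_gain(parent, left, right))
```

The balanced parent has the maximum two-class impurity under both measures (Gini 0.5, entropy 1 bit), and the split recovers roughly 0.19 bits of information.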
Regression Impurity Measure:
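Mean Squared Error:
$$\mathrm{MSE}(m) = \frac{1}{N_m} \sum_{i \in m} (y_i - \bar{y}_m)^2$$
where $\bar{y}_m$ is the mean target value in node $m$ and $N_m$ is the number of samples in the node.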
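As a worked illustration of the regression criterion, the sketch below computes the MSE impurity of a node and the reduction achieved by one split. The helper name `mse_impurity` and the toy targets are assumptions made for the example.

```python
def mse_impurity(targets):
    """Mean squared deviation of the targets from their node mean."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)


# Six targets forming two clear groups; split them between 3.0 and 10.0.
node = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
left, right = node[:3], node[3:]

n = len(node)
weighted_children = (
    len(left) / n * mse_impurity(left) + len(right) / n * mse_impurity(right)
)
reduction = mse_impurity(node) - weighted_children

print(mse_impurity(node), weighted_children, reduction)
```

Here the parent MSE is 125.5 / 6 (about 20.92), each child has MSE 2/3, and the split therefore reduces impurity by 20.25.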
Prediction:
- Classification: Majority class in the leaf node, or the class probability distribution.
- Regression: Mean (or median) of target values in the leaf node.
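The prediction rules above can be checked empirically. In this sketch, which uses synthetic data and `max_depth=2` purely as illustrative assumptions, `apply` returns the leaf each sample lands in, and the per-leaf training statistics are compared with what the fitted trees predict (with the default squared-error criterion, the regression leaf value is the mean).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y_reg = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
y_clf = (X[:, 0] + X[:, 1] > 10.0).astype(int)

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_reg)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_clf)

# Index of the leaf that each training sample falls into.
reg_leaf = reg.apply(X)
clf_leaf = clf.apply(X)


def leaf_means(leaf_ids, targets):
    """Mean of the targets within each leaf."""
    return {leaf: targets[leaf_ids == leaf].mean() for leaf in np.unique(leaf_ids)}


# Regression: expected prediction is the mean target in the leaf.
reg_expected = leaf_means(reg_leaf, y_reg)
# Classification: the mean of 0/1 labels is the fraction of class 1 in the leaf.
clf_expected = leaf_means(clf_leaf, y_clf)

reg_pred = reg.predict(X)
clf_proba = clf.predict_proba(X)[:, 1]

print(reg_expected)
print(clf_expected)
```

The class returned by `predict` is the one with the highest leaf probability, which corresponds to the majority class in that leaf.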
Pruning controls model complexity:
- Pre-pruning: Limit max depth, require minimum samples per split/leaf, limit max leaf nodes.
- Cost-complexity pruning (post-pruning): Minimize:
$$R_\alpha(T) = R(T) + \alpha |T|$$
where $R(T)$ is the training error, $|T|$ is the number of leaves, and $\alpha \ge 0$ is the complexity parameter. The optimal $\alpha$ is chosen via cross-validation.
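One way to choose the complexity parameter by cross-validation is sketched below with scikit-learn's `cost_complexity_pruning_path` and the `ccp_alpha` parameter. The breast cancer dataset and 5-fold CV are illustrative assumptions. For simplicity the candidate values are derived from the full dataset; a stricter protocol would compute them on training folds only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Effective alphas at which successive subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Drop the last alpha (it prunes down to the root alone) and guard against
# tiny negative values caused by floating-point error.
alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Mean 5-fold cross-validated accuracy for each candidate alpha.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha), X, y, cv=5
    ).mean()
    for alpha in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)

print(f"best alpha: {best_alpha:.5f}")
print(f"leaves: {unpruned.get_n_leaves()} unpruned, {pruned.get_n_leaves()} pruned")
```

Larger values of `ccp_alpha` penalize leaves more heavily and therefore yield smaller trees.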
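Feature importance is computed as the total reduction in impurity brought by each feature across all nodes:
$$\mathrm{Importance}(j) = \sum_{m \,:\, m \text{ splits on } j} \frac{N_m}{N} \, \Delta I(m)$$
where $\Delta I(m)$ is the impurity reduction achieved by the split at node $m$, $N_m$ is the number of samples reaching node $m$, and $N$ is the total number of training samples.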
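As a sketch of how this quantity relates to scikit-learn's `feature_importances_` attribute, the example below recomputes the importances from the fitted tree's node arrays and compares the two. The Iris dataset and `max_depth=3` are illustrative assumptions. Note that scikit-learn normalizes the importances so they sum to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_

importances = np.zeros(X.shape[1])
n_total = tree.weighted_n_node_samples[0]  # samples at the root

for node in range(tree.node_count):
    left = tree.children_left[node]
    right = tree.children_right[node]
    if left == -1:  # leaf nodes have no children and contribute nothing
        continue
    # Sample-weighted impurity decrease produced by this node's split.
    decrease = (
        tree.weighted_n_node_samples[node] * tree.impurity[node]
        - tree.weighted_n_node_samples[left] * tree.impurity[left]
        - tree.weighted_n_node_samples[right] * tree.impurity[right]
    )
    importances[tree.feature[node]] += decrease / n_total

# Normalize so that the importances sum to 1, as scikit-learn does.
importances /= importances.sum()

print(importances)
print(clf.feature_importances_)
```

Impurity-based importances are computed from training data only, so they describe how the tree used each feature rather than how well each feature generalizes.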