Implementation: scikit-learn DecisionTreeClassifier
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Classification |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for decision tree-based classification and regression provided by scikit-learn.
Description
This module implements scikit-learn's single-tree estimator classes: DecisionTreeClassifier, DecisionTreeRegressor, ExtraTreeClassifier, and ExtraTreeRegressor. These estimators build tree models by recursively splitting the feature space, choosing each split to optimize a criterion: Gini impurity, entropy, or log-loss ('gini', 'entropy', 'log_loss') for classification, and squared error, Friedman-adjusted squared error, absolute error, or Poisson deviance ('squared_error', 'friedman_mse', 'absolute_error', 'poisson') for regression. Trees are grown depth-first by default and best-first when max_leaf_nodes is set. The module also supports minimal cost-complexity pruning (ccp_alpha), sample weighting, and multi-output problems. Tree growth can be constrained via max_depth, min_samples_split, min_samples_leaf, max_leaf_nodes, min_impurity_decrease, and related parameters.
Usage
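Use DecisionTreeClassifier for interpretable classification models, as the base estimator in bagging-style ensembles such as Random Forest, or when you need impurity-based feature importance rankings. Note that scikit-learn's Gradient Boosting uses DecisionTreeRegressor as its base learner, even for classification, because each boosting stage fits regression trees to the gradients of the loss. ExtraTreeClassifier randomizes split selection: for each candidate feature it draws a random threshold and keeps the best of those random splits. It is intended for use inside Extra-Trees ensembles (ExtraTreesClassifier) rather than as a standalone model.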
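One way to confirm which tree class each ensemble uses is to fit small ensembles and inspect their estimators_ attribute. The iris dataset and n_estimators=5 are arbitrary choices for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, ExtraTreeClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
et = ExtraTreesClassifier(n_estimators=5, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)

# Random Forest members are plain classification trees
print(type(rf.estimators_[0]) is DecisionTreeClassifier)  # True
# Extra-Trees members use the randomized ExtraTreeClassifier
print(type(et.estimators_[0]) is ExtraTreeClassifier)  # True
# Gradient boosting stores one regression tree per class per stage
print(gb.estimators_.shape)  # (5, 3)
print(isinstance(gb.estimators_[0, 0], DecisionTreeRegressor))  # True
```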
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/tree/_classes.py
Signature
```python
class DecisionTreeClassifier(ClassifierMixin, BaseDecisionTree):
    """A decision tree classifier."""

    def __init__(
        self,
        *,
        criterion="gini",
        splitter="best",
        max_depth=None,
        min_samples_split=2,
        min_samples_leaf=1,
        min_weight_fraction_leaf=0.0,
        max_features=None,
        random_state=None,
        max_leaf_nodes=None,
        min_impurity_decrease=0.0,
        class_weight=None,
        ccp_alpha=0.0,
        monotonic_cst=None,
    ):
        ...


class DecisionTreeRegressor(RegressorMixin, BaseDecisionTree):
    """A decision tree regressor."""

    ...


class ExtraTreeClassifier(DecisionTreeClassifier):
    """An extremely randomized tree classifier."""

    ...


class ExtraTreeRegressor(DecisionTreeRegressor):
    """An extremely randomized tree regressor."""

    ...
```
Import
```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.tree import ExtraTreeClassifier, ExtraTreeRegressor
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
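| X | array-like of shape (n_samples, n_features) | Yes | Training feature data, passed to fit() |
| y | array-like of shape (n_samples,) or (n_samples, n_outputs) | Yes | Target values (class labels), passed to fit() |
| criterion | str | No | Split quality function: 'gini', 'entropy', or 'log_loss' (default: 'gini') |
| splitter | str | No | Split strategy: 'best' or 'random' (default: 'best') |
| max_depth | int or None | No | Maximum tree depth (default: None, meaning nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples) |
| min_samples_split | int or float | No | Minimum number of samples required to split an internal node; a float is interpreted as a fraction of n_samples (default: 2) |
| min_samples_leaf | int or float | No | Minimum number of samples required at a leaf; a float is interpreted as a fraction of n_samples (default: 1) |
| max_features | int, float, 'sqrt', 'log2', or None | No | Number of features considered per split; None means all features (default: None) |
| ccp_alpha | non-negative float | No | Complexity parameter for minimal cost-complexity pruning; 0.0 disables pruning (default: 0.0) |
| class_weight | dict, list of dict, 'balanced', or None | No | Class weights; 'balanced' weights classes inversely to their frequencies (default: None, every class has weight 1) |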
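To pick a value for ccp_alpha, the estimator's cost_complexity_pruning_path method returns the effective alphas at which subtrees of the fully grown tree are pruned away. The sketch below refits one tree per alpha and records its leaf count; the breast cancer dataset and the train/test split are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas of the pruning sequence, starting at 0.0 (no pruning)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

n_leaves, test_scores = [], []
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    n_leaves.append(clf.get_n_leaves())
    test_scores.append(clf.score(X_test, y_test))

# Larger alphas prune more; the last alpha reduces the tree to a single leaf
print(n_leaves[0], n_leaves[-1])
```

In practice the alpha would be chosen by cross-validation rather than on the test set; the test scores are collected here only to show how accuracy changes along the path.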
Outputs
| Name | Type | Description |
|---|---|---|
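| classes_ | ndarray of shape (n_classes,) or list of ndarray | Class labels seen during fit (one array per output for multi-output problems) |
| feature_importances_ | ndarray of shape (n_features,) | Impurity-based ("Gini") importances: the normalized total reduction of the split criterion contributed by each feature |
| tree_ | Tree | Underlying low-level tree structure (parallel node arrays such as children_left, threshold, and value) |
| n_classes_ | int or list of int | Number of classes (one entry per output for multi-output problems) |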
| max_features_ | int | Inferred value of max_features |
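The fitted attributes can be inspected directly after fit(). This sketch uses iris and a depth-3 tree for illustration, relies on the fact that leaf nodes in tree_ have children_left equal to -1, and builds a two-column target (the original labels plus their parity, an arbitrary choice) to show the multi-output form of n_classes_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(clf.classes_)       # [0 1 2]
print(clf.n_classes_)     # 3
print(clf.max_features_)  # 4: max_features=None means every feature is considered
# Importances are normalized to sum to 1 (up to float rounding)
print(clf.feature_importances_.sum())

# tree_ stores the structure as parallel arrays indexed by node id;
# a node is a leaf when it has no left child (children_left == -1)
tree = clf.tree_
is_leaf = tree.children_left == -1
print(tree.node_count, int(is_leaf.sum()), clf.get_n_leaves())

# Multi-output: y of shape (n_samples, 2) yields one class count per output
Y = np.column_stack([y, y % 2])
multi = DecisionTreeClassifier(random_state=0).fit(X, Y)
print(multi.n_outputs_, [int(k) for k in multi.n_classes_])  # 2 [3, 2]
```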
Usage Examples
Basic Usage
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a shallow tree; max_depth=3 keeps the model small and readable
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
print("Feature importances:", clf.feature_importances_)
```
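Because interpretability is a main reason to use a single tree, it can help to print the learned rules and the per-class probabilities. This sketch reuses the same split and hyperparameters as the basic example and relies on export_text from sklearn.tree; showing only the first five test rows is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Human-readable if/else rules of the fitted tree
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Class probabilities: class proportions among training samples in the leaf reached
proba = clf.predict_proba(X_test[:5])
print(proba.round(3))
print(clf.predict(X_test[:5]))
```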