Principle: Scikit-learn Random Forest Classification
Overview
An ensemble method that combines many independently trained decision trees using bootstrap sampling and random feature selection to improve prediction accuracy.
Description
Random forest classification constructs a large collection of decision trees, each trained on a different bootstrap sample of the training data. At every split within a tree, only a random subset of features is evaluated as candidate split variables. This combination of bagging (bootstrap aggregating) and the random subspace method produces trees that are diverse in their structure and predictions.
During prediction, each tree in the forest casts a vote for a class label, and the ensemble returns the majority vote across all trees. Because the trees are built independently and each sees a different perturbation of the data, the ensemble averages out the high variance of individual decision trees while maintaining low bias.
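The aggregation step can be seen directly in scikit-learn. Note that `RandomForestClassifier` implements a soft-voting variant: it averages each tree's class-probability estimates and takes the argmax, rather than counting hard votes. A minimal sketch on synthetic data (the dataset and parameter values here are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Reproduce the ensemble prediction by hand: average the per-tree
# class probabilities, then take the class with the highest mean.
mean_proba = np.mean([tree.predict_proba(X) for tree in clf.estimators_], axis=0)
manual_pred = clf.classes_[np.argmax(mean_proba, axis=1)]

print(np.array_equal(manual_pred, clf.predict(X)))  # the two agree
```

For trees with hard majority voting in the original Breiman sense, the result is usually very close, since the averaged probabilities and the vote counts rank classes similarly.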
A key advantage of random forests is out-of-bag (OOB) estimation. Since each bootstrap sample leaves approximately one-third of the training data unused, these out-of-bag samples serve as a built-in validation set. The OOB error provides a nearly unbiased estimate of the generalization error without requiring a separate holdout set or cross-validation procedure.
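In scikit-learn, OOB estimation is enabled with the `oob_score` constructor argument; the estimate is then exposed as the fitted attribute `oob_score_`. A short sketch (dataset and tree count are illustrative assumptions; a reasonably large `n_estimators` ensures every sample is out-of-bag for at least some trees):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# oob_score=True scores each sample using only the trees whose
# bootstrap draw excluded it -- no separate holdout set required.
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

print(f"OOB accuracy estimate: {clf.oob_score_:.3f}")
```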
Usage
Random forest classification is appropriate when:
- You need a robust, general-purpose classifier with minimal hyperparameter tuning.
- You want built-in feature importance scores to understand which variables drive predictions.
- You need out-of-bag error estimates without a separate validation split.
- You are working with high-dimensional data where individual trees would overfit.
- You want straightforward parallelization across multiple CPU cores.
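Several of these points map directly onto estimator parameters and attributes: `feature_importances_` for variable importance and `n_jobs` for parallel tree construction. A sketch on synthetic data (the dataset shape and parameter values are assumptions of this example):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data in which only the first 3 features are informative
# (configured here via make_classification; an assumption of this sketch).
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# n_jobs=-1 builds the independently grown trees on all available CPU cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)

# Impurity-based importances: nonnegative and normalized to sum to 1.
ranking = np.argsort(clf.feature_importances_)[::-1]
print("features ranked by importance:", ranking)
```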
Theoretical Basis
The theoretical foundations of random forest classification rest on several principles:
- Bootstrap Aggregating (Bagging): By training each tree on a bootstrap sample drawn with replacement from the original dataset, the ensemble reduces variance compared to a single decision tree. The averaging effect smooths out the noisy predictions of individual trees.
- Random Subspace Method: At each split, only a random subset of features (typically the square root of the total number of features for classification) is considered. This decorrelates the trees, ensuring that a single dominant feature does not appear at the root of every tree.
- Variance Reduction Through Averaging: The variance of the ensemble prediction decreases as the number of trees grows, provided the individual trees remain sufficiently diverse. The correlation between trees is the key factor governing the rate of variance reduction.
- OOB Error Estimation: Each training sample is left out of roughly 37% of the bootstrap draws. Aggregating predictions from only those trees that did not include a given sample yields an honest estimate of the model's out-of-sample accuracy.
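The roughly 37% figure follows from the probability (1 - 1/n)^n that a given sample is never drawn in a bootstrap sample of size n, which converges to 1/e ≈ 0.368 as n grows. A quick simulation confirms it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Draw one bootstrap sample (with replacement) and measure the fraction
# of original indices it never picked; this converges to 1/e ~ 0.368.
sample = rng.integers(0, n, size=n)
oob_fraction = 1 - len(np.unique(sample)) / n
print(f"fraction of samples out of bag: {oob_fraction:.3f}")
```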