Implementation: scikit-learn RandomForestClassifier Init
Overview
Concrete tool for creating a random forest classifier ensemble, provided by scikit-learn. The RandomForestClassifier class builds a collection of decision tree classifiers on bootstrap sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Trees in the forest use the best-split strategy. The sub-sample size is controlled with the max_samples parameter when bootstrap=True (the default); otherwise the whole dataset is used to build each tree.
Constructor Signature
from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier(
    n_estimators=100,
    *,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features="sqrt",
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False,
    class_weight=None,
    ccp_alpha=0.0,
    max_samples=None,
    monotonic_cst=None,
)
Parameters
- n_estimators (int, default=100) -- The number of trees in the forest.
- criterion ({"gini", "entropy", "log_loss"}, default="gini") -- The function to measure the quality of a split. "gini" for Gini impurity, "entropy" and "log_loss" for Shannon information gain.
- max_depth (int, default=None) -- The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
- min_samples_split (int or float, default=2) -- The minimum number of samples required to split an internal node. If float, interpreted as a fraction of n_samples.
- min_samples_leaf (int or float, default=1) -- The minimum number of samples required to be at a leaf node. If float, interpreted as a fraction of n_samples.
- min_weight_fraction_leaf (float, default=0.0) -- The minimum weighted fraction of the sum total of weights required to be at a leaf node.
- max_features ({"sqrt", "log2", None}, int or float, default="sqrt") -- The number of features to consider when looking for the best split. "sqrt" uses the square root of the total number of features.
- max_leaf_nodes (int, default=None) -- Grow trees with this many leaf nodes in best-first fashion. If None, unlimited.
- min_impurity_decrease (float, default=0.0) -- A node will be split if the split induces a decrease in impurity greater than or equal to this value.
- bootstrap (bool, default=True) -- Whether bootstrap samples are used when building trees.
- oob_score (bool or callable, default=False) -- Whether to use out-of-bag samples to estimate the generalization score. By default, accuracy is used. A callable with signature metric(y_true, y_pred) can be provided.
- n_jobs (int, default=None) -- The number of jobs to run in parallel. fit, predict, decision_path, and apply are all parallelized over the trees.
- random_state (int, RandomState instance or None, default=None) -- Controls randomness of the bootstrapping and feature sampling.
- verbose (int, default=0) -- Controls the verbosity when fitting and predicting.
- warm_start (bool, default=False) -- When True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
- class_weight ({"balanced", "balanced_subsample"}, dict or list of dicts, default=None) -- Weights associated with classes. "balanced" adjusts weights inversely proportional to class frequencies. "balanced_subsample" computes weights per bootstrap sample.
- ccp_alpha (non-negative float, default=0.0) -- Complexity parameter used for Minimal Cost-Complexity Pruning.
- max_samples (int or float, default=None) -- If bootstrap is True, the number of samples to draw from X to train each base estimator. If None, draws X.shape[0] samples.
- monotonic_cst (array-like of int of shape (n_features,), default=None) -- Monotonicity constraints for each feature: 1 for a monotonic increase, 0 for no constraint, -1 for a monotonic decrease.
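A minimal sketch of how several of these parameters interact in practice; the synthetic dataset and all parameter values below are illustrative, not canonical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# oob_score=True requires bootstrap=True (the default): each tree is
# evaluated on the samples left out of its bootstrap draw, giving a
# generalization estimate without a separate validation split.
clf = RandomForestClassifier(
    n_estimators=50,
    max_features="sqrt",      # consider ~sqrt(8) = 3 candidate features per split
    class_weight="balanced",  # reweight classes inversely to their frequency
    oob_score=True,
    n_jobs=-1,                # parallelize fit/predict over all cores
    random_state=0,
)
clf.fit(X, y)
print(clf.oob_score_)  # out-of-bag accuracy, between 0 and 1
```

Note that oob_score_ is only meaningful once n_estimators is large enough that every sample is out-of-bag for at least a few trees.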
Fitted Attributes
- estimators_ -- The collection of fitted DecisionTreeClassifier sub-estimators.
- classes_ -- The class labels (ndarray of shape (n_classes,)).
- n_classes_ -- The number of classes.
- n_features_in_ -- Number of features seen during fit.
- feature_names_in_ -- Names of features seen during fit (only when X has string feature names).
- n_outputs_ -- The number of outputs when fit is performed.
- feature_importances_ -- The impurity-based feature importances (ndarray of shape (n_features,)).
- oob_score_ -- Score of the training dataset obtained using an out-of-bag estimate (only when oob_score=True).
- oob_decision_function_ -- Decision function computed with out-of-bag estimate on the training set (only when oob_score=True).
- estimators_samples_ -- The subset of drawn samples (in-bag samples) for each base estimator.
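As a quick illustration of the fitted attributes above (using the iris dataset purely as an example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

print(len(clf.estimators_))  # 10 fitted DecisionTreeClassifier objects
print(clf.classes_)          # [0 1 2]
print(clf.n_features_in_)    # 4
# Impurity-based importances are normalized to sum to 1 across features.
print(round(clf.feature_importances_.sum(), 3))  # 1.0
```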
Example Usage
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(
    n_samples=1000, n_features=4,
    n_informative=2, n_redundant=0,
    random_state=0, shuffle=False,
)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
print(clf.predict([[0, 0, 0, 0]]))
# [1]
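A sketch of the warm_start parameter described earlier: raising n_estimators and refitting trains only the additional trees rather than rebuilding the whole forest (the toy data and counts below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
clf.fit(X, y)
print(len(clf.estimators_))  # 50

# Raise n_estimators and refit: only the 50 new trees are trained,
# and the existing 50 are kept as-is.
clf.set_params(n_estimators=100)
clf.fit(X, y)
print(len(clf.estimators_))  # 100
```

This pattern is useful for growing a forest incrementally, e.g. to check how the score changes as trees are added.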
Source Location
sklearn/ensemble/_forest.py, class RandomForestClassifier (lines 1175-1576).