Implementation: scikit-learn RandomForestClassifier Init
Overview
Concrete tool for creating a random forest classifier ensemble, provided by scikit-learn. The RandomForestClassifier class builds a collection of decision tree classifiers on bootstrap sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Trees in the forest use the best-split strategy. The sub-sample size is controlled with the max_samples parameter when bootstrap=True (the default); otherwise the whole dataset is used to build each tree.
Constructor Signature
from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier(
    n_estimators=100,
    *,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features="sqrt",
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False,
    class_weight=None,
    ccp_alpha=0.0,
    max_samples=None,
    monotonic_cst=None,
)
Parameters
- n_estimators (int, default=100) -- The number of trees in the forest.
- criterion ({"gini", "entropy", "log_loss"}, default="gini") -- The function to measure the quality of a split. "gini" for Gini impurity, "entropy" and "log_loss" for Shannon information gain.
- max_depth (int, default=None) -- The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
- min_samples_split (int or float, default=2) -- The minimum number of samples required to split an internal node. If float, interpreted as a fraction of n_samples.
- min_samples_leaf (int or float, default=1) -- The minimum number of samples required to be at a leaf node. If float, interpreted as a fraction of n_samples.
- min_weight_fraction_leaf (float, default=0.0) -- The minimum weighted fraction of the sum total of weights required to be at a leaf node.
- max_features ({"sqrt", "log2", None}, int or float, default="sqrt") -- The number of features to consider when looking for the best split. "sqrt" uses the square root of the total number of features.
- max_leaf_nodes (int, default=None) -- Grow trees with this many leaf nodes in best-first fashion. If None, unlimited.
- min_impurity_decrease (float, default=0.0) -- A node will be split if the split induces a decrease in impurity greater than or equal to this value.
- bootstrap (bool, default=True) -- Whether bootstrap samples are used when building trees.
- oob_score (bool or callable, default=False) -- Whether to use out-of-bag samples to estimate the generalization score. By default, accuracy is used. A callable with signature metric(y_true, y_pred) can be provided.
- n_jobs (int, default=None) -- The number of jobs to run in parallel. fit, predict, decision_path, and apply are all parallelized over the trees.
- random_state (int, RandomState instance or None, default=None) -- Controls randomness of the bootstrapping and feature sampling.
- verbose (int, default=0) -- Controls the verbosity when fitting and predicting.
- warm_start (bool, default=False) -- When True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
- class_weight ({"balanced", "balanced_subsample"}, dict or list of dicts, default=None) -- Weights associated with classes. "balanced" adjusts weights inversely proportional to class frequencies. "balanced_subsample" computes weights per bootstrap sample.
- ccp_alpha (non-negative float, default=0.0) -- Complexity parameter used for Minimal Cost-Complexity Pruning.
- max_samples (int or float, default=None) -- If bootstrap is True, the number of samples to draw from X to train each base estimator. If None, draws X.shape[0] samples.
- monotonic_cst (array-like of int of shape (n_features,), default=None) -- Monotonicity constraints for each feature: 1 for a monotonic increase, 0 for no constraint, -1 for a monotonic decrease.
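A minimal sketch of how several of these parameters interact in practice; the synthetic dataset and all parameter values below are illustrative, not canonical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# oob_score=True requires bootstrap=True (the default): each tree is
# evaluated on the samples left out of its bootstrap draw, giving a
# generalization estimate without a separate validation split.
clf = RandomForestClassifier(
    n_estimators=50,
    max_features="sqrt",      # consider ~sqrt(8) = 3 candidate features per split
    class_weight="balanced",  # reweight classes inversely to their frequency
    oob_score=True,
    n_jobs=-1,                # parallelize fit/predict over all cores
    random_state=0,
)
clf.fit(X, y)
print(clf.oob_score_)  # out-of-bag accuracy, between 0 and 1
```

Note that oob_score_ is only meaningful once n_estimators is large enough that every sample is out-of-bag for at least a few trees.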
Fitted Attributes
- estimators_ -- The collection of fitted DecisionTreeClassifier sub-estimators.
- classes_ -- The class labels (ndarray of shape (n_classes,)).
- n_classes_ -- The number of classes.
- n_features_in_ -- Number of features seen during fit.
- feature_names_in_ -- Names of features seen during fit (only when X has string feature names).
- n_outputs_ -- The number of outputs when fit is performed.
- feature_importances_ -- The impurity-based feature importances (ndarray of shape (n_features,)).
- oob_score_ -- Score of the training dataset obtained using an out-of-bag estimate (only when oob_score=True).
- oob_decision_function_ -- Decision function computed with out-of-bag estimate on the training set (only when oob_score=True).
- estimators_samples_ -- The subset of drawn samples (in-bag samples) for each base estimator.
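As a quick illustration of the fitted attributes above (using the iris dataset purely as an example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

print(len(clf.estimators_))  # 10 fitted DecisionTreeClassifier objects
print(clf.classes_)          # [0 1 2]
print(clf.n_features_in_)    # 4
# Impurity-based importances are normalized to sum to 1 across features.
print(round(clf.feature_importances_.sum(), 3))  # 1.0
```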
Example Usage
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(
    n_samples=1000, n_features=4,
    n_informative=2, n_redundant=0,
    random_state=0, shuffle=False,
)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
print(clf.predict([[0, 0, 0, 0]]))
# [1]
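A sketch of the warm_start parameter described earlier: raising n_estimators and refitting trains only the additional trees rather than rebuilding the whole forest (the toy data and counts below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
clf.fit(X, y)
print(len(clf.estimators_))  # 50

# Raise n_estimators and refit: only the 50 new trees are trained,
# and the existing 50 are kept as-is.
clf.set_params(n_estimators=100)
clf.fit(X, y)
print(len(clf.estimators_))  # 100
```

This pattern is useful for growing a forest incrementally, e.g. to check how the score changes as trees are added.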
Source Location
sklearn/ensemble/_forest.py, class RandomForestClassifier (lines 1175-1576).