
Implementation: Scikit-learn BaseForest fit

From Leeroopedia



Overview

A concrete tool, provided by scikit-learn, for training forest-based ensemble models by fitting trees in parallel. The fit method defined on BaseForest is the shared training procedure used by RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, and ExtraTreesRegressor. It validates input data, handles bootstrap sampling, constructs decision trees in parallel using joblib, computes out-of-bag scores when requested, and supports warm-start incremental training.
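As a quick sanity check of this sharing, all four ensemble classes resolve fit to the same underlying function object (a minimal probe, assuming a standard scikit-learn installation):

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    ExtraTreesRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)

# Each class inherits fit from the shared BaseForest base class,
# so the unbound method is the same function object for all four.
forest_classes = (
    RandomForestClassifier,
    RandomForestRegressor,
    ExtraTreesClassifier,
    ExtraTreesRegressor,
)
shared_fits = {cls.fit for cls in forest_classes}
print(len(shared_fits))  # 1: one shared fit implementation
```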

Method Signature

def fit(self, X, y, sample_weight=None):
    """
    Build a forest of trees from the training set (X, y).

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        The training input samples. Internally, its dtype will be converted
        to dtype=np.float32. If a sparse matrix is provided, it will be
        converted into a sparse csc_matrix.

    y : array-like of shape (n_samples,) or (n_samples, n_outputs)
        The target values (class labels in classification, real numbers
        in regression).

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights. If None, then samples are equally weighted.
        Splits that would create child nodes with net zero or negative
        weight are ignored while searching for a split in each node.

    Returns
    -------
    self : object
        Fitted estimator.
    """

Training Procedure

The fit method proceeds through the following steps:

  1. Input Validation: Validates and converts X to dtype=np.float32 and y to a contiguous array. Sparse matrices are converted to CSC format and pre-sorted by indices.
  2. Missing Value Handling: Computes a missing-values feature mask if the underlying tree estimator supports missing values (NaN-aware splitting).
  3. Sample Weight Computation: Combines the user-provided sample_weight with expanded_class_weight (derived from the class_weight parameter).
  4. Bootstrap Configuration: If bootstrap=True, determines the number of bootstrap samples (n_samples_bootstrap) based on max_samples.
  5. Warm-Start Logic: If warm_start=True and fitted estimators already exist, retains them and creates only the additional estimators needed to reach n_estimators.
  6. Parallel Tree Construction: New trees are built in parallel using joblib.Parallel with a threading backend preference (the Cython tree-building code releases the Python GIL). Each tree is fitted via _parallel_build_trees, which handles bootstrap sampling and individual tree fitting.
  7. OOB Score Computation: If oob_score=True (and bootstrap=True), computes the out-of-bag score by aggregating predictions from trees that did not include a given sample in their bootstrap draw.
  8. Attribute Finalization: Unwraps the classes_ and n_classes_ attributes from their per-output lists for single-output problems.
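Step 4 can be illustrated with a simplified sketch of how the bootstrap draw size is derived from max_samples (this mirrors the behavior of a private scikit-learn helper, but is a standalone re-implementation, not the library's code):

```python
def n_samples_bootstrap(n_samples, max_samples):
    # Simplified sketch of deriving the bootstrap draw size from
    # the max_samples parameter (assumption: mirrors scikit-learn's
    # internal helper, minus its validation of out-of-range values).
    if max_samples is None:
        return n_samples                      # full-size bootstrap draw
    if isinstance(max_samples, int):
        return max_samples                    # exact number of samples
    # float fraction of the training set, at least one sample
    return max(round(n_samples * max_samples), 1)

print(n_samples_bootstrap(1000, None))   # 1000
print(n_samples_bootstrap(1000, 0.25))   # 250
print(n_samples_bootstrap(1000, 300))    # 300
```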

Fitted Attributes

After calling fit, the following attributes are set on the estimator:

  • estimators_ -- List of fitted decision tree estimators. Extended (not replaced) when warm_start=True.
  • classes_ -- The class labels (classification only). Ndarray of shape (n_classes,) for single-output problems.
  • n_classes_ -- The number of classes (classification only).
  • n_outputs_ -- The number of outputs when fit is performed.
  • n_features_in_ -- Number of features seen during fit.
  • feature_names_in_ -- Names of features seen during fit (only when X has feature names that are all strings).
  • oob_score_ -- Out-of-bag score (only when oob_score=True). For classifiers, the default metric is accuracy; a custom callable can be provided.
  • oob_decision_function_ -- OOB decision function values on the training set (classification only, only when oob_score=True).
  • estimators_samples_ -- The subset of drawn samples (in-bag sample indices) for each base estimator.

Example Usage

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Basic fit
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
clf.fit(X, y)
print(f"OOB Score: {clf.oob_score_:.4f}")
print(f"Number of fitted trees: {len(clf.estimators_)}")

# Warm-start to add more trees
clf.set_params(n_estimators=150, warm_start=True)
clf.fit(X, y)
print(f"Number of fitted trees after warm-start: {len(clf.estimators_)}")
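To see how oob_score_ relates to oob_decision_function_, the default classification metric (accuracy over the aggregated OOB votes) can be recomputed by hand; the dataset here is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
clf.fit(X, y)

# Majority vote over trees that held each sample out of their
# bootstrap draw, mapped back to class labels via classes_.
oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
manual_accuracy = np.mean(oob_pred == y)
print(manual_accuracy, clf.oob_score_)
```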

Source Location

sklearn/ensemble/_forest.py, class BaseForest, method fit (lines 304-523).
