Implementation: scikit-learn IsolationForest
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Anomaly Detection, Ensemble Methods |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete implementation of the Isolation Forest anomaly detection algorithm provided by scikit-learn.
Description
The IsolationForest class implements the Isolation Forest algorithm for anomaly detection. It isolates observations by randomly selecting features and split values, creating random trees. Anomalies have shorter average path lengths because they are easier to isolate. The algorithm builds on BaseBagging with ExtraTreeRegressor as the base estimator and supports parallel tree depth computation.
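The shorter-path-length intuition can be observed directly through the model's anomaly scores. A minimal sketch (the data here is illustrative): a point far from a tight cluster is isolated in fewer random splits, so `score_samples` assigns it a lower value.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_inliers = rng.randn(200, 2)           # tight Gaussian cluster
X_outlier = np.array([[8.0, 8.0]])      # far from the cluster

clf = IsolationForest(random_state=0).fit(np.vstack([X_inliers, X_outlier]))

# score_samples returns the negated anomaly score: lower = more anomalous.
inlier_scores = clf.score_samples(X_inliers)
outlier_score = clf.score_samples(X_outlier)
print(outlier_score[0] < inlier_scores.min())  # True
```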
Usage
Use Isolation Forest for unsupervised anomaly detection when you need to identify outliers in datasets. It works well with high-dimensional data and does not require labeled anomaly data for training.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/ensemble/_iforest.py
Signature
class IsolationForest(OutlierMixin, BaseBagging):
def __init__(
self,
*,
n_estimators=100,
max_samples="auto",
contamination="auto",
max_features=1.0,
bootstrap=False,
n_jobs=None,
random_state=None,
verbose=0,
warm_start=False,
):
...
def fit(self, X, y=None, sample_weight=None):
...
def predict(self, X):
...
def decision_function(self, X):
...
def score_samples(self, X):
...
Import
from sklearn.ensemble import IsolationForest
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Yes | Training input samples |
| n_estimators | int | No | Number of isolation trees (default: 100) |
| max_samples | int, float, or "auto" | No | Number of samples drawn to train each tree (default: "auto", i.e. min(256, n_samples)) |
| contamination | float or "auto" | No | Expected proportion of anomalies; used to set the decision threshold (default: "auto") |
| sample_weight | array-like of shape (n_samples,) | No | Per-sample weights |
Outputs
| Name | Type | Description |
|---|---|---|
| predictions | ndarray of shape (n_samples,) | 1 for inliers, -1 for outliers |
| scores | ndarray of shape (n_samples,) | Anomaly scores (lower is more anomalous) |
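The contract above can be checked directly. A small sketch: `predict` returns labels drawn from {-1, 1}, and per the scikit-learn API, `decision_function` equals `score_samples` shifted by the fitted `offset_`, with negative values marking predicted outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(1).randn(50, 3)
clf = IsolationForest(random_state=1).fit(X)

pred = clf.predict(X)
print(np.unique(pred))          # labels are drawn from {-1, 1}

scores = clf.score_samples(X)
dec = clf.decision_function(X)
# decision_function is score_samples shifted by the fitted offset_;
# negative values mark predicted outliers.
print(np.allclose(dec, scores - clf.offset_))  # True
```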
Usage Examples
Basic Usage
import numpy as np
from sklearn.ensemble import IsolationForest
# Generate data with outliers
rng = np.random.RandomState(42)
X_normal = rng.randn(100, 2)
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([X_normal, X_outliers])
clf = IsolationForest(random_state=42, contamination=0.1)
clf.fit(X)
predictions = clf.predict(X)
print(f"Detected outliers: {(predictions == -1).sum()}")
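Scoring unseen data
A fitted model can also score points it was not trained on; a short sketch (the query points are illustrative): a point near the training cluster is typically labeled 1, a distant point -1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_train = rng.randn(200, 2)             # inlier-only training data
clf = IsolationForest(random_state=0).fit(X_train)

# Score previously unseen points: one near the training cluster,
# one far outside it.
X_new = np.array([[0.1, -0.2], [7.0, 7.0]])
print(clf.predict(X_new))               # typically [ 1 -1]
print(clf.decision_function(X_new))     # positive ~ inlier, negative ~ outlier
```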