Implementation: scikit-learn MLPClassifier
| Knowledge Sources | Details |
|---|---|
| Domains | Machine Learning, Neural Networks |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for multi-layer perceptron classification provided by scikit-learn; the same module also supplies MLPRegressor for regression.
Description
MLPClassifier implements a multi-layer perceptron (MLP) classifier that trains using backpropagation. It optimizes the log-loss function using L-BFGS, stochastic gradient descent (SGD), or Adam. The module also contains the abstract base class BaseMultilayerPerceptron, which provides the shared infrastructure for both classification and regression MLPs. The implementation supports multiple activation functions ('relu', 'tanh', 'logistic', 'identity'), configurable hidden layer architectures, and L2 regularization via the alpha parameter.
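As a sketch of the configuration surface described above, the snippet below trains the same architecture with each supported activation and with L2 regularization via alpha. The dataset and sizes are illustrative choices, not anything prescribed by scikit-learn:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Synthetic non-linear data for illustration only
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same architecture, varying only the activation; alpha adds L2 regularization.
scores = {}
for act in ("relu", "tanh", "logistic", "identity"):
    clf = MLPClassifier(hidden_layer_sizes=(16,), activation=act,
                        alpha=1e-4, max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    scores[act] = clf.score(X_test, y_test)
print(scores)
```

Note that 'identity' makes every layer linear, so the whole network collapses to a linear model; the non-linear activations are what let the MLP fit data like the two moons.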
Usage
Use MLPClassifier when you need a neural network classifier within the scikit-learn ecosystem. It is suitable for moderate-sized datasets and provides a convenient interface with support for early stopping, learning rate scheduling, and warm starting for iterative training.
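The early stopping and warm starting mentioned above can be sketched as follows; the dataset and hyperparameter values are arbitrary choices for illustration:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Synthetic data for illustration (sizes are arbitrary choices)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Early stopping holds out validation_fraction of the training data and stops
# once the validation score fails to improve for n_iter_no_change epochs.
es_clf = MLPClassifier(hidden_layer_sizes=(50,), early_stopping=True,
                       validation_fraction=0.2, n_iter_no_change=5,
                       max_iter=500, random_state=0)
es_clf.fit(X, y)
print(es_clf.n_iter_)  # epochs actually run, typically well under max_iter

# warm_start=True reuses the previous weights on each call to fit,
# enabling iterative training in small chunks.
ws_clf = MLPClassifier(hidden_layer_sizes=(50,), warm_start=True,
                       max_iter=20, random_state=0)
for _ in range(3):
    ws_clf.fit(X, y)  # each call continues from the previous weights
```

The warm-start loop may emit ConvergenceWarning per call, since each fit is capped at 20 epochs by design.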
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/neural_network/_multilayer_perceptron.py
Signature
class MLPClassifier(ClassifierMixin, BaseMultilayerPerceptron):
def __init__(
self,
hidden_layer_sizes=(100,),
activation="relu",
*,
solver="adam",
alpha=0.0001,
batch_size="auto",
learning_rate="constant",
learning_rate_init=0.001,
power_t=0.5,
max_iter=200,
shuffle=True,
random_state=None,
tol=1e-4,
verbose=False,
warm_start=False,
momentum=0.9,
nesterovs_momentum=True,
early_stopping=False,
validation_fraction=0.1,
beta_1=0.9,
beta_2=0.999,
epsilon=1e-8,
n_iter_no_change=10,
max_fun=15000,
):
Import
from sklearn.neural_network import MLPClassifier
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hidden_layer_sizes | tuple | No | Number of neurons in each hidden layer (default=(100,)) |
| activation | str | No | Activation function: 'identity', 'logistic', 'tanh', 'relu' (default='relu') |
| solver | str | No | Optimization solver: 'lbfgs', 'sgd', 'adam' (default='adam') |
| alpha | float | No | L2 regularization parameter (default=0.0001) |
| batch_size | int or str | No | Mini-batch size, or 'auto' (default='auto') |
| learning_rate | str | No | Learning rate schedule: 'constant', 'invscaling', 'adaptive' (default='constant') |
| learning_rate_init | float | No | Initial learning rate (default=0.001) |
| power_t | float | No | Exponent for inverse scaling learning rate (default=0.5) |
| max_iter | int | No | Maximum number of iterations (default=200) |
| shuffle | bool | No | Whether to shuffle samples in each iteration (default=True) |
| random_state | int, RandomState, or None | No | Random state for reproducibility |
| tol | float | No | Tolerance for optimization convergence (default=1e-4) |
| verbose | bool | No | Whether to print progress messages (default=False) |
| warm_start | bool | No | Reuse previous solution for initialization (default=False) |
| momentum | float | No | Momentum for SGD update (default=0.9) |
| nesterovs_momentum | bool | No | Whether to use Nesterov momentum (default=True) |
| early_stopping | bool | No | Whether to terminate early based on validation score (default=False) |
| validation_fraction | float | No | Proportion of training data for validation (default=0.1) |
| beta_1 | float | No | Exponential decay rate for first moment in Adam (default=0.9) |
| beta_2 | float | No | Exponential decay rate for second moment in Adam (default=0.999) |
| epsilon | float | No | Numerical stability value in Adam (default=1e-8) |
| n_iter_no_change | int | No | Maximum epochs without improvement before stopping (default=10) |
| max_fun | int | No | Maximum number of loss function calls for lbfgs solver (default=15000) |
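Several of the parameters above apply only to particular solvers: the stochastic solvers ('sgd', 'adam') use batch_size, learning_rate_init, and (for Adam) beta_1/beta_2/epsilon, while 'lbfgs' is a full-batch method governed by max_fun. A minimal sketch contrasting the two, with arbitrary sizes chosen for illustration:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# 'lbfgs' is a full-batch quasi-Newton method: batch_size, learning_rate,
# and momentum settings are ignored; max_fun caps loss-function evaluations.
lbfgs_clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(32,),
                          alpha=1e-3, max_iter=500, random_state=0)

# 'adam' trains on mini-batches and uses beta_1, beta_2, and epsilon.
adam_clf = MLPClassifier(solver="adam", hidden_layer_sizes=(32,),
                         learning_rate_init=0.001, max_iter=500, random_state=0)

for name, est in [("lbfgs", lbfgs_clf), ("adam", adam_clf)]:
    scores = cross_val_score(est, X, y, cv=3)
    print(f"{name}: {scores.mean():.3f}")
```

On small datasets 'lbfgs' often converges quickly; 'adam' tends to scale better to larger training sets.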
Outputs
| Name | Type | Description |
|---|---|---|
| classes_ | ndarray of shape (n_classes,) | Class labels for each output |
| loss_ | float | Current loss computed with the loss function |
| best_loss_ | float | Minimum loss reached by the solver |
| coefs_ | list of shape (n_layers - 1,) | Weight matrices for each layer |
| intercepts_ | list of shape (n_layers - 1,) | Bias vectors for each layer |
| n_features_in_ | int | Number of features seen during fit |
| n_iter_ | int | Number of iterations the solver ran |
| n_layers_ | int | Number of layers |
| n_outputs_ | int | Number of outputs |
| out_activation_ | str | Name of the output activation function |
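The fitted attributes in the table above can be inspected directly. With iris (4 features, 3 classes) and hidden layers (10, 5), the layer count and weight-matrix shapes work out as follows:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=500, random_state=0)
clf.fit(X, y)

# coefs_[i] has shape (size of layer i, size of layer i+1): with 4 input
# features, hidden layers (10, 5), and 3 classes there are 3 weight matrices.
print([w.shape for w in clf.coefs_])  # [(4, 10), (10, 5), (5, 3)]
print(clf.n_layers_)                  # 4 (input + 2 hidden + output)
print(clf.out_activation_)            # 'softmax' for multiclass problems
```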
Usage Examples
Basic Usage
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# MLPs are sensitive to feature scaling, so standardize the inputs first.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
clf = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=300, random_state=42)
clf.fit(X_train, y_train)
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")
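Beyond score, the fitted classifier exposes predict_proba for per-class probabilities, and the stochastic solvers ('sgd', 'adam') support incremental training via partial_fit. A self-contained sketch, with sizes chosen for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
clf.fit(X, y)

# predict_proba returns one probability per class, with rows summing to 1.
proba = clf.predict_proba(X[:3])
print(proba.shape)                           # (3, 3)
print(np.allclose(proba.sum(axis=1), 1.0))   # True

# partial_fit enables out-of-core training on data batches; the full set
# of class labels must be supplied so the output layer can be sized.
inc = MLPClassifier(hidden_layer_sizes=(50,), random_state=42)
for batch in np.array_split(np.arange(len(X)), 5):
    inc.partial_fit(X[batch], y[batch], classes=np.unique(y))
```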