Principle:Scikit learn Scikit learn Neural Networks
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Representation Learning |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Neural networks are computational models composed of layers of interconnected nodes (neurons) that learn hierarchical representations of data through iterative optimization.
Description
Neural networks model complex, non-linear relationships by composing simple parameterized functions (neurons) into layers. Each neuron applies a linear transformation followed by a non-linear activation function, and stacking multiple layers enables the network to learn increasingly abstract representations. By the universal approximation theorem, a feedforward network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy, which reduces the need for manual feature engineering. Multi-Layer Perceptrons (MLPs) are the classical feedforward architecture, while Restricted Boltzmann Machines (RBMs) are generative models that learn a probability distribution over inputs using an undirected graphical model structure.
Usage
Use MLP classifiers and regressors for tabular data when non-linear relationships are expected and sufficient training data is available. MLPs are appropriate when tree-based methods underperform or when automatic feature interaction learning is desired. Use RBMs for unsupervised feature learning, dimensionality reduction, or as building blocks for deep belief networks. Neural networks require careful hyperparameter tuning (number of layers, layer sizes, learning rate, regularization) and are best suited to problems where the dataset is large enough to support the model's capacity.
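As a rough illustration of the usage guidance above, the following sketch fits an `MLPClassifier` inside a scaling pipeline. The synthetic dataset, the two hidden layers of 32 units, and the iteration limit are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data with a non-linear decision boundary (illustrative only).
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale features first: MLP training is sensitive to feature scaling.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

In practice the layer sizes and learning rate would be chosen by cross-validation, for example with `GridSearchCV`, rather than fixed by hand as here.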
Theoretical Basis
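A Multi-Layer Perceptron (MLP) consists of an input layer, one or more hidden layers, and an output layer. For a network with $L$ hidden layers:
Forward pass:
$$h^{(0)} = x, \qquad h^{(l)} = g\left(W^{(l)} h^{(l-1)} + b^{(l)}\right) \quad (l = 1, \dots, L), \qquad \hat{y} = f\left(W^{(L+1)} h^{(L)} + b^{(L+1)}\right)$$
where $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias vector of layer $l$, $g$ is the hidden layer activation function and $f$ is the output activation (identity for regression, softmax for classification).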
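The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single sample, assuming ReLU hidden units and a softmax output. The layer sizes and the random weights are made up for the example, and the helper name `forward` is hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, weights, biases, hidden_activation=relu, output_activation=softmax):
    """Forward pass for one sample x.

    weights[i] has shape (n_out, n_in); the last entry is the output layer.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = hidden_activation(W @ h + b)  # hidden layers
    return output_activation(weights[-1] @ h + biases[-1])  # output layer

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # 4 inputs, hidden layers of 5 and 3 units, 2 classes
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

probs = forward(rng.normal(size=4), weights, biases)
print(probs, probs.sum())
```

With a softmax output the returned vector is non-negative and sums to one, so it can be read as class probabilities.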
Common activation functions:
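- ReLU: $\mathrm{ReLU}(z) = \max(0, z)$
- Sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- Tanh: $\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$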
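For reference, the three activation functions listed above can be written as scalar Python functions using only the standard library. This is a sketch for illustration, and the sign-based branch in `sigmoid` is one common way to avoid overflow, not the only one.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def sigmoid(z: float) -> float:
    # Branch on the sign so that exp() is only called on non-positive values.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z: float) -> float:
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.4f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):.4f}")
```

ReLU is unbounded above, the sigmoid maps to (0, 1), and tanh maps to (-1, 1). Tanh is a rescaled sigmoid: tanh(z) = 2·sigmoid(2z) − 1.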
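Backpropagation computes gradients of the loss $\mathcal{L}$ with respect to all weights using the chain rule. Writing $z^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}$ for the pre-activation of layer $l$, where $h^{(l-1)}$ is the output of the previous layer and $g$ is the hidden activation, and $\delta^{(l)} = \partial \mathcal{L} / \partial z^{(l)}$ for the error signal of that layer:
$$\delta^{(l)} = \left( \left(W^{(l+1)}\right)^\top \delta^{(l+1)} \right) \odot g'\left(z^{(l)}\right), \qquad \frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} \left(h^{(l-1)}\right)^\top, \qquad \frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}$$
The recursion starts at the output layer, whose error signal is computed directly from the loss, and proceeds backwards through the hidden layers.
Weights are updated using gradient-based optimizers (SGD, Adam, L-BFGS). In the simplest case, plain gradient descent with learning rate $\eta$:
$$W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial \mathcal{L}}{\partial W^{(l)}}$$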
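To make the chain-rule computation concrete, the sketch below backpropagates through a tiny network with one tanh hidden layer, an identity output, and a squared-error loss. It then checks one gradient entry against a finite difference and applies a single plain gradient-descent step. The network sizes, random data, and learning rate are assumptions chosen for illustration. Adam and L-BFGS use the same gradients but different update rules.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # one input sample
y = np.array([0.5])                            # regression target
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer

def loss(W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)
    y_hat = W2 @ h + b2                        # identity output activation
    return 0.5 * np.sum((y_hat - y) ** 2)

# Forward pass, keeping intermediate values for the backward pass.
z1 = W1 @ x + b1
h1 = np.tanh(z1)
y_hat = W2 @ h1 + b2

# Backward pass: apply the chain rule from the output towards the input.
delta2 = y_hat - y                             # dLoss/dz2
grad_W2 = np.outer(delta2, h1)
grad_b2 = delta2
delta1 = (W2.T @ delta2) * (1.0 - h1 ** 2)     # tanh'(z) = 1 - tanh(z)^2
grad_W1 = np.outer(delta1, x)
grad_b1 = delta1

# Check one entry of grad_W1 against a central finite difference.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, b1, W2, b2) - loss(W1_minus, b1, W2, b2)) / (2 * eps)
print("analytic:", grad_W1[0, 0], "numeric:", numeric)

# One plain gradient-descent step with learning rate eta.
eta = 0.01
loss_before = loss(W1, b1, W2, b2)
W1, b1 = W1 - eta * grad_W1, b1 - eta * grad_b1
W2, b2 = W2 - eta * grad_W2, b2 - eta * grad_b2
loss_after = loss(W1, b1, W2, b2)
print("loss before:", loss_before, "loss after:", loss_after)
```

A finite-difference check like this is a common way to catch mistakes in a hand-written backward pass.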
Regularization techniques prevent overfitting:
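- L2 penalty: a term proportional to $\alpha \lVert W \rVert_2^2$ is added to the loss, where $\alpha$ controls the regularization strength
- Early stopping: Training halts when performance on a held-out validation set stops improving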
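Both regularization techniques are exposed as constructor parameters of scikit-learn's MLP estimators. The sketch below shows one possible configuration. The dataset and the specific values of `alpha`, `validation_fraction`, and `n_iter_no_change` are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data (illustrative only).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=0
)

clf = MLPClassifier(
    hidden_layer_sizes=(64,),
    alpha=1e-3,               # strength of the L2 penalty on the weights
    early_stopping=True,      # hold out part of the training data for validation
    validation_fraction=0.1,  # size of that held-out split
    n_iter_no_change=10,      # epochs without improvement before stopping
    max_iter=500,
    random_state=0,
)
# The pipeline fits clf in place, so its fitted attributes can be read afterwards.
model = make_pipeline(StandardScaler(), clf).fit(X, y)

print("epochs run:", clf.n_iter_)
print("best validation score:", clf.best_validation_score_)
```

Larger `alpha` values shrink the weights more strongly. With `early_stopping=True`, training ends once the validation score has failed to improve by at least `tol` for `n_iter_no_change` consecutive epochs.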
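A Restricted Boltzmann Machine (RBM) is an undirected graphical model with visible units $v$ and hidden units $h$. The energy function is:
$$E(v, h) = -b^\top v - c^\top h - h^\top W v$$
where $W$ is the weight matrix and $b$ and $c$ are the visible and hidden bias vectors.
The joint probability is $P(v, h) = \frac{e^{-E(v, h)}}{Z}$, where $Z$ is the partition function. For binary units, the conditional distributions are:
$$P(h_j = 1 \mid v) = \sigma\left(c_j + \sum_i W_{ji} v_i\right), \qquad P(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ji} h_j\right)$$
where $\sigma$ is the logistic sigmoid.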
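Training uses Contrastive Divergence (CD-k), which approximates the gradient of the log-likelihood using $k$ steps of Gibbs sampling. scikit-learn's `BernoulliRBM` uses the persistent variant (PCD, also known as Stochastic Maximum Likelihood), which continues the Gibbs chain across updates instead of restarting it from the data.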
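The following NumPy sketch shows what a single CD-k update could look like for a binary RBM. It is a didactic illustration, not scikit-learn's implementation. The function name `cd_k_update`, the weight layout `(n_hidden, n_visible)`, the toy data, and the learning rate are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v0, W, b, c, rng, k=1, lr=0.1):
    """One CD-k update on a batch v0 of binary rows.

    W has shape (n_hidden, n_visible); b and c are visible and hidden biases.
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W.T + c)
    # Negative phase: k steps of alternating Gibbs sampling, started at the data.
    v, ph = v0, ph0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W + b)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W.T + c)
    n = v0.shape[0]
    # Approximate gradient: data statistics minus sampled statistics.
    W_new = W + lr * (ph0.T @ v0 - ph.T @ v) / n
    b_new = b + lr * (v0 - v).mean(axis=0)
    c_new = c + lr * (ph0 - ph).mean(axis=0)
    return W_new, b_new, c_new

rng = np.random.default_rng(0)
# Toy data: repeated copies of two binary patterns (illustrative only).
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = patterns[rng.integers(0, 2, size=200)]

n_hidden, n_visible = 2, data.shape[1]
W = 0.01 * rng.normal(size=(n_hidden, n_visible))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)

for epoch in range(100):
    W, b, c = cd_k_update(data, W, b, c, rng, k=1, lr=0.1)

# Mean-field reconstruction of the two patterns after training.
recon = sigmoid(sigmoid(patterns @ W.T + c) @ W + b)
print(np.round(recon, 2))
```

For real use, `sklearn.neural_network.BernoulliRBM` provides a tested estimator with `fit` and `transform`. The loop above only makes the positive and negative phases of the update explicit.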