
Principle:Scikit-learn Generalized Linear Models

From Leeroopedia


Knowledge Sources
Domains Supervised Learning, Statistical Modeling
Last Updated 2026-02-08 15:00 GMT

Overview

Generalized linear models extend ordinary linear regression by allowing the response variable to follow distributions from the exponential family and relating the mean to the linear predictor through a link function.

Description

Generalized Linear Models (GLMs) provide a unified framework for regression when the response variable does not follow a Gaussian distribution. They accommodate count data (Poisson), positive continuous data (Gamma), binary data (Bernoulli/Binomial), and other exponential family distributions. GLMs solve the problem of applying linear modeling principles to response variables that violate the normality assumption of ordinary least squares. They occupy a central role in statistical modeling, bridging classical linear regression with more flexible non-linear approaches.

Usage

Use GLMs when the response variable is non-Gaussian but can be modeled by a distribution from the exponential family. Use PoissonRegressor for count data (e.g., number of events, insurance claim counts). Use GammaRegressor for positive continuous data that is right-skewed (e.g., insurance claim amounts, durations). Use TweedieRegressor when the response has a Tweedie distribution, which encompasses Poisson and Gamma as special cases and is particularly useful for data with exact zeros alongside a continuous positive component. GLMs are especially important in actuarial science, healthcare, and ecology.
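As a minimal sketch of the count-data case, the following fits PoissonRegressor to synthetic data generated from a log-linear Poisson model (the data and true coefficients here are illustrative, not from the source):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic count data: Poisson response with a log-linear mean.
X = rng.normal(size=(500, 2))
mu = np.exp(0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1])
y = rng.poisson(mu)

# alpha=0.0 disables the L2 penalty so the fit is plain maximum likelihood.
model = PoissonRegressor(alpha=0.0)
model.fit(X, y)

# The fitted intercept and coefficients should lie near the true
# generating values (0.5, 0.3, -0.2).
print(model.intercept_, model.coef_)
```

GammaRegressor and TweedieRegressor follow the same fit/predict API; only the assumed response distribution (and hence the deviance being minimized) changes.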

Theoretical Basis

A GLM consists of three components:

  1. Random component: The response variable y follows a distribution from the exponential family:
    p(y | θ, ϕ) = exp( (yθ − b(θ)) / a(ϕ) + c(y, ϕ) )
    where θ is the natural parameter, ϕ is the dispersion parameter, and b(θ) is the cumulant function.
  2. Systematic component: A linear predictor η = Xβ.
  3. Link function: A monotonic function g relating the conditional mean μ = E[y | X] to the linear predictor: g(μ) = η.
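The three components can be illustrated numerically for a Poisson model with the canonical log link (the arrays and names below are ad hoc for illustration, not scikit-learn API):

```python
import numpy as np

X = np.array([[1.0, 2.0], [0.5, -1.0]])   # design matrix
beta = np.array([0.3, -0.2])              # coefficients

eta = X @ beta        # systematic component: linear predictor eta = X beta
mu = np.exp(eta)      # inverse link: mu = g^{-1}(eta) = exp(eta) for log link

# Link function relates the mean back to the linear predictor: g(mu) = eta.
assert np.allclose(np.log(mu), eta)
```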

Common GLM families and their canonical link functions:

Distribution  Link        g(μ)
Gaussian      Identity    μ
Poisson       Log         log(μ)
Gamma         Reciprocal  1/μ
Bernoulli     Logit       log(μ/(1−μ))

Parameter estimation is performed by maximizing the log-likelihood, typically via Iteratively Reweighted Least Squares (IRLS) or Newton's method. With an ℓ2 penalty, the objective becomes:

β̂ = argmin_β −(1/n) Σᵢ₌₁ⁿ log p(yᵢ | xᵢ, β) + (α/2) ‖β‖₂²
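The penalty strength corresponds to the `alpha` parameter of the scikit-learn GLM estimators. A quick sketch of its shrinkage effect on synthetic Poisson data (the data here are illustrative):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = rng.poisson(np.exp(X @ np.array([0.4, 0.0, -0.4])))

# Larger alpha means a stronger L2 penalty, shrinking coefficients toward 0.
dense = PoissonRegressor(alpha=1e-4).fit(X, y)
shrunk = PoissonRegressor(alpha=10.0).fit(X, y)
assert np.linalg.norm(shrunk.coef_) < np.linalg.norm(dense.coef_)
```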

The Tweedie distributions form a family within the exponential dispersion models, indexed by a power parameter p that determines the mean-variance relationship Var[y] ∝ μᵖ:

  • p=0: Gaussian
  • p=1: Poisson
  • 1<p<2: compound Poisson-Gamma (zero-inflated continuous)
  • p=2: Gamma
  • p=3: Inverse Gaussian
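The 1 < p < 2 regime can be sketched with TweedieRegressor on simulated compound Poisson-Gamma data, where each response is a Poisson-many sum of Gamma draws, so exact zeros and positive continuous values coexist (the simulation parameters are illustrative):

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
rate = np.exp(0.2 + 0.5 * X[:, 0])   # Poisson event rate per sample

# Compound Poisson-Gamma target: Poisson-many Gamma-sized claims,
# producing exact zeros when no events occur.
n_events = rng.poisson(rate)
y = np.array([rng.gamma(shape=2.0, scale=1.0, size=n).sum() for n in n_events])

# power=1.5 selects the compound Poisson-Gamma member of the family;
# the default 'auto' link resolves to log for power >= 1.
model = TweedieRegressor(power=1.5, alpha=0.0, max_iter=1000)
model.fit(X, y)
print(model.coef_)
```

This zero-plus-positive structure is exactly the insurance-claims setting mentioned under Usage, which is why Tweedie models are a staple of actuarial practice.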
