Implementation:Scikit learn Scikit learn FetchCaliforniaHousing
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Data Loading |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for fetching and loading the California housing dataset for regression tasks, provided by scikit-learn.
Description
The fetch_california_housing function downloads the California housing dataset on first use and caches it locally for later calls. The dataset has 20,640 samples, one per California census block group. Each sample has 8 numeric features: median income, median house age, average number of rooms, average number of bedrooms, block group population, average occupancy, latitude, and longitude. The target is the median house value, expressed in units of $100,000. The dataset originates from Pace and Barry (1997).
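The target is stored in units of $100,000, so converting it back to dollars is a common first step when reporting results. The sketch below relies only on NumPy and that documented scaling. The names TARGET_UNIT_DOLLARS, target_summary_dollars, and california_target_summary are hypothetical helpers introduced for illustration.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing

# Hypothetical constant: the dataset's target is expressed in units of $100,000.
TARGET_UNIT_DOLLARS = 100_000


def target_summary_dollars(y):
    """Return min / median / max of the housing target converted to dollars."""
    y_dollars = np.asarray(y, dtype=float) * TARGET_UNIT_DOLLARS
    return {
        "min": float(y_dollars.min()),
        "median": float(np.median(y_dollars)),
        "max": float(y_dollars.max()),
    }


def california_target_summary(data_home=None):
    """Load the cached (or freshly downloaded) dataset and summarize its target."""
    _, y = fetch_california_housing(data_home=data_home, return_X_y=True)
    return target_summary_dollars(y)
```

Calling california_target_summary() downloads the data on first use. The pure helper target_summary_dollars works on any array without network access.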
Usage
Use this function when you need a real-world regression dataset for benchmarking or prototyping regression models. It is a standard regression benchmark throughout the scikit-learn tutorials and examples.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/datasets/_california_housing.py
Signature
```python
@validate_params(...)
def fetch_california_housing(
    *,
    data_home=None,
    download_if_missing=True,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
):
```
Import
```python
from sklearn.datasets import fetch_california_housing
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
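| data_home | str, PathLike or None | No | Directory in which to download and cache the dataset (default None uses the scikit-learn data home, ~/scikit_learn_data unless the SCIKIT_LEARN_DATA environment variable is set) |
| download_if_missing | bool | No | If True, download the data when it is not cached; if False, raise OSError instead (default True) |
| return_X_y | bool | No | If True, return (data, target) instead of Bunch (default False) |
| as_frame | bool | No | If True, data is a pandas DataFrame, target a pandas Series, and frame a DataFrame holding both (default False) |
| n_retries | int | No | Number of retries when HTTP errors are encountered during download (default 3) |
| delay | float | No | Number of seconds to wait between retries (default 1.0) |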
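These parameters combine naturally into a cache-first loader. It tries the local copy with download_if_missing=False and falls back to a download with more patient retry settings only on a cache miss. This is a sketch: load_housing and its arguments are hypothetical. Passing n_retries and delay assumes a scikit-learn release recent enough to accept them (1.5 or later).

```python
from sklearn.datasets import fetch_california_housing


def load_housing(data_home, allow_download=True, n_retries=5, delay=2.0):
    """Cache-first loader for the California housing dataset (illustrative helper).

    Returns the Bunch, or None if the data is not cached and downloads are disabled.
    """
    try:
        # Offline attempt: raises OSError when the cached archive is missing.
        return fetch_california_housing(data_home=data_home, download_if_missing=False)
    except OSError:
        if not allow_download:
            return None
    # Cache miss: download and cache the data, retrying transient HTTP errors.
    # n_retries and delay are assumed to be supported (recent scikit-learn, 1.5+).
    return fetch_california_housing(
        data_home=data_home,
        download_if_missing=True,
        n_retries=n_retries,
        delay=delay,
    )
```

With allow_download=False the function never touches the network. This is useful on air-gapped machines or in CI jobs where the cache directory is pre-populated.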
Outputs
| Name | Type | Description |
|---|---|---|
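| dataset | Bunch | Dictionary-like object with data, target, feature_names, target_names, DESCR, and frame (frame is None unless as_frame=True) |
| (X, y) | tuple | Feature matrix and target when return_X_y=True: ndarrays by default, or a DataFrame and a Series when as_frame=True |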
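The output fields can be checked programmatically. This helps when a cache may be stale or when a wrapper repackages the Bunch. The sketch below assumes the Bunch exposes the fields listed in the table, including target_names. check_housing_bunch and load_and_check are hypothetical helper names.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing


def check_housing_bunch(bunch):
    """Verify that the documented Bunch fields are mutually consistent.

    Returns (n_samples, n_features); raises ValueError on a mismatch.
    """
    n_samples, n_features = np.shape(bunch.data)
    if n_features != len(bunch.feature_names):
        raise ValueError("feature_names does not match the number of data columns")
    if np.shape(bunch.target) != (n_samples,):
        raise ValueError("target must be 1-D with one value per sample")
    # frame is None unless as_frame=True; then it holds feature columns + target column.
    if bunch.frame is not None:
        expected = list(bunch.feature_names) + list(bunch.target_names)
        if list(bunch.frame.columns) != expected:
            raise ValueError("frame columns do not match feature and target names")
    return n_samples, n_features


def load_and_check(as_frame=False, data_home=None):
    """Fetch the dataset (downloading on first use) and validate its structure."""
    bunch = fetch_california_housing(data_home=data_home, as_frame=as_frame)
    return bunch, check_housing_bunch(bunch)
```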
Usage Examples
Basic Usage
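```python
from sklearn.datasets import fetch_california_housing

# The first call downloads the data and caches it under data_home; later calls read the cache.
data = fetch_california_housing()
print(data.data.shape)  # (20640, 8)
print(data.target.shape)  # (20640,)
print(data.feature_names)
# ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']

# As an (X, y) tuple
X, y = fetch_california_housing(return_X_y=True)
```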