Implementation:Scikit learn Scikit learn FetchCovtype
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Data Loading |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for fetching and loading the forest covertype dataset for classification benchmarks, provided by scikit-learn.
Description
The fetch_covtype function downloads the forest covertype dataset, originally published in the UCI Machine Learning Repository, on first use and caches it locally for later calls. This classic classification benchmark combines quantitative features, such as Elevation, Aspect, Slope, and distances to hydrology, roadways, and fire points, with binary indicator columns for wilderness area and soil type. It contains 581,012 samples, 54 features, and 7 forest cover type classes.
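The mix of feature types described above can be inspected by grouping the column names by prefix. The following is a minimal sketch, not part of scikit-learn: `group_feature_names` and `describe_covtype` are hypothetical helpers, and the sketch assumes the column naming used by recent scikit-learn releases, where indicator columns start with `Wilderness_Area` or `Soil_Type`.

```python
def group_feature_names(feature_names):
    """Split covertype column names into quantitative and indicator groups.

    Assumption: indicator columns are prefixed with "Wilderness_Area" or
    "Soil_Type"; every other column is treated as quantitative.
    """
    groups = {"quantitative": [], "wilderness_area": [], "soil_type": []}
    for name in feature_names:
        if name.startswith("Wilderness_Area"):
            groups["wilderness_area"].append(name)
        elif name.startswith("Soil_Type"):
            groups["soil_type"].append(name)
        else:
            groups["quantitative"].append(name)
    return groups


def describe_covtype():
    """Fetch the dataset (downloads on first call) and print group sizes."""
    # Imported lazily so the helper above can be used without a download.
    from sklearn.datasets import fetch_covtype

    dataset = fetch_covtype()
    for group, names in group_feature_names(dataset.feature_names).items():
        print(f"{group}: {len(names)} columns")
```

If the naming assumption holds, calling `describe_covtype()` should show how the 54 columns divide between quantitative measurements and the two families of indicator columns.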
Usage
Use this function when you need a large-scale classification dataset for benchmarking classifiers. It is particularly useful for evaluating scalability of classification algorithms due to its size.
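One way to use the dataset for a scalability check is to time the same estimator on growing prefixes of the shuffled data. The sketch below is illustrative only: `fit_times_by_size` and `benchmark_sgd_on_covtype` are hypothetical helpers, and the choice of `SGDClassifier` and of the subset sizes is an assumption, not a recommendation from scikit-learn.

```python
import time


def fit_times_by_size(make_estimator, X, y, sizes):
    """Fit a fresh estimator on the first n samples for each n in sizes.

    Returns a list of (n, seconds) pairs. X and y only need to support
    slicing, so NumPy arrays and plain lists both work.
    """
    timings = []
    for n in sizes:
        estimator = make_estimator()  # fresh model so fits are independent
        start = time.perf_counter()
        estimator.fit(X[:n], y[:n])
        timings.append((n, time.perf_counter() - start))
    return timings


def benchmark_sgd_on_covtype(sizes=(10_000, 100_000, 500_000)):
    """Example driver; downloads the dataset on first call."""
    # Imported lazily so fit_times_by_size stays usable on its own.
    from sklearn.datasets import fetch_covtype
    from sklearn.linear_model import SGDClassifier

    # Shuffle so that each prefix is a random subset of the data.
    X, y = fetch_covtype(return_X_y=True, shuffle=True, random_state=0)
    return fit_times_by_size(lambda: SGDClassifier(random_state=0), X, y, sizes)
```

Shuffling before slicing matters here: without it, each prefix follows the stored order of the samples and may not be representative of the full class distribution.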
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/datasets/_covtype.py
Signature
@validate_params(...)
def fetch_covtype(
    *,
    data_home=None,
    download_if_missing=True,
    random_state=None,
    shuffle=False,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
):
Import
from sklearn.datasets import fetch_covtype
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
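| data_home | str, PathLike or None | No | Directory used to download and cache the dataset; when None, the default scikit-learn data directory (~/scikit_learn_data) is used (default None) |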
| download_if_missing | bool | No | If True, download data if not cached (default True) |
| random_state | int, RandomState or None | No | Seed or generator controlling the shuffle; pass an int for reproducible output (default None) |
| shuffle | bool | No | Whether to shuffle the dataset (default False) |
| return_X_y | bool | No | If True, return (data, target) instead of Bunch (default False) |
| as_frame | bool | No | If True, return data as a pandas DataFrame and target as a pandas Series (default False) |
| n_retries | int | No | Number of download retries (default 3) |
| delay | float | No | Delay between retries in seconds (default 1.0) |
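The `data_home` and `download_if_missing` parameters can be combined to load the dataset strictly from the local cache, for example on a machine without network access. The sketch below assumes that a missing cache is reported as `OSError`, as in current scikit-learn releases; `fetch_covtype_cached_only` is a hypothetical wrapper, and its `fetch` argument exists only so the logic can be exercised with a stand-in loader.

```python
def fetch_covtype_cached_only(data_home=None, fetch=None):
    """Return the cached dataset, or None if it is not on disk yet.

    `fetch` defaults to sklearn.datasets.fetch_covtype; passing another
    callable with the same keyword arguments is useful for testing.
    """
    if fetch is None:
        from sklearn.datasets import fetch_covtype as fetch
    try:
        # download_if_missing=False forbids any network access.
        return fetch(data_home=data_home, download_if_missing=False)
    except OSError:
        # Assumption: a missing cache is reported as OSError.
        return None
```

A caller can then decide whether to fall back to a download, skip a benchmark, or fail with a clearer message when the function returns None.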
Outputs
| Name | Type | Description |
|---|---|---|
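| dataset | Bunch | Dictionary-like object with data, target, frame (populated when as_frame=True), feature_names, target_names, and DESCR |
| (X, y) | tuple | Feature matrix and target returned when return_X_y=True; ndarrays by default, pandas objects when as_frame=True |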
Usage Examples
Basic Usage
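from sklearn.datasets import fetch_covtype

# Load the full dataset as a Bunch (downloaded on first call, then cached).
covtype = fetch_covtype()
print(covtype.data.shape)    # (581012, 54)
print(covtype.target.shape)  # (581012,)

# Return shuffled arrays directly, with a fixed seed for reproducibility.
X, y = fetch_covtype(return_X_y=True, shuffle=True, random_state=42)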