Implementation:Scikit learn Scikit learn FetchCovtype

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Data Loading
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool for fetching and loading the forest covertype dataset for classification benchmarks, provided by scikit-learn.

Description

The fetch_covtype function downloads the forest covertype dataset from the UCI Machine Learning Repository and caches it locally. This classic classification benchmark combines real-valued measurements, such as Elevation, Aspect, Slope, and distances to hydrology, roadways, and fire points, with categorical attributes (wilderness area and soil type) encoded as binary indicator columns. It contains 581,012 samples with 54 features and 7 forest cover type classes.

Usage

Use this function when you need a large-scale classification dataset for benchmarking classifiers. With more than half a million samples, it is particularly useful for evaluating how classification algorithms scale with training set size.
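A scalability check of this kind can be sketched as a small timing helper. This is a minimal illustration, not part of scikit-learn: the name time_fit_at_sizes and its parameters are hypothetical, and it assumes X and y are NumPy arrays such as those returned with return_X_y=True.

```python
import time

import numpy as np
from sklearn.base import clone


def time_fit_at_sizes(estimator, X, y, sizes, random_state=0):
    """Return (n_samples, fit_seconds) pairs for growing random subsets.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    rng = np.random.RandomState(random_state)
    order = rng.permutation(X.shape[0])  # one fixed random ordering of rows
    timings = []
    for n in sizes:
        idx = order[:n]  # nested subsets: each size extends the previous one
        model = clone(estimator)  # fresh, unfitted copy for every run
        start = time.perf_counter()
        model.fit(X[idx], y[idx])
        timings.append((n, time.perf_counter() - start))
    return timings
```

Passing the arrays from fetch_covtype(return_X_y=True) together with sizes such as [10_000, 100_000, 500_000] would give a rough picture of how fit time grows for a given estimator. Wall-clock timings vary between runs and machines, so they are best read as trends rather than exact figures.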

Code Reference

Source Location

Signature

@validate_params(...)
def fetch_covtype(
    *,
    data_home=None,
    download_if_missing=True,
    random_state=None,
    shuffle=False,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
):

Import

from sklearn.datasets import fetch_covtype

I/O Contract

Inputs

Name | Type | Required | Description
data_home | str, PathLike or None | No | Custom directory for downloading and caching the dataset (default None)
download_if_missing | bool | No | If True, download the data when it is not cached; if False, raise an OSError instead (default True)
random_state | int, RandomState instance or None | No | Seed or random number generator used for shuffling; pass an int for reproducible output (default None)
shuffle | bool | No | Whether to shuffle the dataset (default False)
return_X_y | bool | No | If True, return (data, target) instead of a Bunch (default False)
as_frame | bool | No | If True, return the data as a pandas DataFrame (default False)
n_retries | int | No | Number of download retries (default 3)
delay | float | No | Delay between retries in seconds (default 1.0)
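The download_if_missing parameter makes it possible to forbid network access, for example on an offline machine or in a test suite. The sketch below wraps that behavior; load_covtype_if_cached is a hypothetical helper name, and it assumes that any OSError means the cache is missing, which may also hide other I/O problems.

```python
from sklearn.datasets import fetch_covtype


def load_covtype_if_cached(data_home=None):
    """Return the covertype Bunch from the local cache, or None if absent.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    try:
        # download_if_missing=False forbids any download attempt.
        return fetch_covtype(data_home=data_home, download_if_missing=False)
    except OSError:
        # Raised when the cached files are not found under data_home.
        return None
```

A caller can then decide explicitly whether to fall back to a download or to skip the benchmark when the function returns None.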

Outputs

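Name | Type | Description
dataset | Bunch | Dictionary-like object with data, target, frame (populated only when as_frame=True), feature_names, target_names, and DESCR
(X, y) | tuple of ndarray | Feature matrix and target array when return_X_y=True (pandas objects when as_frame=True is also set)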
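"Dictionary-like" means that the fields of a Bunch can be read either as attributes or as keys. The sketch below shows this on a toy Bunch built by hand so that it runs without downloading anything; the field names mirror a subset of the real return value, while the values are made up for illustration.

```python
import numpy as np
from sklearn.utils import Bunch

# Toy stand-in with some of the field names of the real return value;
# the values are invented for illustration only.
toy = Bunch(
    data=np.zeros((3, 2)),
    target=np.array([1, 2, 1]),
    feature_names=["f0", "f1"],
    DESCR="toy description",
)

# Attribute access and key access reach the same object.
same_object = toy.data is toy["data"]

# A Bunch is a dict subclass, so the usual dict methods are available.
field_names = sorted(toy.keys())
```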

Usage Examples

Basic Usage

from sklearn.datasets import fetch_covtype

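# The first call downloads the data; later calls read the local cache.
covtype = fetch_covtype()
print(covtype.data.shape)    # (581012, 54)
print(covtype.target.shape)  # (581012,)

# Return the arrays directly, shuffled with a fixed seed for reproducibility.
X, y = fetch_covtype(return_X_y=True, shuffle=True, random_state=42)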
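The effect of shuffle=True with a fixed random_state can be illustrated on toy arrays. This is a sketch of the general idea (one permutation applied to both arrays), not the library's internal code; shuffle_together is a hypothetical name.

```python
import numpy as np
from sklearn.utils import check_random_state


def shuffle_together(X, y, random_state=None):
    """Apply one random permutation to both X and y so rows stay paired."""
    rng = check_random_state(random_state)
    order = rng.permutation(X.shape[0])
    return X[order], y[order]


# Toy stand-in: the single feature of row i equals 10 * label i.
X_toy = np.arange(6).reshape(6, 1) * 10
y_toy = np.arange(6)

# The same seed yields the same ordering on every call.
X_a, y_a = shuffle_together(X_toy, y_toy, random_state=42)
X_b, y_b = shuffle_together(X_toy, y_toy, random_state=42)
```

Because both arrays are indexed with the same permutation, each feature row keeps its label, and reusing the seed reproduces the ordering across runs.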
