Implementation:Scikit learn Scikit learn FetchKddcup99

From Leeroopedia


Knowledge Sources
Domains Data Loading, Anomaly Detection
Last Updated 2026-02-08 15:00 GMT

Overview

A concrete scikit-learn tool for downloading and loading the KDD Cup 99 intrusion detection dataset.

Description

This module implements the fetch_kddcup99 function, which downloads (if necessary) and loads the KDD Cup 1999 dataset, a classic benchmark for anomaly detection and network intrusion detection. Each record describes one network connection and is labeled either as normal or with a specific attack type. The function can load the complete dataset or a 10% sample (the default, percent10=True), restrict the records to one of the subsets 'SA', 'SF', 'http', or 'smtp', and return the data either as NumPy arrays or as pandas DataFrames (as_frame=True).

Usage

Use this function to load the KDD Cup 99 dataset for evaluating anomaly detection, network intrusion detection, or outlier detection algorithms.
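For evaluation, the multi-class attack labels are usually collapsed into a binary normal/anomaly target. The following is a minimal sketch of that step, assuming the labels are byte strings such as b'normal.' (as returned with the default as_frame=False). The helper names to_binary_labels, anomaly_fraction, and load_http_binary are hypothetical and not part of scikit-learn.

```python
import numpy as np


def to_binary_labels(target, normal_label=b"normal."):
    """Map raw labels to 0 (normal) or 1 (attack). Hypothetical helper."""
    target = np.asarray(target)
    return (target != normal_label).astype(int)


def anomaly_fraction(y_binary):
    """Return the fraction of records marked as attacks."""
    return float(np.mean(y_binary))


def load_http_binary():
    """Fetch the http subset (downloads on first call) and binarize labels."""
    # Imported here so the helpers above can be reused without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    X, y = fetch_kddcup99(subset="http", percent10=True, return_X_y=True)
    return X, to_binary_labels(y)
```

Calling load_http_binary() returns the feature matrix together with a 0/1 target; anomaly_fraction on that target gives a rough idea of how rare the attacks are in the chosen subset.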

Code Reference

Source Location

Signature

@validate_params(...)
def fetch_kddcup99(
    *,
    subset=None,
    data_home=None,
    shuffle=False,
    random_state=None,
    percent10=True,
    download_if_missing=True,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
)

Import

from sklearn.datasets import fetch_kddcup99

I/O Contract

Inputs

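Name | Type | Required | Description
subset | str or None | No | Subset to load: 'SA', 'SF', 'http', or 'smtp'; None applies no subset filtering (default: None)
data_home | str or PathLike or None | No | Directory in which the downloaded data is cached; None uses scikit-learn's default data directory
shuffle | bool | No | Whether to shuffle the dataset (default: False)
random_state | int, RandomState instance or None | No | Random state for reproducible shuffling and for the selection of abnormal samples when subset='SA'
percent10 | bool | No | Whether to load only the 10% sample of the data (default: True)
download_if_missing | bool | No | If False, raise an OSError when the data is not available locally instead of downloading it (default: True)
return_X_y | bool | No | If True, return a (data, target) tuple instead of a Bunch (default: False)
as_frame | bool | No | If True, return the data as pandas objects (default: False)
n_retries | int | No | Number of retries when HTTP errors are encountered (default: 3)
delay | float | No | Number of seconds to wait between retries (default: 1.0)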
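The download-related parameters can be combined to control when network access happens. The sketch below tries a cache-only load first and falls back to a download with more patient retry settings. It relies on fetch_kddcup99 raising OSError when download_if_missing=False and the files are absent from data_home. The helper names load_cached_only and load_with_retries, as well as the retry values, are illustrative assumptions.

```python
from sklearn.datasets import fetch_kddcup99


def load_cached_only(data_home=None, subset=None):
    """Return the dataset from the local cache, or None if it is not cached.

    Hypothetical helper: it never triggers a download.
    """
    try:
        return fetch_kddcup99(
            subset=subset,
            data_home=data_home,
            download_if_missing=False,
        )
    except OSError:
        return None


def load_with_retries(data_home=None, subset=None):
    """Use the cache when possible; otherwise download with extra retries."""
    cached = load_cached_only(data_home=data_home, subset=subset)
    if cached is not None:
        return cached
    return fetch_kddcup99(
        subset=subset,
        data_home=data_home,
        n_retries=5,  # retry HTTP errors up to 5 times (illustrative value)
        delay=2.0,    # wait 2 seconds between attempts (illustrative value)
    )
```

Keeping the cache check separate makes it possible to run offline jobs that fail fast, by returning None, instead of waiting on a network that is not available.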

Outputs

Name | Type | Description
data | Bunch | Dictionary-like object with the attributes data, target, feature_names, target_names, and DESCR (plus frame when as_frame=True)
(X, y) | tuple | Returned instead of the Bunch when return_X_y=True; the feature matrix and the target array

Usage Examples

Basic Usage

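from sklearn.datasets import fetch_kddcup99

# Load the http subset of the 10% sample (downloaded and cached on first call)
data = fetch_kddcup99(subset='http', percent10=True)
print("Shape:", data.data.shape)  # (n_samples, n_features)
# Labels are byte strings such as b'normal.'
print("Target classes:", set(data.target))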
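As a follow-up to the basic usage, the loaded data can be fed to an outlier detector and scored against the labels. The sketch below uses IsolationForest and ROC AUC as one possible choice; the page itself does not prescribe a detector or a metric. It assumes byte-string labels such as b'normal.' and casts the features to float, since the arrays may come back with object dtype. The function names isolation_forest_auc and run_on_http_subset are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score


def isolation_forest_auc(X, y_binary, random_state=0):
    """Fit an IsolationForest on X and return the ROC AUC against y_binary.

    y_binary uses 1 for attacks and 0 for normal connections.
    """
    # The detector needs a numeric array, so cast explicitly.
    X = np.asarray(X, dtype=float)
    model = IsolationForest(random_state=random_state).fit(X)
    # score_samples is higher for inliers; negate it so attacks rank first.
    anomaly_score = -model.score_samples(X)
    return roc_auc_score(y_binary, anomaly_score)


def run_on_http_subset():
    """Download (if needed) the http subset and evaluate the detector."""
    from sklearn.datasets import fetch_kddcup99

    X, y = fetch_kddcup99(subset="http", percent10=True, return_X_y=True)
    y_binary = (y != b"normal.").astype(int)
    return isolation_forest_auc(X, y_binary)
```

The http subset is a convenient starting point because its features are all numeric. Subsets such as SA and SF also contain categorical columns, which would need to be encoded before the float cast.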
