Implementation:Scikit learn Scikit learn FetchKddcup99
| Knowledge Sources | |
|---|---|
| Domains | Data Loading, Anomaly Detection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A scikit-learn dataset loader that downloads, caches, and returns the KDD Cup 99 network intrusion detection dataset.
Description
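This module implements the fetch_kddcup99 function, which downloads (on first use), caches, and loads the KDD Cup 1999 dataset, a classic benchmark for network intrusion detection and anomaly detection. Each record describes one network connection and is labeled either as normal traffic or with a specific attack type; the labels are stored as byte strings such as b'normal.'. The subset parameter selects one of the four classical anomaly-detection subsets or all records: SA keeps all normal connections plus a small random selection of attacks (about 1% anomalies), SF keeps only connections whose logged_in attribute is positive, and http and smtp are the parts of SF whose service is http or smtp. Independently, percent10 chooses between the 10% sample and the full dataset, and as_frame returns pandas objects instead of NumPy arrays.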
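Because the labels name individual attack types rather than forming a 0/1 flag, anomaly-detection benchmarks usually collapse them into "normal" versus "attack". A minimal sketch, assuming normal traffic is labeled b'normal.' (or the decoded string 'normal.') and every other label is an attack; to_binary_labels is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np


def to_binary_labels(target):
    """Hypothetical helper: map KDD Cup 99 labels to 1 (attack) or 0 (normal).

    Assumes normal traffic is labeled b"normal." (bytes) or "normal." (str);
    every other label is treated as an attack. Accepts any iterable of
    labels, e.g. a NumPy array or a pandas Series.
    """
    # Decode bytes so that b"normal." and "normal." compare the same way
    decoded = [lab.decode() if isinstance(lab, bytes) else str(lab) for lab in target]
    return (np.asarray(decoded, dtype=str) != "normal.").astype(int)
```

For example, y = to_binary_labels(fetch_kddcup99(subset="SA").target) yields a 0/1 vector suitable for metrics such as ROC AUC.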
Usage
Use this function to load the KDD Cup 99 dataset when benchmarking anomaly (outlier) detection or network intrusion detection algorithms.
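A common evaluation protocol is to fit an unsupervised detector without labels and then score it against labels reduced to 0 (normal) and 1 (attack) using ROC AUC. The sketch below uses IsolationForest purely as an example detector; the helper name evaluate_isolation_forest and the choice of detector are assumptions, not something fetch_kddcup99 prescribes.

```python
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score


def evaluate_isolation_forest(X, y_is_attack, random_state=0):
    """Hypothetical helper: fit IsolationForest on X and return its ROC AUC.

    y_is_attack holds 1 for attacks (anomalies) and 0 for normal records;
    it is used only for scoring, never for fitting.
    """
    model = IsolationForest(random_state=random_state)
    model.fit(X)  # unsupervised: the labels are not passed to fit
    # score_samples is higher for normal points, so negate it to get an anomaly score
    anomaly_score = -model.score_samples(X)
    return roc_auc_score(y_is_attack, anomaly_score)
```

On the http subset, for instance, one could call X, y = fetch_kddcup99(subset="http", return_X_y=True) and then evaluate_isolation_forest(X, (y != b"normal.").astype(int)); that subset contains only numeric columns, so no encoding step is needed first.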
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/datasets/_kddcup99.py
Signature
```python
@validate_params(...)
def fetch_kddcup99(
    *,
    subset=None,
    data_home=None,
    shuffle=False,
    random_state=None,
    percent10=True,
    download_if_missing=True,
    return_X_y=False,
    as_frame=False,
    n_retries=3,
    delay=1.0,
): ...
```
Import
```python
from sklearn.datasets import fetch_kddcup99
```
I/O Contract
Inputs
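| Name | Type | Required | Description |
|---|---|---|---|
| subset | {'SA', 'SF', 'http', 'smtp'} or None | No | Classical subset to load; None (default) loads all records (from the 10% sample unless percent10=False) |
| data_home | str, PathLike or None | No | Download and cache directory (default: ~/scikit_learn_data) |
| shuffle | bool | No | Whether to shuffle the dataset (default: False) |
| random_state | int, RandomState instance or None | No | Controls the shuffling and, for subset='SA', the random selection of attack records; pass an int for reproducible output |
| percent10 | bool | No | If True (default), load the 10% sample; if False, load the full dataset |
| download_if_missing | bool | No | If False, raise OSError when the data is not cached instead of downloading it (default: True) |
| return_X_y | bool | No | If True, return a (data, target) tuple instead of a Bunch (default: False) |
| as_frame | bool | No | If True, data is a pandas DataFrame and target a pandas Series, and the Bunch also carries a frame attribute (default: False) |
| n_retries | int | No | Number of retries when HTTP errors occur during download (default: 3) |
| delay | float | No | Seconds to wait between retries (default: 1.0) |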
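Setting download_if_missing=False turns fetch_kddcup99 into a cache-only loader, which can be handy on offline machines or in test suites. A minimal sketch, assuming that a missing (or unreadable) cache surfaces as OSError, as the download_if_missing description states; load_kddcup99_if_cached is a hypothetical helper name.

```python
from sklearn.datasets import fetch_kddcup99


def load_kddcup99_if_cached(data_home=None, **kwargs):
    """Hypothetical helper: load KDD Cup 99 from the local cache only.

    Returns whatever fetch_kddcup99 returns (a Bunch, or a tuple with
    return_X_y=True), or None when nothing usable is cached under data_home.
    """
    try:
        # download_if_missing=False: raise OSError instead of downloading
        return fetch_kddcup99(data_home=data_home, download_if_missing=False, **kwargs)
    except OSError:
        return None
```

A caller can then decide explicitly whether to trigger the download, for example by calling fetch_kddcup99 again with the default download_if_missing=True only when the helper returns None.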
Outputs
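| Name | Type | Description |
|---|---|---|
| (default) | Bunch | Dictionary-like object with data, target, frame (set only when as_frame=True), feature_names, target_names (the name of the label column, not the list of attack types), and DESCR |
| (X, y) | tuple | Returned instead when return_X_y=True: the feature matrix and the target array (a DataFrame and a Series when as_frame=True) |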
Usage Examples
Basic Usage
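```python
from collections import Counter

from sklearn.datasets import fetch_kddcup99

# Load the 10% sample of the http subset (downloaded on the first call, then read from the cache)
data = fetch_kddcup99(subset="http", percent10=True)
print("Shape:", data.data.shape)
# Labels are byte strings such as b'normal.'; count the records per label
print("Label counts:", Counter(data.target))
```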