

Principle:Trailofbits Fickling Benchmark Dataset Construction

From Leeroopedia
Knowledge Sources
Domains: Security, Data_Collection, Pickle_Safety
Last Updated: 2026-02-14 14:00 GMT

Overview

Methodology for constructing representative datasets of real-world pickle and PyTorch model files from public repositories to serve as ground truth for security scanner evaluation.

Description

Benchmark Dataset Construction addresses the need for realistic test data when evaluating pickle security scanners. The approach downloads pickle and PyTorch model files from HuggingFace, applying size-based filtering to exclude files that are too large for practical scanning or too small to be meaningful. The resulting dataset serves as the clean baseline against which scanners are tested for false positives. The construction process supports incremental builds (adding files to existing datasets), size thresholds, and optional extraction of individual pickle files from PyTorch zip archives. A JSON manifest (`index.json`) tracks all downloaded files with their metadata.
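
The incremental-build behaviour can be illustrated with a minimal sketch of a manifest update. The source only states that `index.json` tracks downloaded files with their metadata. The entry fields used here (`url`, `path`, `size`) and the function name `update_index` are assumptions for illustration, not the tool's actual schema.

```python
import json
from pathlib import Path


def update_index(index_path: Path, new_entries: list[dict]) -> list[dict]:
    """Merge new entries into an existing manifest, keyed by URL.

    The "url" key is a hypothetical field; the real manifest layout may differ.
    """
    existing = []
    if index_path.exists():
        existing = json.loads(index_path.read_text(encoding="utf-8"))

    by_url = {entry["url"]: entry for entry in existing}
    for entry in new_entries:
        # Keep the first record for a URL so repeated runs do not duplicate files.
        by_url.setdefault(entry["url"], entry)

    merged = list(by_url.values())
    index_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Keying on the URL means a second run over an overlapping candidate list only appends the files that are actually new, which is one plausible way to realise "adding files to existing datasets".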

Usage

Apply this principle when creating or extending the clean dataset for pickle scanner benchmarking. The clean dataset must contain only legitimate, non-malicious pickle files from real-world ML models to provide a reliable baseline for measuring false positive rates.

Theoretical Basis

The dataset construction follows a filter-download-index pipeline:

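# Abstract algorithm (pseudocode)
candidates = load_candidates(json_file)  # Candidate file list
dataset = []
for file in candidates:
    # Filter: read the size from a HEAD response before downloading anything
    size = http_head(file.url).content_length
    if min_size <= size <= max_size:
        # Download: fetch only files inside the size range
        content = download(file.url)
        if file.is_pytorch and extract_mode:
            # Optionally split a PyTorch zip archive into its individual pickles
            pickles = extract_pickles_from_zip(content)
            dataset.extend(pickles)
        else:
            dataset.append(save(content))

# Index: record everything that was kept
write_index(dataset)  # Manifest for downstream tools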
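
The optional extraction step can be sketched with the standard library alone. This is an illustrative version of the `extract_pickles_from_zip` helper named in the pseudocode, not the project's implementation. It assumes that the pickle members of a PyTorch zip archive can be recognised by a `.pkl` suffix (zip-format checkpoints typically store the object graph in a member named `data.pkl`). The bytes are copied out without ever being unpickled.

```python
import io
import zipfile


def extract_pickles_from_zip(content: bytes) -> dict[str, bytes]:
    """Return {member name: raw bytes} for every .pkl member of a zip archive."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # For example a legacy (non-zip) PyTorch file or a bare pickle.
        return {}

    pickles = {}
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for name in archive.namelist():
            # Assumption: pickle members are identified by their suffix.
            if name.endswith(".pkl"):
                pickles[name] = archive.read(name)
    return pickles
```

Returning raw bytes rather than loaded objects keeps the dataset builder from executing any pickle payload while it collects samples.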

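Size filtering matters for two reasons: very large files slow every benchmark run, and very small files may not contain meaningful model data. Because the size is read from an HTTP HEAD response, files outside the range are rejected before any download takes place.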
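
A minimal sketch of the size check, assuming the server reports a `Content-Length` header in response to a HEAD request. The function names `remote_size` and `select_by_size` are hypothetical, and skipping files with no reported size is a design choice made here, not something the source specifies.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the Content-Length reported by a HEAD request, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def select_by_size(
    urls: Iterable[str],
    min_size: int,
    max_size: int,
    size_of: Callable[[str], Optional[int]] = remote_size,
) -> list[str]:
    """Keep the URLs whose reported size lies in [min_size, max_size]."""
    selected = []
    for url in urls:
        size = size_of(url)
        # Files with no reported size are skipped rather than downloaded blindly.
        if size is not None and min_size <= size <= max_size:
            selected.append(url)
    return selected
```

Passing the size lookup as a parameter lets the filter be exercised offline, for example with `size_of=known_sizes.get` over a dictionary of previously recorded sizes.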
