Principle:Trailofbits Fickling Benchmark Dataset Construction
| Knowledge Sources | |
|---|---|
| Domains | Security, Data_Collection, Pickle_Safety |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
Methodology for constructing representative datasets of real-world pickle and PyTorch model files from public repositories to serve as ground truth for security scanner evaluation.
Description
Benchmark Dataset Construction addresses the need for realistic test data when evaluating pickle security scanners. The approach downloads pickle and PyTorch model files from HuggingFace, applying size-based filtering to exclude files that are too large for practical scanning or too small to be meaningful. The resulting dataset serves as the clean baseline against which scanners are tested for false positives. The construction process supports incremental builds (adding files to existing datasets), size thresholds, and optional extraction of individual pickle files from PyTorch zip archives. A JSON manifest (`index.json`) tracks all downloaded files with their metadata.
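As an illustration of how incremental builds and the manifest could fit together, here is a minimal sketch. It is not the project's actual implementation: the function name `update_index`, the list-of-objects layout, and the `url`, `path`, and `size` fields are all assumptions made for the example, since the real `index.json` schema is not specified here.

```python
import json
from pathlib import Path


def update_index(index_path: Path, new_entries: list[dict]) -> list[dict]:
    """Merge new entries into an existing manifest and rewrite it.

    Hypothetical schema: a JSON list of objects, each with a unique "url" key.
    """
    existing = []
    if index_path.exists():
        # Incremental build: start from whatever was recorded previously.
        existing = json.loads(index_path.read_text(encoding="utf-8"))

    # Key by URL so a re-downloaded file updates its entry instead of duplicating it.
    by_url = {entry["url"]: entry for entry in existing}
    for entry in new_entries:
        by_url[entry["url"]] = entry

    merged = list(by_url.values())
    index_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Keying entries by URL is one possible way to keep repeated runs idempotent; a content hash would work as well if the same file can appear under several URLs.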
Usage
Apply this principle when creating or extending the clean dataset for pickle scanner benchmarking. The clean dataset must contain only legitimate, non-malicious pickle files from real-world ML models to provide a reliable baseline for measuring false positive rates.
Theoretical Basis
The dataset construction follows a filter-download-index pipeline:
    # Abstract algorithm
    candidates = load_candidates(json_file)
    dataset = []
    for file in candidates:
        size = http_head(file.url).content_length
        if min_size <= size <= max_size:
            content = download(file.url)
            if file.is_pytorch and extract_mode:
                pickles = extract_pickles_from_zip(content)
                dataset.extend(pickles)
            else:
                dataset.append(save(content))
    write_index(dataset)  # Manifest for downstream tools
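The `extract_pickles_from_zip` step can be sketched with the standard library alone. PyTorch's zip-based checkpoint format stores the pickled object graph in a member named `data.pkl` inside the archive, so the sketch below simply collects every member whose name ends in `.pkl`. The return type (a mapping from member name to raw bytes) and the handling of non-zip input are assumptions for this example, not a description of the project's code.

```python
import io
import zipfile


def extract_pickles_from_zip(content: bytes) -> dict[str, bytes]:
    """Return {member name: raw bytes} for every .pkl member of a zip archive."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # Assumption: non-zip input (e.g. a plain pickle) yields nothing to extract.
        return {}

    pickles = {}
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for name in archive.namelist():
            if name.endswith(".pkl"):
                # Read the bytes only; the pickle is never loaded or executed here.
                pickles[name] = archive.read(name)
    return pickles
```

The sketch never deserializes the extracted bytes. Even for a dataset intended to be clean, unpickling downloaded files would execute whatever code they contain, so the construction step is better kept to byte-level handling.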
Size filtering is critical because very large files slow benchmarking and very small files may not contain meaningful model data.
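The size check itself can be expressed as two small pure functions, shown here as a sketch. The names `content_length` and `passes_size_filter` are hypothetical, and rejecting candidates whose size cannot be determined is an assumption of this example; the abstract algorithm does not say how a missing `Content-Length` header is handled.

```python
from typing import Mapping, Optional


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Parse Content-Length from response headers; None if absent or malformed."""
    for key, value in headers.items():
        if key.lower() == "content-length":  # header names are case-insensitive
            try:
                return int(value)
            except ValueError:
                return None
    return None


def passes_size_filter(size: Optional[int], min_size: int, max_size: int) -> bool:
    """Keep a candidate only when its size is known and within [min_size, max_size]."""
    # Assumption: an unknown size is treated as a rejection rather than a download.
    return size is not None and min_size <= size <= max_size
```

For example, with illustrative thresholds of 1 KiB and 4 KiB, `passes_size_filter(content_length({"Content-Length": "2048"}), 1024, 4096)` evaluates to `True`, while a 10-byte file or a response without a usable header is skipped before any download takes place.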