
Implementation:Trailofbits Fickling Run Benchmark

From Leeroopedia
Knowledge Sources
Domains Security, Benchmarking, Pickle_Safety
Last Updated 2026-02-14 14:00 GMT

Overview

Concrete tool for running comparative accuracy benchmarks of multiple pickle scanning tools against clean and malicious file datasets.

Description

The run_benchmark function is the core evaluation harness in the fickling benchmark suite. It loads clean and malicious pickle file indexes, randomly samples files at a configurable ratio, then runs each registered scanning tool on every sampled file. Results are tracked in ToolResults and BenchmarkResults dataclasses that record true positives, true negatives, false positives, false negatives, per-payload-type miss statistics, and scan failure counts. The module also includes wrapper functions for four scanning tools: Fickling, Modelscan, Picklescan, and Model Unpickler.
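To make the tracked counters concrete, the sketch below mirrors the kind of per-tool bookkeeping described above and derives precision and recall from it. The class and field names here are illustrative assumptions, not the actual `ToolResults` definition from the benchmark suite, which may differ.

```python
from dataclasses import dataclass

# Hypothetical mirror of the per-tool counters described above; the real
# ToolResults dataclass in the benchmark suite may use different fields.
@dataclass
class ToolCounts:
    tp: int  # malicious files correctly flagged
    tn: int  # clean files correctly passed
    fp: int  # clean files wrongly flagged
    fn: int  # malicious files missed

    def precision(self) -> float:
        # Of the files the tool flagged, how many were actually malicious.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of the malicious files, how many the tool caught.
        return self.tp / (self.tp + self.fn)

counts = ToolCounts(tp=90, tn=180, fp=20, fn=10)
print(f"precision={counts.precision():.2f} recall={counts.recall():.2f}")
```

A tool with a high false-negative count on one payload type would show up as low recall here, which is why the harness also tracks misses per payload type.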

Usage

Use this module when you need to quantitatively compare pickle security scanning tools on realistic datasets. It is invoked as a CLI script with paths to clean and malicious dataset directories, or imported to call `run_benchmark()` programmatically with custom tool registrations.

Code Reference

Source Location

Signature

def run_benchmark(
    clean_dataset_dir: Path,
    malicious_dataset_dir: Path,
    tools: dict,
    n: int = 10000,
    clean_to_malicious_ratio: float = 2.0,
) -> None:
    """
    Run benchmark comparing scanning tools on clean and malicious datasets.

    Args:
        clean_dataset_dir: Path to directory containing clean file index.json.
        malicious_dataset_dir: Path to directory containing malicious file index.json.
        tools: Dict mapping tool names to callable run functions (signature: func(filepath, filetype) -> bool).
        n: Total number of files to sample for the benchmark.
        clean_to_malicious_ratio: Ratio of clean to malicious files in the sample.
    """

Import

from pickle_scanning_benchmark.benchmark import run_benchmark, BenchmarkResults, ToolResults

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| clean_dataset_dir | Path | Yes | Directory with clean dataset index.json |
| malicious_dataset_dir | Path | Yes | Directory with malicious dataset index.json |
| tools | dict | Yes | Map of tool name to callable `func(filepath, filetype) -> bool` |
| n | int | No | Total files to sample (default: 10000) |
| clean_to_malicious_ratio | float | No | Ratio of clean to malicious files (default: 2.0) |
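The `n` and `clean_to_malicious_ratio` parameters together determine the sample split. A minimal sketch of the arithmetic, assuming a clean:malicious ratio of r yields n·r/(r+1) clean files and n/(r+1) malicious files; the real harness may round or sample differently:

```python
def sample_sizes(n: int, ratio: float) -> tuple[int, int]:
    # Split n files into clean and malicious counts at the given
    # clean:malicious ratio. Illustrative only: the actual harness
    # may handle rounding and capping at dataset size differently.
    malicious = round(n / (ratio + 1))
    clean = n - malicious
    return clean, malicious

print(sample_sizes(10000, 2.0))  # (6667, 3333)
```

So the defaults (n=10000, ratio=2.0) yield roughly 6667 clean and 3333 malicious files, assuming both datasets are large enough.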

Outputs

| Name | Type | Description |
|---|---|---|
| stdout | str | Formatted benchmark results printed to console |
| BenchmarkResults | dataclass | Per-tool ToolResults with TP/TN/FP/FN counts and payload-type miss stats; built internally and printed, since the function returns None |

Usage Examples

Running from Command Line

python pickle_scanning_benchmark/benchmark.py /path/to/clean_dataset /path/to/malicious_dataset

Programmatic Usage

from pathlib import Path
from pickle_scanning_benchmark.benchmark import run_benchmark, run_fickling

# Define tools to benchmark
tools = {
    "Fickling": run_fickling,
}

# Run benchmark
run_benchmark(
    clean_dataset_dir=Path("/data/clean_pickles"),
    malicious_dataset_dir=Path("/data/malicious_pickles"),
    tools=tools,
    n=1000,
    clean_to_malicious_ratio=2.0,
)

Registering a Custom Scanner

def run_custom_scanner(filepath: str, filetype: str) -> bool:
    """Return True if file is considered safe, False otherwise."""
    # Custom scanning logic here
    return True

tools = {
    "Fickling": run_fickling,
    "CustomScanner": run_custom_scanner,
}
run_benchmark(Path("clean"), Path("malicious"), tools)
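For a custom scanner with actual detection logic, one option is to inspect pickle opcodes with the standard-library `pickletools` module. The sketch below follows the same `func(filepath, filetype) -> bool` contract (True means safe); the opcode denylist is an illustrative assumption and far narrower than what production scanners like Fickling check.

```python
import pickletools

# Opcodes that can import or invoke arbitrary objects. An illustrative
# denylist, not an exhaustive one: a real scanner needs broader coverage.
SUSPICIOUS_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def run_opcode_scanner(filepath: str, filetype: str) -> bool:
    """Return True if the file is considered safe, False otherwise."""
    with open(filepath, "rb") as f:
        data = f.read()
    try:
        # genops walks the pickle stream without executing it.
        return not any(op.name in SUSPICIOUS_OPS
                       for op, _, _ in pickletools.genops(data))
    except Exception:
        # Files that fail to parse as pickle are treated as unsafe.
        return False
```

Note that a denylist this small will flag many benign pickles too, since `GLOBAL`/`REDUCE` also appear when pickling ordinary class instances; the benchmark's false-positive counts would surface exactly that trade-off.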

Related Pages

Implements Principle

Requires Environment
