Implementation: Trailofbits Fickling Run Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Security, Benchmarking, Pickle_Safety |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
Concrete tool for running comparative accuracy benchmarks of multiple pickle scanning tools against clean and malicious file datasets.
Description
The run_benchmark function is the core evaluation harness in the fickling benchmark suite. It loads clean and malicious pickle file indexes, randomly samples files at a configurable ratio, then runs each registered scanning tool on every sampled file. Results are tracked in ToolResults and BenchmarkResults dataclasses that record true positives, true negatives, false positives, false negatives, per-payload-type miss statistics, and scan failure counts. The module also includes wrapper functions for four scanning tools: Fickling, Modelscan, Picklescan, and Model Unpickler.
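The result-tracking described above can be pictured with a minimal sketch. The field and method names below are assumptions for illustration; the real dataclasses live in `pickle_scanning_benchmark/benchmark.py`. Note the return convention the suite uses: a tool callable returns True when it judges a file safe, so a malicious file the tool passes is a false negative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToolResults:
    # Hypothetical field names -- the real dataclass is defined in benchmark.py.
    name: str
    true_positives: int = 0
    true_negatives: int = 0
    false_positives: int = 0
    false_negatives: int = 0
    scan_failures: int = 0
    # Misses broken down by malicious payload type.
    missed_by_payload_type: Counter = field(default_factory=Counter)

    def record(self, is_malicious: bool, tool_says_safe: bool,
               payload_type: Optional[str] = None) -> None:
        # Tools return True when a file is judged safe, so a malicious
        # file the tool passes as safe counts as a false negative.
        if is_malicious and not tool_says_safe:
            self.true_positives += 1
        elif is_malicious and tool_says_safe:
            self.false_negatives += 1
            if payload_type is not None:
                self.missed_by_payload_type[payload_type] += 1
        elif not is_malicious and not tool_says_safe:
            self.false_positives += 1
        else:
            self.true_negatives += 1
```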
Usage
Use this module when you need to quantitatively compare pickle security scanning tools on realistic datasets. It is invoked as a CLI script with paths to clean and malicious dataset directories, or imported to call `run_benchmark()` programmatically with custom tool registrations.
Code Reference
Source Location
- Repository: Trailofbits_Fickling
- File: pickle_scanning_benchmark/benchmark.py
- Lines: 1-323
Signature
def run_benchmark(
clean_dataset_dir: Path,
malicious_dataset_dir: Path,
tools: dict,
n: int = 10000,
clean_to_malicious_ratio: float = 2.0,
) -> None:
"""
Run benchmark comparing scanning tools on clean and malicious datasets.
Args:
clean_dataset_dir: Path to directory containing clean file index.json.
malicious_dataset_dir: Path to directory containing malicious file index.json.
tools: Dict mapping tool names to callable run functions (signature: func(filepath, filetype) -> bool).
n: Total number of files to sample for the benchmark.
clean_to_malicious_ratio: Ratio of clean to malicious files in the sample.
"""
Import
from pickle_scanning_benchmark.benchmark import run_benchmark, BenchmarkResults, ToolResults
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| clean_dataset_dir | Path | Yes | Directory with clean dataset index.json |
| malicious_dataset_dir | Path | Yes | Directory with malicious dataset index.json |
| tools | dict | Yes | Map of tool name to callable `func(filepath, filetype) -> bool`; a tool returns True when it judges the file safe |
| n | int | No | Total files to sample (default: 10000) |
| clean_to_malicious_ratio | float | No | Ratio of clean to malicious files (default: 2.0) |
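To make the interaction of `n` and `clean_to_malicious_ratio` concrete, here is a hedged sketch of how a ratio r splits a sample of n files into roughly n*r/(r+1) clean and n/(r+1) malicious files. The function name and seed parameter are illustrative, not part of the benchmark's API.

```python
import random

def split_sample(clean_files, malicious_files, n=10000,
                 clean_to_malicious_ratio=2.0, seed=None):
    # With ratio r, a sample of n files holds roughly n*r/(r+1) clean
    # files and n/(r+1) malicious files. For n=10000 and r=2.0 that is
    # about 6667 clean and 3333 malicious.
    n_malicious = round(n / (clean_to_malicious_ratio + 1))
    n_clean = n - n_malicious
    rng = random.Random(seed)
    clean = rng.sample(clean_files, min(n_clean, len(clean_files)))
    malicious = rng.sample(malicious_files, min(n_malicious, len(malicious_files)))
    return clean, malicious
```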
Outputs
| Name | Type | Description |
|---|---|---|
| stdout | str | Formatted benchmark results printed to console |
| BenchmarkResults | dataclass | Built internally during the run (the function itself returns None); holds per-tool ToolResults with TP/TN/FP/FN counts and payload-type miss stats |
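The four confusion-matrix counts recorded per tool are enough to derive the usual accuracy metrics. This helper is an illustration of how to consume the counts, not part of the benchmark's printed output.

```python
def summarize(tp: int, tn: int, fp: int, fn: int) -> dict:
    # Derive standard metrics from the per-tool TP/TN/FP/FN counts.
    # Guard each denominator so a tool with no flagged files does not
    # raise ZeroDivisionError.
    total = tp + tn + fp + fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```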
Usage Examples
Running from Command Line
python pickle_scanning_benchmark/benchmark.py /path/to/clean_dataset /path/to/malicious_dataset
Programmatic Usage
from pathlib import Path
from pickle_scanning_benchmark.benchmark import run_benchmark, run_fickling
# Define tools to benchmark
tools = {
"Fickling": run_fickling,
}
# Run benchmark
run_benchmark(
clean_dataset_dir=Path("/data/clean_pickles"),
malicious_dataset_dir=Path("/data/malicious_pickles"),
tools=tools,
n=1000,
clean_to_malicious_ratio=2.0,
)
Registering a Custom Scanner
def run_custom_scanner(filepath: str, filetype: str) -> bool:
"""Return True if file is considered safe, False otherwise."""
# Custom scanning logic here
return True
tools = {
"Fickling": run_fickling,
"CustomScanner": run_custom_scanner,
}
run_benchmark(Path("clean"), Path("malicious"), tools)
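A custom scanner can be more substantial than the stub above. The sketch below uses the standard library's `pickletools.genops` to flag pickles containing opcodes that can import or call objects; the opcode blocklist is an assumption chosen for illustration, not the detection logic of any of the four benchmarked tools.

```python
import pickletools

def run_opcode_scanner(filepath: str, filetype: str) -> bool:
    """Return True if the file looks safe: no import/call opcodes found."""
    # Opcodes that resolve globals or invoke callables during unpickling.
    # This blocklist is illustrative, not exhaustive.
    suspicious = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
                  "NEWOBJ", "NEWOBJ_EX", "REDUCE"}
    try:
        with open(filepath, "rb") as f:
            for opcode, _arg, _pos in pickletools.genops(f.read()):
                if opcode.name in suspicious:
                    return False
    except Exception:
        return False  # treat unparseable files as unsafe
    return True
```

Registered like any other tool, e.g. `tools["OpcodeScanner"] = run_opcode_scanner`, it is then sampled and scored alongside the built-in wrappers.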