Principle:Trailofbits Fickling Pickle Dataset Analysis

Knowledge Sources	Trailofbits_Fickling
Domains	Security, Data_Analysis, Pickle_Safety
Last Updated	2026-02-14 14:00 GMT

Overview

Analytical methodology for extracting and aggregating statistical properties from collections of pickle files to characterize real-world pickle usage patterns.

Description

Pickle Dataset Analysis inspects a corpus of pickle files and extracts metadata about their contents, in particular the Python imports embedded in the pickle bytecode. Decompiling each file's opcodes into an AST representation reveals which modules, classes, and functions the pickle references (e.g., `torch.nn.Linear`, `numpy.array`). Combined with external metadata such as HuggingFace download counts for the source models, these statistics show how pickle files are used in practice across the ML ecosystem. The results inform allowlist construction and help researchers distinguish benign import patterns from suspicious ones.
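As a rough illustration of combining import data with download metadata, the sketch below ranks imports by the total downloads of the files that reference them. The function name `weight_by_downloads`, the input shapes, and the sample values are all hypothetical placeholders, not part of fickling or of any real dataset.

```python
from collections import defaultdict


def weight_by_downloads(file_imports, downloads):
    """Join per-file imports with per-file download counts.

    file_imports: {file_id: iterable of import strings}  (assumed shape)
    downloads:    {file_id: int download count}          (assumed shape)
    Returns rows of (import, file_count, total_downloads), most-downloaded first.
    """
    file_counts = defaultdict(int)
    download_totals = defaultdict(int)
    for file_id, imports in file_imports.items():
        for imp in set(imports):  # count each import once per file
            file_counts[imp] += 1
            download_totals[imp] += downloads.get(file_id, 0)
    rows = [(imp, file_counts[imp], download_totals[imp]) for imp in file_counts]
    # Sort by total downloads, then by file count, then alphabetically for stability.
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


# Placeholder inputs for illustration only; these are not measured values.
example_imports = {
    "model-a": ["torch.nn.Linear"],
    "model-b": ["torch.nn.Linear", "numpy.array"],
}
example_downloads = {"model-a": 100, "model-b": 10}
example_rows = weight_by_downloads(example_imports, example_downloads)
```

Weighting by downloads is one possible design choice: an import that appears in few files but in very popular models may matter more for an allowlist than a raw file count suggests.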

Usage

Apply this principle when building or validating an ML pickle allowlist, characterizing the attack surface of pickle-based model distribution, or understanding the composition of a benchmark dataset before running scanner evaluations.

Theoretical Basis

The analysis follows an extract-aggregate-export pipeline:

# Abstract algorithm (decompile, extract_imports, and export_csv are placeholders)
stats = {}
for file in dataset:
    pickled = decompile(file)           # Parse pickle bytecode into an AST
    imports = extract_imports(pickled)  # Collect the import statements
    for imp in imports:
        stats[imp] = stats.get(imp, 0) + 1  # Aggregate frequency

# Sort by descending frequency and export
ranked = sorted(stats.items(), key=lambda item: item[1], reverse=True)
export_csv(ranked)

The import extraction leverages fickling's pickle-to-AST decompilation, which converts GLOBAL and STACK_GLOBAL opcodes into Python import statements.
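The pipeline can be approximated without fickling by using only the standard library. The sketch below is an assumption-laden stand-in, not fickling's implementation: it scans opcodes with `pickletools.genops` (which parses but never executes the pickle) and uses a simple heuristic for STACK_GLOBAL, taking the two most recently pushed strings as module and name. That heuristic misses strings recalled from the memo (e.g., via BINGET), which a full decompiler would resolve. The helper names `extract_imports`, `aggregate_imports`, and `to_csv` are hypothetical.

```python
import csv
import io
import pickle
import pickletools
from collections import Counter, OrderedDict

# Opcodes that push a string onto the pickle stack.
STRING_OPCODES = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}


def extract_imports(data: bytes) -> list[str]:
    """Return 'module.name' for each GLOBAL / STACK_GLOBAL opcode (heuristic)."""
    imports = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: assume the last two pushed strings are module and name.
            imports.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
    return imports


def aggregate_imports(blobs) -> Counter:
    """Count how often each import appears across a corpus of pickle byte strings."""
    counts = Counter()
    for data in blobs:
        counts.update(extract_imports(data))
    return counts


def to_csv(counts: Counter) -> str:
    """Render the counts as CSV text, most frequent import first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["import", "count"])
    writer.writerows(counts.most_common())
    return buffer.getvalue()


# Tiny in-memory corpus: protocol 3 emits GLOBAL, protocol 4 emits STACK_GLOBAL.
corpus = [
    pickle.dumps(OrderedDict, protocol=3),
    pickle.dumps(OrderedDict, protocol=4),
    pickle.dumps(Counter, protocol=4),
]
report = to_csv(aggregate_imports(corpus))
```

For a real corpus, the byte strings would be read from files on disk, and a full stack simulation such as fickling's decompiler would be preferable to the two-string heuristic.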

Related Pages

Implementation:Trailofbits_Fickling_Get_Stats
