Principle:Trailofbits Fickling ML Allowlist Unpickling

Knowledge Sources	Never a dill moment: Exploiting pickle for Python deserialization attacks; Fickling
Domains	Security, ML_Safety, Deserialization
Last Updated	2026-02-14 14:00 GMT

Overview

A restricted unpickler that validates every import against a curated machine learning allowlist during deserialization, blocking unauthorized module access at the pickle VM level.

Description

ML Allowlist Unpickling overrides find_class, the control point that Python's pickle protocol exposes for resolving imports. When the pickle VM encounters a GLOBAL or STACK_GLOBAL opcode (each of which resolves a module path and a name to an importable object), the restricted unpickler intercepts the call and checks whether the requested module and name appear in the ML_ALLOWLIST dictionary. If either the module or the name is not allowlisted, an UnsafeFileError is raised before the import occurs.
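To make the interception point concrete, the following minimal sketch records every import the pickle VM requests while loading a harmless payload. TracingUnpickler is a hypothetical helper written for this illustration and is not part of Fickling; it only observes calls and does not block anything.

```python
import io
import pickle
from collections import OrderedDict


class TracingUnpickler(pickle.Unpickler):
    """Hypothetical helper: records each import the pickle VM requests."""

    def __init__(self, file):
        super().__init__(file)
        self.requested = []

    def find_class(self, module, name):
        # Runs once per GLOBAL / STACK_GLOBAL opcode, before the import.
        self.requested.append((module, name))
        return super().find_class(module, name)


# An OrderedDict pickle references exactly one importable name.
payload = pickle.dumps(OrderedDict(a=1))
unpickler = TracingUnpickler(io.BytesIO(payload))
result = unpickler.load()
print(unpickler.requested)
```

Because every import request passes through this one method, replacing the recording step with an allowlist check is enough to turn the tracer into a restricted unpickler.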

The ML_ALLOWLIST is a curated dictionary mapping module paths to allowed names, covering safe imports from NumPy, PyTorch, Transformers, Ultralytics, and other common ML libraries. Each entry includes a human-readable justification explaining why the import is considered safe.
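One way to picture such a structure is a nested dictionary keyed first by module path and then by name, with the justification stored as the value. This is a sketch under assumptions: the exact layout of Fickling's ML_ALLOWLIST may differ, and the entries and justification strings below are illustrative rather than copied from the library.

```python
# Illustrative only: layout, entries, and justification texts are assumptions.
EXAMPLE_ALLOWLIST = {
    "collections": {
        "OrderedDict": "plain container type",
    },
    "torch._utils": {
        "_rebuild_tensor_v2": "tensor reconstruction helper",
    },
}


def is_allowed(module: str, name: str) -> bool:
    """Return True only if both the module and the name are listed."""
    return name in EXAMPLE_ALLOWLIST.get(module, {})


def justification(module: str, name: str) -> str:
    """Look up why an entry is listed (raises KeyError if it is not)."""
    return EXAMPLE_ALLOWLIST[module][name]


print(is_allowed("torch._utils", "_rebuild_tensor_v2"))
print(is_allowed("collections", "defaultdict"))
print(is_allowed("os", "system"))
```

Keying on the module first means a listed module does not implicitly allow its other attributes: each name has to be added, and justified, individually.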

This principle operates at the deserialization level (runtime), as opposed to static analysis which examines pickle bytecode without executing it.
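For contrast, a static pass can list the imports a pickle would request without running the pickle VM at all. The sketch below uses the standard-library pickletools module; list_imports is a hypothetical helper, and its handling of STACK_GLOBAL is a heuristic that assumes the module and name are the two most recently pushed strings, which does not hold when a string is fetched from the memo or the stream is hand-crafted.

```python
import pickle
import pickletools


def list_imports(data: bytes):
    """Statically list (module, name) pairs referenced by a pickle."""
    imports = []
    strings = []  # string arguments seen so far, in stream order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is decoded as "module name".
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: assume the last two strings are module and name.
            imports.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


class Suspicious:
    def __reduce__(self):
        # Would call print() on load; scanning never executes it.
        return (print, ("never printed during scanning",))


payload = pickle.dumps(Suspicious(), protocol=4)
found = list_imports(payload)
print(found)
```

The static pass only reports what the bytecode appears to reference, while the runtime allowlist decides on the exact module and name the VM hands to find_class at load time.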

Usage

Use this principle when you need to actually load (deserialize) a pickle file containing ML model weights while ensuring only known-safe classes are instantiated. This is the mechanism that does the actual safe loading, as opposed to the hook mechanism that activates it.

Theoretical Basis

Overriding pickle.Unpickler.find_class(module, name) is the mechanism Python's pickle documentation recommends for restricting imports during deserialization:

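# Sketch: allowlist-based find_class override (ALLOWLIST and UnsafeFileError are stand-ins)
import pickle

class UnsafeFileError(Exception):
    pass

ALLOWLIST = {"collections": {"OrderedDict"}}  # module -> allowed names

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject unknown modules and unknown names before any import happens.
        if module not in ALLOWLIST or name not in ALLOWLIST[module]:
            raise UnsafeFileError(f"{module}.{name} not allowed")
        return super().find_class(module, name)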
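
The behavior can be checked end to end with a small self-contained demo. Everything here is a stand-in for illustration: DemoUnpickler, DEMO_ALLOWLIST, safe_loads, and the locally defined UnsafeFileError are hypothetical names, and the one-entry allowlist is deliberately tiny. The demo loads a benign payload, then shows a payload whose __reduce__ references eval being rejected before eval is ever resolved.

```python
import io
import pickle
from collections import OrderedDict


class UnsafeFileError(Exception):
    """Local stand-in for the error type named in the text."""


DEMO_ALLOWLIST = {"collections": {"OrderedDict"}}  # hypothetical, tiny on purpose


class DemoUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if name not in DEMO_ALLOWLIST.get(module, ()):
            raise UnsafeFileError(f"{module}.{name} not allowed")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    """Deserialize bytes through the allowlisting unpickler."""
    return DemoUnpickler(io.BytesIO(data)).load()


class Malicious:
    def __reduce__(self):
        # On an unrestricted load this would evaluate the expression.
        return (eval, ("1 + 1",))


good = pickle.dumps(OrderedDict(a=1))
bad = pickle.dumps(Malicious())

loaded = safe_loads(good)
print(loaded)

try:
    safe_loads(bad)
    message = None
except UnsafeFileError as exc:
    message = str(exc)
print(message)
```

The rejection happens inside find_class, so the REDUCE opcode that would have invoked the callable is never reached.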

The allowlist contains approximately 250 entries organized by module, with categories including:

Storage types (torch.FloatStorage, torch.LongStorage) — safe because they redirect to __new__
Tensor reconstruction (torch._utils._rebuild_tensor_v2) — safe with backward_hooks caveat
Training arguments (transformers.training_args.TrainingArguments) — dataclasses, not callables
Binding classes (tokenizers.Tokenizer) — Rust bindings with restricted constructors

Related Pages

Implemented By

Implementation:Trailofbits_Fickling_FicklingMLUnpickler_Load

Uses Heuristic

Heuristic:Trailofbits_Fickling_Allowlist_Maintenance
