Implementation:Trailofbits Fickling Create Malicious Dataset

Knowledge Sources	Trailofbits_Fickling
Domains	Security, Benchmarking, Pickle_Safety
Last Updated	2026-02-14 14:00 GMT

Overview

Concrete tool for injecting diverse malicious payloads into clean pickle and PyTorch files to create a synthetic adversarial dataset for benchmarking scanner detection capabilities.

Description

The `create_malicious_dataset` function takes a clean dataset directory, iterates over at most `n` of its files, injects a randomly selected malicious payload into each one, and writes the results to a new output directory with its own `index.json`. The module defines a catalog of attack payloads in two categories: `EXEC_PRIMITIVE_PAYLOADS` (shell commands and code execution via `os.system`, `subprocess.run`, `builtins.exec`, and `numpy` utilities) and `DANGEROUS_PRIMITIVE_PAYLOADS` (dangerous torch operations such as `load_state_dict_from_url`). Payloads are injected into the pickle bytecode with fickling's `Pickled.insert_python()`, and the import path is split at a randomized point to vary how each payload appears to a scanner. For PyTorch archives, `inject_pytorch_file` opens the zip, injects into the first pickle file inside, and copies the remaining members unchanged.
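The archive handling described for PyTorch files can be sketched with the standard library alone. This is a minimal illustration, not the code in `inject.py`: the function name `rewrite_first_pickle`, the `transform` callback (a stand-in for the fickling injection step), and the choice to recognize pickle members by a `.pkl` suffix are all assumptions.

```python
import zipfile
from pathlib import Path
from typing import Callable, Optional


def rewrite_first_pickle(
    infile: Path,
    outfile: Path,
    transform: Callable[[bytes], bytes],
) -> Optional[str]:
    """Copy a zip archive, passing only the first *.pkl member through transform.

    Hypothetical helper: returns the rewritten member's name, or None if
    no member name ends in ".pkl".
    """
    rewritten: Optional[str] = None
    with zipfile.ZipFile(infile, "r") as src, zipfile.ZipFile(outfile, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if rewritten is None and info.filename.endswith(".pkl"):
                # Assumption: the first ".pkl" member is the one to modify.
                data = transform(data)
                rewritten = info.filename
            # Reusing the source ZipInfo keeps the member name and compression type.
            dst.writestr(info, data)
    return rewritten
```

Passing the byte transformation in as a callback keeps the archive copy logic separate from the injection logic, so the same loop can be exercised with a harmless transform.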

Usage

Use this module as the adversarial dataset generation component of the benchmark pipeline. It produces varied malicious pickle samples that test whether scanners detect different attack vectors, including reverse shells, file exfiltration, SSH key injection, and remote code download. Run it as a CLI script or call its functions programmatically.
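Part of that variety comes from the randomized import path splitting noted in the Description. The page does not spell out the splitting logic, so the following is only one plausible reading: a dotted path is cut at a randomly chosen dot into a module part and an attribute part. The names `split_points` and `random_split` are hypothetical and do not come from `inject.py`.

```python
import random
from typing import List, Tuple


def split_points(dotted_path: str) -> List[Tuple[str, str]]:
    """Return every (module, attr) pair that rejoins to dotted_path."""
    parts = dotted_path.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least one dot in {dotted_path!r}")
    return [
        (".".join(parts[:i]), ".".join(parts[i:]))
        for i in range(1, len(parts))
    ]


def random_split(dotted_path: str, rng: random.Random) -> Tuple[str, str]:
    """Pick one split; the caller supplies rng so runs can be reproduced."""
    return rng.choice(split_points(dotted_path))


if __name__ == "__main__":
    print(split_points("os.path.join"))
    print(random_split("subprocess.run", random.Random(0)))
```

A path with a single dot has only one possible split, so variety of this kind only arises for deeper paths.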

Code Reference

Source Location

Repository: Trailofbits_Fickling
File: pickle_scanning_benchmark/inject.py
Lines: 1-223

Signature

def create_malicious_dataset(
    clean_dataset_dir: Path,
    outdir: Path,
    n: int = 10,
) -> None:
    """
    Create a malicious dataset by injecting payloads into clean files.

    Args:
        clean_dataset_dir: Path to clean dataset directory with index.json.
        outdir: Output directory for malicious files.
        n: Maximum number of files to inject.
    """

def inject_pickle_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a pickle file. Returns the payload ID."""

def inject_pytorch_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a PyTorch archive file. Returns the payload ID."""

Import

from pickle_scanning_benchmark.inject import (
    create_malicious_dataset,
    inject_pickle_file,
    inject_pytorch_file,
    ALL_PAYLOADS,
)

I/O Contract

Inputs

Name	Type	Required	Description
clean_dataset_dir	Path	Yes	Clean dataset directory containing index.json
outdir	Path	Yes	Output directory for malicious files
n	int	No	Maximum number of files to inject (default: 10)

Outputs

Name	Type	Description
outdir/	Directory	Malicious pickle/PyTorch files with injected payloads
outdir/index.json	File	JSON manifest mapping files to their payload types and original sources

Usage Examples

Command Line Usage

# Create a malicious dataset from up to 50 clean files
python pickle_scanning_benchmark/inject.py /data/clean_pickles /data/malicious_pickles 50

Programmatic Usage

from pathlib import Path
from pickle_scanning_benchmark.inject import create_malicious_dataset

# Generate malicious variants of up to 100 clean files
create_malicious_dataset(
    clean_dataset_dir=Path("/data/clean_pickles"),
    outdir=Path("/data/malicious_pickles"),
    n=100,
)

Injecting a Specific Payload

from pathlib import Path
from pickle_scanning_benchmark.inject import inject_pickle_file, ALL_PAYLOADS

# List available payloads
print(list(ALL_PAYLOADS.keys()))

# Inject a specific payload into a single file
payload_id = inject_pickle_file(
    infile=Path("clean_model.pkl"),
    outfile=Path("malicious_model.pkl"),
    payload_key="os.system reverse shell cmd",
)
print(f"Injected payload: {payload_id}")
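After an injection, the output file should not be loaded with `pickle.load`, since that would run the payload. A safer check is to compare opcode counts with the standard library's `pickletools.genops`, which parses a pickle stream without executing it. The sketch below is an illustrative addition rather than part of the benchmark; `opcode_counts`, `call_opcode_delta`, and `CALL_OPCODES` are hypothetical names, and the comparison covers only the first pickle stream in each byte string.

```python
import pickletools
from collections import Counter
from typing import Dict

# Opcodes that import a callable or invoke one during unpickling.
CALL_OPCODES = ("GLOBAL", "STACK_GLOBAL", "REDUCE")


def opcode_counts(data: bytes) -> Counter:
    """Count pickle opcodes by name without executing the pickle."""
    return Counter(op.name for op, _arg, _pos in pickletools.genops(data))


def call_opcode_delta(clean: bytes, modified: bytes) -> Dict[str, int]:
    """Report how many import/call opcodes the modified stream gained."""
    before = opcode_counts(clean)
    after = opcode_counts(modified)
    return {name: after[name] - before[name] for name in CALL_OPCODES}
```

For the files in the example above, this could be called as `call_opcode_delta(Path("clean_model.pkl").read_bytes(), Path("malicious_model.pkl").read_bytes())`. A positive `REDUCE` delta suggests that a call was added, although it does not identify which payload was used.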

Related Pages

Implements Principle

Principle:Trailofbits_Fickling_Benchmark_Payload_Injection

Requires Environment

Environment:Trailofbits_Fickling_Python_Runtime
