Implementation:Trailofbits Fickling Create Malicious Dataset
| Knowledge Sources | |
|---|---|
| Domains | Security, Benchmarking, Pickle_Safety |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
Concrete tool for injecting diverse malicious payloads into clean pickle and PyTorch files to create a synthetic adversarial dataset for benchmarking scanner detection capabilities.
Description
The `create_malicious_dataset` function takes a clean dataset directory, iterates over up to `n` of its files, and injects randomly selected malicious payloads into them, writing the results to a new output directory with its own `index.json`. The module defines a catalog of attack payloads in two categories: `EXEC_PRIMITIVE_PAYLOADS` (shell commands and code execution via `os.system`, `subprocess.run`, `builtins.exec`, and `numpy` utilities) and `DANGEROUS_PRIMITIVE_PAYLOADS` (dangerous torch operations such as `load_state_dict_from_url`). Payloads are injected into the pickle bytecode with fickling's `Pickled.insert_python()`, and the way each payload's import path is split is randomized for evasion variety. For PyTorch archives, `inject_pytorch_file` opens the zip, injects into the first pickle file inside, and copies the remaining members unchanged.
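The randomized import path splitting mentioned above can be illustrated with a small standard-library sketch. It assumes that "splitting" means choosing where a dotted path such as `torch.hub.load_state_dict_from_url` is divided into a module part and an attribute part; that reading, and the helper names `split_import_path` and `pick_split`, are hypothetical and not taken from `inject.py`.

```python
import random
from typing import List, Optional, Tuple


def split_import_path(dotted: str) -> List[Tuple[str, str]]:
    """Return every (module, attr) split of a dotted path.

    For "a.b.c" this yields [("a", "b.c"), ("a.b", "c")].
    """
    parts = dotted.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a dotted path, got {dotted!r}")
    return [
        (".".join(parts[:i]), ".".join(parts[i:]))
        for i in range(1, len(parts))
    ]


def pick_split(dotted: str, rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Choose one (module, attr) split at random (hypothetical helper)."""
    rng = rng or random.Random()
    return rng.choice(split_import_path(dotted))


# Each run may reference the same callable through a different module/attr pair.
for module, attr in split_import_path("torch.hub.load_state_dict_from_url"):
    print(f"module={module!r} attr={attr!r}")
```

Passing a seeded `random.Random` makes the choice reproducible, which may be useful when a benchmark run has to be regenerated exactly.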
Usage
Use this module as the adversarial dataset generation component of the benchmark pipeline. It produces realistic and varied malicious pickle samples for testing whether scanners detect different attack vectors, including reverse shells, file exfiltration, SSH key injection, and remote code download. Run it as a CLI script or call its functions programmatically.
Code Reference
Source Location
- Repository: Trailofbits_Fickling
- File: pickle_scanning_benchmark/inject.py
- Lines: 1-223
Signature
def create_malicious_dataset(
    clean_dataset_dir: Path,
    outdir: Path,
    n: int = 10,
) -> None:
    """
    Create a malicious dataset by injecting payloads into clean files.

    Args:
        clean_dataset_dir: Path to clean dataset directory with index.json.
        outdir: Output directory for malicious files.
        n: Maximum number of files to inject.
    """

def inject_pickle_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a pickle file. Returns the payload ID."""

def inject_pytorch_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a PyTorch archive file. Returns the payload ID."""
Import
from pickle_scanning_benchmark.inject import (
    create_malicious_dataset,
    inject_pickle_file,
    inject_pytorch_file,
    ALL_PAYLOADS,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| clean_dataset_dir | Path | Yes | Directory with clean dataset index.json |
| outdir | Path | Yes | Output directory for malicious files |
| n | int | No | Maximum number of files to inject (default: 10) |
Outputs
| Name | Type | Description |
|---|---|---|
| outdir/ | Directory | Malicious pickle/PyTorch files with injected payloads |
| outdir/index.json | File | JSON manifest mapping files to their payload types and original sources |
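The PyTorch outputs listed above are zip archives in which only one pickle member differs from the clean input. The following is a minimal sketch of that copy-and-rewrite pattern using only `zipfile`. The function name `rewrite_first_pickle`, the `transform` callback (standing in for the actual payload injection), and the use of the `.pkl` suffix to recognise a pickle member are all assumptions made for illustration, not the code in `inject.py`.

```python
import io
import zipfile
from typing import BinaryIO, Callable, Union

ZipSource = Union[str, BinaryIO]


def rewrite_first_pickle(
    src: ZipSource,
    dst: ZipSource,
    transform: Callable[[bytes], bytes],
) -> str:
    """Copy a zip archive, passing only its first ``.pkl`` member through ``transform``.

    Returns the name of the member that was rewritten.
    """
    with zipfile.ZipFile(src, "r") as zin:
        infos = zin.infolist()
        # Assumption: a pickle member is recognised by its ".pkl" suffix.
        target = next((i for i in infos if i.filename.endswith(".pkl")), None)
        if target is None:
            raise ValueError("archive contains no .pkl member")
        with zipfile.ZipFile(dst, "w") as zout:
            for info in infos:
                data = zin.read(info)
                if info is target:
                    data = transform(data)
                # Reusing the source ZipInfo keeps the member name and compression type.
                zout.writestr(info, data)
    return target.filename


# Demo on an in-memory archive with placeholder bytes (not real pickles).
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as z:
    z.writestr("archive/data.pkl", b"pickle-bytes")
    z.writestr("archive/data/0", b"\x00\x01")

result = io.BytesIO()
changed = rewrite_first_pickle(source, result, lambda raw: raw + b"-modified")
print("rewrote member:", changed)
```

Keeping the transformation as a callback separates the archive handling from the pickle manipulation, so the same copy loop could be exercised in tests with a harmless transform.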
Usage Examples
Command Line Usage
# Create malicious dataset from 50 clean files
python pickle_scanning_benchmark/inject.py /data/clean_pickles /data/malicious_pickles 50
Programmatic Usage
from pathlib import Path
from pickle_scanning_benchmark.inject import create_malicious_dataset

# Generate malicious variants of clean files
create_malicious_dataset(
    clean_dataset_dir=Path("/data/clean_pickles"),
    outdir=Path("/data/malicious_pickles"),
    n=100,
)
Injecting a Specific Payload
from pathlib import Path
from pickle_scanning_benchmark.inject import inject_pickle_file, ALL_PAYLOADS

# List available payloads
print(list(ALL_PAYLOADS.keys()))

# Inject a specific payload into a single file
payload_id = inject_pickle_file(
    infile=Path("clean_model.pkl"),
    outfile=Path("malicious_model.pkl"),
    payload_key="os.system reverse shell cmd",
)
print(f"Injected payload: {payload_id}")
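Inspecting an Injected File Without Unpickling

After an injection like the one above, it can be useful to see which imports a pickle references without ever loading it. The sketch below walks the opcode stream with the standard-library `pickletools.genops`. The helper `list_imports` is hypothetical and uses a simple heuristic for `STACK_GLOBAL` (the two most recent string pushes), so it can miss imports whose names are fetched from the memo; it is not a substitute for a real scanner.

```python
import pickle
import pickletools
from collections import OrderedDict
from typing import List, Tuple

# Opcodes that push a text string onto the stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def list_imports(data: bytes) -> List[Tuple[str, str]]:
    """Return (module, name) pairs referenced by a pickle, without unpickling it."""
    imports: List[Tuple[str, str]] = []
    recent_strings: List[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: assume the operands are the last two pushed strings.
            if len(recent_strings) >= 2:
                imports.append((recent_strings[-2], recent_strings[-1]))
        elif opcode.name in STRING_OPS:
            recent_strings.append(arg)
    return imports


# Demo on a benign in-memory pickle; for a file on disk, pass
# Path("malicious_model.pkl").read_bytes() instead.
benign = pickle.dumps(OrderedDict(a=1), protocol=4)
print(list_imports(benign))
```

Comparing the import list of a clean file with that of its injected counterpart gives a quick indication of what the payload added.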