Implementation:Trailofbits Fickling Create Malicious Dataset

From Leeroopedia
Revision as of 13:57, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Trailofbits_Fickling_Create_Malicious_Dataset.md)
Knowledge Sources
Domains: Security, Benchmarking, Pickle_Safety
Last Updated: 2026-02-14 14:00 GMT

Overview

Concrete tool for injecting diverse malicious payloads into clean pickle and PyTorch files to create a synthetic adversarial dataset for benchmarking scanner detection capabilities.

Description

The create_malicious_dataset function takes a clean dataset directory, iterates over its files, and injects a randomly selected malicious payload into each of up to `n` files, writing the results to a new output directory with its own `index.json`. The module defines a catalog of attack payloads in two categories: EXEC_PRIMITIVE_PAYLOADS (shell commands via `os.system`, `subprocess.run`, `builtins.exec`, and `numpy` utilities) and DANGEROUS_PRIMITIVE_PAYLOADS (dangerous torch operations such as `load_state_dict_from_url`). Payloads are injected into the pickle bytecode with fickling's `Pickled.insert_python()`, and the import path is split at random so that the generated samples vary in how they evade detection. For PyTorch archives, inject_pytorch_file opens the zip, injects into the first pickle file inside, and copies the remaining members unchanged.
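The PyTorch archive handling described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the module's actual code: `rewrite_first_pickle`, `build_demo_archive`, and `add_marker` are hypothetical names, the `.pkl` suffix test is an assumed way of identifying the pickle member, and a harmless transform stands in for the fickling-based injection step.

```python
import io
import pickle
import zipfile
from typing import Callable


def rewrite_first_pickle(archive: bytes, transform: Callable[[bytes], bytes]) -> bytes:
    """Return a copy of a zip archive with its first .pkl member transformed.

    Hypothetical helper: `transform` stands in for the real injection step.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, zipfile.ZipFile(out, "w") as dst:
        done = False
        for info in src.infolist():
            data = src.read(info.filename)
            # Assumption: the pickle member is identified by a .pkl suffix.
            if not done and info.filename.endswith(".pkl"):
                data = transform(data)
                done = True
            # All other members are copied with their bytes unchanged
            # (stored uncompressed, which is zipfile's default).
            dst.writestr(info.filename, data)
    return out.getvalue()


def build_demo_archive() -> bytes:
    """Build a tiny zip that mimics the layout of a PyTorch archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.pkl", pickle.dumps({"w": [1, 2]}))
        zf.writestr("archive/data/0", b"\x00\x01")
    return buf.getvalue()


def add_marker(raw: bytes) -> bytes:
    """Benign stand-in for payload injection: re-pickle with an extra key."""
    obj = pickle.loads(raw)  # safe here: the demo pickle was created above
    obj["marker"] = True
    return pickle.dumps(obj)


result = rewrite_first_pickle(build_demo_archive(), add_marker)
```

Passing the transform as a callback keeps the archive handling separate from the injection logic, so the same loop could wrap any function that maps pickle bytes to pickle bytes.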

Usage

Use this module as the adversarial dataset generation component of the benchmark pipeline. It provides realistic and varied malicious pickle samples that test whether scanners can detect different attack vectors, including reverse shells, file exfiltration, SSH key injection, and remote code download. Run it as a CLI script or call its functions programmatically.

Code Reference

Source Location

Signature

def create_malicious_dataset(
    clean_dataset_dir: Path,
    outdir: Path,
    n: int = 10,
) -> None:
    """
    Create a malicious dataset by injecting payloads into clean files.

    Args:
        clean_dataset_dir: Path to clean dataset directory with index.json.
        outdir: Output directory for malicious files.
        n: Maximum number of files to inject.
    """

def inject_pickle_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a pickle file. Returns the payload ID."""

def inject_pytorch_file(
    infile: Path,
    outfile: Path,
    payload_key: Optional[str] = None,
) -> str:
    """Inject a payload into a PyTorch archive file. Returns the payload ID."""

Import

from pickle_scanning_benchmark.inject import (
    create_malicious_dataset,
    inject_pickle_file,
    inject_pytorch_file,
    ALL_PAYLOADS,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| clean_dataset_dir | Path | Yes | Clean dataset directory containing index.json |
| outdir | Path | Yes | Output directory for malicious files |
| n | int | No | Maximum number of files to inject (default: 10) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| outdir/ | Directory | Malicious pickle/PyTorch files with injected payloads |
| outdir/index.json | File | JSON manifest mapping files to their payload types and original sources |
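The exact schema of `index.json` is not shown on this page. As a rough illustration of how a manifest that maps files to payload types and original sources could be consumed, the sketch below assumes a list of records with the hypothetical field names `file`, `payload`, and `source`; `group_by_payload` is likewise a hypothetical helper, and the real manifest may be laid out differently.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_payload(index_path: Path) -> dict[str, list[str]]:
    """Group manifest entries by payload type (assumed field names)."""
    entries = json.loads(index_path.read_text())
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry["payload"]].append(entry["file"])
    return dict(grouped)


# Hypothetical manifest content used only for this demonstration.
demo_entries = [
    {"file": "a.pkl", "payload": "payload-A", "source": "clean/a.pkl"},
    {"file": "b.pt", "payload": "payload-B", "source": "clean/b.pt"},
    {"file": "c.pkl", "payload": "payload-A", "source": "clean/c.pkl"},
]

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    index_path.write_text(json.dumps(demo_entries, indent=2))
    grouped = group_by_payload(index_path)

print(grouped)
```

Grouping by payload type is one way to report scanner detection rates per attack vector rather than as a single aggregate number.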

Usage Examples

Command Line Usage

# Create malicious dataset from 50 clean files
python pickle_scanning_benchmark/inject.py /data/clean_pickles /data/malicious_pickles 50

Programmatic Usage

from pathlib import Path
from pickle_scanning_benchmark.inject import create_malicious_dataset

# Generate malicious variants of clean files
create_malicious_dataset(
    clean_dataset_dir=Path("/data/clean_pickles"),
    outdir=Path("/data/malicious_pickles"),
    n=100,
)

Injecting a Specific Payload

from pathlib import Path
from pickle_scanning_benchmark.inject import inject_pickle_file, ALL_PAYLOADS

# List available payloads
print(list(ALL_PAYLOADS.keys()))

# Inject a specific payload into a single file
payload_id = inject_pickle_file(
    infile=Path("clean_model.pkl"),
    outfile=Path("malicious_model.pkl"),
    payload_key="os.system reverse shell cmd",
)
print(f"Injected payload: {payload_id}")
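To produce one variant of a clean file per catalog entry, the single-file call above can be wrapped in a loop. The sketch below is an illustration rather than part of the module: `inject_per_payload` is a hypothetical helper and the output naming scheme is an assumption. The injection function and payload keys are passed in as arguments so the sketch runs without the benchmark package; in practice one would pass `inject_pickle_file` and `ALL_PAYLOADS.keys()`.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def inject_per_payload(
    infile: Path,
    outdir: Path,
    payload_keys: Iterable[str],
    inject_fn: Callable[..., str],
) -> Dict[str, str]:
    """Write one injected copy of `infile` per payload key.

    Returns a mapping from output file name to the payload ID reported
    by `inject_fn` (hypothetical helper, assumed naming scheme).
    """
    outdir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, str] = {}
    for i, key in enumerate(payload_keys):
        # Assumed naming: <stem>_<index><suffix>, e.g. clean_model_000.pkl
        outfile = outdir / f"{infile.stem}_{i:03d}{infile.suffix}"
        results[outfile.name] = inject_fn(
            infile=infile, outfile=outfile, payload_key=key
        )
    return results
```

Keyword arguments are used in the call so the helper matches the `infile`, `outfile`, and `payload_key` parameters shown in the signature section.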

Related Pages

Implements Principle

Requires Environment
