Principle:Trailofbits Fickling Polyglot File Creation
| Knowledge Sources | |
|---|---|
| Domains | Security_Research, Supply_Chain, File_Format |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
A technique for creating polyglot files that are simultaneously valid in two different PyTorch formats, used for testing format identification and demonstrating supply chain attack surfaces.
Description
A polyglot file is one that can be validly interpreted as two different formats by different parsers. In the PyTorch ecosystem, this is a security concern because:
- torch.load() may interpret the file as PyTorch v1.3 (extracting data.pkl)
- torch.jit.load() may interpret the same file as TorchScript v1.4 (extracting constants.pkl)
- Each interpretation can yield different model data, or a different payload
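As a hedged illustration of the list above, the following sketch builds one ZIP archive and reads it through two stand-in readers that each look at a different entry. It uses only the Python standard library and does not call torch.load() or torch.jit.load(). The entry names under archive/, the version value, and the helper names build_archive and read_pickle_entry are assumptions made for illustration.

```python
import io
import pickle
import zipfile


def build_archive() -> bytes:
    """Build one in-memory ZIP holding two pickles with different content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Entry names and the version value are illustrative assumptions.
        zf.writestr("archive/data.pkl", pickle.dumps({"view": "pytorch"}))
        zf.writestr("archive/constants.pkl", pickle.dumps({"view": "torchscript"}))
        zf.writestr("archive/version", b"3\n")
    return buf.getvalue()


def read_pickle_entry(blob: bytes, suffix: str):
    """Stand-in reader: unpickle the first entry whose name ends with suffix.

    Only call this on archives you built yourself; unpickling untrusted
    data can execute arbitrary code.
    """
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        name = next(n for n in zf.namelist() if n.endswith(suffix))
        return pickle.loads(zf.read(name))


blob = build_archive()
first_view = read_pickle_entry(blob, "data.pkl")
second_view = read_pickle_entry(blob, "constants.pkl")
print(first_view, second_view)
```

The two readers receive identical bytes yet return different objects, which is why a scanner that inspects only one entry can miss what the other loader will see.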
Polyglot File Creation combines two PyTorch files of different formats into a single file that passes format identification for both. Supported combinations:
- MAR + PyTorch v0.1.10: Append pickle to MAR ZIP
- PyTorch v1.3 + TorchScript v1.4: Add constants.pkl and version to a PyTorch ZIP
- MAR + PyTorch v0.1.1: Append tar to MAR ZIP
Usage
Use this for security research: testing polyglot detection capabilities, demonstrating supply chain attack vectors, and building test datasets for file format validators.
Theoretical Basis
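Polyglot creation exploits the fact that different parsers read different parts of a file. A ZIP reader locates its central directory from the end of the file, while pickle and tar readers start at offset 0, so both can accept the same bytes: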
# Pseudocode: Format combination strategies
# Strategy 1: Append (for formats that read different offsets)
polyglot = file_a_bytes + file_b_bytes
# Strategy 2: Merge ZIP entries (for ZIP-based formats)
# add_entries_from is a pseudocode helper, not a method of Python's zipfile module
polyglot = zip_a.add_entries_from(zip_b, ["constants.pkl", "version"])
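The two strategies can be sketched as runnable Python using only the standard library. This is a minimal sketch, not Fickling's implementation: the helper names make_zip, append_polyglot, and merge_zip_entries, the archive/ entry prefix, and the version value are hypothetical. For the append strategy the sketch places the pickle bytes first and the ZIP archive after them, which is an assumption about byte order that Python's pickle and zipfile modules both accept; the source does not spell out the order.

```python
import io
import pickle
import zipfile


def make_zip(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP archive from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


def append_polyglot(head: bytes, zip_tail: bytes) -> bytes:
    """Strategy 1: put a start-anchored format in front of a ZIP archive."""
    return head + zip_tail


def merge_zip_entries(zip_bytes: bytes, extra: dict[str, bytes]) -> bytes:
    """Strategy 2: add extra entries to an existing ZIP archive."""
    buf = io.BytesIO(zip_bytes)
    with zipfile.ZipFile(buf, "a") as zf:  # append mode keeps existing entries
        for name, data in extra.items():
            zf.writestr(name, data)
    return buf.getvalue()


pickled = pickle.dumps({"weights": [1, 2, 3]})
archive = make_zip({"archive/data.pkl": pickled})

# Strategy 1: pickle.load stops at the STOP opcode and ignores the ZIP bytes,
# while zipfile finds the central directory from the end of the file.
appended = append_polyglot(pickled, archive)
head_view = pickle.load(io.BytesIO(appended))
with zipfile.ZipFile(io.BytesIO(appended)) as zf:
    zip_view = zf.namelist()

# Strategy 2: the extra entry names and version value are illustrative.
merged = merge_zip_entries(
    archive,
    {"archive/constants.pkl": pickle.dumps(()), "archive/version": b"3\n"},
)
with zipfile.ZipFile(io.BytesIO(merged)) as zf:
    merged_names = sorted(zf.namelist())
```

Whether a given PyTorch loader or format identifier accepts the resulting bytes depends on its own checks, so files produced this way should be validated against the specific tool under test.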