Principle:Trailofbits Fickling Pickle Bytecode Parsing
| Knowledge Sources | |
|---|---|
| Domains | Security, Reverse_Engineering, Deserialization |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
A parsing technique that transforms raw pickle bytecode into a structured sequence of typed opcode objects, enabling programmatic inspection and manipulation of pickle files without executing them.
Description
Pickle Bytecode Parsing addresses the need to inspect pickle files without deserializing them, since deserialization would execute any embedded malicious code. A pickle is a program for a stack-based virtual machine whose instruction set spans several dozen opcodes across protocol versions. Parsing converts the raw byte stream into a structured list of Opcode objects, each carrying its operation info, arguments, raw data bytes, and position in the stream.
This is the foundational step for all pickle analysis: safety checking, decompilation, tracing, and injection all depend on first parsing the bytecode into a manipulable representation.
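As a minimal sketch of what inspecting without executing means, the example below builds a pickle that would call a function when loaded, then walks its opcodes with the standard library's pickletools. The `Payload` class is a hypothetical stand-in for a malicious object and is not part of Fickling.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical stand-in: unpickling this object would call print()."""

    def __reduce__(self):
        return (print, ("this would run on pickle.loads",))


# Serializing does not run the payload; only pickle.loads would.
data = pickle.dumps(Payload(), protocol=2)

# genops only decodes the byte stream. Nothing is imported or called.
names = [info.name for info, arg, pos in pickletools.genops(data)]
print(names)
```

The REDUCE opcode in the resulting list is the instruction that would invoke the callable during deserialization, which is why it is visible to a parser without ever being run.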
Usage
Use this principle whenever you need to examine a pickle file's contents without executing it. It is the mandatory first step in both the safety analysis and decompilation workflows.
Theoretical Basis
The pickle protocol defines a stream of opcodes, each with:
- A single-byte opcode identifier
- Optional arguments (integers, strings, bytes) with format determined by the opcode
- A position in the byte stream
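The three properties above can be observed directly with the standard library. This sketch assumes Python's `pickletools.OpcodeInfo` attributes `name`, `code`, and `arg`; the `rows` list and its tuple layout are illustrative only.

```python
import pickle
import pickletools

data = pickle.dumps({"answer": 42}, protocol=2)

rows = []
for info, arg, pos in pickletools.genops(data):
    # info.code is the one-character opcode identifier.
    # info.arg describes the argument format, or is None if there is no argument.
    arg_format = info.arg.name if info.arg is not None else None
    rows.append((pos, info.name, info.code, arg_format, arg))

for pos, name, code, arg_format, arg in rows:
    print(f"{pos:>3}  {code!r:<7} {name:<12} {str(arg_format):<16} {arg!r}")
```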
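Parsing uses pickletools.genops() to iterate the stream. It yields one (opcode info, argument, position) triple per opcode:

    # Pseudocode for pickle bytecode parsing.
    # Opcode stands for the parser's typed opcode class; data is the raw pickle bytes.
    import pickletools

    opcodes = []
    for opcode_info, argument, position in pickletools.genops(data):
        opcode = Opcode(info=opcode_info, argument=argument, position=position)
        opcodes.append(opcode)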
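A self-contained variant of the loop above might look like the following. `ParsedOpcode` and `parse_opcodes` are hypothetical names used for illustration, not Fickling's actual classes. The sketch relies on the fact that, when genops reads from a seekable stream, the stream position after each yielded item is the end of that opcode, so the raw bytes can be sliced out.

```python
import io
import pickle
import pickletools
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ParsedOpcode:
    """Hypothetical typed opcode record (not Fickling's Opcode class)."""

    info: pickletools.OpcodeInfo  # operation metadata from pickletools
    argument: Any                 # decoded argument, or None
    position: int                 # byte offset of the opcode in the stream
    data: bytes                   # raw bytes: opcode byte plus encoded argument


def parse_opcodes(raw: bytes) -> List[ParsedOpcode]:
    stream = io.BytesIO(raw)
    parsed = []
    for info, argument, position in pickletools.genops(stream):
        end = stream.tell()  # stream now sits just past this opcode's argument
        parsed.append(ParsedOpcode(info, argument, position, raw[position:end]))
    return parsed


raw = pickle.dumps([1, "two", 3.0])
ops = parse_opcodes(raw)
for op in ops:
    print(op.position, op.info.name, op.argument, op.data)
```

Keeping the raw bytes alongside the decoded argument means the opcode list can be re-joined into the original stream, which is useful for later manipulation.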
The parser handles edge cases:
- Truncated files: Returns partial results when fail_on_decode_error=False
- Invalid opcodes: Sets a flag for downstream analysis
- Multiple pickle streams: Parsing stops at each STOP opcode, so pickles stacked back to back in one file can be parsed one after another
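The edge cases above can be sketched with pickletools alone. This is an illustration under assumptions, not Fickling's implementation: `parse_one` and `parse_stacked` are hypothetical helpers, the `fail_on_decode_error` parameter only borrows its name from the text, and the `clean` flag plays the role of the invalid-opcode marker. genops reports an unknown opcode or a stream that ends before STOP by raising ValueError.

```python
import io
import pickle
import pickletools


def parse_one(stream, fail_on_decode_error=True):
    """Parse one pickle from stream; return (opcodes, clean)."""
    ops = []
    try:
        for info, arg, pos in pickletools.genops(stream):
            ops.append((info.name, arg, pos))
    except ValueError:
        # Raised for an unknown opcode or when data runs out before STOP.
        if fail_on_decode_error:
            raise
        return ops, False  # partial result, flagged for downstream analysis
    return ops, True


def parse_stacked(raw):
    """Parse pickles concatenated in one byte string, one after another."""
    stream = io.BytesIO(raw)
    pickles = []
    while stream.tell() < len(raw):
        ops, clean = parse_one(stream, fail_on_decode_error=False)
        pickles.append(ops)
        if not clean:
            break  # stop at the first malformed or truncated pickle
    return pickles


stacked = pickle.dumps("first") + pickle.dumps("second")
print(len(parse_stacked(stacked)))

truncated = pickle.dumps([1, 2, 3])[:-1]  # drop the final STOP byte
partial_ops, clean = parse_one(io.BytesIO(truncated), fail_on_decode_error=False)
print(clean, len(partial_ops))
```

Because genops leaves the stream positioned just after STOP, the next call picks up at the start of the following pickle, and the reported positions stay absolute within the file.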