Principle:Protectai Modelscan Scanner Plugin Architecture
| Knowledge Sources | |
|---|---|
| Domains | ML_Security, Software_Architecture |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A plugin-based architecture that allows modular, format-specific security scanners to be dynamically loaded, registered, and executed against model files.
Description
The Scanner Plugin Architecture addresses the challenge of supporting multiple, fundamentally different model serialization formats (pickle, HDF5, TensorFlow SavedModel, Keras .keras, NumPy) within a single scanning framework. Each format requires specialized parsing logic — pickle needs bytecode disassembly, HDF5 needs attribute inspection, TensorFlow needs protobuf parsing — but the orchestration logic (file iteration, result aggregation, reporting) is shared.
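To make the pickle case concrete: disassembling a pickle means walking its opcode stream without executing it. The sketch below uses the standard library's `pickletools.genops` to collect every `(module, name)` pair a pickle would import, then checks those pairs against a small denylist. The function names and the `UNSAFE_GLOBALS` set are illustrative assumptions, not modelscan's actual code. The `STACK_GLOBAL` handling is also simplified: it does not resolve memo lookups.

```python
import collections
import pickle
import pickletools
from typing import List, Set, Tuple

# Opcodes that push a string onto the pickle stack (needed to resolve STACK_GLOBAL).
_STRING_OPS = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}

# Illustrative denylist only; a real scanner would use a larger, curated list.
UNSAFE_GLOBALS: Set[Tuple[str, str]] = {
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
    ("builtins", "eval"),
    ("builtins", "exec"),
}


def find_globals(data: bytes) -> List[Tuple[str, str]]:
    """Disassemble a pickle (without loading it) and list every global it imports."""
    found: List[Tuple[str, str]] = []
    recent_strings: List[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode "module name" inline, separated by a space.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name in _STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ pushes the module and name as strings first.
            # Simplification: memo lookups (BINGET, LONG_BINGET) are not resolved.
            found.append((recent_strings[-2], recent_strings[-1]))
    return found


def unsafe_globals(data: bytes) -> List[Tuple[str, str]]:
    """Return only the imported globals that appear on the denylist."""
    return [g for g in find_globals(data) if g in UNSAFE_GLOBALS]


payload = pickle.dumps(collections.OrderedDict, protocol=4)
print(find_globals(payload))
```

Because `genops` only parses opcodes, nothing in the payload is ever executed, which is what makes static inspection of untrusted pickles safe.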
The architecture defines an abstract ScanBase contract that all scanners must implement. Scanners are registered in the settings dictionary by their fully-qualified Python class path and dynamically loaded at initialization via importlib. Each scanner receives a Model object wrapping a file path and byte stream, and returns either None (if the file format doesn't match) or a ScanResults object containing issues, errors, and skipped entries.
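A minimal sketch of class-path loading with `importlib`, assuming a settings layout in which `settings["scanners"]` maps each fully-qualified class path to a dict with an `enabled` flag. That layout, the `load_scanners` function, and the `demo_scanners` module are hypothetical. The demo registers a throwaway module in `sys.modules` so the lookup resolves without a real package on disk.

```python
import importlib
import sys
import types
from typing import Any, Dict, List, Tuple


def load_scanners(settings: Dict[str, Any]) -> Tuple[List[type], List[str]]:
    """Import every enabled scanner class listed under settings["scanners"].

    Returns the loaded classes plus error messages, so one bad entry
    does not prevent the remaining scanners from loading.
    """
    loaded: List[type] = []
    errors: List[str] = []
    for class_path, scanner_settings in settings.get("scanners", {}).items():
        if not scanner_settings.get("enabled", False):
            continue  # disabled for this deployment; never imported
        try:
            module_name, class_name = class_path.rsplit(".", 1)
            module = importlib.import_module(module_name)
            loaded.append(getattr(module, class_name))
        except (ImportError, AttributeError, ValueError) as exc:
            errors.append(f"could not load {class_path}: {exc}")
    return loaded, errors


# Demo only: a hypothetical scanner class exposed through a throwaway module.
class FakePickleScan:
    pass


demo_module = types.ModuleType("demo_scanners")
demo_module.FakePickleScan = FakePickleScan
sys.modules["demo_scanners"] = demo_module

settings = {
    "scanners": {
        "demo_scanners.FakePickleScan": {"enabled": True},
        "demo_scanners.FakeH5Scan": {"enabled": False},  # skipped, so never resolved
        "demo_scanners.Missing": {"enabled": True},      # fails to resolve
    }
}
scanners, load_errors = load_scanners(settings)
print(scanners, load_errors)
```

Recording load failures instead of raising keeps a single misconfigured entry from disabling every other scanner.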
This design enables:
- Extensibility: New scanners can be added without modifying core code
- Configurability: Scanners can be enabled/disabled per deployment
- Isolation: Scanner failures don't crash the pipeline (errors are captured)
- Format dispatch: Scanners self-select based on the model's format context
Usage
Apply this principle when:
- Adding support for a new model serialization format (e.g., ONNX, SafeTensors)
- Understanding how modelscan dispatches files to the correct scanner
- Implementing a custom detection strategy beyond unsafe operator matching
- Extending modelscan for organization-specific model formats
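For the first and last of these cases, a new scanner subclasses the contract and self-selects on the model's format. The sketch below targets a made-up `.orgmodel` format with a hypothetical `ORGM` file signature. `Model`, `ScanResults`, and `ScanBase` are trimmed stand-ins written for this example (the real modelscan types carry more fields and methods), the `supported_extensions` and `forbidden_ops` settings keys are assumptions, and the byte-substring check stands in for real format parsing.

```python
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import IO, Any, Dict, List, Optional


# Trimmed stand-ins for modelscan's types (hypothetical, for illustration only).
@dataclass
class Model:
    path: Path         # location of the model file
    stream: IO[bytes]  # open byte stream over the file's contents


@dataclass
class ScanResults:
    issues: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class ScanBase(ABC):
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @abstractmethod
    def scan(self, model: Model) -> Optional[ScanResults]: ...


class OrgModelScan(ScanBase):
    """Scanner for a hypothetical organization-specific ".orgmodel" format."""

    MAGIC = b"ORGM"  # hypothetical file signature

    def scan(self, model: Model) -> Optional[ScanResults]:
        extensions = self._settings.get("supported_extensions", [".orgmodel"])
        if model.path.suffix not in extensions:
            return None  # not our format: leave the file to other scanners
        header = model.stream.read(len(self.MAGIC))
        if header != self.MAGIC:
            # Claimed by extension but malformed: report an error, not an issue
            return ScanResults(errors=[f"{model.path}: unexpected header {header!r}"])
        payload = model.stream.read()
        forbidden = self._settings.get("forbidden_ops", [b"exec"])
        issues = [f"{model.path}: forbidden op {op!r}" for op in forbidden if op in payload]
        return ScanResults(issues=issues)


scanner = OrgModelScan({"forbidden_ops": [b"exec", b"system"]})
print(scanner.scan(Model(Path("weights.pkl"), io.BytesIO(b"ORGM"))))  # None
print(scanner.scan(Model(Path("m.orgmodel"), io.BytesIO(b"ORGM call system now"))))
```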
Theoretical Basis
The architecture follows the Template Method pattern combined with dynamic loading:
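```python
# Abstract contract (Template Method)
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

# Model and ScanResults are modelscan's own types; their imports are omitted here.


class ScanBase(ABC):
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @abstractmethod
    def scan(self, model: "Model") -> Optional["ScanResults"]:
        """
        Return None if this scanner doesn't handle the model's format.
        Return ScanResults with issues/errors/skipped otherwise.
        """

    @staticmethod
    @abstractmethod
    def name() -> str:
        """Short scanner name."""

    @staticmethod
    @abstractmethod
    def full_name() -> str:
        """Fully-qualified class path for identification."""

    def label_results(self, results: "ScanResults") -> "ScanResults":
        """Stamp the scanner's name on all issues for provenance."""
        # Assumes each issue carries a details object with a `scanner` field.
        for issue in results.issues:
            issue.details.scanner = self.full_name()
        return results
```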
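Because `name()` and `full_name()` are declared as static abstract methods, Python's `abc` machinery refuses to instantiate any subclass that leaves either unimplemented. The sketch below shows that check together with the provenance stamping performed by `label_results`. `Issue`, its flat `scanner` field, and the class path `example.scanners.EchoScan` are hypothetical stand-ins for this illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Issue:  # hypothetical stand-in for a reported finding
    description: str
    scanner: str = ""  # filled in by label_results


@dataclass
class ScanResults:  # hypothetical stand-in
    issues: List[Issue] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class ScanBase(ABC):
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @abstractmethod
    def scan(self, model: Any) -> Optional[ScanResults]: ...

    @staticmethod
    @abstractmethod
    def name() -> str: ...

    @staticmethod
    @abstractmethod
    def full_name() -> str: ...

    def label_results(self, results: ScanResults) -> ScanResults:
        for issue in results.issues:
            issue.scanner = self.full_name()  # record which scanner found it
        return results


class IncompleteScan(ScanBase):
    # Implements scan() but not name()/full_name(), so it stays abstract.
    def scan(self, model: Any) -> Optional[ScanResults]:
        return None


class EchoScan(ScanBase):
    def scan(self, model: Any) -> Optional[ScanResults]:
        return self.label_results(ScanResults(issues=[Issue(f"saw {model!r}")]))

    @staticmethod
    def name() -> str:
        return "echo"

    @staticmethod
    def full_name() -> str:
        return "example.scanners.EchoScan"  # hypothetical class path


try:
    IncompleteScan({})
except TypeError as exc:
    print("rejected:", exc)

results = EchoScan({}).scan("model.bin")
print(results.issues[0].scanner)
```

Making the name methods static also lets the orchestrator read a scanner's identity from the class alone, before any instance exists.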
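The dispatch mechanism iterates over all enabled scanners for every model file:
```python
# Pseudo-code for scanner dispatch (aggregate_results and record_error are placeholders)
for scanner_class in enabled_scanners:
    scanner = scanner_class(settings)
    try:
        results = scanner.scan(model)
    except Exception as exc:
        # Exception isolation: record the failure and move on to the next scanner
        record_error(scanner.full_name(), exc)
        continue
    if results is not None:
        # The scanner recognized and handled this file's format
        aggregate_results(results)
    # If results is None, the scanner doesn't handle this format; try the next one
```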
Key design decisions:
- None return = format mismatch: Scanners return None to signal that they don't handle the given format, so no separate format-checking step is needed
- Exception isolation: Scanner exceptions are caught and recorded as errors, not propagated
- ScanResults dataclass: A simple container bundling issues, errors, and skipped entries into a single return value
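The dispatch loop and all three design decisions can be exercised together in a small, runnable sketch. Everything here is a stand-in: scanners receive a plain path string instead of a `Model`, `ScanResults` is a simplified dataclass, `PickleLikeScan` and `BrokenScan` are toy scanners, and recording a skipped entry when no scanner claims a file is an assumed policy rather than documented modelscan behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScanResults:  # hypothetical stand-in bundling all three result kinds
    issues: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class PickleLikeScan:
    """Toy scanner that only claims files ending in .pkl."""

    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    def scan(self, path: str) -> Optional[ScanResults]:
        if not path.endswith(".pkl"):
            return None  # format mismatch
        return ScanResults(issues=[f"{path}: unsafe global"])


class BrokenScan:
    """Toy scanner whose parser always crashes."""

    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    def scan(self, path: str) -> Optional[ScanResults]:
        raise RuntimeError("parser crashed")


def dispatch(path: str, scanner_classes: List[type], settings: Dict[str, Any]) -> ScanResults:
    total = ScanResults()
    handled = False
    for scanner_class in scanner_classes:
        scanner = scanner_class(settings)
        try:
            results = scanner.scan(path)
        except Exception as exc:  # isolate each scanner's failure
            total.errors.append(f"{scanner_class.__name__}: {exc}")
            continue
        if results is None:
            continue  # this scanner does not handle the format
        handled = True
        total.issues.extend(results.issues)
        total.errors.extend(results.errors)
        total.skipped.extend(results.skipped)
    if not handled:
        # Assumed policy: report files that no scanner claimed
        total.skipped.append(f"{path}: no scanner handled this format")
    return total


print(dispatch("model.pkl", [PickleLikeScan, BrokenScan], {}))
print(dispatch("model.h5", [PickleLikeScan], {}))
```

The crash in `BrokenScan` surfaces as an error entry while the `.pkl` finding from `PickleLikeScan` is still reported, which is the point of catching exceptions per scanner rather than around the whole loop.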