Principle:Protectai Modelscan Scanner Plugin Architecture

Knowledge Sources	ModelScan
Domains	ML_Security, Software_Architecture
Last Updated	2026-02-14 12:00 GMT

Overview

A plugin-based architecture that allows modular, format-specific security scanners to be dynamically loaded, registered, and executed against model files.

Description

The Scanner Plugin Architecture addresses the challenge of supporting multiple, fundamentally different model serialization formats (pickle, HDF5, TensorFlow SavedModel, Keras .keras, NumPy) within a single scanning framework. Each format needs its own parsing logic (bytecode disassembly for pickle, attribute inspection for HDF5, protobuf parsing for TensorFlow), while the orchestration logic (file iteration, result aggregation, reporting) is shared across all of them.

The architecture defines an abstract ScanBase contract that all scanners must implement. Scanners are registered in the settings dictionary by their fully-qualified Python class path and dynamically loaded at initialization via importlib. Each scanner receives a Model object wrapping a file path and byte stream, and returns either None (if the file format doesn't match) or a ScanResults object containing issues, errors, and skipped entries.
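
The registration and loading step described above can be sketched as follows. This is a minimal illustration, not modelscan's actual loader: the function name load_enabled_scanners, the exact shape of the settings dictionary, and the standard-library classes standing in for scanners are all assumptions made for the example.

```python
import importlib
from typing import Any, Dict, List, Tuple, Type


def load_enabled_scanners(
    settings: Dict[str, Any],
) -> Tuple[List[Type[Any]], List[str]]:
    """Resolve enabled scanner class paths to classes, collecting load errors."""
    loaded: List[Type[Any]] = []
    errors: List[str] = []
    for class_path, scanner_settings in settings.get("scanners", {}).items():
        if not scanner_settings.get("enabled", False):
            continue  # disabled scanners are never imported
        try:
            # Split "package.module.ClassName" into module path and class name
            module_name, class_name = class_path.rsplit(".", 1)
            module = importlib.import_module(module_name)
            loaded.append(getattr(module, class_name))
        except Exception as exc:  # a bad path must not abort initialization
            errors.append(f"{class_path}: {exc}")
    return loaded, errors


# Hypothetical settings; standard-library classes stand in for real scanners.
example_settings = {
    "scanners": {
        "collections.OrderedDict": {"enabled": True},
        "collections.Counter": {"enabled": False},
        "no_such_module.MissingScanner": {"enabled": True},
    }
}

scanner_classes, load_errors = load_enabled_scanners(example_settings)
```

Here only the enabled, importable entry ends up in scanner_classes, while the unresolvable path is reported in load_errors rather than raising.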

This design enables:

Extensibility: New scanners can be added without modifying core code
Configurability: Scanners can be enabled/disabled per deployment
Isolation: Scanner failures don't crash the pipeline (errors are captured)
Format dispatch: Scanners self-select based on the model's format context

Usage

Apply this principle when:

Adding support for a new model serialization format (e.g., ONNX, SafeTensors)
Understanding how modelscan dispatches files to the correct scanner
Implementing a custom detection strategy beyond unsafe operator matching
Extending modelscan for organization-specific model formats

Theoretical Basis

The architecture follows the Template Method pattern combined with dynamic loading:

# Abstract contract (Template Method)
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

# Model and ScanResults are modelscan's own types; their imports are omitted here.

class ScanBase(ABC):
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @abstractmethod
    def scan(self, model: Model) -> Optional[ScanResults]:
        """
        Return None if this scanner doesn't handle the model's format.
        Return ScanResults with issues/errors/skipped otherwise.
        """

    # name() and full_name() take no self, so they are declared static.
    @staticmethod
    @abstractmethod
    def name() -> str:
        """Short scanner name."""

    @staticmethod
    @abstractmethod
    def full_name() -> str:
        """Fully-qualified class path for identification."""

    def label_results(self, results: ScanResults) -> ScanResults:
        """Stamp scanner name on all issues for provenance."""

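A concrete scanner written against this contract might look like the sketch below. Everything in it is illustrative: Model, ScanResults, and ScanBase are reduced stand-ins defined locally (name() is omitted for brevity), ToyMarkerScan and the ".toy" format are hypothetical, and the label_results body is an assumption about how provenance could be stamped.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Model:  # stand-in: wraps a source path and its raw bytes
    source: Path
    data: bytes


@dataclass
class ScanResults:  # stand-in: issues, errors, and skipped entries
    issues: List[Dict[str, str]] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class ScanBase(ABC):  # reduced version of the contract
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @abstractmethod
    def scan(self, model: Model) -> Optional[ScanResults]: ...

    @staticmethod
    @abstractmethod
    def full_name() -> str: ...

    def label_results(self, results: ScanResults) -> ScanResults:
        # Assumed behavior: record which scanner produced each issue
        for issue in results.issues:
            issue["scanner"] = self.full_name()
        return results


class ToyMarkerScan(ScanBase):
    """Hypothetical scanner: flags .toy files containing a marker byte string."""

    @staticmethod
    def full_name() -> str:
        return "example_plugins.ToyMarkerScan"

    def scan(self, model: Model) -> Optional[ScanResults]:
        extensions = self._settings.get("supported_extensions", [".toy"])
        if model.source.suffix not in extensions:
            return None  # not our format; let other scanners try
        results = ScanResults()
        if b"os.system" in model.data:
            results.issues.append({"description": "marker 'os.system' found"})
        return self.label_results(results)


scanner = ToyMarkerScan({"supported_extensions": [".toy"]})
skipped = scanner.scan(Model(Path("weights.h5"), b"os.system"))
flagged = scanner.scan(Model(Path("model.toy"), b"import os; os.system('id')"))
clean = scanner.scan(Model(Path("model.toy"), b"harmless bytes"))
```

The three calls cover the three outcomes a scanner can produce: None for a foreign format, a ScanResults with a labeled issue, and an empty ScanResults for a file that was handled and found clean.
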
The dispatch mechanism iterates over all enabled scanners for every model file:

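# Pseudo-code for scanner dispatch
# (record_error and aggregate_results are placeholder helpers)
for scanner_class in enabled_scanners:
    scanner = scanner_class(settings)
    try:
        results = scanner.scan(model)
    except Exception as exc:
        # Exception isolation: record the failure and move on
        record_error(scanner_class, exc)
        continue
    if results is not None:
        # Scanner handled this file
        aggregate_results(results)
    # If None, the scanner doesn't handle this format; continue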

Key design decisions:

Null return = format mismatch: Scanners return None to indicate they don't handle the given format, avoiding the need for a separate format-checking step
Exception isolation: Scanner exceptions are caught and recorded as errors, not propagated
ScanResults dataclass: A simple container bundling issues, errors, and skipped entries into a single return value
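
The three decisions above can be exercised together in a small runnable sketch. It is not modelscan's code: the ScanResults stand-in, the two scanner classes, the dispatch function, and the choice to mark a file as skipped when no scanner claims it are all assumptions for illustration. Scanners here take a plain path string instead of a Model object to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScanResults:  # stand-in container with the three fields named above
    issues: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class PickleLikeScan:  # hypothetical: handles only ".pkl" paths
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    def scan(self, path: str) -> Optional[ScanResults]:
        if not path.endswith(".pkl"):
            return None  # null return = format mismatch
        return ScanResults(issues=[f"unsafe operator in {path}"])


class BrokenScan:  # hypothetical: always fails, to show isolation
    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    def scan(self, path: str) -> Optional[ScanResults]:
        raise RuntimeError("parser crashed")


def dispatch(
    path: str, scanner_classes: List[type], settings: Dict[str, Any]
) -> ScanResults:
    total = ScanResults()
    handled = False
    for scanner_class in scanner_classes:
        try:
            results = scanner_class(settings).scan(path)
        except Exception as exc:
            # Exception isolation: record the failure, keep scanning
            total.errors.append(f"{scanner_class.__name__}: {exc}")
            continue
        if results is None:
            continue  # this scanner does not handle the format
        handled = True
        total.issues.extend(results.issues)
        total.errors.extend(results.errors)
        total.skipped.extend(results.skipped)
    if not handled:
        total.skipped.append(path)  # assumed policy: unclaimed files are skipped
    return total


pkl_report = dispatch("model.pkl", [BrokenScan, PickleLikeScan], {})
txt_report = dispatch("notes.txt", [PickleLikeScan], {})
```

In the first call the crashing scanner contributes an error entry while the second scanner still reports its issue; in the second call no scanner claims the file, so it ends up in skipped.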

Related Pages

Implemented By

Implementation:Protectai_Modelscan_ScanBase

Uses Heuristic

Heuristic:Protectai_Modelscan_Graceful_Scanner_Degradation
