Workflow:Protectai Modelscan Custom Scanner Plugin

From Leeroopedia
Knowledge Sources
Domains: ML_Security, Model_Scanning, Plugin_Development
Last Updated: 2026-02-14 12:00 GMT

Overview

End-to-end process for extending ModelScan by implementing a custom scanner plugin to support new model formats or detection strategies.

Description

This workflow describes how to create a new scanner plugin for ModelScan that detects unsafe patterns in model formats not covered by the built-in scanners, or that applies additional detection logic to already-supported formats. ModelScan uses a plugin-based architecture where scanners are dynamically loaded via importlib based on a settings dictionary. Each scanner is a subclass of ScanBase that implements a scan() method receiving a Model object and returning ScanResults containing issues, errors, and skipped entries. The middleware pipeline tags each model with its format before scanners are invoked, allowing scanners to filter by format and only process relevant files.
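
The dynamic loading mentioned above can be sketched roughly as follows. This is an illustration of the importlib technique only, not ModelScan's actual loader: load_class and load_enabled are hypothetical helpers, and the settings keys point at standard-library classes so that the sketch runs without ModelScan installed.

```python
import importlib
from typing import Any, Dict, List


def load_class(dotted_path: str) -> type:
    """Import 'module.path.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def load_enabled(settings: Dict[str, Any]) -> List[type]:
    """Return the class for every entry whose 'enabled' flag is true."""
    return [
        load_class(path)
        for path, config in settings["scanners"].items()
        if config.get("enabled", False)
    ]


# Stand-in settings: in a real setup the keys would be scanner class paths.
settings = {
    "scanners": {
        "collections.OrderedDict": {"enabled": True},
        "collections.Counter": {"enabled": False},
    }
}

loaded = load_enabled(settings)
print([cls.__name__ for cls in loaded])
```

Because the settings key doubles as the import path, a typo in the key surfaces as an import error at load time rather than as a silently missing scanner.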

Usage

Execute this workflow when you need to scan model formats not supported by the built-in scanners (e.g., ONNX, GGUF, custom proprietary formats), add additional detection rules beyond the default unsafe globals list, or integrate domain-specific security checks for your organization's model artifacts.

Execution Steps

Step 1: Understand the Scanner Architecture

Study the ScanBase abstract base class, which every scanner must extend. It defines three abstract methods: name(), which returns a short identifier; full_name(), which returns the fully-qualified class path (the same string used as the settings key in Step 3); and scan(), which receives a Model and returns either a ScanResults or None. The constructor receives the full settings dictionary. Two further methods are not abstract: handle_binary_dependencies(), an optional hook for checking that required libraries are installed, and label_results(), which stamps the scanner name onto each issue.

Key considerations:

  • ScanBase is defined in modelscan/scanners/scan.py
  • ScanResults holds three lists: issues, errors, and skipped entries
  • Model wraps a file path and byte stream with context manager support
  • scan() returns None when the scanner does not handle the given file format; this tells the orchestrator that the scanner skipped the file
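
The contract described in this step can be sketched with stand-in classes. The ScanBase and ScanResults below are simplified stand-ins written for this example and are assumptions about the shape of the real classes in modelscan/scanners/scan.py, which may differ; NoopScanner and its class path are hypothetical.

```python
import abc
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScanResults:
    """Stand-in holding the three lists the workflow describes."""
    issues: List[Any] = field(default_factory=list)
    errors: List[Any] = field(default_factory=list)
    skipped: List[Any] = field(default_factory=list)


class ScanBase(abc.ABC):
    """Stand-in for the abstract base class described above."""

    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings  # the full settings dictionary

    @staticmethod
    @abc.abstractmethod
    def name() -> str: ...

    @staticmethod
    @abc.abstractmethod
    def full_name() -> str: ...

    @abc.abstractmethod
    def scan(self, model: Any) -> Optional[ScanResults]: ...


class NoopScanner(ScanBase):
    """Hypothetical scanner that handles no format at all."""

    @staticmethod
    def name() -> str:
        return "noop"

    @staticmethod
    def full_name() -> str:
        return "my_pkg.scanners.NoopScanner"  # hypothetical class path

    def scan(self, model: Any) -> Optional[ScanResults]:
        return None  # None means "this scanner skipped the file"


scanner = NoopScanner(settings={})
result = scanner.scan(model=object())
```

Leaving out any of the three abstract methods makes the subclass impossible to instantiate, so an incomplete plugin fails when it is constructed rather than midway through a scan.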

Step 2: Implement the Scanner Class

Create a new Python module containing a class that extends ScanBase. Implement name() and full_name() as static methods that return the scanner identifier and the fully-qualified class path. Implement scan() in three stages: check the model's format context (set by the middleware pipeline) and return None if the file is not in your target format; read the model's byte stream and analyze it for unsafe patterns; and return a ScanResults object. Performing the format check first avoids reading files the scanner cannot interpret.

Key considerations:

  • Check model.get_context("formats") to determine if the file matches your target format
  • Use model.get_stream() to access the raw byte stream for analysis
  • Use model.get_source() to get the file path for error reporting
  • Create Issue objects with IssueCode.UNSAFE_OPERATOR, an IssueSeverity level, and OperatorIssueDetails
  • Call self.label_results(results) before returning to stamp the scanner name on issues
  • Override handle_binary_dependencies() if your scanner requires external libraries
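
A minimal sketch of such a scanner follows, using stand-ins so it runs without ModelScan installed. Model, Issue, and ScanResults are simplified stand-ins exposing only the accessors named above (the real Issue takes an IssueCode, an IssueSeverity, and OperatorIssueDetails). The scanner name, the example_format tag, and the byte-substring check are hypothetical and far cruder than a real format parser.

```python
import io
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Issue:  # stand-in for the real Issue class
    severity: str
    operator: str
    source: str
    scanner: str = ""


@dataclass
class ScanResults:  # stand-in
    issues: List[Issue] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


class Model:  # stand-in exposing only the accessors named above
    def __init__(self, source: str, data: bytes, formats: List[str]) -> None:
        self._source = Path(source)
        self._stream = io.BytesIO(data)
        self._context: Dict[str, Any] = {"formats": formats}

    def get_context(self, key: str) -> Any:
        return self._context.get(key)

    def get_stream(self) -> io.BytesIO:
        return self._stream

    def get_source(self) -> Path:
        return self._source


class ExampleFormatScanner:  # a real plugin would extend ScanBase
    TARGET_FORMAT = "example_format"  # hypothetical format tag

    def __init__(self, settings: Dict[str, Any]) -> None:
        self._settings = settings

    @staticmethod
    def name() -> str:
        return "example"

    @staticmethod
    def full_name() -> str:
        return "my_pkg.scanners.ExampleFormatScanner"

    def label_results(self, results: ScanResults) -> ScanResults:
        for issue in results.issues:
            issue.scanner = self.full_name()
        return results

    def scan(self, model: Model) -> Optional[ScanResults]:
        # 1. Skip files the middleware did not tag with our format.
        if self.TARGET_FORMAT not in (model.get_context("formats") or []):
            return None
        # 2. Read the raw bytes and look for the configured unsafe markers.
        data = model.get_stream().read()
        config = self._settings["scanners"][self.full_name()]
        results = ScanResults()
        for marker in config["unsafe_operators"]:
            if marker.encode() in data:
                results.issues.append(
                    Issue("CRITICAL", marker, str(model.get_source()))
                )
        # 3. Stamp the scanner name on every issue before returning.
        return self.label_results(results)


settings = {
    "scanners": {
        "my_pkg.scanners.ExampleFormatScanner": {
            "enabled": True,
            "unsafe_operators": ["os.system"],
        }
    }
}
scanner = ExampleFormatScanner(settings)
bad = scanner.scan(Model("m.example", b"call os.system now", ["example_format"]))
other = scanner.scan(Model("m.pkl", b"os.system", ["pickle"]))
```

Here bad carries one labeled issue, while other is None because the second file was tagged with a different format.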

Step 3: Register the Scanner in Settings

Add the new scanner to the settings dictionary under the scanners key, using the fully-qualified class path as the key. Configure enabled: True and list the supported_extensions the scanner handles. If the scanner defines custom configuration (like unsafe operator lists), include those as additional keys. The scanner can be added to DEFAULT_SETTINGS in code or to a modelscan-settings.toml file for external configuration.

Key considerations:

  • The key must be the exact importable path: module.path.ClassName
  • The enabled flag controls whether the scanner is loaded at initialization
  • The supported_extensions list determines which file extensions trigger the scanner
  • Custom keys (e.g., unsafe_operators) are accessible via self._settings["scanners"][self.full_name()]
  • For TOML-based configuration, the structure mirrors the Python dictionary
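
The registration described in this step might look like the sketch below. The class path my_pkg.scanners.ExampleFormatScanner, the .example extension, and the scanner_config helper are hypothetical. The TOML text is an assumption about how the mirrored structure would be written; it is parsed only when the standard-library tomllib module (Python 3.11 and later) is available.

```python
from typing import Any, Dict, Optional

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters: skip the TOML comparison
    tomllib = None

SCANNER_PATH = "my_pkg.scanners.ExampleFormatScanner"  # hypothetical

settings: Dict[str, Any] = {
    "scanners": {
        SCANNER_PATH: {
            "enabled": True,
            "supported_extensions": [".example"],
            "unsafe_operators": ["os.system"],  # custom key
        }
    }
}

# The same entry as TOML. The dotted class path is quoted so that it
# stays a single key instead of becoming nested tables.
TOML_TEXT = """
[scanners."my_pkg.scanners.ExampleFormatScanner"]
enabled = true
supported_extensions = [".example"]
unsafe_operators = ["os.system"]
"""


def scanner_config(all_settings: Dict[str, Any], full_name: str) -> Dict[str, Any]:
    """Look up one scanner's section, as a scanner would via full_name()."""
    return all_settings["scanners"][full_name]


config = scanner_config(settings, SCANNER_PATH)
print(config["unsafe_operators"])

# True when the TOML parses to the same dictionary; None if tomllib is missing.
toml_matches: Optional[bool] = (
    tomllib.loads(TOML_TEXT) == settings if tomllib is not None else None
)
```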

Step 4: Register Format Middleware (If Needed)

If your scanner handles a new file format not already recognized by the FormatViaExtensionMiddleware, add a mapping from a new SupportedModelFormats property to the appropriate file extensions in the middleware configuration. This ensures the middleware pipeline tags files with the correct format context before they reach your scanner.

Key considerations:

  • FormatViaExtensionMiddleware maps file extensions to format properties
  • Format properties are instances of Property from modelscan/settings.py
  • The middleware sets model.set_context("formats", ...) which scanners check
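  • More than one format can list the same extension, in which case a file is tagged with each of them; in the simplest case an extension maps to a single format (e.g., .pb is tagged as tensorflow)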
  • Custom middleware classes can be created by extending MiddlewareBase
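
The extension-to-format tagging can be sketched as below. ExtensionFormatMiddleware and Model are stand-ins written for this example, not the real FormatViaExtensionMiddleware or Model classes, and the example_format entry is hypothetical. Only the .pb to tensorflow mapping is taken from the text above.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List


class Model:  # stand-in with only the context accessors
    def __init__(self, source: str) -> None:
        self._source = Path(source)
        self._context: Dict[str, Any] = {"formats": []}

    def get_source(self) -> Path:
        return self._source

    def get_context(self, key: str) -> Any:
        return self._context.get(key)

    def set_context(self, key: str, value: Any) -> None:
        self._context[key] = value


class ExtensionFormatMiddleware:
    """Hypothetical middleware: tag a model with every format whose
    extension list contains the file's suffix, then pass it on."""

    def __init__(self, settings: Dict[str, Any]) -> None:
        self._formats: Dict[str, List[str]] = settings["formats"]

    def __call__(self, model: Model, call_next: Callable[[Model], Any]) -> Any:
        suffix = model.get_source().suffix
        matched = [name for name, exts in self._formats.items() if suffix in exts]
        if matched:
            model.set_context("formats", model.get_context("formats") + matched)
        return call_next(model)


middleware = ExtensionFormatMiddleware(
    {"formats": {"tensorflow": [".pb"], "example_format": [".example"]}}
)

# call_next is the rest of the pipeline; here it just returns the model.
tagged = middleware(Model("weights.example"), lambda m: m)
untagged = middleware(Model("notes.txt"), lambda m: m)
print(tagged.get_context("formats"), untagged.get_context("formats"))
```

A file whose extension matches no entry passes through untagged, which is what lets a format-checking scanner return None for it.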

Step 5: Test the Scanner

Create test cases that exercise the scanner with both safe and unsafe model files. Generate test fixtures that contain known unsafe patterns for your target format, and verify that the scanner correctly identifies them with appropriate severity levels. Also test with clean files to ensure no false positives, and with unsupported formats to confirm the scanner returns None for irrelevant files.

Key considerations:

  • Reference tests/test_modelscan.py for examples of comprehensive scanner testing
  • Reference tests/test_utils.py for utilities that generate malicious model files
  • Test all severity levels your scanner can produce
  • Test error handling for corrupted or malformed files
  • Test that the scanner correctly skips files outside its supported format
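
The three core cases (unsafe file, clean file, unsupported format) can be sketched with the standard unittest module. The scan_file function is a hypothetical stand-in for a scanner's scan() method, and the byte marker is a toy pattern; real tests would build fixtures in the target format and call the actual scanner.

```python
import tempfile
import unittest
from pathlib import Path
from typing import List, Optional

UNSAFE_MARKER = b"os.system"  # toy pattern, for illustration only


def scan_file(path: Path, formats: List[str]) -> Optional[List[str]]:
    """Stand-in for scan(): None if the format is not ours,
    otherwise a list of issue descriptions (empty when clean)."""
    if "example_format" not in formats:
        return None
    data = path.read_bytes()
    return ["CRITICAL: os.system"] if UNSAFE_MARKER in data else []


class ExampleScannerTests(unittest.TestCase):
    def setUp(self) -> None:
        # Fixtures are generated per test in a throwaway directory.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.dir = Path(self._tmp.name)

    def _fixture(self, name: str, payload: bytes) -> Path:
        path = self.dir / name
        path.write_bytes(payload)
        return path

    def test_unsafe_file_is_flagged(self) -> None:
        path = self._fixture("bad.example", b"header os.system('id')")
        self.assertEqual(
            scan_file(path, ["example_format"]), ["CRITICAL: os.system"]
        )

    def test_clean_file_has_no_issues(self) -> None:
        path = self._fixture("good.example", b"header weights only")
        self.assertEqual(scan_file(path, ["example_format"]), [])

    def test_other_format_is_skipped(self) -> None:
        path = self._fixture("model.pkl", b"os.system")
        self.assertIsNone(scan_file(path, ["pickle"]))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleScannerTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The clean-file and skipped-file cases assert different values ([] versus None) because an empty result means the file was scanned and found clean, while None means it was never scanned.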
