Workflow:Protectai Modelscan Programmatic Model Scanning

Knowledge Sources	ModelScan, ModelScan README
Domains	ML_Security, Model_Scanning, Python_API
Last Updated	2026-02-14 12:00 GMT

Overview

End-to-end process for integrating ModelScan into Python applications and MLOps pipelines using the programmatic API.

Description

This workflow describes how to use the ModelScan Python API to scan ML model files from within application code, automated pipelines, or CI/CD systems. Rather than invoking the CLI, developers instantiate the ModelScan class directly, call its scan() method, and inspect the results programmatically, including issues grouped by severity, errors, and skipped files. This approach enables custom scan logic, conditional pipeline gates, integration with logging/alerting systems, and dynamic settings configuration. The API returns structured results as a Python dictionary and supports custom reporting modules.

Usage

Execute this workflow when you need to embed model scanning into an automated system rather than running it manually. Common scenarios include: MLOps pipeline gates that block unsafe models from progressing, CI/CD checks that validate model artifacts before deployment, custom applications that accept model uploads and must verify safety, and batch scanning systems that process multiple models and aggregate results.

Execution Steps

Step 1: Import and Initialize

Import the ModelScan class and DEFAULT_SETTINGS from the modelscan package. Create an instance of ModelScan by passing settings (either the defaults or a customized copy). During initialization, the constructor dynamically loads all enabled scanner classes using importlib based on the settings dictionary, and builds the middleware pipeline that handles format detection.

Key considerations:

The ModelScan constructor loads scanners and middlewares immediately
Scanner loading failures are captured as init errors rather than raising exceptions
Each scanner is a subclass of ScanBase loaded dynamically from its module path
The middleware pipeline consists of FormatViaExtensionMiddleware by default
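As a minimal sketch of this step, the helper below builds a ModelScan instance from either the defaults or a caller-supplied settings dictionary. The module paths modelscan.modelscan and modelscan.settings are assumed from the ModelScan README, and build_scanner is a hypothetical helper name, not part of the library. The imports are deferred into the function so the file can still be loaded where modelscan is not installed.

```python
from typing import Any, Dict, Optional


def build_scanner(settings: Optional[Dict[str, Any]] = None):
    """Create a ModelScan instance, falling back to the library defaults.

    build_scanner is a hypothetical helper, not part of modelscan.
    """
    # Lazy imports; the module paths are assumed from the ModelScan README.
    from modelscan.modelscan import ModelScan
    from modelscan.settings import DEFAULT_SETTINGS

    # Scanners and middlewares are loaded inside the constructor, so the
    # settings passed here must already be final.
    chosen = settings if settings is not None else DEFAULT_SETTINGS
    return ModelScan(chosen)
```

Calling build_scanner() with no arguments uses the defaults; passing a customized copy of the settings changes which scanners are loaded.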

Step 2: Configure Custom Settings

Copy and modify DEFAULT_SETTINGS to customize scan behavior. The settings dictionary controls four things: which scanners are enabled and their supported extensions; the unsafe globals list, which defines the Python modules and functions that trigger issues at each severity level; the reporting module (console, JSON, or a custom module); and the middleware pipeline. Settings can also be loaded from a TOML file and parsed using tomlkit. Because the constructor in Step 1 loads scanners and middlewares immediately, finalize the settings before creating the ModelScan instance.

Key considerations:

Always copy DEFAULT_SETTINGS before modifying to avoid side effects
The scanners key maps fully-qualified scanner class paths to their configuration
The unsafe_globals key maps severity levels to dictionaries of module-to-function mappings
Reporting can be switched between console, JSON, or custom modules
Middleware configuration controls format detection behavior
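The copy-then-modify pattern can be sketched as follows. EXAMPLE_DEFAULTS is a small stand-in that mirrors the top-level shape described above so the snippet runs without modelscan installed; in real use, pass DEFAULT_SETTINGS instead. The inner key names (enabled, supported_extensions, module, settings), the "*" wildcard, and the report class paths are assumptions to check against the installed version, and customize_settings and load_settings_from_toml are hypothetical helper names.

```python
import copy
from typing import Any, Dict

# Stand-in for DEFAULT_SETTINGS; inner key names are assumptions.
EXAMPLE_DEFAULTS: Dict[str, Any] = {
    "scanners": {
        "example_pkg.scanners.ExampleScan": {  # hypothetical class path
            "enabled": True,
            "supported_extensions": [".pkl"],
        },
    },
    "unsafe_globals": {"CRITICAL": {"os": "*"}, "HIGH": {}, "MEDIUM": {}, "LOW": {}},
    "reporting": {"module": "modelscan.reports.ConsoleReport", "settings": {}},
}


def customize_settings(
    defaults: Dict[str, Any],
    report_module: str,
    extra_unsafe: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a modified deep copy; the defaults are left untouched."""
    # deepcopy matters: a shallow copy would share the nested dictionaries.
    settings = copy.deepcopy(defaults)
    settings["reporting"]["module"] = report_module
    for severity, modules in extra_unsafe.items():
        settings["unsafe_globals"].setdefault(severity, {}).update(modules)
    return settings


def load_settings_from_toml(path: str) -> Dict[str, Any]:
    """Parse a TOML settings file into a plain dictionary using tomlkit."""
    import tomlkit  # lazy import so the rest of the sketch runs without it

    with open(path, encoding="utf-8") as handle:
        return tomlkit.parse(handle.read()).unwrap()


custom = customize_settings(
    EXAMPLE_DEFAULTS,
    report_module="modelscan.reports.JSONReport",
    extra_unsafe={"HIGH": {"webbrowser": "*"}},
)
```

A deep copy is used rather than dict.copy() because the settings are nested; a shallow copy would let edits to the reporting or unsafe_globals sections leak back into the shared defaults.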

Step 3: Execute Scan

Call scanner.scan(path), where scanner is the ModelScan instance created in Step 1, with a string or Path pointing to a model file or directory. The scan method resets internal state, resolves the path, and iterates over all discovered model files. For each file, it opens a Model context manager wrapping the file stream, runs the middleware pipeline to tag the file with its format, and dispatches it to all registered scanners. Zip archives are automatically opened and their contents scanned individually. The method returns a structured results dictionary containing summary counts, issue details, errors, and skipped files.

Key considerations:

The scan() method resets issues, errors, and skipped lists on each call
Directory paths cause recursive file discovery
Zip files (.zip, .npz) are automatically extracted and inner files scanned
Nested zip files are not supported and produce NestedZipError
The returned dictionary contains summary, issues, errors, and skipped keys
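Because scan() resets its state on every call, one instance can be reused across several paths as long as each returned dictionary is stored before the next call. The sketch below illustrates that for a batch of paths. scan_many and count_sections are hypothetical helper names, and treating issues, errors, and skipped as lists is an assumption about the dictionary layout, so count_sections returns None for any value that is not a list.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Optional, Union


def scan_many(scanner: Any, paths: Iterable[Union[str, Path]]) -> Dict[str, Dict[str, Any]]:
    """Scan each path with one reusable scanner and keep results per path.

    scanner is expected to be a ModelScan instance (or anything exposing a
    compatible scan(path) method).
    """
    # scan() resets the issues, errors, and skipped lists on each call, so
    # each result is captured immediately.
    return {str(path): scanner.scan(path) for path in paths}


def count_sections(results: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Count entries under the top-level keys named in the text above."""
    counts: Dict[str, Optional[int]] = {}
    for key in ("issues", "errors", "skipped"):
        value = results.get(key, [])
        # The list shape is an assumption; report None if it does not hold.
        counts[key] = len(value) if isinstance(value, (list, tuple)) else None
    return counts
```

With modelscan installed, scan_many(ModelScan(), ["models/a.pkl", "models/b.h5"]) would return one results dictionary per path; the file names here are placeholders.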

Step 4: Inspect Issues

Access the scan results through the ModelScan instance properties. The issues property returns an Issues object containing all detected problems. Call group_by_severity() to organize issues into CRITICAL, HIGH, MEDIUM, and LOW buckets. Each issue contains an OperatorIssueDetails object with the unsafe module name, operator name, source file path, and severity level. The errors property lists any scanner failures, and the skipped property lists files that could not be scanned.

Key considerations:

scanner.issues.all_issues provides the flat list of all issues
scanner.issues.group_by_severity() returns a dictionary keyed by severity name
Each issue has code, severity, and details attributes
OperatorIssueDetails provides module, operator, source, and severity fields
The scanned property lists all successfully scanned file paths
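A minimal sketch of walking the grouped issues, using only the attributes named above (group_by_severity(), details, module, operator, source). summarize_issues is a hypothetical helper name, and it assumes that group_by_severity() returns a dictionary whose values are lists of issue objects.

```python
from typing import Any, Dict, List


def summarize_issues(issues: Any) -> Dict[str, List[Dict[str, str]]]:
    """Flatten an Issues-like object into plain dictionaries per severity.

    issues is expected to be scanner.issues from a ModelScan instance.
    """
    summary: Dict[str, List[Dict[str, str]]] = {}
    for severity, group in issues.group_by_severity().items():
        summary[severity] = [
            {
                # str() keeps the output serializable even if source is a Path.
                "module": str(issue.details.module),
                "operator": str(issue.details.operator),
                "source": str(issue.details.source),
            }
            for issue in group
        ]
    return summary
```

After a scan, summarize_issues(scanner.issues) yields a structure that can be logged or forwarded to an alerting system without depending on ModelScan's own classes.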

Step 5: Generate Report or Process Results

Either call scanner.generate_report() to use the configured reporting module (ConsoleReport or JSONReport), or process the results dictionary directly for custom handling. The JSON report includes machine-readable output suitable for storage, comparison, or forwarding to security dashboards. For pipeline integration, use the presence and severity of issues to make pass/fail decisions programmatically.

Key considerations:

generate_report() dynamically loads the reporting module from settings
ConsoleReport uses the rich library for formatted terminal output
JSONReport outputs structured JSON and optionally writes to a file
Custom reporting modules can be created by subclassing Report and configuring in settings
The results dictionary from scan() can be serialized directly for custom processing
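For the pass/fail decision, one possible gate is sketched below. The exit-code convention (0 pass, 1 blocking issues, 2 scanner errors) and the choice of CRITICAL and HIGH as blocking levels are assumptions made for illustration, not ModelScan behavior, and gate_exit_code and run_gate are hypothetical helper names. run_gate imports modelscan lazily, with the module path assumed from the README.

```python
import json
from typing import Any, Mapping, Sequence

# Severity levels that fail the pipeline; adjust to local policy.
BLOCKING_LEVELS = ("CRITICAL", "HIGH")


def gate_exit_code(
    issues_by_severity: Mapping[str, Sequence[Any]],
    errors: Sequence[Any],
    blocking: Sequence[str] = BLOCKING_LEVELS,
) -> int:
    """Map scan outcomes to an exit code (hypothetical convention)."""
    if errors:
        return 2  # the scan itself was incomplete, so treat it as a failure
    if any(issues_by_severity.get(level) for level in blocking):
        return 1  # at least one issue at a blocking severity
    return 0


def run_gate(path: str) -> int:
    """Scan a path, print the raw results as JSON, and return the exit code."""
    from modelscan.modelscan import ModelScan  # lazy import, path assumed

    scanner = ModelScan()
    results = scanner.scan(path)
    # default=str guards against values json cannot encode natively.
    print(json.dumps(results, indent=2, default=str))
    return gate_exit_code(scanner.issues.group_by_severity(), scanner.errors)
```

In a CI job, the return value of run_gate would typically be passed to sys.exit() so that the pipeline stage fails when the gate does.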
