Workflow:Protectai Modelscan Programmatic Model Scanning
| Knowledge Sources | |
|---|---|
| Domains | ML_Security, Model_Scanning, Python_API |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
End-to-end process for integrating ModelScan into Python applications and MLOps pipelines using the programmatic API.
Description
This workflow describes how to use the ModelScan Python API to scan ML model files from within application code, automated pipelines, or CI/CD systems. Rather than invoking the CLI, developers instantiate the ModelScan class directly, call its scan() method, and inspect the results in code: issues grouped by severity, errors, and skipped files. This approach enables custom scan logic, conditional pipeline gates, integration with logging and alerting systems, and dynamic settings configuration. The scan() method returns structured results as a Python dictionary, and reporting can be delegated to custom reporting modules.
Usage
Execute this workflow when you need to embed model scanning into an automated system rather than running it manually. Common scenarios include: MLOps pipeline gates that block unsafe models from progressing, CI/CD checks that validate model artifacts before deployment, custom applications that accept model uploads and must verify safety, and batch scanning systems that process multiple models and aggregate results.
Execution Steps
Step 1: Import and Initialize
Import the ModelScan class and DEFAULT_SETTINGS from the modelscan package. Create an instance of ModelScan by passing settings (either the defaults or a customized copy). During initialization, the constructor dynamically loads all enabled scanner classes using importlib based on the settings dictionary, and builds the middleware pipeline that handles format detection.
Key considerations:
- The ModelScan constructor loads scanners and middlewares immediately
- Scanner loading failures are captured as init errors rather than raising exceptions
- Each scanner is a subclass of ScanBase loaded dynamically from its module path
- The middleware pipeline consists of FormatViaExtensionMiddleware by default
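The loading behavior described above can be illustrated with a small stand-alone sketch. It is not ModelScan's own code: `load_enabled_classes` and the `enabled` flag in the stand-in entries are hypothetical, and standard-library classes play the role of scanner classes so the example runs without ModelScan installed. It shows the general pattern of resolving dotted class paths with importlib and recording failures instead of raising them.

```python
import importlib
from typing import Any, Dict, List, Tuple


def load_enabled_classes(
    entries: Dict[str, Dict[str, Any]],
) -> Tuple[List[type], List[str]]:
    """Resolve dotted class paths, collecting failures instead of raising.

    Hypothetical helper: it mirrors the pattern described above, not
    ModelScan's actual implementation.
    """
    loaded: List[type] = []
    init_errors: List[str] = []
    for dotted_path, config in entries.items():
        if not config.get("enabled", False):
            continue  # disabled entries are never imported
        module_name, _, class_name = dotted_path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            loaded.append(getattr(module, class_name))
        except (ImportError, AttributeError, ValueError) as exc:
            # Record the failure so the caller can report it later.
            init_errors.append(f"{dotted_path}: {exc}")
    return loaded, init_errors


# Stand-in entries: standard-library classes play the role of scanners.
example_entries = {
    "json.JSONDecoder": {"enabled": True},
    "collections.OrderedDict": {"enabled": False},
    "no_such_package.MissingScanner": {"enabled": True},
}
classes, errors = load_enabled_classes(example_entries)
```

Collecting errors in a list rather than raising means one broken or missing class does not prevent the remaining ones from loading, which matches the init-error behavior noted in the considerations.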
Step 2: Configure Custom Settings
Copy and modify DEFAULT_SETTINGS to customize scan behavior. The settings dictionary controls four things: which scanners are enabled and which file extensions each supports; the unsafe globals list, which defines the Python modules and functions that trigger issues at each severity level; the reporting module (console or JSON); and the middleware pipeline. Settings can also be stored in a TOML file and parsed with tomlkit.
Key considerations:
- Always copy DEFAULT_SETTINGS before modifying it (a deep copy, since the dictionary is nested) so that changes do not leak into other code that uses the defaults
- The scanners key maps fully-qualified scanner class paths to their configuration
- The unsafe_globals key maps severity levels to dictionaries of module-to-function mappings
- Reporting can be switched among console, JSON, and custom modules
- Middleware configuration controls format detection behavior
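The copy-then-modify advice can be sketched as follows. To keep the example runnable without ModelScan, `STAND_IN_DEFAULTS` is a hypothetical, heavily reduced stand-in for DEFAULT_SETTINGS: the top-level key names follow the description above, while the scanner path, the `enabled` and `supported_extensions` entries, and the function lists are illustrative assumptions. `customized_settings` is likewise a hypothetical helper.

```python
import copy
from typing import Any, Dict

# Stand-in for modelscan's DEFAULT_SETTINGS so the sketch runs on its own.
# Nested values are illustrative and much smaller than the real defaults.
STAND_IN_DEFAULTS: Dict[str, Any] = {
    "scanners": {
        "example_pkg.ExampleScanner": {
            "enabled": True,
            "supported_extensions": [".pkl"],
        },
    },
    "unsafe_globals": {
        "CRITICAL": {"os": ["system"]},
        "HIGH": {},
    },
    "reporting": {"module": "example_pkg.ExampleReport"},
}


def customized_settings(defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a modified deep copy, leaving the defaults untouched."""
    settings = copy.deepcopy(defaults)
    # Disable one scanner and flag an extra module at HIGH severity.
    settings["scanners"]["example_pkg.ExampleScanner"]["enabled"] = False
    settings["unsafe_globals"]["HIGH"]["webbrowser"] = ["open"]
    return settings


custom = customized_settings(STAND_IN_DEFAULTS)
```

A shallow copy would not be enough here: the nested dictionaries would still be shared, so the edits to `scanners` and `unsafe_globals` would also change the defaults. With ModelScan installed, the same approach would apply to the imported DEFAULT_SETTINGS, with the result passed to the ModelScan constructor.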
Step 3: Execute Scan
Call scanner.scan(path) with a string or Path pointing to a model file or directory. The scan method resets internal state, resolves the path, and iterates over all discovered model files. For each file, it opens a Model context manager wrapping the file stream, runs the middleware pipeline to tag the file with its format, and dispatches it to all registered scanners. Zip archives are automatically opened and their contents scanned individually. The method returns a structured results dictionary containing summary counts, issue details, errors, and skipped files.
Key considerations:
- The scan() method resets issues, errors, and skipped lists on each call
- Directory paths cause recursive file discovery
- Zip files (.zip, .npz) are automatically extracted and inner files scanned
- Nested zip files are not supported and produce NestedZipError
- The returned dictionary contains summary, issues, errors, and skipped keys
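A minimal sketch of calling scan() and reducing the returned dictionary to a few counts follows. `scan_and_summarize` and `StubScanner` are hypothetical names introduced for illustration; the stub stands in for a ModelScan instance so the example runs without the package. The top-level keys (`summary`, `issues`, `errors`, `skipped`) come from the description above, but the shape of each value is an assumption, so the helper reads them defensively.

```python
from pathlib import Path
from typing import Any, Dict, Union


def scan_and_summarize(scanner: Any, path: Union[str, Path]) -> Dict[str, Any]:
    """Run one scan and reduce the results to a few counts.

    `scanner` is expected to behave like a ModelScan instance, i.e. to
    expose scan(path) returning a dictionary. Missing keys count as empty.
    """
    results = scanner.scan(path)
    return {
        "path": str(path),
        "issue_count": len(results.get("issues", [])),
        "error_count": len(results.get("errors", [])),
        "skipped_count": len(results.get("skipped", [])),
    }


class StubScanner:
    """Hypothetical stand-in so the sketch runs without ModelScan."""

    def scan(self, path: Union[str, Path]) -> Dict[str, Any]:
        # Illustrative payload only; real results carry more detail.
        return {
            "summary": {},
            "issues": [{"severity": "CRITICAL"}],
            "errors": [],
            "skipped": [],
        }


overview = scan_and_summarize(StubScanner(), Path("models") / "model.pkl")
```

Because scan() resets its internal state on each call, one scanner instance can be reused across several paths, with a helper like this called once per path.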
Step 4: Inspect Issues
Access the scan results through the properties of the ModelScan instance. The issues property returns an Issues object containing all detected problems. Call group_by_severity() to organize issues into CRITICAL, HIGH, MEDIUM, and LOW buckets. Each issue carries an OperatorIssueDetails object with the unsafe module name, operator name, source file path, and severity level. The errors property lists any scanner failures, and the skipped property lists files that could not be scanned.
Key considerations:
- scanner.issues.all_issues provides the flat list of all issues
- scanner.issues.group_by_severity() returns a dictionary keyed by severity name
- Each issue has code, severity, and details attributes
- OperatorIssueDetails provides module, operator, source, and severity fields
- The scanned property lists all successfully scanned file paths
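The inspection step might look like the sketch below. It relies only on the attributes named above (`issues.group_by_severity()` and, on each issue's details, `module`, `operator`, and `source`). `describe_issues` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a scanner and an issue so the example runs without ModelScan.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def describe_issues(scanner: Any) -> Dict[str, List[str]]:
    """Turn grouped issues into one readable line per issue."""
    lines_by_severity: Dict[str, List[str]] = {}
    for severity, issues in scanner.issues.group_by_severity().items():
        lines_by_severity[severity] = [
            f"{issue.details.module}.{issue.details.operator} "
            f"in {issue.details.source}"
            for issue in issues
        ]
    return lines_by_severity


# Hypothetical stand-ins for an issue and a scanner that has already run.
fake_issue = SimpleNamespace(
    details=SimpleNamespace(module="os", operator="system", source="model.pkl")
)
fake_scanner = SimpleNamespace(
    issues=SimpleNamespace(group_by_severity=lambda: {"CRITICAL": [fake_issue]})
)
report_lines = describe_issues(fake_scanner)
```

Lines in this form are convenient to forward to a logging or alerting system, one message per detected operator.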
Step 5: Generate Report or Process Results
Either call scanner.generate_report() to use the configured reporting module (ConsoleReport or JSONReport), or process the results dictionary directly for custom handling. The JSON report includes machine-readable output suitable for storage, comparison, or forwarding to security dashboards. For pipeline integration, use the presence and severity of issues to make pass/fail decisions programmatically.
Key considerations:
- generate_report() dynamically loads the reporting module from settings
- ConsoleReport uses the rich library for formatted terminal output
- JSONReport outputs structured JSON and optionally writes to a file
- Custom reporting modules can be created by subclassing Report and registering the subclass in the reporting section of the settings
- The results dictionary from scan() can be serialized directly for custom processing
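A pass/fail gate based on severity could be sketched as below. `blocking_severities` and `gate_exit_code` are hypothetical helpers, and the default policy of failing on CRITICAL and HIGH is an assumption chosen for illustration. The input is a dictionary keyed by severity name, in the form that group_by_severity() is described as returning; plain strings stand in for issue objects here.

```python
import json
from typing import Any, Dict, Iterable, List


def blocking_severities(
    grouped: Dict[str, List[Any]],
    fail_on: Iterable[str] = ("CRITICAL", "HIGH"),
) -> List[str]:
    """Return the severities in `fail_on` that have at least one issue."""
    return [level for level in fail_on if grouped.get(level)]


def gate_exit_code(
    grouped: Dict[str, List[Any]],
    fail_on: Iterable[str] = ("CRITICAL", "HIGH"),
) -> int:
    """Map the gate decision to a process exit code (0 = pass, 1 = fail)."""
    return 1 if blocking_severities(grouped, fail_on) else 0


# Illustrative grouping; strings stand in for real issue objects.
example_grouped = {"CRITICAL": [], "HIGH": ["issue-1"], "LOW": ["issue-2"]}
strict_code = gate_exit_code(example_grouped)
lenient_code = gate_exit_code(example_grouped, fail_on=("CRITICAL",))

# Counts per severity serialize cleanly for storage or dashboards.
counts_json = json.dumps(
    {level: len(items) for level, items in example_grouped.items()},
    sort_keys=True,
)
```

In a CI job, the result of scanner.issues.group_by_severity() would take the place of `example_grouped`, and the returned code could be passed to sys.exit() so the pipeline stage fails when a blocking severity is present.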