Principle:Protectai Modelscan Middleware Pipeline
| Knowledge Sources | |
|---|---|
| Domains | ML_Security, Software_Architecture |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A chain-of-responsibility preprocessing pipeline that annotates model files with metadata before scanners process them, enabling format detection and extensible pre-scan transformations.
Description
The Middleware Pipeline principle provides a preprocessing layer between file iteration and scanner dispatch. Before a model file reaches any scanner, it passes through a chain of middleware functions that can inspect and annotate the file with metadata. The primary use case is format detection: the middleware examines the file extension and tags the Model object with its format type (e.g., PICKLE, PYTORCH, TENSORFLOW), so scanners can quickly determine whether they should process the file.
The pipeline follows the chain-of-responsibility pattern: each middleware receives the model and a call_next function. It can modify the model's context, then call call_next to pass control to the next middleware in the chain. This design allows:
- Sequential processing: Middleware executes in registration order
- Short-circuiting: A middleware can choose not to call call_next, which stops every later middleware in the chain from running
- Composability: New middleware can be added without modifying existing ones
- Dynamic loading: Middleware classes are loaded via importlib from settings
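The dynamic-loading point can be illustrated with a minimal sketch. The settings layout (a dotted class path mapped to per-middleware settings), the `TagMiddleware` class, and the `demo_middlewares` module name are all assumptions made for this example, not modelscan's actual configuration schema. Only `importlib.import_module` and `getattr` are standard library calls.

```python
import importlib
import sys
import types
from typing import Any, Callable, Dict, List


class TagMiddleware:
    """Hypothetical middleware: records its tag in the context, then continues."""

    def __init__(self, settings: Dict[str, Any]) -> None:
        self.tag = settings["tag"]

    def __call__(
        self, context: Dict[str, Any], call_next: Callable[[Dict[str, Any]], None]
    ) -> None:
        context.setdefault("seen", []).append(self.tag)
        call_next(context)


# Register a throwaway module so the dotted path below is importable in this demo.
demo_module = types.ModuleType("demo_middlewares")
demo_module.TagMiddleware = TagMiddleware
sys.modules["demo_middlewares"] = demo_module

# Assumed settings shape: dotted class path -> settings for that middleware.
SETTINGS = {
    "middlewares": {
        "demo_middlewares.TagMiddleware": {"tag": "first"},
    }
}


def load_middlewares(settings: Dict[str, Any]) -> List[Any]:
    """Import each configured class by dotted path and instantiate it."""
    loaded = []
    for dotted_path, middleware_settings in settings["middlewares"].items():
        module_name, class_name = dotted_path.rsplit(".", 1)
        module = importlib.import_module(module_name)
        middleware_class = getattr(module, class_name)
        loaded.append(middleware_class(middleware_settings))
    return loaded


middlewares = load_middlewares(SETTINGS)
context: Dict[str, Any] = {}
middlewares[0](context, lambda ctx: None)  # terminal call_next does nothing
```

Because the class is resolved from a string at load time, a new middleware can be enabled by editing settings alone, which is what makes the composability bullet practical.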
Usage
Apply this principle when:
- Understanding how modelscan determines which scanner should handle which file
- Adding a new file format that needs a format tag before scanning
- Implementing custom preprocessing (e.g., content-based format detection, metadata extraction)
- Configuring the middleware pipeline in settings
Theoretical Basis
The middleware pipeline implements a recursive chain-of-responsibility:
    # Pseudo-code for middleware pipeline execution
    def run(model, index=0):
        if index < len(middlewares):
            middlewares[index](model, lambda m: run(m, index + 1))
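A runnable variant of the recursion above may help show ordering and short-circuiting in practice. This is a sketch under simplifying assumptions: a plain dict stands in for the model, and `run_pipeline`, `make_logger`, and `stop_here` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

Context = Dict[str, list]
CallNext = Callable[[Context], None]
Middleware = Callable[[Context, CallNext], None]


def run_pipeline(middlewares: List[Middleware], context: Context, index: int = 0) -> None:
    """Invoke middleware `index`, handing it a callback that runs the rest of the chain."""
    if index < len(middlewares):
        middlewares[index](
            context, lambda ctx: run_pipeline(middlewares, ctx, index + 1)
        )


def make_logger(name: str) -> Middleware:
    """Build a middleware that records its name and then continues the chain."""

    def middleware(context: Context, call_next: CallNext) -> None:
        context["trace"].append(name)
        call_next(context)

    return middleware


def stop_here(context: Context, call_next: CallNext) -> None:
    """Record a marker and deliberately skip call_next, ending the chain."""
    context["trace"].append("stop")


full: Context = {"trace": []}
run_pipeline([make_logger("a"), make_logger("b")], full)

halted: Context = {"trace": []}
run_pipeline([make_logger("a"), stop_here, make_logger("b")], halted)

print(full["trace"])    # ['a', 'b']
print(halted["trace"])  # ['a', 'stop']
```

The second run shows that a middleware placed after a short-circuiting one never executes, so registration order matters when one middleware depends on another's annotations.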
The FormatViaExtensionMiddleware performs extension-based format tagging:
    # Pseudo-code for format detection
    def __call__(self, model, call_next):
        extension = model.get_source().suffix  # e.g., ".pkl"
        formats = [fmt for fmt, exts in format_map.items() if extension in exts]
        if formats:
            model.set_context("formats", formats)
        # Continue the chain whether or not a format was matched
        call_next(model)
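To make the tagging step concrete, here is a self-contained sketch. The `Model` class below is a minimal stand-in exposing only the `get_source`, `set_context`, and `get_context` methods used in the pseudo-code; it is not modelscan's class. `ExtensionTagger` and the contents of `FORMAT_MAP` are likewise assumptions chosen for the example.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List


class Model:
    """Minimal stand-in for the Model object (illustrative only)."""

    def __init__(self, source: Path) -> None:
        self._source = source
        self._context: Dict[str, Any] = {}

    def get_source(self) -> Path:
        return self._source

    def set_context(self, key: str, value: Any) -> None:
        self._context[key] = value

    def get_context(self, key: str) -> Any:
        return self._context.get(key)


# Assumed mapping of format name -> file extensions; real settings may differ.
FORMAT_MAP: Dict[str, List[str]] = {
    "pickle": [".pkl", ".pickle"],
    "pytorch": [".pt", ".bin"],
}


class ExtensionTagger:
    """Hypothetical middleware that tags a model with formats matching its suffix."""

    def __init__(self, format_map: Dict[str, List[str]]) -> None:
        self._format_map = format_map

    def __call__(self, model: Model, call_next: Callable[[Model], None]) -> None:
        extension = model.get_source().suffix
        formats = [
            fmt for fmt, exts in self._format_map.items() if extension in exts
        ]
        if formats:
            model.set_context("formats", formats)
        # Always pass the model on, even when no format matched.
        call_next(model)


tagger = ExtensionTagger(FORMAT_MAP)
known = Model(Path("weights.pkl"))
unknown = Model(Path("README.md"))
reached: List[Model] = []  # models that made it past the tagger

tagger(known, reached.append)
tagger(unknown, reached.append)
```

In this sketch an unrecognized extension leaves the `formats` context unset rather than stopping the chain, so later middleware (for example, a content-based detector) still gets a chance to tag the file.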
Scanners then check the format context to decide if they should process the file:
    # Scanner format checking pattern
    formats = model.get_context("formats") or []
    if MY_FORMAT not in [f.value for f in formats]:
        return None  # Not my format
This decouples format detection from scanning logic, allowing either to be modified independently.
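The scanner-side check can be sketched as follows. The `ModelFormat` enum and `PickleScanner` class are hypothetical and exist only to show the pattern; a plain dict stands in for the model's context, and the return string is a placeholder for real scan results.

```python
from enum import Enum
from typing import Any, Dict, Optional


class ModelFormat(Enum):
    """Hypothetical format tags; values are compared as strings by scanners."""

    PICKLE = "pickle"
    PYTORCH = "pytorch"


class PickleScanner:
    """Illustrative scanner that only handles models tagged as pickle."""

    FORMAT = ModelFormat.PICKLE.value

    def scan(self, context: Dict[str, Any]) -> Optional[str]:
        formats = context.get("formats") or []
        if self.FORMAT not in [f.value for f in formats]:
            return None  # Not this scanner's format; skip the file
        return "scanned as pickle"  # placeholder for real scan results


scanner = PickleScanner()
matched = scanner.scan({"formats": [ModelFormat.PICKLE]})
skipped = scanner.scan({"formats": [ModelFormat.PYTORCH]})
untagged = scanner.scan({})
```

The scanner never looks at file names or extensions itself. Changing how formats are detected therefore only touches middleware, while adding a scanner only requires agreeing on a format tag.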