Principle:Protectai Modelscan Model File Abstraction

Knowledge Sources	ModelScan
Domains	ML_Security, Software_Architecture
Last Updated	2026-02-14 12:00 GMT

Overview

A uniform file abstraction that wraps both filesystem paths and in-memory byte streams into a single interface, enabling scanners to process top-level files and zip archive entries identically.

Description

Model File Abstraction solves a key challenge in model scanning: ML model files can exist as standalone files on disk, or as entries within zip archives (e.g., PyTorch .pt files, .npz files, .keras files). Without abstraction, every scanner would need separate code paths for filesystem files and zip entries.

The abstraction provides a unified interface with three capabilities:

Source identification: A path-like identifier for the file (filesystem path or "archive:entry" notation)
Stream access: A seekable byte stream for reading file contents
Context metadata: A key-value store for attaching preprocessing results (e.g., detected format)

The abstraction also implements the context manager protocol, automatically opening file streams on entry and closing them on exit, preventing resource leaks during scanning.
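The three capabilities and the context manager behavior can be illustrated with a small standard-library sketch. The class name `FileHandle`, its private fields, and the rule that it only closes streams it opened itself are assumptions made for illustration; this is not modelscan's actual implementation.

```python
import io
import os
import tempfile
from pathlib import Path
from typing import IO, Any, Dict, Optional, Union


class FileHandle:
    """Hypothetical stand-in for the abstraction described above."""

    def __init__(self, source: Union[str, Path], stream: Optional[IO[bytes]] = None):
        self._source = Path(source)
        self._stream = stream
        # Assumption: only streams opened by this object are closed by it.
        self._owns_stream = False
        self._context: Dict[str, Any] = {}

    def __enter__(self) -> "FileHandle":
        if self._stream is None:
            self._stream = open(self._source, "rb")
            self._owns_stream = True
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        if self._owns_stream and self._stream is not None:
            self._stream.close()
            self._stream = None
            self._owns_stream = False

    def get_source(self) -> Path:
        return self._source

    def get_stream(self, offset: int = 0) -> IO[bytes]:
        if self._stream is None:
            raise ValueError("stream is not open; use the object in a with-block")
        self._stream.seek(offset)
        return self._stream

    def get_context(self, key: str) -> Any:
        return self._context.get(key)

    def set_context(self, key: str, value: Any) -> None:
        self._context[key] = value


def demo() -> tuple:
    """Read bytes from a file on disk and from a pre-opened in-memory stream."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        with open(path, "wb") as f:
            f.write(b"\x80\x04data")
        with FileHandle(path) as on_disk:
            on_disk.set_context("format", "pickle")
            disk_bytes = on_disk.get_stream().read()
            fmt = on_disk.get_context("format")
    with FileHandle("archive.zip:weights.bin", io.BytesIO(b"\x80\x04data")) as in_memory:
        memory_bytes = in_memory.get_stream(offset=2).read()
    return disk_bytes, fmt, memory_bytes
```

Calling `demo()` exercises both construction paths: one where the object opens the file itself, and one where it receives a stream that already exists, as a zip entry would.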

Usage

Apply this principle when:

Understanding how modelscan handles both regular files and zip archive contents
Implementing a scanner that needs to read model file bytes
Working with the middleware pipeline that attaches format context to models
Iterating over files in a directory that may contain zip archives
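As a rough sketch of the scanner and middleware cases listed above, the following shows a middleware step attaching a format to the context and a scanner step reading it back. The `ModelLike` protocol, the `InMemoryModel` stand-in, the `"format"` context key, and the magic-byte checks are all illustrative assumptions rather than modelscan's own API.

```python
import io
from typing import IO, Any, Dict, Optional, Protocol


class ModelLike(Protocol):
    """Duck-typed view of the interface; method names follow this page."""

    def get_stream(self, offset: int = 0) -> IO[bytes]: ...
    def get_context(self, key: str) -> Any: ...
    def set_context(self, key: str, value: Any) -> None: ...


class InMemoryModel:
    """Hypothetical stand-in so the example runs without modelscan installed."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._context: Dict[str, Any] = {}

    def get_stream(self, offset: int = 0) -> IO[bytes]:
        self._stream.seek(offset)
        return self._stream

    def get_context(self, key: str) -> Any:
        return self._context.get(key)

    def set_context(self, key: str, value: Any) -> None:
        self._context[key] = value


def detect_format(model: ModelLike) -> None:
    """Middleware step (illustrative): tag the model with a guessed format."""
    header = model.get_stream().read(4)
    if header == b"PK\x03\x04":      # ZIP local file header signature
        model.set_context("format", "zip")
    elif header[:1] == b"\x80":      # PROTO opcode used by pickle protocol 2+
        model.set_context("format", "pickle")
    else:
        model.set_context("format", "unknown")


def scan(model: ModelLike) -> Optional[str]:
    """Scanner step (illustrative): act only on formats it understands."""
    if model.get_context("format") != "pickle":
        return None
    # Rewind to offset 0 because the middleware already consumed the header.
    size = len(model.get_stream(0).read())
    return f"pickle payload, {size} bytes"
```

Because `scan` depends only on the three methods, it does not need to know whether the bytes came from disk or from a zip entry.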

Theoretical Basis

The abstraction follows the Adapter pattern, presenting a uniform interface over two different data sources:

# Pseudo-code for the Model abstraction (interface only, method bodies omitted)
from pathlib import Path
from typing import IO, Any, Optional, Union

class Model:
    def __init__(self, source: Union[str, Path], stream: Optional[IO[bytes]] = None):
        """
        source: Path (for files) or str "archive:entry" (for zip contents)
        stream: None (will open file) or IO[bytes] (pre-opened zip entry)
        """

    def get_source(self) -> Path:
        """Return path identifier."""

    def get_stream(self, offset: int = 0) -> IO[bytes]:
        """Return seekable byte stream positioned at offset."""

    def get_context(self, key: str) -> Any:
        """Get metadata (e.g., detected format)."""

    def set_context(self, key: str, value: Any) -> None:
        """Set metadata (used by middleware)."""

The iteration logic in _iterate_models() produces Model objects for both cases:

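# Pseudo-code for model iteration
import zipfile

def _iterate_models(files):
    for file in files:
        with Model(file) as model:
            yield model  # Top-level file

            if zipfile.is_zipfile(file):
                with zipfile.ZipFile(file) as zf:
                    for entry in zf.namelist():
                        # Zip entry, identified as "archive:entry"
                        with zf.open(entry) as stream:
                            yield Model(f"{file}:{entry}", stream)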

This ensures scanners receive a consistent interface regardless of whether the data comes from disk or a zip entry.
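The iteration idea can be tried end to end with only the standard library. In this sketch the `Item` record, `iterate_items`, and `first_bytes` are hypothetical names used for illustration; they mirror the shape of the pseudo-code above, not modelscan's actual code.

```python
import os
import tempfile
import zipfile
from typing import IO, Iterator, List, NamedTuple, Tuple


class Item(NamedTuple):
    """Hypothetical minimal record: a source label plus an open byte stream."""

    source: str
    stream: IO[bytes]


def iterate_items(paths: List[str]) -> Iterator[Item]:
    for path in paths:
        with open(path, "rb") as top_level:
            yield Item(path, top_level)  # the file itself
            if not zipfile.is_zipfile(path):
                continue
            with zipfile.ZipFile(path) as archive:
                for name in archive.namelist():
                    with archive.open(name) as entry:
                        # Entries are labeled with "archive:entry" notation.
                        yield Item(f"{path}:{name}", entry)


def first_bytes(item: Item, n: int = 4) -> bytes:
    """Consumer code: identical for files on disk and zip entries."""
    return item.stream.read(n)


def demo() -> List[Tuple[str, bytes]]:
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "model.pkl")
        with open(plain, "wb") as f:
            f.write(b"\x80\x04plain")
        archive = os.path.join(tmp, "model.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("data.pkl", b"\x80\x04inner")
        return [
            (os.path.basename(item.source), first_bytes(item))
            for item in iterate_items([plain, archive])
        ]
```

Each yielded stream is only valid until the generator advances, so a consumer should finish reading an item before requesting the next one. This is a consequence of yielding from inside `with` blocks in this sketch.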

Related Pages

Implemented By

Implementation:Protectai_Modelscan_Model
