Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Deepseek ai Janus Image Loading and Preprocessing

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Preprocessing
Last Updated 2026-02-10 09:30 GMT

Overview

A procedure for loading images from file paths or base64 strings and converting them to a standardized PIL format for downstream vision processing.

Description

Image loading and preprocessing extracts image data from conversation message dictionaries. Each message may contain an images field with file paths or base64-encoded data URIs. The images must be loaded into PIL Image objects and converted to RGB format before being passed to the vision encoder.

This step is distinct from the lower-level image normalization (resize, rescale, normalize) performed by the VLMImageProcessor, which happens internally during tokenization. This principle covers only the initial loading from disk or memory.

Usage

Use this principle in the multimodal understanding pipeline when conversation messages contain image references. It is called after constructing the conversation dict and before passing images to the processor.

Theoretical Basis

The loading supports two input formats:

  1. File path: Direct path to an image file on disk — opened via PIL.Image.open(path)
  2. Base64 data URI: String starting with "data:image" — decoded from base64, then opened from a bytes buffer

All loaded images are converted to RGB mode via .convert("RGB") to ensure consistent 3-channel input regardless of the source format (RGBA, grayscale, palette, etc.).

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment