Principle:Deepseek ai Janus Image Loading and Preprocessing

Knowledge Sources	Janus
Domains	Computer_Vision, Preprocessing
Last Updated	2026-02-10 09:30 GMT

Overview

A procedure for loading images from file paths or base64 strings and converting them to a standardized PIL format for downstream vision processing.

Description

Image loading and preprocessing extracts image data from conversation message dictionaries. Each message may contain an images field with file paths or base64-encoded data URIs. The images must be loaded into PIL Image objects and converted to RGB format before being passed to the vision encoder.

This step is distinct from the lower-level image normalization (resize, rescale, normalize) performed by the VLMImageProcessor, which happens internally during tokenization. This principle covers only the initial loading from disk or memory.

Usage

Use this principle in the multimodal understanding pipeline when conversation messages contain image references. It is called after constructing the conversation dict and before passing images to the processor.

Theoretical Basis

The loading supports two input formats:

File path: Direct path to an image file on disk — opened via PIL.Image.open(path)
Base64 data URI: String starting with "data:image" — decoded from base64, then opened from a bytes buffer

All loaded images are converted to RGB mode via .convert("RGB") to ensure consistent 3-channel input regardless of the source format (RGBA, grayscale, palette, etc.).

Related Pages

Implemented By

Implementation:Deepseek_ai_Janus_Load_Pil_Images

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment