Principle:Deepseek ai Janus Image Loading and Preprocessing
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Preprocessing |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A procedure for loading images from file paths or base64 strings and converting them to a standardized PIL format for downstream vision processing.
Description
Image loading and preprocessing extracts image data from conversation message dictionaries. Each message may contain an images field with file paths or base64-encoded data URIs. The images must be loaded into PIL Image objects and converted to RGB format before being passed to the vision encoder.
This step is distinct from the lower-level image normalization (resize, rescale, normalize) performed by the VLMImageProcessor, which happens internally during tokenization. This principle covers only the initial loading from disk or memory.
Usage
Use this principle in the multimodal understanding pipeline when conversation messages contain image references. It is called after constructing the conversation dict and before passing images to the processor.
Theoretical Basis
The loading supports two input formats:
- File path: Direct path to an image file on disk — opened via PIL.Image.open(path)
- Base64 data URI: String starting with "data:image" — decoded from base64, then opened from a bytes buffer
All loaded images are converted to RGB mode via .convert("RGB") to ensure consistent 3-channel input regardless of the source format (RGBA, grayscale, palette, etc.).