Implementation: DeepSeek-AI Janus JanusFlow Load Model
| Knowledge Sources | |
|---|---|
| Domains | Multimodal_AI, Model_Loading |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A concrete recipe for loading the JanusFlow multimodal model and processor from the Janus repository, plus the SDXL VAE from the diffusers library.
Description
Loading a JanusFlow model requires three separate from_pretrained calls:
- MultiModalityCausalLM.from_pretrained(model_path) — loads the JanusFlow model with UViT encoder/decoder, linear aligners, and LLM backbone
- VLChatProcessor.from_pretrained(model_path) — loads the JanusFlow processor with tokenizer
- AutoencoderKL.from_pretrained(vae_path) — loads the SDXL VAE for pixel decoding
All components must be cast to bfloat16 and moved to CUDA.
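The cast-and-move step above follows the standard PyTorch pattern. The snippet below sketches it on a toy `nn.Linear` standing in for the real components, with a CPU fallback for machines without CUDA (the actual JanusFlow scripts assume a CUDA device):

```python
import torch
import torch.nn as nn

# Stand-in for MultiModalityCausalLM / AutoencoderKL: any nn.Module
# follows the same cast-then-move-then-eval pattern.
model = nn.Linear(8, 8)

device = "cuda" if torch.cuda.is_available() else "cpu"  # real scripts require CUDA
model = model.to(torch.bfloat16).to(device).eval()

print(next(model.parameters()).dtype)  # torch.bfloat16
print(model.training)                  # False (inference mode)
```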
Usage
Call these three loading functions at the start of any JanusFlow generation script.
Code Reference
Source Location
- Repository: Janus
- File: janus/janusflow/models/modeling_vlm.py:L132-169 (MultiModalityCausalLM.__init__)
- File: janus/janusflow/models/processing_vlm.py:L72-152 (VLChatProcessor.__init__)
- Reference: demo/app_janusflow.py:L11-20
Signature
from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL
# Model loading
vl_chat_processor = VLChatProcessor.from_pretrained(model_path: str)
vl_gpt = MultiModalityCausalLM.from_pretrained(model_path: str)
vae = AutoencoderKL.from_pretrained(vae_path: str)
Import
from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID (e.g., "deepseek-ai/JanusFlow-1.3B") |
| vae_path | str | Yes | VAE model ID (e.g., "stabilityai/sdxl-vae") |
Outputs
| Name | Type | Description |
|---|---|---|
| vl_chat_processor | VLChatProcessor | Processor with tokenizer and image_gen_tag |
| vl_gpt | MultiModalityCausalLM | JanusFlow model with UViT components, in bfloat16 on CUDA |
| vae | AutoencoderKL | SDXL VAE decoder, in bfloat16 on CUDA |
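After loading, it can help to verify that each output landed in the expected dtype and mode. The helper below is a hypothetical utility (not part of the Janus repository), shown here on a toy module; in practice it would be called on `vl_gpt` and `vae`:

```python
import torch
import torch.nn as nn

def check_component(module: nn.Module, dtype=torch.bfloat16) -> None:
    """Raise if any parameter is not in `dtype` or the module is still training."""
    for name, p in module.named_parameters():
        if p.dtype != dtype:
            raise TypeError(f"{name} has dtype {p.dtype}, expected {dtype}")
    if module.training:
        raise RuntimeError("module is still in training mode; call .eval()")

# Toy example; in practice: check_component(vl_gpt); check_component(vae)
m = nn.Linear(4, 4).to(torch.bfloat16).eval()
check_component(m)  # passes silently
```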
Usage Examples
Standard JanusFlow Loading
import torch
from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL
model_path = "deepseek-ai/JanusFlow-1.3B"
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
vl_gpt = MultiModalityCausalLM.from_pretrained(model_path)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
# SDXL VAE — use bfloat16: the original SDXL VAE is numerically unstable in fp16
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
vae = vae.to(torch.bfloat16).cuda().eval()
Related Pages
Implements Principle
Requires Environment
Uses Heuristic