
Implementation:Deepseek ai Janus JanusFlow Load Model

From Leeroopedia


Knowledge Sources
Domains Multimodal_AI, Model_Loading
Last Updated 2026-02-10 09:30 GMT

Overview

Concrete tool for loading the JanusFlow multimodal model, its processor, and the SDXL VAE, using classes provided by the Janus repository and the diffusers library.

Description

Loading a JanusFlow model requires three separate from_pretrained calls:

  1. MultiModalityCausalLM.from_pretrained(model_path) — loads the JanusFlow model with UViT encoder/decoder, linear aligners, and LLM backbone
  2. VLChatProcessor.from_pretrained(model_path) — loads the JanusFlow processor with tokenizer
  3. AutoencoderKL.from_pretrained(vae_path) — loads the SDXL VAE for pixel decoding

All components must be cast to bfloat16 and moved to CUDA.
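The three-step recipe above can be wrapped in a single helper. This is a minimal sketch: `load_janusflow` is a hypothetical name (not part of the Janus API), and the imports are deferred into the function body so it can be defined even where the janus and diffusers packages are not installed.

```python
def load_janusflow(model_path: str = "deepseek-ai/JanusFlow-1.3B",
                   vae_path: str = "stabilityai/sdxl-vae"):
    """Load the JanusFlow model, processor, and SDXL VAE in bfloat16 on CUDA."""
    # Deferred imports: only needed at load time, not at definition time.
    import torch
    from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
    from diffusers.models import AutoencoderKL

    # 1. Processor (tokenizer plus image tags)
    vl_chat_processor = VLChatProcessor.from_pretrained(model_path)

    # 2. Model (UViT encoder/decoder, linear aligners, LLM backbone)
    vl_gpt = MultiModalityCausalLM.from_pretrained(model_path)
    vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

    # 3. SDXL VAE for pixel decoding
    vae = AutoencoderKL.from_pretrained(vae_path)
    vae = vae.to(torch.bfloat16).cuda().eval()

    return vl_chat_processor, vl_gpt, vae
```

The defaults mirror the model IDs used throughout this page; any generation script can then obtain all three components with one call.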

Usage

Call these three loading functions at the start of any JanusFlow generation script.

Code Reference

Source Location

  • Repository: Janus
  • File: janus/janusflow/models/modeling_vlm.py:L132-169 (MultiModalityCausalLM.__init__)
  • File: janus/janusflow/models/processing_vlm.py:L72-152 (VLChatProcessor.__init__)
  • Reference: demo/app_janusflow.py:L11-20

Signature

from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL

VLChatProcessor.from_pretrained(model_path: str) -> VLChatProcessor
MultiModalityCausalLM.from_pretrained(model_path: str) -> MultiModalityCausalLM
AutoencoderKL.from_pretrained(vae_path: str) -> AutoencoderKL

Import

from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL

I/O Contract

Inputs

Name       | Type | Required | Description
model_path | str  | Yes      | HuggingFace model ID (e.g., "deepseek-ai/JanusFlow-1.3B")
vae_path   | str  | Yes      | VAE model ID (e.g., "stabilityai/sdxl-vae")

Outputs

Name              | Type                  | Description
vl_chat_processor | VLChatProcessor       | Processor with tokenizer and image_gen_tag
vl_gpt            | MultiModalityCausalLM | JanusFlow model with UViT components, in bfloat16 on CUDA
vae               | AutoencoderKL         | SDXL VAE decoder, in bfloat16 on CUDA
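The contract above can be verified after loading. This is a hedged sketch: `check_janusflow_components` is a hypothetical helper, not part of the Janus API; it only inspects attributes and parameter metadata that any PyTorch module exposes.

```python
def check_janusflow_components(vl_chat_processor, vl_gpt, vae):
    """Return True if the loaded components satisfy the I/O contract above."""
    import torch  # deferred so this module imports without torch installed

    ok = True
    # The processor must expose a tokenizer (used for prompt encoding).
    ok &= hasattr(vl_chat_processor, "tokenizer")
    # Model and VAE must both be in bfloat16 on a CUDA device.
    for module in (vl_gpt, vae):
        param = next(module.parameters())
        ok &= (param.dtype == torch.bfloat16)
        ok &= param.is_cuda
    return bool(ok)
```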

Usage Examples

Standard JanusFlow Loading

import torch
from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
from diffusers.models import AutoencoderKL

model_path = "deepseek-ai/JanusFlow-1.3B"
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt = MultiModalityCausalLM.from_pretrained(model_path)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

# SDXL VAE — must use bfloat16 (fp16 not supported for this VAE)
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
vae = vae.to(torch.bfloat16).cuda().eval()
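The example above loads weights in their default precision and then casts. Assuming `MultiModalityCausalLM` and `AutoencoderKL` follow the standard Hugging Face `from_pretrained` signature (which accepts a `torch_dtype` keyword), the weights can instead be materialized directly in bfloat16, avoiding a transient full-precision copy in host memory. This variant is a sketch under that assumption, and `load_janusflow_bf16` is a hypothetical name:

```python
def load_janusflow_bf16(model_path: str, vae_path: str):
    """Variant that materializes weights directly in bfloat16, assuming the
    standard Hugging Face `torch_dtype` keyword is supported."""
    import torch
    from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
    from diffusers.models import AutoencoderKL

    vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
    # torch_dtype skips the intermediate cast step used in the example above.
    vl_gpt = MultiModalityCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16).cuda().eval()
    vae = AutoencoderKL.from_pretrained(
        vae_path, torch_dtype=torch.bfloat16).cuda().eval()
    return vl_chat_processor, vl_gpt, vae
```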

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
