
Environment:Deepseek ai Janus JanusFlow Diffusers Environment

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Deep_Learning, Generative_Models
Last Updated 2026-02-10 09:30 GMT

Overview

Extended environment for JanusFlow rectified flow image generation, requiring the diffusers library and the Stability AI SDXL VAE model.

Description

JanusFlow uses a rectified flow ODE solver for image generation instead of the autoregressive VQ-token approach used by Janus/Janus-Pro. This requires the diffusers library to load the AutoencoderKL (SDXL VAE) model from Stability AI. The SDXL VAE operates in a continuous latent space (4-channel, 48x48) rather than discrete VQ tokens. This environment extends the base CUDA GPU environment with the additional diffusers dependency.
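To make the rectified-flow idea concrete, the sampler integrates an ODE dz/dt = v(z, t) from noise at t=0 to a clean latent at t=1. The sketch below is a hedged toy illustration, not JanusFlow's actual sampler: the real velocity field is a neural network, whereas here a simple analytic field driving the state toward a fixed target is used so the Euler update rule is visible.

```python
# Minimal, hedged sketch of rectified-flow sampling via Euler integration.
# The real JanusFlow velocity field is a neural network; the toy field
# below drives z toward a fixed target so the update rule is visible.
def euler_sample(z0, velocity, steps=30):
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with Euler steps."""
    z, dt = z0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        z = z + dt * velocity(z, t)
    return z

# Toy velocity for a straight-line path from z toward a target of 1.0:
target = 1.0
v = lambda z, t: (target - z) / max(1.0 - t, 1e-6)

print(euler_sample(0.0, v))  # converges to the target, ~1.0
```

In JanusFlow, the state `z` is a (4, 48, 48) latent tensor rather than a scalar, and the result is handed to the SDXL VAE for decoding.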

Usage

Use this environment specifically for the Rectified Flow Image Generation workflow (JanusFlow). It is not required for Multimodal Understanding or Autoregressive Image Generation workflows. The SDXL VAE must be loaded separately from the main JanusFlow model.

System Requirements

Category | Requirement | Notes
OS | Linux (Ubuntu recommended) | Same as base CUDA GPU environment
Hardware | NVIDIA GPU with bfloat16 support | SDXL VAE requires bfloat16; fp16 is explicitly unsupported
VRAM | Additional ~2 GB for the SDXL VAE | On top of the JanusFlow model's VRAM requirements
Network | Internet access for initial model download | Downloads `stabilityai/sdxl-vae` from HuggingFace Hub
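The ~2 GB figure above can be put in rough perspective with a back-of-the-envelope calculation. The ~83.7M parameter count below is an assumption based on the standard SD/SDXL AutoencoderKL architecture; in bfloat16 the weights themselves are small, and decode-time activations account for most of the budget.

```python
# Rough, hedged estimate of the SDXL VAE weight footprint in bfloat16.
# Assumption: ~83.7M parameters (standard SD/SDXL AutoencoderKL size).
PARAMS = 83_700_000
BYTES_PER_PARAM = 2  # bfloat16 is 2 bytes per value

weight_mb = PARAMS * BYTES_PER_PARAM / 1e6
print(f"VAE weights: ~{weight_mb:.0f} MB")
# Decode-time activations, not weights, make up the rest of the ~2 GB.
```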

Dependencies

System Packages

None beyond the base CUDA GPU environment (NVIDIA driver and CUDA toolkit).

Python Packages

  • `diffusers[torch]` — provides `AutoencoderKL` for loading the SDXL VAE (see Quick Install below)
  • The Janus package itself, installed in editable mode via `pip install -e .`

Credentials

No additional credentials are required. The SDXL VAE model (stabilityai/sdxl-vae) is publicly available on HuggingFace Hub.

Quick Install

# Install base Janus dependencies
pip install -e .

# Install diffusers for JanusFlow
pip install "diffusers[torch]"  # quotes prevent shell glob expansion of the brackets (e.g. in zsh)

Code Evidence

SDXL VAE loading with bfloat16 requirement from `demo/app_janusflow.py:18-20`:

# remember to use bfloat16 dtype, this vae doesn't work with fp16
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
vae = vae.to(torch.bfloat16).to(cuda_device).eval()

Diffusers import from `demo/app_janusflow.py:5`:

from diffusers.models import AutoencoderKL

VAE decode with scaling factor from `demo/app_janusflow.py:134`:

decoded_image = vae.decode(z / vae.config.scaling_factor).sample
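The division by `vae.config.scaling_factor` undoes the scaling applied when latents enter the flow space, mapping them back to the VAE's native range before decoding. A minimal sketch of the arithmetic, assuming the value 0.13025 shipped in the `stabilityai/sdxl-vae` config (real code should read it from `vae.config` rather than hard-coding it):

```python
# Hedged illustration of latent unscaling before VAE decode.
# Assumption: scaling_factor = 0.13025, as in the stabilityai/sdxl-vae
# config; production code should read vae.config.scaling_factor instead.
SCALING_FACTOR = 0.13025

def unscale(z: float) -> float:
    """Map a scaled latent value back to the VAE's native range."""
    return z / SCALING_FACTOR

print(unscale(0.13025))  # a latent equal to the scaling factor maps to 1.0
```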

README installation instructions for JanusFlow from `README.md:513-514`:

pip install -e .
pip install diffusers[torch]

Common Errors

Error Message | Cause | Solution
`ImportError: No module named 'diffusers'` | diffusers not installed | `pip install diffusers[torch]`
Model produces corrupted/black images | VAE loaded with fp16 instead of bfloat16 | Load the VAE with `torch.bfloat16`; fp16 is explicitly unsupported
`OSError: stabilityai/sdxl-vae not found` | No internet access or HuggingFace Hub unreachable | Download the model manually and pass a local path to `from_pretrained`

Compatibility Notes

  • bfloat16 only: The SDXL VAE is explicitly documented as incompatible with fp16. The comment in `demo/app_janusflow.py:18` states: "remember to use bfloat16 dtype, this vae doesn't work with fp16".
  • Separate model download: The SDXL VAE is loaded separately from the JanusFlow model using `AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")`, not bundled with the main model weights.
  • Latent space dimensions: The VAE operates on 4-channel latent tensors of size 48x48 (for 384x384 output images).
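The latent dimensions in the last bullet follow from the SDXL VAE's 8x spatial compression: a 384x384 image corresponds to a 4-channel 48x48 latent. A small sketch of the shape arithmetic (the function name is illustrative, not part of any library):

```python
# Shape arithmetic for the SDXL VAE's 8x spatial compression:
# a (3, H, W) output image corresponds to a (4, H/8, W/8) latent.
def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8):
    """Return the (C, H, W) latent shape for a given output image size."""
    assert height % factor == 0 and width % factor == 0
    return (channels, height // factor, width // factor)

print(latent_shape(384, 384))  # (4, 48, 48), matching the note above
```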
