Environment:Deepseek ai Janus CUDA GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning, Computer_Vision |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
NVIDIA CUDA GPU environment with bfloat16 support required for running Janus multimodal understanding and image generation models.
Description
This environment provides the GPU-accelerated context required by all Janus model variants (Janus-1.3B, Janus-Pro-7B, JanusFlow-1.3B). The codebase extensively uses torch.cuda for tensor operations, model placement, and inference. All inference scripts call .cuda() directly on model weights and intermediate tensors. The code checks torch.cuda.is_available() at startup to select between CUDA (with bfloat16) and CPU (with float16) execution paths, but all primary workflows assume CUDA availability.
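The bfloat16-on-CUDA / float16-on-CPU selection described above can be sketched as a small pure helper (a hypothetical `select_device_and_dtype`, not part of the Janus codebase):

```python
# Hypothetical helper mirroring the selection convention described above:
# bfloat16 on CUDA, float16 on CPU. Not part of the Janus codebase itself.
def select_device_and_dtype(cuda_available: bool) -> tuple:
    """Return (device, dtype name) following the Janus convention."""
    if cuda_available:
        return ("cuda", "bfloat16")
    return ("cpu", "float16")
```

In the actual scripts the flag comes from `torch.cuda.is_available()` and the dtype is applied with `.to(torch.bfloat16)` before `.cuda()`, as shown under Code Evidence.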
Usage
Use this environment for all Janus workflows: Multimodal Understanding, Autoregressive Image Generation, and Rectified Flow Image Generation. Every inference script and Gradio demo requires GPU acceleration. CPU fallback exists in some demo scripts but is not the intended execution path.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Tested with Python >= 3.8 |
| Hardware | NVIDIA GPU with bfloat16 support | Ampere (A100) or newer recommended; bfloat16 required for optimal precision |
| VRAM | Minimum 8GB (1.3B models), 24GB+ (7B models) | Janus-Pro-7B requires significantly more VRAM |
| CUDA | CUDA toolkit compatible with PyTorch >= 2.0.1 | Required for torch.cuda operations |
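As a rough pre-flight check against the VRAM figures above, available GPU memory can be compared with the table's minimums (a hypothetical helper; the thresholds are the documented minimums from the table, not measured model footprints):

```python
# Hypothetical pre-flight check using the VRAM minimums from the table above.
# The thresholds are the documented minimums, not measured model footprints.
VRAM_MIN_GB = {
    "Janus-1.3B": 8,
    "JanusFlow-1.3B": 8,
    "Janus-Pro-7B": 24,
}

def has_enough_vram(model_name: str, available_gb: float) -> bool:
    """True if the available GPU memory meets the documented minimum."""
    return available_gb >= VRAM_MIN_GB[model_name]
```

On a live system, `available_gb` can be read from `torch.cuda.get_device_properties(0).total_memory / 1e9`.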
Dependencies
System Packages
- NVIDIA GPU driver compatible with CUDA toolkit
- CUDA toolkit (version compatible with torch >= 2.0.1)
Python Packages
- torch >= 2.0.1
- transformers >= 4.38.2
- timm >= 0.9.16
- accelerate
- sentencepiece
- attrdict
- einops
- numpy
- Pillow (PIL)
Credentials
No credentials are required. Models are loaded from HuggingFace public model hub (deepseek-ai/Janus-1.3B, deepseek-ai/Janus-Pro-7B, deepseek-ai/JanusFlow-1.3B) without authentication.
Quick Install
# Install core dependencies
pip install -e .
# Or install manually
pip install "torch>=2.0.1" "transformers>=4.38.2" "timm>=0.9.16" accelerate sentencepiece attrdict einops numpy Pillow
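After installing, a quick sanity check can confirm the GPU environment (a minimal sketch, not part of the repo; it degrades gracefully when torch is missing):

```python
# Minimal environment sanity check: reports the torch version plus CUDA and
# bfloat16 availability. Returns a dict so callers can act on the result.
def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": False, "bf16": False}
    report = {
        "torch": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "bf16": False,
    }
    if report["cuda"]:
        # is_bf16_supported() needs a CUDA device to query.
        report["bf16"] = torch.cuda.is_bf16_supported()
    return report

if __name__ == "__main__":
    print(cuda_report())
```

For Janus workflows, both `cuda` and `bf16` should come back `True`.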
Code Evidence
CUDA availability check and device selection from `demo/app.py:22`:
cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'
Model loading with bfloat16 and CUDA placement from `inference.py:34`:
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
Conditional dtype selection (bfloat16 on CUDA, float16 on CPU) from `demo/app_januspro.py:22-25`:
if torch.cuda.is_available():
    vl_gpt = vl_gpt.to(torch.bfloat16).cuda()
else:
    vl_gpt = vl_gpt.to(torch.float16)
Direct CUDA tensor creation from `generation_inference.py:69`:
tokens = torch.zeros((parallel_size*2, len(input_ids)), dtype=torch.int).cuda()
Python 3.10+ compatibility patch from `janus/__init__.py:24-31`:
if sys.version_info >= (3, 10):
    print("Python version is above 3.10, patching the collections module.")
    import collections
    import collections.abc
    for type_name in collections.abc.__all__:
        setattr(collections, type_name, getattr(collections.abc, type_name))
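The effect of this patch can be verified in isolation; the following standalone demonstration applies the same technique and works on any supported Python version:

```python
# Standalone demonstration of the monkey-patch applied by janus/__init__.py:
# re-expose the abstract base classes from collections.abc on the top-level
# collections module, as libraries written for Python < 3.10 expect.
import collections
import collections.abc

for type_name in collections.abc.__all__:
    setattr(collections, type_name, getattr(collections.abc, type_name))

# After patching, collections.MutableMapping resolves again on Python 3.10+.
assert collections.MutableMapping is collections.abc.MutableMapping
```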
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError: No CUDA GPUs are available` | No NVIDIA GPU detected | Ensure NVIDIA drivers and CUDA toolkit are installed; verify with `nvidia-smi` |
| `RuntimeError: CUDA out of memory` | Insufficient VRAM for model size | Use a smaller model (1.3B instead of 7B), or reduce `parallel_size` in image generation |
| `AttributeError: module 'collections' has no attribute 'MutableMapping'` | Python 3.10+ removed collections ABC classes | Ensure `import janus` is called before other imports (it patches collections) |
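For the out-of-memory case, the mitigation above can be automated by retrying with a smaller `parallel_size` (a hypothetical wrapper; `generate_fn` stands in for the repo's actual generation routine):

```python
# Hypothetical OOM fallback: halve parallel_size and retry whenever a CUDA
# out-of-memory RuntimeError is raised. generate_fn stands in for the
# actual Janus image-generation routine.
def generate_with_fallback(generate_fn, parallel_size: int, min_size: int = 1):
    while parallel_size >= min_size:
        try:
            return generate_fn(parallel_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # not an OOM error; re-raise unchanged
            parallel_size //= 2
    raise RuntimeError("out of memory even at minimum parallel_size")
```

This trades batch throughput for successful completion on smaller GPUs.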
Compatibility Notes
- CPU fallback: Demo scripts (app.py, app_januspro.py, fastapi_app.py) include CPU fallback with float16, but this is significantly slower and not the intended execution path.
- bfloat16 requirement: The SDXL VAE used by JanusFlow specifically requires bfloat16 and does not work with float16 (documented in `demo/app_janusflow.py:18`).
- Python 3.10+: Both `janus/__init__.py` and `janus/janusflow/__init__.py` include a monkey-patch for the collections module to handle Python 3.10+ deprecations.
- Eager attention: Demo scripts explicitly set `language_config._attn_implementation = 'eager'` to avoid flash attention requirements.
Related Pages
- Implementation:Deepseek_ai_Janus_Load_Pretrained_Model
- Implementation:Deepseek_ai_Janus_Prepare_Inputs_Embeds
- Implementation:Deepseek_ai_Janus_LlamaForCausalLM_Generate
- Implementation:Deepseek_ai_Janus_AR_Token_Generation_Loop
- Implementation:Deepseek_ai_Janus_VQModel_Decode_Code
- Implementation:Deepseek_ai_Janus_CFG_Input_Preparation_AR
- Implementation:Deepseek_ai_Janus_Image_Post_Processing_AR
- Implementation:Deepseek_ai_Janus_JanusFlow_Load_Model
- Implementation:Deepseek_ai_Janus_CFG_Input_Preparation_Flow
- Implementation:Deepseek_ai_Janus_Noise_Initialization
- Implementation:Deepseek_ai_Janus_ODE_Denoising_Loop
- Implementation:Deepseek_ai_Janus_Image_Post_Processing_Flow