Implementation: DeepSeek AI Janus Load Pretrained Model
| Knowledge Sources | |
|---|---|
| Domains | Multimodal_AI, Model_Loading |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A convenience utility from the Janus repository for loading the Janus multimodal model, processor, and tokenizer in a single call.
Description
The load_pretrained_model function is a convenience wrapper that loads all three components needed for Janus inference in a single call. Internally it:
- Creates a VLChatProcessor via from_pretrained (which loads the tokenizer and image processor)
- Loads the MultiModalityCausalLM via AutoModelForCausalLM.from_pretrained with trust_remote_code=True
- Casts the model to bfloat16, moves it to CUDA, and sets it to eval mode
Usage
Import this function when you need to set up a Janus model for inference. It returns the tokenizer, processor, and model as a tuple. This is the recommended entry point for all Janus inference scripts.
Code Reference
Source Location
- Repository: Janus
- File: janus/utils/io.py
- Lines: L32-41
Signature
def load_pretrained_model(model_path: str):
    """
    Load pretrained Janus model, processor, and tokenizer.

    Args:
        model_path (str): HuggingFace model ID or local path
            (e.g., "deepseek-ai/Janus-1.3B")

    Returns:
        Tuple[LlamaTokenizerFast, VLChatProcessor, MultiModalityCausalLM]:
            tokenizer, vl_chat_processor, vl_gpt (in bfloat16 on CUDA, eval mode)
    """
Import
from janus.utils.io import load_pretrained_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID (e.g., "deepseek-ai/Janus-1.3B") or local directory path |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenizer | LlamaTokenizerFast | Tokenizer for encoding/decoding text |
| vl_chat_processor | VLChatProcessor | Combined processor with tokenizer + image processor |
| vl_gpt | MultiModalityCausalLM | Multimodal model in bfloat16 on CUDA in eval mode |
Usage Examples
Basic Loading
from janus.utils.io import load_pretrained_model
# Load model from HuggingFace Hub
tokenizer, vl_chat_processor, vl_gpt = load_pretrained_model("deepseek-ai/Janus-1.3B")
# Model is ready for inference
# tokenizer: LlamaTokenizerFast
# vl_chat_processor: VLChatProcessor (with .tokenizer and .image_processor)
# vl_gpt: MultiModalityCausalLM (bfloat16, CUDA, eval mode)
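Once loaded, the three objects plug into Janus's chat-style pipeline. As a minimal sketch, the helper below assembles the conversation structure that the processor consumes; the role tokens and `<image_placeholder>` marker follow the examples in the Janus repository's README and should be treated as assumptions, verified against the repository version you are using.

```python
def build_conversation(question: str, image_path: str) -> list[dict]:
    """Assemble a single-image chat turn in the format VLChatProcessor expects.

    Structure based on the Janus README examples (an assumption, not part of
    load_pretrained_model itself).
    """
    return [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{question}",
            "images": [image_path],
        },
        # Empty assistant turn marks where generation should begin.
        {"role": "<|Assistant|>", "content": ""},
    ]

conversation = build_conversation("Describe this image.", "images/example.png")
# This list would then be passed to the processor, e.g.
# vl_chat_processor(conversations=conversation, images=..., force_batchify=True)
```

The processor pairs each `<image_placeholder>` in the user turn with an entry in `images`, so the two lists must stay aligned.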
Manual Loading (Without Utility)
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
model_path = "deepseek-ai/Janus-1.3B"
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
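Note that both the utility and the manual recipe hard-code bfloat16 and `.cuda()`, so they fail on CPU-only machines. A hedged sketch of a fallback policy is shown below; the helper name and the float32 fallback are illustrative choices, not part of the Janus utilities.

```python
def pick_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Mirror load_pretrained_model's CUDA + bfloat16 defaults, falling back
    to CPU + float32 (bfloat16 kernels are poorly supported on many CPUs).

    Illustrative helper, not part of the Janus repository.
    """
    if cuda_available:
        return ("cuda", "bfloat16")
    return ("cpu", "float32")

# Example wiring (requires torch; shown as comments so this sketch runs
# without a GPU):
# device, dtype = pick_device_and_dtype(torch.cuda.is_available())
# vl_gpt = vl_gpt.to(device=device, dtype=getattr(torch, dtype)).eval()
```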