
Implementation:Deepseek ai Janus Load Pretrained Model

From Leeroopedia


Knowledge Sources
Domains Multimodal_AI, Model_Loading
Last Updated 2026-02-10 09:30 GMT

Overview

A concrete utility, provided by the Janus repository, for loading a Janus multimodal model, processor, and tokenizer in one call.

Description

The load_pretrained_model function is a convenience wrapper that loads all three components needed for Janus inference in a single call. Internally it:

  1. Creates a VLChatProcessor via from_pretrained (which loads the tokenizer and image processor)
  2. Loads the MultiModalityCausalLM via AutoModelForCausalLM.from_pretrained with trust_remote_code=True
  3. Casts the model to bfloat16, moves to CUDA, and sets eval mode
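Step 3 hard-codes bfloat16 on CUDA. On a CPU-only machine that call fails, so a small fallback helper can be useful; the sketch below is a local convenience under that assumption, not part of the repository utility:

```python
import torch

def resolve_dtype_and_device():
    """Mirror load_pretrained_model's bfloat16/CUDA choice, with a CPU fallback."""
    if torch.cuda.is_available():
        # Matches what load_pretrained_model does internally.
        return torch.bfloat16, torch.device("cuda")
    # bfloat16 matmuls are slow or unsupported on many CPUs; float32 is safer.
    return torch.float32, torch.device("cpu")

dtype, device = resolve_dtype_and_device()
```

The resulting pair can then be passed to `.to(dtype).to(device)` in place of the fixed `.to(torch.bfloat16).cuda()` chain.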

Usage

Import this function when you need to set up a Janus model for inference. It returns the tokenizer, processor, and model as a tuple. This is the recommended entry point for all Janus inference scripts.

Code Reference

Source Location

  • Repository: Janus
  • File: janus/utils/io.py
  • Lines: L32-41

Signature

def load_pretrained_model(model_path: str):
    """
    Load pretrained Janus model, processor, and tokenizer.

    Args:
        model_path (str): HuggingFace model ID or local path
            (e.g., "deepseek-ai/Janus-1.3B")

    Returns:
        Tuple[LlamaTokenizerFast, VLChatProcessor, MultiModalityCausalLM]:
            tokenizer, vl_chat_processor, vl_gpt (in bfloat16 on CUDA, eval mode)
    """

Import

from janus.utils.io import load_pretrained_model

I/O Contract

Inputs

  • model_path (str, required): HuggingFace model ID (e.g., "deepseek-ai/Janus-1.3B") or local directory path
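Because model_path may be either a Hub ID or a local directory, it can help to know which branch from_pretrained will take. The helper below is a hypothetical convenience, not part of the Janus API:

```python
from pathlib import Path

def looks_like_local_checkpoint(model_path: str) -> bool:
    """from_pretrained treats an existing directory as a local checkout;
    any other string is interpreted as a HuggingFace Hub repo ID."""
    return Path(model_path).is_dir()
```

For example, `looks_like_local_checkpoint("deepseek-ai/Janus-1.3B")` is False unless a directory of that name happens to exist in the working directory.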

Outputs

  • tokenizer (LlamaTokenizerFast): tokenizer for encoding/decoding text
  • vl_chat_processor (VLChatProcessor): combined processor with tokenizer + image processor
  • vl_gpt (MultiModalityCausalLM): multimodal model in bfloat16, on CUDA, in eval mode

Usage Examples

Basic Loading

from janus.utils.io import load_pretrained_model

# Load model from HuggingFace Hub
tokenizer, vl_chat_processor, vl_gpt = load_pretrained_model("deepseek-ai/Janus-1.3B")

# Model is ready for inference
# tokenizer: LlamaTokenizerFast
# vl_chat_processor: VLChatProcessor (with .tokenizer and .image_processor)
# vl_gpt: MultiModalityCausalLM (bfloat16, CUDA, eval mode)

Manual Loading (Without Utility)

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor

model_path = "deepseek-ai/Janus-1.3B"

# VLChatProcessor.from_pretrained also loads the tokenizer and image processor
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

# trust_remote_code=True lets AutoModelForCausalLM resolve the custom
# MultiModalityCausalLM class shipped with the checkpoint
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()  # requires a CUDA device
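Checkpoints are large, so calling load_pretrained_model repeatedly in one process is costly. A thin cache keeps one copy per model path; this is a local convenience sketch, not part of the Janus repository:

```python
from functools import lru_cache

def make_cached_loader(load_fn, maxsize: int = 2):
    """Wrap a loader such as load_pretrained_model so that each
    model_path is loaded at most once per process."""
    @lru_cache(maxsize=maxsize)
    def cached(model_path: str):
        # lru_cache keys on model_path, so repeated calls reuse the tuple.
        return load_fn(model_path)
    return cached
```

Usage: `get_janus = make_cached_loader(load_pretrained_model)`, then call `get_janus("deepseek-ai/Janus-1.3B")` from any script in the process.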

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
