
Implementation:Deepseek ai Janus Load Pretrained Model

From Leeroopedia


Knowledge Sources
Domains Multimodal_AI, Model_Loading
Last Updated 2026-02-10 09:30 GMT

Overview

A concrete utility, provided by the Janus repository, for loading a Janus multimodal model, processor, and tokenizer in one call.

Description

The load_pretrained_model function is a convenience wrapper that loads all three components needed for Janus inference in a single call. Internally it:

  1. Creates a VLChatProcessor via from_pretrained (which loads the tokenizer and image processor)
  2. Loads the MultiModalityCausalLM via AutoModelForCausalLM.from_pretrained with trust_remote_code=True
  3. Casts the model to bfloat16, moves to CUDA, and sets eval mode
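Step 3 hard-codes bfloat16 on CUDA. On a CPU-only machine that call fails, so a small fallback helper can be useful; the sketch below is a local convenience under that assumption, not part of the repository utility:

```python
import torch

def resolve_dtype_and_device():
    """Mirror load_pretrained_model's bfloat16/CUDA choice, with a CPU fallback."""
    if torch.cuda.is_available():
        # Matches what load_pretrained_model does internally.
        return torch.bfloat16, torch.device("cuda")
    # bfloat16 matmuls are slow or unsupported on many CPUs; float32 is safer.
    return torch.float32, torch.device("cpu")

dtype, device = resolve_dtype_and_device()
```

The resulting pair can then be passed to `.to(dtype).to(device)` in place of the fixed `.to(torch.bfloat16).cuda()` chain.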

Usage

Import this function when you need to set up a Janus model for inference. It returns the tokenizer, processor, and model as a tuple. This is the recommended entry point for all Janus inference scripts.

Code Reference

Source Location

  • Repository: Janus
  • File: janus/utils/io.py
  • Lines: L32-41

Signature

def load_pretrained_model(model_path: str):
    """
    Load pretrained Janus model, processor, and tokenizer.

    Args:
        model_path (str): HuggingFace model ID or local path
            (e.g., "deepseek-ai/Janus-1.3B")

    Returns:
        Tuple[LlamaTokenizerFast, VLChatProcessor, MultiModalityCausalLM]:
            tokenizer, vl_chat_processor, vl_gpt (in bfloat16 on CUDA, eval mode)
    """

Import

from janus.utils.io import load_pretrained_model

I/O Contract

Inputs

  • model_path (str, required): HuggingFace model ID (e.g., "deepseek-ai/Janus-1.3B") or local directory path
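Because model_path may be either a Hub ID or a local directory, it can help to know which branch from_pretrained will take. The helper below is a hypothetical convenience, not part of the Janus API:

```python
from pathlib import Path

def looks_like_local_checkpoint(model_path: str) -> bool:
    """from_pretrained treats an existing directory as a local checkout;
    any other string is interpreted as a HuggingFace Hub repo ID."""
    return Path(model_path).is_dir()
```

For example, `looks_like_local_checkpoint("deepseek-ai/Janus-1.3B")` is False unless a directory of that name happens to exist in the working directory.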

Outputs

  • tokenizer (LlamaTokenizerFast): tokenizer for encoding/decoding text
  • vl_chat_processor (VLChatProcessor): combined processor with tokenizer + image processor
  • vl_gpt (MultiModalityCausalLM): multimodal model in bfloat16, on CUDA, in eval mode

Usage Examples

Basic Loading

from janus.utils.io import load_pretrained_model

# Load model from HuggingFace Hub
tokenizer, vl_chat_processor, vl_gpt = load_pretrained_model("deepseek-ai/Janus-1.3B")

# Model is ready for inference
# tokenizer: LlamaTokenizerFast
# vl_chat_processor: VLChatProcessor (with .tokenizer and .image_processor)
# vl_gpt: MultiModalityCausalLM (bfloat16, CUDA, eval mode)

Manual Loading (Without Utility)

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor

model_path = "deepseek-ai/Janus-1.3B"

# VLChatProcessor.from_pretrained also loads the tokenizer and image processor
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

# trust_remote_code=True lets AutoModelForCausalLM resolve the custom
# MultiModalityCausalLM class shipped with the checkpoint
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()  # requires a CUDA device
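Checkpoints are large, so calling load_pretrained_model repeatedly in one process is costly. A thin cache keeps one copy per model path; this is a local convenience sketch, not part of the Janus repository:

```python
from functools import lru_cache

def make_cached_loader(load_fn, maxsize: int = 2):
    """Wrap a loader such as load_pretrained_model so that each
    model_path is loaded at most once per process."""
    @lru_cache(maxsize=maxsize)
    def cached(model_path: str):
        # lru_cache keys on model_path, so repeated calls reuse the tuple.
        return load_fn(model_path)
    return cached
```

Usage: `get_janus = make_cached_loader(load_pretrained_model)`, then call `get_janus("deepseek-ai/Janus-1.3B")` from any script in the process.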

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
