
Implementation:Deepseek ai Janus LlamaForCausalLM Generate

From Leeroopedia


Knowledge Sources
Domains NLP, Language_Modeling
Last Updated 2026-02-10 09:30 GMT

Overview

The HuggingFace Transformers method LlamaForCausalLM.generate serves as the text decoder in the Janus multimodal understanding pipeline.

Description

The generate method is inherited from HuggingFace's GenerationMixin and provides autoregressive text generation with support for various decoding strategies. In Janus, it receives fused vision-language embeddings (inputs_embeds) rather than raw token IDs, enabling the language model to generate responses conditioned on both text and image context.
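The autoregressive loop behind generate can be sketched in miniature. The sketch below is illustrative only: toy_logits is a deterministic stand-in for the language-model forward pass (the real model consumes fused embeddings and a KV cache), but the stopping rules mirror generate's eos_token_id and max_new_tokens behavior, and the argmax step corresponds to do_sample=False.

```python
import numpy as np

EOS_ID = 2  # hypothetical end-of-sequence id for this toy vocab

def toy_logits(token_ids):
    """Stand-in for the LM forward pass: deterministic logits over a 5-token vocab."""
    rng = np.random.default_rng(sum(token_ids) + len(token_ids))
    return rng.normal(size=5)

def greedy_generate(prompt_ids, max_new_tokens=8, eos_token_id=EOS_ID):
    """Toy autoregressive decoding: one token per step, fed back into the context."""
    ids = list(prompt_ids)
    new_tokens = []
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(toy_logits(ids)))  # do_sample=False -> argmax
        new_tokens.append(next_id)
        ids.append(next_id)                        # condition the next step on it
        if next_id == eos_token_id:                # eos_token_id stops generation early
            break
    return new_tokens

out = greedy_generate([0, 1])
```

Each new token is appended to the context before the next forward pass; in the real model the KV cache (use_cache=True) makes this incremental instead of recomputing the full prefix.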

The Janus model stores its LlamaForCausalLM as the language_model attribute of MultiModalityCausalLM.

Usage

Call vl_gpt.language_model.generate() after obtaining inputs_embeds from prepare_inputs_embeds(). Pass the attention mask from the processor output so that padded positions are excluded from attention.

Code Reference

Signature

# Called via: vl_gpt.language_model.generate(...)
LlamaForCausalLM.generate(
    inputs_embeds: torch.Tensor = None,    # [b, T, D] fused embeddings
    attention_mask: torch.Tensor = None,   # [b, T]
    pad_token_id: int = None,
    bos_token_id: int = None,
    eos_token_id: int = None,
    max_new_tokens: int = None,
    do_sample: bool = False,
    temperature: float = 1.0,
    top_p: float = 1.0,
    use_cache: bool = True,
) -> torch.LongTensor  # [b, T_out] generated token IDs

Import

# No direct import needed — accessed via model attribute:
# vl_gpt.language_model.generate(...)

I/O Contract

Inputs

Name | Type | Required | Description
inputs_embeds | torch.Tensor [b, T, D] | Yes | Fused vision-language embeddings from prepare_inputs_embeds
attention_mask | torch.Tensor [b, T] | Yes | Attention mask from VLChatProcessor
pad_token_id | int | Yes | Typically set to tokenizer.eos_token_id
bos_token_id | int | No | Beginning-of-sequence token ID
eos_token_id | int | Yes | End-of-sequence token ID (stops generation)
max_new_tokens | int | Yes | Maximum number of tokens to generate (e.g., 512)
do_sample | bool | No | False = greedy decoding, True = sampling
temperature | float | No | Sampling temperature (applies when do_sample=True)
top_p | float | No | Nucleus-sampling threshold (applies when do_sample=True)
use_cache | bool | No | Enable the KV cache for faster decoding (default True)
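To make the attention_mask concrete: it is 1 at real-token positions and 0 at padding. The helper below is a hypothetical illustration (the real mask comes from VLChatProcessor); it left-pads a batch, the convention decoder-only models typically require so that generation continues from the last real token.

```python
import numpy as np

def left_pad_batch(sequences, pad_token_id):
    """Left-pad variable-length id sequences into a [b, T] batch.

    mask is 1 for real tokens, 0 for padding; decoder-only models are
    usually left-padded so the final position of every row is a real token.
    """
    max_len = max(len(s) for s in sequences)
    ids = np.full((len(sequences), max_len), pad_token_id, dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for i, s in enumerate(sequences):
        ids[i, max_len - len(s):] = s    # place tokens at the right edge
        mask[i, max_len - len(s):] = 1   # mark them as attendable
    return ids, mask

ids, mask = left_pad_batch([[5, 6, 7], [8]], pad_token_id=0)
# ids  -> [[5, 6, 7], [0, 0, 8]]
# mask -> [[1, 1, 1], [0, 0, 1]]
```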

Outputs

Name | Type | Description
output_ids | torch.LongTensor [b, T_out] | Generated token IDs. When generate is called with inputs_embeds only (no input_ids), the returned tensor contains just the newly generated tokens, not the prompt, so it can be decoded directly.

Usage Examples

Greedy Decoding

outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)

Sampling With Temperature

outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    use_cache=True,
)
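What do_sample, temperature, and top_p actually do can be shown on raw logits. This is a minimal numpy sketch of the sampling step, not the HuggingFace implementation: temperature rescales the logits before softmax, and top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability reaches top_p.

```python
import numpy as np

def sample_next_token(logits, do_sample=True, temperature=0.7, top_p=0.9, rng=None):
    """Pick one next-token id from a logits vector."""
    logits = np.asarray(logits, dtype=np.float64)
    if not do_sample:
        return int(np.argmax(logits))            # greedy decoding
    rng = rng or np.random.default_rng(0)
    scaled = logits / temperature                # <1 sharpens, >1 flattens the distribution
    probs = np.exp(scaled - scaled.max())        # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest prefix with mass >= top_p
    kept = probs[keep] / probs[keep].sum()       # renormalize over the nucleus
    return int(rng.choice(keep, p=kept))

logits = [2.0, 1.0, 0.1, -1.0]
sample_next_token(logits, do_sample=False)  # argmax -> 0
```

With temperature=0.7 and top_p=0.9, as in the example above, low-probability tokens are pruned before sampling, which trades a little diversity for fewer incoherent continuations.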

Related Pages

Implements Principle

Requires Environment
