Implementation: deepseek-ai Janus LlamaForCausalLM.generate
| Knowledge Sources | Details |
|---|---|
| Domains | NLP, Language Modeling |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
HuggingFace Transformers' LlamaForCausalLM.generate method, used as the text decoder in the Janus multimodal understanding pipeline.
Description
The generate method is inherited from HuggingFace's GenerationMixin and provides autoregressive text generation with support for various decoding strategies. In Janus, it receives fused vision-language embeddings (inputs_embeds) rather than raw token IDs, enabling the language model to generate responses conditioned on both text and image context.
The Janus model stores its LlamaForCausalLM as the language_model attribute of MultiModalityCausalLM.
External Reference
- HuggingFace Transformers text generation documentation: https://huggingface.co/docs/transformers/main_classes/text_generation
Usage
Call vl_gpt.language_model.generate() after obtaining inputs_embeds from prepare_inputs_embeds(), and pass the attention mask from the processor output; the sketch below walks through the full preparation chain.
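A minimal end-to-end sketch of that preparation chain, modeled on the Janus repository's inference example; the checkpoint name, image path, prompt, and role strings are illustrative assumptions and may differ between Janus releases:

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

model_path = "deepseek-ai/Janus-1.3B"  # assumption: substitute your checkpoint
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

# Role strings follow the repository example; some checkpoints use
# "<|User|>" / "<|Assistant|>" instead.
conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["./example.png"],  # assumption: any local image path
    },
    {"role": "Assistant", "content": ""},
]

# Load images, tokenize text, and build the attention mask.
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device)

# Fuse text and image embeddings; this tensor is what generate() consumes.
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

From here, vl_gpt.language_model.generate() is called exactly as shown in the Usage Examples below.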
Code Reference
Source Location
- Repository: External — HuggingFace Transformers
- File (model attribute): janus/models/modeling_vlm.py:L219
Signature
# Called via: vl_gpt.language_model.generate(...)
# Note: generate() is inherited from GenerationMixin; its literal Python
# signature is generate(inputs=None, generation_config=None, **kwargs).
# The keyword arguments below are merged into the model's GenerationConfig;
# this listing shows the subset used by the Janus pipeline.
LlamaForCausalLM.generate(
    inputs_embeds: torch.Tensor = None,   # [b, T, D] fused embeddings
    attention_mask: torch.Tensor = None,  # [b, T]
    pad_token_id: int = None,
    bos_token_id: int = None,
    eos_token_id: int = None,
    max_new_tokens: int = None,
    do_sample: bool = False,
    temperature: float = 1.0,
    top_p: float = 1.0,
    use_cache: bool = True,
) -> torch.LongTensor  # [b, T_out] generated token IDs
Import
# No direct import needed — accessed via model attribute:
# vl_gpt.language_model.generate(...)
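A quick sanity check of the attribute; a minimal sketch assuming vl_gpt is an already-loaded MultiModalityCausalLM:

from transformers import LlamaForCausalLM

# Janus exposes its Llama decoder as the language_model attribute.
assert isinstance(vl_gpt.language_model, LlamaForCausalLM)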
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| inputs_embeds | torch.Tensor [b, T, D] | Yes | Fused vision-language embeddings from prepare_inputs_embeds |
| attention_mask | torch.Tensor [b, T] | Yes | Attention mask from VLChatProcessor |
| pad_token_id | int | Yes | Padding token ID; typically set to tokenizer.eos_token_id |
| bos_token_id | int | No | Beginning-of-sequence token ID |
| eos_token_id | int | Yes | End-of-sequence token ID; generation stops when it is produced |
| max_new_tokens | int | Yes | Maximum number of tokens to generate (e.g., 512) |
| do_sample | bool | No | False = greedy decoding, True = stochastic sampling |
| temperature | float | No | Softmax temperature; applied only when do_sample=True (default 1.0) |
| top_p | float | No | Nucleus-sampling threshold; applied only when do_sample=True (default 1.0) |
| use_cache | bool | No | Enable the KV cache for faster decoding (default True) |
Outputs
| Name | Type | Description |
|---|---|---|
| output_ids | torch.LongTensor [b, T_out] | Generated token IDs; when generate is called with inputs_embeds and no input_ids (as in Janus), the output contains only the newly generated tokens, not the prompt |
Usage Examples
Greedy Decoding
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)
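The returned tensor holds token IDs, not text; decoding follows the repository example, assuming tokenizer is the VLChatProcessor's tokenizer:

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(answer)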
Sampling With Temperature
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    use_cache=True,
)
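temperature and top_p take effect only when do_sample=True; greedy decoding ignores them. Because sampling is stochastic, seed PyTorch first if you need reproducible outputs (a minimal sketch; the seed value is arbitrary):

import torch

torch.manual_seed(42)  # seeds the CPU and CUDA RNGs used during sampling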