Principle: DeepSeek AI Janus Autoregressive Text Generation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Language_Modeling |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A decoding strategy where tokens are generated one at a time, with each new token conditioned on all previously generated tokens and the input context.
Description
Autoregressive text generation is the standard method for producing text from a language model. Given an input sequence of embeddings (which may include fused vision-language features), the model generates output tokens sequentially. At each step, the model predicts a probability distribution over the vocabulary, selects the next token (via greedy decoding, sampling, or other strategies), and appends it to the context for the next prediction.
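The predict-select-append loop described above can be sketched in plain Python. This is a minimal illustration with a hypothetical toy next-token scorer standing in for the real language backbone; the vocabulary, `toy_logits`, and `generate_greedy` are illustrative names, not Janus internals.

```python
import math

# Toy vocabulary and a hypothetical next-token scorer standing in for
# the real language model (an assumption for illustration only).
VOCAB = ["<bos>", "hello", "world", "<eos>"]

def toy_logits(context):
    """Return unnormalized scores over VOCAB for the next token."""
    last = context[-1]
    if last == "<bos>":
        return [0.0, 5.0, 1.0, 0.0]   # strongly favors "hello"
    if last == "hello":
        return [0.0, 0.0, 5.0, 1.0]   # favors "world"
    return [0.0, 0.0, 0.0, 5.0]       # otherwise, end the sequence

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate_greedy(max_new_tokens=10):
    context = ["<bos>"]
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(context))         # distribution over vocab
        next_token = VOCAB[probs.index(max(probs))]  # greedy: take the argmax
        context.append(next_token)                   # condition the next step on it
        if next_token == "<eos>":
            break
    return context

print(generate_greedy())  # ['<bos>', 'hello', 'world', '<eos>']
```

Swapping the argmax line for a draw from `probs` turns this into sampling-based decoding; the conditioning loop itself is unchanged.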
In Janus, the language backbone is a LlamaForCausalLM model. Text generation uses Hugging Face's generate() method, which supports various decoding strategies including greedy decoding (do_sample=False), nucleus sampling (top_p), and temperature scaling.
Usage
Use this principle after vision-language embedding fusion to generate text answers in the multimodal understanding pipeline. The fused inputs_embeds tensor is passed directly to generate() instead of raw token IDs.
Theoretical Basis
Autoregressive generation models the joint probability of the output sequence as:

P(y_1, ..., y_T | x) = ∏_{t=1}^{T} P(y_t | y_{<t}, x)

where x is the input context (including vision embeddings) and each y_t is conditioned on all previously generated tokens y_{<t}.
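As a short worked example, for a three-token output the chain-rule factorization expands to:

P(y_1, y_2, y_3 | x) = P(y_1 | x) · P(y_2 | y_1, x) · P(y_3 | y_1, y_2, x)

Each factor reuses the same model; only the conditioning context grows by one token per step.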
Key decoding parameters:
- temperature: Divides the logits before softmax. Values below 1 sharpen the distribution (more deterministic); values above 1 flatten it (more diverse)
- top_p (nucleus sampling): Samples from the smallest set of tokens whose cumulative probability exceeds p
- max_new_tokens: Maximum number of tokens to generate
- KV-cache (use_cache=True): Caches attention key/value tensors from previous steps so each new token requires only a single incremental forward pass
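The effect of temperature and top_p can be sketched in plain Python. This is a hedged illustration of the standard definitions, not Janus or Hugging Face internals; the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apply_temperature(logits, temperature):
    """Divide logits by the temperature before softmax.
    temperature < 1 sharpens the distribution; > 1 flattens it."""
    return [l / temperature for l in logits]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability exceeds p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0
            for i in range(len(probs))]

logits = [2.0, 1.0, 0.1, -1.0]
sharp = softmax(apply_temperature(logits, 0.5))  # more peaked than softmax(logits)
flat = softmax(apply_temperature(logits, 2.0))   # closer to uniform
nucleus = top_p_filter(softmax(logits), 0.9)     # low-probability tail zeroed out
```

In practice these transforms are applied to the model's logits at every decoding step, and the next token is then drawn from the filtered distribution.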