Principle: DeepSeek AI Janus Autoregressive Text Generation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Language_Modeling |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A decoding strategy where tokens are generated one at a time, with each new token conditioned on all previously generated tokens and the input context.
Description
Autoregressive text generation is the standard method for producing text from a language model. Given an input sequence of embeddings (which may include fused vision-language features), the model generates output tokens sequentially. At each step, the model predicts a probability distribution over the vocabulary, selects the next token (via greedy decoding, sampling, or other strategies), and appends it to the context for the next prediction.
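The predict-select-append loop described above can be sketched in plain Python. This is a minimal illustration with a hypothetical toy next-token scorer standing in for the real language backbone; the vocabulary, `toy_logits`, and `generate_greedy` are illustrative names, not Janus internals.

```python
import math

# Toy vocabulary and a hypothetical next-token scorer standing in for
# the real language model (an assumption for illustration only).
VOCAB = ["<bos>", "hello", "world", "<eos>"]

def toy_logits(context):
    """Return unnormalized scores over VOCAB for the next token."""
    last = context[-1]
    if last == "<bos>":
        return [0.0, 5.0, 1.0, 0.0]   # strongly favors "hello"
    if last == "hello":
        return [0.0, 0.0, 5.0, 1.0]   # favors "world"
    return [0.0, 0.0, 0.0, 5.0]       # otherwise, end the sequence

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate_greedy(max_new_tokens=10):
    context = ["<bos>"]
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(context))         # distribution over vocab
        next_token = VOCAB[probs.index(max(probs))]  # greedy: take the argmax
        context.append(next_token)                   # condition the next step on it
        if next_token == "<eos>":
            break
    return context

print(generate_greedy())  # ['<bos>', 'hello', 'world', '<eos>']
```

Swapping the argmax line for a draw from `probs` turns this into sampling-based decoding; the conditioning loop itself is unchanged.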
In Janus, the language backbone is a LlamaForCausalLM model. Text generation uses Hugging Face's generate() method, which supports various decoding strategies including greedy decoding (do_sample=False), nucleus sampling (top_p), and temperature scaling.
Usage
Use this principle after vision-language embedding fusion to generate text answers in the multimodal understanding pipeline. The fused inputs_embeds tensor is passed directly to generate() instead of raw token IDs.
Theoretical Basis
Autoregressive generation models the joint probability of the output sequence as:

P(y_1, ..., y_T | x) = ∏_{t=1}^{T} P(y_t | y_{<t}, x)

where x is the input context (including vision embeddings) and each y_t is conditioned on all previously generated tokens y_{<t}.
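As a short worked example, for a three-token output the chain-rule factorization expands to:

P(y_1, y_2, y_3 | x) = P(y_1 | x) · P(y_2 | y_1, x) · P(y_3 | y_1, y_2, x)

Each factor reuses the same model; only the conditioning context grows by one token per step.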
Key decoding parameters:
- temperature: Divides the logits before softmax. Values below 1 sharpen the distribution (more deterministic); values above 1 flatten it (more diverse)
- top_p (nucleus sampling): Samples from the smallest set of tokens whose cumulative probability exceeds p
- max_new_tokens: Maximum number of tokens to generate
- KV-cache (use_cache=True): Caches attention key/value tensors from previous steps so each new token requires only a single incremental forward pass
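The effect of temperature and top_p can be sketched in plain Python. This is a hedged illustration of the standard definitions, not Janus or Hugging Face internals; the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apply_temperature(logits, temperature):
    """Divide logits by the temperature before softmax.
    temperature < 1 sharpens the distribution; > 1 flattens it."""
    return [l / temperature for l in logits]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability exceeds p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0
            for i in range(len(probs))]

logits = [2.0, 1.0, 0.1, -1.0]
sharp = softmax(apply_temperature(logits, 0.5))  # more peaked than softmax(logits)
flat = softmax(apply_temperature(logits, 2.0))   # closer to uniform
nucleus = top_p_filter(softmax(logits), 0.9)     # low-probability tail zeroed out
```

In practice these transforms are applied to the model's logits at every decoding step, and the next token is then drawn from the filtered distribution.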