
Principle:Shiyu coder Kronos Autoregressive Token Generation

From Leeroopedia


Field Value
principle_name Autoregressive_Token_Generation
repo Shiyu_coder_Kronos
domains Autoregressive_Models, Token_Generation, Sampling
last_updated 2026-02-09 14:00 GMT
implemented_by Implementation:Shiyu_coder_Kronos_Auto_Regressive_Inference

Summary

Step-by-step generation of hierarchical discrete tokens (s1 coarse, s2 fine) using a sliding context window, temperature-controlled sampling, and multi-sample averaging.

Concept

Autoregressive token generation is the core inference mechanism of the Kronos system. Given an initial sequence of encoded tokens representing historical financial data, the model generates future tokens one at a time. Each newly generated token is appended to the context, and the process repeats until the desired prediction length is reached.

The generation operates on hierarchical tokens: at each timestep, a coarse (s1) token is generated first, then a fine (s2) token is generated conditioned on the s1 prediction. This two-stage process enables the model to first capture broad price movement patterns and then refine the details.

Theory

Hierarchical Two-Stage Sampling

At each generation step t, the process follows this sequence:

1. Run Transformer on current token buffer -> get context representation
2. Extract s1 logits from DualHead for position t
3. Apply temperature scaling: logits_s1 = logits_s1 / T
4. Apply top-k or top-p (nucleus) filtering
5. Sample s1 token from filtered distribution
6. Condition s2 prediction on sampled s1 via DependencyAwareLayer
7. Extract s2 logits from DualHead.cond_forward()
8. Apply temperature scaling and filtering to s2 logits
9. Sample s2 token
10. Append (s1, s2) to the token buffer
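The ten steps above can be sketched as a single generation step. This is a minimal illustration, not the actual Kronos code: `s1_head` and `s2_head` are toy stand-ins for the Transformer, DualHead, and DependencyAwareLayer, and the vocabulary size of 16 is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_logits(logits, T=1.0, top_k=None):
    """Temperature-scale, optionally top-k filter, then sample one token id."""
    logits = logits / T
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)  # masked tokens get zero probability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy stand-ins for the real Transformer / DualHead / DependencyAwareLayer:
def s1_head(buf):
    return rng.normal(size=16)        # coarse-vocabulary logits (size invented)

def s2_head(buf, s1_token):
    return rng.normal(size=16)        # fine-vocabulary logits, conditioned on s1

def generation_step(buf, T=1.0, top_k=5):
    s1 = sample_from_logits(s1_head(buf), T, top_k)        # steps 1-5: sample coarse token
    s2 = sample_from_logits(s2_head(buf, s1), T, top_k)    # steps 6-9: sample fine token given s1
    return s1, s2                                          # step 10: caller appends to the buffer

s1, s2 = generation_step(buf=[])
```

The key structural point is the ordering: s2 is only sampled after s1 has been drawn, so the fine token is always conditioned on a concrete coarse outcome rather than on the s1 distribution.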

Sliding Context Window

The Transformer has a fixed maximum context length (max_context). When the token sequence exceeds this limit, a sliding window is used:

  • A fixed-size buffer of length max_context is maintained.
  • When the buffer is full, tokens are shifted left (oldest token dropped) and the new token is appended at the end.
  • The corresponding temporal features are also windowed to match.

This ensures constant memory usage regardless of prediction length, while still providing the model with the most recent context.
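A minimal sketch of the sliding buffer in pure Python (the real implementation operates on token tensors and windows the temporal features in the same way):

```python
def slide_append(buf, token, max_context):
    """Append a token; once full, drop the oldest so the buffer never exceeds max_context."""
    buf.append(token)
    if len(buf) > max_context:
        buf.pop(0)          # shift left: oldest token dropped
    return buf

buf = []
for t in range(10):
    slide_append(buf, t, max_context=4)
print(buf)  # -> [6, 7, 8, 9]: the four most recent tokens, constant memory
```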

Temperature-Controlled Sampling

The temperature parameter T controls the entropy of the sampling distribution:

  • T < 1.0: Sharper distribution, more deterministic predictions.
  • T = 1.0: Unmodified model probabilities.
  • T > 1.0: Flatter distribution, more diverse/random predictions.
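The effect of T on the output distribution can be seen directly with a few made-up logits:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.0])   # illustrative logits

p_sharp = softmax(logits / 0.5)      # T < 1: more mass on the argmax
p_plain = softmax(logits / 1.0)      # T = 1: unmodified model probabilities
p_flat  = softmax(logits / 2.0)      # T > 1: flatter, more diverse

# The top token's probability shrinks monotonically as T grows:
# p_sharp[0] > p_plain[0] > p_flat[0]
```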

Top-k and Top-p (Nucleus) Filtering

  • Top-k: Only the k most probable tokens are considered. All others have their probability set to zero.
  • Top-p (nucleus): The smallest set of tokens whose cumulative probability exceeds p is kept. This dynamically adjusts the number of candidates based on the distribution shape.
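Both filters can be sketched as operations on raw logits, masking excluded tokens to negative infinity so they receive zero probability after softmax (NumPy, illustrative values):

```python
import numpy as np

def top_k_filter(logits, k):
    """Keep the k largest logits; mask the rest to -inf."""
    cutoff = np.sort(logits)[-k]
    return np.where(logits < cutoff, -np.inf, logits)

def top_p_filter(logits, p):
    """Keep the smallest set of tokens whose cumulative probability exceeds p."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # most probable first
    cum = np.cumsum(probs[order])
    n_keep = int(np.searchsorted(cum, p)) + 1     # first index where cum exceeds p
    keep = np.zeros(logits.shape, dtype=bool)
    keep[order[:n_keep]] = True
    return np.where(keep, logits, -np.inf)

logits = np.array([3.0, 2.0, 1.0, 0.0])
top2 = top_k_filter(logits, 2)        # only the two largest logits survive
nucleus = top_p_filter(logits, 0.9)   # keeps 3 tokens here: the top 2 cover only ~88%
```

Note how top-p adapts to the distribution's shape: for these logits it keeps three candidates, but for a sharper distribution the same p would keep fewer.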

Multi-Sample Averaging

The input sequence is replicated sample_count times along the batch dimension. Each replica runs the generation loop independently, yielding different outcomes because sampling is stochastic. After generation, the decoded continuous values are averaged across samples:

final_prediction = mean(sample_1, sample_2, ..., sample_N)

This reduces the variance inherent in stochastic token sampling and produces more stable predictions.
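A sketch of the averaging step, using a random walk as a stand-in for the decoded sample paths (in the real system each path comes from the full generation loop above):

```python
import numpy as np

rng = np.random.default_rng(0)
sample_count, horizon = 32, 10

# Stand-in: each row is one stochastic decoded path (random walk, illustrative only)
samples = np.cumsum(rng.normal(size=(sample_count, horizon)), axis=1)

# Replicas occupy the batch dimension; average across it for the final forecast
final_prediction = samples.mean(axis=0)
```

For independent samples, the variance of the mean is roughly 1/sample_count of a single path's variance, which is why the averaged forecast is more stable than any individual sampled trajectory.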

Source

Domains

  • Autoregressive_Models: Sequential token-by-token generation.
  • Token_Generation: Discrete token sampling from predicted distributions.
  • Sampling: Temperature, top-k, and nucleus sampling strategies.

Related Principles

Heuristic Links
