
Implementation:Shiyu coder Kronos Kronos From Pretrained

From Leeroopedia


Field Value
implementation_name Kronos_From_Pretrained
repo Shiyu_coder_Kronos
type API Doc
source_file model/kronos.py:L180-328
class Kronos(nn.Module, PyTorchModelHubMixin)
implements Principle:Shiyu_coder_Kronos_Model_Loading
last_updated 2026-02-09 14:00 GMT

Summary

The Kronos.from_pretrained class method loads a pre-trained autoregressive Transformer model from a HuggingFace Hub repository or a local directory path, returning a fully initialized Kronos nn.Module on CPU.

API Signature

Kronos.from_pretrained(
    pretrained_model_name_or_path: str,
    **kwargs
) -> Kronos
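This signature is inherited from PyTorchModelHubMixin. Conceptually, loading reads the saved constructor config and reinstantiates the class, after which the checkpoint weights are restored on CPU. The following stdlib-only sketch illustrates that flow; the `StubModel` class, `from_pretrained_sketch` helper, and file layout are illustrative stand-ins, not the real Kronos or huggingface_hub internals:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a model class whose constructor takes the
# same keyword arguments that were serialized into config.json.
class StubModel:
    def __init__(self, s1_bits, s2_bits, d_model):
        self.s1_bits, self.s2_bits, self.d_model = s1_bits, s2_bits, d_model

def from_pretrained_sketch(path):
    # Step 1: read the saved constructor config.
    config = json.loads((Path(path) / "config.json").read_text())
    # Step 2: rebuild the module from the config.
    # (The real mixin would then load the checkpoint weights on CPU.)
    return StubModel(**config)

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(
        json.dumps({"s1_bits": 10, "s2_bits": 6, "d_model": 512})
    )
    model = from_pretrained_sketch(d)
    print(model.d_model)  # → 512
```

The real method additionally resolves Hub model IDs to a local cache before reading files, which is why both a repo ID and a local path are accepted.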

Import

from model import Kronos
# or
from model.kronos import Kronos

Parameters

from_pretrained Parameters

Parameter Type Description
pretrained_model_name_or_path str HuggingFace Hub model ID (e.g., "NeoQuasar/Kronos-small") or local filesystem path.
**kwargs dict Additional keyword arguments passed to HuggingFace Hub download and model initialization.

__init__ Parameters (loaded from config)

Parameter Type Description
s1_bits int Number of bits for coarse (s1) tokens.
s2_bits int Number of bits for fine (s2) tokens.
n_layers int Number of Transformer blocks.
d_model int Dimension of model embeddings and hidden states.
n_heads int Number of attention heads in MultiheadAttention layers.
ff_dim int Feed-forward network dimension in Transformer blocks.
ffn_dropout_p float Dropout probability for feed-forward networks.
attn_dropout_p float Dropout probability for attention layers.
resid_dropout_p float Dropout probability for residual connections.
token_dropout_p float Dropout probability for token embeddings.
learn_te bool Whether to use learnable temporal embeddings.
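The two bit widths determine the hierarchical vocabulary sizes: the source computes s1_vocab_size = 2 ** s1_bits, and by analogy the fine vocabulary scales as 2 ** s2_bits. A small sketch of how a config maps onto these parameters; the concrete values below are illustrative, not taken from a released Kronos checkpoint:

```python
# Illustrative config values (not a published Kronos-small config).
config = {
    "s1_bits": 10,
    "s2_bits": 6,
    "n_layers": 8,
    "d_model": 512,
    "n_heads": 8,
    "ff_dim": 2048,
    "ffn_dropout_p": 0.1,
    "attn_dropout_p": 0.1,
    "resid_dropout_p": 0.1,
    "token_dropout_p": 0.1,
    "learn_te": True,
}

# Coarse (s1) and fine (s2) token vocabularies grow as powers of two.
s1_vocab_size = 2 ** config["s1_bits"]   # 1024 coarse tokens
s2_vocab_size = 2 ** config["s2_bits"]   # 64 fine tokens
print(s1_vocab_size, s2_vocab_size)  # → 1024 64
```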

Input

  • pretrained_model_name_or_path (str): A HuggingFace Hub model ID string (e.g., "NeoQuasar/Kronos-small") or a local filesystem path to a directory containing model weights and configuration.

Output

  • Kronos: A fully initialized nn.Module instance on CPU with pre-trained weights loaded, ready for inference or further device placement.

Dependencies

  • torch
  • huggingface_hub (provides the PyTorchModelHubMixin base class and download utilities)

Architecture

The loaded Kronos model contains:

Kronos
  +-- embedding: HierarchicalEmbedding(s1_bits, s2_bits, d_model)
  +-- time_emb: TemporalEmbedding(d_model, learn_te)
  +-- token_drop: nn.Dropout(token_dropout_p)
  +-- transformer: nn.ModuleList[TransformerBlock x n_layers]
  +-- norm: RMSNorm(d_model)
  +-- dep_layer: DependencyAwareLayer(d_model)
  +-- head: DualHead(s1_bits, s2_bits, d_model)

The DualHead supports two modes:

  • head(x) produces s1 logits (coarse prediction).
  • head.cond_forward(x2) produces s2 logits (fine prediction conditioned on s1).

Example

from model import Kronos

# Load from HuggingFace Hub
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")

# Load from a local path
model = Kronos.from_pretrained("/path/to/local/model/")

# Move to GPU for inference
model = model.to("cuda:0")
model.eval()

Source Code Reference

File: model/kronos.py, lines 180-328.

class Kronos(nn.Module, PyTorchModelHubMixin):
    def __init__(self, s1_bits, s2_bits, n_layers, d_model, n_heads, ff_dim,
                 ffn_dropout_p, attn_dropout_p, resid_dropout_p, token_dropout_p, learn_te):
        super().__init__()
        # Config values are stored as attributes before use (excerpt abridged).
        self.s1_bits, self.s2_bits = s1_bits, s2_bits
        self.d_model, self.learn_te = d_model, learn_te
        self.s1_vocab_size = 2 ** self.s1_bits
        self.embedding = HierarchicalEmbedding(self.s1_bits, self.s2_bits, self.d_model)
        self.time_emb = TemporalEmbedding(self.d_model, self.learn_te)
        self.token_drop = nn.Dropout(token_dropout_p)
        self.transformer = nn.ModuleList([...])
        self.norm = RMSNorm(self.d_model)
        self.dep_layer = DependencyAwareLayer(self.d_model)
        self.head = DualHead(self.s1_bits, self.s2_bits, self.d_model)

Key Methods

  • forward(s1_ids, s2_ids, stamp, padding_mask, use_teacher_forcing, s1_targets): Full forward pass returning (s1_logits, s2_logits).
  • decode_s1(s1_ids, s2_ids, stamp, padding_mask): Returns s1_logits and context representation.
  • decode_s2(context, s1_ids, padding_mask): Returns s2_logits conditioned on context and s1 tokens.
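These two decode methods implement the hierarchical prediction step: decode_s1 produces coarse logits plus a context representation, and decode_s2 refines the prediction conditioned on that context and the sampled coarse token. A pure-Python control-flow sketch with stubbed logits (no torch; the stub functions and fixed scores are illustrative stand-ins, not the real methods, and padding_mask is omitted for brevity):

```python
# Stand-ins that return fixed "logits" so the control flow is runnable
# without torch; the real methods operate on tensors.
def decode_s1_stub(s1_ids, s2_ids, stamp):
    s1_logits = [0.1, 0.7, 0.2]   # scores over the coarse (s1) vocabulary
    context = {"hidden": "..."}   # context representation passed to decode_s2
    return s1_logits, context

def decode_s2_stub(context, s1_ids):
    return [0.6, 0.3, 0.1]        # scores over the fine (s2) vocabulary

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

# One hierarchical prediction step: pick the coarse token first, then
# the fine token conditioned on the sampled coarse token.
s1_logits, context = decode_s1_stub([], [], stamp=None)
s1_next = argmax(s1_logits)                 # → 1
s2_logits = decode_s2_stub(context, [s1_next])
s2_next = argmax(s2_logits)                 # → 0
print(s1_next, s2_next)
```

In real autoregressive generation this step repeats, with greedy argmax typically replaced by temperature or top-p sampling over the logits.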

Notes

  • The model is returned on CPU by default. Move it to the desired device before inference.
  • The from_pretrained method is inherited from PyTorchModelHubMixin and is not explicitly defined in the class body.
  • Weight initialization uses Xavier normal for linear layers and scaled normal for embeddings (applied via _init_weights during __init__, then overwritten by loaded checkpoint weights).
