Implementation: Shiyu_coder_Kronos / Kronos_From_Pretrained
| Field | Value |
|---|---|
| implementation_name | Kronos_From_Pretrained |
| repo | Shiyu_coder_Kronos |
| type | API Doc |
| source_file | model/kronos.py:L180-328 |
| class | Kronos(nn.Module, PyTorchModelHubMixin) |
| implements | Principle:Shiyu_coder_Kronos_Model_Loading |
| last_updated | 2026-02-09 14:00 GMT |
Summary
The Kronos.from_pretrained class method loads a pre-trained autoregressive Transformer model from a HuggingFace Hub repository or a local directory path, returning a fully initialized Kronos nn.Module on CPU.
API Signature
Kronos.from_pretrained(
pretrained_model_name_or_path: str,
**kwargs
) -> Kronos
Import
from model import Kronos
# or
from model.kronos import Kronos
Parameters
from_pretrained Parameters
| Parameter | Type | Description |
|---|---|---|
| pretrained_model_name_or_path | str | HuggingFace Hub model ID (e.g., "NeoQuasar/Kronos-small") or local filesystem path. |
| **kwargs | dict | Additional keyword arguments passed to HuggingFace Hub download and model initialization. |
__init__ Parameters (loaded from config)
| Parameter | Type | Description |
|---|---|---|
| s1_bits | int | Number of bits for coarse (s1) tokens. |
| s2_bits | int | Number of bits for fine (s2) tokens. |
| n_layers | int | Number of Transformer blocks. |
| d_model | int | Dimension of model embeddings and hidden states. |
| n_heads | int | Number of attention heads in MultiheadAttention layers. |
| ff_dim | int | Feed-forward network dimension in Transformer blocks. |
| ffn_dropout_p | float | Dropout probability for feed-forward networks. |
| attn_dropout_p | float | Dropout probability for attention layers. |
| resid_dropout_p | float | Dropout probability for residual connections. |
| token_dropout_p | float | Dropout probability for token embeddings. |
| learn_te | bool | Whether to use learnable temporal embeddings. |
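The parameters above are typically stored in the model's configuration and passed to __init__ by the Hub mixin when loading. A minimal sketch of such a config, with illustrative placeholder values (not the actual NeoQuasar/Kronos-small configuration):

```python
# Illustrative placeholder values only; the real checkpoint ships its
# own config alongside the weights.
config = {
    "s1_bits": 8,        # coarse (s1) token bits
    "s2_bits": 4,        # fine (s2) token bits
    "n_layers": 12,      # Transformer blocks
    "d_model": 512,      # embedding / hidden dimension
    "n_heads": 8,        # attention heads
    "ff_dim": 2048,      # feed-forward dimension
    "ffn_dropout_p": 0.1,
    "attn_dropout_p": 0.1,
    "resid_dropout_p": 0.1,
    "token_dropout_p": 0.1,
    "learn_te": True,    # learnable temporal embeddings
}

# PyTorchModelHubMixin reconstructs the model from such a config,
# conceptually: model = Kronos(**config)
print(len(config))  # 11 parameters, matching the table above
```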
Input
- pretrained_model_name_or_path (str): A HuggingFace Hub model ID string (e.g., "NeoQuasar/Kronos-small") or a local filesystem path to a directory containing model weights and configuration.
Output
- Kronos: A fully initialized nn.Module instance on CPU with pre-trained weights loaded, ready for inference or further device placement.
Dependencies
- torch
- huggingface_hub (provides the PyTorchModelHubMixin base class and download utilities)
Architecture
The loaded Kronos model contains:
Kronos
+-- embedding: HierarchicalEmbedding(s1_bits, s2_bits, d_model)
+-- time_emb: TemporalEmbedding(d_model, learn_te)
+-- token_drop: nn.Dropout(token_dropout_p)
+-- transformer: nn.ModuleList[TransformerBlock x n_layers]
+-- norm: RMSNorm(d_model)
+-- dep_layer: DependencyAwareLayer(d_model)
+-- head: DualHead(s1_bits, s2_bits, d_model)
The DualHead supports two modes:
- head(x) produces s1 logits (coarse prediction).
- head.cond_forward(x2) produces s2 logits (fine prediction conditioned on s1).
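The two-mode interface can be sketched as follows. This is a hedged illustration of the DualHead contract described above, not the repository's actual layer structure; DualHeadSketch and its single-linear heads are assumptions:

```python
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    """Sketch of the DualHead interface: one coarse head, one fine head."""

    def __init__(self, s1_bits, s2_bits, d_model):
        super().__init__()
        self.s1_head = nn.Linear(d_model, 2 ** s1_bits)  # coarse (s1) logits
        self.s2_head = nn.Linear(d_model, 2 ** s2_bits)  # fine (s2) logits

    def forward(self, x):
        # head(x): coarse prediction from the transformer output
        return self.s1_head(x)

    def cond_forward(self, x2):
        # head.cond_forward(x2): fine prediction from a representation
        # already conditioned on the sampled s1 token
        return self.s2_head(x2)

head = DualHeadSketch(s1_bits=4, s2_bits=3, d_model=16)
x = torch.randn(2, 5, 16)
s1_logits = head(x)               # shape (2, 5, 16), vocab = 2**4
s2_logits = head.cond_forward(x)  # shape (2, 5, 8),  vocab = 2**3
```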
Example
from model import Kronos
# Load from HuggingFace Hub
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
# Load from a local path
model = Kronos.from_pretrained("/path/to/local/model/")
# Move to GPU for inference
model = model.to("cuda:0")
model.eval()
Source Code Reference
File: model/kronos.py, lines 180-328.
class Kronos(nn.Module, PyTorchModelHubMixin):
    def __init__(self, s1_bits, s2_bits, n_layers, d_model, n_heads, ff_dim,
                 ffn_dropout_p, attn_dropout_p, resid_dropout_p, token_dropout_p, learn_te):
        super().__init__()
        # Constructor arguments are stored as attributes before use (abridged).
        self.s1_bits, self.s2_bits = s1_bits, s2_bits
        self.d_model, self.learn_te = d_model, learn_te
        self.s1_vocab_size = 2 ** self.s1_bits
        self.embedding = HierarchicalEmbedding(self.s1_bits, self.s2_bits, self.d_model)
        self.time_emb = TemporalEmbedding(self.d_model, self.learn_te)
        self.token_drop = nn.Dropout(token_dropout_p)
        self.transformer = nn.ModuleList([...])
        self.norm = RMSNorm(self.d_model)
        self.dep_layer = DependencyAwareLayer(self.d_model)
        self.head = DualHead(self.s1_bits, self.s2_bits, self.d_model)
Key Methods
- forward(s1_ids, s2_ids, stamp, padding_mask, use_teacher_forcing, s1_targets): Full forward pass returning (s1_logits, s2_logits).
- decode_s1(s1_ids, s2_ids, stamp, padding_mask): Returns s1_logits and context representation.
- decode_s2(context, s1_ids, padding_mask): Returns s2_logits conditioned on context and s1 tokens.
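The decode_s1 / decode_s2 split suggests a two-stage autoregressive step: sample a coarse token from decode_s1, then sample the fine token from decode_s2 conditioned on it. The sketch below is an assumption about how the pieces compose, not code from the repository; MockKronos and the sampling details are illustrative stand-ins:

```python
import torch

class MockKronos:
    """Stand-in with the decode_s1 / decode_s2 signatures documented above,
    so the step function can be exercised without the real weights."""

    def decode_s1(self, s1_ids, s2_ids, stamp, padding_mask=None):
        B, T = s1_ids.shape
        return torch.randn(B, T, 16), torch.randn(B, T, 32)  # s1 logits, context

    def decode_s2(self, context, s1_ids, padding_mask=None):
        B, T = s1_ids.shape
        return torch.randn(B, T, 8)  # s2 logits

def generate_step(model, s1_ids, s2_ids, stamp, padding_mask=None):
    # Stage 1: coarse token logits plus the shared context representation.
    s1_logits, context = model.decode_s1(s1_ids, s2_ids, stamp, padding_mask)
    next_s1 = torch.distributions.Categorical(logits=s1_logits[:, -1]).sample()

    # Stage 2: fine token logits conditioned on the sampled coarse token.
    s2_logits = model.decode_s2(context, next_s1.unsqueeze(1), padding_mask)
    next_s2 = torch.distributions.Categorical(logits=s2_logits[:, -1]).sample()
    return next_s1, next_s2

model = MockKronos()
s1_ids = torch.zeros(2, 4, dtype=torch.long)
s2_ids = torch.zeros(2, 4, dtype=torch.long)
stamp = torch.zeros(2, 4, 5)
next_s1, next_s2 = generate_step(model, s1_ids, s2_ids, stamp)
```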
Notes
- The model is returned on CPU by default. Move it to the desired device before inference.
- The from_pretrained method is inherited from PyTorchModelHubMixin and is not explicitly defined in the class body.
- Weight initialization uses Xavier normal for linear layers and scaled normal for embeddings (applied via _init_weights during __init__, then overwritten by loaded checkpoint weights).