Implementation: Shiyu_coder_Kronos / Kronos_From_Pretrained
| Field | Value |
|---|---|
| implementation_name | Kronos_From_Pretrained |
| repo | Shiyu_coder_Kronos |
| type | API Doc |
| source_file | model/kronos.py:L180-328 |
| class | Kronos(nn.Module, PyTorchModelHubMixin) |
| implements | Principle:Shiyu_coder_Kronos_Model_Loading |
| last_updated | 2026-02-09 14:00 GMT |
Summary
The Kronos.from_pretrained class method loads a pre-trained autoregressive Transformer model from a HuggingFace Hub repository or a local directory path, returning a fully initialized Kronos nn.Module on CPU.
API Signature
Kronos.from_pretrained(
pretrained_model_name_or_path: str,
**kwargs
) -> Kronos
Import
from model import Kronos
# or
from model.kronos import Kronos
Parameters
from_pretrained Parameters
| Parameter | Type | Description |
|---|---|---|
| pretrained_model_name_or_path | str | HuggingFace Hub model ID (e.g., "NeoQuasar/Kronos-small") or local filesystem path. |
| **kwargs | dict | Additional keyword arguments passed to HuggingFace Hub download and model initialization. |
__init__ Parameters (loaded from config)
| Parameter | Type | Description |
|---|---|---|
| s1_bits | int | Number of bits for coarse (s1) tokens. |
| s2_bits | int | Number of bits for fine (s2) tokens. |
| n_layers | int | Number of Transformer blocks. |
| d_model | int | Dimension of model embeddings and hidden states. |
| n_heads | int | Number of attention heads in MultiheadAttention layers. |
| ff_dim | int | Feed-forward network dimension in Transformer blocks. |
| ffn_dropout_p | float | Dropout probability for feed-forward networks. |
| attn_dropout_p | float | Dropout probability for attention layers. |
| resid_dropout_p | float | Dropout probability for residual connections. |
| token_dropout_p | float | Dropout probability for token embeddings. |
| learn_te | bool | Whether to use learnable temporal embeddings. |
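The parameters above are typically stored in the model's configuration and passed to __init__ by the Hub mixin when loading. A minimal sketch of such a config, with illustrative placeholder values (not the actual NeoQuasar/Kronos-small configuration):

```python
# Illustrative placeholder values only; the real checkpoint ships its
# own config alongside the weights.
config = {
    "s1_bits": 8,        # coarse (s1) token bits
    "s2_bits": 4,        # fine (s2) token bits
    "n_layers": 12,      # Transformer blocks
    "d_model": 512,      # embedding / hidden dimension
    "n_heads": 8,        # attention heads
    "ff_dim": 2048,      # feed-forward dimension
    "ffn_dropout_p": 0.1,
    "attn_dropout_p": 0.1,
    "resid_dropout_p": 0.1,
    "token_dropout_p": 0.1,
    "learn_te": True,    # learnable temporal embeddings
}

# PyTorchModelHubMixin reconstructs the model from such a config,
# conceptually: model = Kronos(**config)
print(len(config))  # 11 parameters, matching the table above
```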
Input
- pretrained_model_name_or_path (str): A HuggingFace Hub model ID string (e.g., "NeoQuasar/Kronos-small") or a local filesystem path to a directory containing model weights and configuration.
Output
- Kronos: A fully initialized nn.Module instance on CPU with pre-trained weights loaded, ready for inference or further device placement.
Dependencies
- torch
- huggingface_hub (provides the PyTorchModelHubMixin base class and download utilities)
Architecture
The loaded Kronos model contains:
Kronos
+-- embedding: HierarchicalEmbedding(s1_bits, s2_bits, d_model)
+-- time_emb: TemporalEmbedding(d_model, learn_te)
+-- token_drop: nn.Dropout(token_dropout_p)
+-- transformer: nn.ModuleList[TransformerBlock x n_layers]
+-- norm: RMSNorm(d_model)
+-- dep_layer: DependencyAwareLayer(d_model)
+-- head: DualHead(s1_bits, s2_bits, d_model)
The DualHead supports two modes:
- head(x) produces s1 logits (coarse prediction).
- head.cond_forward(x2) produces s2 logits (fine prediction conditioned on s1).
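The two-mode interface can be sketched as follows. This is a hedged illustration of the DualHead contract described above, not the repository's actual layer structure; DualHeadSketch and its single-linear heads are assumptions:

```python
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    """Sketch of the DualHead interface: one coarse head, one fine head."""

    def __init__(self, s1_bits, s2_bits, d_model):
        super().__init__()
        self.s1_head = nn.Linear(d_model, 2 ** s1_bits)  # coarse (s1) logits
        self.s2_head = nn.Linear(d_model, 2 ** s2_bits)  # fine (s2) logits

    def forward(self, x):
        # head(x): coarse prediction from the transformer output
        return self.s1_head(x)

    def cond_forward(self, x2):
        # head.cond_forward(x2): fine prediction from a representation
        # already conditioned on the sampled s1 token
        return self.s2_head(x2)

head = DualHeadSketch(s1_bits=4, s2_bits=3, d_model=16)
x = torch.randn(2, 5, 16)
s1_logits = head(x)               # shape (2, 5, 16), vocab = 2**4
s2_logits = head.cond_forward(x)  # shape (2, 5, 8),  vocab = 2**3
```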
Example
from model import Kronos
# Load from HuggingFace Hub
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
# Load from a local path
model = Kronos.from_pretrained("/path/to/local/model/")
# Move to GPU for inference
model = model.to("cuda:0")
model.eval()
Source Code Reference
File: model/kronos.py, lines 180-328.
class Kronos(nn.Module, PyTorchModelHubMixin):
    def __init__(self, s1_bits, s2_bits, n_layers, d_model, n_heads, ff_dim,
                 ffn_dropout_p, attn_dropout_p, resid_dropout_p, token_dropout_p, learn_te):
        super().__init__()
        # Constructor arguments are stored as attributes before use (abridged).
        self.s1_bits, self.s2_bits = s1_bits, s2_bits
        self.d_model, self.learn_te = d_model, learn_te
        self.s1_vocab_size = 2 ** self.s1_bits
        self.embedding = HierarchicalEmbedding(self.s1_bits, self.s2_bits, self.d_model)
        self.time_emb = TemporalEmbedding(self.d_model, self.learn_te)
        self.token_drop = nn.Dropout(token_dropout_p)
        self.transformer = nn.ModuleList([...])
        self.norm = RMSNorm(self.d_model)
        self.dep_layer = DependencyAwareLayer(self.d_model)
        self.head = DualHead(self.s1_bits, self.s2_bits, self.d_model)
Key Methods
- forward(s1_ids, s2_ids, stamp, padding_mask, use_teacher_forcing, s1_targets): Full forward pass returning (s1_logits, s2_logits).
- decode_s1(s1_ids, s2_ids, stamp, padding_mask): Returns s1_logits and context representation.
- decode_s2(context, s1_ids, padding_mask): Returns s2_logits conditioned on context and s1 tokens.
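The decode_s1 / decode_s2 split suggests a two-stage autoregressive step: sample a coarse token from decode_s1, then sample the fine token from decode_s2 conditioned on it. The sketch below is an assumption about how the pieces compose, not code from the repository; MockKronos and the sampling details are illustrative stand-ins:

```python
import torch

class MockKronos:
    """Stand-in with the decode_s1 / decode_s2 signatures documented above,
    so the step function can be exercised without the real weights."""

    def decode_s1(self, s1_ids, s2_ids, stamp, padding_mask=None):
        B, T = s1_ids.shape
        return torch.randn(B, T, 16), torch.randn(B, T, 32)  # s1 logits, context

    def decode_s2(self, context, s1_ids, padding_mask=None):
        B, T = s1_ids.shape
        return torch.randn(B, T, 8)  # s2 logits

def generate_step(model, s1_ids, s2_ids, stamp, padding_mask=None):
    # Stage 1: coarse token logits plus the shared context representation.
    s1_logits, context = model.decode_s1(s1_ids, s2_ids, stamp, padding_mask)
    next_s1 = torch.distributions.Categorical(logits=s1_logits[:, -1]).sample()

    # Stage 2: fine token logits conditioned on the sampled coarse token.
    s2_logits = model.decode_s2(context, next_s1.unsqueeze(1), padding_mask)
    next_s2 = torch.distributions.Categorical(logits=s2_logits[:, -1]).sample()
    return next_s1, next_s2

model = MockKronos()
s1_ids = torch.zeros(2, 4, dtype=torch.long)
s2_ids = torch.zeros(2, 4, dtype=torch.long)
stamp = torch.zeros(2, 4, 5)
next_s1, next_s2 = generate_step(model, s1_ids, s2_ids, stamp)
```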
Notes
- The model is returned on CPU by default. Move it to the desired device before inference.
- The from_pretrained method is inherited from PyTorchModelHubMixin and is not explicitly defined in the class body.
- Weight initialization uses Xavier normal for linear layers and scaled normal for embeddings (applied via _init_weights during __init__, then overwritten by loaded checkpoint weights).