
Implementation:Shiyu coder Kronos CustomFinetuneConfig Init

From Leeroopedia


Field Value
Implementation Name CustomFinetuneConfig_Init
Repository Shiyu_coder_Kronos
Repository URL https://github.com/shiyu-coder/Kronos
Type API Doc
Source File finetune_csv/config_loader.py
Lines L109-267 (CustomFinetuneConfig), L6-106 (ConfigLoader)
Class CustomFinetuneConfig
Implements Principle Principle:Shiyu_coder_Kronos_CSV_Finetuning_Configuration
Dependencies yaml, os
Last Updated 2026-02-09 14:00 GMT

Overview

CustomFinetuneConfig is the primary configuration class for Kronos CSV finetuning. It loads a YAML configuration file via the internal ConfigLoader helper, resolves dynamic paths, and exposes all configuration values as typed Python attributes.

API

CustomFinetuneConfig(config_path: str = None)

Import

from config_loader import CustomFinetuneConfig

Constructor Parameters

Parameter Type Default Description
config_path str None Path to YAML config file. If None, defaults to config.yaml in the same directory as config_loader.py.
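The default-path fallback described above can be sketched as follows; this is an illustrative reconstruction, not the repository's exact code, and assumes the class resolves config.yaml relative to its own module file:

```python
import os

def resolve_config_path(config_path: str = None) -> str:
    # Fall back to config.yaml next to this module when no path is given
    # (sketch of the documented behavior).
    if config_path is None:
        module_dir = os.path.dirname(os.path.abspath(__file__))
        config_path = os.path.join(module_dir, "config.yaml")
    return config_path
```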

Output Attributes

Data Attributes

Attribute Type Default Description
data_path str -- Path to the CSV data file
lookback_window int 512 Number of historical time steps for input context
predict_window int 48 Number of future time steps to predict
max_context int 512 Maximum context length for the model
clip float 5.0 Clipping threshold for normalized data
train_ratio float 0.9 Fraction of data for training
val_ratio float 0.1 Fraction of data for validation
test_ratio float 0.0 Fraction of data for testing
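A chronological split using the three ratio attributes might look like the sketch below; the actual dataset class in the repository may compute boundaries differently:

```python
def split_indices(n_rows: int, train_ratio: float = 0.9,
                  val_ratio: float = 0.1, test_ratio: float = 0.0):
    # Partition row indices in time order: train first, then val, then test
    # (sketch; boundary rounding is an assumption).
    train_end = int(n_rows * train_ratio)
    val_end = train_end + int(n_rows * val_ratio)
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_rows))
```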

Training Attributes

Attribute Type Default Description
tokenizer_epochs int 30 Training epochs for the tokenizer phase
basemodel_epochs int 30 Training epochs for the predictor phase
batch_size int 160 Batch size for DataLoader
tokenizer_learning_rate float 2e-4 Learning rate for tokenizer training
predictor_learning_rate float 4e-5 Learning rate for predictor training
accumulation_steps int 1 Gradient accumulation steps for tokenizer
adam_weight_decay float 0.1 Weight decay for AdamW optimizer
adam_beta1 float 0.9 AdamW beta1
adam_beta2 float 0.95 AdamW beta2
log_interval int 50 Steps between log messages
num_workers int 6 DataLoader worker count
seed int 100 Random seed

Model Path Attributes

Attribute Type Default Description
pretrained_tokenizer_path str -- Path to pretrained tokenizer checkpoint
pretrained_predictor_path str -- Path to pretrained predictor checkpoint
base_save_path str -- Root directory for saving finetuned models
finetuned_tokenizer_path str -- Path to the finetuned tokenizer (auto-resolved)
exp_name str "default_experiment" Experiment name used for path generation

Computed Path Attributes

Attribute Computation Description
tokenizer_save_path os.path.join(base_save_path, tokenizer_save_name) Directory for tokenizer training outputs
tokenizer_best_model_path os.path.join(tokenizer_save_path, 'best_model') Path to best tokenizer checkpoint
basemodel_save_path os.path.join(base_save_path, basemodel_save_name) Directory for basemodel training outputs
basemodel_best_model_path os.path.join(basemodel_save_path, 'best_model') Path to best basemodel checkpoint
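The four computed paths in the table above can be reproduced with plain os.path.join calls, as in this sketch (the helper name and return shape are illustrative, not the class's API):

```python
import os

def compute_save_paths(base_save_path: str,
                       tokenizer_save_name: str = "tokenizer",
                       basemodel_save_name: str = "basemodel") -> dict:
    # Mirror the documented path computations: each phase gets a
    # subdirectory of base_save_path plus a 'best_model' checkpoint path.
    tokenizer_save_path = os.path.join(base_save_path, tokenizer_save_name)
    basemodel_save_path = os.path.join(base_save_path, basemodel_save_name)
    return {
        "tokenizer_save_path": tokenizer_save_path,
        "tokenizer_best_model_path": os.path.join(tokenizer_save_path, "best_model"),
        "basemodel_save_path": basemodel_save_path,
        "basemodel_best_model_path": os.path.join(basemodel_save_path, "best_model"),
    }
```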

Experiment Attributes

Attribute Type Default Description
train_tokenizer bool True Whether to run the tokenizer training phase
train_basemodel bool True Whether to run the basemodel training phase
skip_existing bool False Skip training if model already exists on disk
pre_trained_tokenizer bool True Load pretrained tokenizer weights (False = random init)
pre_trained_predictor bool True Load pretrained predictor weights (False = random init)
use_comet bool False Enable Comet ML experiment tracking

ConfigLoader Helper (L6-106)

The internal ConfigLoader class handles YAML loading and dynamic path resolution:

from typing import Any, Dict

class ConfigLoader:
    def __init__(self, config_path: str):
        self.config_path = config_path
        self.config = self._load_config()

    def _load_config(self) -> Dict[str, Any]:
        # Validates file exists, loads YAML, resolves dynamic paths
        ...

    def _resolve_dynamic_paths(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Uses exp_name and base_path to auto-fill empty path values
        # Or replaces {exp_name} template placeholders in path strings
        ...

    def get(self, key: str, default=None):
        # Dot-notation access: loader.get('data.lookback_window')
        ...

    def get_data_config(self) -> Dict[str, Any]: ...
    def get_training_config(self) -> Dict[str, Any]: ...
    def get_model_paths(self) -> Dict[str, str]: ...
    def get_experiment_config(self) -> Dict[str, Any]: ...
    def update_config(self, updates: Dict[str, Any]): ...
    def save_config(self, save_path: str = None): ...
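The dot-notation lookup that get() provides can be sketched as a standalone function; this is a plausible reconstruction of the documented behavior, not the repository's implementation:

```python
from typing import Any, Dict

def dotted_get(config: Dict[str, Any], key: str, default=None) -> Any:
    # Walk nested dicts one dotted segment at a time, e.g.
    # dotted_get(cfg, 'data.lookback_window'); return default on any miss.
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```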

Example YAML Configuration

The template config file is located at finetune_csv/configs/config_ali09988_candle-5min.yaml:

data:
  data_path: "/xxxx/Kronos/finetune_csv/data/HK_ali_09988_kline_5min_all.csv"
  lookback_window: 512
  predict_window: 48
  max_context: 512
  clip: 5.0
  train_ratio: 0.9
  val_ratio: 0.1
  test_ratio: 0.0

training:
  tokenizer_epochs: 30
  basemodel_epochs: 20
  batch_size: 32
  log_interval: 50
  num_workers: 6
  seed: 42
  tokenizer_learning_rate: 0.0002
  predictor_learning_rate: 0.000001
  adam_beta1: 0.9
  adam_beta2: 0.95
  adam_weight_decay: 0.1
  accumulation_steps: 1

model_paths:
  pretrained_tokenizer: "/xxx/Kronos/pretrained/Kronos-Tokenizer-base"
  pretrained_predictor: "/xxx/Kronos/pretrained/Kronos-base"
  exp_name: "HK_ali_09988_kline_5min_all"
  base_path: "/xxx/Kronos/finetune_csv/finetuned/"
  base_save_path: ""
  finetuned_tokenizer: ""
  tokenizer_save_name: "tokenizer"
  basemodel_save_name: "basemodel"

experiment:
  name: "kronos_custom_finetune"
  description: "Custom finetune for HK stock data"
  use_comet: false
  train_tokenizer: true
  train_basemodel: true
  skip_existing: false

device:
  use_cuda: true
  device_id: 0
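Note that base_save_path and finetuned_tokenizer are left empty in the template: the loader's _resolve_dynamic_paths fills them from base_path and exp_name, or substitutes {exp_name} placeholders. A sketch of that behavior, under the assumption that empty values are joined from base_path and exp_name (the real ConfigLoader may differ in details):

```python
import os

def resolve_dynamic_paths(model_paths: dict) -> dict:
    # Fill empty path values from base_path + exp_name, and replace
    # '{exp_name}' template placeholders in string values (sketch).
    exp_name = model_paths.get("exp_name", "default_experiment")
    base_path = model_paths.get("base_path", "")
    resolved = dict(model_paths)
    if not resolved.get("base_save_path"):
        resolved["base_save_path"] = os.path.join(base_path, exp_name)
    for key, value in resolved.items():
        if isinstance(value, str) and "{exp_name}" in value:
            resolved[key] = value.replace("{exp_name}", exp_name)
    return resolved
```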

Usage Example

from config_loader import CustomFinetuneConfig

# Load configuration from YAML
config = CustomFinetuneConfig("configs/config_ali09988_candle-5min.yaml")

# Access typed attributes directly
print(config.data_path)              # "/xxxx/.../HK_ali_09988_kline_5min_all.csv"
print(config.lookback_window)        # 512
print(config.tokenizer_epochs)       # 30
print(config.tokenizer_best_model_path)  # auto-computed path

# Get sub-config dictionaries for training functions
tok_config = config.get_tokenizer_config()
base_config = config.get_basemodel_config()

# Print summary
config.print_config_summary()

Convenience Methods

  • get_tokenizer_config() -- Returns a flat dictionary suitable for passing to the tokenizer training function.
  • get_basemodel_config() -- Returns a flat dictionary suitable for passing to the basemodel training function.
  • print_config_summary() -- Prints a formatted summary of all key configuration values.
