
Implementation:Shiyu coder Kronos CustomFinetuneConfig Init

From Leeroopedia


Field Value
Implementation Name CustomFinetuneConfig_Init
Repository Shiyu_coder_Kronos
Repository URL https://github.com/shiyu-coder/Kronos
Type API Doc
Source File finetune_csv/config_loader.py
Lines L109-267 (CustomFinetuneConfig), L6-106 (ConfigLoader)
Class CustomFinetuneConfig
Implements Principle Principle:Shiyu_coder_Kronos_CSV_Finetuning_Configuration
Dependencies yaml, os
Last Updated 2026-02-09 14:00 GMT

Overview

CustomFinetuneConfig is the primary configuration class for Kronos CSV finetuning. It loads a YAML configuration file via the internal ConfigLoader helper, resolves dynamic paths, and exposes all configuration values as typed Python attributes.

API

CustomFinetuneConfig(config_path: str = None)

Import

from config_loader import CustomFinetuneConfig

Constructor Parameters

Parameter Type Default Description
config_path str None Path to YAML config file. If None, defaults to config.yaml in the same directory as config_loader.py.
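The default-path fallback described above can be sketched as follows; this is an illustrative reconstruction, not the repository's exact code, and assumes the class resolves config.yaml relative to its own module file:

```python
import os

def resolve_config_path(config_path: str = None) -> str:
    # Fall back to config.yaml next to this module when no path is given
    # (sketch of the documented behavior).
    if config_path is None:
        module_dir = os.path.dirname(os.path.abspath(__file__))
        config_path = os.path.join(module_dir, "config.yaml")
    return config_path
```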

Output Attributes

Data Attributes

Attribute Type Default Description
data_path str -- Path to the CSV data file
lookback_window int 512 Number of historical time steps for input context
predict_window int 48 Number of future time steps to predict
max_context int 512 Maximum context length for the model
clip float 5.0 Clipping threshold for normalized data
train_ratio float 0.9 Fraction of data for training
val_ratio float 0.1 Fraction of data for validation
test_ratio float 0.0 Fraction of data for testing
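A chronological split using the three ratio attributes might look like the sketch below; the actual dataset class in the repository may compute boundaries differently:

```python
def split_indices(n_rows: int, train_ratio: float = 0.9,
                  val_ratio: float = 0.1, test_ratio: float = 0.0):
    # Partition row indices in time order: train first, then val, then test
    # (sketch; boundary rounding is an assumption).
    train_end = int(n_rows * train_ratio)
    val_end = train_end + int(n_rows * val_ratio)
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_rows))
```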

Training Attributes

Attribute Type Default Description
tokenizer_epochs int 30 Training epochs for the tokenizer phase
basemodel_epochs int 30 Training epochs for the predictor phase
batch_size int 160 Batch size for DataLoader
tokenizer_learning_rate float 2e-4 Learning rate for tokenizer training
predictor_learning_rate float 4e-5 Learning rate for predictor training
accumulation_steps int 1 Gradient accumulation steps for tokenizer
adam_weight_decay float 0.1 Weight decay for AdamW optimizer
adam_beta1 float 0.9 AdamW beta1
adam_beta2 float 0.95 AdamW beta2
log_interval int 50 Steps between log messages
num_workers int 6 DataLoader worker count
seed int 100 Random seed

Model Path Attributes

Attribute Type Default Description
pretrained_tokenizer_path str -- Path to pretrained tokenizer checkpoint
pretrained_predictor_path str -- Path to pretrained predictor checkpoint
base_save_path str -- Root directory for saving finetuned models
finetuned_tokenizer_path str -- Path to the finetuned tokenizer (auto-resolved)
exp_name str "default_experiment" Experiment name used for path generation

Computed Path Attributes

Attribute Computation Description
tokenizer_save_path os.path.join(base_save_path, tokenizer_save_name) Directory for tokenizer training outputs
tokenizer_best_model_path os.path.join(tokenizer_save_path, 'best_model') Path to best tokenizer checkpoint
basemodel_save_path os.path.join(base_save_path, basemodel_save_name) Directory for basemodel training outputs
basemodel_best_model_path os.path.join(basemodel_save_path, 'best_model') Path to best basemodel checkpoint
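The four computed paths in the table above can be reproduced with plain os.path.join calls, as in this sketch (the helper name and return shape are illustrative, not the class's API):

```python
import os

def compute_save_paths(base_save_path: str,
                       tokenizer_save_name: str = "tokenizer",
                       basemodel_save_name: str = "basemodel") -> dict:
    # Mirror the documented path computations: each phase gets a
    # subdirectory of base_save_path plus a 'best_model' checkpoint path.
    tokenizer_save_path = os.path.join(base_save_path, tokenizer_save_name)
    basemodel_save_path = os.path.join(base_save_path, basemodel_save_name)
    return {
        "tokenizer_save_path": tokenizer_save_path,
        "tokenizer_best_model_path": os.path.join(tokenizer_save_path, "best_model"),
        "basemodel_save_path": basemodel_save_path,
        "basemodel_best_model_path": os.path.join(basemodel_save_path, "best_model"),
    }
```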

Experiment Attributes

Attribute Type Default Description
train_tokenizer bool True Whether to run the tokenizer training phase
train_basemodel bool True Whether to run the basemodel training phase
skip_existing bool False Skip training if model already exists on disk
pre_trained_tokenizer bool True Load pretrained tokenizer weights (False = random init)
pre_trained_predictor bool True Load pretrained predictor weights (False = random init)
use_comet bool False Enable Comet ML experiment tracking

ConfigLoader Helper (L6-106)

The internal ConfigLoader class handles YAML loading and dynamic path resolution:

from typing import Any, Dict

class ConfigLoader:
    def __init__(self, config_path: str):
        self.config_path = config_path
        self.config = self._load_config()

    def _load_config(self) -> Dict[str, Any]:
        # Validates file exists, loads YAML, resolves dynamic paths
        ...

    def _resolve_dynamic_paths(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Uses exp_name and base_path to auto-fill empty path values
        # Or replaces {exp_name} template placeholders in path strings
        ...

    def get(self, key: str, default=None):
        # Dot-notation access: loader.get('data.lookback_window')
        ...

    def get_data_config(self) -> Dict[str, Any]: ...
    def get_training_config(self) -> Dict[str, Any]: ...
    def get_model_paths(self) -> Dict[str, str]: ...
    def get_experiment_config(self) -> Dict[str, Any]: ...
    def update_config(self, updates: Dict[str, Any]): ...
    def save_config(self, save_path: str = None): ...
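The dot-notation lookup that get() provides can be sketched as a standalone function; this is a plausible reconstruction of the documented behavior, not the repository's implementation:

```python
from typing import Any, Dict

def dotted_get(config: Dict[str, Any], key: str, default=None) -> Any:
    # Walk nested dicts one dotted segment at a time, e.g.
    # dotted_get(cfg, 'data.lookback_window'); return default on any miss.
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```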

Example YAML Configuration

The template config file is located at finetune_csv/configs/config_ali09988_candle-5min.yaml:

data:
  data_path: "/xxxx/Kronos/finetune_csv/data/HK_ali_09988_kline_5min_all.csv"
  lookback_window: 512
  predict_window: 48
  max_context: 512
  clip: 5.0
  train_ratio: 0.9
  val_ratio: 0.1
  test_ratio: 0.0

training:
  tokenizer_epochs: 30
  basemodel_epochs: 20
  batch_size: 32
  log_interval: 50
  num_workers: 6
  seed: 42
  tokenizer_learning_rate: 0.0002
  predictor_learning_rate: 0.000001
  adam_beta1: 0.9
  adam_beta2: 0.95
  adam_weight_decay: 0.1
  accumulation_steps: 1

model_paths:
  pretrained_tokenizer: "/xxx/Kronos/pretrained/Kronos-Tokenizer-base"
  pretrained_predictor: "/xxx/Kronos/pretrained/Kronos-base"
  exp_name: "HK_ali_09988_kline_5min_all"
  base_path: "/xxx/Kronos/finetune_csv/finetuned/"
  base_save_path: ""
  finetuned_tokenizer: ""
  tokenizer_save_name: "tokenizer"
  basemodel_save_name: "basemodel"

experiment:
  name: "kronos_custom_finetune"
  description: "Custom finetune for HK stock data"
  use_comet: false
  train_tokenizer: true
  train_basemodel: true
  skip_existing: false

device:
  use_cuda: true
  device_id: 0
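Note that base_save_path and finetuned_tokenizer are left empty in the template: the loader's _resolve_dynamic_paths fills them from base_path and exp_name, or substitutes {exp_name} placeholders. A sketch of that behavior, under the assumption that empty values are joined from base_path and exp_name (the real ConfigLoader may differ in details):

```python
import os

def resolve_dynamic_paths(model_paths: dict) -> dict:
    # Fill empty path values from base_path + exp_name, and replace
    # '{exp_name}' template placeholders in string values (sketch).
    exp_name = model_paths.get("exp_name", "default_experiment")
    base_path = model_paths.get("base_path", "")
    resolved = dict(model_paths)
    if not resolved.get("base_save_path"):
        resolved["base_save_path"] = os.path.join(base_path, exp_name)
    for key, value in resolved.items():
        if isinstance(value, str) and "{exp_name}" in value:
            resolved[key] = value.replace("{exp_name}", exp_name)
    return resolved
```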

Usage Example

from config_loader import CustomFinetuneConfig

# Load configuration from YAML
config = CustomFinetuneConfig("configs/config_ali09988_candle-5min.yaml")

# Access typed attributes directly
print(config.data_path)              # "/xxxx/.../HK_ali_09988_kline_5min_all.csv"
print(config.lookback_window)        # 512
print(config.tokenizer_epochs)       # 30
print(config.tokenizer_best_model_path)  # auto-computed path

# Get sub-config dictionaries for training functions
tok_config = config.get_tokenizer_config()
base_config = config.get_basemodel_config()

# Print summary
config.print_config_summary()

Convenience Methods

  • get_tokenizer_config() -- Returns a flat dictionary suitable for passing to the tokenizer training function.
  • get_basemodel_config() -- Returns a flat dictionary suitable for passing to the basemodel training function.
  • print_config_summary() -- Prints a formatted summary of all key configuration values.
