Principle: Shiyu_coder_Kronos CSV Finetuning Configuration
| Field | Value |
|---|---|
| Principle Name | CSV_Finetuning_Configuration |
| Repository | Shiyu_coder_Kronos |
| Repository URL | https://github.com/shiyu-coder/Kronos |
| Domains | Configuration_Management, Machine_Learning |
| Implemented By | Implementation:Shiyu_coder_Kronos_CustomFinetuneConfig_Init |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
This principle describes YAML-driven configuration for finetuning the Kronos time series forecasting model on custom CSV data with automatic path computation and validation. The design separates configuration from code, enabling users to specify all finetuning parameters in a single YAML file without modifying any Python source.
Concept
A YAML configuration file specifies data paths, training hyperparameters, model paths, and experiment settings. A ConfigLoader class reads and resolves dynamic paths (including template-based path expansion using {exp_name} placeholders). A CustomFinetuneConfig class provides typed attribute access to all configuration values, with sensible defaults for every parameter.
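A minimal sketch of the typed-attribute idea, assuming a nested dict as produced by loading the YAML. The class, default names, and numeric values below are placeholders for illustration, not the repository's actual `CustomFinetuneConfig` API or defaults:

```python
# Hypothetical sketch: flatten a nested, YAML-style config dict into typed
# attributes, with defaults that the file can selectively override.
class FlatConfig:
    # Placeholder defaults -- not the project's real values.
    DEFAULTS = {"lookback_window": 512, "predict_window": 48, "batch_size": 32}

    def __init__(self, nested):
        # Start from defaults, then override with anything found in the YAML.
        values = dict(self.DEFAULTS)
        for section in nested.values():
            values.update(section)
        for key, value in values.items():
            setattr(self, key, value)

cfg = FlatConfig({"data": {"lookback_window": 90}, "training": {"batch_size": 16}})
print(cfg.lookback_window, cfg.predict_window, cfg.batch_size)  # 90 48 16
```

Attribute access (`cfg.lookback_window`) replaces nested dictionary lookups, so a typo fails loudly with `AttributeError` instead of silently returning a wrong sub-dict.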
Theory
The principle of separation of configuration from code is fundamental to reproducible machine learning experiments:
- Single source of truth: All experiment parameters live in one YAML file, making experiments easy to reproduce, compare, and version-control.
- Dynamic path resolution: The ConfigLoader resolves path templates at load time. If `base_save_path` or `finetuned_tokenizer` are left as empty strings, the system auto-generates full paths from `exp_name` and `base_path`. If the value contains `{exp_name}`, it performs string substitution.
- Typed attribute access: CustomFinetuneConfig converts the nested YAML dictionary into flat, typed Python attributes (e.g., `config.lookback_window` returns an integer), eliminating error-prone dictionary key lookups throughout training code.
- Computed paths: Derived paths such as `tokenizer_save_path`, `tokenizer_best_model_path`, `basemodel_save_path`, and `basemodel_best_model_path` are automatically computed from base paths and save names using `os.path.join`.
- Defaults with override: Every parameter has a default value. The YAML file only needs to specify the values that differ from the defaults.
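The path-resolution rules above can be sketched as follows. The function name and call pattern are assumptions for illustration; only the three rules (empty string, `{exp_name}` template, explicit path) come from the description:

```python
import os

# Hypothetical sketch of load-time path resolution: empty strings are
# auto-generated from exp_name/base_path, and "{exp_name}" placeholders
# are substituted via string formatting.
def resolve_path(value, exp_name, base_path):
    if value == "":                      # empty -> auto-generate under base_path
        return os.path.join(base_path, exp_name)
    if "{exp_name}" in value:            # template -> substitute experiment name
        return value.format(exp_name=exp_name)
    return value                         # explicit path -> use as-is

base_save = resolve_path("", "kronos_csv", "./outputs")
tokenizer = resolve_path("./outputs/{exp_name}/tokenizer", "kronos_csv", "./outputs")
# Derived checkpoint locations are then joined from the resolved base path:
tokenizer_save_path = os.path.join(base_save, "tokenizer")
```

Resolving at load time means the rest of the pipeline only ever sees concrete paths, never templates.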
Configuration Sections
The YAML configuration file is organized into the following sections:
- data: Data path, lookback/predict windows, max context, clip value, train/val/test split ratios
- training: Epoch counts (separate for tokenizer and basemodel), batch size, learning rates, optimizer parameters, gradient accumulation steps
- model_paths: Pretrained model paths, experiment name, base save path, finetuned tokenizer path, save names
- experiment: Experiment name and description, phase control flags (train_tokenizer, train_basemodel), skip_existing flag, pre_trained flags for tokenizer and predictor
- device: CUDA usage flag and device ID
- distributed: DDP usage flag and backend selection
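An illustrative skeleton of such a file, organized by the sections above. The key names mirror the section descriptions, but the exact keys and all values are placeholder assumptions, not the repository's schema or defaults:

```yaml
# Illustrative skeleton only -- keys follow the section descriptions,
# values are placeholders, not the project's actual defaults.
data:
  data_path: ./data/my_series.csv
  lookback_window: 90
  predict_window: 10
training:
  tokenizer_epochs: 20
  basemodel_epochs: 20
  batch_size: 32
model_paths:
  exp_name: my_experiment
  base_save_path: ""        # empty string -> auto-generated from exp_name
experiment:
  train_tokenizer: true
  train_basemodel: true
  skip_existing: false
device:
  use_cuda: true
  device_id: 0
distributed:
  use_ddp: false
  backend: nccl
```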
Relationship to Training Pipeline
The configuration is consumed by the entire finetuning pipeline:
- CustomKlineDataset uses data section parameters for loading and splitting CSV data.
- SequentialTrainer uses experiment flags to control which training phases to execute.
- train_tokenizer and train_model functions use training hyperparameters.
- Model loading logic uses model paths to locate pretrained checkpoints and determine save locations.
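How the experiment flags might gate the two stages can be sketched as below. The trainer class and its control flow are illustrative assumptions; only the flag names and the two-phase structure come from the text above:

```python
# Hypothetical driver showing experiment flags selecting training phases,
# in the spirit of SequentialTrainer. Real training calls are stubbed out.
class SequentialTrainerSketch:
    def __init__(self, flags):
        self.flags = flags

    def run(self):
        phases = []
        # Each stage is gated by its flag from the experiment section.
        if self.flags.get("train_tokenizer"):
            phases.append("tokenizer")    # would call the tokenizer training step
        if self.flags.get("train_basemodel"):
            phases.append("basemodel")    # would call the base-model training step
        return phases

trainer = SequentialTrainerSketch({"train_tokenizer": True, "train_basemodel": False})
print(trainer.run())  # ['tokenizer']
```

Keeping phase selection in configuration lets the same script run tokenizer-only, basemodel-only, or full two-stage finetuning without code edits.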
See Also
- Implementation:Shiyu_coder_Kronos_CustomFinetuneConfig_Init -- API documentation for CustomFinetuneConfig
- Principle:Shiyu_coder_Kronos_Sequential_Two_Stage_Training -- Training pipeline that consumes this configuration