
Principle:Shiyu coder Kronos CSV Finetuning Configuration

From Leeroopedia


Field Value
Principle Name CSV_Finetuning_Configuration
Repository Shiyu_coder_Kronos
Repository URL https://github.com/shiyu-coder/Kronos
Domains Configuration_Management, Machine_Learning
Implemented By Implementation:Shiyu_coder_Kronos_CustomFinetuneConfig_Init
Last Updated 2026-02-09 14:00 GMT

Overview

This principle describes YAML-driven configuration for finetuning the Kronos time-series forecasting model on custom CSV data, with automatic path computation and validation. The design separates configuration from code, enabling users to specify all finetuning parameters in a single YAML file without modifying any Python source.

Concept

A YAML configuration file specifies data paths, training hyperparameters, model paths, and experiment settings. A ConfigLoader class reads and resolves dynamic paths (including template-based path expansion using {exp_name} placeholders). A CustomFinetuneConfig class provides typed attribute access to all configuration values, with sensible defaults for every parameter.
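The typed-attribute idea can be sketched as follows. This is a minimal illustration, not the repository's actual `CustomFinetuneConfig` implementation; the default values and section names here are assumptions.

```python
# Hypothetical sketch: flatten a nested YAML dict into typed attributes,
# with per-key defaults that the YAML file may override.
DEFAULTS = {
    "lookback_window": 512,  # assumed defaults, for illustration only
    "predict_window": 48,
    "batch_size": 32,
}

class CustomFinetuneConfig:
    def __init__(self, yaml_dict):
        # Merge every section into one flat namespace; YAML values win.
        merged = dict(DEFAULTS)
        for section in yaml_dict.values():
            merged.update(section)
        for key, value in merged.items():
            setattr(self, key, value)

config = CustomFinetuneConfig({"data": {"lookback_window": 256}, "training": {}})
config.lookback_window  # 256, overridden by the YAML
config.batch_size       # 32, falls back to the default
```

Training code can then write `config.lookback_window` instead of chaining dictionary lookups, which fails fast (with an `AttributeError`) on misspelled parameter names.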

Theory

The principle of separation of configuration from code is fundamental to reproducible machine learning experiments:

  • Single source of truth: All experiment parameters live in one YAML file, making experiments easy to reproduce, compare, and version-control.
  • Dynamic path resolution: The ConfigLoader resolves path templates at load time. If base_save_path or finetuned_tokenizer are left as empty strings, the system auto-generates full paths from exp_name and base_path. If the value contains {exp_name}, it performs string substitution.
  • Typed attribute access: CustomFinetuneConfig converts the nested YAML dictionary into flat, typed Python attributes (e.g., config.lookback_window returns an integer), eliminating error-prone dictionary key lookups throughout training code.
  • Computed paths: Derived paths such as tokenizer_save_path, tokenizer_best_model_path, basemodel_save_path, and basemodel_best_model_path are automatically computed from base paths and save names using os.path.join.
  • Defaults with override: Every parameter has a default value. The YAML file only needs to specify values that differ from defaults.
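The path-resolution rules above can be sketched in a few lines. The function name `resolve_paths` and the key names are illustrative assumptions, not the ConfigLoader's actual API; only the three behaviors (empty string auto-generation, `{exp_name}` substitution, `os.path.join` for derived paths) come from the description above.

```python
import os

# Hedged sketch of load-time path resolution.
def resolve_paths(cfg):
    exp_name = cfg["exp_name"]
    # Empty string -> auto-generate the full path from base_path and exp_name.
    if not cfg.get("base_save_path"):
        cfg["base_save_path"] = os.path.join(cfg["base_path"], exp_name)
    # "{exp_name}" placeholder -> string substitution at load time.
    for key, value in cfg.items():
        if isinstance(value, str) and "{exp_name}" in value:
            cfg[key] = value.format(exp_name=exp_name)
    # Derived paths computed from base paths and save names.
    cfg["tokenizer_save_path"] = os.path.join(
        cfg["base_save_path"], cfg["tokenizer_save_name"]
    )
    cfg["tokenizer_best_model_path"] = os.path.join(
        cfg["tokenizer_save_path"], "best_model"
    )
    return cfg

cfg = resolve_paths({
    "exp_name": "kronos_csv",
    "base_path": "outputs",
    "base_save_path": "",                            # auto-generated
    "finetuned_tokenizer": "ckpts/{exp_name}/tok",   # template-expanded
    "tokenizer_save_name": "tokenizer",
})
```

After loading, `cfg["base_save_path"]` is `outputs/kronos_csv` and `cfg["finetuned_tokenizer"]` is `ckpts/kronos_csv/tok`; resolving everything once at load time means the rest of the pipeline only ever sees concrete paths.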

Configuration Sections

The YAML configuration file is organized into the following sections:

  • data: Data path, lookback/predict windows, max context, clip value, train/val/test split ratios
  • training: Epoch counts (separate for tokenizer and basemodel), batch size, learning rates, optimizer parameters, gradient accumulation steps
  • model_paths: Pretrained model paths, experiment name, base save path, finetuned tokenizer path, save names
  • experiment: Experiment name and description, phase control flags (train_tokenizer, train_basemodel), skip_existing flag, pre_trained flags for tokenizer and predictor
  • device: CUDA usage flag and device ID
  • distributed: DDP usage flag and backend selection
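A configuration file following the six sections above might look like the sketch below. The section names come from this page; the individual key names and values inside each section are illustrative assumptions, not a verbatim copy of the repository's config file.

```yaml
data:
  data_path: data/my_series.csv
  lookback_window: 512
  predict_window: 48
training:
  tokenizer_epochs: 10
  basemodel_epochs: 20
  batch_size: 32
model_paths:
  exp_name: kronos_csv_run1
  base_save_path: ""        # empty -> auto-generated from exp_name
experiment:
  train_tokenizer: true
  train_basemodel: true
  skip_existing: false
device:
  use_cuda: true
  device_id: 0
distributed:
  use_ddp: false
  backend: nccl
```

Because every parameter has a default, a minimal working file could specify only `data_path` and `exp_name`.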

Relationship to Training Pipeline

The configuration is consumed by the entire finetuning pipeline:

  • CustomKlineDataset uses data section parameters for loading and splitting CSV data.
  • SequentialTrainer uses experiment flags to control which training phases to execute.
  • The train_tokenizer and train_model functions consume the training-section hyperparameters (epoch counts, learning rates, batch size).
  • Model loading logic uses model paths to locate pretrained checkpoints and determine save locations.
