Principle: Shiyu-coder Kronos Qlib Experiment Configuration
| Field | Value |
|---|---|
| principle_name | Qlib_Experiment_Configuration |
| repository | https://github.com/shiyu-coder/Kronos |
| domains | Machine_Learning, Experiment_Management |
| implemented_by | Implementation:Shiyu_coder_Kronos_Config_Init |
| last_updated | 2026-02-09 14:00 GMT |
Summary
All experiment hyperparameters, data paths, time ranges, and training settings are centralized into a single configuration object, making Qlib-based finetuning experiments reproducible.
Concept
The Qlib Experiment Configuration principle addresses the need for a unified, self-contained configuration management pattern for machine learning experiments. Rather than scattering hyperparameters, file paths, time ranges, and training settings across multiple scripts or argument parsers, all settings are grouped into a single Python class. This enables:
- Reproducibility: Every experiment is fully described by one configuration snapshot.
- Traceability: Configuration can be serialized (e.g., to JSON) alongside model checkpoints.
- Convenience: Derived paths (such as fine-tuned model checkpoint locations) are computed automatically from base settings.
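The traceability point above can be sketched with a small snapshot helper. The `Config` class, its attribute names, and the file layout here are illustrative assumptions, not the repository's actual code:

```python
import json
import os


class Config:
    """Illustrative configuration object; attributes and defaults are assumed."""

    def __init__(self):
        self.qlib_data_path = "~/.qlib/qlib_data/cn_data"  # assumed default
        self.epochs = 30
        self.batch_size = 64
        self.save_path = "./outputs/finetune_demo"


def save_snapshot(config, checkpoint_dir):
    """Serialize the config next to the model checkpoint for traceability."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    snapshot_path = os.path.join(checkpoint_dir, "config_snapshot.json")
    with open(snapshot_path, "w") as f:
        # __dict__ yields a plain dict of all attributes; default=str guards
        # against non-JSON-serializable values such as dates
        json.dump(config.__dict__, f, indent=2, default=str)
    return snapshot_path


path = save_snapshot(Config(), "./outputs/finetune_demo")
```

Because the snapshot lives beside the checkpoint, any saved model can later be traced back to the exact settings that produced it.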
Theory
This principle follows the Configuration Object pattern commonly used in ML experiment management. The key design choices are:
- Single-class encapsulation: All parameters live as attributes of one class, organized into logical groups: data parameters, dataset splitting, training hyperparameters, experiment logging, model paths, and backtesting parameters.
- Derived attributes: Some attributes (e.g., `finetuned_tokenizer_path`, `finetuned_predictor_path`, `n_train_iter`) are computed from other base attributes in the constructor, preventing inconsistency.
- No constructor arguments: The configuration is instantiated with sensible defaults. Users modify attributes directly or subclass for different experiments.
- Dict conversion: The configuration can be converted to a plain dictionary via `__dict__` for compatibility with functions expecting dict-based configs.
This pattern is lighter-weight than YAML-based configuration frameworks (e.g., Hydra) but provides the same benefits of centralization and reproducibility for single-experiment workflows.
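Taken together, the design choices above can be sketched as a minimal class. The attribute names `finetuned_tokenizer_path`, `finetuned_predictor_path`, and `n_train_iter` come from the description above; the default values and path layout are illustrative assumptions:

```python
class ExperimentConfig:
    """Minimal sketch of single-class encapsulation (defaults are assumed)."""

    def __init__(self):
        # Base settings, with illustrative defaults
        self.save_path = "./outputs/models"
        self.dataset_size = 100_000
        self.batch_size = 64
        self.epochs = 10
        # Derived attributes: computed once in the constructor, so they
        # cannot drift out of sync with the base settings above
        self.finetuned_tokenizer_path = f"{self.save_path}/finetuned_tokenizer"
        self.finetuned_predictor_path = f"{self.save_path}/finetuned_predictor"
        self.n_train_iter = self.epochs * (self.dataset_size // self.batch_size)


config = ExperimentConfig()    # no constructor arguments
config_dict = config.__dict__  # plain dict for dict-based consumers
```

For a new experiment, a subclass can override the base attributes before the derived ones are computed, which keeps the derived values consistent; mutating a base attribute after construction would leave a stale derived value.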
Parameter Groups
The configuration is organized into the following logical sections:
- Data and Feature Parameters: Qlib data path, instrument universe, sliding window sizes, feature lists
- Dataset Splitting and Paths: Time-based train/val/test ranges with overlapping buffers to account for lookback windows
- Training Hyperparameters: Epochs, batch size, learning rates, optimizer parameters, gradient accumulation
- Experiment Logging and Saving: Comet ML integration settings, checkpoint save paths
- Model and Checkpoint Paths: Pretrained model locations, auto-derived finetuned model paths
- Backtesting Parameters: Portfolio strategy settings, inference sampling parameters
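The overlapping buffers mentioned under dataset splitting can be illustrated with assumed dates and an assumed 90-day lookback window; none of these values come from the repository:

```python
from datetime import date, timedelta

# Assumed values: a 90-day lookback window and illustrative split boundaries
lookback_window = 90  # days of history each sliding-window sample needs

train_range = (date(2015, 1, 1), date(2020, 12, 31))
val_range = (date(2021, 1, 1), date(2021, 12, 31))

# To build the first validation sample, the loader must read
# `lookback_window` days *before* the nominal validation start,
# so the loaded data range overlaps the end of the training range.
val_data_start = val_range[0] - timedelta(days=lookback_window)
val_data_range = (val_data_start, val_range[1])
```

Without this buffer, the first `lookback_window` days of the validation period would have no complete feature window and would silently be dropped.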
Domains
- Machine_Learning: Configuration management for ML training pipelines
- Experiment_Management: Reproducible experiment tracking and parameter organization
Related Principles
- Data preprocessing, training, and backtesting pipelines all consume this configuration object
- The dict-converted form is passed to DDP training functions for distributed compatibility
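A minimal sketch of why the dict-converted form suits distributed training; `train_worker` is a hypothetical function for illustration, not the repository's API:

```python
class Config:
    """Illustrative config; attribute names and values are assumptions."""

    def __init__(self):
        self.batch_size = 64
        self.learning_rate = 2e-4
        self.epochs = 10


def train_worker(rank, config_dict):
    """Hypothetical per-process entry point in DDP-style training.

    A plain dict is picklable without requiring every spawned worker to
    import the Config class definition, which is one reason the
    dict-converted form is passed to distributed training functions.
    """
    lr = config_dict["learning_rate"]
    return f"rank {rank} training with lr={lr}"


config = Config()
result = train_worker(0, config.__dict__)  # pass the dict, not the object
```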