Principle: Shiyu-coder Kronos Qlib Experiment Configuration
| Field | Value |
|---|---|
| principle_name | Qlib_Experiment_Configuration |
| repository | https://github.com/shiyu-coder/Kronos |
| domains | Machine_Learning, Experiment_Management |
| implemented_by | Implementation:Shiyu_coder_Kronos_Config_Init |
| last_updated | 2026-02-09 14:00 GMT |
Summary
All experiment hyperparameters, data paths, time ranges, and training settings are centralized into a single configuration object, making Qlib-based finetuning experiments reproducible.
Concept
The Qlib Experiment Configuration principle addresses the need for a unified, self-contained configuration management pattern for machine learning experiments. Rather than scattering hyperparameters, file paths, time ranges, and training settings across multiple scripts or argument parsers, all settings are grouped into a single Python class. This enables:
- Reproducibility: Every experiment is fully described by one configuration snapshot.
- Traceability: Configuration can be serialized (e.g., to JSON) alongside model checkpoints.
- Convenience: Derived paths (such as fine-tuned model checkpoint locations) are computed automatically from base settings.
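The traceability point above can be sketched with a small snapshot helper. The `Config` class, its attribute names, and the file layout here are illustrative assumptions, not the repository's actual code:

```python
import json
import os


class Config:
    """Illustrative configuration object; attributes and defaults are assumed."""

    def __init__(self):
        self.qlib_data_path = "~/.qlib/qlib_data/cn_data"  # assumed default
        self.epochs = 30
        self.batch_size = 64
        self.save_path = "./outputs/finetune_demo"


def save_snapshot(config, checkpoint_dir):
    """Serialize the config next to the model checkpoint for traceability."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    snapshot_path = os.path.join(checkpoint_dir, "config_snapshot.json")
    with open(snapshot_path, "w") as f:
        # __dict__ yields a plain dict of all attributes; default=str guards
        # against non-JSON-serializable values such as dates
        json.dump(config.__dict__, f, indent=2, default=str)
    return snapshot_path


path = save_snapshot(Config(), "./outputs/finetune_demo")
```

Because the snapshot lives beside the checkpoint, any saved model can later be traced back to the exact settings that produced it.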
Theory
This principle follows the Configuration Object pattern commonly used in ML experiment management. The key design choices are:
- Single-class encapsulation: All parameters live as attributes of one class, organized into logical groups: data parameters, dataset splitting, training hyperparameters, experiment logging, model paths, and backtesting parameters.
- Derived attributes: Some attributes (e.g., `finetuned_tokenizer_path`, `finetuned_predictor_path`, `n_train_iter`) are computed from other base attributes in the constructor, preventing inconsistency.
- No constructor arguments: The configuration is instantiated with sensible defaults. Users modify attributes directly or subclass for different experiments.
- Dict conversion: The configuration can be converted to a plain dictionary via `__dict__` for compatibility with functions expecting dict-based configs.
This pattern is lighter-weight than YAML-based configuration frameworks (e.g., Hydra) but provides the same benefits of centralization and reproducibility for single-experiment workflows.
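Taken together, the design choices above can be sketched as a minimal class. The attribute names `finetuned_tokenizer_path`, `finetuned_predictor_path`, and `n_train_iter` come from the description above; the default values and path layout are illustrative assumptions:

```python
class ExperimentConfig:
    """Minimal sketch of single-class encapsulation (defaults are assumed)."""

    def __init__(self):
        # Base settings, with illustrative defaults
        self.save_path = "./outputs/models"
        self.dataset_size = 100_000
        self.batch_size = 64
        self.epochs = 10
        # Derived attributes: computed once in the constructor, so they
        # cannot drift out of sync with the base settings above
        self.finetuned_tokenizer_path = f"{self.save_path}/finetuned_tokenizer"
        self.finetuned_predictor_path = f"{self.save_path}/finetuned_predictor"
        self.n_train_iter = self.epochs * (self.dataset_size // self.batch_size)


config = ExperimentConfig()    # no constructor arguments
config_dict = config.__dict__  # plain dict for dict-based consumers
```

For a new experiment, a subclass can override the base attributes before the derived ones are computed, which keeps the derived values consistent; mutating a base attribute after construction would leave a stale derived value.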
Parameter Groups
The configuration is organized into the following logical sections:
- Data and Feature Parameters: Qlib data path, instrument universe, sliding window sizes, feature lists
- Dataset Splitting and Paths: Time-based train/val/test ranges with overlapping buffers to account for lookback windows
- Training Hyperparameters: Epochs, batch size, learning rates, optimizer parameters, gradient accumulation
- Experiment Logging and Saving: Comet ML integration settings, checkpoint save paths
- Model and Checkpoint Paths: Pretrained model locations, auto-derived finetuned model paths
- Backtesting Parameters: Portfolio strategy settings, inference sampling parameters
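The overlapping buffers mentioned under dataset splitting can be illustrated with assumed dates and an assumed 90-day lookback window; none of these values come from the repository:

```python
from datetime import date, timedelta

# Assumed values: a 90-day lookback window and illustrative split boundaries
lookback_window = 90  # days of history each sliding-window sample needs

train_range = (date(2015, 1, 1), date(2020, 12, 31))
val_range = (date(2021, 1, 1), date(2021, 12, 31))

# To build the first validation sample, the loader must read
# `lookback_window` days *before* the nominal validation start,
# so the loaded data range overlaps the end of the training range.
val_data_start = val_range[0] - timedelta(days=lookback_window)
val_data_range = (val_data_start, val_range[1])
```

Without this buffer, the first `lookback_window` days of the validation period would have no complete feature window and would silently be dropped.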
Domains
- Machine_Learning: Configuration management for ML training pipelines
- Experiment_Management: Reproducible experiment tracking and parameter organization
Related Principles
- Data preprocessing, training, and backtesting pipelines all consume this configuration object
- The dict-converted form is passed to DDP training functions for distributed compatibility
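A minimal sketch of why the dict-converted form suits distributed training; `train_worker` is a hypothetical function for illustration, not the repository's API:

```python
class Config:
    """Illustrative config; attribute names and values are assumptions."""

    def __init__(self):
        self.batch_size = 64
        self.learning_rate = 2e-4
        self.epochs = 10


def train_worker(rank, config_dict):
    """Hypothetical per-process entry point in DDP-style training.

    A plain dict is picklable without requiring every spawned worker to
    import the Config class definition, which is one reason the
    dict-converted form is passed to distributed training functions.
    """
    lr = config_dict["learning_rate"]
    return f"rank {rank} training with lr={lr}"


config = Config()
result = train_worker(0, config.__dict__)  # pass the dict, not the object
```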