Workflow:Shiyu coder Kronos Qlib Finetuning
| Knowledge Sources | |
|---|---|
| Domains | Financial_Forecasting, Fine_Tuning, Distributed_Training, Quantitative_Finance |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
End-to-end pipeline for finetuning the Kronos foundation model on Chinese A-share market data using Microsoft Qlib, with distributed multi-GPU training and integrated backtesting evaluation.
Description
This workflow covers the full finetuning lifecycle for adapting pre-trained Kronos models to a specific market domain. It uses Microsoft Qlib for data sourcing and backtesting infrastructure. Training proceeds in two stages. First, the KronosTokenizer is finetuned to learn the data distribution of the target domain (VQ-VAE-style reconstruction loss plus BSQ loss). Second, the Kronos predictor model is finetuned on token sequences produced by the frozen finetuned tokenizer (cross-entropy on hierarchical s1/s2 token predictions). Both stages use DistributedDataParallel (DDP) for multi-GPU training via torchrun. After training, the finetuned models are evaluated through a TopK portfolio backtest using Qlib's backtesting framework.
Usage
Execute this workflow when you want to adapt a pre-trained Kronos model to the Chinese A-share stock market (CSI300, CSI800, or CSI1000 universes). You need access to Qlib's data infrastructure with daily frequency data, multiple GPUs for efficient distributed training, and the pre-trained Kronos model weights. This pipeline is designed for quantitative researchers who want to specialize the general Kronos model for a specific market with proper train/validation/test time splits and portfolio-level backtesting.
Execution Steps
Step 1: Configure Experiment
Edit the centralized Config class to set all paths and hyperparameters. This includes Qlib data paths, dataset output paths, model checkpoint paths, training hyperparameters (epochs, batch size, learning rates), time range splits for train/validation/test, and backtesting parameters (number of stocks to hold, benchmark index).
Key considerations:
- Set qlib_data_path to your local Qlib data directory
- Configure pretrained_tokenizer_path and pretrained_predictor_path (can be HuggingFace Hub names or local paths)
- Time ranges overlap intentionally: validation starts before training ends so that the earliest validation samples still have a full lookback window of history
- Default lookback is 90 days with 10-day prediction window for daily frequency data
- Optionally configure Comet ML logging for experiment tracking
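The considerations above can be sketched as a small dataclass. The field names qlib_data_path, pretrained_tokenizer_path, pretrained_predictor_path, lookback_window, and predict_window come from the text; every other field name, every path, and every date below is a placeholder chosen for illustration and should not be read as the repository's actual defaults.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Config:
    # Field names from the text above; the values are placeholders.
    qlib_data_path: str = "~/.qlib/qlib_data/cn_data"
    pretrained_tokenizer_path: str = "path/or/hub-name/of/tokenizer"
    pretrained_predictor_path: str = "path/or/hub-name/of/predictor"
    lookback_window: int = 90  # days of history fed to the model
    predict_window: int = 10   # days to forecast

    # Hypothetical field names and dates, for illustration only.
    train_time_range: tuple = (date(2011, 1, 1), date(2022, 12, 31))
    val_time_range: tuple = (date(2022, 9, 1), date(2024, 6, 30))
    test_time_range: tuple = (date(2024, 4, 1), date(2025, 6, 5))


def overlaps(earlier: tuple, later: tuple) -> bool:
    """True if `later` starts on or before the day `earlier` ends."""
    return later[0] <= earlier[1]


cfg = Config()
# Validation reaches back into the training range to cover the lookback window.
print(overlaps(cfg.train_time_range, cfg.val_time_range))
```

A check like `overlaps` is a cheap way to catch a split that accidentally starts validation exactly where training ends, which would leave the first validation samples without enough history.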
Step 2: Prepare Dataset
Run the Qlib data preprocessing script to load raw market data, process it symbol-by-symbol, and split into train/validation/test pickle files. The preprocessor initializes Qlib, loads OHLCV data for the configured instrument universe, computes derived features (volume and amount), filters symbols with insufficient data, and saves time-split datasets.
Key considerations:
- Requires Qlib to be installed and initialized with the correct data provider
- The script processes all symbols in the configured instrument universe (e.g., roughly 300 stocks for CSI300)
- Output is three pickle files: train_data.pkl, val_data.pkl, test_data.pkl, each a dict mapping symbol to DataFrame
- Symbols with fewer rows than lookback_window + predict_window are automatically filtered out
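The filtering and splitting logic described above can be illustrated without Qlib or pandas. In this sketch, plain lists of dicts stand in for the per-symbol DataFrames, and the symbol names, dates, and helper names are all hypothetical.

```python
from datetime import date, timedelta


def filter_short_symbols(data, lookback_window, predict_window):
    """Drop symbols with fewer rows than lookback_window + predict_window."""
    min_rows = lookback_window + predict_window
    return {sym: rows for sym, rows in data.items() if len(rows) >= min_rows}


def split_by_time(data, start, end):
    """Keep, per symbol, the rows whose date lies in the inclusive range."""
    return {
        sym: [row for row in rows if start <= row["date"] <= end]
        for sym, rows in data.items()
    }


def make_rows(first_day, n):
    """Toy data: one row per calendar day with a made-up close price."""
    return [
        {"date": first_day + timedelta(days=i), "close": 10.0 + i}
        for i in range(n)
    ]


raw = {
    "AAA": make_rows(date(2024, 1, 1), 150),  # long enough to keep
    "BBB": make_rows(date(2024, 1, 1), 60),   # shorter than 90 + 10 rows
}
kept = filter_short_symbols(raw, lookback_window=90, predict_window=10)
train = split_by_time(kept, date(2024, 1, 1), date(2024, 3, 31))
print(sorted(kept), len(train["AAA"]))
```

In the real pipeline each split would then be written to its own pickle file; that step is omitted here.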
Step 3: Finetune Tokenizer
Launch distributed training of the KronosTokenizer using torchrun. The tokenizer is loaded from pretrained weights, wrapped in DDP, and trained on the Qlib dataset. The training objective is reconstruction: the tokenizer encodes OHLCV sequences into hierarchical discrete tokens via BSQ and decodes them back, optimizing MSE reconstruction loss plus BSQ quantization loss. Validation tracks reconstruction MSE, and the best checkpoint is saved.
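To make the idea of hierarchical discrete tokens concrete, here is a simplified, stdlib-only illustration of binary spherical quantization on a single latent vector. It is not Kronos's implementation: the latent size, the number of bits assigned to s1, and the choice to treat the leading bits as s1 are assumptions, and the loss terms are left out entirely.

```python
import math


def bsq_quantize(z):
    """Project z onto the unit sphere, then snap each coordinate to +-1/sqrt(d)."""
    norm = math.sqrt(sum(v * v for v in z)) or 1e-12  # avoid division by zero
    u = [v / norm for v in z]
    scale = 1.0 / math.sqrt(len(z))
    bits = [1 if v > 0 else 0 for v in u]
    q = [scale if b else -scale for b in bits]
    return u, q, bits


def bits_to_index(bits):
    """Read a list of bits as an unsigned integer, most significant bit first."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def split_tokens(bits, s1_bits):
    """Assumption: the first s1_bits bits form s1 and the rest form s2."""
    return bits_to_index(bits[:s1_bits]), bits_to_index(bits[s1_bits:])


u, q, bits = bsq_quantize([0.5, -1.0, 2.0, -0.1])
s1, s2 = split_tokens(bits, s1_bits=2)
print(bits, s1, s2)
```

Because every quantized coordinate has magnitude 1/sqrt(d), the quantized vector `q` also lies on the unit sphere, which is the property that gives the method its name.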
Key considerations:
- Launch with torchrun for multi-GPU DDP training
- Uses AdamW optimizer with OneCycleLR scheduler
- Gradient accumulation is configurable for simulating larger effective batch sizes
- The QlibDataset performs random sampling within each epoch for efficiency on large datasets
- Best model is saved based on validation reconstruction loss
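The gradient accumulation point can be checked with a toy example. The sketch below uses a one-parameter least-squares model in plain Python rather than PyTorch; the function names and numbers are illustrative only. It shows that averaging the gradients of equally sized micro-batches reproduces the gradient of the combined batch, which is why accumulation can stand in for a larger batch.

```python
def effective_batch_size(per_gpu_batch, world_size, accumulation_steps):
    """Samples contributing to one optimizer step across all DDP ranks."""
    return per_gpu_batch * world_size * accumulation_steps


def grad_mse(w, batch):
    """d/dw of mean((w * x - y) ** 2) over the batch."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Sum per-micro-batch gradients, each scaled by 1 / number of steps."""
    steps = len(micro_batches)
    total = 0.0
    for micro_batch in micro_batches:
        total += grad_mse(w, micro_batch) / steps
    return total


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 9.0)]
full = grad_mse(0.5, data)
accumulated = accumulated_grad(0.5, [data[:2], data[2:]])
print(full, accumulated, effective_batch_size(50, 4, 2))
```

The equivalence holds exactly only when the micro-batches have equal size and the loss is a mean over samples; layers whose behavior depends on batch statistics would break it.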
Step 4: Finetune Predictor
Launch distributed training of the Kronos predictor model using the frozen finetuned tokenizer from Step 3. The tokenizer encodes training data into s1/s2 token sequences on the fly, and the predictor is trained with next-token prediction (cross-entropy loss on both the s1 and s2 token heads). The DualHead architecture predicts s1 tokens first, then s2 tokens conditioned on s1 through a DependencyAwareLayer.
Key considerations:
- The finetuned tokenizer must exist from Step 3 (loaded from the checkpoint path)
- The tokenizer is frozen (eval mode, no gradients) during predictor training
- Loss is the sum of s1 cross-entropy and s2 cross-entropy
- Uses teacher forcing: input tokens are shifted by one position relative to target tokens
- Gradients are clipped at max_norm=3.0 to stabilize training
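The teacher-forcing shift, the summed loss, and the norm clipping listed above can be sketched in plain Python. This is an illustration of the three ideas on toy lists, not the predictor's training loop; the vocabulary size of 4 and all token values are made up, and the clipping helper mirrors the usual global-norm rule rather than quoting any library's code.

```python
import math


def shift_for_teacher_forcing(tokens):
    """Inputs are tokens[:-1]; targets are the same sequence moved one step ahead."""
    return tokens[:-1], tokens[1:]


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(logit_rows, targets):
    """Average the per-position cross-entropy over the sequence."""
    losses = [cross_entropy(row, t) for row, t in zip(logit_rows, targets)]
    return sum(losses) / len(losses)


def clip_by_global_norm(grads, max_norm=3.0):
    """Rescale grads so their L2 norm does not exceed max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]


s1_tokens = [3, 1, 0, 2]
s2_tokens = [0, 2, 2, 1]
inputs, targets = shift_for_teacher_forcing(s1_tokens)
_, s2_targets = shift_for_teacher_forcing(s2_tokens)

# Uniform logits stand in for an untrained model's two output heads.
uniform = [[0.0] * 4 for _ in targets]
loss = mean_cross_entropy(uniform, targets) + mean_cross_entropy(uniform, s2_targets)
clipped = clip_by_global_norm([3.0, 4.0])
print(inputs, targets, loss, clipped)
```

With uniform logits over four classes each head contributes ln 4, so the summed loss starts near 2 ln 4 and should fall as the model learns.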
Step 5: Run Backtest
Load the finetuned tokenizer and predictor, run inference on the test dataset to generate prediction signals, and evaluate them through Qlib's TopkDropoutStrategy backtesting framework. Inference produces several signal types (mean, last, max, and min predicted close price change), each of which is backtested independently. Results include cumulative return curves compared against the benchmark index.
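The four signal types can be illustrated on a single stock. The text does not say whether "change" is a difference or a ratio, or whether prices are raw or normalized, so this sketch assumes a plain difference from the last observed close; the function name and the numbers are hypothetical.

```python
def extract_signals(last_close, predicted_closes):
    """Summarize a predicted close path as changes relative to the last observed close."""
    changes = [p - last_close for p in predicted_closes]
    return {
        "last": changes[-1],                 # change on the final predicted day
        "mean": sum(changes) / len(changes), # average change over the window
        "max": max(changes),                 # most optimistic day
        "min": min(changes),                 # most pessimistic day
    }


signals = extract_signals(10.0, [10.5, 9.5, 11.0, 10.0])
print(signals)
```

Each key would yield one score per stock per day, and ranking stocks by that score is what a TopK-style strategy consumes.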
Key considerations:
- Inference uses auto_regressive_inference with configurable temperature, top_p, top_k, and sample_count
- The QlibTestDataset iterates all sliding windows in the test period sequentially
- Four signals are extracted from each prediction: last-day close change, mean close change, max close change, and min close change
- Backtesting uses TopkDropoutStrategy with configurable holding parameters
- Performance analysis includes excess return with and without transaction costs
- Results are saved as pickle files and plotted as cumulative return charts
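For readers unfamiliar with the sampling parameters mentioned above, the following is a generic sketch of temperature, top-k, and top-p (nucleus) filtering over one step's logits. It is not the code of auto_regressive_inference; the function names are hypothetical, and treating sample_count as a number of independent draws is an assumption.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return a renormalized {token_id: prob} after temperature, top-k, and top-p."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens by probability; top_k == 0 means "no top-k limit".
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose renormalized mass reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
rng = random.Random(0)
sample_count = 5  # assumed meaning: number of independent draws
draws = [sample_token(logits, rng, top_k=2) for _ in range(sample_count)]
print(filter_probs(logits, top_p=0.7), draws)
```

Lower temperature and tighter top-k or top-p make predictions more deterministic, while drawing several samples and averaging them trades compute for smoother signals.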