Principle: Shiyu-coder Kronos Qlib Test Inference
| Field | Value |
|---|---|
| principle_name | Qlib_Test_Inference |
| repository | https://github.com/shiyu-coder/Kronos |
| domains | Inference, Batch_Processing, Financial_Forecasting |
| implemented_by | Implementation:Shiyu_coder_Kronos_Generate_Predictions_Qlib |
| last_updated | 2026-02-09 14:00 GMT |
Summary
Batch inference over an entire test dataset to generate trading signal predictions for backtesting evaluation.
Concept
The Qlib Test Inference principle describes the end-to-end process of converting fine-tuned models into actionable trading signals. This is the bridge between model training and backtesting evaluation, where the model produces quantitative predictions for every symbol on every day in the test period.
Theory
The inference pipeline follows a structured sequence:
Model Loading
Both the fine-tuned tokenizer and predictor are loaded and set to eval mode. The tokenizer converts continuous inputs to discrete tokens, and the predictor generates future token predictions autoregressively.
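This loading step can be sketched with standard PyTorch; the Linear layers below are stand-ins for the real Kronos tokenizer and predictor, and the helper name is illustrative:

```python
import torch.nn as nn

def prepare_for_inference(*models, device="cpu"):
    """Move models to the device, switch to eval mode, and freeze parameters."""
    prepared = []
    for m in models:
        m = m.to(device).eval()          # eval: deterministic dropout/norm
        for p in m.parameters():
            p.requires_grad_(False)      # no gradients needed at test time
        prepared.append(m)
    return prepared

# Stand-ins for the fine-tuned tokenizer and predictor (illustrative only).
tokenizer, predictor = prepare_for_inference(nn.Linear(5, 8), nn.Linear(8, 8))
```

Freezing parameters and switching to eval mode together ensure that inference is deterministic with respect to the model (any remaining randomness comes from the sampling step, not the network).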
Sequential Sliding Window Dataset
Unlike training (which uses random sampling), the test dataset (QlibTestDataset) iterates sequentially through all valid sliding windows. For each window, it yields:
- Context features (lookback window)
- Context time stamps
- Future time stamps (for the prediction horizon)
- Symbol name and timestamp metadata (for mapping predictions back to the calendar)
The metadata is critical because predictions must be associated with specific (datetime, symbol) pairs for portfolio construction.
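The sequential iteration can be sketched as a plain generator. QlibTestDataset's actual field names and window bookkeeping differ; everything here is illustrative:

```python
import pandas as pd

def sliding_windows(frames, lookback, horizon):
    """Yield every valid window, in order, with metadata for mapping back.

    frames: dict mapping symbol -> DataFrame indexed by datetime with
    feature columns (an assumed layout, not the repository's exact one).
    """
    for symbol, df in frames.items():
        n = len(df)
        # Sequential (not randomly sampled) iteration over valid windows.
        for start in range(0, n - lookback - horizon + 1):
            ctx = df.iloc[start:start + lookback]
            fut_times = df.index[start + lookback:start + lookback + horizon]
            yield {
                "features": ctx.to_numpy(),   # context features
                "ctx_times": ctx.index,       # context time stamps
                "fut_times": fut_times,       # prediction-horizon stamps
                "symbol": symbol,             # metadata for (datetime, symbol)
                "timestamp": fut_times[0],    # first predicted date
            }
```

Carrying the symbol and timestamps alongside each window is what later allows predictions to be pivoted into a (datetime x symbol) score matrix.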
Autoregressive Inference
For each batch, the pipeline uses auto_regressive_inference() to generate predictions:
- The tokenizer encodes the context window into discrete tokens
- The predictor generates future tokens autoregressively, one step at a time
- Multiple samples are drawn (controlled by inference_sample_count) and averaged to reduce variance
- Temperature (inference_T) and nucleus sampling (inference_top_p) control generation diversity
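The sampling loop above can be sketched as follows. This is a hedged illustration, not the repository's auto_regressive_inference: the encode/step callables stand in for the tokenizer and predictor, and the real function works on batched tensors rather than token lists:

```python
import numpy as np

def auto_regressive_inference_sketch(encode, step, context, horizon,
                                     sample_count=3, T=1.0, top_p=0.9,
                                     rng=None):
    """Draw sample_count token paths autoregressively and average them."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(sample_count):                 # multiple samples per window
        tokens = list(encode(context))            # tokenizer: context -> tokens
        for _ in range(horizon):                  # one future token per step
            logits = np.asarray(step(tokens), dtype=float) / T  # temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            # Nucleus (top-p): smallest set of top tokens whose mass >= top_p.
            order = np.argsort(probs)[::-1]
            cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
            keep = order[:cutoff]
            p = np.zeros_like(probs)
            p[keep] = probs[keep]
            p /= p.sum()
            tokens.append(int(rng.choice(len(p), p=p)))
        samples.append(tokens[-horizon:])
    return np.mean(samples, axis=0)               # average to reduce variance
```

Lower T sharpens the distribution toward the model's top choices; lower top_p truncates the tail more aggressively; both reduce diversity across the drawn samples.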
Signal Extraction
From the raw multi-feature predictions, trading signals are extracted by computing the close price delta (predicted close minus last observed close). Four signal variants are computed:
- mean: Average predicted close across the prediction horizon
- last: Predicted close on the final prediction day
- max: Maximum predicted close across the horizon
- min: Minimum predicted close across the horizon
Each variant captures a different aspect of the price trajectory and may perform differently in backtesting.
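The four variants reduce to a few array reductions over the predicted close path. A minimal helper, illustrative rather than the repository's code:

```python
import numpy as np

def extract_signals(pred_close, last_close):
    """Compute the four close-delta signal variants for one window.

    pred_close: 1-D sequence of predicted closes over the horizon.
    last_close: last observed close in the context window.
    """
    pred_close = np.asarray(pred_close, dtype=float)
    return {
        "mean": pred_close.mean() - last_close,  # average predicted close
        "last": pred_close[-1] - last_close,     # final prediction day
        "max": pred_close.max() - last_close,    # best case over horizon
        "min": pred_close.min() - last_close,    # worst case over horizon
    }
```

For a path that rises then dips, mean and max will disagree with last and min in sign, which is exactly why the variants can rank stocks differently in backtests.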
Pivot to DataFrames
The per-sample (timestamp, symbol, score) records are pivoted into DataFrames with:
- Index: datetime
- Columns: symbol names
- Values: prediction scores
This format is directly consumable by the backtesting framework.
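In pandas this pivot is a single call; the records below are made-up example values:

```python
import pandas as pd

records = [
    ("2024-01-02", "AAPL", 0.8),
    ("2024-01-02", "MSFT", -0.3),
    ("2024-01-03", "AAPL", 0.1),
    ("2024-01-03", "MSFT", 0.5),
]
df = pd.DataFrame(records, columns=["datetime", "symbol", "score"])

# Rows become dates, columns become symbols, cells hold prediction scores.
signal = df.pivot(index="datetime", columns="symbol", values="score")
```

Dates on which a symbol has no prediction simply become NaN cells, which most backtesting frameworks treat as "no position".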
Key Design Decisions
- Custom collate function: Required because each batch contains a mix of tensors (features), strings (symbols), and Timestamp objects that cannot be handled by PyTorch's default collation
- Batch size adjustment: The DataLoader batch size is divided by sample_count because auto_regressive_inference internally expands each sample by the sample count
- Instance normalization: Applied identically to training to ensure consistency
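A sketch of such a collate function, assuming each dataset item is a dict mixing tensors with string and Timestamp metadata (field names are illustrative):

```python
import torch

def mixed_collate(batch):
    """Stack tensor fields; pass non-tensor metadata through as plain lists.

    PyTorch's default collation raises on types it does not recognize
    (such as pandas Timestamps), so symbols and timestamps are kept as
    Python lists rather than converted to tensors.
    """
    out = {}
    for key in batch[0]:
        values = [item[key] for item in batch]
        if isinstance(values[0], torch.Tensor):
            out[key] = torch.stack(values)   # (batch, ...) tensor
        else:
            out[key] = values                # symbols, Timestamps, etc.
    return out
```

Passed as collate_fn to the DataLoader, this keeps the tensor inputs batched for the model while preserving the per-sample metadata needed to map predictions back to (datetime, symbol) pairs.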
Domains
- Inference: Batch model prediction generation
- Batch_Processing: Efficient DataLoader-based processing of large test sets
- Financial_Forecasting: Time series price prediction for trading signals