Principle:Shiyu_coder_Kronos_Batch_Forecasting
| Field | Value |
|---|---|
| principle_name | Batch_Forecasting |
| repo | Shiyu_coder_Kronos |
| domains | Time_Series, Financial_Forecasting, Batch_Processing |
| last_updated | 2026-02-09 14:00 GMT |
| implemented_by | Implementation:Shiyu_coder_Kronos_KronosPredictor_Predict_Batch |
Summary
GPU-parallel generation of candlestick forecasts for many time series at once, requiring that all series share the same input and prediction lengths.
Concept
Batch forecasting extends the single-series prediction pipeline to process multiple financial time series in parallel within a single GPU kernel execution. Instead of iterating over series one at a time, all series are stacked into a single batch tensor and processed simultaneously through the tokenization, generation, and decoding stages.
This approach is critical for production use cases where forecasts are needed for many instruments (e.g., hundreds of stocks) within tight latency constraints. GPU parallelism provides significant speedup compared to sequential per-series prediction.
Theory
The batch forecasting pipeline follows the same five stages as single-series forecasting, but operates on stacked tensors:
Input: N series, each with their own DataFrame + timestamps
|
v
Per-series normalization (each series uses its own mean/std)
|
v
Stack into batch tensor: (N, seq_len, features)
|
v
Batch tokenization: tokenizer.encode() on full batch
|
v
Batch autoregressive generation: auto_regressive_inference()
| (all N series generate tokens in parallel)
v
Batch decoding: tokenizer.decode() on full batch
|
v
Per-series denormalization (restore each series' original scale)
|
v
Output: N DataFrames, one per series
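The stages above can be sketched in NumPy. This is a minimal illustration, not the Kronos implementation: `batch_forecast_sketch` and `naive_generate` are hypothetical names, and `generate_fn` stands in for the tokenize, autoregressive-generate, and decode stages that run on the batch tensor.

```python
import numpy as np

def batch_forecast_sketch(series_list, pred_len, generate_fn):
    """Sketch of the batch pipeline: per-series normalization, stacking
    into a rectangular batch, batched generation, and per-series
    denormalization. `generate_fn` is a placeholder for the
    tokenize -> generate -> decode stages."""
    # Per-series normalization: each series keeps its own mean/std.
    means = np.array([s.mean() for s in series_list])
    stds = np.array([s.std() + 1e-8 for s in series_list])
    normed = [(s - m) / sd for s, m, sd in zip(series_list, means, stds)]

    # Stack into a rectangular batch tensor: (N, seq_len).
    batch = np.stack(normed)

    # Batched generation: all N series are processed in parallel.
    preds = generate_fn(batch, pred_len)  # shape (N, pred_len)

    # Per-series denormalization: restore each series' original scale.
    return [p * sd + m for p, m, sd in zip(preds, means, stds)]

# Toy stand-in generator: repeats each series' last normalized value.
def naive_generate(batch, pred_len):
    return np.repeat(batch[:, -1:], pred_len, axis=1)

series = [np.array([10.0, 10.5, 11.0]),            # a ~$10 stock
          np.array([10000.0, 10100.0, 10200.0])]   # a ~$10,000 stock
forecasts = batch_forecast_sketch(series, pred_len=2,
                                  generate_fn=naive_generate)
```

Note that the stacking step is only valid because both series have the same length, which is exactly the uniform-dimensions constraint described below.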
Key Constraint: Uniform Dimensions
For batch processing to work, all series must have:
- Same historical length (number of input timesteps).
- Same prediction length (pred_len).
This is a fundamental requirement because the batch tensor must be rectangular: every series must occupy the same number of timesteps so the batch has shape (N, seq_len, features). Series of different lengths cannot be stacked into a single tensor without padding, and padding is not supported by this implementation.
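A pre-flight check makes the constraint concrete. `validate_batch` below is a hypothetical helper, not part of the Kronos API; it rejects any batch that cannot be stacked without padding.

```python
import numpy as np

def validate_batch(histories, pred_lens):
    """Hypothetical pre-flight check: all series must share one
    historical length and one pred_len, otherwise they cannot be
    stacked into a rectangular batch tensor."""
    lengths = {len(h) for h in histories}
    if len(lengths) != 1:
        raise ValueError(
            f"All series must share one historical length, got {sorted(lengths)}")
    if len(set(pred_lens)) != 1:
        raise ValueError("All series must share one pred_len")
    # Safe by construction: every row has the same number of timesteps.
    return np.stack(histories)
```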
Per-Series Normalization
Even though the series are batched together for GPU execution, normalization remains per-series:
- Each series is independently normalized using its own mean and standard deviation.
- After generation, each series is independently denormalized back to its own price scale.
This ensures that a $10 stock and a $10,000 stock in the same batch receive appropriate treatment despite vastly different absolute values.
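The scale-invariance point can be seen directly: per-series z-scoring maps series with wildly different absolute prices onto the same numeric range. The arrays below are illustrative toy data.

```python
import numpy as np

def zscore(x):
    # Normalize a series using its own mean and standard deviation.
    return (x - x.mean()) / x.std()

cheap = np.array([9.8, 10.0, 10.2])            # a ~$10 stock
pricey = np.array([9800.0, 10000.0, 10200.0])  # a ~$10,000 stock

z_cheap, z_pricey = zscore(cheap), zscore(pricey)
# Both series have the same relative shape, so their z-scores coincide
# even though their absolute price levels differ by three orders of
# magnitude.
```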
Memory Considerations
Batch GPU memory usage scales as:
memory ~ N_series * sample_count * max_context * d_model
Where N_series is the batch size, sample_count is the number of parallel samples per series, and max_context * d_model is the per-sequence memory footprint.
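A back-of-envelope estimator following this scaling relation might look as follows. The function name and the `bytes_per_elem=2` default (fp16 activations) are assumptions for illustration; constant factors such as layer count and KV caches are ignored, so the result is a lower bound rather than a Kronos-specific formula.

```python
def estimate_activation_bytes(n_series, sample_count, max_context,
                              d_model, bytes_per_elem=2):
    """Rough activation-memory estimate following
    memory ~ N_series * sample_count * max_context * d_model.
    bytes_per_elem=2 assumes fp16; real usage is larger by a
    model-dependent constant factor."""
    return n_series * sample_count * max_context * d_model * bytes_per_elem

# e.g. 100 series, 1 sample each, 512-token context, d_model=256:
budget = estimate_activation_bytes(100, 1, 512, 256)
```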
Source
- Repository: Kronos on GitHub
Domains
- Time_Series: Parallel sequential data forecasting.
- Financial_Forecasting: Multi-instrument candlestick prediction.
- Batch_Processing: GPU-efficient parallel execution of the inference pipeline.
Related Principles
- Principle:Shiyu_coder_Kronos_Single_Series_Forecasting - The single-series version of this pipeline.
- Principle:Shiyu_coder_Kronos_Autoregressive_Token_Generation - The core generation loop used inside batch forecasting.
- Principle:Shiyu_coder_Kronos_Predictor_Initialization - Setting up the predictor that runs this pipeline.