Workflow: shiyu-coder/Kronos Batch Prediction
| Knowledge Sources | |
|---|---|
| Domains | Financial_Forecasting, Time_Series, LLMs |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
End-to-end process for generating candlestick forecasts on multiple financial time series simultaneously using GPU-parallel batch inference with the Kronos foundation model.
Description
This workflow extends the single-series prediction pipeline to handle multiple time series in a single batched forward pass. It loads the pre-trained KronosTokenizer and Kronos model, wraps them in a KronosPredictor, prepares lists of DataFrames and timestamp Series for multiple instruments, and calls the predict_batch method. The batch method stacks all series into a single tensor, runs autoregressive generation in parallel across the batch dimension, and returns per-series denormalized prediction DataFrames. Each series is independently normalized and denormalized, but tokenization and generation share GPU compute.
Usage
Execute this workflow when you need to forecast multiple financial instruments (or multiple time windows of the same instrument) simultaneously and want to leverage GPU parallelism. All series must have the same historical length (lookback) and the same prediction length (pred_len). This is suitable for portfolio-level forecasting, screening multiple assets, or generating predictions across multiple rolling windows.
Execution Steps
Step 1: Load Tokenizer and Model
Load a pre-trained KronosTokenizer and Kronos model from the HuggingFace Hub. This step is identical to the single-series workflow. The tokenizer and model are loaded once and reused across all series in the batch.
Key considerations:
- For batch prediction, a GPU device is strongly recommended for performance
- The tokenizer and model must be a matched pair
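A minimal loading sketch, assuming the Kronos repository's `model` package is importable and that the checkpoint names below match the published Hub repositories (treat the exact repo IDs as assumptions if your setup differs):

```python
# Sketch: load a matched tokenizer/model pair once; both are reused for every
# series in the batch. Assumes the Kronos repo's `model` package is on the path.
from model import Kronos, KronosTokenizer

tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-base")
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
```

Loading happens once up front; the per-series cost in later steps is limited to normalization and tensor stacking.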
Step 2: Instantiate Predictor
Create a KronosPredictor with the loaded model, tokenizer, and target device. For batch prediction, explicitly specifying a CUDA device is recommended for efficient parallelism.
Key considerations:
- Set max_context to match the model variant (512 for small/base)
- GPU memory requirements scale linearly with batch size and sample_count
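A sketch of predictor construction, assuming the tokenizer and model from Step 1 are in scope; argument names follow the Kronos repo's single-series examples and should be checked against your version:

```python
# Sketch: wrap the loaded pair in a KronosPredictor pinned to one GPU.
from model import KronosPredictor

predictor = KronosPredictor(
    model,            # Kronos model loaded in Step 1
    tokenizer,        # matched KronosTokenizer loaded in Step 1
    device="cuda:0",  # a CUDA device is recommended for batch parallelism
    max_context=512,  # matches the small/base model variants
)
```

Keeping the whole batch on a single device avoids cross-device transfers during autoregressive generation; remember that memory grows with batch size times sample_count.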
Step 3: Prepare Batch Inputs
Construct three parallel lists: a list of DataFrames (one per series), a list of historical timestamp Series, and a list of future timestamp Series. Each DataFrame must contain at least open, high, low, close columns. All DataFrames must have identical row counts (same lookback), and all future timestamp Series must have length equal to pred_len.
Key considerations:
- All series must share the same historical length and prediction length
- Each series is independently normalized (instance-level mean/std)
- Volume and amount columns are optional; missing values are zero-filled per series
- The method validates input consistency and raises clear errors on mismatches
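The three parallel lists can be illustrated with synthetic data; the symbols, lengths, and column values below are invented for the example, but the layout (one DataFrame plus two timestamp Series per instrument, all with matching lengths) is what the batch method expects:

```python
# Build parallel input lists for three synthetic instruments with a shared
# lookback and pred_len, then apply the same consistency checks the method
# performs.
import numpy as np
import pandas as pd

lookback, pred_len = 400, 120
symbols = ["AAA", "BBB", "CCC"]  # hypothetical instruments

df_list, x_ts_list, y_ts_list = [], [], []
rng = np.random.default_rng(0)
for sym in symbols:
    ts = pd.date_range("2025-01-01", periods=lookback + pred_len, freq="5min")
    close = 100 + rng.standard_normal(lookback).cumsum()
    df = pd.DataFrame({
        "open": close + rng.normal(0, 0.1, lookback),
        "high": close + 0.5,
        "low": close - 0.5,
        "close": close,
        "volume": rng.integers(1_000, 10_000, lookback).astype(float),
        # "amount" omitted: optional columns are zero-filled per series
    })
    df_list.append(df)
    x_ts_list.append(pd.Series(ts[:lookback]))   # historical timestamps
    y_ts_list.append(pd.Series(ts[lookback:]))   # future timestamps

# Consistency checks mirroring the validation described above.
assert len({len(df) for df in df_list}) == 1       # identical lookback
assert all(len(y) == pred_len for y in y_ts_list)  # identical pred_len
```

Because each series is normalized with its own mean/std, instruments on very different price scales can share one batch without interfering with each other.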
Step 4: Generate Batch Forecast
Call predict_batch with the lists of inputs and sampling parameters. Internally, the method stacks all normalized series into a single batch tensor, runs autoregressive generation for the entire batch in parallel, and splits the results back into per-series predictions. Each series is denormalized with its own statistics.
Key considerations:
- GPU memory usage is proportional to batch_size multiplied by sample_count
- The method returns a list of DataFrames in the same order as input
- Temperature, top_p, and sample_count parameters apply uniformly to all series
- Progress bar (verbose=True) tracks autoregressive token generation across the batch
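The call itself can be sketched as follows, assuming the lists from Step 3 and the predictor from Step 2 are in scope; the keyword names (`T`, `top_p`, `sample_count`, `verbose`) mirror the single-series predict API and may differ slightly in your version of the repository:

```python
# Hedged sketch of the batched forecast call: one stacked forward pass over
# all series, returning per-series denormalized DataFrames in input order.
pred_df_list = predictor.predict_batch(
    df_list=df_list,
    x_timestamp_list=x_ts_list,
    y_timestamp_list=y_ts_list,
    pred_len=pred_len,
    T=1.0,           # sampling temperature, applied uniformly to all series
    top_p=0.9,       # nucleus sampling threshold
    sample_count=1,  # >1 draws multiple paths; memory scales with this
    verbose=True,    # progress bar over autoregressive generation steps
)
# pred_df_list[i] is the forecast for df_list[i], denormalized with that
# series' own statistics.
```

If GPU memory is tight, reduce `sample_count` first, then split the instrument list into smaller batches.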
Step 5: Process Results
Iterate through the returned list of prediction DataFrames to analyze, save, or visualize results for each individual series. Each DataFrame contains open, high, low, close, volume, amount columns indexed by the corresponding y_timestamp.
Key considerations:
- Results maintain the same ordering as the input lists
- Per-series post-processing (e.g., price limit clamping) can be applied independently
- For large batches, consider processing results in chunks to manage memory
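A post-processing sketch using synthetic stand-ins for the returned list (the symbol names and summary fields are illustrative, not part of the Kronos API); the key point is that output order matches input order, so zipping with the original symbol list is safe:

```python
# Iterate per-series prediction DataFrames (OHLCV indexed by y_timestamp),
# summarizing each independently; synthetic frames stand in for real output.
import numpy as np
import pandas as pd

pred_len = 120
symbols = ["AAA", "BBB"]  # hypothetical instruments, in input order
y_index = pd.date_range("2025-02-01", periods=pred_len, freq="5min")
rng = np.random.default_rng(1)
pred_df_list = [
    pd.DataFrame(
        {c: 100 + rng.standard_normal(pred_len).cumsum()
         for c in ["open", "high", "low", "close"]}
        | {"volume": rng.integers(1, 100, pred_len).astype(float)},
        index=y_index,
    )
    for _ in symbols
]

# Results keep input ordering, so zip with the symbol list and post-process
# each series independently (clamping, saving, plotting, ...).
summary = {}
for sym, pred_df in zip(symbols, pred_df_list):
    summary[sym] = {
        "last_close": float(pred_df["close"].iloc[-1]),
        "mean_volume": float(pred_df["volume"].mean()),
    }
    # pred_df.to_csv(f"{sym}_forecast.csv")  # or persist per series
```

For very large batches, the same loop can consume results in chunks (e.g. write each DataFrame to disk and drop the reference) rather than holding the whole list in memory.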