Environment:Shiyu coder Kronos Qlib Data Environment


Knowledge Sources
Domains: Financial_Data, Infrastructure
Last Updated: 2026-02-09 13:47 GMT

Overview

Microsoft Qlib environment with Chinese A-share market data (CSI300/CSI800/CSI1000) for the Kronos finetuning and backtesting pipeline.

Description

This environment provides the financial data infrastructure required by the Qlib finetuning workflow. It depends on the Microsoft Qlib library initialized with the Chinese market data provider (`REG_CN`). The data must be pre-downloaded to a local directory (default `~/.qlib/qlib_data/cn_data`). The environment supports instruments including CSI300, CSI800, and CSI1000, and provides OHLCV data through the `QlibDataLoader` interface. Processed datasets are serialized as pickle files for training.
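The initialization and loading steps described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `to_qlib_fields` and `load_ohlcv` are hypothetical helper names, and the sketch assumes that `QlibDataLoader` (imported from `qlib.data.dataset.loader`) accepts a plain list of `$`-prefixed field expressions and a named instrument pool. Qlib is imported lazily so the module can be imported even where Qlib is not installed.

```python
from typing import List, Sequence

# Base column names; the "$"-prefixed forms are Qlib expressions for raw fields.
OHLCV_COLUMNS = ("open", "high", "low", "close", "volume")


def to_qlib_fields(columns: Sequence[str]) -> List[str]:
    """Prefix plain column names with '$' to form Qlib field expressions."""
    return ["$" + name for name in columns]


def load_ohlcv(provider_uri: str, instrument: str, start: str, end: str):
    """Load OHLCV data for an instrument pool (hypothetical helper).

    Assumes the Qlib CN data has already been downloaded to provider_uri.
    """
    # Imported lazily so this module can be imported without Qlib installed.
    import qlib
    from qlib.config import REG_CN
    from qlib.data.dataset.loader import QlibDataLoader

    qlib.init(provider_uri=provider_uri, region=REG_CN)
    loader = QlibDataLoader(config=to_qlib_fields(OHLCV_COLUMNS))
    return loader.load(instrument, start, end)
```

With the data downloaded, a call such as `load_ohlcv("~/.qlib/qlib_data/cn_data", "csi300", "2020-01-01", "2020-12-31")` would be expected to return a pandas DataFrame of the requested fields; the dates here are placeholders.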

Usage

Use this environment for the Qlib Finetuning Pipeline workflow: data preprocessing, dataset creation, tokenizer finetuning, predictor finetuning, inference, and backtesting. It is not required for the basic prediction or CSV finetuning workflows.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux or macOS | Qlib supports these platforms |
| Disk | 10GB+ for Qlib CN data | Data stored at `~/.qlib/qlib_data/cn_data` by default |
| Network | Internet access for initial data download | Subsequent runs are offline |

Dependencies

Python Packages

  • `pyqlib` (Microsoft Qlib library; imported as `qlib`)
  • `pickle` (standard library, for dataset serialization)
  • All packages from the base PyTorch CUDA environment

Credentials

No API keys required. Qlib CN data is freely downloadable:

# Download Chinese market data
python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

The following configuration paths must be set in `finetune/config.py`:

  • `qlib_data_path`: Path to Qlib data directory (default: `~/.qlib/qlib_data/cn_data`)
  • `dataset_path`: Path for processed pickle datasets (default: `./data/processed_datasets`)
  • `pretrained_tokenizer_path`: HuggingFace model ID or local path to pretrained tokenizer
  • `pretrained_predictor_path`: HuggingFace model ID or local path to pretrained predictor

Quick Install

# Install Qlib (the PyPI package is named pyqlib; it is imported as qlib)
pip install pyqlib

# Download Chinese A-share market data
python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

Code Evidence

Qlib initialization from `finetune/qlib_data_preprocess.py:25-28`:

def initialize_qlib(self):
    """Initializes the Qlib environment."""
    print("Initializing Qlib...")
    qlib.init(provider_uri=self.config.qlib_data_path, region=REG_CN)

Configuration paths requiring user update from `finetune/config.py:12-13`:

# TODO: Update this path to your Qlib data directory.
self.qlib_data_path = "~/.qlib/qlib_data/cn_data"

Dataset pickle file paths from `finetune/dataset.py:41-42`:

self.data_path = f"{self.config.dataset_path}/train_data.pkl"
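The pickle round trip behind this path can be sketched as below. The helpers `save_split` and `load_split` are hypothetical, and the `{split}_data.pkl` naming simply generalizes the `train_data.pkl` pattern shown above; the actual structure of the stored object is not specified here, so the sketch treats it as opaque.

```python
import pickle
from pathlib import Path
from typing import Any


def save_split(dataset_path: str, split: str, data: Any) -> Path:
    """Serialize one dataset split to <dataset_path>/<split>_data.pkl."""
    out_dir = Path(dataset_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{split}_data.pkl"
    with target.open("wb") as fh:
        pickle.dump(data, fh)
    return target


def load_split(dataset_path: str, split: str) -> Any:
    """Load one dataset split, failing early if preprocessing has not run."""
    target = Path(dataset_path) / f"{split}_data.pkl"
    if not target.is_file():
        raise FileNotFoundError(f"{target} not found; run preprocessing first")
    with target.open("rb") as fh:
        return pickle.load(fh)
```

Because unpickling can execute arbitrary code, only load pickle files that were produced locally or come from a trusted source.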

Instrument and benchmark mapping from `finetune/config.py:122-131`:

def _set_benchmark(self, instrument):
    dt_benchmark = {
        'csi800': "SH000906",
        'csi1000': "SH000852",
        'csi300': "SH000300",
    }
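The excerpt above stops after the mapping. A standalone sketch of the lookup, with the failure mode listed under Common Errors, might look like the following; `resolve_benchmark` is a hypothetical function name and the exact error text in the repository may differ.

```python
# Instrument pool -> benchmark index code, copied from the mapping above.
BENCHMARKS = {
    "csi800": "SH000906",
    "csi1000": "SH000852",
    "csi300": "SH000300",
}


def resolve_benchmark(instrument: str) -> str:
    """Return the benchmark code for an instrument pool, or raise ValueError."""
    try:
        return BENCHMARKS[instrument]
    except KeyError:
        raise ValueError(
            f"Benchmark not defined for instrument: {instrument}"
        ) from None
```

Note that the keys are lowercase, so `resolve_benchmark("CSI300")` would raise in this sketch; normalizing with `instrument.lower()` before the lookup is one option if mixed-case input is expected.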

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'qlib'` | Qlib not installed | `pip install pyqlib` |
| `FileNotFoundError: ... cn_data` | Qlib data not downloaded | Run `python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn` |
| `FileNotFoundError: ... train_data.pkl` | Data not preprocessed | Run `qlib_data_preprocess.py` first to generate pickle files |
| `ValueError: Benchmark not defined for instrument` | Invalid instrument name | Use one of: `csi300`, `csi800`, `csi1000` |

Compatibility Notes

  • Region: Currently hardcoded to Chinese market (`REG_CN`). To use other markets, modify the `region` parameter in Qlib initialization.
  • Instruments: CSI300, CSI800, and CSI1000 are supported. Each maps to a specific benchmark index.
  • Data freshness: Qlib data must be re-downloaded periodically for up-to-date market data.
