Implementation:HKUDS AI Trader Merge JSONL Pattern

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, ETL
Last Updated 2026-02-09 14:00 GMT

Overview

Script-level data transformation pattern for merging per-symbol JSON price files into consolidated JSONL format with field renaming and anti-look-ahead masking.

Description

The merge_jsonl.py script (and its market-specific variants) runs as a standalone script rather than exposing a callable function. It globs for daily_prices_*.json files, filters by the target symbol universe, renames OHLCV fields to trading semantics, applies anti-look-ahead masking on the latest date, and writes one JSON line per symbol to merged.jsonl.

Market-specific variants exist for US stocks, A-shares (both Alpha Vantage and Tushare sources), A-share hourly data, and cryptocurrency data. All follow the same pattern with minor schema differences.

Usage

Run this script directly via Python after the price data fetching step completes. It is typically invoked from shell scripts (e.g., main_step2.sh) as part of the automated pipeline.

Code Reference

Source Location

  • Repository: AI-Trader
  • File: data/merge_jsonl.py (US stocks)
  • Lines: L109-156
  • Variants:
    • data/crypto/merge_crypto_jsonl.py (Crypto)
    • data/A_stock/merge_jsonl_alphavantage.py (A-share AlphaVantage)
    • data/A_stock/merge_jsonl_hourly.py (A-share hourly)
    • data/A_stock/merge_jsonl_tushare.py (A-share Tushare)

Signature

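# Script-level execution (no function wrapper).
# extract_symbol and target_universe are placeholders for the script's
# filename parsing and its hardcoded symbol list (all_nasdaq_100_symbols).
import glob
import json
import os

# Key variables:
current_dir = os.path.dirname(__file__)
pattern = os.path.join(current_dir, "daily_prices_*.json")
files = sorted(glob.glob(pattern))
output_file = os.path.join(current_dir, "merged.jsonl")

# Core logic:
with open(output_file, "w", encoding="utf-8") as fout:
    for fp in files:
        symbol = extract_symbol(fp)
        if symbol not in target_universe:
            continue
        # Use a context manager so each input file is closed promptly
        with open(fp, "r", encoding="utf-8") as fin:
            data = json.load(fin)
        # Rename fields and apply anti-look-ahead masking here
        fout.write(json.dumps(data) + "\n")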
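
The listing above leaves extract_symbol and target_universe undefined. The following is a minimal sketch of how the filtering step could work, assuming the symbol is the part of the filename between the daily_prices_ prefix and the .json extension. The helper name, the constants, and the three-symbol set are hypothetical stand-ins, not the script's actual code.

```python
import os
from typing import Optional

# Assumed naming convention: daily_prices_<SYMBOL>.json
PREFIX = "daily_prices_"
SUFFIX = ".json"


def extract_symbol(file_path: str) -> Optional[str]:
    """Return the symbol embedded in the filename, or None if it does not match."""
    name = os.path.basename(file_path)
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None
    symbol = name[len(PREFIX):-len(SUFFIX)]
    return symbol or None


# Hypothetical stand-in for the hardcoded symbol list in the script.
target_universe = {"AAPL", "MSFT", "NVDA"}

paths = [
    "data/daily_prices_AAPL.json",
    "data/daily_prices_XYZ.json",
    "data/notes.txt",
]
# Keep only files whose symbol belongs to the target universe.
selected = [p for p in paths if extract_symbol(p) in target_universe]
print(selected)
```

Storing the universe as a set keeps each membership check constant-time, which matters little for about a hundred symbols but costs nothing.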

Import

# There is no importable function; run the file as a script.
# From the repository root:
# python data/merge_jsonl.py

# Or from inside the data directory:
# cd data && python merge_jsonl.py

I/O Contract

Inputs

Name | Type | Required | Description
daily_prices_*.json | Files | Yes | Individual per-symbol JSON files in the data directory
all_nasdaq_100_symbols | List[str] | Yes | Symbol filter list (hardcoded in script)

Outputs

Name | Type | Description
merged.jsonl | File | One JSON line per symbol with renamed fields and anti-look-ahead masking
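
Because the output holds one JSON object per line, a downstream consumer can stream it without loading the whole file. The sketch below shows one way to do that. The iter_merged helper is hypothetical, and the record layout ("Meta Data", "2. Symbol", "Time Series (Daily)") is an assumption based on the Alpha Vantage daily response format, not something confirmed by this page.

```python
import json
import os
import tempfile
from typing import Dict, Iterator


def iter_merged(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with a throwaway file standing in for merged.jsonl.
# The record layout is assumed, not taken from the real output.
sample = [
    {"Meta Data": {"2. Symbol": "AAPL"}, "Time Series (Daily)": {}},
    {"Meta Data": {"2. Symbol": "MSFT"}, "Time Series (Daily)": {}},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "merged.jsonl")
    with open(demo_path, "w", encoding="utf-8") as fout:
        for rec in sample:
            fout.write(json.dumps(rec) + "\n")
    symbols = [rec["Meta Data"]["2. Symbol"] for rec in iter_merged(demo_path)]

print(symbols)
```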

Usage Examples

Run US Stock Merge

# From shell (typical usage):
# cd data && python merge_jsonl.py

# The script reads all daily_prices_*.json files in its own directory
# (resolved from __file__), keeps only NASDAQ-100 symbols, renames fields,
# applies anti-look-ahead masking, and writes merged.jsonl

Field Renaming Example

# Input field names (Alpha Vantage format):
# "1. open", "2. high", "3. low", "4. close", "5. volume"

# Output field names (trading format):
# "1. buy price", "2. high", "3. low", "4. sell price", "5. volume"

# Latest date entry (anti-look-ahead):
# Only "1. buy price" is kept; every other field, including "4. sell price",
# is removed so the closing price of the most recent day is never exposed
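
The renaming and masking rules above can be sketched as a single function. This is an illustration under stated assumptions rather than the script's actual code: the function name rename_and_mask and the RENAMES mapping are hypothetical, and dates are assumed to be ISO-formatted strings (YYYY-MM-DD), so the latest date is simply the largest key.

```python
# Field renames described above: open -> buy price, close -> sell price.
RENAMES = {"1. open": "1. buy price", "4. close": "4. sell price"}


def rename_and_mask(series: dict) -> dict:
    """Rename open/close in every bar, then reduce the latest bar to its buy price."""
    out = {}
    for date, bar in series.items():
        out[date] = {RENAMES.get(key, key): value for key, value in bar.items()}
    if out:
        # ISO dates sort chronologically when compared as strings.
        latest = max(out)
        buy = out[latest].get("1. buy price")
        out[latest] = {"1. buy price": buy} if buy is not None else {}
    return out


series = {
    "2025-01-02": {"1. open": "10.0", "2. high": "11.0", "3. low": "9.5",
                   "4. close": "10.5", "5. volume": "1000"},
    "2025-01-03": {"1. open": "10.6", "2. high": "11.2", "3. low": "10.1",
                   "4. close": "11.0", "5. volume": "1200"},
}
masked = rename_and_mask(series)
print(masked["2025-01-03"])  # {'1. buy price': '10.6'}
```

Building a new dictionary instead of mutating the input keeps the raw data intact, which makes the transformation easier to test in isolation.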

Related Pages

Implements Principle
