Implementation:HKUDS AI Trader Merge JSONL Pattern
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ETL |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Script-level data transformation pattern that merges per-symbol JSON price files into a single consolidated JSONL file, renaming fields and applying anti-look-ahead masking along the way.
Description
The merge_jsonl.py script (and its market-specific variants) runs as a standalone script rather than exposing a callable function. It globs for daily_prices_*.json files, keeps only symbols in the target universe, renames the open and close fields to trading semantics ("1. open" becomes "1. buy price", "4. close" becomes "4. sell price"), applies anti-look-ahead masking to the latest date so that only the buy price remains, and writes one JSON line per symbol to merged.jsonl.
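The glob, filter, and write steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function name `merge_prices`, the `transform` hook (standing in for the rename and masking step), and the idea that the symbol is the part of the file name after `daily_prices_` are all hypothetical.

```python
import glob
import json
import os
from typing import Callable, Iterable


def merge_prices(data_dir: str, universe: Iterable[str],
                 transform: Callable[[dict], dict] = lambda d: d) -> int:
    """Write one JSON line per in-universe symbol; return the number of lines written."""
    allowed = set(universe)
    prefix = "daily_prices_"
    files = sorted(glob.glob(os.path.join(data_dir, prefix + "*.json")))
    written = 0
    with open(os.path.join(data_dir, "merged.jsonl"), "w", encoding="utf-8") as fout:
        for path in files:
            # Assumption: the symbol is the file name minus the prefix and ".json".
            symbol = os.path.basename(path)[len(prefix):-len(".json")]
            if symbol not in allowed:
                continue
            with open(path, encoding="utf-8") as fin:
                data = json.load(fin)
            # transform is a placeholder for field renaming and masking.
            fout.write(json.dumps(transform(data), ensure_ascii=False) + "\n")
            written += 1
    return written
```

Wrapping the logic in a function is a choice made here only so the sketch can be exercised in isolation; the script itself is described as running at module level.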
Market-specific variants exist for US stocks, A-shares (both Alpha Vantage and Tushare sources), A-share hourly data, and cryptocurrency data. All follow the same pattern with minor schema differences.
Usage
Run this script directly via Python after the price data fetching step completes. It is typically invoked from shell scripts (e.g., main_step2.sh) as part of the automated pipeline.
Code Reference
Source Location
- Repository: AI-Trader
- File: data/merge_jsonl.py (US stocks)
- Lines: L109-156
- Variants:
  - data/crypto/merge_crypto_jsonl.py (Crypto)
  - data/A_stock/merge_jsonl_alphavantage.py (A-share AlphaVantage)
  - data/A_stock/merge_jsonl_hourly.py (A-share hourly)
  - data/A_stock/merge_jsonl_tushare.py (A-share Tushare)
Signature
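# Script-level execution (no function wrapper)
import glob
import json
import os
# Key variables:
current_dir = os.path.dirname(__file__)
# Note: "daily_price*.json" also matches the daily_prices_*.json input files
pattern = os.path.join(current_dir, "daily_price*.json")
files = sorted(glob.glob(pattern))
output_file = os.path.join(current_dir, "merged.jsonl")
# Core logic (outline only: extract_symbol and target_universe stand in for
# the script's symbol parsing and its symbol list, e.g. all_nasdaq_100_symbols):
with open(output_file, "w", encoding="utf-8") as fout:
    for fp in files:
        symbol = extract_symbol(fp)
        if symbol not in target_universe:
            continue
        # Use a context manager so each input file is closed promptly
        with open(fp, encoding="utf-8") as fin:
            data = json.load(fin)
        # Rename fields and apply anti-look-ahead
        fout.write(json.dumps(data) + "\n")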
Import
# Run as script:
# python data/merge_jsonl.py
# Or from shell:
# cd data && python merge_jsonl.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| daily_prices_*.json | Files | Yes | Individual per-symbol JSON files in the data directory |
| all_nasdaq_100_symbols | List[str] | Yes | Symbol filter list (hardcoded in script) |
Outputs
| Name | Type | Description |
|---|---|---|
| merged.jsonl | File | One JSON line per symbol with renamed fields and anti-look-ahead masking |
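For a downstream consumer, the "one JSON line per symbol" contract means the output can be streamed record by record. The helper below is a small sketch; the name `iter_jsonl` is hypothetical and is not part of the repository.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


# Example usage (assumes merged.jsonl exists in the working directory):
# for record in iter_jsonl("merged.jsonl"):
#     print(sorted(record.keys()))
```

Reading line by line keeps memory use proportional to a single symbol's record rather than to the whole file.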
Usage Examples
Run US Stock Merge
# From shell (typical usage):
# cd data && python merge_jsonl.py
# The script reads all daily_prices_*.json files in its own directory (data/),
# filters by NASDAQ-100 symbols, renames fields, and writes merged.jsonl
Field Renaming Example
# Input field names (Alpha Vantage format):
# "1. open", "2. high", "3. low", "4. close", "5. volume"
# Output field names (trading format):
# "1. buy price", "2. high", "3. low", "4. sell price", "5. volume"
# Latest date entry (anti-look-ahead):
# Only "1. buy price" is kept; "4. sell price" is removed so that the
# latest day's close cannot leak into decisions made at that day's open
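The renaming and masking rules above can be illustrated with a short sketch. The function name `rename_and_mask`, the `RENAME` table, and the assumption that the input is a date-keyed dictionary of bars (as in Alpha Vantage's daily time series) are illustrative choices, not taken from the script; the sample prices are made up.

```python
from typing import Dict

Bar = Dict[str, str]

# Hypothetical mapping built from the field names listed above.
RENAME = {"1. open": "1. buy price", "4. close": "4. sell price"}


def rename_and_mask(series: Dict[str, Bar]) -> Dict[str, Bar]:
    """Rename keys in every bar, then keep only the buy price on the latest date."""
    out = {
        date: {RENAME.get(key, key): value for key, value in bar.items()}
        for date, bar in series.items()
    }
    if out:
        latest = max(out)  # ISO dates (YYYY-MM-DD) sort chronologically as strings
        out[latest] = {
            key: value for key, value in out[latest].items() if key == "1. buy price"
        }
    return out


sample = {
    "2025-01-02": {"1. open": "10.0", "2. high": "11.0", "3. low": "9.5",
                   "4. close": "10.5", "5. volume": "1000"},
    "2025-01-03": {"1. open": "10.6", "2. high": "11.2", "3. low": "10.1",
                   "4. close": "11.0", "5. volume": "1200"},
}
masked = rename_and_mask(sample)
print(masked["2025-01-03"])  # {'1. buy price': '10.6'}
```

Earlier dates keep all five fields under their renamed keys; only the most recent bar is reduced to the buy price.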