Principle:HKUDS AI Trader JSONL Data Merging
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ETL |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
A data transformation pattern that merges individual per-symbol JSON price files into a single JSONL file with field renaming and anti-look-ahead masking for agent consumption.
Description
JSONL Data Merging is a critical ETL step that transforms raw per-symbol JSON price data into a consolidated format consumable by the trading agent's prompt system. The merge process performs three key operations:
- Consolidation: Multiple per-symbol JSON files are combined into a single JSONL file (one JSON object per line, one line per symbol)
- Field renaming: Raw OHLCV fields are renamed to trading-oriented names (e.g., "1. open" becomes "1. buy price", "4. close" becomes "4. sell price")
- Anti-look-ahead masking: For the most recent date in each symbol's data, only the buy price is retained while the sell price is removed, preventing the agent from seeing future closing prices
This pattern ensures data integrity for backtesting by preventing information leakage from future data.
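The renaming and masking operations can be illustrated on a small in-memory series. This is a minimal sketch, not the project's implementation: the helper names `rename_fields` and `mask_latest` are hypothetical, the sample prices are made up, and each series is assumed to be a flat mapping from ISO `YYYY-MM-DD` date strings to field dictionaries.

```python
import json

# Field map taken from the two renames named above; other keys pass through.
FIELD_RENAMES = {"1. open": "1. buy price", "4. close": "4. sell price"}


def rename_fields(entry):
    """Return a copy of one date entry with raw OHLCV keys renamed."""
    return {FIELD_RENAMES.get(key, key): value for key, value in entry.items()}


def mask_latest(series):
    """Rename every entry, then drop the sell price on the most recent date."""
    renamed = {date: rename_fields(entry) for date, entry in series.items()}
    if renamed:
        # Assumption: ISO date strings, which sort chronologically as text.
        latest = max(renamed)
        renamed[latest].pop("4. sell price", None)
    return renamed


# Made-up sample data for illustration only.
raw = {
    "2025-01-02": {"1. open": "100.0", "4. close": "101.5"},
    "2025-01-03": {"1. open": "101.0", "4. close": "99.0"},
}
masked = mask_latest(raw)
print(json.dumps(masked, indent=2))
```

Building a new dictionary rather than mutating `raw` keeps the unmasked data available for later evaluation, which may be useful when scoring a backtest after the fact.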
Usage
Use this principle after fetching raw price data and before running the trading agent. The merged JSONL file is the primary data source consumed by the agent's prompt construction and price lookup tools.
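A price lookup over the merged file might look like the sketch below. The per-line layout used here (a `symbol` key plus a `prices` mapping keyed by date) is an assumption made for illustration, as are the helper names `load_merged` and `lookup_price`; the actual layout should be checked against the merged file itself.

```python
import json
import os
import tempfile


def load_merged(path):
    """Index a merged JSONL file by symbol (assumed line layout, see above)."""
    by_symbol = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            by_symbol[record["symbol"]] = record["prices"]
    return by_symbol


def lookup_price(by_symbol, symbol, date, field="1. buy price"):
    """Return the requested price, or None when it is missing or masked."""
    return by_symbol.get(symbol, {}).get(date, {}).get(field)


# Tiny illustrative file; the values are made up.
sample = [
    {"symbol": "AAPL", "prices": {
        "2025-01-02": {"1. buy price": "100.0", "4. sell price": "101.5"},
        "2025-01-03": {"1. buy price": "101.0"},  # latest date, sell price masked
    }},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        for record in sample:
            handle.write(json.dumps(record) + "\n")
    index = load_merged(path)

buy = lookup_price(index, "AAPL", "2025-01-03")
masked_sell = lookup_price(index, "AAPL", "2025-01-03", "4. sell price")
print(buy, masked_sell)
```

Returning `None` for a masked field lets the caller distinguish "not yet known" from a real price without raising an exception.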
Theoretical Basis
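# Sketch of the JSONL merge with anti-look-ahead masking (Python).
# Assumes each file holds a flat {date: {field: value}} mapping with ISO dates.
import glob
import json

with open("merged.jsonl", "w") as output:
    for symbol_file in sorted(glob.glob("daily_prices_*.json")):
        with open(symbol_file) as f:
            data = json.load(f)
        # Rename fields on every date entry
        for entry in data.values():
            if "1. open" in entry:
                entry["1. buy price"] = entry.pop("1. open")
            if "4. close" in entry:
                entry["4. sell price"] = entry.pop("4. close")
        # Anti-look-ahead: mask the sell price on the latest date
        if data:
            latest_date = max(data)  # ISO dates sort chronologically as strings
            data[latest_date].pop("4. sell price", None)
        # One JSON object per line, one line per symbol
        output.write(json.dumps(data) + "\n")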
Key properties:
- Anti-look-ahead bias: Prevents the agent from seeing the closing price of the most recent date
- Field normalization: Renames fields to match trading semantics (open = buy price, close = sell price)
- Symbol filtering: Only includes symbols in the target universe (e.g., NASDAQ-100 list)
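The symbol filter and a sanity check for the anti-look-ahead property can be sketched as follows. The three-ticker `TARGET_SYMBOLS` set is a placeholder for the real universe, the assumption that the wildcard in `daily_prices_*.json` is the ticker is not confirmed by the source, and `select_files` and `has_look_ahead_leak` are hypothetical helper names.

```python
import re

# Placeholder universe; a real list (e.g. NASDAQ-100) would be much longer.
TARGET_SYMBOLS = {"AAPL", "MSFT", "NVDA"}

# Assumption: the part matched by "*" in daily_prices_*.json is the ticker.
FILENAME_PATTERN = re.compile(r"^daily_prices_(?P<symbol>.+)\.json$")


def symbol_from_filename(filename):
    """Extract the ticker from a per-symbol filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.group("symbol") if match else None


def select_files(filenames, universe):
    """Keep only files whose ticker belongs to the target universe."""
    return [
        name for name in sorted(filenames)
        if symbol_from_filename(name) in universe
    ]


def has_look_ahead_leak(series):
    """True when the most recent date still carries a sell price."""
    if not series:
        return False
    return "4. sell price" in series[max(series)]


files = [
    "daily_prices_AAPL.json",
    "daily_prices_TSLA.json",
    "notes.txt",
    "daily_prices_NVDA.json",
]
kept = select_files(files, TARGET_SYMBOLS)
print(kept)
```

Running a check like `has_look_ahead_leak` over every merged series before a backtest is one inexpensive way to catch a regression in the masking step.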