Principle:HKUDS AI Trader JSONL Data Merging

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, ETL
Last Updated: 2026-02-09 14:00 GMT

Overview

A data transformation pattern that merges individual per-symbol JSON price files into a single JSONL file with field renaming and anti-look-ahead masking for agent consumption.

Description

JSONL Data Merging is a critical ETL step that transforms raw per-symbol JSON price data into a consolidated format consumable by the trading agent's prompt system. The merge process performs three key operations:

  1. Consolidation: Multiple per-symbol JSON files are combined into a single JSONL file, with each symbol's data written as one JSON object on its own line
  2. Field renaming: Raw OHLCV fields are renamed to trading-oriented names (e.g., "1. open" becomes "1. buy price", "4. close" becomes "4. sell price")
  3. Anti-look-ahead masking: For the most recent date in each symbol's data, only the buy price is retained while the sell price is removed, preventing the agent from seeing future closing prices

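This pattern protects backtest integrity: on the most recent date the agent can act on the opening (buy) price but cannot read that day's closing (sell) price, so its decisions cannot draw on information from the future.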
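The renaming and masking steps can be illustrated on a single symbol's in-memory series. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes the series is a dict keyed by ISO date strings, and the helper name `transform_series` and the sample values are hypothetical.

```python
# Field renames described above: open -> buy price, close -> sell price
RENAMES = {"1. open": "1. buy price", "4. close": "4. sell price"}

def transform_series(series):
    """Return a renamed copy of `series` with the latest date's sell price removed.

    Assumption: `series` maps ISO date strings to dicts of price fields.
    """
    renamed = {
        date: {RENAMES.get(key, key): value for key, value in entry.items()}
        for date, entry in series.items()
    }
    if renamed:
        latest = max(renamed)  # ISO dates sort chronologically as strings
        renamed[latest].pop("4. sell price", None)
    return renamed

# Made-up sample values, for illustration only
sample = {
    "2025-01-02": {"1. open": "100.0", "4. close": "101.5"},
    "2025-01-03": {"1. open": "102.0", "4. close": "103.0"},
}
result = transform_series(sample)
```

In this sketch the earlier date keeps both renamed fields, while the latest date keeps only its buy price. The input dict is left unmodified because the renaming builds new entries.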

Usage

Use this principle after fetching raw price data and before running the trading agent. The merged JSONL file is the primary data source consumed by the agent's prompt construction and price lookup tools.
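A consumer of the merged file might read it as sketched below. This is an illustration only: the helper names `parse_jsonl` and `lookup_price` are hypothetical, the sample line is made up, and the sketch assumes each line holds a mapping from date to price fields. How the real file identifies the symbol on each line is not specified here.

```python
import json

def parse_jsonl(lines):
    """Parse an iterable of JSONL lines (e.g. an open file) into a list of objects."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records

def lookup_price(series, date, field):
    """Return the field for a date, or None if the date or field is absent (e.g. masked)."""
    return series.get(date, {}).get(field)

# Made-up sample line, for illustration only
lines = [
    '{"2025-01-02": {"1. buy price": "100.0", "4. sell price": "101.5"}, '
    '"2025-01-03": {"1. buy price": "102.0"}}\n',
    "\n",
]
records = parse_jsonl(lines)
```

Returning `None` for a masked field lets the caller distinguish "not yet known" from a real price without raising an exception. To read from disk, an open file handle can be passed to `parse_jsonl` directly, since file objects iterate line by line.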

Theoretical Basis

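# Sketch of the JSONL merge with anti-look-ahead masking (Python)
# Assumes each file maps ISO date strings to a dict of price fields
import glob
import json

RENAMES = {"1. open": "1. buy price", "4. close": "4. sell price"}

with open("merged.jsonl", "w") as output:
    for symbol_file in sorted(glob.glob("daily_prices_*.json")):
        with open(symbol_file) as f:
            data = json.load(f)
        # Rename fields in every date entry
        for date, entry in data.items():
            data[date] = {RENAMES.get(k, k): v for k, v in entry.items()}
        # Anti-look-ahead: mask the sell price on the latest date
        if data:
            latest_date = max(data)  # ISO dates sort chronologically as strings
            data[latest_date].pop("4. sell price", None)
        # One JSON object per line
        output.write(json.dumps(data) + "\n")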

Key properties:

  • Anti-look-ahead bias: Prevents the agent from seeing the closing price of the most recent date
  • Field normalization: Renames fields to match trading semantics (open = buy price, close = sell price)
  • Symbol filtering: Only includes symbols in the target universe (e.g., NASDAQ-100 list)
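The symbol filtering property could be sketched as follows. This is a hedged example, not the project's code: it assumes the symbol is the part of the file name matched by `*` in `daily_prices_*.json`, and the helper `filter_files`, the sample universe, and the file names are hypothetical.

```python
from pathlib import Path

PREFIX = "daily_prices_"

def filter_files(paths, universe):
    """Keep only files whose symbol (taken from the file name) is in `universe`."""
    selected = []
    for path in paths:
        stem = Path(path).stem  # e.g. "daily_prices_AAPL"
        if stem.startswith(PREFIX) and stem[len(PREFIX):] in universe:
            selected.append(path)
    return selected

# Hypothetical target universe and file names, for illustration only
universe = {"AAPL", "MSFT"}
files = [
    "daily_prices_AAPL.json",
    "daily_prices_XYZ.json",
    "data/daily_prices_MSFT.json",
]
selected = filter_files(files, universe)
```

Using a set for the universe keeps each membership check constant-time, which matters little for a list the size of the NASDAQ-100 but keeps the intent clear.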

Related Pages

Implemented By
