Principle:HKUDS AI Trader Web Search Integration
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, MCP_Tools, Agent_Tooling |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
This principle covers giving trading agents real-time web search capabilities so they can retrieve financial news and analysis as contextual input for trading decisions.
Description
Web search integration enables LLM-based trading agents to access external information beyond their training data and local price data. This is a form of retrieval-augmented generation (RAG) applied to financial decision-making: the agent formulates a search query, retrieves relevant web content, and incorporates it into its reasoning. The implementation enforces temporal correctness by filtering search results against the simulation date (TODAY_DATE), so that information published after that date cannot leak into a backtest. The search-then-scrape pipeline first discovers relevant URLs, then extracts structured content (title, description, text) from each page.
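The date filter described above can be illustrated with a minimal sketch. It is not the project's actual code: the function name `filter_by_date` is borrowed from the abstract algorithm on this page, while the result shape (a dict with `url` and `publish_time`), the ISO-style timestamp format, and the choice to drop results with a missing or unparseable date are all assumptions made for illustration.

```python
from datetime import datetime


def filter_by_date(results, cutoff):
    """Keep only results published on or before `cutoff`.

    Assumptions (not confirmed by the source): each result is a dict with a
    'publish_time' string in ISO format, all timestamps are timezone-naive,
    and results without a usable date are dropped to stay conservative.
    """
    cutoff_dt = datetime.fromisoformat(cutoff)
    kept = []
    for item in results:
        raw = item.get("publish_time")
        try:
            published = datetime.fromisoformat(raw)
        except (TypeError, ValueError):
            # Missing or malformed date: cannot prove it is not from the future.
            continue
        if published <= cutoff_dt:
            kept.append(item)
    return kept


# Hypothetical simulation date and search results, for illustration only.
TODAY_DATE = "2025-01-15 00:00:00"
results = [
    {"url": "https://example.com/a", "publish_time": "2025-01-10 09:30:00"},
    {"url": "https://example.com/b", "publish_time": "2025-01-20 08:00:00"},
    {"url": "https://example.com/c", "publish_time": None},
]
visible = filter_by_date(results, TODAY_DATE)
print([r["url"] for r in visible])
```

Dropping undated results trades recall for safety: an undated page might be old, but during backtesting there is no way to verify that it is.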
Usage
Apply this principle when the trading agent needs access to qualitative information (news, analysis, earnings reports) that cannot be derived from price data alone. The web search tool is exposed via MCP and called by the agent during its decision loop.
Theoretical Basis
The web search integration follows a two-stage retrieval pipeline:
# Abstract algorithm description
# Stage 1: Search (query → URLs)
urls = search_api(query, max_results=N)
urls = filter_by_date(urls, cutoff=TODAY_DATE) # Prevent data leakage
# Stage 2: Scrape (URL → structured content)
for url in sample(urls, k=1):
    content = reader_api(url)
    yield {title, description, content[:1000], publish_time}
Key design decisions:
- Single result sampling: Only 1 URL is scraped per query to minimize latency and API costs
- Date filtering: Results published after the simulation date are excluded
- Content truncation: Only the first 1000 characters are returned to fit LLM context windows
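The pipeline and the three design decisions above can be put together in a small runnable sketch. This is one illustrative reading of the abstract algorithm, not the actual implementation: `search_and_scrape`, `fake_search`, `fake_reader`, and the dict shapes they exchange are hypothetical, and the search and reader backends are passed in as plain callables so the example runs without network access.

```python
import random
from datetime import datetime

MAX_CHARS = 1000  # content truncation limit taken from the design decisions


def search_and_scrape(query, today_date, search_api, reader_api, max_results=10):
    """Two-stage retrieval: search, date-filter, sample one URL, scrape it.

    `search_api` and `reader_api` are injected callables (hypothetical
    signatures); timestamps are assumed to be timezone-naive ISO strings.
    """
    cutoff = datetime.fromisoformat(today_date)

    # Stage 1: search, then drop anything published after the simulation date.
    hits = search_api(query, max_results)
    hits = [h for h in hits if datetime.fromisoformat(h["publish_time"]) <= cutoff]
    if not hits:
        return []

    # Stage 2: scrape a single sampled URL and truncate its text.
    documents = []
    for hit in random.sample(hits, k=1):
        page = reader_api(hit["url"])
        documents.append({
            "title": page["title"],
            "description": page["description"],
            "content": page["content"][:MAX_CHARS],
            "publish_time": hit["publish_time"],
        })
    return documents


def fake_search(query, max_results):
    """Stand-in for a real search backend; returns canned results."""
    hits = [
        {"url": "https://example.com/old", "publish_time": "2025-01-10"},
        {"url": "https://example.com/future", "publish_time": "2025-02-01"},
    ]
    return hits[:max_results]


def fake_reader(url):
    """Stand-in for a real page reader; returns an oversized body."""
    return {"title": "Stub page", "description": "Stub for " + url, "content": "x" * 5000}


docs = search_and_scrape("NVDA earnings", "2025-01-15", fake_search, fake_reader)
print(docs[0]["publish_time"], len(docs[0]["content"]))
```

Returning an empty list when every hit is filtered out keeps the agent's decision loop running on price data alone instead of failing on an empty sample.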