Heuristic:Infiniflow Ragflow Agent Max Rounds Strategy
| Knowledge Sources | |
|---|---|
| Domains | Agents, Optimization |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
Agent tool-calling loops are capped at 5 rounds (configurable via max_rounds), after which the agent is force-prompted to synthesize a final answer from all gathered information.
Description
RAGFlow's agent system uses a tool-calling loop where the LLM can invoke external tools (search, SQL, API calls, etc.) iteratively. To prevent infinite loops and excessive API costs, the loop is capped at `max_rounds` (default 5). When the limit is reached, the agent receives a forced instruction to "provide a DIRECT and COMPREHENSIVE final answer" using all information gathered during the tool-calling rounds. Additionally, conversation history is compressed when it exceeds 12 messages by keeping only the first 2 messages and last 10.
Usage
This heuristic applies to all agent workflows that use tool-calling. Configure `max_rounds` in the agent component settings. Increase it for complex multi-step research tasks; decrease it for cost-sensitive applications. The history compression activates automatically for long conversations.
The Insight (Rule of Thumb)
- Action: Set `max_rounds=5` (default). When exceeded, force a final answer prompt. Compress history at >12 messages.
- Value: max_rounds=5, history compression threshold=12 messages (keep first 2 + last 10).
- Trade-off: Higher max_rounds allows more thorough research but increases latency and API costs. Lower values may produce incomplete answers for complex queries.
Reasoning
Five rounds is sufficient for most tool-calling scenarios (e.g., search → refine → verify → answer). The forced final answer ensures the user always gets a response rather than an error. History compression prevents context window overflow while preserving the system prompt (message 0), initial user query (message 1), and the most recent conversation turns (last 10). Citations are only inserted when history length is less than 7 to avoid cluttering multi-turn conversations.
Code Evidence from `agent/component/agent_with_tools.py:77,417-429`:
for _ in range(self._param.max_rounds + 1):
# ... tool calling loop ...
logging.warning(f"Exceed max rounds: {self._param.max_rounds}")
final_instruction = ("Based on ALL the information you have gathered from your "
"research and tool calls above, please provide a DIRECT and "
"COMPREHENSIVE final answer...")
History compression from `agent/component/agent_with_tools.py:479-480`:
# If history > 12 messages, compress to keep context window manageable
_hist = [hist[0], hist[1], *hist[-10:]]