Principle:EvolvingLMMs Lab Lmms eval MCP Tool Calling
| Knowledge Sources | |
|---|---|
| Domains | Model Inference, Tool Integration |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
MCP tool calling enables language models to invoke external tools through the Model Context Protocol during inference.
Description
The Model Context Protocol (MCP) provides a standardized way for language models to call external tools during inference. The model generates text that includes tool call requests, which are parsed and executed by an MCP client. Tool results are then fed back into the model's context as additional messages, allowing multi-turn interactions where the model can use tool outputs to refine its answers. This enables models to access external resources like image processing, web search, or computation tools.
Usage
Apply this principle when your evaluation tasks require models to interact with external tools, perform multi-step reasoning with tool assistance, or access capabilities beyond pure text generation (e.g., image manipulation, calculator functions, web APIs).
Theoretical Basis
Tool Calling Loop
- Generation: Model generates text that may include tool call syntax
- Detection: Parser checks if finish_reason == "tool_calls"
- Parsing: Extract tool name and arguments from generated text
- Execution: MCPClient.run_tool(tool_name, arguments) invokes the tool
- Formatting: Convert tool result to OpenAI-compatible message format
- Context Update: Append tool message as {"role": "tool", "name": ..., "content": ...}
- Next Turn: Generate next response with updated context including tool results
- Termination: Continue until model produces final answer or max_turn reached
MCP Server Requirements
- Standalone Script: Must be a Python script that can run independently
- Tool Definitions: Exposes available tools with clear descriptions and input schemas
- Error Handling: Gracefully handles errors and returns structured responses
- Response Format: Returns TextContent or ImageContent in standardized format
- Performance: Avoids long-running operations that could cause timeouts
Best Practices
- Set batch_size=1 when tools are enabled (sequential processing required)
- Configure max_turn appropriately (5-10 recommended for most tasks)
- Allocate sufficient work_dir space for temporary files
- Keep tools focused on single, well-defined tasks
- Provide clear, specific tool descriptions for better model understanding