Implementation:Run llama Llama index FunctionCallingLLM
| Knowledge Sources | |
|---|---|
| Domains | LLM Integration, Function Calling, Tool Use |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
This module defines the FunctionCallingLLM abstract base class, which extends the base LLM class with function/tool calling capabilities including chat with tools, streaming with tools, and predict-and-call workflows.
Description
The function_calling.py module provides FunctionCallingLLM, an abstract subclass of LLM that adds structured function calling support. This is the base class for all LLM integrations that support native tool/function calling (e.g., OpenAI, Anthropic, Google models).
FunctionCallingLLM provides several key method groups:
Chat with Tools Methods:
- chat_with_tools - Synchronous method that prepares tool-augmented chat arguments via _prepare_chat_with_tools_compat, calls self.chat, and validates the response
- achat_with_tools - Asynchronous counterpart using self.achat
- stream_chat_with_tools - Streaming variant using self.stream_chat; streamed output is returned without the validation step applied by the non-streaming methods
- astream_chat_with_tools - Async streaming variant using self.astream_chat
All four methods accept the same arguments: tools (a sequence of BaseTool), an optional user_msg (a string or ChatMessage), an optional chat_history, and the boolean flags verbose, allow_parallel_tool_calls, and tool_required.
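The prepare, chat, validate sequence behind chat_with_tools can be pictured with a stripped-down stand-in. The sketch below does not import LlamaIndex: SketchLLM, EchoLLM, and the dict-based responses are hypothetical names invented for illustration, and only the order of the three steps is taken from the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SketchLLM(ABC):
    """Hypothetical stand-in that mirrors the prepare -> chat -> validate flow."""

    @abstractmethod
    def chat(self, messages: List[str], **kwargs: Any) -> Dict[str, Any]:
        """Provider call; a real integration would contact the model API here."""

    @abstractmethod
    def _prepare_chat_with_tools(
        self, tools: List[str], user_msg: Optional[str] = None, **kwargs: Any
    ) -> Dict[str, Any]:
        """Convert tools and messages into keyword arguments for chat()."""

    def _validate_chat_with_tools_response(
        self, response: Dict[str, Any], tools: List[str], **kwargs: Any
    ) -> Dict[str, Any]:
        # Default hook: pass the response through unchanged.
        return response

    def chat_with_tools(
        self, tools: List[str], user_msg: Optional[str] = None, **kwargs: Any
    ) -> Dict[str, Any]:
        chat_kwargs = self._prepare_chat_with_tools(tools, user_msg=user_msg, **kwargs)
        response = self.chat(**chat_kwargs)
        return self._validate_chat_with_tools_response(response, tools, **kwargs)


class EchoLLM(SketchLLM):
    """Toy subclass that pretends to call the first tool it was given."""

    def chat(self, messages: List[str], **kwargs: Any) -> Dict[str, Any]:
        tool_names = kwargs.get("tools", [])
        return {"content": messages[-1], "tool_calls": tool_names[:1]}

    def _prepare_chat_with_tools(
        self, tools: List[str], user_msg: Optional[str] = None, **kwargs: Any
    ) -> Dict[str, Any]:
        return {"messages": [user_msg or ""], "tools": list(tools)}


result = EchoLLM().chat_with_tools(["multiply"], user_msg="What is 6 times 7?")
```

The subclass only supplies the provider-specific pieces; the shared method owns the ordering, which is why the validation hook can default to a passthrough.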
Tool Preparation Methods:
- _prepare_chat_with_tools - Abstract method that subclasses must implement to convert tools into the LLM-specific format and return a kwargs dictionary suitable for self.chat
- _prepare_chat_with_tools_compat - Compatibility wrapper around _prepare_chat_with_tools. It uses the cached _supports_tool_required helper to check whether the subclass's implementation accepts the tool_required parameter; if it does not, the wrapper logs a warning and omits the parameter.
- _validate_chat_with_tools_response - Hook for subclasses to validate the chat response (default is passthrough)
Tool Extraction:
- get_tool_calls_from_response - Extracts ToolSelection objects from a ChatResponse. Raises NotImplementedError by default; subclasses must override.
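What an override looks like depends entirely on the provider's response format. As a rough illustration only, the sketch below parses an invented response shape: ToolSelectionSketch, extract_tool_calls, and the layout of the tool_calls dictionary are assumptions made for this example, not the LlamaIndex types or any provider's actual payload.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolSelectionSketch:
    """Hypothetical stand-in for a tool selection record."""

    tool_id: str
    tool_name: str
    tool_kwargs: Dict[str, Any]


def extract_tool_calls(
    response: Dict[str, Any], error_on_no_tool_call: bool = True
) -> List[ToolSelectionSketch]:
    raw_calls = response.get("tool_calls", [])
    if not raw_calls:
        if error_on_no_tool_call:
            raise ValueError("Expected at least one tool call, but got none.")
        return []
    selections = []
    for call in raw_calls:
        # Assumption: arguments arrive as a JSON-encoded string.
        selections.append(
            ToolSelectionSketch(
                tool_id=call["id"],
                tool_name=call["name"],
                tool_kwargs=json.loads(call["arguments"]),
            )
        )
    return selections


fake_response = {
    "tool_calls": [
        {"id": "call_1", "name": "multiply", "arguments": '{"a": 6, "b": 7}'}
    ]
}
selections = extract_tool_calls(fake_response)
```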
Predict and Call Methods:
- predict_and_call - End-to-end synchronous method that calls chat_with_tools, extracts the tool calls, executes them using call_tool_with_selection, and returns an AgentChatResponse. When the model calls several tools in parallel, their outputs are concatenated into a single response text. Falls back to the parent LLM.predict_and_call if the model's metadata does not report is_function_calling_model.
- apredict_and_call - Async counterpart that uses achat_with_tools and asyncio.gather for parallel tool execution via acall_tool_with_selection.
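The concurrent execution step in the async variant follows the usual asyncio.gather pattern. The sketch below shows that pattern with the standard library only; acall_tool and run_selected_tools are hypothetical helpers standing in for the real tool-calling utilities, and running sync functions via asyncio.to_thread is a choice made for this example.

```python
import asyncio
from typing import Any, Callable, Dict, List, Tuple


async def acall_tool(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> str:
    """Hypothetical async tool runner; runs the sync function in a worker thread."""
    result = await asyncio.to_thread(fn, **kwargs)
    return str(result)


async def run_selected_tools(
    calls: List[Tuple[Callable[..., Any], Dict[str, Any]]]
) -> List[str]:
    # gather runs the tool calls concurrently and preserves input order.
    return await asyncio.gather(*(acall_tool(fn, kw) for fn, kw in calls))


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


outputs = asyncio.run(
    run_selected_tools([(add, {"a": 3, "b": 5}), (multiply, {"a": 4, "b": 6})])
)
```

Because gather returns results in the order the awaitables were passed, each output can be matched back to the tool call that produced it.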
Both predict_and_call methods support error_on_no_tool_call and error_on_tool_error flags for controlling error behavior.
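One way to picture how tool outputs turn into a final response text is the sketch below. It is not the library's code: ToolOutputSketch and combine_tool_outputs are hypothetical, and the order of the checks, the blank-line separator, and the fallback for zero tool calls are assumptions chosen to match the behavior described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolOutputSketch:
    """Hypothetical stand-in for a tool result."""

    content: str
    is_error: bool = False


def combine_tool_outputs(
    outputs: List[ToolOutputSketch],
    allow_parallel_tool_calls: bool = False,
    error_on_tool_error: bool = False,
    fallback_text: str = "",
) -> str:
    errors = [o for o in outputs if o.is_error]
    if error_on_tool_error and errors:
        raise ValueError("\n\n".join(o.content for o in errors))
    if not outputs:
        # No tool was called; fall back to the model's own text.
        return fallback_text
    if len(outputs) > 1 and not allow_parallel_tool_calls:
        raise ValueError("Got multiple tool outputs but parallel calls are disabled.")
    # Assumed separator: a blank line between concatenated outputs.
    return "\n\n".join(o.content for o in outputs)


combined = combine_tool_outputs(
    [ToolOutputSketch("8"), ToolOutputSketch("24")],
    allow_parallel_tool_calls=True,
)
```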
The module-level _supports_tool_required function provides backward compatibility with older LLM integrations. It uses inspect.signature to check whether a given subclass's _prepare_chat_with_tools method includes the tool_required parameter, and it is decorated with @functools.lru_cache(maxsize=1000) so the check is not repeated for the same arguments.
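The technique itself, a cached signature check, is easy to reproduce with the standard library. In the sketch below, supports_param, prepare_compat, LegacyLLM, and ModernLLM are hypothetical names, and the helper takes the method and parameter names as arguments, which is a generalization for this example rather than the module's actual signature.

```python
import functools
import inspect
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@functools.lru_cache(maxsize=1000)
def supports_param(cls: type, method_name: str, param: str) -> bool:
    """Return True if cls.<method_name> declares a parameter named `param`."""
    method = getattr(cls, method_name)
    return param in inspect.signature(method).parameters


class LegacyLLM:
    """Hypothetical older integration whose signature lacks tool_required."""

    def _prepare_chat_with_tools(self, tools: List[str], **kwargs: Any) -> Dict[str, Any]:
        return {"tools": tools, **kwargs}


class ModernLLM:
    """Hypothetical newer integration that declares tool_required."""

    def _prepare_chat_with_tools(
        self, tools: List[str], tool_required: bool = False, **kwargs: Any
    ) -> Dict[str, Any]:
        return {"tools": tools, "tool_required": tool_required, **kwargs}


def prepare_compat(llm: Any, tools: List[str], tool_required: bool = False) -> Dict[str, Any]:
    if supports_param(type(llm), "_prepare_chat_with_tools", "tool_required"):
        return llm._prepare_chat_with_tools(tools, tool_required=tool_required)
    # Older signature: warn and drop the flag instead of leaking it into **kwargs.
    logger.warning("tool_required is not supported by %s; ignoring it.", type(llm).__name__)
    return llm._prepare_chat_with_tools(tools)


legacy_kwargs = prepare_compat(LegacyLLM(), ["multiply"], tool_required=True)
modern_kwargs = prepare_compat(ModernLLM(), ["multiply"], tool_required=True)
```

Checking for the named parameter, rather than relying on **kwargs to absorb it, keeps an unsupported flag from being forwarded to a provider API that does not understand it.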
Usage
Subclass FunctionCallingLLM when implementing an LLM integration that supports native function/tool calling. Implement _prepare_chat_with_tools and get_tool_calls_from_response at minimum. Use predict_and_call or apredict_and_call for end-to-end tool usage workflows in agents. Use chat_with_tools for lower-level control over the tool calling process.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/llms/function_calling.py
- Lines: 1-347
Signature
class FunctionCallingLLM(LLM):
    def __init__(self, *args: Any, **kwargs: Any) -> None: ...

    def chat_with_tools(
        self,
        tools: Sequence["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        tool_required: bool = False,
        **kwargs: Any,
    ) -> ChatResponse: ...

    async def achat_with_tools(
        self,
        tools: Sequence["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        tool_required: bool = False,
        **kwargs: Any,
    ) -> ChatResponse: ...

    @abstractmethod
    def _prepare_chat_with_tools(
        self,
        tools: Sequence["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        tool_required: bool = False,
        **kwargs: Any,
    ) -> Dict[str, Any]: ...

    def get_tool_calls_from_response(
        self,
        response: ChatResponse,
        error_on_no_tool_call: bool = True,
        **kwargs: Any,
    ) -> List[ToolSelection]: ...

    def predict_and_call(
        self,
        tools: Sequence["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        error_on_no_tool_call: bool = True,
        error_on_tool_error: bool = False,
        **kwargs: Any,
    ) -> "AgentChatResponse": ...

    async def apredict_and_call(
        self,
        tools: Sequence["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        error_on_no_tool_call: bool = True,
        error_on_tool_error: bool = False,
        **kwargs: Any,
    ) -> "AgentChatResponse": ...


@functools.lru_cache(maxsize=1000)
def _supports_tool_required(
    cls: Type[FunctionCallingLLM], tool_required: bool
) -> bool: ...
Import
from llama_index.core.llms.function_calling import FunctionCallingLLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| tools | Sequence[BaseTool] | Yes | The tools/functions available for the LLM to call |
| user_msg | Optional[Union[str, ChatMessage]] | No | The user message to send to the LLM |
| chat_history | Optional[List[ChatMessage]] | No | Previous chat messages for context |
| verbose | bool | No | Whether to print verbose output (default False) |
| allow_parallel_tool_calls | bool | No | Whether to allow the LLM to call multiple tools in one turn (default False) |
| tool_required | bool | No | If True, the LLM should only call tools and not return a direct text response (default False) |
| error_on_no_tool_call | bool | No | Whether to raise an error if no tool call is found in the response (default True) |
| error_on_tool_error | bool | No | Whether to raise an error if a tool call returns an error (default False) |
| **kwargs | Any | No | Additional keyword arguments passed to the underlying chat method |
Outputs
| Name | Type | Description |
|---|---|---|
| return (chat_with_tools, achat_with_tools) | ChatResponse | The LLM chat response, which may contain tool call information |
| return (stream_chat_with_tools) | ChatResponseGen | A synchronous generator of streaming chat response chunks |
| return (astream_chat_with_tools) | ChatResponseAsyncGen | An async generator of streaming chat response chunks |
| return (get_tool_calls_from_response) | List[ToolSelection] | List of tool selections extracted from the LLM response |
| return (predict_and_call, apredict_and_call) | AgentChatResponse | The agent response containing tool output text and source tool outputs |
Usage Examples
Basic Usage
from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.tools import FunctionTool

# Define a tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

tool = FunctionTool.from_defaults(fn=multiply)

# Use with an OpenAI model (which extends FunctionCallingLLM)
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")

# Chat with tools - low level
response = llm.chat_with_tools(
    tools=[tool],
    user_msg="What is 6 times 7?",
    verbose=True,
)
# Raises by default if the model returned no tool call (error_on_no_tool_call=True)
tool_calls = llm.get_tool_calls_from_response(response)

# Predict and call - end to end
agent_response = llm.predict_and_call(
    tools=[tool],
    user_msg="What is 6 times 7?",
    verbose=True,
)
print(agent_response.response)  # "42" if the model calls multiply(a=6, b=7)
Async Parallel Tool Calls
import asyncio

from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

tools = [
    FunctionTool.from_defaults(fn=add),
    FunctionTool.from_defaults(fn=multiply),
]

llm = OpenAI(model="gpt-4")

async def main():
    response = await llm.apredict_and_call(
        tools=tools,
        user_msg="Add 3 and 5, and also multiply 4 and 6.",
        allow_parallel_tool_calls=True,
        verbose=True,
    )
    print(response.response)
    print(response.sources)  # List of ToolOutput objects

asyncio.run(main())