Implementation:Run llama Llama index MockLLM

Knowledge Sources	Run_llama_Llama_index
Domains	LLM, Testing, Mock
Last Updated	2026-02-11 19:00 GMT

Overview

Provides mock LLM implementations (MockLLM, MockLLMWithNonyieldingChatStream, MockLLMWithChatMemoryOfLastCall, and MockFunctionCallingLLM) for testing and development without making real LLM API calls.

Description

The mock.py module defines four mock LLM classes that extend the LlamaIndex LLM hierarchy for testing purposes:

MockLLM extends CustomLLM and provides a simple deterministic LLM that either echoes the input prompt or generates repeated "text" tokens up to a configurable max_tokens limit. It supports both complete and stream_complete methods.

MockLLMWithNonyieldingChatStream extends MockLLM and provides a stream_chat implementation that yields nothing, useful for testing empty stream handling.

MockLLMWithChatMemoryOfLastCall extends MockLLM and records the last set of ChatMessage objects passed to any chat method (chat, stream_chat, achat, astream_chat), enabling test assertions on the messages that would have been sent to a real LLM.

MockFunctionCallingLLM extends FunctionCallingLLM and simulates a function-calling LLM. It supports customizable response generation via a response_generator callback and a blocks_to_content_callback for converting multi-modal content blocks (text, images, audio, documents, video, thinking blocks, citations) into string content. It implements _prepare_chat_with_tools and get_tool_calls_from_response to simulate tool calling workflows.
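The two completion modes of MockLLM described above can be pictured with a small standalone sketch. It does not import LlamaIndex and is not the library's code; the function name mock_complete is hypothetical and only mirrors the documented rule (echo the prompt when max_tokens is None, otherwise return max_tokens repetitions of "text").

```python
from typing import Optional


def mock_complete(prompt: str, max_tokens: Optional[int] = None) -> str:
    """Hypothetical stand-in for the documented MockLLM completion rule."""
    if max_tokens is None:
        # Echo mode: the prompt comes back unchanged.
        return prompt
    # Token mode: the prompt is ignored and "text" is repeated max_tokens times.
    return " ".join(["text"] * max_tokens)


echoed = mock_complete("What is the meaning of life?")
generated = mock_complete("ignored prompt", max_tokens=5)
print(echoed)     # What is the meaning of life?
print(generated)  # text text text text text
```

Because the output depends only on the arguments, assertions written against it stay stable across test runs.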

The module also includes two helper functions: _data_from_binary, which base64-decodes binary block data, and _default_blocks_to_content_callback, which serializes a list of ContentBlock objects into a single string.
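As a rough illustration of the decoding step attributed to _data_from_binary, the sketch below base64-decodes a payload with the Python standard library. The function name decode_block_data and the sample payload are assumptions made for this example; the real helper's signature is not shown on this page and may differ.

```python
import base64


def decode_block_data(data: bytes) -> bytes:
    """Hypothetical helper: decode base64-encoded block data to raw bytes."""
    # validate=True rejects characters outside the base64 alphabet instead of
    # silently discarding them.
    return base64.b64decode(data, validate=True)


encoded = base64.b64encode(b"fake image bytes")  # sample payload for the demo
decoded = decode_block_data(encoded)
print(decoded)  # b'fake image bytes'
```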

Usage

Use these mock LLMs in unit tests, integration tests, and development scenarios where you want to verify pipeline behavior without incurring real LLM API costs or latency. MockLLM is suitable for simple completion testing, MockLLMWithChatMemoryOfLastCall for verifying chat message construction, and MockFunctionCallingLLM for testing tool-calling agent workflows.
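The message-verification pattern mentioned above can be sketched without any LlamaIndex dependency. Everything here is hypothetical scaffolding: RecordingChatLLM stands in for MockLLMWithChatMemoryOfLastCall, Message for ChatMessage, and ask is an invented pipeline function under test. In a real test, the library's mock would take the place of RecordingChatLLM.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""

    role: str
    content: str


class RecordingChatLLM:
    """Hypothetical mock that remembers the last messages passed to chat()."""

    def __init__(self) -> None:
        self.last_chat_messages: Optional[Sequence[Message]] = None

    def chat(self, messages: Sequence[Message]) -> Message:
        # Record the input so a test can assert on it afterwards.
        self.last_chat_messages = list(messages)
        # Echo the final message back as the "assistant" reply.
        return Message(role="assistant", content=messages[-1].content)


def ask(llm: RecordingChatLLM, system_prompt: str, question: str) -> str:
    """Invented pipeline step: prepend a system message, then call the LLM."""
    messages: List[Message] = [
        Message(role="system", content=system_prompt),
        Message(role="user", content=question),
    ]
    return llm.chat(messages).content


llm = RecordingChatLLM()
answer = ask(llm, "Be brief.", "Hello")

# The test checks what the pipeline would have sent to a real LLM.
assert llm.last_chat_messages is not None
assert [m.role for m in llm.last_chat_messages] == ["system", "user"]
assert llm.last_chat_messages[0].content == "Be brief."
```

The assertions target the recorded input rather than the reply, which is the point of a mock that remembers its last call.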

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/llms/mock.py
Lines: 1-503

Signature

class MockLLM(CustomLLM):
    max_tokens: Optional[int]

    def __init__(
        self,
        max_tokens: Optional[int] = None,
        callback_manager: Optional[CallbackManager] = None,
        system_prompt: Optional[str] = None,
        messages_to_prompt: Optional[MessagesToPromptType] = None,
        completion_to_prompt: Optional[CompletionToPromptType] = None,
        pydantic_program_mode: PydanticProgramMode = PydanticProgramMode.DEFAULT,
    ) -> None: ...

class MockLLMWithChatMemoryOfLastCall(MockLLM):
    last_chat_messages: Optional[Sequence[ChatMessage]]
    last_called_chat_function: List[str]
    ...

class MockFunctionCallingLLM(FunctionCallingLLM):
    tool_calls: List[ToolCallBlock]
    blocks_to_content_callback: BlockToContentCallback

    def __init__(
        self,
        callback_manager: Optional[CallbackManager] = None,
        system_prompt: Optional[str] = None,
        messages_to_prompt: Optional[MessagesToPromptType] = None,
        completion_to_prompt: Optional[CompletionToPromptType] = None,
        pydantic_program_mode: PydanticProgramMode = PydanticProgramMode.DEFAULT,
        blocks_to_content_callback: Optional[BlockToContentCallback] = None,
        response_generator: Optional[ResponseGenerator] = None,
        **kwargs: Any,
    ) -> None: ...

Import

from llama_index.core.llms.mock import MockLLM
from llama_index.core.llms.mock import MockLLMWithChatMemoryOfLastCall
from llama_index.core.llms.mock import MockFunctionCallingLLM

I/O Contract

Inputs (MockLLM)

Name	Type	Required	Description
max_tokens	Optional[int]	No	Maximum number of tokens to generate; if None the prompt is echoed back
callback_manager	Optional[CallbackManager]	No	Callback manager for LLM event hooks
system_prompt	Optional[str]	No	System prompt to prepend to conversations
messages_to_prompt	Optional[MessagesToPromptType]	No	Function to convert messages to a prompt string
completion_to_prompt	Optional[CompletionToPromptType]	No	Function to convert a completion string to a prompt
pydantic_program_mode	PydanticProgramMode	No	Mode for structured output generation (default: DEFAULT)

Inputs (MockFunctionCallingLLM)

Name	Type	Required	Description
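callback_manager	Optional[CallbackManager]	No	Callback manager for LLM event hooks
system_prompt	Optional[str]	No	System prompt to prepend to conversations
messages_to_prompt	Optional[MessagesToPromptType]	No	Function to convert messages to a prompt string
completion_to_prompt	Optional[CompletionToPromptType]	No	Function to convert a completion string to a prompt
pydantic_program_mode	PydanticProgramMode	No	Mode for structured output generation (default: DEFAULT)
blocks_to_content_callback	Optional[BlockToContentCallback]	No	Custom callback for converting content blocks to a string
response_generator	Optional[ResponseGenerator]	No	Custom function to generate mock chat responses from messages
**kwargs	Any	No	Additional keyword arguments accepted by the constructor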

Outputs

Name	Type	Description
complete()	CompletionResponse	Returns a completion response with generated or echoed text
stream_complete()	CompletionResponseGen	Yields streaming completion response chunks
chat()	ChatResponse	Returns a chat response (MockFunctionCallingLLM)
get_tool_calls_from_response()	List[ToolSelection]	Extracts tool call selections from a chat response

Usage Examples

Basic Usage

from llama_index.core.llms.mock import MockLLM

# Echo mode: returns the prompt as the response
llm = MockLLM()
response = llm.complete("What is the meaning of life?")
print(response.text)  # "What is the meaning of life?"

# Token generation mode: generates repeated "text" tokens
llm = MockLLM(max_tokens=5)
response = llm.complete("ignored prompt")
print(response.text)  # "text text text text text"

Using MockLLMWithChatMemoryOfLastCall

from llama_index.core.llms.mock import MockLLMWithChatMemoryOfLastCall
from llama_index.core.base.llms.types import ChatMessage

llm = MockLLMWithChatMemoryOfLastCall()
messages = [ChatMessage(role="user", content="Hello")]
llm.chat(messages)

# Inspect what messages were sent
assert llm.last_chat_messages is not None
assert llm.last_called_chat_function == ["chat"]

Using MockFunctionCallingLLM

from llama_index.core.base.llms.types import ChatMessage
from llama_index.core.llms.mock import MockFunctionCallingLLM

llm = MockFunctionCallingLLM()
response = llm.chat([ChatMessage(role="user", content="Call a tool")])
tool_calls = llm.get_tool_calls_from_response(response)
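The examples above do not cover MockLLMWithNonyieldingChatStream. The empty-stream case it targets can be sketched in plain Python: a generator that ends without yielding, and a consumer that must cope with it. The names empty_stream and collect are invented for this illustration and are not part of LlamaIndex.

```python
from typing import Iterable, Iterator


def empty_stream() -> Iterator[str]:
    """Hypothetical stream that finishes without yielding any chunk."""
    yield from ()


def collect(stream: Iterable[str], fallback: str = "") -> str:
    """Invented consumer: join streamed chunks, or return fallback if none arrive."""
    chunks = list(stream)
    return "".join(chunks) if chunks else fallback


empty_result = collect(empty_stream(), fallback="<no response>")
normal_result = collect(iter(["Hel", "lo"]))
print(repr(empty_result))   # '<no response>'
print(repr(normal_result))  # 'Hello'
```

A consumer that assumes at least one chunk (for example by indexing the last element) would fail on the first call, which is the kind of bug an empty-stream mock is meant to surface.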

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
