Implementation:Run llama Llama index CustomLLM
Overview
CustomLLM is an abstract base class that simplifies the process of integrating custom language models into LlamaIndex. It extends the core LLM class and provides default implementations for chat and async methods by delegating to the completion interface. Subclasses only need to implement _complete, _stream_complete, and the metadata property.
Source file: llama-index-core/llama_index/core/llms/custom.py (91 lines)
Class Hierarchy
LLM
└── CustomLLM
CustomLLM inherits from LLM, which is the main language model base class in LlamaIndex.
Design Pattern
The class follows a "completion-first" design: all chat-based methods are implemented by converting chat messages to a single prompt string using messages_to_prompt, then delegating to the completion methods. This means subclasses only need to implement the simpler completion interface.
Constructor
def __init__(self, *args: Any, **kwargs: Any):
    super().__init__(*args, **kwargs)
A pass-through constructor that delegates to the parent LLM class.
Chat Methods
chat
@llm_chat_callback()
def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
Synchronous chat implementation:
- Asserts that messages_to_prompt is set.
- Converts messages to a single prompt string.
- Calls self.complete with formatted=True.
- Converts the CompletionResponse to a ChatResponse using completion_response_to_chat_response.
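The four steps above can be sketched with the standard library alone. This is a simplified illustration of the completion-first pattern, not the LlamaIndex implementation: Message, simple_messages_to_prompt, CompletionFirstModel, and EchoModel are hypothetical stand-ins, and the prompt format is an assumption chosen for readability.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Message:
    """Hypothetical, simplified stand-in for a chat message."""

    role: str
    content: str


def simple_messages_to_prompt(messages: Sequence[Message]) -> str:
    """Flatten a chat history into one prompt string (illustrative format)."""
    lines = [f"{m.role}: {m.content}" for m in messages]
    lines.append("assistant:")
    return "\n".join(lines)


class CompletionFirstModel:
    """Toy base class whose chat() is derived entirely from complete()."""

    def __init__(
        self,
        messages_to_prompt: Optional[
            Callable[[Sequence[Message]], str]
        ] = simple_messages_to_prompt,
    ) -> None:
        self.messages_to_prompt = messages_to_prompt

    def complete(self, prompt: str, formatted: bool = False) -> str:
        raise NotImplementedError("subclasses supply the completion logic")

    def chat(self, messages: Sequence[Message]) -> Message:
        # Step 1: the converter must be available.
        assert self.messages_to_prompt is not None
        # Step 2: collapse the messages into a single prompt string.
        prompt = self.messages_to_prompt(messages)
        # Step 3: delegate to the completion method; the prompt is already formatted.
        text = self.complete(prompt, formatted=True)
        # Step 4: wrap the completion text as an assistant chat message.
        return Message(role="assistant", content=text)


class EchoModel(CompletionFirstModel):
    """Only the completion method is implemented; chat() comes for free."""

    def complete(self, prompt: str, formatted: bool = False) -> str:
        return "echo: " + prompt.splitlines()[0]


if __name__ == "__main__":
    print(EchoModel().chat([Message(role="user", content="hi")]))
```

The point of the sketch is that EchoModel never mentions chat messages, yet it still answers chat calls through the inherited method.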
stream_chat
@llm_chat_callback()
def stream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseGen:
Streaming chat implementation:
- Asserts that messages_to_prompt is set.
- Converts messages to a prompt and calls self.stream_complete.
- Converts the streaming completion response to a streaming chat response using stream_completion_response_to_chat_response.
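The streaming conversion can be pictured as one generator re-wrapping the items of another. The following is a minimal sketch under stated assumptions: CompletionChunk, ChatChunk, and completion_stream_to_chat_stream are hypothetical stand-ins for the real response types and helper, and the fixed token list is made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class CompletionChunk:
    """Hypothetical stand-in for one streamed completion response."""

    text: str  # cumulative text generated so far
    delta: Optional[str] = None  # the newly generated piece


@dataclass
class ChatChunk:
    """Hypothetical stand-in for one streamed chat response."""

    role: str
    content: str
    delta: Optional[str] = None


def stream_complete(prompt: str) -> Iterator[CompletionChunk]:
    """Toy completion stream that emits a fixed reply token by token."""
    text = ""
    for token in ["Hel", "lo", "!"]:
        text += token
        yield CompletionChunk(text=text, delta=token)


def completion_stream_to_chat_stream(
    chunks: Iterable[CompletionChunk],
) -> Iterator[ChatChunk]:
    """Lazily re-wrap each completion chunk as an assistant chat chunk."""
    for chunk in chunks:
        yield ChatChunk(role="assistant", content=chunk.text, delta=chunk.delta)


def stream_chat(prompt: str) -> Iterator[ChatChunk]:
    """Chat streaming expressed purely in terms of completion streaming."""
    return completion_stream_to_chat_stream(stream_complete(prompt))


if __name__ == "__main__":
    for chat_chunk in stream_chat("ignored prompt"):
        print(chat_chunk.delta, "->", chat_chunk.content)
```

Because both layers are generators, nothing is buffered: each chat chunk is produced only when the consumer asks for it.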
Async Methods
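All async methods are implemented by wrapping their synchronous counterparts. This is a convenience for custom implementations that do not have native async support. Because each wrapper calls the synchronous method inline, the call blocks the event loop until it returns.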
achat
@llm_chat_callback()
async def achat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
Delegates directly to self.chat.
astream_chat
@llm_chat_callback()
async def astream_chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponseAsyncGen:
Creates an async generator that wraps the synchronous stream_chat generator. Uses an inner gen() function to yield messages asynchronously.
acomplete
@llm_completion_callback()
async def acomplete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:
Delegates directly to self.complete.
astream_complete
@llm_completion_callback()
async def astream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponseAsyncGen:
Creates an async generator that wraps the synchronous stream_complete generator.
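The two wrapping styles used by the async methods (direct delegation, and an inner gen() that turns a synchronous generator into an async one) can be sketched as follows. SyncOnlyModel and its toy completion logic are hypothetical; only the wrapping pattern mirrors the description above.

```python
import asyncio
from typing import AsyncIterator, Iterator, List, Tuple


class SyncOnlyModel:
    """Toy model with synchronous logic and thin async wrappers."""

    def complete(self, prompt: str) -> str:
        return prompt[::-1]  # placeholder logic: reverse the prompt

    def stream_complete(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word

    async def acomplete(self, prompt: str) -> str:
        # Direct delegation: the synchronous call runs inline on the event loop.
        return self.complete(prompt)

    async def astream_complete(self, prompt: str) -> AsyncIterator[str]:
        async def gen() -> AsyncIterator[str]:
            # Re-yield each item of the synchronous generator asynchronously.
            for chunk in self.stream_complete(prompt):
                yield chunk

        # Awaiting this method returns the async generator object itself.
        return gen()


async def main() -> Tuple[str, List[str]]:
    model = SyncOnlyModel()
    text = await model.acomplete("abc")
    stream = await model.astream_complete("one two three")
    words = [word async for word in stream]
    return text, words


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note the calling convention this produces: the caller first awaits the method to obtain the async generator, then iterates it with async for.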
Class Name
@classmethod
def class_name(cls) -> str:
    return "custom_llm"
Returns "custom_llm" for serialization and type identification.
Callback Decorators
All chat and completion methods on the class (the constructor and class_name are not) are decorated with either @llm_chat_callback() or @llm_completion_callback(), which install event tracking for the callback manager system.
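Both decorators are written with parentheses, which means each is a decorator factory: calling it returns the actual decorator. The sketch below shows that general shape with a hypothetical tracking_callback that records start and end events in an in-memory list. It is not the library's implementation, and EVENTS, tracking_callback, and TrackedModel are invented for illustration.

```python
import functools
from typing import Any, Callable, List, Tuple

# Hypothetical in-memory event log standing in for a callback manager.
EVENTS: List[Tuple[str, str]] = []


def tracking_callback() -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory: used as @tracking_callback(), with parentheses."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            EVENTS.append(("start", func.__name__))
            try:
                return func(self, *args, **kwargs)
            finally:
                # Record the end event even if the wrapped call raises.
                EVENTS.append(("end", func.__name__))

        return wrapper

    return decorator


class TrackedModel:
    @tracking_callback()
    def complete(self, prompt: str) -> str:
        return prompt.upper()


if __name__ == "__main__":
    print(TrackedModel().complete("hi"))
    print(EVENTS)
```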
Required Subclass Implementations
Subclasses of CustomLLM must implement:
| Member | Type | Description |
|---|---|---|
| _complete | Method | Core synchronous completion logic |
| _stream_complete | Method | Core streaming completion logic |
| metadata | Property | Returns an LLMMetadata object describing the model |
Dependencies
- llama_index.core.base.llms.generic_utils -- provides completion_response_to_chat_response and stream_completion_response_to_chat_response
- llama_index.core.base.llms.types -- provides ChatMessage, ChatResponse, ChatResponseAsyncGen, ChatResponseGen, CompletionResponse, CompletionResponseAsyncGen
- llama_index.core.llms.callbacks -- provides llm_chat_callback and llm_completion_callback
- llama_index.core.llms.llm.LLM -- parent class