Implementation:Run llama Llama index CustomLLM

Overview

CustomLLM is an abstract base class that simplifies integrating custom language models into LlamaIndex. It extends the core LLM class and provides default implementations of the chat and async methods by delegating to the completion interface. Subclasses only need to implement the synchronous completion methods (complete and stream_complete, which the chat methods described below call) and the metadata property.

Source file: llama-index-core/llama_index/core/llms/custom.py (91 lines)

Class Hierarchy

LLM
  └── CustomLLM

CustomLLM inherits from LLM, which is the main language model base class in LlamaIndex.

Design Pattern

The class follows a "completion-first" design: all chat-based methods are implemented by converting chat messages to a single prompt string using messages_to_prompt, then delegating to the completion methods. This means subclasses only need to implement the simpler completion interface.
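To make the message-to-prompt step concrete, here is a minimal sketch of what a messages_to_prompt callable might look like. The Message dataclass and the "role: content" layout are assumptions made for illustration; they stand in for ChatMessage and are not LlamaIndex's actual formatting.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Message:
    """Hypothetical stand-in for ChatMessage (illustration only)."""

    role: str
    content: str


def messages_to_prompt(messages: Sequence[Message]) -> str:
    """Flatten a chat history into one prompt string.

    Assumed layout: one "role: content" line per message, followed by an
    open "assistant: " line for the model to continue.
    """
    lines = [f"{m.role}: {m.content}" for m in messages]
    lines.append("assistant: ")
    return "\n".join(lines)


history = [
    Message(role="system", content="You are terse."),
    Message(role="user", content="Name a prime."),
]
prompt = messages_to_prompt(history)
print(prompt)
```

Whatever layout is chosen, the result is a single string, which is why a completion-only model can serve chat requests.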

Constructor

def __init__(self, *args: Any, **kwargs: Any):
    super().__init__(*args, **kwargs)

A pass-through constructor that delegates to the parent LLM class.

Chat Methods

chat

@llm_chat_callback()
def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:

Synchronous chat implementation:

Asserts that messages_to_prompt is set.
Converts messages to a single prompt string.
Calls self.complete with formatted=True.
Converts the CompletionResponse to a ChatResponse using completion_response_to_chat_response.
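The four steps above can be sketched with plain Python stand-ins. Message, Completion, ChatResult, join_messages, CompletionFirstLLM, and UppercaseLLM are all hypothetical names used only to show the control flow; the real method works with ChatMessage, CompletionResponse, and ChatResponse.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Message:  # hypothetical stand-in for ChatMessage
    role: str
    content: str


@dataclass
class Completion:  # hypothetical stand-in for CompletionResponse
    text: str


@dataclass
class ChatResult:  # hypothetical stand-in for ChatResponse
    message: Message


def join_messages(messages: Sequence[Message]) -> str:
    # Assumed prompt layout, for illustration only.
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


class CompletionFirstLLM:
    """Stand-in base class: chat() is built on top of complete()."""

    def __init__(
        self,
        messages_to_prompt: Optional[Callable[[Sequence[Message]], str]] = join_messages,
    ) -> None:
        self.messages_to_prompt = messages_to_prompt

    def complete(self, prompt: str, formatted: bool = False) -> Completion:
        raise NotImplementedError  # supplied by the subclass

    def chat(self, messages: Sequence[Message]) -> ChatResult:
        # 1. The converter must be configured.
        assert self.messages_to_prompt is not None
        # 2. Collapse the chat history into one prompt string.
        prompt = self.messages_to_prompt(messages)
        # 3. Delegate to the completion method; the prompt is already formatted.
        completion = self.complete(prompt, formatted=True)
        # 4. Wrap the completion text as an assistant chat message.
        return ChatResult(message=Message(role="assistant", content=completion.text))


class UppercaseLLM(CompletionFirstLLM):
    """Toy subclass that implements only complete()."""

    def complete(self, prompt: str, formatted: bool = False) -> Completion:
        return Completion(text=prompt.upper())


reply = UppercaseLLM().chat([Message(role="user", content="hello")])
print(reply.message.content)  # USER: HELLO
```

The subclass never defines chat; it inherits a working implementation from the base class.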

stream_chat

@llm_chat_callback()
def stream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseGen:

Streaming chat implementation:

Asserts that messages_to_prompt is set.
Converts messages to a single prompt string and calls self.stream_complete with formatted=True.
Converts the streaming completion response to a streaming chat response using stream_completion_response_to_chat_response.
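The conversion in the last step is a lazy, chunk-by-chunk re-wrapping. The sketch below shows the idea with hypothetical stand-ins (CompletionChunk, ChatChunk, to_chat_stream); it is not the library's stream_completion_response_to_chat_response, only an illustration of the same shape.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class CompletionChunk:  # hypothetical stand-in for a streamed CompletionResponse
    text: str   # text accumulated so far
    delta: str  # newly produced piece


@dataclass
class ChatChunk:  # hypothetical stand-in for a streamed ChatResponse
    role: str
    content: str
    delta: str


def stream_complete(prompt: str) -> Iterator[CompletionChunk]:
    """Toy completion stream: emits the prompt back one word at a time."""
    text = ""
    for word in prompt.split():
        delta = word if not text else " " + word
        text += delta
        yield CompletionChunk(text=text, delta=delta)


def to_chat_stream(chunks: Iterator[CompletionChunk]) -> Iterator[ChatChunk]:
    """Lazily re-wrap each completion chunk as an assistant chat chunk."""
    for chunk in chunks:
        yield ChatChunk(role="assistant", content=chunk.text, delta=chunk.delta)


chat_chunks = list(to_chat_stream(stream_complete("one two three")))
print([c.delta for c in chat_chunks])
```

Because both functions are generators, nothing is computed until the caller starts iterating over the chat stream.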

Async Methods

All async methods are implemented by wrapping their synchronous counterparts. This is a convenience for custom implementations that do not have native async support. The synchronous call runs inside the coroutine, so it blocks the event loop until it returns.
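A minimal sketch of this delegation, using a hypothetical SyncBackedLLM class rather than CustomLLM itself. The asyncio.to_thread call is a caller-side option shown as an assumption; it is not something CustomLLM does.

```python
import asyncio
import time


class SyncBackedLLM:
    """Hypothetical stand-in: the async method delegates to the sync one."""

    def complete(self, prompt: str) -> str:
        time.sleep(0.05)  # simulate a blocking model call
        return prompt[::-1]

    async def acomplete(self, prompt: str) -> str:
        # No await inside: the event loop waits until complete() returns.
        return self.complete(prompt)


async def main() -> tuple:
    llm = SyncBackedLLM()
    direct = await llm.acomplete("abc")
    # Caller-side option (an assumption, not part of CustomLLM):
    # run the sync method in a worker thread so the loop stays responsive.
    threaded = await asyncio.to_thread(llm.complete, "abc")
    return direct, threaded


direct, threaded = asyncio.run(main())
print(direct, threaded)
```

Both calls return the same value; they differ only in whether other tasks on the event loop can make progress while the model call is running.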

achat

@llm_chat_callback()
async def achat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:

Delegates directly to self.chat.

astream_chat

@llm_chat_callback()
async def astream_chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponseAsyncGen:

Returns an async generator that wraps the synchronous stream_chat generator: an inner gen() function iterates over self.stream_chat and yields each response.
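The wrapping pattern can be sketched in isolation as follows. The stream_chat and astream_chat functions here are toy stand-ins that stream strings, not the CustomLLM methods; the point is the shape, a coroutine that returns an async generator built around a synchronous one.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def stream_chat(text: str) -> Iterator[str]:
    """Toy synchronous stream standing in for stream_chat."""
    for word in text.split():
        yield word


async def astream_chat(text: str) -> AsyncIterator[str]:
    """Coroutine that returns an async generator wrapping the sync stream."""

    async def gen() -> AsyncIterator[str]:
        # Each item still comes from the synchronous generator.
        for item in stream_chat(text):
            yield item

    return gen()


async def collect(text: str) -> List[str]:
    stream = await astream_chat(text)  # awaiting yields the async generator
    return [item async for item in stream]


words = asyncio.run(collect("a b c"))
print(words)
```

Note the two-step usage: the caller first awaits the method to obtain the generator, then consumes it with async for.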

acomplete

@llm_completion_callback()
async def acomplete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:

Delegates directly to self.complete.

astream_complete

@llm_completion_callback()
async def astream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponseAsyncGen:

Creates an async generator that wraps the synchronous stream_complete generator.

Class Name

@classmethod
def class_name(cls) -> str:
    return "custom_llm"

Returns "custom_llm" for serialization and type identification.

Callback Decorators

All chat and completion methods listed above are decorated with either @llm_chat_callback() or @llm_completion_callback(), which install event tracking for the callback manager system. The constructor and class_name are not decorated.
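As a rough illustration of what a tracking decorator of this kind does, the sketch below records a start and an end event around each call. The tracking_callback factory and the events list are hypothetical; this is not how llm_chat_callback or llm_completion_callback are implemented, only the general decorator-factory shape implied by the trailing parentheses.

```python
import functools
from typing import Any, Callable, List, Tuple

events: List[Tuple[str, str]] = []  # hypothetical event log


def tracking_callback() -> Callable:
    """Hypothetical decorator factory; not LlamaIndex's implementation."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            events.append(("start", func.__name__))
            try:
                return func(self, *args, **kwargs)
            finally:
                # Recorded even if the wrapped method raises.
                events.append(("end", func.__name__))

        return wrapper

    return decorator


class ToyLLM:
    @tracking_callback()
    def complete(self, prompt: str) -> str:
        return prompt.upper()


result = ToyLLM().complete("hi")
print(result, events)
```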

Required Subclass Implementations

Subclasses of CustomLLM must implement the members that the inherited chat and async methods call:

Member	Type	Description
`complete`	Method	Core synchronous completion logic
`stream_complete`	Method	Core streaming completion logic
`metadata`	Property	Returns an `LLMMetadata` object describing the model

Dependencies

llama_index.core.base.llms.generic_utils -- provides completion_response_to_chat_response and stream_completion_response_to_chat_response
llama_index.core.base.llms.types -- provides ChatMessage, ChatResponse, ChatResponseAsyncGen, ChatResponseGen, CompletionResponse, CompletionResponseAsyncGen
llama_index.core.llms.callbacks -- provides llm_chat_callback and llm_completion_callback
llama_index.core.llms.llm.LLM -- parent class
