Implementation:Run llama Llama index CustomLLM

From Leeroopedia
Revision as of 11:47, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Run_llama_Llama_index_CustomLLM.md)

Overview

CustomLLM is an abstract base class that simplifies integrating custom language models into LlamaIndex. It extends the core LLM class and provides default implementations of the chat and async methods by delegating to the completion interface. Subclasses only need to implement complete, stream_complete, and the metadata property. (The class docstring names the two methods _complete and _stream_complete, but the chat methods described below call the public self.complete and self.stream_complete.)

Source file: llama-index-core/llama_index/core/llms/custom.py (91 lines)

Class Hierarchy

LLM
  └── CustomLLM

CustomLLM inherits from LLM, which is the main language model base class in LlamaIndex.

Design Pattern

The class follows a "completion-first" design: all chat-based methods are implemented by converting chat messages to a single prompt string using messages_to_prompt, then delegating to the completion methods. This means subclasses only need to implement the simpler completion interface.
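To make the flattening step concrete, here is a minimal sketch of turning a message list into one prompt string. The Message dataclass and flatten_messages function are hypothetical stand-ins written for this illustration; the "role: content" layout is an assumption and not necessarily the format the library's own messages_to_prompt produces.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Message:
    """Hypothetical stand-in for ChatMessage (role plus text content)."""
    role: str
    content: str


def flatten_messages(messages: Sequence[Message]) -> str:
    """Hypothetical formatter: one 'role: content' line per message,
    ending with an open assistant turn for the model to complete."""
    lines = [f"{m.role}: {m.content}" for m in messages]
    lines.append("assistant: ")
    return "\n".join(lines)


prompt = flatten_messages([
    Message("system", "Answer briefly."),
    Message("user", "What is 2 + 2?"),
])
print(prompt)
```

Any callable with this shape (sequence of messages in, string out) could play the messages_to_prompt role, which is why a completion-only model can still serve chat requests.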

Constructor

def __init__(self, *args: Any, **kwargs: Any):
    super().__init__(*args, **kwargs)

A pass-through constructor that delegates to the parent LLM class.

Chat Methods

chat

@llm_chat_callback()
def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:

Synchronous chat implementation:

  1. Asserts that messages_to_prompt is set.
  2. Converts messages to a single prompt string.
  3. Calls self.complete with formatted=True.
  4. Converts the CompletionResponse to a ChatResponse using completion_response_to_chat_response.
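The four steps above can be sketched without LlamaIndex installed. Everything here (Msg, Completion, ChatResult, EchoLLM) is a hypothetical stand-in for the real types, and the real method also forwards **kwargs and runs under a callback decorator, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Msg:                      # stand-in for ChatMessage
    role: str
    content: str


@dataclass
class Completion:               # stand-in for CompletionResponse
    text: str


@dataclass
class ChatResult:               # stand-in for ChatResponse
    message: Msg


class EchoLLM:
    """Stand-in for a CustomLLM subclass; not the library class."""

    def __init__(self, messages_to_prompt: Optional[Callable[[Sequence[Msg]], str]]):
        self.messages_to_prompt = messages_to_prompt

    def complete(self, prompt: str, formatted: bool = False) -> Completion:
        # A real model call would go here; this toy upper-cases the prompt.
        return Completion(text=prompt.upper())

    def chat(self, messages: Sequence[Msg]) -> ChatResult:
        assert self.messages_to_prompt is not None           # step 1
        prompt = self.messages_to_prompt(messages)            # step 2
        completion = self.complete(prompt, formatted=True)    # step 3
        # step 4: wrap the completion text in an assistant message
        return ChatResult(message=Msg("assistant", completion.text))


llm = EchoLLM(lambda msgs: " | ".join(m.content for m in msgs))
reply = llm.chat([Msg("user", "hello"), Msg("user", "world")])
print(reply.message.role, reply.message.content)
```

Passing formatted=True signals that the prompt has already been through message formatting, so the completion path should not format it a second time.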

stream_chat

@llm_chat_callback()
def stream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseGen:

Streaming chat implementation:

  1. Asserts that messages_to_prompt is set.
  2. Converts messages to a single prompt string and calls self.stream_complete with formatted=True.
  3. Converts the streaming completion response to a streaming chat response using stream_completion_response_to_chat_response.
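As a rough illustration of the last step, the conversion can be written as a generator that re-wraps each completion chunk as a chat chunk. CompletionChunk, ChatChunk, and to_chat_stream are hypothetical names for this sketch; the assumption that each chunk carries both the accumulated text and the newest delta is a simplification, not a statement about the library's exact fields.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class CompletionChunk:      # stand-in for a streamed CompletionResponse
    text: str               # text accumulated so far
    delta: str              # newly generated piece


@dataclass
class ChatChunk:            # stand-in for a streamed ChatResponse
    role: str
    content: str
    delta: str


def stream_complete(prompt: str) -> Iterator[CompletionChunk]:
    """Toy streaming completion: emits the prompt one word at a time."""
    text = ""
    for word in prompt.split():
        delta = word + " "
        text += delta
        yield CompletionChunk(text=text, delta=delta)


def to_chat_stream(chunks: Iterator[CompletionChunk]) -> Iterator[ChatChunk]:
    """Lazily re-wrap each completion chunk as an assistant chat chunk."""
    for chunk in chunks:
        yield ChatChunk(role="assistant", content=chunk.text, delta=chunk.delta)


chat_chunks = list(to_chat_stream(stream_complete("streamed one word")))
for c in chat_chunks:
    print(repr(c.delta), "->", repr(c.content))
```

Because the wrapper is itself a generator, no chunk is produced until the caller iterates, so streaming behaviour is preserved end to end.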

Async Methods

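All async methods are implemented by wrapping their synchronous counterparts. This is a convenience for custom implementations that have no native async support. Because the synchronous call runs inline inside the coroutine, it blocks the event loop until it returns.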

achat

@llm_chat_callback()
async def achat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:

Delegates directly to self.chat.

astream_chat

@llm_chat_callback()
async def astream_chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponseAsyncGen:

Creates an async generator that wraps the synchronous stream_chat generator. Uses an inner gen() function to yield messages asynchronously.
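The inner gen() pattern can be sketched with plain asyncio. The stream_chat function below is a toy stand-in that yields strings instead of ChatResponse objects; only the wrapping shape is meant to match the description.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def stream_chat(prompt: str) -> Iterator[str]:
    """Toy synchronous stream standing in for stream_chat."""
    for token in prompt.split():
        yield token


async def astream_chat(prompt: str) -> AsyncIterator[str]:
    """Coroutine that returns an async generator wrapping the sync one."""
    async def gen() -> AsyncIterator[str]:
        # Each item still comes from the synchronous generator, so
        # producing a token runs on the event loop thread.
        for token in stream_chat(prompt):
            yield token

    return gen()


async def collect(prompt: str) -> List[str]:
    stream = await astream_chat(prompt)      # first await the coroutine...
    return [tok async for tok in stream]     # ...then iterate the generator


tokens = asyncio.run(collect("a b c"))
print(tokens)
```

Note the two-step usage: the method is a coroutine, so callers await it to obtain the async generator and then consume it with async for.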

acomplete

@llm_completion_callback()
async def acomplete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:

Delegates directly to self.complete.
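A small sketch of what direct delegation implies at run time. SyncOnlyLLM is a hypothetical stand-in; it records which thread executed complete so the result can be compared with the thread running the event loop.

```python
import asyncio
import threading


class SyncOnlyLLM:
    """Stand-in model with a synchronous complete and a delegating acomplete."""

    def complete(self, prompt: str, formatted: bool = False) -> str:
        # Remember which thread did the work.
        self.worker_thread = threading.current_thread()
        return prompt[::-1]

    async def acomplete(self, prompt: str, formatted: bool = False) -> str:
        # Direct delegation: nothing is awaited, so this runs inline.
        return self.complete(prompt, formatted=formatted)


async def main():
    llm = SyncOnlyLLM()
    text = await llm.acomplete("abc")
    same_thread = llm.worker_thread is threading.current_thread()
    return text, same_thread


result, same_thread = asyncio.run(main())
print(result, same_thread)
```

The synchronous call executes on the event loop's own thread, so a slow model call delays every other task scheduled on that loop. The async methods give API compatibility, not concurrency.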

astream_complete

@llm_completion_callback()
async def astream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponseAsyncGen:

Creates an async generator that wraps the synchronous stream_complete generator.

Class Name

@classmethod
def class_name(cls) -> str:
    return "custom_llm"

Returns "custom_llm" for serialization and type identification.

Callback Decorators

Every chat and completion method is decorated with @llm_chat_callback() or @llm_completion_callback() respectively, which hooks each call into the callback manager's event tracking. The constructor and class_name are not decorated.
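For readers unfamiliar with decorator factories, the following sketch shows the general shape of such a wrapper. tracking_callback and the events list are hypothetical and written only for illustration; the library's decorators do considerably more (they dispatch to the callback manager rather than a list).

```python
import functools
from typing import Any, Callable, List, Tuple

events: List[Tuple[str, str]] = []   # hypothetical in-memory event log


def tracking_callback() -> Callable:
    """Hypothetical decorator factory, applied with parentheses in the same
    way as @llm_completion_callback(); not the library's implementation."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            events.append(("start", fn.__name__))
            try:
                return fn(self, *args, **kwargs)
            finally:
                # Recorded even if the wrapped call raises.
                events.append(("end", fn.__name__))
        return wrapper
    return decorator


class DemoLLM:
    @tracking_callback()
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


output = DemoLLM().complete("hi")
print(output, events)
```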

Required Subclass Implementations

Subclasses of CustomLLM must implement:

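Member            Type      Description
complete          Method    Core synchronous completion logic
stream_complete   Method    Core streaming completion logic
metadata          Property  Returns an LLMMetadata object describing the model

The class docstring names the two methods _complete and _stream_complete; the methods that chat, stream_chat, and the async wrappers actually call are the public complete and stream_complete.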
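The contract can be sketched without the library by using an abstract stand-in base. BaseCompletionLLM, Metadata, and ReverseLLM are hypothetical names for this illustration; the sketch overrides complete and stream_complete, the names that the chat methods described above call as self.complete and self.stream_complete. With LlamaIndex installed, the base would be CustomLLM and the return types CompletionResponse and LLMMetadata.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Metadata:                 # stand-in for LLMMetadata
    model_name: str
    context_window: int


class BaseCompletionLLM(ABC):   # stand-in for the CustomLLM base
    @property
    @abstractmethod
    def metadata(self) -> Metadata: ...

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

    @abstractmethod
    def stream_complete(self, prompt: str) -> Iterator[str]: ...


class ReverseLLM(BaseCompletionLLM):
    """Toy model that reverses its prompt."""

    @property
    def metadata(self) -> Metadata:
        return Metadata(model_name="reverse-toy", context_window=1024)

    def complete(self, prompt: str) -> str:
        return prompt[::-1]

    def stream_complete(self, prompt: str) -> Iterator[str]:
        text = ""
        for ch in self.complete(prompt):
            text += ch
            yield text          # cumulative text after each character


llm = ReverseLLM()
print(llm.metadata.model_name, llm.complete("abc"), list(llm.stream_complete("abc")))
```

Leaving out any of the three members makes the subclass abstract, so instantiating it raises TypeError instead of failing later at call time.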

Dependencies

  • llama_index.core.base.llms.generic_utils -- provides completion_response_to_chat_response and stream_completion_response_to_chat_response
  • llama_index.core.base.llms.types -- provides ChatMessage, ChatResponse, ChatResponseAsyncGen, ChatResponseGen, CompletionResponse, CompletionResponseAsyncGen
  • llama_index.core.llms.callbacks -- provides llm_chat_callback and llm_completion_callback
  • llama_index.core.llms.llm.LLM -- parent class
