Implementation:Run llama Llama index LLM Callbacks

Knowledge Sources	Run_llama_Llama_index
Domains	LLM Integration, Callbacks, Instrumentation
Last Updated	2026-02-11 19:00 GMT

Overview

This module provides decorator functions for instrumenting LLM chat and completion methods with callback events and dispatcher-based instrumentation, supporting both synchronous and asynchronous execution as well as streaming generators.

Description

The llms/callbacks.py module defines two decorator factory functions that form the core instrumentation layer for all LLM interactions in LlamaIndex: llm_chat_callback and llm_completion_callback.

llm_chat_callback() is a decorator factory that returns a decorator for wrapping LLM chat methods. When applied, it:

Ensures the decorated LLM instance has a valid CallbackManager (creating one if needed) via a wrapper_logic context manager
Fires an LLMChatStartEvent through the instrumentation dispatcher with the model dictionary (excluding API keys), messages, and additional kwargs
Starts a callback event of type CBEventType.LLM with the messages and serialized model info
Executes the original function and handles the response based on its type:
- For AsyncGenerator returns (async streaming): wraps the generator to emit LLMChatInProgressEvent for each chunk, then LLMChatEndEvent when complete
- For Generator returns (sync streaming): wraps the generator similarly for synchronous iteration
- For non-streaming returns: directly fires the end event with the full response
Handles exceptions by firing a callback end event that carries the exception as its payload and by dispatching an ExceptionEvent through the dispatcher
Preserves function metadata (__name__, __qualname__, __doc__, etc.) on wrapper functions
Detects if a function is already wrapped (via __wrapped__ attribute) and uses a simple passthrough wrapper to avoid double-instrumentation
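
The flow in the list above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: chat_callback_sketch, EVENTS, and EchoLLM are hypothetical names, a plain list stands in for both the dispatcher and the CallbackManager, and only the synchronous paths are covered.

```python
import functools
from collections.abc import Generator
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for the dispatcher and callback manager: a plain event log.
EVENTS: List[Tuple[str, Any]] = []


def chat_callback_sketch() -> Callable:
    """Decorator factory mirroring the start / in-progress / end flow (sync only)."""

    def wrap(f: Callable) -> Callable:
        @functools.wraps(f)  # preserves __name__, __qualname__, __doc__
        def wrapper(_self: Any, messages: Any, **kwargs: Any) -> Any:
            EVENTS.append(("start", messages))
            try:
                result = f(_self, messages, **kwargs)
            except Exception as exc:
                # Failure path: record the exception, then re-raise it.
                EVENTS.append(("exception", repr(exc)))
                raise

            if isinstance(result, Generator):
                # Streaming path: re-yield each chunk and log it as it passes.
                def wrapped_gen() -> Generator:
                    last = None
                    for chunk in result:
                        EVENTS.append(("in_progress", chunk))
                        last = chunk
                        yield chunk
                    # The end event fires only once the stream is exhausted.
                    EVENTS.append(("end", last))

                return wrapped_gen()

            # Non-streaming path: the end event carries the full response.
            EVENTS.append(("end", result))
            return result

        return wrapper

    return wrap


class EchoLLM:
    @chat_callback_sketch()
    def chat(self, messages: List[str], **kwargs: Any) -> str:
        """Return all messages joined into one string."""
        return " ".join(messages)

    @chat_callback_sketch()
    def stream_chat(self, messages: List[str], **kwargs: Any) -> Generator:
        """Yield the messages one at a time."""
        return (m for m in messages)
```

In this sketch the end event for a stream is recorded only after the caller has consumed the whole generator, which is why the streaming case needs a wrapping generator rather than a single call after the function returns.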

llm_completion_callback() follows the same pattern but is tailored for completion-style LLM calls. The key differences are:

It extracts the prompt from positional or keyword arguments via an extract_prompt helper function
Fires LLMCompletionStartEvent, LLMCompletionInProgressEvent, and LLMCompletionEndEvent instead of chat events
Uses EventPayload.PROMPT and EventPayload.COMPLETION instead of EventPayload.MESSAGES and EventPayload.RESPONSE
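
A helper of the kind described in the first difference might look like the sketch below. The function name extract_prompt_sketch, the keyword name "prompt", and the empty-string fallback are assumptions made for illustration; the source only states that the prompt is taken from positional or keyword arguments.

```python
from typing import Any


def extract_prompt_sketch(*args: Any, **kwargs: Any) -> str:
    """Return the prompt from the first positional argument or a keyword.

    The keyword name "prompt" and the empty-string fallback are assumptions
    of this sketch, not confirmed details of the library helper.
    """
    if args:
        # Positional call style: complete("some prompt")
        return str(args[0])
    if "prompt" in kwargs:
        # Keyword call style: complete(prompt="some prompt")
        return str(kwargs["prompt"])
    # Nothing recognisable was passed.
    return ""
```

The wrapper would call such a helper with the arguments that follow the LLM instance, so the instance itself is never mistaken for the prompt.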

Both decorators automatically detect whether the wrapped function is a coroutine (via inspect.iscoroutinefunction) and return the appropriate sync or async wrapper. The dispatcher is initialized at module level using get_dispatcher(__name__).
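
The sync/async selection and the double-wrap guard can be illustrated with a small standalone decorator. This is a sketch under assumptions: instrument, CALLS, and the marker attribute _sketch_instrumented are hypothetical, and returning an already-instrumented function unchanged is a simplification of the passthrough wrapper described earlier.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, List

CALLS: List[str] = []  # hypothetical event log standing in for the dispatcher

# Hypothetical marker attribute; the module itself is described as using __wrapped__.
_MARK = "_sketch_instrumented"


def instrument() -> Callable:
    def wrap(f: Callable) -> Callable:
        if getattr(f, _MARK, False):
            # Already instrumented: hand the function back untouched.
            return f

        if inspect.iscoroutinefunction(f):
            # Coroutine functions get an async wrapper that awaits the call.
            @functools.wraps(f)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                CALLS.append(f"async:{f.__name__}")
                return await f(*args, **kwargs)

            wrapper: Callable = async_wrapper
        else:
            # Plain functions get a plain wrapper.
            @functools.wraps(f)
            def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
                CALLS.append(f"sync:{f.__name__}")
                return f(*args, **kwargs)

            wrapper = sync_wrapper

        setattr(wrapper, _MARK, True)
        return wrapper

    return wrap


@instrument()
def complete(prompt: str) -> str:
    return prompt.upper()


@instrument()
@instrument()  # decorating twice still logs only once per call
async def acomplete(prompt: str) -> str:
    return prompt.upper()
```

Choosing the wrapper at decoration time keeps the decorated method's calling convention intact: a coroutine function stays awaitable, and a plain function stays directly callable.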

Usage

Use @llm_chat_callback() as a decorator on LLM chat and stream_chat methods (and their async variants) to automatically instrument them with callback events and observability. Use @llm_completion_callback() similarly on complete and stream_complete methods. These decorators are already applied to the base LLM class methods in LlamaIndex and should be applied when implementing custom LLM integrations.

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/llms/callbacks.py
Lines: 1-545

Signature

def llm_chat_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...
    return wrap

def llm_completion_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...
    return wrap

Import

from llama_index.core.llms.callbacks import (
    llm_chat_callback,
    llm_completion_callback,
)

I/O Contract

Inputs

Name	Type	Required	Description
f	Callable	Yes (internal)	The LLM method to be decorated
_self	Any	Yes (internal)	The LLM instance; if it does not already have a valid callback_manager attribute, one is created
messages	Sequence[ChatMessage]	Yes (chat callback)	The chat messages passed to the LLM chat method
prompt	str	Yes (completion callback)	The prompt string passed to the LLM completion method (extracted from args or kwargs)
**kwargs	Any	No	Additional keyword arguments passed through to the underlying LLM method

Outputs

Name	Type	Description
return (chat)	ChatResponse or ChatResponseGen or ChatResponseAsyncGen	The original return value from the decorated chat method, potentially wrapped with instrumentation generators
return (completion)	CompletionResponse or CompletionResponseGen or CompletionResponseAsyncGen	The original return value from the decorated completion method, potentially wrapped with instrumentation generators

Usage Examples

Basic Usage

from llama_index.core.llms.callbacks import llm_chat_callback, llm_completion_callback
from llama_index.core.base.llms.types import ChatMessage, ChatResponse, CompletionResponse
from typing import Sequence, Any

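# NOTE: a bare class keeps this example short. The decorators also read the
# instance's model dictionary when firing events, so a real integration would
# normally subclass a LlamaIndex LLM base class that provides it.
class MyCustomLLM: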
    @llm_chat_callback()
    def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
        # The decorator will automatically:
        # 1. Fire LLMChatStartEvent with the messages
        # 2. Call this method
        # 3. Fire LLMChatEndEvent with the response
        response_text = "This is the LLM response."
        return ChatResponse(
            message=ChatMessage(role="assistant", content=response_text)
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # The decorator will automatically:
        # 1. Fire LLMCompletionStartEvent with the prompt
        # 2. Call this method
        # 3. Fire LLMCompletionEndEvent with the response
        return CompletionResponse(text="Completion result.")

Async Streaming Usage

from llama_index.core.llms.callbacks import llm_chat_callback
from llama_index.core.base.llms.types import (
    ChatMessage,
    ChatResponse,
    ChatResponseAsyncGen,
)
from typing import Any, Sequence

class MyStreamingLLM:
    @llm_chat_callback()
    async def astream_chat(
        self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseAsyncGen:
        # Once this coroutine is awaited, the decorator sees that the return
        # value is an async generator and wraps it so that each chunk emits
        # an LLMChatInProgressEvent, followed by LLMChatEndEvent at the end
        async def gen() -> ChatResponseAsyncGen:
            for token in ["Hello", " world", "!"]:
                yield ChatResponse(
                    message=ChatMessage(role="assistant", content=token)
                )
        return gen()
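
On the caller's side, a method shaped like astream_chat above is consumed in two steps: await the coroutine to obtain the async generator, then iterate it. The sketch below shows only that calling pattern; StandInLLM and collect are hypothetical, plain strings replace ChatResponse objects, and no LlamaIndex decorator is involved.

```python
import asyncio
from typing import AsyncGenerator, List


class StandInLLM:
    """Hypothetical stand-in with the same call shape as astream_chat."""

    async def astream_chat(self, messages: List[str]) -> AsyncGenerator[str, None]:
        async def gen() -> AsyncGenerator[str, None]:
            for token in ["Hello", " world", "!"]:
                yield token

        # Return the generator object itself; the caller iterates it.
        return gen()


async def collect(llm: StandInLLM) -> List[str]:
    stream = await llm.astream_chat(["hi"])  # step 1: await the coroutine
    return [chunk async for chunk in stream]  # step 2: iterate the generator


tokens = asyncio.run(collect(StandInLLM()))
```

Forgetting the first await would hand the caller a coroutine object rather than a stream, which cannot be used with async for.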

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
