

Implementation:Run llama Llama index LLM Callbacks

From Leeroopedia
Knowledge Sources
Domains LLM Integration, Callbacks, Instrumentation
Last Updated 2026-02-11 19:00 GMT

Overview

This module provides decorator functions for instrumenting LLM chat and completion methods with callback events and dispatcher-based instrumentation, supporting both synchronous and asynchronous execution as well as streaming generators.

Description

The llms/callbacks.py module defines two decorator factory functions that form the core instrumentation layer for all LLM interactions in LlamaIndex: llm_chat_callback and llm_completion_callback.

llm_chat_callback() is a decorator factory that returns a decorator for wrapping LLM chat methods. When applied, it:

  1. Ensures the decorated LLM instance has a valid CallbackManager (creating one if needed) via a wrapper_logic context manager
  2. Fires an LLMChatStartEvent through the instrumentation dispatcher with the model dictionary (excluding API keys), messages, and additional kwargs
  3. Starts a callback event of type CBEventType.LLM with the messages and serialized model info
  4. Executes the original function and handles the response based on its type:
    • For AsyncGenerator returns (async streaming): wraps the generator to emit LLMChatInProgressEvent for each chunk, then LLMChatEndEvent when complete
    • For Generator returns (sync streaming): wraps the generator similarly for synchronous iteration
    • For non-streaming returns: directly fires the end event with the full response
  5. Handles exceptions by ending the callback event with the exception as its payload and dispatching an ExceptionEvent through the dispatcher, then re-raises the exception to the caller
  6. Preserves function metadata (__name__, __qualname__, __doc__, etc.) on wrapper functions
  7. Detects if a function is already wrapped (via __wrapped__ attribute) and uses a simple passthrough wrapper to avoid double-instrumentation
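The steps above can be pictured with a simplified, self-contained sketch. It is not the LlamaIndex source. EVENTS, ToyCallbackManager, ensure_callback_manager, toy_chat_callback and EchoLLM are hypothetical names, a plain list stands in for the instrumentation dispatcher, and only the synchronous path (plain and streaming returns) is shown.

```python
import collections.abc
import functools
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for the instrumentation dispatcher: events are appended here.
EVENTS: List[Tuple[str, Any]] = []


class ToyCallbackManager:
    """Hypothetical stand-in for LlamaIndex's CallbackManager."""


@contextmanager
def ensure_callback_manager(obj: Any) -> Iterator[ToyCallbackManager]:
    # Same idea as wrapper_logic: attach a manager if the instance lacks a valid one.
    if not isinstance(getattr(obj, "callback_manager", None), ToyCallbackManager):
        obj.callback_manager = ToyCallbackManager()
    yield obj.callback_manager


def toy_chat_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        @functools.wraps(f)  # copies __name__, __qualname__, __doc__, ...
        def wrapped(self: Any, messages: Sequence[str], **kwargs: Any) -> Any:
            with ensure_callback_manager(self):
                EVENTS.append(("start", list(messages)))
                try:
                    result = f(self, messages, **kwargs)
                except Exception as exc:
                    EVENTS.append(("exception", exc))
                    raise  # the caller still sees the original error
                if isinstance(result, collections.abc.Generator):
                    # Streaming: hand back a new generator that re-yields each chunk.
                    def wrapped_gen() -> Iterator[Any]:
                        last = None
                        for chunk in result:
                            EVENTS.append(("in_progress", chunk))
                            yield chunk
                            last = chunk
                        EVENTS.append(("end", last))  # only after the stream is exhausted
                    return wrapped_gen()
                # Non-streaming: the end event carries the full response.
                EVENTS.append(("end", result))
                return result
        return wrapped
    return wrap


class EchoLLM:
    @toy_chat_callback()
    def chat(self, messages: Sequence[str], **kwargs: Any) -> str:
        return "reply to " + messages[-1]

    @toy_chat_callback()
    def stream_chat(self, messages: Sequence[str], **kwargs: Any) -> Iterator[str]:
        yield "a"
        yield "b"
```

Because the end event is appended inside wrapped_gen, this sketch records it only once the caller has consumed the whole stream; a non-streaming call records start and end immediately.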

llm_completion_callback() follows the same pattern but is tailored for completion-style LLM calls. The key differences are:

  • It extracts the prompt from positional or keyword arguments via an extract_prompt helper function
  • Fires LLMCompletionStartEvent, LLMCompletionInProgressEvent, and LLMCompletionEndEvent instead of chat events
  • Uses EventPayload.PROMPT and EventPayload.COMPLETION instead of EventPayload.MESSAGES and EventPayload.RESPONSE
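A minimal sketch of what a prompt extractor in the spirit of extract_prompt might look like, assuming the wrapper passes along the arguments it received after self and the prompt is either the first positional argument or a prompt keyword argument. extract_prompt_sketch is a hypothetical name; the real helper's signature and error handling may differ.

```python
from typing import Any


def extract_prompt_sketch(*args: Any, **kwargs: Any) -> str:
    """Hypothetical version of the extract_prompt idea.

    ``args`` and ``kwargs`` are what the wrapper received after ``self``:
    the prompt is either the first positional argument or the ``prompt`` keyword.
    """
    if args:
        return args[0]
    if "prompt" in kwargs:
        return kwargs["prompt"]
    raise ValueError("no prompt found in positional or keyword arguments")
```

Both call styles, complete("Tell me a joke") and complete(prompt="Tell me a joke"), resolve to the same prompt string, so the start event can report it either way.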

Both decorators automatically detect whether the wrapped function is a coroutine (via inspect.iscoroutinefunction) and return the appropriate sync or async wrapper. The dispatcher is initialized at module level using get_dispatcher(__name__).
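The sync/async dispatch, the already-wrapped passthrough, and the metadata copying can be sketched as follows. This is an illustration under assumptions, not the library code: toy_instrument, CALLS and Demo are hypothetical, and the guard here treats any function that already carries a __wrapped__ attribute (as set by functools.update_wrapper) as instrumented, which may not match exactly how LlamaIndex marks wrapped functions.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, List

CALLS: List[str] = []  # hypothetical event log standing in for the dispatcher


def toy_instrument() -> Callable:
    def wrap(f: Callable) -> Callable:
        # Hypothetical guard: anything already carrying __wrapped__ counts as instrumented.
        already_wrapped = hasattr(f, "__wrapped__")

        async def async_instrumented(self: Any, *args: Any, **kwargs: Any) -> Any:
            CALLS.append(f"start:{f.__name__}")
            result = await f(self, *args, **kwargs)
            CALLS.append(f"end:{f.__name__}")
            return result

        def sync_instrumented(self: Any, *args: Any, **kwargs: Any) -> Any:
            CALLS.append(f"start:{f.__name__}")
            result = f(self, *args, **kwargs)
            CALLS.append(f"end:{f.__name__}")
            return result

        async def async_passthrough(self: Any, *args: Any, **kwargs: Any) -> Any:
            return await f(self, *args, **kwargs)

        def sync_passthrough(self: Any, *args: Any, **kwargs: Any) -> Any:
            return f(self, *args, **kwargs)

        # Pick a coroutine wrapper for `async def` methods so callers can still await them.
        if inspect.iscoroutinefunction(f):
            wrapper = async_passthrough if already_wrapped else async_instrumented
        else:
            wrapper = sync_passthrough if already_wrapped else sync_instrumented
        # Copies __module__, __name__, __qualname__, __doc__, ... and sets __wrapped__ = f.
        return functools.update_wrapper(wrapper, f)
    return wrap


class Demo:
    @toy_instrument()
    def ping(self) -> str:
        return "pong"

    @toy_instrument()
    @toy_instrument()  # the outer application becomes a passthrough
    async def aping(self) -> str:
        return "apong"


if __name__ == "__main__":
    demo = Demo()
    print(demo.ping(), asyncio.run(demo.aping()))
    print(CALLS)  # each method is logged once despite the double decoration
```

Choosing the wrapper at decoration time, rather than checking on every call, keeps `async def` methods awaitable and lets tools such as inspect.iscoroutinefunction still recognize them.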

Usage

Use @llm_chat_callback() as a decorator on LLM chat and stream_chat methods (and their async variants) to automatically instrument them with callback events and observability. Use @llm_completion_callback() similarly on complete and stream_complete methods. These decorators are already applied to the base LLM class methods in LlamaIndex and should be applied when implementing custom LLM integrations.

Code Reference

Source Location

Signature

def llm_chat_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...
    return wrap

def llm_completion_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...
    return wrap

Import

from llama_index.core.llms.callbacks import (
    llm_chat_callback,
    llm_completion_callback,
)

I/O Contract

Inputs

Name | Type | Required | Description
f | Callable | Yes (internal) | The LLM method to be decorated
_self | Any | Yes (internal) | The LLM instance; if its callback_manager attribute is missing or not a CallbackManager, a new one is created and attached
messages | Sequence[ChatMessage] | Yes (chat callback) | The chat messages passed to the LLM chat method
prompt | str | Yes (completion callback) | The prompt string passed to the LLM completion method (extracted from args or kwargs)
**kwargs | Any | No | Additional keyword arguments passed through to the underlying LLM method

Outputs

Name | Type | Description
return (chat) | ChatResponse, ChatResponseGen, or ChatResponseAsyncGen | The decorated chat method's return value; non-streaming responses are returned unchanged, while streaming generators are replaced by instrumented wrapper generators that yield the same chunks
return (completion) | CompletionResponse, CompletionResponseGen, or CompletionResponseAsyncGen | The decorated completion method's return value; non-streaming responses are returned unchanged, while streaming generators are replaced by instrumented wrapper generators that yield the same chunks
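For streaming returns, the value handed back to the caller is a new generator rather than the original one. A simplified asyncio sketch of that interception follows; LOG, instrument_async_stream, fake_token_stream and collect are hypothetical names, and a list stands in for the dispatched in-progress and end events.

```python
import asyncio
from typing import Any, AsyncGenerator, AsyncIterator, List, Tuple

LOG: List[Tuple[str, Any]] = []  # hypothetical stand-in for dispatched events


def instrument_async_stream(source: AsyncIterator[Any]) -> AsyncGenerator[Any, None]:
    """Hypothetical helper: return a new async generator that re-yields every chunk
    from ``source``, logging one in-progress event per chunk and an end event
    carrying the last chunk once ``source`` is exhausted."""

    async def wrapped() -> AsyncGenerator[Any, None]:
        last = None
        async for chunk in source:
            LOG.append(("in_progress", chunk))
            yield chunk
            last = chunk
        LOG.append(("end", last))

    return wrapped()


async def fake_token_stream() -> AsyncGenerator[str, None]:
    # Hypothetical model output, one token per chunk.
    for token in ["Hello", " world", "!"]:
        yield token


async def collect() -> List[str]:
    # The caller consumes the wrapper exactly as it would the original stream.
    return [chunk async for chunk in instrument_async_stream(fake_token_stream())]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

The caller sees the same chunks in the same order; only the side effects (the logged events) differ from consuming the raw stream.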

Usage Examples

Basic Usage

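from typing import Any, Sequence

from llama_index.core.base.llms.types import (
    ChatMessage,
    ChatResponse,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms import CustomLLM
from llama_index.core.llms.callbacks import llm_chat_callback, llm_completion_callback

class MyCustomLLM(CustomLLM):
    # The decorators call self.to_dict() and use self.callback_manager, so the
    # class derives from a LlamaIndex LLM base; CustomLLM also requires
    # metadata, complete and stream_complete.
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="my-custom-llm")

    @llm_chat_callback()
    def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
        # The decorator will automatically:
        # 1. Fire LLMChatStartEvent with the messages
        # 2. Call this method
        # 3. Fire LLMChatEndEvent with the response
        response_text = "This is the LLM response."
        return ChatResponse(
            message=ChatMessage(role="assistant", content=response_text)
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # The decorator will automatically:
        # 1. Fire LLMCompletionStartEvent with the prompt
        # 2. Call this method
        # 3. Fire LLMCompletionEndEvent with the response
        return CompletionResponse(text="Completion result.")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # A generator: the decorator wraps it to emit an in-progress event per chunk
        yield CompletionResponse(text="Completion result.", delta="Completion result.")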

Async Streaming Usage

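from typing import Any, Sequence

from llama_index.core.base.llms.types import (
    ChatMessage,
    ChatResponse,
    ChatResponseAsyncGen,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms import CustomLLM
from llama_index.core.llms.callbacks import llm_chat_callback, llm_completion_callback

class MyStreamingLLM(CustomLLM):
    # CustomLLM requires metadata, complete and stream_complete to be defined
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="my-streaming-llm")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text="Hello world!")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        yield CompletionResponse(text="Hello world!", delta="Hello world!")

    @llm_chat_callback()
    async def astream_chat(
        self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseAsyncGen:
        # The decorator detects the async generator return type
        # and wraps it to emit LLMChatInProgressEvent for each chunk
        async def gen() -> ChatResponseAsyncGen:
            content = ""
            for token in ["Hello", " world", "!"]:
                content += token  # message.content accumulates; delta is the new token
                yield ChatResponse(
                    message=ChatMessage(role="assistant", content=content),
                    delta=token,
                )
        return gen()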
