Implementation:Run llama Llama index LLM Callbacks
| Knowledge Sources | |
|---|---|
| Domains | LLM Integration, Callbacks, Instrumentation |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
This module provides decorator functions for instrumenting LLM chat and completion methods with callback events and dispatcher-based instrumentation, supporting both synchronous and asynchronous execution as well as streaming generators.
Description
The llms/callbacks.py module defines two decorator factory functions that form the core instrumentation layer for all LLM interactions in LlamaIndex: llm_chat_callback and llm_completion_callback.
llm_chat_callback() is a decorator factory that returns a decorator for wrapping LLM chat methods. When applied, it:
- Ensures the decorated LLM instance has a valid CallbackManager (creating one if needed) via a wrapper_logic context manager
- Fires an LLMChatStartEvent through the instrumentation dispatcher with the model dictionary (excluding API keys), messages, and additional kwargs
- Starts a callback event of type CBEventType.LLM with the messages and serialized model info
- Executes the original function and handles the response based on its type:
  - For AsyncGenerator returns (async streaming): wraps the generator to emit an LLMChatInProgressEvent for each chunk, then an LLMChatEndEvent when iteration completes
  - For Generator returns (sync streaming): wraps the generator in the same way for synchronous iteration
  - For non-streaming returns: fires the end event immediately with the full response
- Handles exceptions by ending the callback event with the exception as its payload and firing an ExceptionEvent through the dispatcher
- Preserves function metadata (__name__, __qualname__, __doc__, etc.) on wrapper functions
- Detects whether a function is already wrapped (via its __wrapped__ attribute) and, if so, uses a simple passthrough wrapper to avoid double instrumentation
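The control flow described in the list above can be illustrated with a simplified, dependency-free sketch. It is not LlamaIndex's implementation. `EVENTS`, `emit`, `toy_chat_callback`, and `EchoLLM` are hypothetical names; plain dictionaries stand in for the dispatcher events and the CallbackManager, and strings stand in for ChatMessage objects. Only the synchronous path is shown.

```python
import collections.abc
import functools
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical stand-in for the instrumentation dispatcher: events are appended here.
EVENTS: List[Dict[str, Any]] = []


def emit(kind: str, **payload: Any) -> None:
    EVENTS.append({"kind": kind, **payload})


def toy_chat_callback() -> Callable:
    """Simplified sketch of the sync chat-decorator flow (hypothetical, not LlamaIndex code)."""

    def wrap(f: Callable) -> Callable:
        @functools.wraps(f)  # keeps __name__, __qualname__, __doc__ of f
        def wrapped(self: Any, messages: List[str], **kwargs: Any) -> Any:
            emit("start", messages=messages, kwargs=kwargs)
            try:
                result = f(self, messages, **kwargs)
            except BaseException as exc:
                emit("exception", error=exc)
                raise

            if isinstance(result, collections.abc.Generator):
                # Streaming: one in-progress event per chunk, then a single end
                # event carrying the last chunk once the caller exhausts the generator.
                def wrapped_gen() -> Iterator[Any]:
                    last = None
                    try:
                        for chunk in result:
                            emit("in_progress", chunk=chunk)
                            yield chunk
                            last = chunk
                    except BaseException as exc:
                        emit("exception", error=exc)
                        raise
                    emit("end", response=last)

                return wrapped_gen()

            # Non-streaming: the full response is available immediately.
            emit("end", response=result)
            return result

        return wrapped

    return wrap


class EchoLLM:
    @toy_chat_callback()
    def chat(self, messages: List[str], **kwargs: Any) -> str:
        return "echo: " + messages[-1]

    @toy_chat_callback()
    def stream_chat(self, messages: List[str], **kwargs: Any) -> Iterator[str]:
        def gen() -> Iterator[str]:
            yield from ["Hello", " world"]

        return gen()


llm = EchoLLM()
print(llm.chat(["hi"]))
print(list(llm.stream_chat(["hi"])))
print([event["kind"] for event in EVENTS])
```

Note that for streaming methods the end event fires only after the caller has consumed every chunk, because the wrapper cannot know the final response any earlier.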
llm_completion_callback() follows an identical pattern but is tailored for completion-style LLM calls. Key differences include:
- It extracts the prompt from positional or keyword arguments via an extract_prompt helper function
- Fires LLMCompletionStartEvent, LLMCompletionInProgressEvent, and LLMCompletionEndEvent instead of chat events
- Uses EventPayload.PROMPT and EventPayload.COMPLETION instead of EventPayload.MESSAGES and EventPayload.RESPONSE
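The prompt-extraction step can be sketched as follows. The body of `extract_prompt` here is an assumption based only on the description above (first positional argument after `self`, otherwise the `prompt` keyword); `toy_completion_callback`, `UpperLLM`, and the `(event name, value)` tuples are hypothetical stand-ins for the real decorator and the LLMCompletion*Event classes.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical event log standing in for the dispatcher.
COMPLETION_EVENTS: List[Tuple[str, Any]] = []


def extract_prompt(args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> str:
    """Hypothetical helper: return the prompt from positional or keyword arguments."""
    if args:  # args excludes self, so args[0] is the prompt
        return str(args[0])
    if "prompt" in kwargs:
        return str(kwargs["prompt"])
    raise ValueError("No prompt found in positional or keyword arguments")


def toy_completion_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        @functools.wraps(f)
        def wrapped(self: Any, *args: Any, **kwargs: Any) -> Any:
            prompt = extract_prompt(args, kwargs)
            COMPLETION_EVENTS.append(("LLMCompletionStartEvent", prompt))
            result = f(self, *args, **kwargs)
            COMPLETION_EVENTS.append(("LLMCompletionEndEvent", result))
            return result

        return wrapped

    return wrap


class UpperLLM:
    @toy_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> str:
        return prompt.upper()


llm = UpperLLM()
llm.complete("hello")           # prompt passed positionally
llm.complete(prompt="goodbye")  # prompt passed as a keyword
print(COMPLETION_EVENTS)
```

Accepting both call styles matters because callers of `complete` may pass the prompt either way, and the start event needs the prompt before the wrapped method runs.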
Both decorators automatically detect whether the wrapped function is a coroutine (via inspect.iscoroutinefunction) and return the appropriate sync or async wrapper. The dispatcher is initialized at module level using get_dispatcher(__name__).
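Wrapper selection with `inspect.iscoroutinefunction`, together with the `__wrapped__`-based passthrough, can be sketched like this. `toy_callback`, `CALLS`, and `DemoLLM` are hypothetical; the sketch records a string per instrumented call instead of firing real events.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, List

# Hypothetical record of instrumented calls, standing in for callback events.
CALLS: List[str] = []


def toy_callback() -> Callable:
    """Sketch of sync/async wrapper selection plus double-wrap detection (hypothetical)."""

    def wrap(f: Callable) -> Callable:
        # functools.wraps sets __wrapped__, so an f that already carries it was
        # decorated before; the new wrapper then only passes the call through.
        already_wrapped = hasattr(f, "__wrapped__")

        if inspect.iscoroutinefunction(f):
            @functools.wraps(f)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                if not already_wrapped:
                    CALLS.append(f"async:{f.__name__}")
                return await f(*args, **kwargs)

            return async_wrapper

        @functools.wraps(f)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            if not already_wrapped:
                CALLS.append(f"sync:{f.__name__}")
            return f(*args, **kwargs)

        return sync_wrapper

    return wrap


class DemoLLM:
    @toy_callback()
    @toy_callback()  # applied twice: the outer wrapper is a passthrough
    def complete(self, prompt: str) -> str:
        return prompt.upper()

    @toy_callback()
    async def acomplete(self, prompt: str) -> str:
        return prompt.lower()


llm = DemoLLM()
print(llm.complete("hi"))
print(asyncio.run(llm.acomplete("HI")))
print(CALLS)
```

In this sketch the double-decorated `complete` is recorded once, not twice. Because `__wrapped__` is set by any `functools.wraps`-based decorator, this check treats any previously wrapped function as already instrumented, which is a trade-off of the detection approach.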
Usage
Use @llm_chat_callback() as a decorator on LLM chat and stream_chat methods (and their async variants) to automatically instrument them with callback events and observability. Use @llm_completion_callback() similarly on complete and stream_complete methods. These decorators are already applied to the base LLM class methods in LlamaIndex and should be applied when implementing custom LLM integrations.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/llms/callbacks.py
- Lines: 1-545
Signature
```python
def llm_chat_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...

    return wrap


def llm_completion_callback() -> Callable:
    def wrap(f: Callable) -> Callable:
        ...

    return wrap
```
Import
```python
from llama_index.core.llms.callbacks import (
    llm_chat_callback,
    llm_completion_callback,
)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| f | Callable | Yes (internal) | The LLM method to be decorated |
| _self | Any | Yes (internal) | The LLM instance; if it has no valid callback_manager attribute, a new CallbackManager is created and attached |
| messages | Sequence[ChatMessage] | Yes (chat callback) | The chat messages passed to the LLM chat method |
| prompt | str | Yes (completion callback) | The prompt string passed to the LLM completion method (extracted from args or kwargs) |
| **kwargs | Any | No | Additional keyword arguments passed through to the underlying LLM method |
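The `_self` requirement in the table above (reuse the instance's callback manager, or create and attach one) can be pictured with the following sketch. `ToyCallbackManager`, `ensure_callback_manager`, and `BareLLM` are hypothetical; the real module uses its own `wrapper_logic` context manager and LlamaIndex's `CallbackManager`, whose internals may differ.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List


class ToyCallbackManager:
    """Hypothetical stand-in for CallbackManager that only records event names."""

    def __init__(self) -> None:
        self.events: List[str] = []


@contextmanager
def ensure_callback_manager(llm: Any) -> Iterator[ToyCallbackManager]:
    """Hypothetical analogue of wrapper_logic: reuse a valid manager or attach one."""
    manager = getattr(llm, "callback_manager", None)
    if not isinstance(manager, ToyCallbackManager):
        llm.callback_manager = ToyCallbackManager()
    yield llm.callback_manager


class BareLLM:
    """An LLM-like object created without a callback manager."""


llm = BareLLM()
with ensure_callback_manager(llm) as manager:
    manager.events.append("llm_start")
with ensure_callback_manager(llm) as manager:  # second call reuses the same manager
    manager.events.append("llm_end")
print(llm.callback_manager.events)
```

Attaching the manager to the instance, rather than creating a throwaway one per call, keeps start and end events for the same LLM on a single manager.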
Outputs
| Name | Type | Description |
|---|---|---|
| return (chat) | ChatResponse or ChatResponseGen or ChatResponseAsyncGen | The original return value from the decorated chat method, potentially wrapped with instrumentation generators |
| return (completion) | CompletionResponse or CompletionResponseGen or CompletionResponseAsyncGen | The original return value from the decorated completion method, potentially wrapped with instrumentation generators |
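How a return value is routed to one of the three shapes in the Outputs table can be sketched with `isinstance` checks against the abstract generator types. `classify_return`, `sync_chunks`, and `async_chunks` are hypothetical helpers, and the string labels are illustrative only.

```python
import collections.abc
from typing import Any, AsyncIterator, Iterator


def classify_return(value: Any) -> str:
    """Sketch: which instrumentation path a decorated method's return value takes."""
    if isinstance(value, collections.abc.AsyncGenerator):
        return "async-stream"  # e.g. ChatResponseAsyncGen: wrapped in an async generator
    if isinstance(value, collections.abc.Generator):
        return "sync-stream"   # e.g. ChatResponseGen: wrapped in a sync generator
    return "single"            # e.g. ChatResponse: returned after firing the end event


def sync_chunks() -> Iterator[str]:
    yield "chunk"


async def async_chunks() -> AsyncIterator[str]:
    yield "chunk"


print(classify_return(sync_chunks()))
print(classify_return(async_chunks()))
print(classify_return("full response"))
```

In this sketch a plain list is classified as "single", since lists are iterables but not generators.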
Usage Examples
Basic Usage
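```python
from typing import Any, Sequence

from llama_index.core.base.llms.types import (
    ChatMessage, ChatResponse, CompletionResponse, CompletionResponseGen, LLMMetadata,
)
from llama_index.core.llms import CustomLLM
from llama_index.core.llms.callbacks import llm_chat_callback, llm_completion_callback


class MyCustomLLM(CustomLLM):  # the decorators call _self.to_dict(), so subclass an LLM base
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="my-custom-llm")

    @llm_chat_callback()
    def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
        # The decorator will automatically:
        # 1. Fire LLMChatStartEvent with the messages
        # 2. Call this method
        # 3. Fire LLMChatEndEvent with the response
        response_text = "This is the LLM response."
        return ChatResponse(message=ChatMessage(role="assistant", content=response_text))

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # The decorator will automatically:
        # 1. Fire LLMCompletionStartEvent with the prompt
        # 2. Call this method
        # 3. Fire LLMCompletionEndEvent with the response
        return CompletionResponse(text="Completion result.")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # CustomLLM also requires stream_complete; this yields a single chunk.
        def gen() -> CompletionResponseGen:
            yield CompletionResponse(text="Completion result.", delta="Completion result.")
        return gen()
```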
Async Streaming Usage
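```python
from typing import Any, Sequence

from llama_index.core.base.llms.types import (
    ChatMessage, ChatResponse, ChatResponseAsyncGen,
    CompletionResponse, CompletionResponseGen, LLMMetadata,
)
from llama_index.core.llms import CustomLLM
from llama_index.core.llms.callbacks import llm_chat_callback


class MyStreamingLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="my-streaming-llm")

    # complete and stream_complete are abstract in CustomLLM; minimal stubs here.
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text="Hello world!")

    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        yield CompletionResponse(text="Hello world!", delta="Hello world!")

    @llm_chat_callback()
    async def astream_chat(
        self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseAsyncGen:
        # The decorator detects the async generator return value
        # and wraps it to emit LLMChatInProgressEvent for each chunk
        async def gen() -> ChatResponseAsyncGen:
            content = ""
            for token in ["Hello", " world", "!"]:
                content += token
                # delta is the new token; content is cumulative, so the last chunk holds the full reply
                yield ChatResponse(
                    message=ChatMessage(role="assistant", content=content),
                    delta=token,
                )
        return gen()
```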