Implementation:Run llama Llama index Settings Configuration
Overview
The Settings Configuration implementation provides the centralized singleton object through which all default LlamaIndex component configurations are managed. The _Settings class, exposed as the module-level Settings instance, uses Python's ContextVar to provide thread-safe and async-safe access to the default LLM, embedding model, tokenizer, node parser, transformations, and callback manager.
LLM Configuration, RAG Pipeline, LlamaIndex Core
Source File
- File: llama-index-core/llama_index/core/settings.py
- Class: _Settings (singleton exposed as Settings)
Import
from llama_index.core.settings import Settings
Class Signature
class _Settings:
    """Global settings for LlamaIndex.
    Singleton exposed as `Settings`. Uses ContextVar for thread safety.
    """
    ...
Key Properties
The Settings singleton exposes the following properties, each backed by a ContextVar for isolation:
| Property | Type | Description | Default Behavior |
|---|---|---|---|
| llm | LLM | The default language model used for generation tasks (response synthesis, query rewriting, summarization). | Lazily initialized to OpenAI(model="gpt-3.5-turbo") if not explicitly set. |
| embed_model | BaseEmbedding | The default embedding model used for converting text into dense vector representations. | Lazily initialized to OpenAIEmbedding() if not explicitly set. |
| tokenizer | Callable | A callable that tokenizes text, used for token counting and context window management. | Defaults to the tokenizer associated with the configured LLM. |
| node_parser | NodeParser | The default node parser for splitting documents into indexable chunks (nodes). | Defaults to SentenceSplitter(). |
| transformations | List[TransformComponent] | An ordered list of transformation components applied during the ingestion pipeline. | Defaults to [SentenceSplitter()]. |
| callback_manager | CallbackManager | The global callback manager for observability, tracing, and event handling. | Defaults to an empty CallbackManager(). |
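The "lazily initialized" defaults in the table can be pictured as a property that builds its value on first read and caches it. The sketch below is a stdlib-only illustration of that pattern, not the LlamaIndex source: `LazySettings`, `llm_factory`, and `factory_calls` are hypothetical names, and plain strings stand in for model objects.

```python
from typing import Callable, Optional


class LazySettings:
    """Hypothetical stand-in for a settings object with a lazy default."""

    def __init__(self, llm_factory: Callable[[], str]) -> None:
        self._llm: Optional[str] = None   # None means "not set yet"
        self._llm_factory = llm_factory   # builds the default on demand
        self.factory_calls = 0            # bookkeeping for this demo only

    @property
    def llm(self) -> str:
        # Build the default only on first read, then reuse the cached value.
        if self._llm is None:
            self.factory_calls += 1
            self._llm = self._llm_factory()
        return self._llm

    @llm.setter
    def llm(self, value: str) -> None:
        # An explicit assignment replaces (or pre-empts) the lazy default.
        self._llm = value


lazy = LazySettings(llm_factory=lambda: "default-llm")
first_read = lazy.llm    # triggers the factory
second_read = lazy.llm   # served from the cached value

explicit = LazySettings(llm_factory=lambda: "default-llm")
explicit.llm = "custom-llm"  # set before any read, so the factory never runs
```

Under this pattern the default is never constructed if a value is assigned before the first read, which is one reason overrides are usually placed at program start-up.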
Usage Examples
Basic Configuration
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure the global LLM
Settings.llm = OpenAI(model="gpt-4")
# Configure the global embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Configuring Node Parser and Transformations
from llama_index.core.settings import Settings
from llama_index.core.node_parser import SentenceSplitter
# Set a custom node parser with specific chunk size
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
# Set custom transformations pipeline
Settings.transformations = [
    SentenceSplitter(chunk_size=1024, chunk_overlap=200),
]
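To make the chunk_size and chunk_overlap arguments concrete, here is a deliberately simplified sliding-window chunker over a token list. It is an assumption-laden sketch: `chunk_tokens` is a hypothetical helper, and the real SentenceSplitter additionally tries to respect sentence boundaries, so its output will differ.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, chunk_size: int, chunk_overlap: int) -> List[list]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[list] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


# Small example: windows of 4 tokens, consecutive windows share 2 tokens.
small = chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=2)

# Same arithmetic with the values used above: each window advances 824 tokens.
large = chunk_tokens(list(range(2500)), chunk_size=1024, chunk_overlap=200)
```

With chunk_size=1024 and chunk_overlap=200, each window starts 824 tokens after the previous one, so neighbouring chunks repeat 200 tokens of context.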
Configuring Callback Manager for Observability
from llama_index.core.settings import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
# Attach a debug handler for tracing
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])
Using a Non-OpenAI LLM
from llama_index.core.settings import Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.llm = Anthropic(model="claude-3-5-sonnet-20241022")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
How Components Consume Settings
All LlamaIndex components that need an LLM, embedding model, or other shared resource follow a consistent resolution order:
- Check if the value was explicitly passed as a constructor parameter.
- If not, read from Settings to obtain the global default.
- If Settings has not been configured for that property, use the built-in default (e.g., OpenAI(model="gpt-3.5-turbo") for the LLM).
This three-tier resolution lets a single component override the global default without affecting other components, while the common case requires no configuration at all.
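The resolution order above can be sketched in a few lines. This is an illustration of the lookup logic only: `GlobalSettings`, `pick_llm`, and `BUILT_IN_DEFAULT_LLM` are hypothetical names, and strings stand in for LLM objects.

```python
from typing import Optional

# Placeholder for the built-in default, e.g. OpenAI(model="gpt-3.5-turbo").
BUILT_IN_DEFAULT_LLM = "built-in-default-llm"


class GlobalSettings:
    """Hypothetical global settings holder; None means 'not configured'."""

    def __init__(self) -> None:
        self.llm: Optional[str] = None


settings = GlobalSettings()


def pick_llm(explicit: Optional[str] = None) -> str:
    """Return the LLM a component should use, following the three tiers."""
    # Tier 1: a value passed directly to the component wins.
    if explicit is not None:
        return explicit
    # Tier 2: otherwise fall back to the global setting, if configured.
    if settings.llm is not None:
        return settings.llm
    # Tier 3: otherwise use the built-in default.
    return BUILT_IN_DEFAULT_LLM


nothing_configured = pick_llm()
settings.llm = "global-llm"
global_configured = pick_llm()
explicit_wins = pick_llm("per-component-llm")
```

Checking for `None` rather than truthiness keeps the tiers distinct even if a configured object happens to evaluate as false.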
Thread Safety Details
Each property in the Settings singleton is backed by a contextvars.ContextVar. This means:
- Concurrent web requests (e.g., in FastAPI) each get their own isolated copy of Settings when modified within a request context.
- Async tasks spawned via asyncio inherit the parent context but can override values independently.
- Tests running in parallel do not interfere with each other's Settings state.
import asyncio

from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI


async def handle_request_a():
    # This override is made inside this task's own context
    Settings.llm = OpenAI(model="gpt-4")
    # ... build index and query ...


async def handle_request_b():
    # This override is made inside this task's own context
    Settings.llm = OpenAI(model="gpt-3.5-turbo")
    # ... build index and query ...


async def main():
    # gather() schedules each coroutine as its own Task, and each Task
    # runs in a copy of the context that was current when it was created
    await asyncio.gather(handle_request_a(), handle_request_b())


asyncio.run(main())
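The isolation mechanism itself can be checked with the standard library alone. The sketch below uses a hypothetical `current_model` context variable rather than the Settings object, so it demonstrates how contextvars behaves under asyncio and nothing more; whether a given llama-index release actually backs Settings with ContextVar is worth confirming against its settings.py.

```python
import asyncio
import contextvars

# Hypothetical context variable standing in for a single setting.
current_model = contextvars.ContextVar("current_model", default="base-model")


async def handle_request(name: str, model: str):
    # set() only affects the context copy owned by this task.
    current_model.set(model)
    # Yield control so the other task runs before we read the value back.
    await asyncio.sleep(0)
    return name, current_model.get()


async def main() -> dict:
    # gather() wraps each coroutine in a Task; each Task copies the
    # current context at creation time.
    results = await asyncio.gather(
        handle_request("a", "model-a"),
        handle_request("b", "model-b"),
    )
    seen = dict(results)
    # The parent context never called set(), so it still sees the default.
    seen["parent"] = current_model.get()
    return seen


observed = asyncio.run(main())
```

Each task reads back its own value even though the two interleave, and the parent context is left unchanged. A plain module-level attribute would not give this guarantee: the last writer would win for every reader.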
Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | LLM | OpenAI(model="gpt-3.5-turbo") | Language model for all generation tasks |
| embed_model | BaseEmbedding | OpenAIEmbedding() | Embedding model for vectorization |
| tokenizer | Callable | LLM's tokenizer | Function mapping text to token list |
| node_parser | NodeParser | SentenceSplitter() | Document chunking strategy |
| transformations | List[TransformComponent] | [SentenceSplitter()] | Ingestion pipeline transformations |
| callback_manager | CallbackManager | CallbackManager() | Global callback/event manager |
Knowledge Sources
- LlamaIndex Settings Documentation
- LlamaIndex GitHub Repository
Principle
Principle:Run_llama_Llama_index_Global_Settings_Configuration