Implementation:Run llama Llama index Settings Configuration

Overview

The Settings Configuration implementation provides the centralized singleton object through which the default LlamaIndex component configurations are managed. The _Settings class, exposed as the module-level Settings instance, is a dataclass whose properties are lazily initialized. The default LLM, embedding model, tokenizer, node parser, transformations, and callback manager are created on first access unless they have been set explicitly. Because there is a single instance per process, a value assigned to Settings is shared by every thread and async task.


Source File

File: llama-index-core/llama_index/core/settings.py
Class: _Settings (singleton exposed as Settings)

Import

from llama_index.core.settings import Settings

Class Signature

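@dataclass
class _Settings:
    """Settings for the Llama Index, lazily initialized."""
    # Private fields stay None until first accessed or explicitly set
    _llm: Optional[LLM] = None
    _embed_model: Optional[BaseEmbedding] = None
    _callback_manager: Optional[CallbackManager] = None
    _node_parser: Optional[NodeParser] = None
    _transformations: Optional[List[TransformComponent]] = None
    ...  # property getters/setters for llm, embed_model, tokenizer, node_parser, transformations, callback_manager

Settings = _Settings()  # module-level singleton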

Key Properties

The Settings singleton exposes the following properties. Each getter returns the explicitly assigned value if there is one and otherwise creates the default on first access:

Property	Type	Description	Default Behavior
`llm`	`LLM`	The default language model used for generation tasks (response synthesis, query rewriting, summarization).	Lazily initialized to `OpenAI(model="gpt-3.5-turbo")` if not explicitly set.
`embed_model`	`BaseEmbedding`	The default embedding model used for converting text into dense vector representations.	Lazily initialized to `OpenAIEmbedding()` if not explicitly set.
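`tokenizer`	`Callable`	A callable that maps text to a list of tokens, used for token counting and context window management.	Defaults to a global tiktoken-based tokenizer (`get_tokenizer()`). It is not derived from the configured LLM, so set it explicitly when using a non-OpenAI model.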
`node_parser`	`NodeParser`	The default node parser for splitting documents into indexable chunks (nodes).	Defaults to `SentenceSplitter()`.
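`transformations`	`List[TransformComponent]`	An ordered list of transformation components applied during ingestion (for example, when building an index from documents).	Defaults to `[Settings.node_parser]`, which is `[SentenceSplitter()]` unless a different node parser was set before `transformations` was first read.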
`callback_manager`	`CallbackManager`	The global callback manager for observability, tracing, and event handling.	Defaults to an empty `CallbackManager()`.

Usage Examples

Basic Configuration

from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the global LLM
Settings.llm = OpenAI(model="gpt-4")

# Configure the global embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Configuring Node Parser and Transformations

from llama_index.core.settings import Settings
from llama_index.core.node_parser import SentenceSplitter

# Set a custom node parser with specific chunk size
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

# Set custom transformations pipeline
Settings.transformations = [
    SentenceSplitter(chunk_size=1024, chunk_overlap=200),
]
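The transformations list is applied in order, with each step receiving the previous step's output. In LlamaIndex a TransformComponent is called on a sequence of nodes and returns a sequence of nodes. The sketch below mimics that contract with plain strings and two hypothetical toy steps (`split_into_chunks`, `strip_and_lower`) so that it runs without LlamaIndex. It is not the library's ingestion code.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a "node" is a plain string, and a transformation is
# any callable that maps a sequence of nodes to a new list of nodes.
Transformation = Callable[[Sequence[str]], List[str]]


def split_into_chunks(nodes: Sequence[str], chunk_words: int = 4) -> List[str]:
    """Toy splitter: break each text into chunks of at most `chunk_words` words."""
    chunks: List[str] = []
    for text in nodes:
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunks.append(" ".join(words[start:start + chunk_words]))
    return chunks


def strip_and_lower(nodes: Sequence[str]) -> List[str]:
    """Toy post-processing step: trim whitespace and lowercase each node."""
    return [node.strip().lower() for node in nodes]


def run_transformations(
    nodes: Sequence[str], transformations: Sequence[Transformation]
) -> List[str]:
    """Apply each transformation in list order, feeding its output to the next."""
    current: List[str] = list(nodes)
    for transform in transformations:
        current = transform(current)
    return current


if __name__ == "__main__":
    docs = ["The Quick brown fox jumps over the lazy dog"]
    print(run_transformations(docs, [split_into_chunks, strip_and_lower]))
```

Running it prints `['the quick brown fox', 'jumps over the lazy', 'dog']`. The splitter runs first because it is first in the list.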

Configuring Callback Manager for Observability

from llama_index.core.settings import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Attach a debug handler for tracing
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])

Using a Non-OpenAI LLM

from llama_index.core.settings import Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Anthropic(model="claude-3-5-sonnet-20241022")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

How Components Consume Settings

LlamaIndex components that need an LLM, embedding model, or other shared resource generally resolve it in this order:

Use the value passed explicitly as a constructor (or method) parameter, if any.
Otherwise, read the global default from Settings.
If that Settings property has never been set, Settings lazily creates the built-in default on first access (e.g., OpenAI(model="gpt-3.5-turbo") for the LLM).

Because an explicit argument always wins, you can override a component for a single index or query engine while every other component keeps using the global defaults.

Thread Safety Details

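Settings is a single module-level object whose fields are ordinary dataclass attributes. They are not backed by contextvars.ContextVar. This means:

Assigning a property (for example, Settings.llm = ...) changes the default for the whole process, including every thread and asyncio task.
Concurrent web requests (e.g., in FastAPI) that each assign Settings.llm overwrite one another's choice.
Tests that modify Settings within the same process can leak state into later tests unless they restore the previous values.

To vary a component per request or per task, configure Settings once at startup and pass the request-specific component explicitly. An explicit argument takes precedence over Settings:

import asyncio
from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI

async def handle_request(index: VectorStoreIndex, model: str, question: str):
    # Pass the LLM explicitly instead of mutating the global Settings.llm
    query_engine = index.as_query_engine(llm=OpenAI(model=model))
    return await query_engine.aquery(question)

async def main(index: VectorStoreIndex):
    # Both requests run concurrently; neither one touches Settings
    return await asyncio.gather(
        handle_request(index, "gpt-4", "What does the report conclude?"),
        handle_request(index, "gpt-3.5-turbo", "Summarize the report."),
    )

# index = VectorStoreIndex.from_documents(documents)  # built once at startup
# asyncio.run(main(index))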
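Whether a per-task override is isolated depends on how the value is stored. The following experiment uses only the standard library and does not import LlamaIndex. `shared`, `model_var`, and the model strings are illustrative. It runs two concurrent asyncio tasks against a plain shared attribute and then against a `contextvars.ContextVar`. Each asyncio Task runs in a copy of the context that was current when it was created.

```python
import asyncio
import contextvars
from types import SimpleNamespace
from typing import List, Tuple

# A plain shared object, analogous to a module-level singleton with ordinary attributes
shared = SimpleNamespace(model="default")

# A ContextVar: a set() inside a Task affects only that Task's context copy
model_var = contextvars.ContextVar("model", default="default")


async def use_shared(name: str) -> str:
    shared.model = name      # mutates the single shared object
    await asyncio.sleep(0)   # yield so the other task can run
    return shared.model      # may now see the other task's write


async def use_contextvar(name: str) -> str:
    model_var.set(name)      # isolated to this task's context
    await asyncio.sleep(0)
    return model_var.get()


async def main() -> Tuple[List[str], List[str]]:
    shared_results = await asyncio.gather(use_shared("gpt-4"), use_shared("gpt-3.5-turbo"))
    ctx_results = await asyncio.gather(use_contextvar("gpt-4"), use_contextvar("gpt-3.5-turbo"))
    return list(shared_results), list(ctx_results)


if __name__ == "__main__":
    shared_results, ctx_results = asyncio.run(main())
    print("shared attribute:", shared_results)
    print("ContextVar:      ", ctx_results)
```

With the default asyncio event loop, the shared-attribute run returns `['gpt-3.5-turbo', 'gpt-3.5-turbo']`: the second task's write lands before the first task reads. The ContextVar run returns `['gpt-4', 'gpt-3.5-turbo']` and leaves the outer context's value at `"default"`.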

Parameter Reference

Parameter	Type	Default	Description
`llm`	`LLM`	`OpenAI(model="gpt-3.5-turbo")`	Language model for all generation tasks
`embed_model`	`BaseEmbedding`	`OpenAIEmbedding()`	Embedding model for vectorization
`tokenizer`	`Callable`	Global tiktoken-based tokenizer	Function mapping text to a list of tokens
`node_parser`	`NodeParser`	`SentenceSplitter()`	Document chunking strategy
`transformations`	`List[TransformComponent]`	`[Settings.node_parser]`	Ingestion pipeline transformations
`callback_manager`	`CallbackManager`	`CallbackManager()`	Global callback/event manager


Principle

Principle:Run_llama_Llama_index_Global_Settings_Configuration

Metadata

2026-02-11 00:00 GMT
