Implementation:Run llama Llama index Settings Configuration

From Leeroopedia

Overview

The Settings Configuration implementation provides the centralized singleton object through which all default LlamaIndex component configurations are managed. The _Settings class, exposed as the module-level Settings instance, uses Python's ContextVar to provide thread-safe and async-safe access to the default LLM, embedding model, tokenizer, node parser, transformations, and callback manager.

LLM Configuration, RAG Pipeline, LlamaIndex Core

Source File

  • File: llama-index-core/llama_index/core/settings.py
  • Class: _Settings (singleton exposed as Settings)

Import

from llama_index.core.settings import Settings

Class Signature

class _Settings:
    """Global settings for LlamaIndex.

    Singleton exposed as `Settings`. Uses ContextVar for thread safety.
    """
    ...

Key Properties

The Settings singleton exposes the following properties, each backed by a ContextVar for isolation:

Property | Type | Description | Default Behavior
llm | LLM | The default language model used for generation tasks (response synthesis, query rewriting, summarization). | Lazily initialized to OpenAI(model="gpt-3.5-turbo") if not explicitly set.
embed_model | BaseEmbedding | The default embedding model used for converting text into dense vector representations. | Lazily initialized to OpenAIEmbedding() if not explicitly set.
tokenizer | Callable | A callable that tokenizes text, used for token counting and context window management. | Defaults to the tokenizer associated with the configured LLM.
node_parser | NodeParser | The default node parser for splitting documents into indexable chunks (nodes). | Defaults to SentenceSplitter().
transformations | List[TransformComponent] | An ordered list of transformation components applied during the ingestion pipeline. | Defaults to [SentenceSplitter()].
callback_manager | CallbackManager | The global callback manager for observability, tracing, and event handling. | Defaults to an empty CallbackManager().
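
The "lazily initialized" behavior in the table can be pictured with a small property-based sketch. This is not the LlamaIndex source: `_SketchSettings` and `make_default_llm` are hypothetical names, and plain strings stand in for model objects. The only point is that the default is built on first read and that an explicit assignment takes precedence.

```python
from typing import Callable, Optional


def make_default_llm() -> str:
    """Hypothetical factory standing in for building a default model."""
    return "default-llm"


class _SketchSettings:
    """Illustrative only: a holder whose value is created on first read."""

    def __init__(self, factory: Callable[[], str]) -> None:
        self._factory = factory
        self._llm: Optional[str] = None
        self.factory_calls = 0  # counts how often the default was built

    @property
    def llm(self) -> str:
        # Build the default only if nothing was assigned explicitly.
        if self._llm is None:
            self.factory_calls += 1
            self._llm = self._factory()
        return self._llm

    @llm.setter
    def llm(self, value: str) -> None:
        self._llm = value


sketch = _SketchSettings(make_default_llm)
first = sketch.llm         # first read triggers the factory
second = sketch.llm        # second read reuses the cached value
sketch.llm = "custom-llm"  # explicit assignment replaces the default
third = sketch.llm
```

Under this pattern, merely importing the settings object would not construct a default; the cost is deferred until a property is first read.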

Usage Examples

Basic Configuration

from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the global LLM
Settings.llm = OpenAI(model="gpt-4")

# Configure the global embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Configuring Node Parser and Transformations

from llama_index.core.settings import Settings
from llama_index.core.node_parser import SentenceSplitter

# Set a custom node parser with specific chunk size
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

# Set custom transformations pipeline
Settings.transformations = [
    SentenceSplitter(chunk_size=1024, chunk_overlap=200),
]
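
To make the roles of chunk_size and chunk_overlap concrete, here is a deliberately simplified sketch. It assumes pre-split tokens and fixed-size windows, and it ignores sentence boundaries, so it does not reproduce how SentenceSplitter actually chooses split points. `chunk_with_overlap` is a hypothetical helper, not a LlamaIndex function.

```python
from typing import List


def chunk_with_overlap(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Hypothetical helper: fixed-size windows sharing `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
```

Each window here starts `chunk_size - chunk_overlap` tokens after the previous one, so consecutive chunks repeat the last `chunk_overlap` tokens of their predecessor.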

Configuring Callback Manager for Observability

from llama_index.core.settings import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Attach a debug handler for tracing
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])

Using a Non-OpenAI LLM

from llama_index.core.settings import Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Anthropic(model="claude-3-5-sonnet-20241022")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

How Components Consume Settings

All LlamaIndex components that need an LLM, embedding model, or other shared resource follow a consistent resolution order:

  1. Check if the value was explicitly passed as a constructor parameter.
  2. If not, read from Settings to obtain the global default.
  3. If Settings has not been configured for that property, use the built-in default (e.g., OpenAI(model="gpt-3.5-turbo") for the LLM).

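This three-tier resolution lets an individual component override the global default while everything left unconfigured falls back to the built-in default, so only deviations from the defaults need to be written down.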
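
The resolution order above can be sketched in a few lines. The names `SketchGlobals`, `BUILT_IN_DEFAULT`, and `resolve_llm_sketch` are hypothetical and exist only for illustration, with strings standing in for model objects; this is not how LlamaIndex implements the lookup internally.

```python
from typing import Optional

# Hypothetical stand-in for the built-in default model.
BUILT_IN_DEFAULT = "built-in-default-llm"


class SketchGlobals:
    """Illustrative global holder; None means 'not configured'."""

    llm: Optional[str] = None


def resolve_llm_sketch(explicit: Optional[str] = None) -> str:
    # 1. A value passed directly to the component wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise use the globally configured value, if there is one.
    if SketchGlobals.llm is not None:
        return SketchGlobals.llm
    # 3. Otherwise fall back to the built-in default.
    return BUILT_IN_DEFAULT


tier3 = resolve_llm_sketch()                            # nothing configured
SketchGlobals.llm = "global-llm"
tier2 = resolve_llm_sketch()                            # global default set
tier1 = resolve_llm_sketch(explicit="per-component-llm")  # explicit argument
```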

Thread Safety Details

Each property in the Settings singleton is backed by a contextvars.ContextVar. This means:

  • Concurrent web requests (e.g., in FastAPI) each get their own isolated copy of Settings when modified within a request context.
  • Async tasks spawned via asyncio inherit the parent context but can override values independently.
  • Tests running in parallel do not interfere with each other's Settings state.

import asyncio
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI

async def handle_request_a():
    # This override is isolated to this coroutine's context
    Settings.llm = OpenAI(model="gpt-4")
    # ... build index and query ...

async def handle_request_b():
    # This override is isolated to this coroutine's context
    Settings.llm = OpenAI(model="gpt-3.5-turbo")
    # ... build index and query ...

async def main():
    # gather() must be awaited inside a running event loop; it wraps each
    # coroutine in its own Task, and each Task runs in a copy of the context
    await asyncio.gather(handle_request_a(), handle_request_b())

# Both handlers run concurrently without interference
asyncio.run(main())
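
The isolation described in this section comes from how contextvars and asyncio interact in the standard library, and that part can be checked without LlamaIndex installed. The sketch below uses a hypothetical variable `current_llm`, with strings in place of model objects. It shows only the standard-library mechanism and makes no claim about the internals of Settings.

```python
import asyncio
import contextvars

# Hypothetical variable; a string stands in for a model object.
current_llm = contextvars.ContextVar("current_llm", default="default-llm")


async def handler(name: str, delay: float) -> str:
    # set() only changes the context of the Task running this coroutine.
    current_llm.set(name)
    # Yield to the event loop so the other handler runs in between.
    await asyncio.sleep(delay)
    return current_llm.get()


async def main() -> tuple:
    # gather() wraps each coroutine in a Task; each Task copies the context.
    seen_a, seen_b = await asyncio.gather(
        handler("llm-a", 0.02), handler("llm-b", 0.01)
    )
    # The caller's own context never saw either override.
    return seen_a, seen_b, current_llm.get()


result = asyncio.run(main())
```

Each handler reads back its own value even though the other handler ran in between, and the caller still sees the default. A coroutine awaited directly, without being wrapped in a Task, shares its caller's context instead, so this isolation depends on how the work is scheduled.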

Parameter Reference

Parameter | Type | Default | Description
llm | LLM | OpenAI(model="gpt-3.5-turbo") | Language model for all generation tasks
embed_model | BaseEmbedding | OpenAIEmbedding() | Embedding model for vectorization
tokenizer | Callable | LLM's tokenizer | Function mapping text to token list
node_parser | NodeParser | SentenceSplitter() | Document chunking strategy
transformations | List[TransformComponent] | [SentenceSplitter()] | Ingestion pipeline transformations
callback_manager | CallbackManager | CallbackManager() | Global callback/event manager

Knowledge Sources

  • LlamaIndex Settings Documentation
  • LlamaIndex GitHub Repository

Principle

Principle:Run_llama_Llama_index_Global_Settings_Configuration

Metadata

2026-02-11 00:00 GMT
