Implementation:Run llama Llama index Settings LLM Assignment

Overview

The Settings.llm assignment pattern shows how the llm property of the Settings singleton is used to integrate a finetuned model into a LlamaIndex pipeline. This page is a Wrapper Doc: it covers one usage pattern of the Settings.llm getter and setter, namely finetuned model integration.

Source File

File: llama-index-core/llama_index/core/settings.py
Lines: 32-46
Import: from llama_index.core import Settings

Settings Singleton

The Settings object is a module-level instance of the _Settings dataclass, created at the bottom of the settings module:

# Singleton
Settings = _Settings()

Because every import of Settings resolves to this one module-level object, configuration set in one place is visible to all code running in the same process.

LLM Property: Getter

@property
def llm(self) -> LLM:
    """Get the LLM."""
    if self._llm is None:
        self._llm = resolve_llm("default")

    if self._callback_manager is not None:
        self._llm.callback_manager = self._callback_manager

    return self._llm

Returns: LLM -- The currently configured LLM instance

Behavior:

Lazy initialization: If no LLM has been set (self._llm is None), calls resolve_llm("default") to create a default LLM
Callback propagation: If a global callback manager has been configured, it is automatically assigned to the LLM's callback_manager before returning. This ensures monitoring, logging, and finetuning data collection handlers are always active.
Return: Returns the _llm instance
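
The two behaviors above can be reproduced with a small stand-in. This is only a sketch of the getter's shape: FakeLLM, SettingsSketch, and make_default_llm are hypothetical names invented for illustration and are not part of llama_index, and a plain list stands in for the callback manager.

```python
from typing import List, Optional


class FakeLLM:
    """Hypothetical stand-in for an LLM; it only carries a callback manager."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.callback_manager: Optional[List[str]] = None


def make_default_llm() -> FakeLLM:
    """Hypothetical replacement for resolve_llm("default")."""
    return FakeLLM("default")


class SettingsSketch:
    """Mirrors the lazy-init and callback-propagation shape of the getter."""

    def __init__(self) -> None:
        self._llm: Optional[FakeLLM] = None
        self._callback_manager: Optional[List[str]] = None

    @property
    def llm(self) -> FakeLLM:
        if self._llm is None:  # lazy initialization on first read
            self._llm = make_default_llm()
        if self._callback_manager is not None:  # propagated on every read
            self._llm.callback_manager = self._callback_manager
        return self._llm


settings = SettingsSketch()
first = settings.llm  # creates the default once
settings._callback_manager = ["finetuning_handler"]
second = settings.llm  # same object, now with the callbacks attached
```

Because propagation happens on each read, a callback manager configured after the LLM was first created still reaches it the next time the property is accessed.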

LLM Property: Setter

@llm.setter
def llm(self, llm: LLMType) -> None:
    """Set the LLM."""
    self._llm = resolve_llm(llm)

Parameters:

Parameter	Type	Description
`llm`	`LLMType`	The LLM to set: an LLM instance or a string. Upstream defines `LLMType` as a union of `str`, `LLM`, and LangChain's `BaseLanguageModel`

Behavior:

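The setter calls resolve_llm(llm) and stores the result in self._llm. resolve_llm handles its argument as follows:
- If llm is already an LLM instance, it is returned directly
- If llm is the string "default", the default OpenAI LLM is created
- If llm is a string beginning with "local" (optionally "local:<model_path>"), a local llama.cpp model is loaded
- In current llama-index-core releases, any other string, including a model name such as "gpt-4", raises a ValueError rather than resolving to a provider class. To use a specific hosted model, pass an instance, for example OpenAI(model="gpt-4")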
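
A rough sketch of the dispatch the setter relies on. resolve_llm_sketch and FakeLLM are hypothetical and are not the library's implementation; the string handling shown (accept "default" and "local"-prefixed strings, reject everything else) is an assumption about upstream behavior that should be checked against the installed version.

```python
from typing import Union


class FakeLLM:
    """Hypothetical stand-in for an LLM instance."""

    def __init__(self, name: str) -> None:
        self.name = name


def resolve_llm_sketch(llm: Union[FakeLLM, str]) -> FakeLLM:
    """Illustrative dispatch: pass instances through, interpret strings."""
    if isinstance(llm, FakeLLM):
        return llm  # already an instance: returned unchanged
    if llm == "default":
        return FakeLLM("default")
    prefix, _, model_path = llm.partition(":")
    if prefix == "local":
        return FakeLLM(f"local:{model_path or 'default'}")
    # Assumed behavior: arbitrary model-name strings are rejected.
    raise ValueError(f"cannot resolve LLM from string {llm!r}")


ft_llm = FakeLLM("finetuned")
resolved = resolve_llm_sketch(ft_llm)  # identity: the instance is kept as-is
```

For the finetuned model use case only the first branch matters: the object returned by get_finetuned_model is already an LLM instance, so the setter stores it unchanged.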

Finetuned Model Integration Pattern

The primary use case documented here is assigning a finetuned model to the global Settings:

from llama_index.core import Settings
from llama_index.finetuning import OpenAIFinetuneEngine

# Retrieve the finetuned model from a completed job
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    start_job_id="ftjob-abc123",
)
ft_llm = engine.get_finetuned_model(temperature=0.3)

# Assign to global Settings -- all pipeline components now use the finetuned model
Settings.llm = ft_llm

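After this assignment, any LlamaIndex component that resolves its LLM from Settings.llm (query engines, response synthesizers, chat engines, agents) uses the finetuned model with no further code changes. Components typically resolve the LLM when they are constructed, so build them after the assignment; objects created earlier may keep the LLM they already hold.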

Complete Finetuning-to-Integration Workflow

import time

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.finetuning import OpenAIFinetuneEngine, OpenAIFineTuningHandler

# Step 1: Collect training data
finetuning_handler = OpenAIFineTuningHandler()
Settings.callback_manager = CallbackManager([finetuning_handler])

# ... run queries to collect training data ...

# Step 2: Launch finetuning
engine = OpenAIFinetuneEngine.from_finetuning_handler(
    finetuning_handler=finetuning_handler,
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
)
engine.finetune()

# Step 3: Wait until the job reaches a terminal status
# ("cancelled" is included so a cancelled job does not loop forever)
while engine.get_current_job().status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)

# Step 4: Integrate finetuned model
ft_llm = engine.get_finetuned_model(temperature=0.3)
Settings.llm = ft_llm

# Step 5: Use in pipeline (components built from here on use the finetuned model)
documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder path
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is semantic search?")
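
The polling loop in Step 3 has no upper bound on how long it waits. A minimal sketch of a bounded alternative follows; wait_for_status, its parameters, and the six-hour default are hypothetical choices made for illustration, and the demo drives it with a scripted status sequence rather than a real finetuning job.

```python
import time
from typing import Callable

TERMINAL_STATUSES = ("succeeded", "failed", "cancelled")


def wait_for_status(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it is terminal or the timeout has passed."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)


# Demo with a scripted status sequence instead of a real job.
_statuses = iter(["queued", "running", "succeeded"])
final_status = wait_for_status(lambda: next(_statuses), poll_seconds=0.0)
```

With the engine from the workflow, the call would look like wait_for_status(lambda: engine.get_current_job().status), and the model would be retrieved only when the returned status is "succeeded".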

A/B Testing Pattern

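from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Assumes `engine` and `index` from the workflow above
original_llm = OpenAI(model="gpt-3.5-turbo")
ft_llm = engine.get_finetuned_model(temperature=0.3)

# Compare original vs finetuned.
# A query engine resolves its LLM when it is built, so create one per model
# instead of reusing a single engine across both assignments.
Settings.llm = original_llm
response_a = index.as_query_engine().query("Explain RAG.")

Settings.llm = ft_llm
response_b = index.as_query_engine().query("Explain RAG.")

print("Original:", response_a)
print("Finetuned:", response_b)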
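
Reassigning a global for each arm of a comparison makes it easy to leave the wrong model configured afterwards. One possible way to scope the swap is a small context manager; temporary_llm is a hypothetical helper, not a LlamaIndex API, and the demo uses a plain namespace in place of the Settings singleton.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator


@contextmanager
def temporary_llm(settings: Any, llm: Any) -> Iterator[None]:
    """Set settings.llm for the duration of the block, then restore it."""
    previous = settings.llm
    settings.llm = llm
    try:
        yield
    finally:
        settings.llm = previous  # restored even if the block raises


# Demo with a plain namespace standing in for the Settings singleton.
settings = SimpleNamespace(llm="original")
with temporary_llm(settings, "finetuned"):
    inside = settings.llm
after = settings.llm
```

Used with the real singleton, reading the previous value goes through the getter, so it would trigger the lazy default initialization if no LLM had been set yet. Components should be constructed inside the with block so that they pick up the temporary LLM.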

Knowledge Sources

LlamaIndex Settings Source
