Implementation: run-llama/llama_index Settings.llm Assignment
Overview
This page documents how the `llm` property of the `Settings` singleton is used to integrate a finetuned model into the LlamaIndex pipeline. It is a Wrapper Doc: it covers one specific usage pattern of the `Settings.llm` getter and setter, focused on the finetuned-model integration use case.
Source File
- File: `llama-index-core/llama_index/core/settings.py` (lines 32-46)
- Import: `from llama_index.core import Settings`
Settings Singleton
The Settings object is a module-level instance of the _Settings dataclass, created at the bottom of the settings module:
```python
# Singleton
Settings = _Settings()
```
This ensures a single, shared configuration object exists across the entire application.
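Python caches imported modules in `sys.modules`, so a module-level instance is created once per interpreter and every importer receives the same object. The stripped-down stand-in below illustrates that sharing. `_SettingsSketch` and its `app_name` field are hypothetical, not LlamaIndex code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class _SettingsSketch:
    """Hypothetical stand-in for the real _Settings dataclass."""

    app_name: Optional[str] = None


# Module-level instance: created once, when the module body first runs.
Settings = _SettingsSketch()

# Two "components" that both hold a reference to the shared object.
component_a_view = Settings
component_b_view = Settings

# A write through one reference is visible through the other.
component_a_view.app_name = "demo"
print(component_b_view.app_name)  # demo
print(component_a_view is component_b_view)  # True
```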
LLM Property: Getter
```python
@property
def llm(self) -> LLM:
    """Get the LLM."""
    if self._llm is None:
        self._llm = resolve_llm("default")
    if self._callback_manager is not None:
        self._llm.callback_manager = self._callback_manager
    return self._llm
```
Returns: `LLM`, the currently configured LLM instance.
Behavior:
- Lazy initialization: if no LLM has been set (`self._llm is None`), calls `resolve_llm("default")` to create a default LLM.
- Callback propagation: if a global callback manager has been configured, it is assigned to the LLM's `callback_manager` on every read, before the LLM is returned. This keeps monitoring, logging, and finetuning data-collection handlers attached to the global LLM.
- Return: returns the stored `_llm` instance.
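The getter's two behaviors can be modeled with plain stand-ins. This is a simplified sketch, not LlamaIndex code: `StubLLM`, `StubCallbackManager`, `resolve_llm_stub`, `SettingsSketch`, and its `set_callback_manager` helper are hypothetical names, and `"default-model"` is a placeholder. It shows that the default is built once, on the first read, and that a callback manager configured after the LLM was created is still attached on the next read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubCallbackManager:
    """Hypothetical stand-in for a callback manager holding handlers."""

    handlers: List[str] = field(default_factory=list)


@dataclass
class StubLLM:
    """Hypothetical stand-in for an LLM."""

    name: str
    callback_manager: Optional[StubCallbackManager] = None


def resolve_llm_stub(llm):
    # Hypothetical: "default" builds a placeholder default model.
    if llm == "default":
        return StubLLM(name="default-model")
    return llm


class SettingsSketch:
    def __init__(self) -> None:
        self._llm: Optional[StubLLM] = None
        self._callback_manager: Optional[StubCallbackManager] = None

    def set_callback_manager(self, cm: StubCallbackManager) -> None:
        # Hypothetical helper; the real class exposes a callback_manager property.
        self._callback_manager = cm

    @property
    def llm(self) -> StubLLM:
        # Lazy initialization: only build a default the first time it is read.
        if self._llm is None:
            self._llm = resolve_llm_stub("default")
        # Callback propagation: re-attach the global manager on every read.
        if self._callback_manager is not None:
            self._llm.callback_manager = self._callback_manager
        return self._llm


settings = SettingsSketch()
first = settings.llm
second = settings.llm
print(first.name, first is second)  # default-model True

# Configured after the LLM exists, yet attached on the next read.
settings.set_callback_manager(StubCallbackManager(handlers=["finetuning"]))
print(settings.llm.callback_manager.handlers)  # ['finetuning']
```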
LLM Property: Setter
```python
@llm.setter
def llm(self, llm: LLMType) -> None:
    """Set the LLM."""
    self._llm = resolve_llm(llm)
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `llm` | `LLMType` | The LLM to set. `LLMType = Union[LLM, str]` |
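Behavior:
- Calls `resolve_llm(llm)`, which:
  - returns `llm` directly if it is already an `LLM` instance;
  - resolves a recognized string shortcut to an LLM instance, e.g. `"default"` for the default OpenAI LLM, or `"local"` / `"local:<model_path>"` for a local LlamaCPP model. A plain model name such as `"gpt-4"` is not turned into an `OpenAI` instance and raises a `ValueError`; to select a specific model, pass an instance such as `OpenAI(model="gpt-4")`.
- Stores the resolved LLM in `self._llm`.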
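The setter's normalize-then-store step can be sketched the same way. Everything here is hypothetical: `BaseLLMStub` stands in for `LLM`, and `_REGISTRY` is a placeholder table of string shortcuts, not the rules `resolve_llm` actually applies. The sketch only illustrates the shape of the logic: instances pass through, recognized strings become instances, and anything else is rejected before it reaches the stored attribute.

```python
from typing import Callable, Dict, Optional, Union


class BaseLLMStub:
    """Hypothetical stand-in for llama_index.core.llms.LLM."""

    def __init__(self, model: str) -> None:
        self.model = model


LLMTypeStub = Union[BaseLLMStub, str]

# Hypothetical registry of string shortcuts (placeholder, not the real rules).
_REGISTRY: Dict[str, Callable[[], BaseLLMStub]] = {
    "default": lambda: BaseLLMStub(model="placeholder-default"),
}


def resolve_llm_sketch(llm: LLMTypeStub) -> BaseLLMStub:
    # Instances pass straight through.
    if isinstance(llm, BaseLLMStub):
        return llm
    # Strings are looked up; unknown shortcuts are rejected.
    if isinstance(llm, str):
        try:
            return _REGISTRY[llm]()
        except KeyError:
            raise ValueError(f"Unrecognized LLM string: {llm!r}") from None
    raise TypeError(f"Unsupported LLM type: {type(llm).__name__}")


class SettingsSketch:
    def __init__(self) -> None:
        self._llm: Optional[BaseLLMStub] = None

    @property
    def llm(self) -> Optional[BaseLLMStub]:
        return self._llm

    @llm.setter
    def llm(self, llm: LLMTypeStub) -> None:
        # Normalize whatever was assigned before storing it.
        self._llm = resolve_llm_sketch(llm)


settings = SettingsSketch()
custom = BaseLLMStub(model="ft:my-model")  # placeholder model name
settings.llm = custom
print(settings.llm is custom)  # True
settings.llm = "default"
print(settings.llm.model)  # placeholder-default
```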
Finetuned Model Integration Pattern
The primary use case documented here is assigning a finetuned model to the global Settings:
```python
from llama_index.core import Settings
from llama_index.finetuning import OpenAIFinetuneEngine

# Retrieve the finetuned model from a completed job
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    start_job_id="ftjob-abc123",
)
ft_llm = engine.get_finetuned_model(temperature=0.3)

# Assign to global Settings -- all pipeline components now use the finetuned model
Settings.llm = ft_llm
```
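After this assignment, LlamaIndex components that read `Settings.llm` (query engines, response synthesizers, chat engines, agents) use the finetuned model without further code changes. Query engines and similar components typically read `Settings.llm` when they are constructed, so this applies to components created after the assignment that are not given an explicit `llm` argument; an engine built earlier keeps the LLM it captured.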
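The construction-time read can be modeled with stubs. This sketch is hypothetical (`StubSettings` and `StubQueryEngine` are not LlamaIndex classes, and `"finetuned-model"` is a placeholder): the stub engine falls back to the global LLM when built without an explicit `llm`, similar to how `index.as_query_engine()` uses `Settings.llm` when no `llm` argument is passed.

```python
from typing import Optional


class StubSettings:
    """Hypothetical stand-in for the global Settings object."""

    def __init__(self) -> None:
        self.llm: str = "base-model"


Settings = StubSettings()


class StubQueryEngine:
    """Hypothetical component that captures the LLM when it is built."""

    def __init__(self, llm: Optional[str] = None) -> None:
        # An explicit argument wins; otherwise fall back to the global default.
        self.llm = llm if llm is not None else Settings.llm

    def query(self, question: str) -> str:
        return f"[{self.llm}] {question}"


engine_before = StubQueryEngine()
Settings.llm = "finetuned-model"
engine_after = StubQueryEngine()

print(engine_before.query("hi"))  # [base-model] hi
print(engine_after.query("hi"))  # [finetuned-model] hi
```

The earlier engine keeps `base-model`, which is why pipelines are usually rebuilt (or given `llm=` explicitly) after swapping the global LLM.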
Complete Finetuning-to-Integration Workflow
```python
import time

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.finetuning import OpenAIFinetuneEngine
from llama_index.finetuning.callbacks import OpenAIFineTuningHandler

# Documents to index; "./data" is a placeholder path
documents = SimpleDirectoryReader("./data").load_data()

# Step 1: Collect training data
finetuning_handler = OpenAIFineTuningHandler()
Settings.callback_manager = CallbackManager([finetuning_handler])
# ... run queries to collect training data ...

# Step 2: Launch finetuning
engine = OpenAIFinetuneEngine.from_finetuning_handler(
    finetuning_handler=finetuning_handler,
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
)
engine.finetune()

# Step 3: Wait until the job reaches a terminal state
while engine.get_current_job().status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)

# Step 4: Integrate the finetuned model, but only if the job succeeded
job = engine.get_current_job()
if job.status != "succeeded":
    raise RuntimeError(f"Fine-tuning job {job.id} ended with status {job.status!r}")
ft_llm = engine.get_finetuned_model(temperature=0.3)
Settings.llm = ft_llm

# Step 5: Build pipeline components after the assignment so they use the finetuned model
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is semantic search?")
```
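The Step 3 loop has no upper bound on how long it waits. A minimal sketch of a bounded polling helper, assuming only that the caller supplies a zero-argument function returning the job status string; `wait_for_terminal_status` is a hypothetical name, and the terminal set mirrors the OpenAI fine-tuning job statuses `succeeded`, `failed`, and `cancelled`. The `sleep` and `clock` parameters are injectable so the helper can be exercised without real delays.

```python
import time
from typing import Callable, FrozenSet

TERMINAL_STATUSES: FrozenSet[str] = frozenset({"succeeded", "failed", "cancelled"})


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_interval: float = 60.0,
    timeout: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Simulated job that passes through a few non-terminal states.
fake_statuses = iter(["validating_files", "queued", "running", "succeeded"])
final = wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None)
print(final)  # succeeded
```

With the engine from the workflow, the call would look like `wait_for_terminal_status(lambda: engine.get_current_job().status)`, followed by the same success check before `get_finetuned_model`.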
A/B Testing Pattern
```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Assumes `engine` and `index` from the workflow above
original_llm = OpenAI(model="gpt-3.5-turbo")
ft_llm = engine.get_finetuned_model(temperature=0.3)

# Compare original vs finetuned. Query engines capture Settings.llm when they
# are built, so build a fresh engine after each assignment.
Settings.llm = original_llm
response_a = index.as_query_engine().query("Explain RAG.")

Settings.llm = ft_llm
response_b = index.as_query_engine().query("Explain RAG.")

print("Original:", response_a)
print("Finetuned:", response_b)
```
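Because A/B comparisons mutate global state, it can help to restore the previous LLM automatically. A minimal sketch using `contextlib.contextmanager`; `temporary_llm` is a hypothetical helper, and `StubSettings` is a stand-in for any object with an `llm` attribute. With the real `Settings`, the assignment runs through the setter (and therefore `resolve_llm`), and reading `Settings.llm` when nothing has been set triggers the lazy default.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_llm(settings: Any, llm: Any) -> Iterator[Any]:
    """Swap settings.llm for the duration of a with-block, then restore it."""
    previous = settings.llm
    settings.llm = llm
    try:
        yield llm
    finally:
        # Restore the previous LLM even if the block raised.
        settings.llm = previous


class StubSettings:
    """Hypothetical stand-in for the global Settings object."""

    def __init__(self) -> None:
        self.llm = "original-model"


settings = StubSettings()
with temporary_llm(settings, "finetuned-model"):
    print(settings.llm)  # finetuned-model
print(settings.llm)  # original-model
```

Pipeline components still need to be built inside the `with` block so they capture the swapped-in model, e.g. `index.as_query_engine().query("Explain RAG.")`.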