Implementation:Run llama Llama index StructuredLLM
Overview
StructuredLLM is a wrapper around a standard LLM that constrains all outputs to conform to a specified Pydantic BaseModel output class. It delegates to the inner LLM's structured prediction methods, ensuring that all chat and completion responses contain JSON-serialized structured data matching the target schema.
Source file: llama-index-core/llama_index/core/llms/structured_llm.py (163 lines)
Class Hierarchy
LLM └── StructuredLLM
Configuration Fields
| Field | Type | Description |
|---|---|---|
llm |
SerializeAsAny[LLM] |
The inner LLM instance to wrap |
output_cls |
Type[BaseModel] |
The Pydantic model class defining the output structure (excluded from serialization) |
The output_cls field uses exclude=True, meaning it is not included when serializing the model.
Properties
metadata
@property
def metadata(self) -> LLMMetadata:
return self.llm.metadata
Delegates to the inner LLM's metadata, exposing the same model information.
Synchronous Methods
chat
@llm_chat_callback() def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
- Wraps the input messages in a
ChatPromptTemplate. - Calls
self.llm.structured_predictwith theoutput_clsand prompt. - Returns a
ChatResponsewith the assistant's content set to the JSON serialization of the output (model_dump_json()) andrawset to the structured output object.
stream_chat
@llm_chat_callback()
def stream_chat(
self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseGen:
- Wraps messages in a
ChatPromptTemplate. - Calls
self.llm.stream_structured_predict. - Yields
ChatResponseobjects for each partial output, with content serialized as JSON.
complete
@llm_completion_callback()
def complete(
self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
Uses chat_to_completion_decorator to convert the chat method into a completion-style call, then invokes it with the prompt.
stream_complete
@llm_completion_callback()
def stream_complete(
self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:
Raises NotImplementedError. Streaming completion is not supported.
Async Methods
achat
@llm_chat_callback()
async def achat(
self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponse:
Async counterpart of chat. Wraps messages in ChatPromptTemplate and calls self.llm.astructured_predict. Returns a ChatResponse with JSON-serialized content.
astream_chat
@llm_chat_callback()
async def astream_chat(
self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseAsyncGen:
Async streaming chat. Creates an inner async generator that:
- Wraps messages in
ChatPromptTemplate. - Calls
self.llm.astream_structured_predict. - Yields
ChatResponseobjects for each partial output.
Returns the async generator function.
acomplete
@llm_completion_callback()
async def acomplete(
self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
Uses achat_to_completion_decorator to convert achat into an async completion call.
astream_complete
@llm_completion_callback()
async def astream_complete(
self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:
Raises NotImplementedError. Async streaming completion is not supported.
Class Name
@classmethod
def class_name(cls) -> str:
return "structured_llm"
Key Design Decisions
- ChatPromptTemplate wrapping: Input messages are wrapped in a
ChatPromptTemplateeven when they have no template variables. This is done to maintain compatibility with theFunctionCallingProgramand other structured prediction infrastructure. - JSON serialization in content: The structured output is serialized to JSON and placed in the
contentfield of chat messages, while the raw Pydantic object is stored in therawfield. - Completion via chat: The
completeandacompletemethods are implemented as decorators over the chat methods, usingchat_to_completion_decoratorandachat_to_completion_decoratorrespectively.
Dependencies
llama_index.core.llms.llm.LLM-- parent class and inner LLM typellama_index.core.bridge.pydantic-- providesBaseModel,Field,SerializeAsAnyllama_index.core.base.llms.types-- provides all response types andLLMMetadatallama_index.core.llms.callbacks-- providesllm_chat_callbackandllm_completion_callbackllama_index.core.prompts.base.ChatPromptTemplate-- used to wrap messagesllama_index.core.base.llms.generic_utils-- provideschat_to_completion_decoratorandachat_to_completion_decorator