Implementation:Run llama Llama index StructuredLLM

Overview

StructuredLLM is a wrapper around a standard LLM that constrains all outputs to conform to a specified Pydantic BaseModel output class. It delegates to the inner LLM's structured prediction methods, ensuring that all chat and completion responses contain JSON-serialized structured data matching the target schema.

Source file: llama-index-core/llama_index/core/llms/structured_llm.py (163 lines)

Class Hierarchy

LLM
  └── StructuredLLM

Configuration Fields

Field	Type	Description
`llm`	`SerializeAsAny[LLM]`	The inner LLM instance to wrap
`output_cls`	`Type[BaseModel]`	The Pydantic model class defining the output structure (excluded from serialization)

The output_cls field is declared with exclude=True, so it is omitted whenever the StructuredLLM instance is serialized.

Properties

metadata

@property
def metadata(self) -> LLMMetadata:
    return self.llm.metadata

Delegates to the inner LLM's metadata, exposing the same model information.

Synchronous Methods

chat

@llm_chat_callback()
def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:

Wraps the input messages in a ChatPromptTemplate.
Calls self.llm.structured_predict with the output_cls and prompt.
Returns a ChatResponse with the assistant's content set to the JSON serialization of the output (model_dump_json()) and raw set to the structured output object.
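
The three steps above can be sketched without the library. This is a simplified illustration, not the actual implementation: Person, Message, Response, FakeInnerLLM and StructuredWrapper are hypothetical stand-ins, dataclasses plus json.dumps take the place of a Pydantic model and model_dump_json(), and the ChatPromptTemplate wrapping and callback decorator are omitted.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List, Type


@dataclass
class Person:
    """Hypothetical output class (the real code expects a Pydantic BaseModel)."""
    name: str
    age: int


@dataclass
class Message:
    """Simplified stand-in for ChatMessage."""
    role: str
    content: str


@dataclass
class Response:
    """Simplified stand-in for ChatResponse."""
    message: Message
    raw: Any = None


class FakeInnerLLM:
    """Hypothetical inner LLM that always returns a fixed structured object."""

    def structured_predict(self, output_cls: Type[Any], messages: List[Message]) -> Any:
        return output_cls(name="Ada", age=36)


class StructuredWrapper:
    """Simplified stand-in for the chat path of StructuredLLM."""

    def __init__(self, llm: FakeInnerLLM, output_cls: Type[Any]) -> None:
        self.llm = llm
        self.output_cls = output_cls

    def chat(self, messages: List[Message]) -> Response:
        # Delegate to the inner LLM's structured prediction.
        output = self.llm.structured_predict(self.output_cls, messages)
        # content carries the JSON text; raw carries the structured object itself.
        content = json.dumps(asdict(output))
        return Response(message=Message(role="assistant", content=content), raw=output)


wrapper = StructuredWrapper(FakeInnerLLM(), Person)
response = wrapper.chat([Message(role="user", content="Who is Ada?")])
parsed = json.loads(response.message.content)
```

A caller that needs typed access can read response.raw directly, while a caller that only handles text can parse response.message.content.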

stream_chat

@llm_chat_callback()
def stream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseGen:

Wraps messages in a ChatPromptTemplate.
Calls self.llm.stream_structured_predict.
Yields ChatResponse objects for each partial output, with content serialized as JSON.
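
A minimal sketch of the streaming shape, assuming each partial output is a progressively more complete object. Draft, fake_stream_structured_predict and stream_chat_sketch are hypothetical names, and json.dumps stands in for Pydantic serialization. Under that assumption, every yielded chunk is a complete JSON document for the current partial object rather than a text delta.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterator, Optional


@dataclass
class Draft:
    """Hypothetical partial output; unfilled fields stay None."""
    title: Optional[str] = None
    summary: Optional[str] = None


def fake_stream_structured_predict() -> Iterator[Draft]:
    """Hypothetical stand-in for the inner LLM's stream_structured_predict."""
    yield Draft(title="Report")
    yield Draft(title="Report", summary="All good")


def stream_chat_sketch() -> Iterator[str]:
    # Serialize each partial object on its own, one chunk per partial output.
    for partial in fake_stream_structured_predict():
        yield json.dumps(asdict(partial))


chunks = list(stream_chat_sketch())
```

With this shape, a consumer that only wants the final result can keep the last chunk and discard the earlier ones.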

complete

@llm_completion_callback()
def complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:

Uses chat_to_completion_decorator to convert the chat method into a completion-style call, then invokes it with the prompt.
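
The general pattern behind a chat-to-completion adapter can be sketched as follows. This is an illustration of the idea only: chat_to_completion_sketch, fake_chat and the three dataclasses are hypothetical, and the real chat_to_completion_decorator in generic_utils may differ in signature and detail.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Message:
    """Simplified stand-in for ChatMessage."""
    role: str
    content: str


@dataclass
class ChatResult:
    """Simplified stand-in for ChatResponse."""
    message: Message
    raw: Any = None


@dataclass
class CompletionResult:
    """Simplified stand-in for CompletionResponse."""
    text: str
    raw: Any = None


def chat_to_completion_sketch(
    chat_fn: Callable[[List[Message]], ChatResult],
) -> Callable[[str], CompletionResult]:
    """Adapt a chat-style callable into a prompt-in, text-out callable."""

    def wrapper(prompt: str) -> CompletionResult:
        # Turn the plain prompt into a single user message.
        chat_result = chat_fn([Message(role="user", content=prompt)])
        # Flatten the chat response into a completion response.
        return CompletionResult(text=chat_result.message.content, raw=chat_result.raw)

    return wrapper


def fake_chat(messages: List[Message]) -> ChatResult:
    """Hypothetical chat function that echoes the last message as JSON."""
    payload = {"echo": messages[-1].content}
    return ChatResult(message=Message(role="assistant", content=json.dumps(payload)), raw=payload)


complete = chat_to_completion_sketch(fake_chat)
result = complete("hello")
```

Because the adapter only reshapes inputs and outputs, the structured-output behaviour of chat carries over to the completion path unchanged.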

stream_complete

@llm_completion_callback()
def stream_complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:

Raises NotImplementedError. Streaming completion is not supported.

Async Methods

achat

@llm_chat_callback()
async def achat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponse:

Async counterpart of chat. Wraps messages in ChatPromptTemplate and calls self.llm.astructured_predict. Returns a ChatResponse with JSON-serialized content.

astream_chat

@llm_chat_callback()
async def astream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseAsyncGen:

Async streaming chat. Creates an inner async generator that:

Wraps messages in ChatPromptTemplate.
Calls self.llm.astream_structured_predict.
Yields ChatResponse objects for each partial output.

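Returns the async generator obtained by calling that inner function, so the caller awaits astream_chat once and then iterates over the result.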
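
The "coroutine that returns an async generator" shape described above can be sketched with the standard library alone. All names here (fake_astream_structured_predict, astream_chat_sketch, collect) are hypothetical, and the JSON strings are placeholders for serialized partial outputs.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_astream_structured_predict() -> AsyncIterator[str]:
    """Hypothetical stand-in: a coroutine that resolves to an async generator."""

    async def gen() -> AsyncIterator[str]:
        for partial in ('{"n": 1}', '{"n": 2}'):
            yield partial

    return gen()


async def astream_chat_sketch() -> AsyncIterator[str]:
    """Define an inner async generator and return the generator it creates."""

    async def gen() -> AsyncIterator[str]:
        # The inner stream is awaited lazily, on first iteration.
        stream = await fake_astream_structured_predict()
        async for partial in stream:
            yield partial

    return gen()


async def collect() -> List[str]:
    # Await once to obtain the generator, then iterate over it.
    stream = await astream_chat_sketch()
    return [chunk async for chunk in stream]


chunks = asyncio.run(collect())
```

Note the two-step consumption in collect: awaiting the method yields the generator, and async for drives it.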

acomplete

@llm_completion_callback()
async def acomplete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:

Uses achat_to_completion_decorator to convert achat into an async completion call.

astream_complete

@llm_completion_callback()
async def astream_complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:

Raises NotImplementedError. Async streaming completion is not supported.

Class Name

@classmethod
def class_name(cls) -> str:
    return "structured_llm"

Key Design Decisions

ChatPromptTemplate wrapping: Input messages are wrapped in a ChatPromptTemplate even when they have no template variables. This is done to maintain compatibility with the FunctionCallingProgram and other structured prediction infrastructure.
JSON serialization in content: The structured output is serialized to JSON and placed in the content field of chat messages, while the raw Pydantic object is stored in the raw field.
Completion via chat: The complete and acomplete methods do not call the inner LLM directly. They wrap chat and achat with chat_to_completion_decorator and achat_to_completion_decorator respectively, then invoke the wrapped function with the prompt.

Dependencies

llama_index.core.llms.llm.LLM -- parent class and inner LLM type
llama_index.core.bridge.pydantic -- provides BaseModel, Field, SerializeAsAny
llama_index.core.base.llms.types -- provides all response types and LLMMetadata
llama_index.core.llms.callbacks -- provides llm_chat_callback and llm_completion_callback
llama_index.core.prompts.base.ChatPromptTemplate -- used to wrap messages
llama_index.core.base.llms.generic_utils -- provides chat_to_completion_decorator and achat_to_completion_decorator
