Implementation:Run llama Llama index SelectionOutputParser

Overview

The SelectionOutputParser module provides an output parser for extracting structured choice-and-reason selections from LLM output. It parses JSON responses, falling back to a YAML parser when JSON decoding fails, into a list of Answer dataclass instances, each containing a numeric choice and a textual reason. This module is located at llama-index-core/llama_index/core/output_parsers/selection.py (104 lines).

Purpose

This parser is used in LlamaIndex's selection and routing workflows, where an LLM must choose one or more options from a list (such as selecting the most relevant index, data source, or tool) and provide reasoning for each choice. The parser ensures the LLM's JSON output is properly extracted, validated, and converted into structured Answer objects.

Constants and Helpers

FORMAT_STR

A string of format instructions that is appended to prompts and tells the LLM to respond with a JSON array of objects:

The output should be ONLY JSON formatted as a JSON instance.

Here is an example:
[
    {
        choice: 1,
        reason: "<insert reason for choice>"
    },
    ...
]

Helper Function: _escape_curly_braces

def _escape_curly_braces(input_string: str) -> str

Replaces { with {{ and } with }} to escape curly braces for safe use in format strings and prompt templates.
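
The escaping matters because the resulting prompt is typically passed through str.format() later, where unescaped braces would be read as placeholders. The following is a minimal sketch of that behavior, not the module's own code; the names escape_curly_braces, FORMAT_HINT, and the {query} placeholder are illustrative assumptions.

```python
def escape_curly_braces(input_string: str) -> str:
    """Double every brace so str.format() treats it as a literal."""
    return input_string.replace("{", "{{").replace("}", "}}")


# Illustrative format instructions containing literal braces.
FORMAT_HINT = """Here is an example:
[
    {
        choice: 1,
        reason: "<insert reason for choice>"
    }
]"""

# Hypothetical prompt template with one real placeholder, {query}.
template = "Question: {query}\n\n" + escape_curly_braces(FORMAT_HINT)

# Formatting fills {query} and collapses the doubled braces back to single ones.
rendered = template.format(query="Which index is most relevant?")
```

Without the escaping step, the same str.format() call would fail because the braces in the example would be interpreted as replacement fields.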

Key Components

Dataclass: Answer

A dataclass with JSON serialization support (via DataClassJsonMixin) representing a single selection.

Field	Type	Description
`choice`	`int`	The numeric index of the selected choice.
`reason`	`str`	The LLM's reasoning for making this choice.

Class: SelectionOutputParser

A concrete implementation of BaseOutputParser that parses LLM output into a list of Answer objects.

Class Attributes

Attribute	Value	Description
`REQUIRED_KEYS`	`frozenset(Answer.__annotations__)`	The set of required keys (`{"choice", "reason"}`) that each answer dictionary must contain.
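
As a rough illustration of how the required keys follow from the dataclass, the sketch below uses a stand-in Answer class built only on the standard library (the real class also mixes in DataClassJsonMixin). The candidate dictionary is made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    # Stand-in for the real class, which additionally inherits DataClassJsonMixin.
    choice: int
    reason: str


# Iterating the annotations dict yields the field names.
REQUIRED_KEYS = frozenset(Answer.__annotations__)

# Sample dictionary, as it might come out of the JSON parsing step.
candidate = {"choice": 2, "reason": "Closest match to the query."}

# A dictionary is acceptable when it contains every required key.
has_all_keys = REQUIRED_KEYS.issubset(candidate)
```

Adding a field to the dataclass would extend REQUIRED_KEYS without any further change.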

Methods

Method	Parameters	Return Type	Description
`_filter_dict`	`json_dict: dict`	`dict`	Recursively searches a nested dictionary structure to find one that contains all `REQUIRED_KEYS`. Handles cases where the LLM wraps the answer in nested structures.
`_format_output`	`output: List[dict]`	`List[dict]`	Validates each dictionary in the output list. If a dictionary is missing required keys, it applies `_filter_dict` to attempt extraction from nested structures.
`parse`	`output: str`	`StructuredOutput`	Main parsing method. Extracts JSON from the LLM output, validates and formats it, then converts to `Answer` objects. Returns a `StructuredOutput` containing both raw and parsed output.
`format`	`prompt_template: str`	`str`	Appends the escaped `FORMAT_STR` to the prompt template.
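
The interplay between _filter_dict and _format_output can be sketched as follows. This is a simplified approximation written as free functions, not the library's exact logic; the function names and the sample inputs are assumptions for illustration.

```python
from typing import List

REQUIRED_KEYS = frozenset({"choice", "reason"})


def filter_dict(json_dict: dict) -> dict:
    """Descend into nested dicts and lists looking for the required keys."""
    if REQUIRED_KEYS.issubset(json_dict):
        return json_dict
    for value in json_dict.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                found = filter_dict(child)
                if REQUIRED_KEYS.issubset(found):
                    return found
    # Nothing matched: hand the input back unchanged.
    return json_dict


def format_output(output: List[dict]) -> List[dict]:
    """Leave complete dicts alone; try to unwrap incomplete ones."""
    return [
        item if REQUIRED_KEYS.issubset(item) else filter_dict(item)
        for item in output
    ]


# Sample input: the first answer is wrapped in an extra level of nesting.
wrapped = [
    {"answer": {"choice": 1, "reason": "r"}},
    {"choice": 2, "reason": "s"},
]
cleaned = format_output(wrapped)
```

In this sketch a dictionary with no matching descendant is returned as-is, so the failure surfaces later, when the dictionary is converted to an Answer.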

Parsing Pipeline

The parse method follows this sequence:

1. JSON marshaling: The raw LLM output is passed through _marshal_llm_to_json() to extract a JSON string.
2. JSON parsing: The string is parsed with json.loads().
3. YAML fallback: If JSON parsing fails, the parser attempts yaml.safe_load() as a fallback. This handles cases where the LLM produces trailing commas or other minor syntax issues that YAML tolerates.
4. Type normalization: If the result is a single dictionary, it is wrapped in a list.
5. Validation: If the result is not a list, a ValueError is raised.
6. Format correction: _format_output() validates each dictionary and applies recursive filtering for missing keys.
7. Deserialization: Each validated dictionary is converted to an Answer instance via Answer.from_dict().
8. Result wrapping: Returns a StructuredOutput(raw_output=output, parsed_output=answers).
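
The sequence above can be approximated with the standard library alone. The sketch below makes several assumptions: extract_json_span is a hypothetical stand-in for _marshal_llm_to_json(), a plain ValueError stands in for OutputParserException, the format-correction step is omitted, and a list of Answer objects is returned directly instead of a StructuredOutput.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Answer:
    choice: int
    reason: str


def extract_json_span(text: str) -> str:
    """Hypothetical stand-in for _marshal_llm_to_json: keep the outermost [...] or {...}."""
    text = text.strip()
    left_square, left_brace = text.find("["), text.find("{")
    if left_square != -1 and (left_brace == -1 or left_square < left_brace):
        return text[left_square : text.rfind("]") + 1]
    return text[left_brace : text.rfind("}") + 1]


def parse_selection(output: str) -> List[Answer]:
    json_string = extract_json_span(output)
    try:
        parsed: Any = json.loads(json_string)
    except json.JSONDecodeError as json_error:
        try:
            import yaml  # optional dependency (PyYAML)
        except ImportError as exc:
            raise ImportError("Please pip install PyYAML.") from exc
        try:
            parsed = yaml.safe_load(json_string)
        except yaml.YAMLError as yaml_error:
            # The real parser raises OutputParserException at this point.
            raise ValueError(
                f"Invalid JSON: {json_error}; invalid YAML: {yaml_error}"
            ) from yaml_error

    if isinstance(parsed, dict):
        parsed = [parsed]  # a single answer becomes a one-item list
    if not isinstance(parsed, list):
        raise ValueError(f"Failed to convert output to JSON: {output!r}")

    # Nested-structure repair (the _format_output step) is omitted in this sketch.
    return [Answer(choice=item["choice"], reason=item["reason"]) for item in parsed]


answers = parse_selection('Sure! [{"choice": 1, "reason": "most relevant"}]')
```

Importing yaml only inside the fallback branch keeps PyYAML optional: well-formed JSON never touches it.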

Error Handling

Condition	Exception	Details
JSON parsing fails and YAML also fails	`OutputParserException`	Includes both the JSON and YAML error messages along with the problematic string.
YAML is not installed	`ImportError`	Prompts the user to install `PyYAML`.
Parsed result is not a list or dict	`ValueError`	Indicates the output could not be converted to the expected format.

Dependencies

Module	Items Imported and Purpose
`json`	Standard JSON parsing.
`dataclasses`	`dataclass` decorator for the `Answer` class.
`dataclasses_json`	`DataClassJsonMixin` for JSON serialization of the `Answer` dataclass.
`llama_index.core.output_parsers.base`	`OutputParserException`, `StructuredOutput`
`llama_index.core.output_parsers.utils`	`_marshal_llm_to_json` for extracting JSON from LLM text.
`llama_index.core.types`	`BaseOutputParser`
`yaml` (optional)	`yaml.safe_load` used as a fallback parser for malformed JSON.

Design Notes

The YAML fallback is a pragmatic choice: LLMs frequently emit trailing commas, which strict JSON rejects but YAML accepts.
The _filter_dict method handles the common LLM behavior of wrapping answers in extra levels of nesting.
The parser produces a StructuredOutput that preserves both the raw LLM text and the parsed Answer objects, allowing downstream consumers to access either form.
The REQUIRED_KEYS set is derived automatically from the Answer dataclass annotations, ensuring the validation stays in sync with the data model.
