Implementation:Run llama Llama index SelectionOutputParser
Overview
The SelectionOutputParser module provides an output parser for extracting structured choice-and-reason selections from LLM output. It parses JSON (or YAML) responses into a list of Answer dataclass instances, each containing a numeric choice and a textual reason. This module is located at llama-index-core/llama_index/core/output_parsers/selection.py (104 lines).
Purpose
This parser is used in LlamaIndex's selection and routing workflows, where an LLM must choose one or more options from a list (such as selecting the most relevant index, data source, or tool) and provide reasoning for each choice. The parser ensures the LLM's JSON output is properly extracted, validated, and converted into structured Answer objects.
Constants and Helpers
FORMAT_STR
A format string appended to prompts instructing the LLM to output JSON in a specific array-of-objects format:
The output should be ONLY JSON formatted as a JSON instance.
Here is an example:
[
{
choice: 1,
reason: "<insert reason for choice>"
},
...
]
Helper Function: _escape_curly_braces
def _escape_curly_braces(input_string: str) -> str
Replaces { with {{ and } with }} to escape curly braces for safe use in format strings and prompt templates.
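The effect of this escaping can be illustrated with a small sketch. The function below is a stand-in written for illustration (the public name escape_curly_braces and the sample strings are assumptions, not taken from the module); it shows why doubled braces survive a later str.format call.

```python
def escape_curly_braces(input_string: str) -> str:
    """Double every brace so str.format treats it as a literal character."""
    return input_string.replace("{", "{{").replace("}", "}}")


# Hypothetical prompt fragment containing literal JSON braces.
example = '{"choice": 1}'
escaped = escape_curly_braces(example)

# A template with one real placeholder plus the escaped JSON example.
template = "Question: {query}\n" + escaped

# Formatting fills {query} and collapses the doubled braces back to single ones.
rendered = template.format(query="Which index?")
```

Without the escaping step, str.format would try to interpret the JSON braces as placeholders and fail.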
Key Components
Dataclass: Answer
A dataclass with JSON serialization support (via DataClassJsonMixin) representing a single selection.
| Field | Type | Description |
|---|---|---|
| choice | int | The numeric index of the selected choice. |
| reason | str | The LLM's reasoning for making this choice. |
Class: SelectionOutputParser
A concrete implementation of BaseOutputParser that parses LLM output into a list of Answer objects.
Class Attributes
| Attribute | Value | Description |
|---|---|---|
| REQUIRED_KEYS | frozenset(Answer.__annotations__) | The set of required keys ({"choice", "reason"}) that each answer dictionary must contain. |
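As a rough illustration of how the required keys follow from the dataclass fields, the sketch below uses a plain standard-library dataclass as a stand-in for Answer. It omits DataClassJsonMixin so that it runs without third-party packages, and the candidate dictionary is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    # Stand-in for the documented dataclass; the JSON mixin is left out here.
    choice: int
    reason: str


# The annotations mapping is keyed by field name, so its keys form the required set.
REQUIRED_KEYS = frozenset(Answer.__annotations__)

# Hypothetical parsed dictionary, checked against the required keys.
candidate = {"choice": 2, "reason": "Most relevant to the query."}
has_all_keys = REQUIRED_KEYS.issubset(candidate)
```

Adding a field to the dataclass would extend the required set with no second list to maintain.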
Methods
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| _filter_dict | json_dict: dict | dict | Recursively searches a nested dictionary structure to find one that contains all REQUIRED_KEYS. Handles cases where the LLM wraps the answer in nested structures. |
| _format_output | output: List[dict] | List[dict] | Validates each dictionary in the output list. If a dictionary is missing required keys, it applies _filter_dict to attempt extraction from nested structures. |
| parse | output: str | StructuredOutput | Main parsing method. Extracts JSON from the LLM output, validates and formats it, then converts to Answer objects. Returns a StructuredOutput containing both raw and parsed output. |
| format | prompt_template: str | str | Appends the escaped FORMAT_STR to the prompt template. |
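The recursive filtering described for _filter_dict and _format_output can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's exact logic: the free functions filter_dict and format_output and the sample data are hypothetical, and this version returns the first nested dictionary that contains every required key.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = frozenset({"choice", "reason"})


def filter_dict(json_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Descend into nested dicts and lists until a dict has all required keys."""
    if REQUIRED_KEYS.issubset(json_dict):
        return json_dict
    for value in json_dict.values():
        # Treat a single nested value and a list of values uniformly.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                found = filter_dict(child)
                if REQUIRED_KEYS.issubset(found):
                    return found
    # Nothing matched: return the input unchanged so the caller can decide.
    return json_dict


def format_output(output: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep valid dicts as they are and try to unwrap the rest."""
    return [d if REQUIRED_KEYS.issubset(d) else filter_dict(d) for d in output]


# Hypothetical LLM output that wraps the answer in an extra level of nesting.
wrapped = [{"answer": {"choice": 1, "reason": "Covers the topic."}}]
cleaned = format_output(wrapped)
```

In this sketch a dictionary with no matching descendant is passed through unchanged, so a later deserialization step would be the place where the missing keys surface as an error.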
Parsing Pipeline
The parse method follows this sequence:
- JSON marshaling: The raw LLM output is passed through _marshal_llm_to_json() to extract a JSON string.
- JSON parsing: The string is parsed with json.loads().
- YAML fallback: If JSON parsing fails, the parser attempts yaml.safe_load() as a fallback. This handles cases where the LLM produces trailing commas or other minor syntax issues that YAML tolerates.
- Type normalization: If the result is a single dictionary, it is wrapped in a list.
- Validation: If the result is not a list, a ValueError is raised.
- Format correction: _format_output() validates each dictionary and applies recursive filtering for missing keys.
- Deserialization: Each validated dictionary is converted to an Answer instance via Answer.from_dict().
- Result wrapping: Returns a StructuredOutput(raw_output=output, parsed_output=answers).
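The core of the sequence above can be sketched in a few lines. This is a minimal approximation rather than the module's code: parse_selection is a hypothetical name, the marshaling and format-correction steps are skipped, a plain dataclass stands in for Answer, and ValueError is used where the real parser raises OutputParserException.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Answer:
    # Stand-in for the documented dataclass (no JSON mixin).
    choice: int
    reason: str


def parse_selection(json_string: str) -> List[Answer]:
    try:
        obj: Any = json.loads(json_string)
    except json.JSONDecodeError as e_json:
        # Fallback path: only reached when strict JSON parsing fails.
        try:
            import yaml  # optional dependency (PyYAML)
        except ImportError as exc:
            raise ImportError("Please pip install PyYAML.") from exc
        try:
            obj = yaml.safe_load(json_string)
        except yaml.YAMLError as e_yaml:
            # Assumption: ValueError replaces OutputParserException in this sketch.
            raise ValueError(f"Invalid JSON ({e_json}) and invalid YAML ({e_yaml})")

    # Type normalization: a single answer becomes a one-element list.
    if isinstance(obj, dict):
        obj = [obj]
    # Validation: anything other than a list cannot hold answers.
    if not isinstance(obj, list):
        raise ValueError(f"Failed to convert output to a list: {json_string!r}")
    # Deserialization into Answer instances.
    return [Answer(choice=d["choice"], reason=d["reason"]) for d in obj]


answers = parse_selection('{"choice": 2, "reason": "Best match."}')
```

Importing yaml lazily keeps PyYAML optional: callers whose LLM output is always strict JSON never touch the fallback.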
Error Handling
| Condition | Exception | Details |
|---|---|---|
| JSON parsing fails and YAML also fails | OutputParserException | Includes both the JSON and YAML error messages along with the problematic string. |
| YAML is not installed | ImportError | Prompts the user to install PyYAML. |
| Parsed result is not a list or dict | ValueError | Indicates the output could not be converted to the expected format. |
Dependencies
| Module | Items Imported |
|---|---|
| json | Standard JSON parsing. |
| dataclasses | dataclass decorator for the Answer class. |
| dataclasses_json | DataClassJsonMixin for JSON serialization of the Answer dataclass. |
| llama_index.core.output_parsers.base | OutputParserException, StructuredOutput |
| llama_index.core.output_parsers.utils | _marshal_llm_to_json for extracting JSON from LLM text. |
| llama_index.core.types | BaseOutputParser |
| yaml (optional) | yaml.safe_load used as a fallback parser for malformed JSON. |
Design Notes
- The YAML fallback is a pragmatic choice: LLMs frequently produce JSON with trailing commas, which are invalid in JSON but accepted by YAML.
- The _filter_dict method handles the common LLM behavior of wrapping answers in extra levels of nesting.
- The parser produces a StructuredOutput that preserves both the raw LLM text and the parsed Answer objects, allowing downstream consumers to access either form.
- The REQUIRED_KEYS set is derived automatically from the Answer dataclass annotations, ensuring the validation stays in sync with the data model.