Implementation:Run llama Llama index SelectionOutputParser

From Leeroopedia
Revision as of 11:48, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Run_llama_Llama_index_SelectionOutputParser.md)

Overview

The SelectionOutputParser module provides an output parser that extracts structured choice-and-reason selections from LLM output. It parses JSON responses (falling back to YAML when strict JSON parsing fails) into a list of Answer dataclass instances, each containing a numeric choice and a textual reason. The module is located at llama-index-core/llama_index/core/output_parsers/selection.py (104 lines).

Purpose

This parser is used in LlamaIndex's selection and routing workflows, where an LLM must choose one or more options from a list (such as selecting the most relevant index, data source, or tool) and provide reasoning for each choice. The parser ensures the LLM's JSON output is properly extracted, validated, and converted into structured Answer objects.

Constants and Helpers

FORMAT_STR

A format string appended to prompts instructing the LLM to output JSON in a specific array-of-objects format:

The output should be ONLY JSON formatted as a JSON instance.

Here is an example:
[
    {
        choice: 1,
        reason: "<insert reason for choice>"
    },
    ...
]

Helper Function: _escape_curly_braces

def _escape_curly_braces(input_string: str) -> str

Replaces { with {{ and } with }} to escape curly braces for safe use in format strings and prompt templates.
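The escaping matters because prompt templates are typically rendered with str.format(), which treats single braces as placeholders. The following is a minimal sketch of that behavior, not the library source; the function name without the leading underscore, the snippet text, and the {query} placeholder are illustrative assumptions.

```python
def escape_curly_braces(input_string: str) -> str:
    """Double every brace so that str.format() treats it as a literal character."""
    return input_string.replace("{", "{{").replace("}", "}}")


# Hypothetical fragment of format instructions that contains literal braces.
snippet = '{ choice: 1, reason: "..." }'
escaped = escape_curly_braces(snippet)

# The escaped text can sit in a template next to a real placeholder ({query} is assumed).
template = "Question: {query}\n" + escaped
rendered = template.format(query="Which index?")

print(escaped)
print(rendered)
```

Without the escaping step, the call to format() would try to resolve the braces in the snippet as placeholders and fail.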

Key Components

Dataclass: Answer

A dataclass with JSON serialization support (via DataClassJsonMixin) representing a single selection.

Field | Type | Description
choice | int | The numeric index of the selected choice.
reason | str | The LLM's reasoning for making this choice.

Class: SelectionOutputParser

A concrete implementation of BaseOutputParser that parses LLM output into a list of Answer objects.

Class Attributes

Attribute | Value | Description
REQUIRED_KEYS | frozenset(Answer.__annotations__) | The set of required keys ({"choice", "reason"}) that each answer dictionary must contain.
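How REQUIRED_KEYS follows from the dataclass can be sketched with the standard library alone. This is an illustration, not the module's code: the Answer class below is a plain dataclass without DataClassJsonMixin, and the candidate dictionary is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    # Simplified stand-in: the real class also mixes in DataClassJsonMixin.
    choice: int
    reason: str


# Iterating over the annotations mapping yields the field names.
REQUIRED_KEYS = frozenset(Answer.__annotations__)

# Hypothetical dictionary as it might come out of the JSON parsing step.
candidate = {"choice": 2, "reason": "Most relevant to the query"}

# A dict is iterated by key, so issubset() checks that every required key is present.
has_all_keys = REQUIRED_KEYS.issubset(candidate)
answer = Answer(**candidate)

print(sorted(REQUIRED_KEYS), has_all_keys, answer)
```

Adding a field to the dataclass would extend the required key set without any further change, which is the point of deriving it from the annotations.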

Methods

Method | Parameters | Return Type | Description
_filter_dict | json_dict: dict | dict | Recursively searches a nested dictionary structure for a dictionary that contains all REQUIRED_KEYS. Handles cases where the LLM wraps the answer in nested structures.
_format_output | output: List[dict] | List[dict] | Validates each dictionary in the output list. If a dictionary is missing required keys, applies _filter_dict to attempt extraction from nested structures.
parse | output: str | StructuredOutput | Main parsing method. Extracts JSON from the LLM output, validates and formats it, then converts it to Answer objects. Returns a StructuredOutput containing both raw and parsed output.
format | prompt_template: str | str | Appends the escaped FORMAT_STR to the prompt template.
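The interplay between _format_output and _filter_dict can be illustrated with a small stand-alone sketch. It is one possible way to implement the described behavior, not the library's code; the function names find_answer_dict and format_output and the sample data are hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = frozenset({"choice", "reason"})


def find_answer_dict(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first nested dict holding all required keys, else obj unchanged."""
    if REQUIRED_KEYS.issubset(obj):
        return obj
    for value in obj.values():
        # Treat a single nested value and a list of nested values the same way.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                found = find_answer_dict(child)
                if REQUIRED_KEYS.issubset(found):
                    return found
    return obj


def format_output(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Unwrap every item that hides its answer inside extra nesting."""
    return [find_answer_dict(item) for item in items]


# Hypothetical LLM output: the first item is wrapped in an extra "answer" level.
wrapped = [
    {"answer": {"choice": 1, "reason": "Covers the topic"}},
    {"choice": 3, "reason": "Has recent data"},
]
cleaned = format_output(wrapped)
print(cleaned)
```

In this sketch, a dictionary that never yields the required keys is passed through unchanged, so the failure would surface later when the dictionary is converted to an Answer.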

Parsing Pipeline

The parse method follows this sequence:

  1. JSON marshaling: The raw LLM output is passed through _marshal_llm_to_json() to extract a JSON string.
  2. JSON parsing: The string is parsed with json.loads().
  3. YAML fallback: If json.loads() raises a decode error, the parser retries with yaml.safe_load(). This covers near-JSON output, such as arrays or objects with trailing commas, that strict JSON rejects but YAML accepts.
  4. Type normalization: If the result is a single dictionary, it is wrapped in a list.
  5. Validation: If the result is not a list, a ValueError is raised.
  6. Format correction: _format_output() validates each dictionary and applies recursive filtering for missing keys.
  7. Deserialization: Each validated dictionary is converted to an Answer instance via Answer.from_dict().
  8. Result wrapping: Returns a StructuredOutput(raw_output=output, parsed_output=answers).
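The steps above can be sketched with the standard library. This is a simplified illustration under stated assumptions, not the actual parse method: extract_json_text is a hypothetical stand-in for _marshal_llm_to_json, ParsedSelection stands in for StructuredOutput, and the YAML fallback (step 3) and format correction (step 6) are left out for brevity.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Answer:
    choice: int
    reason: str


@dataclass
class ParsedSelection:
    # Hypothetical stand-in for StructuredOutput.
    raw_output: str
    parsed_output: List[Answer]


def extract_json_text(text: str) -> str:
    """Assumed extraction rule: slice from the first '[' or '{' to the last matching closer."""
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        return text
    start = min(starts)
    closer = "]" if text[start] == "[" else "}"
    return text[start : text.rfind(closer) + 1]


def parse_selection(output: str) -> ParsedSelection:
    json_obj: Any = json.loads(extract_json_text(output))  # steps 1 and 2
    if isinstance(json_obj, dict):  # step 4: wrap a single answer in a list
        json_obj = [json_obj]
    if not isinstance(json_obj, list):  # step 5: reject anything else
        raise ValueError(f"Failed to convert output to JSON: {output!r}")
    answers = [Answer(choice=d["choice"], reason=d["reason"]) for d in json_obj]  # step 7
    return ParsedSelection(raw_output=output, parsed_output=answers)  # step 8


llm_text = 'Sure, here is my pick:\n{"choice": 2, "reason": "Best match"}'
result = parse_selection(llm_text)
print(result.parsed_output)
```

Keeping raw_output next to parsed_output mirrors the description in step 8: a caller can log or re-inspect the original text even after parsing succeeds.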

Error Handling

Condition | Exception | Details
JSON parsing fails and YAML also fails | OutputParserException | Includes both the JSON and YAML error messages along with the problematic string.
YAML is not installed | ImportError | Prompts the user to install PyYAML.
Parsed result is not a list or dict | ValueError | Indicates the output could not be converted to the expected format.
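The first two rows of the table can be sketched as a nested try/except. This is a hedged illustration rather than the module's code: ParseFailure is a hypothetical stand-in for OutputParserException, the function name is made up, and the message wording is an assumption.

```python
import json
from typing import Any


class ParseFailure(Exception):
    """Hypothetical stand-in for OutputParserException."""


def load_json_with_yaml_fallback(json_string: str) -> Any:
    try:
        return json.loads(json_string)
    except json.JSONDecodeError as e_json:
        # PyYAML is only needed once strict JSON parsing has failed.
        try:
            import yaml
        except ImportError as exc:
            raise ImportError("Please pip install PyYAML.") from exc
        try:
            return yaml.safe_load(json_string)
        except yaml.YAMLError as e_yaml:
            # Report both errors plus the offending string, as the table describes.
            raise ParseFailure(
                f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
                f"Got JSON string: {json_string}"
            ) from e_yaml


# Valid JSON never reaches the YAML branch, so this call works without PyYAML.
parsed = load_json_with_yaml_fallback('[{"choice": 1, "reason": "ok"}]')
print(parsed)
```

An input with a trailing comma, such as '[{"choice": 1, "reason": "ok"},]', would take the YAML branch and therefore requires PyYAML to be installed.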

Dependencies

Module | Items Imported
json | Standard JSON parsing.
dataclasses | dataclass decorator for the Answer class.
dataclasses_json | DataClassJsonMixin for JSON serialization of the Answer dataclass.
llama_index.core.output_parsers.base | OutputParserException, StructuredOutput
llama_index.core.output_parsers.utils | _marshal_llm_to_json for extracting JSON from LLM text.
llama_index.core.types | BaseOutputParser
yaml (optional) | yaml.safe_load used as a fallback parser for malformed JSON.

Design Notes

  • The YAML fallback is a pragmatic choice: LLMs sometimes produce JSON with trailing commas, which is invalid JSON but valid YAML.
  • The _filter_dict method handles the common LLM behavior of wrapping answers in extra levels of nesting.
  • The parser produces a StructuredOutput that preserves both the raw LLM text and the parsed Answer objects, allowing downstream consumers to access either form.
  • The REQUIRED_KEYS set is derived automatically from the Answer dataclass annotations, ensuring the validation stays in sync with the data model.
