Principle:Spcl Graph of thoughts Document Merging Response Parsing

Knowledge Sources	Graph of Thoughts, Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Domains	Response_Parsing, Document_Merging
Related Implementations	Implementation:Spcl_Graph_of_thoughts_DocMergeParser
Last Updated	2026-02-14

Overview

Domain-specific parsing pattern for extracting merged documents and computing F1 quality scores from XML-tagged LLM responses.

Description

The Document Merging Response Parsing principle defines how raw LLM text output is transformed into structured thought state dictionaries containing merged NDA text and numeric quality scores. It is among the more involved parsing patterns in the GoT framework because it must handle both free-form document text (extracted from <Merged> tags) and numeric scores (extracted from <Redundancy> and <Retained> tags), and then combine those scores into a composite F1 metric.

Core Parsing Strategy: strip_answer_helper

A central helper method provides flexible tag-based extraction:

Strip whitespace from the response.
If "Output:" is present, take everything after it.
If a tag name is provided, locate the last occurrence of <tag> and </tag>.
Extract the content between those positions.
Handle partial matches gracefully: if only the start tag is found, return everything after it; if only the end tag is found, return everything before it; if neither is found, return the full text with a warning.
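The steps above can be sketched as a standalone function. This is a minimal illustration rather than the framework's actual method: the signature, the warning messages, and the choice to split on the first occurrence of "Output:" are assumptions, since the description does not specify them.

```python
import logging


def strip_answer_helper(text: str, tag: str = "") -> str:
    """Sketch of tag-based extraction; signature and messages are assumed."""
    text = text.strip()

    # Assumption: split on the first "Output:" marker, keep what follows.
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()

    if tag:
        open_tag, close_tag = f"<{tag}>", f"</{tag}>"
        start = text.rfind(open_tag)   # last opening tag
        end = text.rfind(close_tag)    # last closing tag

        if start != -1 and end != -1:
            text = text[start + len(open_tag):end].strip()
        elif start != -1:
            logging.warning("Only found %s; returning everything after it", open_tag)
            text = text[start + len(open_tag):].strip()
        elif end != -1:
            logging.warning("Only found %s; returning everything before it", close_tag)
            text = text[:end].strip()
        else:
            logging.warning("Found neither %s nor %s; returning full text", open_tag, close_tag)

    return text
```

Because the last occurrence of each tag is used, a response that repeats the tag pair (for example, an echoed example followed by the real answer) yields the content of the final pair.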

Parsing by Operation Type

Generate Parsing (parse_generate_answer): Extracts the merged NDA text from between the <Merged> and </Merged> tags. Each LLM response yields one new thought state, with the extracted text stored under the current key. This case is straightforward because all generate responses follow the same tag format.
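As a rough sketch of the one-state-per-response behaviour, the function below copies the incoming state and overwrites its current entry. The extract_merged helper is a simplified, regex-based stand-in written for this example (it is not the framework's helper), and the function signature is an assumption.

```python
import re


def extract_merged(text: str) -> str:
    # Simplified stand-in: take the last complete <Merged>...</Merged> block,
    # or fall back to the whole text if no complete block exists.
    blocks = re.findall(r"<Merged>(.*?)</Merged>", text, re.DOTALL)
    return blocks[-1].strip() if blocks else text.strip()


def parse_generate_answer(state: dict, texts: list[str]) -> list[dict]:
    new_states = []
    for text in texts:
        new_state = state.copy()            # shallow copy; original state untouched
        new_state["current"] = extract_merged(text)
        new_states.append(new_state)
    return new_states
```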

Aggregation Parsing (parse_aggregation_answer): Two modes based on the parts field:

Subpart aggregation (parts is non-empty and smaller than the document count): Extracts merged text, then computes the union of parts sets from all input states. This tracks which original documents are now represented in the merged result.
Full aggregation (parts covers all documents): Simply extracts the merged text without modifying parts.
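The two aggregation modes can be illustrated as follows. This sketch assumes the merged text has already been extracted from each response, that the first input state serves as the template for new states, and that each state carries documents, parts, and current keys; the function name aggregate_states is hypothetical.

```python
def aggregate_states(states: list[dict], merged_texts: list[str]) -> list[dict]:
    """Build one new state per merged text (hypothetical helper)."""
    base = states[0]  # assumption: first input state is the template
    is_subpart = 0 < len(base["parts"]) < len(base["documents"])

    new_states = []
    for merged in merged_texts:
        new_state = base.copy()
        new_state["current"] = merged
        if is_subpart:
            # Union of the document indices covered by every input state.
            covered = set()
            for s in states:
                covered |= set(s["parts"])
            new_state["parts"] = covered
        new_states.append(new_state)
    return new_states
```

For example, merging a state covering documents {0, 1} with one covering {2} produces a state whose parts is {0, 1, 2}, whereas states that already cover every document keep their parts unchanged.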

Score Parsing (parse_score_answer): Reduces the numeric ratings from all scoring responses to a single F1 value:

For each LLM response text, extract the content within <Redundancy> tags and search for numeric values using regex (\d+\.?\d*).
Similarly extract from <Retained> tags.
If multiple numbers are found within a tag, use the last one (with a warning).
Compute the mean redundancy score and mean retained score across all response texts.
Combine into an F1 score: F1 = 2 * mean_redundancy * mean_retain / (mean_redundancy + mean_retain).
Return the single F1 score in a list. If no valid scores are found, return [0.0].

This F1 computation balances both quality dimensions: as the harmonic mean of the two mean scores, it is high only when both are high, so a good result requires both low redundancy and high information retention.
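The scoring steps can be sketched end to end as below. Several details are assumptions made for the example: tag content is located with a simplified regex rather than the helper described earlier, the two score lists are collected independently of each other, and a zero denominator returns [0.0] (the description does not say how that case is handled).

```python
import logging
import re
from statistics import fmean

NUMBER = re.compile(r"\d+\.?\d*")  # pattern quoted in the description


def last_number_in_tag(text: str, tag: str):
    """Return the last number inside the last <tag>...</tag> block, or None."""
    blocks = re.findall(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    if not blocks:
        return None
    numbers = NUMBER.findall(blocks[-1])
    if not numbers:
        return None
    if len(numbers) > 1:
        logging.warning("Multiple numbers in <%s>; using the last one", tag)
    return float(numbers[-1])


def parse_score_answer(texts: list[str]) -> list[float]:
    redundancy, retained = [], []
    for text in texts:
        r = last_number_in_tag(text, "Redundancy")
        k = last_number_in_tag(text, "Retained")
        if r is not None:
            redundancy.append(r)
        if k is not None:
            retained.append(k)

    if not redundancy or not retained:
        return [0.0]

    mean_redundancy = fmean(redundancy)
    mean_retain = fmean(retained)
    denominator = mean_redundancy + mean_retain
    if denominator == 0:
        return [0.0]  # assumption: avoid division by zero
    return [2 * mean_redundancy * mean_retain / denominator]
```

With two responses scoring redundancy 8 and 10 and retained 6 and 8, the means are 9 and 7, giving F1 = 2 * 9 * 7 / 16 = 7.875.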

Error Handling

Missing tags: The helper logs warnings and returns the full text if no matching tags are found.
Partial tags: If only the opening or closing tag is found, it extracts what it can and logs the discrepancy.
Missing scores: If no numeric value is found in a score tag, the response is ignored. If all responses lack scores, the method returns [0.0].
Multiple scores: When multiple numbers appear within a single tag, the last one is used with a warning log.

Related Pages

Implementation:Spcl_Graph_of_thoughts_DocMergeParser -- Concrete Python class implementing this principle
Principle:Spcl_Graph_of_thoughts_Document_Merging_Prompt_Design -- Companion prompt design principle
Workflow:Spcl_Graph_of_thoughts_GoT_Document_Merging_Pipeline -- End-to-end workflow using this parsing
