Principle:Spcl Graph of thoughts Document Merging Prompt Design

From Leeroopedia
Revision as of 17:16, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Spcl_Graph_of_thoughts_Document_Merging_Prompt_Design.md)
Knowledge Sources
Domains: Prompt_Engineering, Document_Merging
Related Implementations: Implementation:Spcl_Graph_of_thoughts_DocMergePrompter
Last Updated: 2026-02-14

Overview

A domain-specific prompt engineering pattern for LLM-based document merging, in which each merged result is scored on redundancy and information retention.

Description

The Document Merging Prompt Design principle defines how prompts are constructed to guide an LLM through merging multiple Non-Disclosure Agreement (NDA) documents into a single consolidated NDA. Unlike the sorting and keyword counting examples, this domain has no objectively correct answer: quality is measured as a trade-off between information retention and redundancy reduction. As a result, it is the only example in the GoT framework that relies on LLM-based scoring rather than programmatic evaluation.

Prompt Strategy by Reasoning Approach

Input-Output (IO): The simplest approach uses merge_doc_prompt_start + merge_doc_prompt_block to present all NDA documents (tagged as <Doc1> through <DocN>) and instruct the LLM to produce a merged NDA between <Merged> and </Merged> tags.
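The assembly of an IO prompt can be sketched as follows. This is a minimal illustration, not the implementation's code: the template strings MERGE_START and MERGE_BLOCK and the function build_merge_prompt are hypothetical stand-ins, and the actual wording of merge_doc_prompt_start and merge_doc_prompt_block may differ.

```python
from typing import List

# Hypothetical template text; the real merge_doc_prompt_start and
# merge_doc_prompt_block strings in the implementation may differ.
MERGE_START = (
    "Merge the following {num} NDA documents <Doc1> - <Doc{num}> into a single NDA.\n"
    "Output the merged NDA between the tags <Merged> and </Merged>.\n"
)
MERGE_BLOCK = "<Doc{num}>\n{document}\n</Doc{num}>\n"


def build_merge_prompt(documents: List[str]) -> str:
    """Assemble one start section followed by one tagged block per document."""
    prompt = MERGE_START.format(num=len(documents))
    # Documents are numbered from 1 so the tags run <Doc1> through <DocN>.
    for i, doc in enumerate(documents, start=1):
        prompt += MERGE_BLOCK.format(num=i, document=doc)
    return prompt
```

Because the document count is substituted at call time, the same pair of templates serves any number of input NDAs.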

Chain-of-Thought (CoT): Uses merge_doc_prompt_cot_start, which adds explicit step-by-step guidance:

  1. Split each NDA into logical subparts.
  2. Merge the subparts across all NDAs.
  3. Combine merged subparts into a single NDA.
  4. Place the result between <Merged> tags.

The model is free to generate intermediate reasoning text.

Tree-of-Thought (ToT): Uses the IO merge prompt for the initial attempt, then applies improve_summary_prompt_start + improve_summary_prompt_block + improve_summary_prompt_end to iteratively refine the result. The improvement prompt presents both the original documents and the current summary, asking for better information coverage and less redundancy.
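One way to picture the refinement stage is a loop that keeps a candidate only when it scores higher than the current best. The sketch below is an assumption-laden simplification: refine_merge, improve, and score are hypothetical names, the callables stand in for LLM calls driven by the improvement and scoring prompts, and the real ToT configuration may branch into several candidates per level instead of refining greedily.

```python
from typing import Callable, List, Tuple


def refine_merge(
    documents: List[str],
    initial: str,
    improve: Callable[[List[str], str], str],
    score: Callable[[List[str], str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Greedy refinement: keep a new summary only if it scores higher."""
    best, best_score = initial, score(documents, initial)
    for _ in range(rounds):
        # The improvement step sees both the originals and the current summary.
        candidate = improve(documents, best)
        candidate_score = score(documents, candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Passing the original documents into both callables mirrors the prompt design described above, where the improvement prompt shows the source NDAs alongside the current summary.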

Graph-of-Thought (GoT - full): Generates multiple initial merge candidates, scores them with the LLM, aggregates the best ones using aggregate_full_prompt_base + blocks that show both the original documents and the summary NDAs to combine, then refines with improvement prompts.

Graph-of-Thought 2 (GoT2 - partial): Splits documents into pairs (e.g., Doc1+Doc2, Doc3+Doc4), merges each pair independently using the standard merge prompt with a parts subset, then aggregates partial merges using aggregate_sub_prompt_base + aggregate_sub_prompt_generate which presents only the summary NDAs (not the originals) for combination.
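The pairing step can be sketched as below. The function name split_into_pairs, the zero-based indices, and the handling of an odd document count (the last document forms a group of one) are assumptions made for illustration, not details confirmed by the implementation.

```python
from typing import List, Set, Tuple


def split_into_pairs(documents: List[str]) -> List[Tuple[Set[int], List[str]]]:
    """Group consecutive documents into pairs, recording their indices as a parts set."""
    groups = []
    for start in range(0, len(documents), 2):
        # Zero-based indices of the documents covered by this partial merge.
        parts = set(range(start, min(start + 2, len(documents))))
        groups.append((parts, [documents[i] for i in sorted(parts)]))
    return groups
```

Each group can then be merged on its own with the standard merge prompt, and the resulting partial merges are what the subpart aggregation prompt combines.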

Unique Feature: LLM-Based Scoring

The document merging example is the only GoT task that uses LLM-based scoring. The score_prompt method generates a prompt that asks the LLM to evaluate a merged NDA along two dimensions:

  • Redundancy (0-10): A score of 10 means no information is redundant; 0 means at least half of the information is repeated.
  • Retained Information (0-10): A score of 10 means all original information is preserved; 0 means none is.

The LLM wraps each score in its own XML tag pair, <Redundancy> and <Retained>. The parser then combines the two values into a single F1 score, that is, their harmonic mean.
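A minimal sketch of the parsing step follows, assuming each score appears as a number inside a matching open and close tag. The helper names _tag_value and f1_from_response are hypothetical, and returning 0.0 for a missing tag or an all-zero response is an assumption rather than confirmed behavior of the actual parser.

```python
import re
from typing import Optional


def _tag_value(text: str, tag: str) -> Optional[float]:
    """Return the first number found inside <tag>...</tag>, or None."""
    match = re.search(rf"<{tag}>\s*(\d+(?:\.\d+)?)\s*</{tag}>", text)
    return float(match.group(1)) if match else None


def f1_from_response(text: str) -> float:
    """Combine the two scores into their harmonic mean (F1)."""
    redundancy = _tag_value(text, "Redundancy")
    retained = _tag_value(text, "Retained")
    # Assumption: a missing tag or two zero scores yields 0.0.
    if redundancy is None or retained is None or redundancy + retained == 0:
        return 0.0
    return 2 * redundancy * retained / (redundancy + retained)
```

The harmonic mean penalizes imbalance: a merge that retains everything but is highly redundant, or one that is concise but drops content, scores lower than one that does moderately well on both.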

Prompt Template Inventory

| Template | Purpose | Used By |
|---|---|---|
| merge_doc_prompt_start + merge_doc_prompt_block | Direct NDA merge | IO, ToT (initial), GoT (initial), GoT2 (initial) |
| merge_doc_prompt_cot_start | CoT-style merge with step-by-step approach | CoT |
| improve_summary_prompt_start + _block + _end | Refine an existing merged NDA | ToT (refinement), GoT/GoT2 (refinement) |
| score_prompt_base + score_prompt_block + score_prompt_end | LLM-based quality scoring | All methods (scoring phase) |
| aggregate_full_prompt_base + _block1 + _mid + _block2 | Combine summaries with original docs visible | GoT (aggregation at full level) |
| aggregate_sub_prompt_base + _generate | Combine partial summaries (no originals shown) | GoT2 (subpart aggregation) |

Key Design Decisions

  • XML tag structure: All merged output uses <Merged>/</Merged> tags; scoring uses <Redundancy> and <Retained> tags. This enables deterministic parsing.
  • Document numbering: Templates dynamically number documents based on list length, supporting variable numbers of NDAs.
  • Two-level aggregation: GoT2 distinguishes between subpart aggregation (only summaries visible) and full aggregation (original docs also visible), using different prompt templates for each.
  • Parts tracking: A parts set in the thought state tracks which original document indices have been processed, enabling the prompter to present only relevant documents.
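The deterministic parsing that the tag structure enables can be illustrated with a short sketch. The function extract_merged is a hypothetical helper, and taking the last complete tag pair (so that earlier reasoning text in a CoT response is skipped) is an assumption, not a documented rule of the actual parser.

```python
import re
from typing import Optional


def extract_merged(text: str) -> Optional[str]:
    """Return the content of the last complete <Merged>...</Merged> pair, or None."""
    # re.DOTALL lets '.' match newlines, since a merged NDA spans many lines.
    matches = re.findall(r"<Merged>(.*?)</Merged>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None
```

Returning None when no tag pair is present lets the caller decide whether to retry the request or discard the response.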
