Workflow:Spcl Graph of thoughts Custom GoT Use Case Integration

From Leeroopedia
Knowledge Sources
Domains: LLM_Reasoning, Graph_Based_Inference, Framework_Extension
Last Updated: 2026-02-14 04:00 GMT

Overview

End-to-end process for integrating a new problem domain into the Graph of Thoughts framework by implementing custom Prompter and Parser classes, designing an appropriate Graph of Operations topology, and wiring everything into the Controller for execution.

Description

This workflow describes how to add a new use case to the GoT framework. The framework separates concerns into four components: the language model interface, the operation graph topology, the domain-specific prompt generation (Prompter), and the domain-specific response parsing (Parser). Adding a new problem requires implementing concrete subclasses of the abstract Prompter and Parser, designing a graph topology that matches the problem structure (linear for simple tasks, branching for decompose-and-merge patterns), and creating a scoring function to evaluate intermediate and final results. Optionally, a new language model backend can be added by subclassing AbstractLanguageModel.

Usage

Execute this workflow when you want to apply Graph of Thoughts reasoning to a new problem domain that is not already covered by the existing examples (sorting, keyword counting, document merging, set intersection). The problem should be decomposable into sub-problems or benefit from generating and evaluating multiple solution candidates.

Execution Steps

Step 1: Define the Problem Interface

Specify the problem by defining four things: the input format (what data the LLM receives), the output format (what the LLM should produce), the evaluation criteria (how to judge correctness or quality), and whether the problem can be decomposed into independent sub-problems. Then determine which reasoning approaches are appropriate: input-output (IO), chain-of-thought (CoT), tree-of-thoughts (ToT), Graph of Thoughts (GoT), or a combination.

Key considerations:

  • Problems with verifiable correctness can use local scoring functions (faster, cheaper)
  • Problems with subjective quality require LLM-based scoring (more expensive)
  • Decomposable problems benefit most from the full GoT approach
  • Define the initial thought state dictionary with all required fields
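
As an illustration of the last point, the sketch below builds an initial thought state for a hypothetical task (deduplicate and sort a list of integers). The field names `original`, `current`, `ground_truth`, `phase`, and `method` are assumptions chosen for this example, not names required by the framework.

```python
from typing import Any, Dict, List


def make_initial_state(items: List[int], method: str) -> Dict[str, Any]:
    """Build the root thought state for a hypothetical deduplication task."""
    return {
        "original": items,                   # input the LLM receives
        "current": "",                       # latest LLM answer, empty at the start
        "ground_truth": sorted(set(items)),  # known answer, enables local scoring
        "phase": 0,                          # lets Prompter/Parser branch per stage
        "method": method,                    # e.g. "io", "cot", "tot", "got"
    }


state = make_initial_state([3, 1, 3, 2, 1], method="got")
print(state["ground_truth"])
```

Because this task has a verifiable answer, the state can carry the ground truth directly and scoring can stay local.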

Step 2: Implement the Custom Prompter

Create a concrete subclass of the abstract Prompter class, implementing all five abstract methods: generate_prompt, aggregation_prompt, improve_prompt, validation_prompt, and score_prompt. Each method receives thought state dictionaries and returns a prompt string. Design few-shot examples that demonstrate the expected input-output format for each operation type.

What happens:

  • generate_prompt: Creates the main task prompt (may vary by method and phase)
  • aggregation_prompt: Creates prompts for merging several intermediate results (commonly two)
  • improve_prompt: Creates prompts for refining an incorrect or suboptimal result
  • validation_prompt: Creates prompts for checking result correctness
  • score_prompt: Creates prompts for LLM-based quality evaluation (if needed)
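  • Methods that your graph topology never invokes must still be defined, because they are abstract, but their bodies can simply return None or consist of a single pass statement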
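
A minimal sketch of such a Prompter for a hypothetical deduplicate-and-sort task follows. The local `Prompter` base class is a stand-in so the example runs without the library; its method signatures, and the idea that thought state fields arrive as keyword arguments, are assumptions to check against the framework's own abstract class. The templates and the `DedupPrompter` name are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Prompter(ABC):
    """Stand-in for the framework's abstract Prompter; signatures are assumed."""

    @abstractmethod
    def generate_prompt(self, num_branches: int, **kwargs) -> Optional[str]: ...

    @abstractmethod
    def aggregation_prompt(self, state_dicts: List[Dict], **kwargs) -> Optional[str]: ...

    @abstractmethod
    def improve_prompt(self, **kwargs) -> Optional[str]: ...

    @abstractmethod
    def validation_prompt(self, **kwargs) -> Optional[str]: ...

    @abstractmethod
    def score_prompt(self, state_dicts: List[Dict], **kwargs) -> Optional[str]: ...


class DedupPrompter(Prompter):
    """Prompts for a hypothetical task: deduplicate and sort a list of integers."""

    generate_template = (
        "<Instruction> Remove duplicates from the list and sort it in ascending "
        "order. Output only the resulting list. </Instruction>\n"
        "<Example>\nInput: [2, 1, 2]\nOutput: [1, 2]\n</Example>\n"
        "Input: {input}\nOutput: "
    )
    aggregate_template = (
        "<Instruction> Merge the two sorted, duplicate-free lists into one sorted, "
        "duplicate-free list. Output only the resulting list. </Instruction>\n"
        "List 1: {first}\nList 2: {second}\nOutput: "
    )
    improve_template = (
        "<Instruction> The answer below still contains errors. Fix it so that it "
        "is the sorted, duplicate-free version of the input. </Instruction>\n"
        "Input: {input}\nIncorrect answer: {current}\nOutput: "
    )

    def generate_prompt(self, num_branches, original=None, **kwargs):
        return self.generate_template.format(input=original)

    def aggregation_prompt(self, state_dicts, **kwargs):
        first, second = state_dicts[0]["current"], state_dicts[1]["current"]
        return self.aggregate_template.format(first=first, second=second)

    def improve_prompt(self, original=None, current=None, **kwargs):
        return self.improve_template.format(input=original, current=current)

    def validation_prompt(self, **kwargs):
        return None  # not used: validation is done locally in this sketch

    def score_prompt(self, state_dicts, **kwargs):
        return None  # not used: scoring is done locally in this sketch


prompter = DedupPrompter()
print(prompter.generate_prompt(1, original=[3, 1, 3]))
```

Keeping the templates as class attributes makes the few-shot examples easy to review and edit separately from the formatting logic.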

Step 3: Implement the Custom Parser

Create a concrete subclass of the abstract Parser class, implementing all five abstract methods: parse_generate_answer, parse_aggregation_answer, parse_improve_answer, parse_validation_answer, and parse_score_answer. Each method receives thought states and LLM response texts, and returns updated thought state dictionaries. Handle malformed LLM responses gracefully with fallback defaults.

Key considerations:

  • The parser must extract structured data from free-text LLM responses
  • Use robust parsing (regex, JSON parsing with error handling, tag extraction)
  • Thought state updates must include all fields needed by subsequent operations
  • Phase tracking in the thought state allows different parsing logic per stage
  • Cache repeated parsing results if the same response may be processed multiple times
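
The considerations above can be sketched for one of the five methods. The example assumes a hypothetical task whose answer is a list of integers, and assumes that `parse_generate_answer` receives the current state plus a list of response texts. `DedupParser` and `extract_list` are illustrative names; a real subclass would implement the remaining four methods as well.

```python
import json
import re
from typing import Dict, List

# Matches a bracketed span without nested brackets, e.g. "[1, 2, 3]".
LIST_PATTERN = re.compile(r"\[[^\[\]]*\]")


def extract_list(text: str) -> List[int]:
    """Return the last JSON-style integer list in `text`, or [] if none parses."""
    for candidate in reversed(LIST_PATTERN.findall(text)):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # malformed candidate, try an earlier one
        if all(isinstance(x, int) for x in parsed):
            return parsed
    return []  # fallback default for unusable responses


class DedupParser:
    """Sketch of one Parser method for a hypothetical deduplication task."""

    def parse_generate_answer(self, state: Dict, texts: List[str]) -> List[Dict]:
        new_states = []
        for text in texts:
            new_state = state.copy()  # keep every field later operations need
            new_state["current"] = extract_list(text)
            new_state["phase"] = state.get("phase", 0) + 1
            new_states.append(new_state)
        return new_states


parser = DedupParser()
responses = ["Sure! Output: [1, 3]", "I cannot do that."]
results = parser.parse_generate_answer({"original": [3, 1, 3], "phase": 0}, responses)
print([r["current"] for r in results])
```

Returning an empty list for an unparseable response keeps the pipeline running and lets the scoring step penalize the bad thought instead of raising an exception mid-graph.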

Step 4: Design the Graph of Operations Topology

Construct a GraphOfOperations instance from the nine available operation types: Generate, Score, Aggregate, KeepBestN, KeepValid, ValidateAndImprove, Improve, GroundTruth, and Selector. Choose between a linear graph (built with append_operation for sequential pipelines) and a branching graph (built with add_operation and explicit predecessor relationships for parallel branches).

What happens:

  • Linear topology (IO/CoT): Generate → Score → GroundTruth
  • Tree topology (ToT): Generate(N) → Score → KeepBestN → repeated levels
  • Graph topology (GoT): Generate(split) → Selector branches → parallel processing → Aggregate → refine
  • Each operation is instantiated with its own parameters (for example, branch and response counts for Generate, or a scoring function for Score)
  • Predecessors are linked explicitly for branching; append_operation handles linear chains
  • The Selector operation filters thoughts by custom predicates (e.g., sublist ID)
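
To make the difference between linear chaining and explicit predecessor links concrete, the sketch below builds a split, solve, and merge topology with two parallel branches. `Op` and `Graph` are simplified stand-ins written for this example, not the framework's classes; only the method names `append_operation` and `add_operation` are taken from the text above, and their behavior here is an assumption.

```python
from typing import List


class Op:
    """Stand-in for an operation: a name plus predecessor/successor links."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.predecessors: List["Op"] = []
        self.successors: List["Op"] = []

    def add_predecessor(self, op: "Op") -> None:
        self.predecessors.append(op)
        op.successors.append(self)


class Graph:
    """Stand-in graph that tracks roots and leaves while it is built."""

    def __init__(self) -> None:
        self.operations: List[Op] = []
        self.roots: List[Op] = []
        self.leaves: List[Op] = []

    def append_operation(self, op: Op) -> None:
        """Linear chaining: attach `op` after every current leaf."""
        self.operations.append(op)
        if not self.roots:
            self.roots = [op]
        else:
            for leaf in self.leaves:
                op.add_predecessor(leaf)
        self.leaves = [op]

    def add_operation(self, op: Op) -> None:
        """Branching: register `op` whose predecessors were linked explicitly."""
        self.operations.append(op)
        if not op.predecessors:
            self.roots.append(op)
        for pred in op.predecessors:
            if pred in self.leaves:
                self.leaves.remove(pred)
        if not op.successors:
            self.leaves.append(op)


graph = Graph()
split = Op("Generate(split)")
graph.append_operation(split)

merge = Op("Aggregate")
for i in range(2):  # two parallel branches, one per sub-problem
    select = Op(f"Selector(part={i})")
    select.add_predecessor(split)
    graph.add_operation(select)
    solve = Op(f"Generate(solve {i})")
    solve.add_predecessor(select)
    graph.add_operation(solve)
    merge.add_predecessor(solve)

graph.add_operation(merge)
graph.append_operation(Op("GroundTruth"))  # back to linear chaining
print([op.name for op in graph.operations])
```

The branches are linked by hand because each Selector must hang off the same split operation, while the final GroundTruth check can be appended because it simply follows whatever the current leaf is.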

Step 5: Create the Scoring Function

Implement a scoring function that evaluates thought states. For problems with ground truth, create a local Python function that compares the current result to the correct answer and returns an error count (lower is better). For problems without ground truth, use the score_prompt in the Prompter to have the LLM evaluate quality.

Key considerations:

  • Local scoring functions receive a thought state dict and return a float
  • The Score operation can use either local functions or LLM-based scoring
  • For decomposed problems, the scoring function should also work on partial results, scoring each sub-problem against its own portion of the input rather than against the full problem
  • GroundTruth operations compare the final result to a known correct answer
  • ValidateAndImprove operations use a boolean validation function
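
A local scoring function and a matching boolean validator might look like the sketch below, again for a hypothetical deduplicate-and-sort task. The state fields `original` and `current` and the particular error definition are assumptions made for this example.

```python
from typing import Dict, List


def num_errors(state: Dict) -> float:
    """Count missing, unexpected, duplicated, and out-of-order items (lower is better)."""
    expected: List[int] = sorted(set(state["original"]))
    current: List[int] = state.get("current") or []
    missing = len(set(expected) - set(current))
    unexpected = len(set(current) - set(expected))
    duplicates = len(current) - len(set(current))
    unsorted_pairs = sum(1 for a, b in zip(current, current[1:]) if a > b)
    return float(missing + unexpected + duplicates + unsorted_pairs)


def is_correct(state: Dict) -> bool:
    """Boolean check of the kind a validation or ground-truth step could use."""
    return num_errors(state) == 0.0


good = {"original": [3, 1, 3, 2], "current": [1, 2, 3]}
bad = {"original": [3, 1, 3, 2], "current": [1, 3, 3]}
print(num_errors(good), num_errors(bad))
```

Because the expected answer is computed from `state["original"]`, the same function also scores a sub-problem correctly as long as each sub-problem's state carries its own slice of the input in that field.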

Step 6: Wire into Controller and Execute

Create the Controller with the language model, Graph of Operations, custom Prompter, custom Parser, and initial problem parameters. The initial parameters dictionary should contain all fields referenced by the Prompter and Parser, plus method-specific metadata. Call Controller.run() to execute and Controller.output_graph() to serialize results.

What happens:

  • The Controller takes ownership of all components
  • Initial problem parameters become the state of the root Thought
  • Controller.run() executes operations in BFS order based on predecessor completion
  • Controller.get_final_thoughts() retrieves results from leaf operations
  • Controller.output_graph() writes the complete execution trace to JSON
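
The scheduling rule described above, breadth-first execution in which an operation runs only once all of its predecessors have finished, can be sketched independently of the framework. This is an illustration of the idea, not the Controller's actual code; the operation names and the `execution_order` helper are hypothetical.

```python
from collections import deque
from typing import Dict, List


def execution_order(predecessors: Dict[str, List[str]]) -> List[str]:
    """Return a BFS order in which each operation follows all its predecessors."""
    successors: Dict[str, List[str]] = {name: [] for name in predecessors}
    for name, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(name)

    # Start with the roots: operations that have no predecessors.
    queue = deque(name for name, preds in predecessors.items() if not preds)
    done: List[str] = []
    while queue:
        current = queue.popleft()
        done.append(current)  # stands in for executing the operation
        for succ in successors[current]:
            ready = all(p in done for p in predecessors[succ])
            if ready and succ not in queue and succ not in done:
                queue.append(succ)
    return done


topology = {
    "split": [],
    "select_0": ["split"],
    "select_1": ["split"],
    "solve_0": ["select_0"],
    "solve_1": ["select_1"],
    "aggregate": ["solve_0", "solve_1"],
    "ground_truth": ["aggregate"],
}
order = execution_order(topology)
print(order)
```

In this topology `aggregate` is enqueued only after the second branch finishes, which is why both `solve` steps appear before it in the result.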

Step 7: Add Dataset and Benchmarking Infrastructure

Create a dataset (CSV or JSON) with input samples and ground truth (if available). Implement a run() function that iterates over samples and methods, managing budget and result serialization. Optionally, create a plotting script to visualize results across different reasoning approaches.

Key considerations:

  • Follow the existing example pattern: data loading, results directory creation, config saving, logging
  • Budget management prevents runaway API costs
  • Each method-sample pair produces a separate result JSON file
  • Dataset generators can create synthetic benchmarks with known answers
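
A budget-aware benchmarking loop along these lines could be sketched as follows. The `run` signature, the `Solver` type, and `fake_solver` are assumptions for illustration; in a real setup the solver would build the graph, Prompter, Parser, and Controller for one sample and report the cost of that run.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A solver takes one sample and returns (result dict, cost of that call).
Solver = Callable[[Dict], Tuple[Dict, float]]


def run(samples: List[Dict], methods: Dict[str, Solver],
        budget: float, results_dir: Path) -> float:
    """Run each method on each sample until the budget is spent; return the spend."""
    spent = 0.0
    for name, solve in methods.items():
        method_dir = results_dir / name
        method_dir.mkdir(parents=True, exist_ok=True)
        for sample in samples:
            if spent >= budget:
                return spent  # stop before starting another paid call
            result, cost = solve(sample)
            spent += cost
            # One result file per method-sample pair.
            with open(method_dir / f"{sample['id']}.json", "w") as f:
                json.dump(result, f)
    return spent


def fake_solver(sample: Dict) -> Tuple[Dict, float]:
    """Placeholder for a real Controller run; pretends each call costs 0.5."""
    return {"id": sample["id"], "answer": sorted(set(sample["input"]))}, 0.5


samples = [{"id": 0, "input": [3, 1, 3]}, {"id": 1, "input": [2, 2]}]
out_dir = Path(tempfile.mkdtemp())
spent = run(samples, {"io": fake_solver, "got": fake_solver},
            budget=1.0, results_dir=out_dir)
print(spent, sorted(p.name for p in (out_dir / "io").glob("*.json")))
```

Checking the budget before each call, rather than after, means the loop can overshoot by at most the cost of one run.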
