
Workflow:SPCL Graph of Thoughts (GoT) Sorting Pipeline

From Leeroopedia
Knowledge Sources
Domains LLM_Reasoning, Graph_Based_Inference, Benchmarking
Last Updated 2026-02-14 04:00 GMT

Overview

End-to-end process for sorting lists of numbers using Graph of Thoughts (GoT), which decomposes the input list into sublists, sorts them independently, and merges the results with scoring and refinement.

Description

This workflow demonstrates the core GoT pattern applied to sorting: a divide-and-conquer approach where an LLM splits a list into sublists, sorts each sublist independently (generating multiple candidates and keeping the best), then merges the sorted sublists back together. The workflow compares five distinct reasoning approaches (IO, CoT, ToT, ToT2, GoT) on the same problem to benchmark their accuracy and cost. The sorting task supports 32, 64, and 128 element lists, with the GoT graph topology scaling accordingly (more sublists and deeper merge trees for larger inputs).

Usage

Execute this workflow when you want to benchmark or demonstrate Graph of Thoughts on a well-defined algorithmic task. It is appropriate when you have a list of integers (0-9) to sort and want to compare how different LLM reasoning strategies (Input-Output, Chain-of-Thought, two Tree-of-Thoughts configurations, and Graph of Thoughts) perform in terms of correctness and API cost.

Execution Steps

Step 1: Environment Setup and Data Loading

Configure the language model by loading API credentials and model settings from a JSON configuration file. Load the benchmark dataset from a CSV file containing unsorted lists paired with their correct sorted forms.

Key considerations:

  • The config file must contain model ID, token costs, temperature, and API key
  • Supports both OpenAI ChatGPT and local LLaMA-2 models via HuggingFace
  • Budget tracking is built in to prevent runaway API costs
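The configuration step can be sketched as follows. This is a minimal illustration, not the repository's actual loader: the JSON shape, the `chatgpt` section name, and the `load_model_config` helper are all assumptions mirroring the fields listed above (model ID, token costs, temperature, API key).

```python
import json

# Hypothetical config shape, mirroring the fields the workflow expects.
CONFIG_JSON = """
{
  "chatgpt": {
    "model_id": "gpt-4",
    "prompt_token_cost": 0.03,
    "response_token_cost": 0.06,
    "temperature": 1.0,
    "api_key": "sk-..."
  }
}
"""

def load_model_config(raw: str, model_name: str) -> dict:
    """Parse the JSON config and return the section for one model,
    failing fast if a required field is absent."""
    cfg = json.loads(raw)[model_name]
    required = {"model_id", "prompt_token_cost", "response_token_cost",
                "temperature", "api_key"}
    missing = required - cfg.keys()
    if missing:
        raise KeyError(f"config missing fields: {missing}")
    return cfg

cfg = load_model_config(CONFIG_JSON, "chatgpt")
```

Validating the config up front keeps a missing API key or cost field from surfacing halfway through a paid benchmark run.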

Step 2: Graph of Operations Construction

Build the Graph of Operations (GoO) that defines the execution plan. For the GoT approach, this involves creating a directed acyclic graph of operations: a Generate operation to split the input list into sublists, followed by parallel branches where each sublist is selected, sorted (with multiple candidates), scored, and filtered to the best result.

What happens:

  • A Generate node produces sublist splits (e.g., 2 sublists for 32 elements)
  • Selector operations route each sublist to its own processing branch
  • Each branch: Generate (5 candidates) → Score (count errors) → KeepBestN (1)
  • Branches converge at an Aggregate operation that merges sorted sublists
  • A final Score → KeepBestN → Generate (improve) → Score → KeepBestN chain refines the result
  • GroundTruth evaluates correctness against the known answer
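The bullets above describe a DAG. A minimal sketch of its shape for the 32-element case (assuming two sublists) is shown below; the operation names are illustrative labels only, not the library's actual operation classes, and each entry maps an operation to its predecessors.

```python
# Illustrative Graph of Operations for a 32-element list split into
# 2 sublists. Keys are hypothetical labels; values are predecessors.
goo = {
    "split":        [],                    # Generate: split into 2 sublists
    "select_a":     ["split"],             # Selector: route sublist A
    "select_b":     ["split"],             # Selector: route sublist B
    "sort_a":       ["select_a"],          # Generate: 5 sort candidates
    "sort_b":       ["select_b"],
    "score_a":      ["sort_a"],            # Score: count sorting errors
    "score_b":      ["sort_b"],
    "keep_a":       ["score_a"],           # KeepBestN(1)
    "keep_b":       ["score_b"],
    "merge":        ["keep_a", "keep_b"],  # Aggregate: merge sorted sublists
    "score_merge":  ["merge"],
    "keep_merge":   ["score_merge"],
    "improve":      ["keep_merge"],        # Generate: refinement prompt
    "score_final":  ["improve"],
    "keep_final":   ["score_final"],
    "ground_truth": ["keep_final"],        # compare to known answer
}

# The graph has exactly one root (the initial split).
roots = [op for op, preds in goo.items() if not preds]
```

For 64- or 128-element inputs the same pattern repeats with more parallel branches and additional merge levels before the final refinement chain.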

Step 3: Prompter and Parser Initialization

Instantiate the domain-specific Prompter and Parser. The SortingPrompter generates different prompt templates depending on the reasoning method (split prompts for GoT, sort-in-one-shot for IO, chain-of-thought decomposition for CoT, and improvement prompts for refinement). The SortingParser extracts sorted lists from LLM responses, handles JSON parsing for split results, and manages thought state transitions between phases.

Key considerations:

  • Prompts use few-shot examples to guide the LLM
  • The parser must handle malformed LLM responses gracefully (empty lists, missing brackets)
  • Phase tracking in the thought state controls which prompt template is used
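The defensive parsing described above can be sketched like this. The `parse_sorted_list` helper is hypothetical, not the SortingParser's real code; it only illustrates the pattern of extracting the last bracketed list from free-form LLM output and degrading to an empty list on malformed responses.

```python
import re

def parse_sorted_list(response: str) -> list:
    """Extract the last bracketed list of integers from an LLM response.
    Returns [] when nothing parseable is found (graceful degradation)."""
    matches = re.findall(r"\[((?:\s*\d+\s*,?)+)\]", response)
    if not matches:
        return []
    try:
        return [int(x) for x in matches[-1].split(",") if x.strip()]
    except ValueError:
        return []
```

Taking the last match is a common heuristic because chain-of-thought responses often restate the input list before emitting the final answer.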

Step 4: Controller Execution

Create the Controller with the language model, Graph of Operations, Prompter, Parser, and initial problem state. The Controller executes operations in BFS order, only running an operation when all its predecessors are complete. Each operation interacts with the LLM through the Prompter (to generate queries) and Parser (to interpret responses), updating Thought objects that carry state through the graph.

What happens:

  • The execution queue starts with root operations (those with no predecessors)
  • Each operation calls the LLM via the Prompter, parses responses via the Parser
  • Thought states flow from operation to operation, accumulating results
  • Scoring operations use a local error-counting function (no LLM call needed)
  • KeepBestN filters thoughts by score, pruning poor candidates
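The scheduling rule above (run an operation only once all predecessors are complete) amounts to a topological BFS. A minimal sketch, not the library's actual Controller, using the same predecessor-map representation as before:

```python
from collections import deque

def run_bfs(graph):
    """graph: {op_name: [predecessor_names]}. Returns an execution order
    in which every op runs only after all of its predecessors."""
    succs = {n: [] for n in graph}
    indeg = {}
    for n, preds in graph.items():
        indeg[n] = len(preds)
        for p in preds:
            succs[p].append(n)
    queue = deque(n for n, d in indeg.items() if d == 0)  # root operations
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)            # in the real Controller: call LLM, parse
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:      # all predecessors done: ready to run
                queue.append(s)
    return order

# Hypothetical mini-graph: a split feeding two sort branches that merge.
goo = {"split": [], "sort_a": ["split"], "sort_b": ["split"],
       "merge": ["sort_a", "sort_b"]}
order = run_bfs(goo)
```

In the real workflow each dequeued operation would invoke the Prompter and Parser and attach new Thought objects before its successors become ready.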

Step 5: Result Collection and Output

After all operations execute, serialize the complete execution graph to JSON. The output captures every operation, its resulting thought states, scores, validation results, and ground truth comparisons. Token usage and cost are appended for budget analysis.

Key considerations:

  • Each sample produces a separate JSON file per reasoning method
  • Results include both the solution and the execution trace for analysis
  • The cost field tracks cumulative prompt and completion tokens
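The serialized output might look roughly like the sketch below. The record shape and field names are assumptions for illustration; the repository's actual JSON schema may differ. The idea is the same: one list of per-operation entries, with a trailing cost entry computed from token counts.

```python
import json

# Illustrative trace of two executed operations (shape is hypothetical).
executed = [
    {"operation": "Generate", "thoughts": [{"state": [1, 3, 2], "score": 1}]},
    {"operation": "KeepBestN", "thoughts": [{"state": [1, 2, 3], "score": 0}]},
]

def serialize_run(operations, prompt_tokens, completion_tokens,
                  prompt_cost=0.03e-3, completion_cost=0.06e-3):
    """Dump the execution trace plus a trailing cost entry as one JSON doc.
    Default per-token prices are placeholders ($/1K tokens / 1000)."""
    record = list(operations)
    record.append({
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost": prompt_tokens * prompt_cost
              + completion_tokens * completion_cost,
    })
    return json.dumps(record, indent=2)

doc = serialize_run(executed, prompt_tokens=1200, completion_tokens=400)
```

Keeping the full trace (not just the final answer) is what makes post-hoc analysis of each method's intermediate thoughts possible.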

Step 6: Benchmark Comparison and Plotting

Run all five approaches (IO, CoT, ToT, ToT2, GoT) across the full sample set and aggregate results. A separate plotting script reads the result JSON files and generates comparative visualizations showing accuracy vs. cost tradeoffs across methods.

Key considerations:

  • Budget is shared across all methods and samples; execution stops when depleted
  • The plot script lives separately and processes archived result directories
  • GoT typically achieves higher accuracy than linear approaches at moderate cost
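The aggregation step can be sketched as follows. The error metric here is one plausible choice (adjacent inversions plus per-value count mismatches over the digit range 0-9), not necessarily the repository's exact scoring function, and `summarize` is a hypothetical helper.

```python
def num_errors(candidate, ground_truth):
    """Count sorting errors locally, no LLM call: adjacent inversions
    plus per-value count mismatches against the ground truth.
    Assumes list elements are integers in 0-9."""
    inversions = sum(1 for a, b in zip(candidate, candidate[1:]) if a > b)
    mismatch = sum(abs(candidate.count(v) - ground_truth.count(v))
                   for v in range(10))
    return inversions + mismatch

def summarize(results):
    """results: {method: [(errors, cost), ...]}.
    Returns {method: (mean_errors, total_cost)} for plotting."""
    return {m: (sum(e for e, _ in rs) / len(rs), sum(c for _, c in rs))
            for m, rs in results.items()}
```

Plotting mean errors against total cost per method yields the accuracy-vs-cost tradeoff curves described above.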

Execution Diagram

GitHub URL

Workflow Repository