Implementation: Diagram of Thought Node Edge Status Regex Extraction
| Knowledge Sources | |
|---|---|
| Domains | Parsing, Trace_Extraction, Protocol_Design |
| Last Updated | 2026-02-14 |
Overview
Concrete pattern for extracting @node, @edge, and @status typed records from Diagram of Thought (DoT) output using regular expressions. This is a pattern doc -- there is no library to install. Users implement this parser themselves in their own codebase.
Description
Three regular expression patterns extract structured records from the raw DoT text stream:
- Node pattern (@node id=(\d+) role=(\w+)): Captures the integer node identifier and the role string.
- Edge pattern (@edge src=(\d+) dst=(\d+) kind=(\w+)): Captures the source node identifier, destination node identifier, and the relationship kind.
- Status pattern (@status target=(\d+) mark=(\w+)): Captures the target node identifier and the validation mark.
Each pattern matches exactly one record type. The patterns rely on the fixed field ordering, the single-space separators, and the @-prefixed syntax of the DoT serialization protocol, which makes accidental matches against natural language text in the same stream unlikely. The patterns are not anchored to line boundaries, so a record quoted verbatim inside prose would still match.
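If prose in the stream quotes a record verbatim, the unanchored patterns will pick it up. The sketch below compares the node pattern with a line-anchored variant. The name NODE_LINE and the rule that a record must occupy a whole line are assumptions made for illustration, not part of the DoT README.

```python
import re

# Hypothetical stricter variant: the record must fill an entire line,
# with optional leading/trailing spaces or tabs. re.MULTILINE makes
# ^ and $ match at every line boundary instead of only at the string ends.
NODE_LINE = re.compile(r"^[ \t]*@node id=(\d+) role=(\w+)[ \t]*$", re.MULTILINE)

text = (
    "The model might write '@node id=9 role=critic' inside a sentence.\n"
    "@node id=1 role=problem\n"
)

# Unanchored pattern from this page: matches anywhere in the text.
loose = re.findall(r"@node id=(\d+) role=(\w+)", text)

# Anchored variant: only whole-line records are accepted.
strict = NODE_LINE.findall(text)

print(loose)   # [('9', 'critic'), ('1', 'problem')]
print(strict)  # [('1', 'problem')]
```

Whether the stricter form is appropriate depends on how reliably the model emits each record on its own line.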
Usage
Apply this extraction pattern after capturing raw DoT output from the LLM and before DAG reconstruction. The parsed records are the input to any downstream step that operates on the reasoning graph structure -- building an adjacency list, checking acyclicity, filtering validated nodes, or rendering a visualization.
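Two of the downstream steps named above, building an adjacency list and checking acyclicity, can be sketched as follows. The function names are hypothetical, edges are assumed to be (src, dst, kind) tuples with integer ids, and the acyclicity check uses Kahn's algorithm as one reasonable choice rather than anything prescribed by the DoT protocol.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each source id to the list of destination ids it points to."""
    adjacency = defaultdict(list)
    for src, dst, _kind in edges:
        adjacency[src].append(dst)
    return dict(adjacency)


def is_acyclic(node_ids, edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    by repeatedly taking nodes with no remaining incoming edges."""
    adjacency = build_adjacency(edges)
    indegree = {node_id: 0 for node_id in node_ids}
    for src, dst, _kind in edges:
        indegree.setdefault(src, 0)  # tolerate edges that mention undeclared nodes
        indegree[dst] = indegree.get(dst, 0) + 1

    queue = deque(n for n, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        current = queue.popleft()
        removed += 1
        for neighbour in adjacency.get(current, []):
            indegree[neighbour] -= 1
            if indegree[neighbour] == 0:
                queue.append(neighbour)
    return removed == len(indegree)


edges = [(1, 2, "use"), (2, 3, "critique")]
print(build_adjacency(edges))                            # {1: [2], 2: [3]}
print(is_acyclic([1, 2, 3], edges))                      # True
print(is_acyclic([1, 2, 3], edges + [(3, 1, "refine")])) # False
```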
Code Reference
Source Location
- Repository: Diagram of Thought
- File: README.md
- Lines: L68-72 (typed record format), L116-124 (structural view with parsing)
Signature
import re

NODE_PATTERN = r"@node id=(\d+) role=(\w+)"
EDGE_PATTERN = r"@edge src=(\d+) dst=(\d+) kind=(\w+)"
STATUS_PATTERN = r"@status target=(\d+) mark=(\w+)"

def parse_dot_records(raw_output: str):
    """Extract all typed records from raw DoT output text."""
    nodes = re.findall(NODE_PATTERN, raw_output)
    edges = re.findall(EDGE_PATTERN, raw_output)
    statuses = re.findall(STATUS_PATTERN, raw_output)
    return nodes, edges, statuses
Import
import re
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| raw_output | str | Yes | Raw DoT output text containing interleaved natural language and typed records |
Outputs
| Name | Type | Description |
|---|---|---|
| nodes | list[tuple[str, str]] | List of extracted nodes, each as (id, role). re.findall returns the id as a digit string; convert it with int() where an integer is needed. Roles: problem, proposer, critic, summarizer |
| edges | list[tuple[str, str, str]] | List of extracted edges, each as (src, dst, kind), with src and dst as digit strings. Kinds: use, critique, refine |
| statuses | list[tuple[str, str]] | List of extracted statuses, each as (target, mark), with target as a digit string. Marks: validated, invalidated |
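The patterns use \w+ for role, kind, and mark, so they accept any word, not only the values listed in the table. A minimal sketch of a post-parse check against those vocabularies follows. The function name check_records and the message format are assumptions for illustration, and ids are compared as-is, so the check works with either digit strings or integers as long as they are used consistently.

```python
ALLOWED_ROLES = {"problem", "proposer", "critic", "summarizer"}
ALLOWED_KINDS = {"use", "critique", "refine"}
ALLOWED_MARKS = {"validated", "invalidated"}


def check_records(nodes, edges, statuses):
    """Return a list of problem descriptions; an empty list means no issue was found."""
    problems = []
    known_ids = {node_id for node_id, _role in nodes}

    for node_id, role in nodes:
        if role not in ALLOWED_ROLES:
            problems.append(f"node {node_id}: unknown role {role!r}")

    for src, dst, kind in edges:
        if kind not in ALLOWED_KINDS:
            problems.append(f"edge {src}->{dst}: unknown kind {kind!r}")
        for endpoint in (src, dst):
            if endpoint not in known_ids:
                problems.append(f"edge {src}->{dst}: undeclared node {endpoint}")

    for target, mark in statuses:
        if mark not in ALLOWED_MARKS:
            problems.append(f"status {target}: unknown mark {mark!r}")
        if target not in known_ids:
            problems.append(f"status {target}: undeclared node {target}")

    return problems


# Raw tuples in the shape re.findall produces (ids are digit strings).
nodes = [("1", "problem"), ("2", "proposer")]
edges = [("1", "2", "use"), ("2", "7", "supports")]
statuses = [("2", "validated")]

issues = check_records(nodes, edges, statuses)
for issue in issues:
    print(issue)
# edge 2->7: unknown kind 'supports'
# edge 2->7: undeclared node 7
```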
Usage Examples
Complete Python Parser
import re
from dataclasses import dataclass

@dataclass
class Node:
    id: int
    role: str  # problem | proposer | critic | summarizer

@dataclass
class Edge:
    src: int
    dst: int
    kind: str  # use | critique | refine

@dataclass
class Status:
    target: int
    mark: str  # validated | invalidated

def parse_dot_records(raw_output: str):
    """Extract all typed records from raw DoT output and return structured objects."""
    nodes = [
        Node(int(m[0]), m[1])
        for m in re.findall(r"@node id=(\d+) role=(\w+)", raw_output)
    ]
    edges = [
        Edge(int(m[0]), int(m[1]), m[2])
        for m in re.findall(r"@edge src=(\d+) dst=(\d+) kind=(\w+)", raw_output)
    ]
    statuses = [
        Status(int(m[0]), m[1])
        for m in re.findall(r"@status target=(\d+) mark=(\w+)", raw_output)
    ]
    return nodes, edges, statuses
Example Invocation
raw = """
<problem>
How many times does the letter "r" appear in the word "strawberry"?
@node id=1 role=problem
<proposer>
Let me step through each letter: s-t-r-a-w-b-e-r-r-y. I count 3 occurrences of "r".
@node id=2 role=proposer
@edge src=1 dst=2 kind=use
<critic>
Checking the count: position 3 is "r", position 8 is "r", position 9 is "r". Confirmed 3.
@node id=3 role=critic
@edge src=2 dst=3 kind=critique
@status target=2 mark=validated
"""
nodes, edges, statuses = parse_dot_records(raw)
# nodes: [Node(id=1, role='problem'), Node(id=2, role='proposer'), Node(id=3, role='critic')]
# edges: [Edge(src=1, dst=2, kind='use'), Edge(src=2, dst=3, kind='critique')]
# statuses: [Status(target=2, mark='validated')]
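Once records are parsed, filtering validated nodes is a short step. The sketch below works on plain (id, role) and (target, mark) tuples so that it stands alone. The helper names are hypothetical, and the rule that a later @status record for the same target overrides an earlier one is an assumption, since this page does not say how repeated marks should be resolved.

```python
def final_marks(statuses):
    """Map each target id to its last reported mark.

    Assumption: when a target receives several marks, the latest one wins.
    """
    marks = {}
    for target, mark in statuses:
        marks[target] = mark
    return marks


def validated_ids(nodes, statuses):
    """Return the ids of nodes whose final mark is 'validated', in node order."""
    marks = final_marks(statuses)
    return [node_id for node_id, _role in nodes if marks.get(node_id) == "validated"]


# Same records as the strawberry example, written as plain tuples.
nodes = [(1, "problem"), (2, "proposer"), (3, "critic")]
statuses = [(2, "validated")]

print(validated_ids(nodes, statuses))  # [2]
```

Nodes with no @status record at all are treated as not validated here; a different policy may suit problem or critic nodes, which the example trace never marks.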