Principle: Diagram of Thought Raw Output Collection
Overview
Raw Output Collection is the principle of capturing the complete structured reasoning trace produced by an iterative Diagram of Thought reasoning process, preserving all XML role blocks and typed records as a single text stream for downstream extraction and analysis.
Description
The first step in trace extraction is capturing the raw LLM response. This text stream contains the entire reasoning DAG encoded as interleaved XML-tagged blocks (<proposer>, <critic>, <summarizer>) with typed records (@node, @edge, @status). Faithful capture is essential for downstream parsing and analysis. Any truncation or alteration of the output would destroy the structural information embedded in the response, making it impossible to reconstruct the reasoning graph.
The raw output follows a predictable pattern:
| Component | Description | Example |
|---|---|---|
| XML Role Blocks | Delimit reasoning phases | `<proposer>...</proposer>` |
| `@node` records | Declare graph vertices with role annotations | `@node id=2 role=proposer` |
| `@edge` records | Declare directed edges with dependency kinds | `@edge src=1 dst=2 kind=use` |
| `@status` records | Mark propositions as validated or invalidated | `@status target=2 mark=validated` |
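Putting these components together, a minimal captured trace might look like the following (the proposition text is hypothetical; the record syntax follows the formats in the table above):

```xml
<proposer>
Let n be even; then n = 2k for some integer k.
@node id=1 role=proposer
</proposer>
<critic>
The decomposition is valid.
@node id=2 role=critic
@edge src=1 dst=2 kind=use
@status target=1 mark=validated
</critic>
<summarizer>
@node id=3 role=summarizer
@edge src=1 dst=3 kind=use
</summarizer>
```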
Usage
Raw Output Collection should be applied:
- After a DoT reasoning session completes -- once the LLM has finished generating its full response containing all proposer, critic, and summarizer blocks.
- As the first step in post-hoc analysis -- before any parsing, graph extraction, or evaluation can occur, the complete text must be faithfully captured and stored.
- When building audit trails -- for safety, debugging, and verification workflows that require access to the unmodified reasoning trace.
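For the audit-trail case, one simple pattern is to store each raw trace content-addressed by its hash, so later modification or truncation is detectable. This is a sketch; the directory layout and function names are assumptions of this page, not part of the DoT protocol:

```python
import hashlib
from pathlib import Path

def archive_trace(raw_output: str, archive_dir: str = "dot_traces") -> Path:
    """Store the unmodified trace under its SHA-256 hash for tamper-evidence."""
    digest = hashlib.sha256(raw_output.encode("utf-8")).hexdigest()
    path = Path(archive_dir) / f"{digest}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(raw_output, encoding="utf-8")
    return path

def verify_trace(path: Path) -> bool:
    """Re-hash the stored text and compare against the filename stem."""
    text = path.read_text(encoding="utf-8")
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == path.stem
```

Because the filename is derived from the bytes themselves, any edit to a stored trace makes `verify_trace` fail, which is exactly the property an unmodified-trace requirement calls for.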
Theoretical Basis
The raw output is the serialized form of the reasoning DAG. The operational view described in the paper (Zhang & Yao, 2024) treats the LLM output as a single text stream with embedded structural markers. The model generates interleaved role tokens -- <proposer> emits candidate propositions, <critic> evaluates and marks them, and <summarizer> aggregates validated nodes into a final answer.
The typed protocol (@node, @edge, @status) ensures this stream can be deterministically deserialized back into a graph structure. Because @edge records enforce that source IDs are strictly less than destination IDs (src < dst), the resulting graph is guaranteed to be acyclic. This structural invariant is encoded directly in the text and preserved through faithful raw output collection.
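To make the acyclicity argument concrete, the `@edge` records can be deserialized from the raw text with a simple pattern match and the invariant checked directly. The record syntax follows the protocol above; the regex-based parsing here is one possible implementation sketch, not a prescribed one:

```python
import re

# Matches records of the form: @edge src=1 dst=2 kind=use
EDGE_RE = re.compile(r"@edge\s+src=(\d+)\s+dst=(\d+)\s+kind=(\w+)")

def extract_edges(raw_output: str) -> list[tuple[int, int, str]]:
    """Deserialize all @edge records from the raw text stream."""
    return [(int(s), int(d), k) for s, d, k in EDGE_RE.findall(raw_output)]

def is_acyclic(edges: list[tuple[int, int, str]]) -> bool:
    """If src < dst for every edge, the node IDs themselves form a
    topological order, so no directed cycle can exist."""
    return all(src < dst for src, dst, _ in edges)
```

Any trace whose edges all satisfy `src < dst` therefore deserializes into a valid DAG without needing a separate cycle-detection pass.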
Pseudo-code:

```
raw_output = llm.generate(system=dot_prompt, user=problem)
# raw_output contains: <proposer>..@node..</proposer><critic>..@edge..@status..</critic>...
```
The raw output string is library-agnostic -- it is simply the complete text returned by any LLM capable of following the DoT prompt. The only requirement is that the capture mechanism preserves the full response without truncation, ensuring all XML tags and typed records remain intact for downstream deserialization.
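One way to guard against truncation before handing the text to a parser is to check that every role block opened in the captured text is also closed. This heuristic check is a minimal sketch assumed by this page, not part of the DoT protocol itself:

```python
import re

ROLES = ("proposer", "critic", "summarizer")

def is_complete(raw_output: str) -> bool:
    """Heuristic completeness check: every role tag must open and close
    the same number of times, and at least one summarizer block (the
    final aggregation phase) must be present."""
    for role in ROLES:
        opens = len(re.findall(f"<{role}>", raw_output))
        closes = len(re.findall(f"</{role}>", raw_output))
        if opens != closes:
            return False
    return raw_output.count("<summarizer>") >= 1
```

A trace cut off mid-generation typically ends inside an open role block, so this check catches the most common truncation failure before downstream deserialization is attempted.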
Related Pages
- Implementation:Diagram_of_thought_Diagram_of_thought_LLM_Response_Capture -- Concrete patterns for capturing the raw DoT output using specific LLM client libraries.