Implementation:Spcl Graph of thoughts KeywordCountingParser

Knowledge Sources	Graph of Thoughts; Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Domains	Response_Parsing, Keyword_Counting
Source File	`examples/keyword_counting/keyword_counting.py`, Lines 856-1023
Superclass	`graph_of_thoughts.parser.Parser` (ABC)
Implements Principle	Principle:Spcl_Graph_of_thoughts_Keyword_Counting_Response_Parsing
Last Updated	2026-02-14

Overview

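KeywordCountingParser is a domain-specific parser for the keyword counting task. It extracts JSON from LLM text responses (frequency dictionaries and, for GoT, text splits) and turns it into updated thought-state dictionaries. It subclasses the abstract Parser base class and overrides all five of its abstract methods, although two of them (parse_validation_answer and parse_score_answer) are stubs that return None. The class is defined in the keyword counting example script, not in the graph_of_thoughts library itself.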

Description

KeywordCountingParser holds a response cache (self.cache, initialized to an empty dict) and provides a helper method, strip_answer_json, on which all of its JSON extraction relies. It handles two different response formats: split JSON, whose keys name paragraphs or sentences (e.g. "Paragraph 1") and whose values are text chunks, used for GoT decomposition; and frequency-dictionary JSON, which maps keywords to counts, used for counting and aggregation results.
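According to the parse_generate_answer docstring, the parser chooses its code path from the thought state (GoT method, phase 0) rather than from the response itself. Still, the two payload shapes are easy to tell apart by their keys and value types. The helper below, `classify_response`, is hypothetical and not part of the source; it only illustrates the difference between the two formats.

```python
import json
from typing import Dict


def classify_response(answer_json: str) -> str:
    """Hypothetical helper: label extracted JSON as "split", "frequency", or "unknown".

    Split payloads map keys such as "Paragraph 1" or "Sentence 1" to text;
    frequency payloads map keywords to integer counts.
    """
    data: Dict = json.loads(answer_json)
    # Every key names a paragraph/sentence and every value is a text chunk.
    if data and all(
        ("Paragraph" in key or "Sentence" in key) and isinstance(value, str)
        for key, value in data.items()
    ):
        return "split"
    # Every value is a count (an empty dict also qualifies, matching the "{}" fallback).
    if all(isinstance(value, int) for value in data.values()):
        return "frequency"
    return "unknown"


print(classify_response('{"Paragraph 1": "Alexandra boarded...", "Paragraph 2": "Her first stop..."}'))  # split
print(classify_response('{"Canada": 1, "Mexico": 1, "Brazil": 1}'))  # frequency
```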

Code Reference

Helper Method

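def strip_answer_json(self, text: str) -> str:
    """
    Extracts a JSON object string from an LLM response.
    1. Strip surrounding whitespace
    2. If "Output:" is present, keep only the text after its first occurrence
    3. Find the positions of the last '{' and the last '}'
    4. Extract the substring between them (inclusive)
    5. Validate it with json.loads(); return '{}' on failure
    """
    # Requires `import json` at module level.
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    # rfind on both braces targets the last flat (non-nested) object in the text;
    # a nested object yields an unbalanced slice and falls through to '{}'.
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    text = text[start : end + 1]
    try:
        json.loads(text)  # parsed only to validate; the string is returned as-is
        return text
    except json.JSONDecodeError:
        return "{}"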
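The extraction rules are easiest to check on a few concrete inputs. The function below is a standalone version of the method above, with `self` dropped and only `json.JSONDecodeError` caught; the sample responses are made up for illustration. Note the nested case: the last '{' belongs to the inner object but the last '}' to the outer one, so the slice carries an extra brace and is rejected.

```python
import json


def strip_answer_json(text: str) -> str:
    """Standalone version of KeywordCountingParser.strip_answer_json (no self)."""
    text = text.strip()
    if "Output:" in text:
        # Keep only what follows the first "Output:" marker.
        text = text[text.index("Output:") + len("Output:"):].strip()
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    text = text[start : end + 1]
    try:
        json.loads(text)  # validation only
        return text
    except json.JSONDecodeError:
        return "{}"


# Chatter around the object is discarded.
print(strip_answer_json('Sure! Output: {"Canada": 1, "Mexico": 2} Hope this helps.'))  # {"Canada": 1, "Mexico": 2}
# With several objects, the last one wins.
print(strip_answer_json('First try {"a": 1} final {"a": 2}'))  # {"a": 2}
# Malformed JSON (trailing comma) falls back to an empty object.
print(strip_answer_json('Output: {"Canada": 1,}'))  # {}
# Nested objects produce an unbalanced slice and are rejected.
print(strip_answer_json('{"counts": {"Canada": 1}}'))  # {}
```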

Key Methods

from typing import Dict, List, Union

from graph_of_thoughts import parser

class KeywordCountingParser(parser.Parser):
    def __init__(self) -> None:
        self.cache = {}

    def parse_generate_answer(self, state: Dict, texts: List[str]) -> List[Dict]:
        """
        Two code paths:
        1. GoT phase 0 (split): Extract JSON with 'Paragraph'/'Sentence' keys,
           create one state per key with phase=1, sub_text, part, current="".
        2. All other: Extract frequency dict via strip_answer_json,
           set as current with phase=2.
        """

    def parse_aggregation_answer(self, states: List[Dict], texts: List[str]) -> Union[Dict, List[Dict]]:
        """
        Extracts merged frequency dictionary from response.
        - Concatenates sub_text from both input states
        - Stores pre-aggregation dicts in aggr1 and aggr2 fields
        - Handles 0 or 1 input states by substituting empty dicts
        - Asserts at most 2 input states
        """

    def parse_improve_answer(self, state: Dict, texts: List[str]) -> Dict:
        """
        Extracts corrected frequency dictionary.
        Asserts exactly 1 text. Returns updated state with new current.
        """

    def parse_validation_answer(self, state: Dict, texts: List[str]) -> bool:
        """Not implemented (returns None). Validation uses programmatic valid_aggregation."""

    def parse_score_answer(self, states: List[Dict], texts: List[str]) -> List[float]:
        """Not implemented (returns None). Scoring uses programmatic num_errors."""

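The two code paths of parse_generate_answer can be sketched as a standalone function. This is a minimal re-creation from the docstring, not the source: the branch condition (`method` starting with "got" and `phase == 0`), skipping keys that mention neither "Paragraph" nor "Sentence", processing every entry of `texts`, and the names `parse_generate_answer_sketch` and `_strip_answer_json` are all assumptions.

```python
import json
from typing import Dict, List


def _strip_answer_json(text: str) -> str:
    # Condensed copy of strip_answer_json: last {...} span, or "{}" if invalid.
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        return "{}"


def parse_generate_answer_sketch(state: Dict, texts: List[str]) -> List[Dict]:
    """Hypothetical standalone version of the two documented code paths."""
    new_states: List[Dict] = []
    # Assumption: the split path applies to GoT methods in phase 0.
    is_split = state["method"].startswith("got") and state["phase"] == 0
    for text in texts:
        answer = _strip_answer_json(text)
        if is_split:
            # One child state per paragraph/sentence key (assumption: other keys are skipped).
            for key, value in json.loads(answer).items():
                if "Paragraph" not in key and "Sentence" not in key:
                    continue
                child = state.copy()
                child.update({"sub_text": value, "part": key, "phase": 1, "current": ""})
                new_states.append(child)
        else:
            # Counting path: the extracted JSON string becomes the new current.
            child = state.copy()
            child.update({"current": answer, "phase": 2})
            new_states.append(child)
    return new_states


got_state = {"original": "...", "current": "", "method": "got4", "phase": 0}
split = parse_generate_answer_sketch(got_state, ['{"Paragraph 1": "A.", "Paragraph 2": "B."}'])
print([s["part"] for s in split])  # ['Paragraph 1', 'Paragraph 2']
io_state = {"original": "...", "current": "", "method": "io", "phase": 0}
print(parse_generate_answer_sketch(io_state, ['Output: {"Canada": 1}'])[0]["phase"])  # 2
```

Copying the input state for each child keeps `original` and `method` on every new thought while leaving the parent state untouched.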
Detailed parse_aggregation_answer Logic

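def parse_aggregation_answer(
    self, states: List[Dict], texts: List[str]
) -> Union[Dict, List[Dict]]:
    assert len(states) <= 2, "Expected at most 2 states for aggregation."
    # Pad missing inputs with empty frequency dicts so there are always two.
    if len(states) == 0:
        states = [{"current": "{}", "sub_text": ""}, {"current": "{}", "sub_text": ""}]
    elif len(states) == 1:
        # Build a new list instead of appending, so the caller's list is not mutated.
        states = [states[0], {"current": "{}", "sub_text": ""}]
    new_states = []
    for text in texts:
        answer = self.strip_answer_json(text)
        # Start from a shallow copy of the first input state.
        new_state = states[0].copy()
        # The merged thought covers both inputs' text, concatenated without a separator.
        new_state["sub_text"] = (
            states[0].get("sub_text", "") + states[1].get("sub_text", "")
        )
        new_state["current"] = answer
        # Keep the pre-aggregation dicts for later programmatic validation.
        new_state["aggr1"] = states[0]["current"]
        new_state["aggr2"] = states[1]["current"]
        new_states.append(new_state)
    return new_states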
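The padding rules matter most on the edge cases with zero or one input state. The sketch below is a standalone, hypothetical version of the logic above (`aggregate` and `_extract` are illustrative names, not source functions); it pads with fresh empty-state dicts so that the caller's list is never modified.

```python
import json
from typing import Dict, List


def _extract(text: str) -> str:
    # Condensed strip_answer_json: last {...} span, or "{}" if it is not valid JSON.
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        return "{}"


def aggregate(states: List[Dict], texts: List[str]) -> List[Dict]:
    assert len(states) <= 2, "Expected at most 2 states for aggregation."
    # Pad to exactly two inputs with fresh empty states; the input list is left untouched.
    padded = list(states) + [{"current": "{}", "sub_text": ""} for _ in range(2 - len(states))]
    first, second = padded
    results = []
    for text in texts:
        merged = first.copy()
        merged["sub_text"] = first.get("sub_text", "") + second.get("sub_text", "")
        merged["current"] = _extract(text)
        merged["aggr1"], merged["aggr2"] = first["current"], second["current"]
        results.append(merged)
    return results


single = [{"current": '{"Canada": 1}', "sub_text": "First paragraph..."}]
print(aggregate(single, ['{"Canada": 1}'])[0]["aggr2"])  # {}
print(len(single))  # 1 (the input list was not extended)
print(aggregate([], ["no JSON here"])[0]["current"])  # {}
```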

I/O Contract

Input

Parameter	Type	Description
`state`	`Dict`	Current thought state with keys: `original`, `current`, `method`, `phase`
`states`	`List[Dict]`	For aggregation: at most 2 states with frequency dictionaries
`texts`	`List[str]`	Raw LLM response strings containing JSON dictionaries
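The state keys that appear on this page can be collected into a single type. `ThoughtState` is a hypothetical annotation for readability, not a type defined in the source; the phase meanings follow the parse_generate_answer docstring and the usage examples (0 initially, 1 after splitting, 2 after counting).

```python
from typing import TypedDict


class ThoughtState(TypedDict, total=False):
    """Hypothetical type for the thought-state dicts handled by KeywordCountingParser."""

    original: str  # full input text
    current: str   # frequency dict serialized as JSON ("" before counting)
    method: str    # e.g. "io" or "got4"
    phase: int     # 0 = initial, 1 = after split, 2 = after counting
    sub_text: str  # GoT only: the paragraph/sentence this thought covers
    part: str      # GoT only: the split key, e.g. "Paragraph 1"
    aggr1: str     # after aggregation: first input's frequency JSON
    aggr2: str     # after aggregation: second input's frequency JSON


state: ThoughtState = {"original": "Alexandra boarded...", "current": "", "method": "got4", "phase": 0}
print(state["phase"])  # 0
```

`total=False` marks every key optional, which matches how the GoT-only and aggregation-only keys are added to a state only at certain stages.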

Output

Method	Return Type	Description
`parse_generate_answer`	`List[Dict]`	Split: one state per paragraph/sentence key, with `phase` 1. Count: state with the frequency dict as `current` and `phase` 2
`parse_aggregation_answer`	`List[Dict]`	State with merged frequency dict, `aggr1`, `aggr2`, combined `sub_text`
`parse_improve_answer`	`Dict`	State with corrected frequency dict as `current`
`parse_validation_answer`	`bool`	Not implemented (returns `None`); validation is programmatic
`parse_score_answer`	`List[float]`	Not implemented (returns `None`); scoring is programmatic
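The documented contract of parse_improve_answer (exactly one response in, one updated state out) is small enough to sketch directly. The function below is a hypothetical standalone version, assuming the method copies the incoming state and replaces only `current`; `parse_improve_answer_sketch` and `_extract` are illustrative names.

```python
import json
from typing import Dict, List


def _extract(text: str) -> str:
    # Condensed strip_answer_json: last {...} span, or "{}" if it is not valid JSON.
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        return "{}"


def parse_improve_answer_sketch(state: Dict, texts: List[str]) -> Dict:
    """Hypothetical standalone version of parse_improve_answer's documented contract."""
    assert len(texts) == 1, "Expected exactly one response for improvement."
    # Assumption: the state is copied and only `current` is replaced.
    new_state = state.copy()
    new_state["current"] = _extract(texts[0])
    return new_state


before = {"original": "...", "current": '{"Canada": 2}', "method": "io", "phase": 2}
after = parse_improve_answer_sketch(before, ['Output: {"Canada": 1}'])
print(after["current"])   # {"Canada": 1}
print(before["current"])  # {"Canada": 2}
```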

Usage Examples

Parsing a Split Response

parser = KeywordCountingParser()
state = {
    "original": "Alexandra boarded the first flight...",
    "current": "",
    "method": "got4",
    "phase": 0,
}
texts = ['{"Paragraph 1": "Alexandra boarded...", "Paragraph 2": "Her first stop...", "Paragraph 3": "The adventure...", "Paragraph 4": "Journeying westward..."}']
new_states = parser.parse_generate_answer(state, texts)
# Returns 4 states, each with sub_text, part ("Paragraph 1" etc.), phase=1, current=""

Parsing a Frequency Count Response

parser = KeywordCountingParser()
state = {"original": "...", "current": "", "method": "io", "phase": 0}
texts = ['Output: {"Canada": 1, "Mexico": 1, "Brazil": 1}']
new_states = parser.parse_generate_answer(state, texts)
# Returns [{"current": '{"Canada": 1, "Mexico": 1, "Brazil": 1}', "phase": 2, ...}]

Parsing an Aggregation Response

parser = KeywordCountingParser()
states = [
    {"current": '{"Canada": 1}', "sub_text": "First paragraph..."},
    {"current": '{"Mexico": 1}', "sub_text": "Second paragraph..."},
]
texts = ['{"Canada": 1, "Mexico": 1}']
new_states = parser.parse_aggregation_answer(states, texts)
# Returns [{"current": '{"Canada": 1, "Mexico": 1}', "aggr1": '{"Canada": 1}',
#           "aggr2": '{"Mexico": 1}', "sub_text": "First paragraph...Second paragraph..."}]
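The docstrings above say validation is done programmatically by valid_aggregation rather than by the parser, and the `aggr1`/`aggr2` fields stored during aggregation are what make such a check possible. The function below is a hypothetical illustration of one such check, not the source's valid_aggregation, whose exact rules are not shown on this page: it tests whether the merged dict equals the key-wise sum of the two inputs.

```python
import json
from collections import Counter
from typing import Dict


def is_consistent_merge(state: Dict) -> bool:
    """Hypothetical check: is `current` the key-wise sum of `aggr1` and `aggr2`?"""
    merged = Counter(json.loads(state["current"]))
    expected = Counter(json.loads(state["aggr1"])) + Counter(json.loads(state["aggr2"]))
    return merged == expected


result = {
    "current": '{"Canada": 1, "Mexico": 1}',
    "aggr1": '{"Canada": 1}',
    "aggr2": '{"Mexico": 1}',
    "sub_text": "First paragraph...Second paragraph...",
}
print(is_consistent_merge(result))  # True
```

`Counter` addition drops zero and negative counts, which is harmless here because keyword frequencies are positive.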

Related Pages

Principle:Spcl_Graph_of_thoughts_Keyword_Counting_Response_Parsing -- Design principle behind this parser
Implementation:Spcl_Graph_of_thoughts_KeywordCountingPrompter -- Companion prompter for generating prompts parsed here
Workflow:Spcl_Graph_of_thoughts_GoT_Keyword_Counting_Pipeline -- End-to-end pipeline using this parser
