Implementation:Spcl Graph of thoughts KeywordCountingParser

From Leeroopedia
Knowledge Sources
Domains: Response_Parsing, Keyword_Counting
Source File: examples/keyword_counting/keyword_counting.py, Lines 856-1023
Superclass: graph_of_thoughts.parser.Parser (ABC)
Implements Principle: Principle:Spcl_Graph_of_thoughts_Keyword_Counting_Response_Parsing
Last Updated: 2026-02-14

Overview

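KeywordCountingParser is a domain-specific parser for the keyword counting task. It extracts JSON frequency dictionaries from LLM text responses and writes them into thought state dictionaries. It subclasses the abstract Parser base class and defines all five of its abstract methods: three contain parsing logic, while parse_validation_answer and parse_score_answer are stubs that return None because validation and scoring are done programmatically. The class is defined in examples/keyword_counting/keyword_counting.py.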

Description

KeywordCountingParser initializes an empty response cache (self.cache) and provides a helper method, strip_answer_json, on which all JSON extraction relies. It handles two different response formats: JSON that splits the input into paragraphs or sentences (used for GoT decomposition) and JSON frequency dictionaries (used for counting and aggregation results).

Code Reference

Helper Method

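import json

# Method of KeywordCountingParser, shown without the enclosing class.
def strip_answer_json(self, text: str) -> str:
    """
    Extracts a JSON object from LLM response text.
    1. Strip whitespace
    2. If "Output:" is present, keep only the text after its first occurrence
    3. Find the positions of the last '{' and the last '}'
    4. Return '{}' if either is missing; otherwise take the substring between them (inclusive)
    5. Validate with json.loads(); return '{}' on failure
    """
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return "{}"  # no braces at all: fall back to an empty dict
    text = text[start : end + 1]
    try:
        json.loads(text)  # validation only; the raw string is what gets returned
        return text
    except json.JSONDecodeError:
        return "{}"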
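
To make the extraction rules concrete, the following sketch restates the helper as a standalone function and runs it on a few representative responses. The free-function form and the sample strings are assumptions made for illustration; they are not taken from the source file. Because the opening brace is located with rfind, a nested object does not survive extraction, which is harmless for the flat dictionaries this task expects.

```python
import json


def strip_answer_json(text: str) -> str:
    """Standalone restatement of the documented helper (illustrative only)."""
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        return "{}"


# Hypothetical LLM responses, chosen to exercise each branch.
clean = strip_answer_json('Output: {"Canada": 1, "Mexico": 2}')
chatty = strip_answer_json('Sure! Here is the result:\n{"Peru": 3}\nHope this helps.')
no_json = strip_answer_json("I could not find any countries.")
nested = strip_answer_json('{"a": {"b": 1}}')

print(clean)    # {"Canada": 1, "Mexico": 2}
print(chatty)   # {"Peru": 3}
print(no_json)  # {}
print(nested)   # {} (rfind selects the inner "{", so the braces no longer balance)
```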

Key Methods

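from typing import Dict, List, Union

from graph_of_thoughts import parser


class KeywordCountingParser(parser.Parser):
    def __init__(self) -> None:
        self.cache = {}  # response cache, initially empty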

    def parse_generate_answer(self, state: Dict, texts: List[str]) -> List[Dict]:
        """
        Two code paths:
        1. GoT phase 0 (split): Extract JSON with 'Paragraph'/'Sentence' keys,
           create one state per key with phase=1, sub_text, part, current="".
        2. All other: Extract frequency dict via strip_answer_json,
           set as current with phase=2.
        """

    def parse_aggregation_answer(self, states: List[Dict], texts: List[str]) -> Union[Dict, List[Dict]]:
        """
        Extracts merged frequency dictionary from response.
        - Concatenates sub_text from both input states
        - Stores pre-aggregation dicts in aggr1 and aggr2 fields
        - Handles 0 or 1 input states by substituting empty dicts
        - Asserts at most 2 input states
        """

    def parse_improve_answer(self, state: Dict, texts: List[str]) -> Dict:
        """
        Extracts corrected frequency dictionary.
        Asserts exactly 1 text. Returns updated state with new current.
        """

    def parse_validation_answer(self, state: Dict, texts: List[str]) -> bool:
        """Not implemented (returns None). Validation uses programmatic valid_aggregation."""

    def parse_score_answer(self, states: List[Dict], texts: List[str]) -> List[float]:
        """Not implemented (returns None). Scoring uses programmatic num_errors."""

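The two code paths of parse_generate_answer can be sketched as a standalone function. This is a minimal sketch rather than the source implementation: the function name parse_generate_answer_sketch is hypothetical, and the branch condition (method name starting with "got", phase 0, empty current) is an assumption inferred from the docstring and the usage examples on this page.

```python
import json
from typing import Dict, List


def strip_answer_json(text: str) -> str:
    """Compact standalone version of the JSON-extraction helper."""
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return "{}"
    return candidate


def parse_generate_answer_sketch(state: Dict, texts: List[str]) -> List[Dict]:
    # Assumed branch condition: a GoT method that has not yet split its input.
    is_split = (
        state["method"].startswith("got")
        and state["phase"] == 0
        and state["current"] == ""
    )
    new_states = []
    for text in texts:
        answer = strip_answer_json(text)
        if is_split:
            # Path 1: one new state per "Paragraph N" / "Sentence N" key.
            for key, value in json.loads(answer).items():
                if "Paragraph" not in key and "Sentence" not in key:
                    continue  # ignore keys that do not name a passage part
                new_state = state.copy()
                new_state.update(current="", sub_text=value, part=key, phase=1)
                new_states.append(new_state)
        else:
            # Path 2: the extracted frequency dict becomes the current answer.
            new_state = state.copy()
            new_state.update(current=answer, phase=2)
            new_states.append(new_state)
    return new_states


split_state = {"original": "A. B.", "current": "", "method": "got4", "phase": 0}
parts = parse_generate_answer_sketch(
    split_state, ['{"Paragraph 1": "A.", "Paragraph 2": "B."}']
)

count_state = {"original": "A. B.", "current": "", "method": "io", "phase": 0}
counted = parse_generate_answer_sketch(count_state, ['Output: {"Canada": 1}'])

print([s["part"] for s in parts])  # ['Paragraph 1', 'Paragraph 2']
print(counted[0]["phase"])         # 2
```
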
Detailed parse_aggregation_answer Logic

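def parse_aggregation_answer(self, states, texts):
    # Aggregation merges at most two thoughts.
    assert len(states) <= 2
    # Pad missing operands with empty frequency dicts so states[0] and states[1] always exist.
    if len(states) == 0:
        states = [{"current": "{}", "sub_text": ""}, {"current": "{}", "sub_text": ""}]
    elif len(states) == 1:
        states.append({"current": "{}", "sub_text": ""})  # extends the caller's list in place
    new_states = []
    for text in texts:  # one output state per LLM response
        answer = self.strip_answer_json(text)
        new_state = states[0].copy()  # shallow copy; inherits the other keys of the first state
        # The merged thought covers the text of both inputs.
        new_state["sub_text"] = (
            states[0].get("sub_text", "") + states[1].get("sub_text", "")
        )
        new_state["current"] = answer
        # Keep the pre-aggregation dicts so the merge can be checked later.
        new_state["aggr1"] = states[0]["current"]
        new_state["aggr2"] = states[1]["current"]
        new_states.append(new_state)
    return new_states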
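
The padding branches are easiest to see with a call that passes a single state. The sketch below restates the listing as a free function (an assumption made so the snippet runs without the graph_of_thoughts package) and shows two consequences: the missing operand appears as "{}" in aggr2, and because the listing pads with states.append, the caller's list is extended in place.

```python
import json
from typing import Dict, List


def strip_answer_json(text: str) -> str:
    """Compact standalone version of the JSON-extraction helper."""
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return "{}"
    return candidate


def parse_aggregation_answer(states: List[Dict], texts: List[str]) -> List[Dict]:
    """Free-function restatement of the listing above."""
    assert len(states) <= 2
    if len(states) == 0:
        states = [{"current": "{}", "sub_text": ""}, {"current": "{}", "sub_text": ""}]
    elif len(states) == 1:
        states.append({"current": "{}", "sub_text": ""})
    new_states = []
    for text in texts:
        new_state = states[0].copy()
        new_state["sub_text"] = states[0].get("sub_text", "") + states[1].get("sub_text", "")
        new_state["current"] = strip_answer_json(text)
        new_state["aggr1"] = states[0]["current"]
        new_state["aggr2"] = states[1]["current"]
        new_states.append(new_state)
    return new_states


# A single input state: the second operand is padded with an empty dict.
lone = [{"current": '{"Canada": 1}', "sub_text": "First paragraph..."}]
result = parse_aggregation_answer(lone, ['Output: {"Canada": 1}'])

print(result[0]["aggr2"])     # {}
print(result[0]["sub_text"])  # First paragraph...
print(len(lone))              # 2, the padding state was appended to the caller's list
```

Callers that reuse their input list afterwards can pass a copy, for example list(states), to avoid the in-place extension.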

I/O Contract

Input

| Parameter | Type | Description |
|---|---|---|
| state | Dict | Current thought state with keys: original, current, method, phase |
| states | List[Dict] | For aggregation: at most 2 states with frequency dictionaries |
| texts | List[str] | Raw LLM response strings containing JSON dictionaries |

Output

| Method | Return Type | Description |
|---|---|---|
| parse_generate_answer | List[Dict] | Split: one state per paragraph/sentence. Count: state with frequency dict as current |
| parse_aggregation_answer | List[Dict] | State with merged frequency dict, aggr1, aggr2, combined sub_text |
| parse_improve_answer | Dict | State with corrected frequency dict as current |
| parse_validation_answer | bool | Not implemented (returns None); validation is programmatic |
| parse_score_answer | List[float] | Not implemented (returns None); scoring is programmatic |

Usage Examples

Parsing a Split Response

parser = KeywordCountingParser()
state = {
    "original": "Alexandra boarded the first flight...",
    "current": "",
    "method": "got4",
    "phase": 0,
}
texts = ['{"Paragraph 1": "Alexandra boarded...", "Paragraph 2": "Her first stop...", "Paragraph 3": "The adventure...", "Paragraph 4": "Journeying westward..."}']
new_states = parser.parse_generate_answer(state, texts)
# Returns 4 states, each with sub_text, part ("Paragraph 1" etc.), phase=1, current=""

Parsing a Frequency Count Response

parser = KeywordCountingParser()
state = {"original": "...", "current": "", "method": "io", "phase": 0}
texts = ['Output: {"Canada": 1, "Mexico": 1, "Brazil": 1}']
new_states = parser.parse_generate_answer(state, texts)
# Returns [{"current": '{"Canada": 1, "Mexico": 1, "Brazil": 1}', "phase": 2, ...}]

Parsing an Aggregation Response

parser = KeywordCountingParser()
states = [
    {"current": '{"Canada": 1}', "sub_text": "First paragraph..."},
    {"current": '{"Mexico": 1}', "sub_text": "Second paragraph..."},
]
texts = ['{"Canada": 1, "Mexico": 1}']
new_states = parser.parse_aggregation_answer(states, texts)
# Returns [{"current": '{"Canada": 1, "Mexico": 1}', "aggr1": '{"Canada": 1}',
#           "aggr2": '{"Mexico": 1}', "sub_text": "First paragraph...Second paragraph..."}]
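
Parsing an Improve Response (Illustrative)

The page gives no usage example for parse_improve_answer, so the following is a minimal sketch of the documented contract only: exactly one response text, and a copy of the state whose current field holds the corrected dictionary. The function name parse_improve_answer_sketch and the sample values are hypothetical, and the helper is restated so the snippet runs without the graph_of_thoughts package.

```python
import json
from typing import Dict, List


def strip_answer_json(text: str) -> str:
    """Compact standalone version of the JSON-extraction helper."""
    text = text.strip()
    if "Output:" in text:
        text = text[text.index("Output:") + len("Output:"):].strip()
    start, end = text.rfind("{"), text.rfind("}")
    if start == -1 or end == -1:
        return "{}"
    candidate = text[start : end + 1]
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return "{}"
    return candidate


def parse_improve_answer_sketch(state: Dict, texts: List[str]) -> Dict:
    # The documented contract: exactly one response per improve call.
    assert len(texts) == 1, "expected exactly one response text"
    new_state = state.copy()  # leave the input state untouched
    new_state["current"] = strip_answer_json(texts[0])
    return new_state


state = {"original": "...", "current": '{"Canada": 1}', "method": "got4", "phase": 2}
improved = parse_improve_answer_sketch(state, ['Output: {"Canada": 2, "Mexico": 1}'])

print(improved["current"])  # {"Canada": 2, "Mexico": 1}
print(state["current"])     # {"Canada": 1}
```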
