Implementation:Iamhankai Forest of Thought Mcts Load Data

From Leeroopedia
Knowledge Sources
Domains: Data_Engineering, Preprocessing
Last Updated: 2026-02-14 03:00 GMT

Overview

A concrete tool, provided by the Forest-of-Thought repository, for loading and preprocessing benchmark datasets.

Description

The mcts_load_data function loads benchmark datasets from disk, handles format conversion (JSONL to Parquet), applies difficulty-level filtering for MATH problems, and slices to the requested sample range. It uses the HuggingFace datasets library for efficient data handling and returns a Dataset object ready for iteration.
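The order of operations described above can be sketched without any third-party dependency. This is only an illustration under stated assumptions, not the repository's code: it uses plain Python lists instead of a HuggingFace Dataset, skips the Parquet conversion, assumes each MATH record carries an integer "level" key, and assumes filtering happens before slicing. The names load_jsonl and sketch_load are hypothetical.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def sketch_load(dataset, path, level, start_id, end_id):
    """Hypothetical stand-in for the load -> filter -> slice steps."""
    rows = load_jsonl(path)
    # Assumption: only MATH-style datasets are filtered, and each record
    # stores its difficulty under an integer "level" key.
    if "math" in dataset:
        rows = [r for r in rows if r.get("level") == level]
    # Assumption: slicing follows filtering. The range is half-open,
    # so end_id itself is excluded.
    return rows[start_id:end_id]


# Write a tiny demo file with ten records whose levels cycle through 1..5.
records = [
    {"problem": f"p{i}", "answer": str(i), "level": 1 + i % 5}
    for i in range(10)
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for rec in records:
        tmp.write(json.dumps(rec) + "\n")
    demo_path = tmp.name

subset = sketch_load("math500", demo_path, level=5, start_id=0, end_id=1)
os.remove(demo_path)
print(subset)
```

In this demo two records have level 5 (p4 and p9), and the slice [0, 1) keeps only the first of them.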

Usage

Call this function after parsing command-line arguments. The returned Dataset object is iterated in the main evaluation loop, with each example passed to Monte_Carlo_Forest.run() for tree-search reasoning.
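A minimal sketch of the argument-parsing step that precedes the call might look like the following. The flag names and default values are assumptions chosen to mirror the documented namespace fields; the repository's own parser may define them differently. The helper check_dataset_name is hypothetical and only restates the naming rule from the docstring.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the five fields mcts_load_data reads."""
    parser = argparse.ArgumentParser(description="FoT evaluation (sketch)")
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--dataset_filepath", type=str, required=True)
    # Defaults below are illustrative assumptions, not the repository's values.
    parser.add_argument("--level", type=int, default=1, choices=range(1, 6))
    parser.add_argument("--start_id", type=int, default=0)
    parser.add_argument("--end_id", type=int, default=100)
    return parser


def check_dataset_name(name):
    """Return True if the name contains 'gsm', 'math', or 'aime'."""
    return any(key in name for key in ("gsm", "math", "aime"))


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(
    ["--dataset", "gsm8k", "--dataset_filepath", "/data/gsm8k/test.jsonl"]
)
if not check_dataset_name(args.dataset):
    raise ValueError(f"Unsupported dataset name: {args.dataset}")
print(args)
```

The resulting args object has the same shape as the hand-built argparse.Namespace used in the usage examples, so it can be passed to mcts_load_data directly.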

Code Reference

Source Location

Signature

def mcts_load_data(args):
    """
    Load and preprocess benchmark dataset for FoT evaluation.

    Args:
        args (argparse.Namespace): Configuration with fields:
            - dataset (str): Dataset name (must contain 'gsm', 'math', or 'aime')
            - dataset_filepath (str): Path to JSONL or Parquet file
            - level (int): MATH difficulty level filter (1-5)
            - start_id (int): Start index for sample range
            - end_id (int): End index for sample range

    Returns:
        datasets.Dataset: Filtered and sliced HuggingFace Dataset object.
    """

Import

from utils.utils import mcts_load_data

I/O Contract

Inputs

Name | Type | Required | Description
args | argparse.Namespace | Yes | Configuration namespace with the fields dataset, dataset_filepath, level, start_id, and end_id

Outputs

Name | Type | Description
dataset | datasets.Dataset | HuggingFace Dataset sliced to [start_id, end_id), optionally filtered by level
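One practical consequence of the half-open range [start_id, end_id) is that consecutive ranges tile a dataset with no overlap and no gap, which is convenient when an evaluation is split across several runs. The helper below is a hypothetical illustration of that property and is not part of the repository.

```python
def shard_ranges(total, shard_size):
    """Yield (start_id, end_id) pairs that cover range(total) exactly once."""
    for start in range(0, total, shard_size):
        # The last shard is clipped so end_id never exceeds the dataset size.
        yield start, min(start + shard_size, total)


ranges = list(shard_ranges(total=250, shard_size=100))
print(ranges)  # [(0, 100), (100, 200), (200, 250)]
```

Each pair could be supplied as start_id and end_id in a separate run; because end_id is exclusive, the example at index 100 belongs only to the second shard.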

Usage Examples

Loading GSM8K Dataset

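import argparse

from utils.utils import mcts_load_data

# Build the configuration by hand instead of parsing command-line arguments.
args = argparse.Namespace(
    dataset="gsm8k",
    dataset_filepath="/data/gsm8k/test.jsonl",
    level=1,  # difficulty filter, documented for MATH problems
    start_id=0,
    end_id=100,  # sample range is [start_id, end_id)
)

dataset = mcts_load_data(args)
print(f"Loaded {len(dataset)} examples")
# Access: dataset[0]['question'], dataset[0]['answer']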

Loading MATH with Level Filter

import argparse

from utils.utils import mcts_load_data

args = argparse.Namespace(
    dataset="math500",
    dataset_filepath="/data/math/test.jsonl",
    level=5,  # Only the hardest problems (levels run 1-5)
    start_id=0,
    end_id=500,
)

dataset = mcts_load_data(args)
# Access: dataset[0]['problem'], dataset[0]['answer']
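The two examples expose the problem statement under different keys ('question' for GSM8K, 'problem' for MATH). A small accessor can hide that difference in downstream code. The function question_text is a hypothetical helper, and the sketch covers only the two key names shown above; it makes no claim about other datasets such as AIME.

```python
def question_text(example):
    """Return the problem statement regardless of which key stores it."""
    # Key names taken from the access comments in the examples above.
    for key in ("question", "problem"):
        if key in example:
            return example[key]
    raise KeyError("example has neither a 'question' nor a 'problem' field")


# Plain dicts stand in for dataset rows in this sketch.
gsm_row = {"question": "What is 2 + 3?", "answer": "5"}
math_row = {"problem": "Solve x + 1 = 4.", "answer": "3"}
print(question_text(gsm_row))
print(question_text(math_row))
```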

Related Pages

Implements Principle
