
Environment:Iamhankai Forest of Thought Python CUDA Runtime

From Leeroopedia
Domains Infrastructure, Deep_Learning, LLMs
Last Updated 2026-02-14 03:30 GMT

Overview

Linux environment with Python 3.10, CUDA >= 11.7, PyTorch 2.3.0, and HuggingFace Transformers 4.41.2 for running Forest-of-Thought LLM reasoning experiments.

Description

This environment provides the full GPU-accelerated runtime required by the Forest-of-Thought framework. The system loads large language models (LLaMA, Qwen, GLM, DeepSeek) using HuggingFace Transformers with automatic device mapping (device_map='auto') and mixed-precision inference (torch.float16 for pipeline mode, torch.bfloat16 for direct model loading). All inference is hardcoded to run on CUDA (self.device = "cuda"), making an NVIDIA GPU mandatory.

The framework also relies on SymPy for symbolic math verification, Pandas and the HuggingFace Datasets library for benchmark data loading, and NumPy for numerical operations within the MCTS and BFS search algorithms.
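The SymPy dependency is used to check whether a model's answer is mathematically equal to the reference answer even when the two are written differently. A minimal sketch of the kind of check SymPy enables here — the function name and fallback logic are illustrative, not the framework's actual verifier:

```python
from sympy import simplify, sympify

def answers_match(predicted: str, gold: str) -> bool:
    """Return True if two answer strings are symbolically equal.

    Parses both strings into SymPy expressions and checks that their
    difference simplifies to zero, so "2/4" matches "1/2" and
    "x + 1" matches "1 + x". Falls back to plain string comparison
    when either string cannot be parsed.
    """
    try:
        return simplify(sympify(predicted) - sympify(gold)) == 0
    except Exception:
        return predicted.strip() == gold.strip()
```

This is why a pinned `sympy` version appears in the dependency list: answer verification must tolerate superficially different but equivalent expressions.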

Usage

Use this environment for all Forest-of-Thought workflows: FoT Benchmark Evaluation (MCTS/CoT/ToT on GSM8K, MATH500, AIME), Game24 Forest Solving (BFS-based), and CGDM Post-Processing (LLM-as-judge). Every implementation in this repository requires this runtime, since loading a local model is a prerequisite for all of them.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux (Ubuntu recommended) | Conda environment setup documented in the README |
| Hardware | NVIDIA GPU with CUDA support | Device is hardcoded to `"cuda"` in `models/load_local_model.py:L15` |
| VRAM | Minimum 16 GB (40 GB+ recommended) | 7B models in bfloat16 require ~14 GB; larger models (QwQ-32B) require 40 GB+ |
| Python | 3.10 | Specified in the README `conda create` command |
| CUDA | >= 11.7 | Specified in the README requirements section |
| Disk | 50 GB+ SSD | Model weights (8B model ~16 GB), datasets, and output logs |
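The VRAM figures follow from simple arithmetic: the weights alone occupy parameter count × bytes per parameter (2 bytes each for bfloat16 or float16). A back-of-the-envelope helper (not part of the repository) makes this concrete:

```python
def estimate_vram_gib(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough GiB needed for model weights alone.

    Excludes KV cache, activations, and framework overhead, which is
    why the practical minimum is higher than this estimate.
    """
    return num_params_billion * 1e9 * bytes_per_param / 2**30
```

For a 7B model in bfloat16 this gives roughly 13 GiB for the weights; KV cache and activation overhead push the practical requirement to the 16 GB minimum listed above.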

Dependencies

System Packages

  • CUDA Toolkit >= 11.7
  • `conda` (for virtual environment management)
  • `git` (for cloning repository)

Python Packages

  • `torch` == 2.3.0
  • `transformers` == 4.41.2
  • `datasets` == 3.1.0
  • `sympy` == 1.12
  • `numpy` == 1.24.3
  • `pandas` == 2.0.3
  • `tqdm` == 4.65.0
  • `openai` == 0.27.7
  • `aiohttp` == 3.8.4
  • `backoff` == 2.2.1
  • `requests` == 2.31.0
  • `mpmath` == 1.3.0

Credentials

No mandatory credentials for the local model inference path. For the optional OpenAI GPT-based path (models/models.py), see the Environment:Iamhankai_Forest_of_Thought_OpenAI_API_Credentials environment page.

Quick Install

# Create conda environment
conda create -n fot python=3.10 -y
conda activate fot

# Install all required packages
pip install -r requirements.txt

# Or install individually:
pip install torch==2.3.0 transformers==4.41.2 datasets==3.1.0 sympy==1.12 numpy==1.24.3 pandas==2.0.3 tqdm==4.65.0 openai==0.27.7 aiohttp==3.8.4 backoff==2.2.1 requests==2.31.0 mpmath==1.3.0

Code Evidence

CUDA device hardcoded in `models/load_local_model.py:L15`:

self.device = "cuda"

Model loading with bfloat16 and auto device mapping in `models/load_local_model.py:L40-46`:

self.model = AutoModelForCausalLM.from_pretrained(
    self.model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',
).eval()

Pipeline mode with float16 in `models/load_local_model.py:L31-37`:

self.pipeline = transformers.pipeline(
    "text-generation",
    model=self.model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device_map='auto',
    trust_remote_code=True,
)

CUDA GPU requirement enforced via `scripts/game24/run.py:L9`:

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Python and CUDA requirements from `README.md:L13-17`:

Python == 3.10
CUDA Version >= 11.7
pip install -r requirements.txt

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ValueError: Input length of input_ids is X, but max_length is set to Y` | Input prompt exceeds the configured `max_length` | The code auto-recovers by parsing the error and extending `max_length` by 100 (see `load_local_model.py:L73-75`) |
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM for the model | Use a smaller model, reduce `max_new_tokens`, or use a GPU with more VRAM |
| `torch.cuda.is_available()` returns `False` | No CUDA-capable GPU found | Install CUDA toolkit >= 11.7 and verify GPU drivers with `nvidia-smi` |
| `ImportError: No module named 'transformers'` | Missing Python dependency | Run `pip install -r requirements.txt` |
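The `max_length` auto-recovery amounts to extracting the offending value from the exception text and retrying with a larger limit. A hedged sketch of that parsing step — the regex and helper name here are illustrative; `load_local_model.py:L73-75` holds the actual logic:

```python
import re
from typing import Optional

def next_max_length(error_message: str, increment: int = 100) -> Optional[int]:
    """Parse the configured max_length out of the ValueError text.

    Returns the old limit plus `increment` so the caller can retry
    generation, or None if the message does not match.
    """
    m = re.search(r"max_length is set to (\d+)", error_message)
    return int(m.group(1)) + increment if m else None
```

A caller would catch the `ValueError`, compute the new limit, and re-invoke generation with it.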

Compatibility Notes

  • GPU Required: CPU-only execution is not supported. The device is hardcoded to `"cuda"` without a CPU fallback path.
  • Multi-GPU: Supported via `device_map='auto'` which uses HuggingFace Accelerate for automatic model sharding across available GPUs.
  • Model Architectures: Supports LLaMA, Qwen, GLM, DeepSeek, and Mistral model families via architecture-specific inference paths in `Pipeline`.
  • trust_remote_code: All model loading uses `trust_remote_code=True`, meaning custom model code from HuggingFace Hub will be executed. Only use trusted model repositories.
