Environment:Iamhankai_Forest_of_Thought_Python_CUDA_Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Deep_Learning, LLMs |
| Last Updated | 2026-02-14 03:30 GMT |
Overview
Linux environment with Python 3.10, CUDA >= 11.7, PyTorch 2.3.0, and HuggingFace Transformers 4.41.2 for running Forest-of-Thought LLM reasoning experiments.
Description
This environment provides the full GPU-accelerated runtime required by the Forest-of-Thought framework. The system loads large language models (LLaMA, Qwen, GLM, DeepSeek) using HuggingFace Transformers with automatic device mapping (`device_map='auto'`) and mixed-precision inference (`torch.float16` for pipeline mode, `torch.bfloat16` for direct model loading). All inference is hardcoded to run on CUDA (`self.device = "cuda"`), making an NVIDIA GPU mandatory.
The framework also relies on SymPy for symbolic math verification, Pandas and the HuggingFace Datasets library for benchmark data loading, and NumPy for numerical operations within the MCTS and BFS search algorithms.
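To illustrate the SymPy role described above, here is a minimal sketch of symbolic answer checking. The helper name `answers_match` and the comparison logic are assumptions for illustration, not the framework's actual verification code:

```python
from sympy import simplify, sympify

def answers_match(candidate: str, reference: str) -> bool:
    """Check two math expressions for symbolic equivalence.

    Hypothetical helper illustrating SymPy-based verification;
    the framework's actual checking logic may differ.
    """
    try:
        # simplify(a - b) reduces to 0 iff the expressions are equivalent
        return simplify(sympify(candidate) - sympify(reference)) == 0
    except Exception:
        # Unparseable model output counts as a failed check
        return False
```

With this sketch, `answers_match("2*(x + 3)", "2*x + 6")` returns `True` even though the strings differ, which is why symbolic comparison is preferred over string matching for math benchmarks.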
Usage
Use this environment for all Forest-of-Thought workflows: FoT Benchmark Evaluation (MCTS/CoT/ToT on GSM8K, MATH500, AIME), Game24 Forest Solving (BFS-based), and CGDM Post-Processing (LLM-as-judge). Every implementation in this repository requires this runtime since model loading is a universal prerequisite.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Conda environment setup documented in README |
| Hardware | NVIDIA GPU with CUDA support | Device is hardcoded to `"cuda"` in `models/load_local_model.py:L15` |
| VRAM | Minimum 16GB (40GB+ recommended) | 7B models in bfloat16 require ~14GB; larger models (QwQ-32B) require 40GB+ |
| Python | 3.10 | Specified in README conda create command |
| CUDA | >= 11.7 | Specified in README requirements section |
| Disk | 50GB+ SSD | Model weights (8B model ~16GB), datasets, and output logs |
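The VRAM figures in the table follow from simple bytes-per-parameter arithmetic. A back-of-the-envelope estimate, covering weights only (activations, KV cache, and CUDA context add overhead on top):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Estimate GPU memory needed for model weights alone.

    bfloat16/float16 use 2 bytes per parameter; runtime overhead
    (activations, KV cache, CUDA context) comes on top of this.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3

# A 7B model in bfloat16 needs roughly 13 GiB for weights alone,
# consistent with the ~14 GB figure in the table above.
print(round(weight_vram_gb(7), 1))
```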
Dependencies
System Packages
- CUDA Toolkit >= 11.7
- `conda` (for virtual environment management)
- `git` (for cloning repository)
Python Packages
- `torch` == 2.3.0
- `transformers` == 4.41.2
- `datasets` == 3.1.0
- `sympy` == 1.12
- `numpy` == 1.24.3
- `pandas` == 2.0.3
- `tqdm` == 4.65.0
- `openai` == 0.27.7
- `aiohttp` == 3.8.4
- `backoff` == 2.2.1
- `requests` == 2.31.0
- `mpmath` == 1.3.0
Credentials
No mandatory credentials for the local model inference path. For the optional OpenAI GPT-based path (models/models.py), see the Environment:Iamhankai_Forest_of_Thought_OpenAI_API_Credentials environment page.
Quick Install
```shell
# Create conda environment
conda create -n fot python=3.10 -y
conda activate fot

# Install all required packages
pip install -r requirements.txt

# Or install individually:
pip install torch==2.3.0 transformers==4.41.2 datasets==3.1.0 sympy==1.12 numpy==1.24.3 pandas==2.0.3 tqdm==4.65.0 openai==0.27.7 aiohttp==3.8.4 backoff==2.2.1 requests==2.31.0 mpmath==1.3.0
```
Code Evidence
CUDA device hardcoded in `models/load_local_model.py:L15`:
```python
self.device = "cuda"
```
Model loading with bfloat16 and auto device mapping in `models/load_local_model.py:L40-46`:
```python
self.model = AutoModelForCausalLM.from_pretrained(
    self.model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',
).eval()
```
Pipeline mode with float16 in `models/load_local_model.py:L31-37`:
```python
self.pipeline = transformers.pipeline(
    "text-generation",
    model=self.model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device_map='auto',
    trust_remote_code=True,
)
```
Inference pinned to the first GPU in `scripts/game24/run.py:L9` (this selects device 0 rather than enforcing that a GPU exists):

```python
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
Python and CUDA requirements from `README.md:L13-17`:
```
Python == 3.10
CUDA Version >= 11.7
pip install -r requirements.txt
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: Input length of input_ids is X, but max_length is set to Y` | Input prompt exceeds the configured `max_length` | The code auto-recovers by parsing the error and extending `max_length` by 100 (see `load_local_model.py:L73-75`) |
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM for the model | Use a smaller model, reduce `max_new_tokens`, or use a GPU with more VRAM |
| `torch.cuda.is_available()` returns `False` | No CUDA-capable GPU found | Install CUDA toolkit >= 11.7 and verify GPU drivers with `nvidia-smi` |
| `ImportError: No module named 'transformers'` | Missing Python dependency | Run `pip install -r requirements.txt` |
Compatibility Notes
- GPU Required: CPU-only execution is not supported. The device is hardcoded to `"cuda"` without a CPU fallback path.
- Multi-GPU: Supported via `device_map='auto'` which uses HuggingFace Accelerate for automatic model sharding across available GPUs.
- Model Architectures: Supports LLaMA, Qwen, GLM, DeepSeek, and Mistral model families via architecture-specific inference paths in `Pipeline`.
- trust_remote_code: All model loading uses `trust_remote_code=True`, meaning custom model code from HuggingFace Hub will be executed. Only use trusted model repositories.
Related Pages
- Implementation:Iamhankai_Forest_of_Thought_Pipeline_Init
- Implementation:Iamhankai_Forest_of_Thought_Monte_Carlo_Forest
- Implementation:Iamhankai_Forest_of_Thought_Monte_Carlo_Tree
- Implementation:Iamhankai_Forest_of_Thought_ToT_Task_Run
- Implementation:Iamhankai_Forest_of_Thought_CoT_Task_Run
- Implementation:Iamhankai_Forest_of_Thought_Forest_Solve
- Implementation:Iamhankai_Forest_of_Thought_CGDM_Get_Best_Answer