Implementation: Iamhankai Forest-of-Thought Pipeline Init
| Knowledge Sources | Value |
|---|---|
| Domains | NLP, Model_Loading |
| Last Updated | 2026-02-14 03:00 GMT |
Overview
A concrete tool from the Forest-of-Thought repository for loading local LLM models and routing inference to architecture-specific generation methods.
Description
The Pipeline class provides a unified interface for loading and querying multiple LLM architectures. It loads model weights and tokenizer from a HuggingFace checkpoint path, detects the model type, and routes generation requests to architecture-specific methods (get_respond_llama, get_respond_qwen, get_respond_glm, get_respond_deepseek). For Mistral models and Game24 tasks, it uses the HuggingFace text-generation pipeline instead.
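The dispatch described above can be sketched as follows. The method names (get_respond_llama, get_respond_qwen, get_respond_glm, get_respond_deepseek) come from the source description; the dispatch table and the stub bodies are illustrative assumptions, not the repository's exact code:

```python
# Illustrative sketch of Pipeline's architecture-specific routing.
# Method names mirror the documented ones; bodies are stubs, and the
# dispatch-table structure is an assumption.
class PipelineSketch:
    def __init__(self, model_type: str = "llama", task: str = "benchmark"):
        self.model_type = model_type
        self.task = task

    def get_respond(self, messages: list, max_length: int = 1024) -> tuple:
        # Mistral models and Game24 tasks fall back to the HuggingFace
        # text-generation pipeline, per the description above.
        if self.model_type == "mistral" or self.task == "game24":
            return self._hf_pipeline(messages, max_length)
        dispatch = {
            "llama": self.get_respond_llama,
            "qwen": self.get_respond_qwen,
            "glm": self.get_respond_glm,
            "deepseek": self.get_respond_deepseek,
        }
        return dispatch[self.model_type](messages, max_length)

    # Stubs standing in for the real generation methods.
    def get_respond_llama(self, messages, max_length):
        return ("llama", 0.0)

    def get_respond_qwen(self, messages, max_length):
        return ("qwen", 0.0)

    def get_respond_glm(self, messages, max_length):
        return ("glm", 0.0)

    def get_respond_deepseek(self, messages, max_length):
        return ("deepseek", 0.0)

    def _hf_pipeline(self, messages, max_length):
        return ("hf-pipeline", 0.0)
```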
The class also implements optional self-correction via log-probability confidence scoring, where low-confidence generations are automatically re-prompted.
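A minimal sketch of that self-correction loop, assuming confidence is the mean per-token log-probability compared against correct_threshold. The averaging metric, the single-retry policy, and the re-prompt wording are all assumptions for illustration, not the repository's exact logic:

```python
def mean_logprob(token_logprobs: list) -> float:
    """Average per-token log-probability as a confidence score (assumed metric)."""
    return sum(token_logprobs) / len(token_logprobs)

def respond_with_correction(generate, messages: list, correct_threshold: float = -0.5):
    """Re-prompt once when confidence falls below the threshold.

    `generate` is a hypothetical stand-in for the Pipeline's internal
    generation call; it returns (text, token_logprobs).
    """
    text, logprobs = generate(messages)
    confidence = mean_logprob(logprobs)
    if confidence < correct_threshold:
        # Low confidence: append the draft answer and ask the model to
        # re-check it (assumed prompt wording).
        retry = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Please re-check your answer carefully."},
        ]
        text, logprobs = generate(retry)
        confidence = mean_logprob(logprobs)
    return text, confidence
```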
Usage
Import and instantiate at the beginning of any FoT experiment. The Pipeline is typically assigned to a global client variable and shared across all tree search functions.
Code Reference
Source Location
- Repository: Forest-of-Thought
- File: models/load_local_model.py
- Lines: L11-104
Signature
class Pipeline:
    def __init__(
        self,
        model_id: str = "",
        model_type: str = "llama",
        correction: bool = False,
        correct_threshold: float = -1,
        dataname: str = "gsm8k",
        task: str = "benchmark"
    ):
        """
        Args:
            model_id: Path to HuggingFace model checkpoint.
            model_type: Architecture type (llama/qwen/glm/deepseek/mistral).
            correction: Enable self-correction via confidence scoring.
            correct_threshold: Log-probability threshold for correction trigger.
            dataname: Dataset name for context-specific behavior.
            task: Task type ('benchmark' or 'game24').
        """

    def get_respond(
        self,
        messages: list,
        max_length: int = 1024
    ) -> tuple:
        """
        Args:
            messages: Chat messages with 'role' and 'content' keys.
            max_length: Maximum generation length.
        Returns:
            Tuple of (response_text: str, confidence: float).
        """
Import
from models.load_local_model import Pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_id | str | Yes | Path to HuggingFace model checkpoint directory (signature default is an empty string, but a valid path must be supplied in practice) |
| model_type | str | No | Architecture identifier: llama, qwen, glm, deepseek, mistral (default: llama) |
| correction | bool | No | Enable self-correction mechanism (default: False) |
| correct_threshold | float | No | Log-probability threshold for correction trigger (default: -1) |
| dataname | str | No | Dataset name for context-specific formatting (default: gsm8k) |
| task | str | No | Task type: benchmark or game24 (default: benchmark) |
Outputs
| Name | Type | Description |
|---|---|---|
| Pipeline instance | Pipeline | Initialized model with self.model, self.tokenizer loaded on CUDA |
| get_respond() returns | tuple[str, float] | (response_text, confidence_score) from model generation |
Usage Examples
Basic Model Loading
from models.load_local_model import Pipeline
# Load Qwen model for benchmark evaluation
client = Pipeline(
model_id="/path/to/Qwen2.5-7B-Instruct",
model_type="qwen",
correction=False,
task="benchmark"
)
# Generate response
messages = [{"role": "user", "content": "Solve: What is 2 + 3?"}]
response, confidence = client.get_respond(messages, max_length=1024)
print(response)
With Self-Correction
# Load with self-correction enabled
client = Pipeline(
model_id="/path/to/Llama-3-8B-Instruct",
model_type="llama",
correction=True,
correct_threshold=-0.5,
task="benchmark"
)
# Low-confidence responses will be automatically re-prompted
messages = [{"role": "user", "content": "Solve: What is 2 + 3?"}]
response, confidence = client.get_respond(messages)