Principle: LLM as Judge (iamhankai/Forest-of-Thought)
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Reasoning |
| Last Updated | 2026-02-14 03:00 GMT |
Overview
A pattern that uses a language model as an expert evaluator to select the best answer from multiple candidates when automated voting fails to reach consensus.
Description
LLM-as-Judge leverages the reasoning capabilities of large language models to act as expert arbitrators. When multiple candidate answers tie in majority voting, an LLM judge (typically a strong reasoning model like QwQ-32B) is prompted to analyze all candidates in the context of the original question and select the most correct one. This combines the breadth of ensemble generation with the depth of expert evaluation.
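The tie condition that triggers the judge can be sketched with a simple frequency count over the candidate final answers. This is an illustrative sketch, not code from the repository; the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the winning answer, or None when the top candidates tie.

    A None result signals that the LLM judge fallback should run.
    """
    ranked = Counter(answers).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear winner: defer to the judge
    return ranked[0][0]
```

For example, `majority_vote(["4", "4", "5"])` returns `"4"`, while `majority_vote(["4", "5"])` returns `None` and would hand control to the judge.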
In the FoT CGDM pipeline, the judge acts as a fallback:
- It is invoked when majority voting produces a tie among the candidates
- The "best answer" selection prompt formats all candidates for the judge
- The judge can also generate a completely new answer if no candidate is satisfactory
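The two-stage control flow above can be sketched as follows. `judge_fn` stands in for a call to the judge model; both it and `select_answer` are hypothetical names for illustration, not the actual API of `cgdm/cgdm.py`.

```python
from collections import Counter

def select_answer(question, candidates, judge_fn):
    """CGDM-style selection: majority vote first, LLM judge as fallback.

    judge_fn(question, candidates) wraps the judge model and returns
    either one of the candidates or a newly generated answer.
    """
    ranked = Counter(candidates).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]                 # clear majority winner
    return judge_fn(question, candidates)   # tie: escalate to the judge
```

Note that the judge only pays its (substantial) inference cost on the minority of questions where voting is inconclusive.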
Usage
Used as the second stage of CGDM when majority voting fails to produce a clear winner. Also available as a standalone tool in the cgdm/cgdm.py module for post-processing evaluation.
Theoretical Basis
LLM-as-Judge leverages meta-reasoning: using a model's ability to evaluate and compare solutions rather than just generate them. The judge prompt includes:
- The original problem statement
- All candidate answers formatted for comparison
- An instruction to select the most correct answer
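A minimal sketch of assembling those three parts into a judge prompt. The exact wording and layout used by the pipeline's "best answer" prompt are not shown here; this structure and the name `build_judge_prompt` are assumptions for illustration.

```python
def build_judge_prompt(question, candidates):
    """Format the problem and all candidate answers for the judge."""
    lines = [f"Problem:\n{question}\n", "Candidate answers:"]
    for i, ans in enumerate(candidates, 1):
        lines.append(f"[{i}] {ans}")  # number candidates for easy reference
    lines.append(
        "\nAnalyze each candidate and select the most correct answer. "
        "If none is satisfactory, provide a corrected answer."
    )
    return "\n".join(lines)
```

Numbering the candidates lets the judge refer to a choice unambiguously, which also makes the judge's selection easy to parse out of its response.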
This is more effective than random selection because the judge can:
- Verify mathematical steps in each candidate
- Identify common errors across candidates
- Apply domain knowledge to disambiguate