Principle: LLM as Judge (iamhankai/Forest-of-Thought)
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Reasoning |
| Last Updated | 2026-02-14 03:00 GMT |
Overview
A pattern that uses a language model as an expert evaluator to select the best answer from multiple candidates when automated voting fails to reach consensus.
Description
LLM-as-Judge leverages the reasoning capabilities of large language models to act as expert arbitrators. When multiple candidate answers tie in majority voting, an LLM judge (typically a strong reasoning model like QwQ-32B) is prompted to analyze all candidates in the context of the original question and select the most correct one. This combines the breadth of ensemble generation with the depth of expert evaluation.
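The tie condition that triggers the judge can be sketched with a simple frequency count over the candidate final answers. This is an illustrative sketch, not code from the repository; the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the winning answer, or None when the top candidates tie.

    A None result signals that the LLM judge fallback should run.
    """
    ranked = Counter(answers).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear winner: defer to the judge
    return ranked[0][0]
```

For example, `majority_vote(["4", "4", "5"])` returns `"4"`, while `majority_vote(["4", "5"])` returns `None` and would hand control to the judge.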
In the FoT CGDM pipeline, the judge acts as a fallback:
- It is invoked when majority voting produces a tie among the candidates
- The "best answer" selection prompt formats all candidates for the judge
- The judge can also generate a completely new answer if no candidate is satisfactory
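The two-stage control flow above can be sketched as follows. `judge_fn` stands in for a call to the judge model; both it and `select_answer` are hypothetical names for illustration, not the actual API of `cgdm/cgdm.py`.

```python
from collections import Counter

def select_answer(question, candidates, judge_fn):
    """CGDM-style selection: majority vote first, LLM judge as fallback.

    judge_fn(question, candidates) wraps the judge model and returns
    either one of the candidates or a newly generated answer.
    """
    ranked = Counter(candidates).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]                 # clear majority winner
    return judge_fn(question, candidates)   # tie: escalate to the judge
```

Note that the judge only pays its (substantial) inference cost on the minority of questions where voting is inconclusive.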
Usage
Used as the second stage of CGDM when majority voting fails to produce a clear winner. Also available as a standalone tool in the cgdm/cgdm.py module for post-processing evaluation.
Theoretical Basis
LLM-as-Judge leverages meta-reasoning: using a model's ability to evaluate and compare solutions rather than just generate them. The judge prompt includes:
- The original problem statement
- All candidate answers formatted for comparison
- An instruction to select the most correct answer
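A minimal sketch of assembling those three parts into a judge prompt. The exact wording and layout used by the pipeline's "best answer" prompt are not shown here; this structure and the name `build_judge_prompt` are assumptions for illustration.

```python
def build_judge_prompt(question, candidates):
    """Format the problem and all candidate answers for the judge."""
    lines = [f"Problem:\n{question}\n", "Candidate answers:"]
    for i, ans in enumerate(candidates, 1):
        lines.append(f"[{i}] {ans}")  # number candidates for easy reference
    lines.append(
        "\nAnalyze each candidate and select the most correct answer. "
        "If none is satisfactory, provide a corrected answer."
    )
    return "\n".join(lines)
```

Numbering the candidates lets the judge refer to a choice unambiguously, which also makes the judge's selection easy to parse out of its response.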
This is more effective than random selection because the judge can:
- Verify mathematical steps in each candidate
- Identify common errors across candidates
- Apply domain knowledge to disambiguate