Principle: Promptfoo Evaluation Execution
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Testing |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
An evaluation orchestration mechanism that executes all combinations of prompts, providers, and test cases with controlled concurrency, collecting graded results.
Description
Evaluation Execution is the core runtime phase of LLM testing. Given a fully resolved TestSuite, the evaluator generates a matrix of all (prompt × provider × test) combinations and executes them in parallel with configurable concurrency limits.
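The matrix generation described above can be sketched as a simple Cartesian product. The `RunSpec` shape and `buildMatrix` name below are illustrative, not Promptfoo's internal API:

```typescript
// One cell of the evaluation matrix: a single (prompt, provider, test) triple.
// Field names here are assumptions for illustration.
interface RunSpec {
  prompt: string;
  provider: string;
  testVars: Record<string, string>;
}

// Cartesian product of prompts × providers × tests.
function buildMatrix(
  prompts: string[],
  providers: string[],
  tests: Record<string, string>[],
): RunSpec[] {
  const matrix: RunSpec[] = [];
  for (const prompt of prompts) {
    for (const provider of providers) {
      for (const testVars of tests) {
        matrix.push({ prompt, provider, testVars });
      }
    }
  }
  return matrix;
}

const matrix = buildMatrix(
  ["Summarize: {{text}}"],
  ["openai:gpt-4o-mini", "anthropic:claude-3-haiku"],
  [{ text: "a" }, { text: "b" }, { text: "c" }],
);
// 1 prompt × 2 providers × 3 tests = 6 runs
console.log(matrix.length);
```

The matrix size grows multiplicatively, which is why concurrency control (below) matters for suites with many tests.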
For each combination, the evaluator:
- Renders the prompt template with test variables using Nunjucks
- Calls the provider's API with the rendered prompt
- Runs all assertions against the provider's response
- Records latency, token usage, cost, and grading results
- Tracks multi-turn conversation state and honors abort signals
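The per-combination steps above can be sketched as a single async function. This is a minimal sketch, assuming a hypothetical `Provider` interface and substituting a tiny `{{var}}` replacer for real Nunjucks rendering; actual Promptfoo providers implement `callApi` with richer options:

```typescript
// Assumed, simplified provider shape for illustration.
interface ProviderResponse {
  output: string;
  tokenUsage: { total: number };
}

interface Provider {
  id: string;
  callApi(prompt: string): Promise<ProviderResponse>;
}

// Stand-in for Nunjucks: substitute {{var}} placeholders from test vars.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => vars[name] ?? "");
}

// Run one matrix cell: render, call, assert, record.
async function runCell(
  provider: Provider,
  template: string,
  vars: Record<string, string>,
  assertion: (output: string) => boolean,
) {
  const prompt = renderPrompt(template, vars);
  const start = Date.now();
  const response = await provider.callApi(prompt);
  const latencyMs = Date.now() - start;
  const pass = assertion(response.output);
  return { pass, latencyMs, tokens: response.tokenUsage.total, output: response.output };
}

// Stub provider that echoes the prompt in uppercase, for demonstration only.
const echo: Provider = {
  id: "echo",
  callApi: async (prompt) => ({
    output: prompt.toUpperCase(),
    tokenUsage: { total: prompt.length },
  }),
};

runCell(echo, "Summarize: {{text}}", { text: "hello" }, (out) => out.includes("HELLO"))
  .then((result) => console.log(result.pass)); // → true
```

In the real evaluator, the assertion step fans out over all configured assertions and aggregates their scores; a single predicate is shown here for brevity.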
This mechanism addresses the challenge of efficiently running potentially thousands of prompt-provider-test combinations while respecting rate limits, reporting progress, and recovering from errors.
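The concurrency control can be sketched with a hand-rolled worker pool; Promptfoo uses an async queue internally, but the bounding idea is the same. This is an illustrative sketch, not the library's implementation:

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order regardless of completion order.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "lane" pulls the next unclaimed index until the list is exhausted.
  // `next++` is safe: JS is single-threaded, and the increment happens
  // synchronously before any await.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}

// Usage: process four items with at most two in flight.
const doubled = await runWithConcurrency([1, 2, 3, 4], 2, async (n) => n * 2);
console.log(doubled); // → [ 2, 4, 6, 8 ]
```

Keeping the limit below a provider's rate limit is what lets large matrices complete without retries dominating the run time.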
Usage
Use this principle for running the actual evaluation after configuration loading, provider resolution, and test suite construction. This is the fourth step in the evaluation pipeline and the most computationally intensive.
Theoretical Basis
Pseudo-code Logic:
1. Generate evaluation matrix: prompts × providers × tests × repeats
2. For each cell in matrix (with concurrency control):
a. Render prompt template with test vars
b. Call provider.callApi(renderedPrompt)
c. Apply output transforms if configured
d. Run assertions against response
e. Record result: { pass/fail, score, latency, cost, tokens }
3. Aggregate results into Eval record
4. Update progress bar and emit events
5. Return completed Eval with all results
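Step 3's aggregation can be sketched as a fold over per-cell results. The `CellResult` fields and summary shape below are assumptions for illustration, not Promptfoo's actual Eval schema:

```typescript
// Per-cell result as recorded in step 2e (field names assumed).
interface CellResult {
  pass: boolean;
  score: number;
  latencyMs: number;
  cost: number;
  tokens: number;
}

// Fold all cell results into a single summary record.
function aggregate(results: CellResult[]) {
  const passed = results.filter((r) => r.pass).length;
  return {
    total: results.length,
    passed,
    failed: results.length - passed,
    passRate: results.length ? passed / results.length : 0,
    totalCost: results.reduce((sum, r) => sum + r.cost, 0),
    totalTokens: results.reduce((sum, r) => sum + r.tokens, 0),
  };
}

const summary = aggregate([
  { pass: true, score: 1, latencyMs: 120, cost: 0.01, tokens: 50 },
  { pass: false, score: 0, latencyMs: 340, cost: 0.02, tokens: 60 },
]);
console.log(summary.passed, summary.failed); // → 1 1
```

The completed Eval record pairs a summary like this with the full list of per-cell results, so failures remain inspectable after the run.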