Implementation:CrewAIInc CrewAI Crew Test Method
Overview
Concrete method for running performance evaluation with LLM-based scoring and tabular reporting provided by the CrewAI framework.
Source
Signature
def test(
self,
n_iterations: int,
eval_llm: str | BaseLLM,
inputs: dict[str, Any] | None = None,
) -> None
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `n_iterations` | `int` | Yes | Number of test iterations to run |
| `eval_llm` | `str \| BaseLLM` | Yes | The evaluation LLM used to score task outputs (e.g., `"openai/gpt-4o"` or a `BaseLLM` instance) |
| `inputs` | `dict[str, Any] \| None` | No | Optional dictionary of input variables for task interpolation |
I/O
- Input: `n_iterations` (number of evaluation loops), `eval_llm` (LLM used for scoring), and optional `inputs` (variable dictionary for task templates)
- Output: A printed Rich table displaying per-task scores, averages, execution times, and agent assignments across all iterations. No return value (returns `None`).
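Task templates are filled from the `inputs` dictionary via `{placeholder}` interpolation. A minimal, framework-free sketch of that substitution (the `interpolate` helper is hypothetical, not CrewAI's internal API):

```python
# Hypothetical sketch of input interpolation into task templates.
# CrewAI performs an equivalent substitution internally; this helper
# name and implementation are illustrative only.
from typing import Any


def interpolate(template: str, inputs: dict[str, Any]) -> str:
    """Fill {placeholder} variables in a task template from inputs."""
    return template.format(**inputs)


description = interpolate(
    "Conduct thorough research about {topic}.",
    {"topic": "AI"},
)
print(description)  # Conduct thorough research about AI.
```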
Internal Behavior
The test method performs the following steps:
- Creates a `CrewEvaluator`: instantiates a `CrewEvaluator` object with the provided `eval_llm`. This evaluator is responsible for scoring task outputs using the evaluation LLM.
- Runs `kickoff()` `n_iterations` times: executes the full crew workflow in a loop. Each iteration runs the complete task pipeline with the provided inputs.
- Evaluator scores each task output 1-10: after each iteration, the `CrewEvaluator` sends each task's output (along with the task description and expected output) to the evaluation LLM, which returns a numerical score on a 1-10 scale.
- Prints a Rich table with results: after all iterations complete, a formatted table is rendered using the Rich library, displaying:
  - Task descriptions
  - Agent assignments
  - Per-iteration scores
  - Average scores across iterations
  - Execution times
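The steps above can be sketched as a plain loop. Here `run_crew` and `score_output` are hypothetical stubs standing in for `Crew.kickoff()` and the evaluator's LLM call; they are not CrewAI internals:

```python
# Framework-free sketch of the test loop described above.
# run_crew and score_output are hypothetical stubs standing in for
# Crew.kickoff() and the CrewEvaluator's LLM scoring call.
from statistics import mean


def run_crew(inputs: dict) -> dict[str, str]:
    """Stub: one kickoff() run, returning an output per task."""
    return {"research_task": "report...", "writing_task": "article..."}


def score_output(task: str, output: str) -> int:
    """Stub: the evaluation LLM returns a 1-10 score for a task output."""
    return 8  # a real evaluator would query eval_llm here


def run_test(n_iterations: int, inputs: dict) -> dict[str, float]:
    scores: dict[str, list[int]] = {}
    for _ in range(n_iterations):      # run kickoff() n_iterations times
        outputs = run_crew(inputs)     # full task pipeline per iteration
        for task, output in outputs.items():
            scores.setdefault(task, []).append(
                score_output(task, output)  # score each output 1-10
            )
    # CrewAI renders a Rich table at this point; here we just return averages
    return {task: mean(task_scores) for task, task_scores in scores.items()}


averages = run_test(n_iterations=5, inputs={"topic": "AI"})
print(averages)
```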
Import
from crewai import Crew
Example
from crewai import Crew, Agent, Task, Process
# Define agents and tasks
researcher = Agent(
role="Senior Research Analyst",
goal="Uncover cutting-edge developments in {topic}",
backstory="You are an expert research analyst.",
verbose=True,
)
writer = Agent(
role="Content Writer",
goal="Craft compelling content about {topic}",
backstory="You are a skilled writer.",
verbose=True,
)
research_task = Task(
description="Conduct thorough research about {topic}.",
expected_output="A detailed research report with key findings.",
agent=researcher,
)
writing_task = Task(
description="Write a comprehensive article about {topic}.",
expected_output="A well-structured article suitable for publication.",
agent=writer,
)
# Configure crew
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential,
verbose=True,
memory=True,
)
# Run performance testing: 5 iterations scored by GPT-4o
crew.test(
n_iterations=5,
eval_llm="openai/gpt-4o",
inputs={"topic": "AI"},
)
# Output: Rich table showing per-task scores (1-10),
# averages, execution times, and agent assignments
# across all 5 iterations.
Example Output
The test method produces a Rich table similar to the following:
| Task | Agent | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg Score | Avg Time |
|---|---|---|---|---|---|---|---|---|
| Research about AI | Senior Research Analyst | 8 | 7 | 9 | 8 | 8 | 8.0 | 12.3s |
| Write article about AI | Content Writer | 7 | 8 | 7 | 9 | 8 | 7.8 | 8.7s |
Key Implementation Details
- The `eval_llm` parameter accepts either a string identifier (e.g., `"openai/gpt-4o"`) or a `BaseLLM` instance. When a string is provided, CrewAI resolves it to the appropriate LLM provider.
- The evaluation LLM is separate from the agents' LLMs. This is by design, to avoid self-evaluation bias.
- The Rich table output is printed to stdout and is not captured as a return value. For programmatic access to results, the `CrewEvaluator` object can be inspected directly.
- If an exception occurs during any iteration, the method handles the error gracefully and continues with remaining iterations when possible.
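Because `test()` prints its table to stdout rather than returning it, one generic way to capture the rendered text is `contextlib.redirect_stdout` from the standard library. This sketch uses a plain `print` as a stand-in for the real `crew.test(...)` call:

```python
# Capturing printed output with the standard library, since test()
# returns None. The print() call below is a stand-in for crew.test(...);
# the table text shown is illustrative, not CrewAI's actual format.
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # crew.test(n_iterations=5, eval_llm="openai/gpt-4o", inputs={"topic": "AI"})
    print("Tasks Scores (1-10): avg 8.0")  # stand-in for the Rich table

captured = buffer.getvalue()
print(captured.strip())
```

Rich writes to `sys.stdout` by default, so this redirection approach generally captures its tables as well.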
Principle
Principle:CrewAIInc_CrewAI_Performance_Testing