Implementation:CrewAIInc CrewAI Crew Test Method
Overview
Concrete method for running performance evaluation with LLM-based scoring and tabular reporting provided by the CrewAI framework.
Source
Signature
def test(
self,
n_iterations: int,
eval_llm: str | BaseLLM,
inputs: dict[str, Any] | None = None,
) -> None
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `n_iterations` | `int` | Yes | Number of test iterations to run |
| `eval_llm` | `str \| BaseLLM` | Yes | The evaluation LLM used to score task outputs (e.g., `"openai/gpt-4o"` or a `BaseLLM` instance) |
| `inputs` | `dict[str, Any] \| None` | No | Optional dictionary of input variables for task interpolation |
I/O
- Input: `n_iterations` (number of evaluation loops), `eval_llm` (LLM used for scoring), and optional `inputs` (variable dictionary for task templates)
- Output: A printed Rich table displaying per-task scores, averages, execution times, and agent assignments across all iterations. No return value (returns `None`).
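Task templates are filled from the `inputs` dictionary via `{placeholder}` interpolation. A minimal, framework-free sketch of that substitution (the `interpolate` helper is hypothetical, not CrewAI's internal API):

```python
# Hypothetical sketch of input interpolation into task templates.
# CrewAI performs an equivalent substitution internally; this helper
# name and implementation are illustrative only.
from typing import Any


def interpolate(template: str, inputs: dict[str, Any]) -> str:
    """Fill {placeholder} variables in a task template from inputs."""
    return template.format(**inputs)


description = interpolate(
    "Conduct thorough research about {topic}.",
    {"topic": "AI"},
)
print(description)  # Conduct thorough research about AI.
```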
Internal Behavior
The test method performs the following steps:
- Creates a `CrewEvaluator`: instantiates a `CrewEvaluator` object with the provided `eval_llm`. This evaluator is responsible for scoring task outputs using the evaluation LLM.
- Runs `kickoff()` `n_iterations` times: executes the full crew workflow in a loop. Each iteration runs the complete task pipeline with the provided inputs.
- Evaluator scores each task output 1-10: after each iteration, the `CrewEvaluator` sends each task's output (along with the task description and expected output) to the evaluation LLM, which returns a numerical score on a 1-10 scale.
- Prints a Rich table with results: after all iterations complete, a formatted table is rendered using the Rich library, displaying:
  - Task descriptions
  - Agent assignments
  - Per-iteration scores
  - Average scores across iterations
  - Execution times
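The steps above can be sketched as a plain loop. Here `run_crew` and `score_output` are hypothetical stubs standing in for `Crew.kickoff()` and the evaluator's LLM call; they are not CrewAI internals:

```python
# Framework-free sketch of the test loop described above.
# run_crew and score_output are hypothetical stubs standing in for
# Crew.kickoff() and the CrewEvaluator's LLM scoring call.
from statistics import mean


def run_crew(inputs: dict) -> dict[str, str]:
    """Stub: one kickoff() run, returning an output per task."""
    return {"research_task": "report...", "writing_task": "article..."}


def score_output(task: str, output: str) -> int:
    """Stub: the evaluation LLM returns a 1-10 score for a task output."""
    return 8  # a real evaluator would query eval_llm here


def run_test(n_iterations: int, inputs: dict) -> dict[str, float]:
    scores: dict[str, list[int]] = {}
    for _ in range(n_iterations):      # run kickoff() n_iterations times
        outputs = run_crew(inputs)     # full task pipeline per iteration
        for task, output in outputs.items():
            scores.setdefault(task, []).append(
                score_output(task, output)  # score each output 1-10
            )
    # CrewAI renders a Rich table at this point; here we just return averages
    return {task: mean(task_scores) for task, task_scores in scores.items()}


averages = run_test(n_iterations=5, inputs={"topic": "AI"})
print(averages)
```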
Import
from crewai import Crew
Example
from crewai import Crew, Agent, Task, Process
# Define agents and tasks
researcher = Agent(
role="Senior Research Analyst",
goal="Uncover cutting-edge developments in {topic}",
backstory="You are an expert research analyst.",
verbose=True,
)
writer = Agent(
role="Content Writer",
goal="Craft compelling content about {topic}",
backstory="You are a skilled writer.",
verbose=True,
)
research_task = Task(
description="Conduct thorough research about {topic}.",
expected_output="A detailed research report with key findings.",
agent=researcher,
)
writing_task = Task(
description="Write a comprehensive article about {topic}.",
expected_output="A well-structured article suitable for publication.",
agent=writer,
)
# Configure crew
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential,
verbose=True,
memory=True,
)
# Run performance testing: 5 iterations scored by GPT-4o
crew.test(
n_iterations=5,
eval_llm="openai/gpt-4o",
inputs={"topic": "AI"},
)
# Output: Rich table showing per-task scores (1-10),
# averages, execution times, and agent assignments
# across all 5 iterations.
Example Output
The test method produces a Rich table similar to the following:
| Task | Agent | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg Score | Avg Time |
|---|---|---|---|---|---|---|---|---|
| Research about AI | Senior Research Analyst | 8 | 7 | 9 | 8 | 8 | 8.0 | 12.3s |
| Write article about AI | Content Writer | 7 | 8 | 7 | 9 | 8 | 7.8 | 8.7s |
Key Implementation Details
- The `eval_llm` parameter accepts either a string identifier (e.g., `"openai/gpt-4o"`) or a `BaseLLM` instance. When a string is provided, CrewAI resolves it to the appropriate LLM provider.
- The evaluation LLM is separate from the agents' LLMs. This is by design, to avoid self-evaluation bias.
- The Rich table output is printed to stdout and is not captured as a return value. For programmatic access to results, the `CrewEvaluator` object can be inspected directly.
- If an exception occurs during any iteration, the method handles the error gracefully and continues with remaining iterations when possible.
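Because `test()` prints its table to stdout rather than returning it, one generic way to capture the rendered text is `contextlib.redirect_stdout` from the standard library. This sketch uses a plain `print` as a stand-in for the real `crew.test(...)` call:

```python
# Capturing printed output with the standard library, since test()
# returns None. The print() call below is a stand-in for crew.test(...);
# the table text shown is illustrative, not CrewAI's actual format.
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # crew.test(n_iterations=5, eval_llm="openai/gpt-4o", inputs={"topic": "AI"})
    print("Tasks Scores (1-10): avg 8.0")  # stand-in for the Rich table

captured = buffer.getvalue()
print(captured.strip())
```

Rich writes to `sys.stdout` by default, so this redirection approach generally captures its tables as well.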
Principle
Principle:CrewAIInc_CrewAI_Performance_Testing