
Implementation:CrewAIInc CrewAI Crew Test Method

From Leeroopedia

Overview

A concrete method, provided by the CrewAI framework, for running performance evaluation with LLM-based scoring and tabular reporting.

Source

src/crewai/crew.py:L1796-1842

Signature

def test(
    self,
    n_iterations: int,
    eval_llm: str | BaseLLM,
    inputs: dict[str, Any] | None = None,
) -> None

Parameters

Parameter     Type                   Required  Description
n_iterations  int                    Yes       Number of test iterations to run
eval_llm      str | BaseLLM          Yes       Evaluation LLM used to score task outputs (e.g., "openai/gpt-4o" or a BaseLLM instance)
inputs        dict[str, Any] | None  No        Optional dictionary of input variables for task interpolation

I/O

  • Input: n_iterations (number of evaluation loops), eval_llm (LLM used for scoring), and optional inputs (variable dictionary for task templates)
  • Output: Printed Rich table displaying per-task scores, averages, execution times, and agent assignments across all iterations. No return value (returns None).

Internal Behavior

The test method performs the following steps:

  1. Creates a CrewEvaluator — Instantiates a CrewEvaluator object with the provided eval_llm. This evaluator is responsible for scoring task outputs using the evaluation LLM.
  2. Runs kickoff() n_iterations times — Executes the full crew workflow in a loop. Each iteration runs the complete task pipeline with the provided inputs.
  3. Evaluator scores each task output 1-10 — After each iteration, the CrewEvaluator sends each task's output (along with the task description and expected output) to the evaluation LLM, which returns a numerical score on a 1-10 scale.
  4. Prints Rich table with results — After all iterations complete, a formatted table is rendered using the Rich library, displaying:
    • Task descriptions
    • Agent assignments
    • Per-iteration scores
    • Average scores across iterations
    • Execution times

Import

from crewai import Crew

Example

from crewai import Crew, Agent, Task, Process

# Define agents and tasks
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments in {topic}",
    backstory="You are an expert research analyst.",
    verbose=True,
)

writer = Agent(
    role="Content Writer",
    goal="Craft compelling content about {topic}",
    backstory="You are a skilled writer.",
    verbose=True,
)

research_task = Task(
    description="Conduct thorough research about {topic}.",
    expected_output="A detailed research report with key findings.",
    agent=researcher,
)

writing_task = Task(
    description="Write a comprehensive article about {topic}.",
    expected_output="A well-structured article suitable for publication.",
    agent=writer,
)

# Configure crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
    memory=True,
)

# Run performance testing: 5 iterations scored by GPT-4o
crew.test(
    n_iterations=5,
    eval_llm="openai/gpt-4o",
    inputs={"topic": "AI"},
)
# Output: Rich table showing per-task scores (1-10),
# averages, execution times, and agent assignments
# across all 5 iterations.

Example Output

The test method produces a Rich table similar to the following:

Task                    Agent                    Run 1  Run 2  Run 3  Run 4  Run 5  Avg Score  Avg Time
Research about AI       Senior Research Analyst  8      7      9      8      8      8.0        12.3s
Write article about AI  Content Writer           7      8      7      9      8      7.8        8.7s
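The Avg Score column is the arithmetic mean of the per-run scores. For instance, recomputing the averages from the sample table above in plain Python (not CrewAI code):

```python
# Per-task scores from the five runs in the sample table
runs = {
    "Research about AI": [8, 7, 9, 8, 8],
    "Write article about AI": [7, 8, 7, 9, 8],
}
# Mean score per task, rounded to one decimal place
avg = {task: round(sum(s) / len(s), 1) for task, s in runs.items()}
print(avg)  # {'Research about AI': 8.0, 'Write article about AI': 7.8}
```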

Key Implementation Details

  • The eval_llm parameter accepts either a string identifier (e.g., "openai/gpt-4o") or a BaseLLM instance. When a string is provided, CrewAI resolves it to the appropriate LLM provider.
  • The evaluation LLM is separate from the agents' LLMs. This is by design to avoid self-evaluation bias.
  • The Rich table output is printed to stdout and is not captured as a return value. For programmatic access to results, the CrewEvaluator object can be inspected directly.
  • If an exception occurs during any iteration, the method handles the error gracefully and continues with remaining iterations when possible.
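The string-or-instance acceptance for eval_llm can be sketched generically. The BaseLLM class and resolve_eval_llm function below are hypothetical stand-ins to illustrate the pattern, not CrewAI's actual resolution code:

```python
class BaseLLM:
    """Stand-in for CrewAI's BaseLLM interface."""
    def __init__(self, model: str):
        self.model = model

def resolve_eval_llm(eval_llm) -> BaseLLM:
    # A string identifier like "openai/gpt-4o" is wrapped into an
    # LLM object; an existing instance is passed through unchanged.
    if isinstance(eval_llm, str):
        return BaseLLM(model=eval_llm)
    if isinstance(eval_llm, BaseLLM):
        return eval_llm
    raise TypeError("eval_llm must be a model string or a BaseLLM instance")

print(resolve_eval_llm("openai/gpt-4o").model)  # openai/gpt-4o
```

Accepting both forms lets quick scripts pass a provider string while letting advanced users configure the evaluation LLM (temperature, base URL, etc.) on an instance.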

Principle

Principle:CrewAIInc_CrewAI_Performance_Testing
