
Principle:CrewAIInc CrewAI Training Execution

From Leeroopedia

Overview

An iterative training loop that executes the crew multiple times with human-in-the-loop feedback, collecting and persisting improvement data for agent behavior refinement.

Description

Training Execution runs the crew workflow n times, enabling human_input on all tasks so a human reviewer can provide feedback on each task's output. This feedback is persisted to a training file using a pickle-based handler (CrewTrainingHandler). After all iterations complete, a TaskEvaluator analyzes the collected training data per agent and generates refined prompts and behaviors.
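The loop described above can be sketched in plain Python. This is an illustrative stand-in, not the CrewAI source: the real entry point is the crew's train method, and the callables here (crew_kickoff, get_feedback, evaluate) are hypothetical names for the crew execution, the human-review step, and the TaskEvaluator analysis.

```python
import pickle

def train(crew_kickoff, n_iterations, filename, get_feedback, evaluate):
    """Sketch of the training loop (illustrative, not the CrewAI source).

    crew_kickoff -- callable running one crew execution; returns a list of
                    (agent_role, task_output) pairs
    get_feedback -- callable soliciting human feedback on a task output
    evaluate     -- callable turning accumulated data into refined prompts
    """
    data = {}  # per-agent training records, keyed by agent role
    for i in range(n_iterations):
        for role, output in crew_kickoff():
            # human-in-the-loop step: reviewer assesses each task output
            feedback = get_feedback(role, output)
            data.setdefault(role, []).append(
                {"iteration": i, "output": output, "feedback": feedback}
            )
    # pickle-based persistence, mirroring the described CrewTrainingHandler
    with open(filename, "wb") as f:
        pickle.dump(data, f)
    # post-hoc analysis, mirroring the described TaskEvaluator
    return evaluate(data)
```

Feedback is collected per task, not only on the final crew output, which is what gives the evaluator fine-grained, per-agent training signals.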

The training process enforces two critical constraints:

  • human_input=True on all tasks — Every task is configured to pause after execution and solicit human feedback. This ensures the human reviewer can assess each individual task output, providing fine-grained training signals rather than only evaluating the final crew output.
  • delegation disabled on all agents — During training, no agent is allowed to delegate work to another agent. This ensures that each agent handles its assigned work directly, producing clean training signals that can be unambiguously attributed to a specific agent. If delegation were allowed, it would be unclear which agent's behavior should be adjusted based on the feedback.
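Enforcing these two constraints amounts to overriding two flags before the training run begins. The sketch below uses minimal stand-in dataclasses (real CrewAI Task and Agent objects carry many more fields) and applies the overrides to copies so the original configuration is untouched:

```python
from dataclasses import dataclass, replace

# Minimal stand-ins for illustration; not the real CrewAI classes.
@dataclass(frozen=True)
class Agent:
    role: str
    allow_delegation: bool = True

@dataclass(frozen=True)
class Task:
    description: str
    human_input: bool = False

def prepare_for_training(agents, tasks):
    """Apply the two training constraints to copies of the crew's config."""
    # Constraint 2: no delegation, so feedback attributes cleanly to one agent
    train_agents = [replace(a, allow_delegation=False) for a in agents]
    # Constraint 1: pause after every task to solicit human feedback
    train_tasks = [replace(t, human_input=True) for t in tasks]
    return train_agents, train_tasks
```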

The iterative nature of training is essential. A single execution provides only one data point. Multiple iterations allow the system to:

  • Collect diverse feedback across different execution paths
  • Identify consistent patterns in agent strengths and weaknesses
  • Build a robust dataset from which the TaskEvaluator can generate meaningful improvements

Theoretical Basis

This principle is grounded in Reinforcement Learning from Human Feedback (RLHF), adapted to multi-agent systems. In traditional RLHF, a language model generates outputs that are ranked or scored by human evaluators, and this feedback is used to fine-tune the model. In CrewAI's adaptation:

  • The language model is replaced by a crew of agents executing a workflow
  • The human evaluator provides feedback after each task execution
  • The fine-tuning step is replaced by the TaskEvaluator, which analyzes feedback and generates improved agent configurations

The key insight is that RLHF can be applied at the prompt engineering level rather than at the model weight level. Instead of adjusting model parameters, the system adjusts agent prompts, tool usage patterns, and task descriptions based on accumulated human feedback.
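Prompt-level optimization can be illustrated with a toy refinement step. This is not the TaskEvaluator's actual logic (which uses an LLM to analyze feedback); it only shows the shape of the idea, where accumulated reviewer notes are folded into each agent's prompt rather than into model weights:

```python
def refine_prompts(training_data, base_prompts):
    """Toy prompt-level refinement: fold human feedback into each agent's
    prompt instead of updating model weights. Illustrative only."""
    refined = {}
    for role, base in base_prompts.items():
        # collect this agent's feedback from the persisted training records
        notes = [r["feedback"] for r in training_data.get(role, []) if r["feedback"]]
        if notes:
            refined[role] = (
                base
                + "\nIncorporate this reviewer feedback:\n- "
                + "\n- ".join(notes)
            )
        else:
            refined[role] = base  # no feedback collected; prompt unchanged
    return refined
```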

RLHF Component  | Traditional Implementation | CrewAI Adaptation
--------------- | -------------------------- | ------------------------------------------
Model           | Single LLM                 | Multi-agent crew
Human feedback  | Preference rankings        | Per-task textual feedback
Optimization    | Weight updates             | Prompt/behavior refinement via TaskEvaluator
Iteration       | Training epochs            | n_iterations of crew execution

Training Data Persistence

Training data is persisted using the CrewTrainingHandler, which writes to a pickle file specified by the filename parameter. The handler stores per-agent training records keyed by agent role, enabling the TaskEvaluator to analyze each agent's performance independently.

The persistence format supports:

  • Incremental accumulation — Each iteration appends to existing training data
  • Per-agent isolation — Feedback is attributed to specific agents
  • Cross-session continuity — Training can be resumed across separate program invocations
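A handler with these three properties can be sketched as follows. This is a simplified model of the described behavior, not the actual CrewTrainingHandler class:

```python
import os
import pickle

class TrainingHandler:
    """Simplified pickle-backed handler modeled on the described
    CrewTrainingHandler (not the actual CrewAI class)."""

    def __init__(self, filename):
        self.filename = filename

    def load(self):
        # Cross-session continuity: re-read whatever a prior run persisted
        if not os.path.exists(self.filename) or os.path.getsize(self.filename) == 0:
            return {}
        with open(self.filename, "rb") as f:
            return pickle.load(f)

    def append(self, agent_role, record):
        data = self.load()
        # Per-agent isolation: records are keyed by agent role
        data.setdefault(agent_role, []).append(record)
        # Incremental accumulation: rewrite the file with the appended data
        with open(self.filename, "wb") as f:
            pickle.dump(data, f)
```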

Relationship to Workflow

Training Execution depends on Baseline Crew Configuration to provide a properly configured crew. The training data it produces feeds into Iterative Improvement through the memory subsystem, and the results can be validated using Performance Testing.

Implementation

Implementation:CrewAIInc_CrewAI_Crew_Train_Method
