Overview
Training and evaluation process manager for the LLaMA-Factory WebUI that validates configurations, constructs argument dictionaries from UI state, launches training as a subprocess, monitors progress, and manages configuration persistence.
Description
runner.py implements the Runner class, which serves as the execution backbone of the WebUI. It manages the full lifecycle of training and evaluation jobs:
- __init__: Initializes the runner with a reference to the Manager (for UI element access), demo mode flag, and state tracking (trainer subprocess, running status, abort flag).
- _initialize: Validates the UI configuration before launching. Checks for required fields (model name, model path, dataset, output directory), valid JSON in extra_args, a reward model for the PPO stage, and demo mode restrictions. Returns a localized error message on failure, or an empty string on success.
- _parse_train_args: Builds a comprehensive training argument dictionary from UI state. Maps all UI elements to their corresponding CLI argument names, including:
  - Core training params (stage, model path, dataset, hyperparameters)
  - Finetuning type-specific configs (LoRA, freeze, full)
  - RLHF-specific configs (PPO reward model, DPO/KTO preferences)
  - Multimodal configs (vision tower freezing, pixel constraints)
  - Optimizer configs (GaLore, APOLLO, BAdam) and SwanLab experiment tracking
  - DeepSpeed configs (stage, offload)
  - Checkpoint and quantization handling
- _parse_eval_args: Builds evaluation argument dictionary with model, dataset, and generation parameters.
- _preview: Generates a command preview without executing, showing the equivalent CLI command.
- _launch: The main execution entry point that:
  - Validates configuration via _initialize
  - Saves the current config to LLAMABOARD_CONFIG
  - Sets up environment variables (LLAMABOARD_ENABLED, LLAMABOARD_WORKDIR, FORCE_TORCHRUN for DeepSpeed)
  - Launches llamafactory-cli train as a subprocess using Popen (explicitly avoiding shell=True for security)
  - Delegates to monitor() for progress tracking
- monitor: A generator function that polls the subprocess for progress, reading trainer logs and running info. Yields UI updates including output log, progress bar, loss viewer plot, and SwanLab link. Handles abort via process tree termination and reports final status (success, abort, or failure with exit code and stderr).
- save_args / load_args: Configuration persistence to/from YAML files in the config directory.
- check_output_dir: Checks if an output directory already exists and loads its saved configuration if found, enabling training resumption.
- _build_config_dict: Extracts the current UI state into a serializable dictionary, excluding volatile fields (language, model path, output dir, config path).
- set_abort: Sets the abort flag and terminates the trainer process tree.
- _finalize: Cleans cached memory (torch_gc), resets runner state, and displays finish/abort info.
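The validation step described for _initialize can be sketched as a plain function. This is a minimal illustration under stated assumptions, not the actual implementation: the name validate_config, the flat string keys, and the English error strings are hypothetical (the real method reads Gradio components and returns localized messages), and the demo mode check is omitted.

```python
import json
from typing import Any


def validate_config(config: dict[str, Any]) -> str:
    """Return an error message, or an empty string if the config is valid.

    Hypothetical stand-in for Runner._initialize; key names are assumptions.
    """
    # Required fields must be non-empty before anything is launched.
    for key in ("model_name", "model_path", "dataset", "output_dir"):
        if not config.get(key):
            return f"Please provide {key}."

    # extra_args must parse as a JSON object so it can be merged into the arguments.
    try:
        extra = json.loads(config.get("extra_args") or "{}")
    except json.JSONDecodeError:
        return "extra_args is not valid JSON."
    if not isinstance(extra, dict):
        return "extra_args must be a JSON object."

    # The PPO stage needs a reward model.
    if config.get("stage") == "ppo" and not config.get("reward_model"):
        return "Please select a reward model for PPO."

    return ""
```

Returning a string rather than raising keeps the caller simple: an empty result means "go ahead", and anything else can be shown directly in the output box.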
Usage
The Runner is instantiated by the WebUI Engine and connected to the training tab's action buttons. The preview_train, preview_eval, run_train, and run_eval methods are bound to button click events. The monitor method is a generator that yields progressive UI updates to Gradio during training.
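The launch-and-monitor pattern (start a subprocess without a shell, poll it, yield updates until it exits) can be sketched with the standard library alone. This is an illustrative sketch, not the Runner's code: run_and_monitor and the "status"/"stderr" keys are hypothetical, and a trivial Python child process stands in for the training command. The real monitor additionally reads trainer logs and supports aborting.

```python
import subprocess
import sys
from collections.abc import Generator


def run_and_monitor(cmd: list[str], poll_interval: float = 0.1) -> Generator[dict[str, str], None, None]:
    """Launch cmd without a shell and yield status updates until it exits."""
    # Passing an argument list (no shell=True) avoids shell interpretation of the arguments.
    process = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)

    while True:
        try:
            # communicate() drains stderr while waiting, so a full pipe cannot block the child.
            _, stderr = process.communicate(timeout=poll_interval)
        except subprocess.TimeoutExpired:
            # Still running: emit an update and poll again.
            yield {"status": "running"}
            continue

        if process.returncode == 0:
            yield {"status": "finished"}
        else:
            yield {"status": f"failed with exit code {process.returncode}", "stderr": stderr}
        return


# Example: a trivial child process stands in for the training command.
for update in run_and_monitor([sys.executable, "-c", "print('training...')"]):
    print(update)
```

Because the function is a generator, a UI framework that accepts generator callbacks can render each yielded dictionary as it arrives instead of waiting for the process to end.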
Code Reference
Source Location
Signature
class Runner:
    def __init__(self, manager: "Manager", demo_mode: bool = False) -> None
    def set_abort(self) -> None
    def _initialize(self, data: dict[Component, Any], do_train: bool, from_preview: bool) -> str
    def _finalize(self, lang: str, finish_info: str) -> None
    def _parse_train_args(self, data: dict[Component, Any]) -> dict[str, Any]
    def _parse_eval_args(self, data: dict[Component, Any]) -> dict[str, Any]
    def _preview(self, data, do_train: bool) -> Generator[dict[Component, str], None, None]
    def _launch(self, data, do_train: bool) -> Generator[dict[Component, Any], None, None]
    def _build_config_dict(self, data: dict[Component, Any]) -> dict[str, Any]
    def preview_train(self, data) -> Generator
    def preview_eval(self, data) -> Generator
    def run_train(self, data) -> Generator
    def run_eval(self, data) -> Generator
    def monitor(self) -> Generator
    def save_args(self, data) -> dict
    def load_args(self, lang: str, config_path: str) -> dict
    def check_output_dir(self, lang: str, model_name: str, finetuning_type: str, output_dir: str) -> dict
Import
from llamafactory.webui.runner import Runner
I/O Contract
Inputs
Runner.__init__
| Name | Type | Required | Description |
|---|---|---|---|
| manager | Manager | Yes | The WebUI Manager instance for accessing UI elements by ID |
| demo_mode | bool | No | If True, prevents actual training execution (preview only); default False |
Runner._parse_train_args
| Name | Type | Required | Description |
|---|---|---|---|
| data | dict[Component, Any] | Yes | Dictionary mapping Gradio Component instances to their current values, representing the full UI state |
Runner.load_args
| Name | Type | Required | Description |
|---|---|---|---|
| lang | str | Yes | Current language code for localized messages ("en", "ru", "zh", "ko", "ja") |
| config_path | str | Yes | Name of the YAML configuration file to load from the config directory |
Outputs
Runner._parse_train_args
| Name | Type | Description |
|---|---|---|
| args | dict[str, Any] | Complete training argument dictionary ready for llamafactory-cli; includes all model, data, training, and optimizer configuration |
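To illustrate how an argument dictionary like this can be shown as an equivalent CLI command (the job of _preview), here is a minimal sketch. The helper name render_command and the exact layout (one flag per line, lowercase booleans, unset values skipped) are assumptions for the example and may differ from what the WebUI actually prints.

```python
import shlex
from typing import Any


def render_command(args: dict[str, Any]) -> str:
    """Render an argument dictionary as a multi-line CLI command string."""
    lines = ["llamafactory-cli train"]
    for key, value in args.items():
        # Skip unset values so the preview only shows options that were chosen.
        if value is None or value == "":
            continue
        # Booleans are rendered in lowercase; other values are shell-quoted when needed.
        text = str(value).lower() if isinstance(value, bool) else shlex.quote(str(value))
        lines.append(f"    --{key} {text}")
    # Join with backslash-newline so the result can be pasted into a shell.
    return " \\\n".join(lines)


# Illustrative values only.
print(render_command({"stage": "sft", "do_train": True, "learning_rate": 5e-5}))
```

Quoting each value with shlex.quote keeps the preview safe to copy even when a value contains spaces.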
Runner.monitor
| Name | Type | Description |
|---|---|---|
| update_dict | dict[Component, Any] | Yielded dictionary mapping UI components to their updated values: output_box (log text), progress_bar (training progress), loss_viewer (loss plot), swanlab_link (experiment link) |
Runner.load_args
| Name | Type | Description |
|---|---|---|
| output_dict | dict[Component, Any] | Dictionary mapping UI components to restored configuration values, or error message if config not found |
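The persistence behavior (save the UI state minus volatile fields, then restore it or report a missing file) can be sketched as follows. This is a simplified illustration, not the Runner's code: save_config, load_config, and the key names are hypothetical, and JSON from the standard library is swapped in for YAML so the example has no third-party dependency.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Fields assumed to be volatile and therefore not persisted (names are illustrative).
VOLATILE_KEYS = {"lang", "model_path", "output_dir", "config_path"}


def save_config(state: dict[str, Any], path: Path) -> None:
    """Write the non-volatile part of the UI state to disk."""
    config = {k: v for k, v in state.items() if k not in VOLATILE_KEYS}
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_config(path: Path) -> dict[str, Any]:
    """Return the saved config, or an error entry if the file does not exist."""
    if not path.is_file():
        return {"error": f"Config file not found: {path.name}"}
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "my_config.json"
    save_config({"lang": "en", "learning_rate": 5e-5, "output_dir": "run1"}, config_file)
    restored = load_config(config_file)
    missing = load_config(Path(tmp) / "absent.json")
```

Excluding volatile fields means a saved configuration can be reloaded under a different language, model path, or output directory without overwriting those choices.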
Usage Examples
# Initialize a runner (typically done by the Engine)
from llamafactory.webui.runner import Runner

runner = Runner(manager=engine.manager, demo_mode=False)

# Preview training command (called by button click)
for update in runner.preview_train(data):
    # update is {output_box: "llamafactory-cli train ..."}
    pass

# Launch training (called by button click)
for update in runner.run_train(data):
    # update contains {output_box: log, progress_bar: progress, loss_viewer: plot}
    pass

# Save and load configuration
runner.save_args(data)
result = runner.load_args(lang="en", config_path="my_config.yaml")

# Abort running training
runner.set_abort()
Related Pages