Heuristic:Hiyouga LLaMA Factory LoRA DDP Configuration
| Knowledge Sources | |
|---|---|
| Domains | Configuration_Best_Practice, Distributed_Training |
| Last Updated | 2026-02-06 20:00 GMT |
Overview
Configuration best practices for LoRA fine-tuning with DDP, covering unused parameter detection, label handling, and embedding layer trainability.
Description
When using LoRA with Distributed Data Parallel (DDP) training, LLaMA Factory automatically applies several configuration fixes. The most important is setting ddp_find_unused_parameters to False when the user has left it unset: LoRA freezes most model parameters, so the unused parameter search is wasteful and slow. The framework also warns about a common pitfall: when the vocabulary is resized under LoRA, the new token embeddings are not trainable unless the embedding layers are explicitly added to additional_target.
Usage
Use this heuristic whenever running LoRA fine-tuning with multiple GPUs (DDP, DeepSpeed, or FSDP). The framework applies the ddp_find_unused_parameters and label_names fixes automatically; the embedding layer warning is only a reminder, so acting on it is a manual step.
The Insight (Rule of Thumb)
- Action 1: Let LLaMA Factory auto-set ddp_find_unused_parameters=False for LoRA DDP training (automatic).
- Action 2: When adding custom tokens with resize_vocab=True and LoRA, add additional_target=embed_tokens,lm_head to make new token embeddings trainable.
- Action 3: For LoRA training, label_names is automatically set to ["labels"] to work around a HuggingFace Trainer issue.
- Trade-off: Setting ddp_find_unused_parameters=True wastes significant time scanning all parameters but may be needed for unusual model architectures.
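As an illustration of Actions 1 and 2, the arguments could be collected as below. This is a sketch only: the argument names come from this page, the values are illustrative assumptions, and how the arguments reach LLaMA Factory (config file or command line) is not shown.

```python
# Argument names are taken from this page; the values are illustrative assumptions.
lora_ddp_args = {
    "do_train": True,
    "finetuning_type": "lora",
    "resize_vocab": True,
    # Comma-separated module names, as written in Action 2.
    "additional_target": "embed_tokens,lm_head",
    # Left unset (None) so the framework can choose False for LoRA under DDP.
    "ddp_find_unused_parameters": None,
}

# Split the comma-separated value to see which extra modules are requested.
extra_modules = [name.strip() for name in lora_ddp_args["additional_target"].split(",")]
print(extra_modules)
```

Leaving ddp_find_unused_parameters unset is deliberate here: an explicit value would be kept as given rather than auto-corrected.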
Reasoning
DDP unused parameter detection from src/llamafactory/hparams/parser.py:404-410:
if (
    training_args.parallel_mode == ParallelMode.DISTRIBUTED
    and training_args.ddp_find_unused_parameters is None
    and finetuning_args.finetuning_type == "lora"
):
    logger.info_rank0("Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.")
    training_args.ddp_find_unused_parameters = False
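The check above fires only when the option is unset, so an explicit user value survives. A minimal sketch of that tri-state behavior follows; resolve_find_unused is a hypothetical helper that takes plain arguments instead of the real training_args and finetuning_args objects.

```python
from typing import Optional


def resolve_find_unused(
    user_value: Optional[bool],
    is_distributed: bool,
    finetuning_type: str,
) -> Optional[bool]:
    """Hypothetical stand-in for the parser check; not part of LLaMA Factory."""
    # Only an unset (None) value is overridden, and only for distributed LoRA runs.
    if is_distributed and user_value is None and finetuning_type == "lora":
        return False
    # Everything else, including an explicit True, passes through unchanged.
    return user_value


print(resolve_find_unused(None, True, "lora"))   # unset + DDP + LoRA
print(resolve_find_unused(True, True, "lora"))   # explicit True is kept
print(resolve_find_unused(None, False, "lora"))  # non-distributed run is untouched
```

This is why the trade-off case remains available: setting the option to True by hand is never overwritten by this rule.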
Embedding trainability warning from src/llamafactory/hparams/parser.py:360-369:
if (
    training_args.do_train
    and finetuning_args.finetuning_type == "lora"
    and model_args.quantization_bit is None
    and model_args.resize_vocab
    and finetuning_args.additional_target is None
):
    logger.warning_rank0(
        "Remember to add embedding layers to `additional_target` to make the added tokens trainable."
    )
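To see which configurations trigger the reminder, the condition can be restated as a standalone predicate. This is a sketch under stated assumptions: should_warn_about_embeddings is a hypothetical name, and its parameters are plain values standing in for the fields read in the excerpt above.

```python
from typing import Optional


def should_warn_about_embeddings(
    do_train: bool,
    finetuning_type: str,
    quantization_bit: Optional[int],
    resize_vocab: bool,
    additional_target: Optional[str],
) -> bool:
    """Hypothetical predicate mirroring the condition quoted above."""
    return bool(
        do_train
        and finetuning_type == "lora"
        and quantization_bit is None
        and resize_vocab
        and additional_target is None
    )


# Resized vocabulary, LoRA, no extra targets: the reminder applies.
print(should_warn_about_embeddings(True, "lora", None, True, None))
# Any additional_target value silences it, whether or not it names embedding layers.
print(should_warn_about_embeddings(True, "lora", None, True, "embed_tokens,lm_head"))
```

Two consequences follow from the condition as quoted: quantized runs never see the reminder, and the check does not verify that additional_target actually contains the embedding layers.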
LoRA label_names fix from src/llamafactory/hparams/parser.py:397-399:
if finetuning_args.finetuning_type == "lora":
    # https://github.com/huggingface/transformers/blob/v4.50.0/src/transformers/trainer.py#L782
    training_args.label_names = training_args.label_names or ["labels"]
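The `or` idiom in the last line keeps any non-empty user value and substitutes the default otherwise. A small sketch with a hypothetical function, default_label_names, shows the cases:

```python
from typing import List, Optional


def default_label_names(label_names: Optional[List[str]]) -> List[str]:
    """Hypothetical helper reproducing the `x or default` fallback."""
    # `or` falls back for None and for an empty list alike, since both are falsy.
    return label_names or ["labels"]


print(default_label_names(None))
print(default_label_names([]))
print(default_label_names(["labels", "aux_labels"]))
```

Note that an empty list is treated the same as an unset value, so the default cannot be disabled by passing `[]`.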