Heuristic:Hiyouga LLaMA Factory LoRA DDP Configuration

From Leeroopedia




Knowledge Sources
Domains Configuration_Best_Practice, Distributed_Training
Last Updated 2026-02-06 20:00 GMT

Overview

Configuration best practices for LoRA fine-tuning under DDP, covering unused-parameter detection, label handling, and embedding-layer trainability.

Description

When LoRA is combined with Distributed Data Parallel (DDP) training, LLaMA Factory automatically applies several configuration fixes. The most important is setting ddp_find_unused_parameters to False. LoRA freezes the base model and trains only the adapter weights, which normally participate in every forward pass, so DDP's per-step search for parameters that received no gradient is wasted, slow work. The framework also warns about a common pitfall: when the vocabulary is resized under LoRA, the embeddings of the newly added tokens stay frozen unless the embedding layers are explicitly listed in additional_target.

Usage

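Use this heuristic whenever you run LoRA fine-tuning on multiple GPUs (DDP, DeepSpeed, or FSDP). The ddp_find_unused_parameters rule fires whenever the Trainer reports ParallelMode.DISTRIBUTED. The framework applies most of these settings automatically, but acting on the embedding-layer warning requires a manual configuration change.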

The Insight (Rule of Thumb)

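  • Action 1: Leave ddp_find_unused_parameters unset and let LLaMA Factory set it to False for LoRA DDP training. The override only applies when the value is None, so an explicit setting always takes precedence.
  • Action 2: When adding custom tokens with resize_vocab=True and LoRA, set additional_target=embed_tokens,lm_head so that the new token embeddings are trainable. The embedding module names depend on the model architecture.
  • Action 3: For LoRA training, label_names defaults to ["labels"] (unless you set it yourself) to work around a Hugging Face Trainer limitation: the Trainer cannot reliably infer label names from a PEFT-wrapped model.
  • Trade-off: Setting ddp_find_unused_parameters=True makes DDP traverse the autograd graph after every forward pass, which adds noticeable overhead, but it may be required for architectures in which some trainable parameters do not receive a gradient in every step.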

Reasoning

DDP unused parameter detection from src/llamafactory/hparams/parser.py:404-410:

if (
    training_args.parallel_mode == ParallelMode.DISTRIBUTED
    and training_args.ddp_find_unused_parameters is None
    and finetuning_args.finetuning_type == "lora"
):
    logger.info_rank0("Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.")
    training_args.ddp_find_unused_parameters = False
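
To see why the scan is pure overhead for LoRA, the toy model below (plain Python, not the PyTorch API) mimics what find_unused_parameters=True computes after each forward pass: the set of trainable parameters that received no gradient. The parameter names and the "used" sets are hypothetical. In a typical LoRA setup every adapter weight takes part in every forward pass, so the result is empty; a conditionally executed branch, such as an expert the router skipped, is the kind of case where the scan does find something.

```python
"""Toy model of what DDP's unused-parameter scan reports (not the PyTorch API)."""


def unused_trainable_params(
    trainable: dict[str, bool], used_in_forward: set[str]
) -> set[str]:
    """Return names of trainable parameters that got no gradient this step.

    DDP only tracks parameters with requires_grad=True, so frozen weights
    never count as "unused"; only trainable ones skipped by the forward pass do.
    """
    return {
        name
        for name, requires_grad in trainable.items()
        if requires_grad and name not in used_in_forward
    }


# Hypothetical LoRA-wrapped layer: base weight frozen, adapter weights trainable.
lora_params = {
    "q_proj.base_layer.weight": False,
    "q_proj.lora_A.weight": True,
    "q_proj.lora_B.weight": True,
}
# Every weight, including both adapters, is used in every forward pass.
lora_used = {"q_proj.base_layer.weight", "q_proj.lora_A.weight", "q_proj.lora_B.weight"}

# Hypothetical mixture-of-experts layer where the router skipped expert 1.
moe_params = {"router.weight": True, "expert0.weight": True, "expert1.weight": True}
moe_used = {"router.weight", "expert0.weight"}

print(sorted(unused_trainable_params(lora_params, lora_used)))  # []
print(sorted(unused_trainable_params(moe_params, moe_used)))  # ['expert1.weight']
```

For the LoRA case the per-step traversal finds nothing, which is why LLaMA Factory turns it off; the MoE-style case is where leaving it on may be necessary.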

Embedding trainability warning from src/llamafactory/hparams/parser.py:360-369:

if (
    training_args.do_train
    and finetuning_args.finetuning_type == "lora"
    and model_args.quantization_bit is None
    and model_args.resize_vocab
    and finetuning_args.additional_target is None
):
    logger.warning_rank0(
        "Remember to add embedding layers to `additional_target` to make the added tokens trainable."
    )
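
The warning above only checks whether additional_target is set at all. A stricter pre-flight check could also confirm that the embedding layers are actually listed. The sketch below is hypothetical and not LLaMA Factory code: it assumes additional_target arrives as a comma-separated string (as in embed_tokens,lm_head), assumes the embedding modules are named embed_tokens and lm_head (true for LLaMA-style models, not for every architecture), and ignores the quantization_bit condition of the original check.

```python
"""Hypothetical pre-flight check for LoRA + resize_vocab (not LLaMA Factory code)."""

from typing import Optional

# Assumed module names for LLaMA-style models; other architectures may differ.
EMBEDDING_MODULES = ("embed_tokens", "lm_head")


def missing_embedding_targets(
    finetuning_type: str,
    resize_vocab: bool,
    additional_target: Optional[str],
    embedding_modules: tuple[str, ...] = EMBEDDING_MODULES,
) -> list[str]:
    """Return embedding modules that would stay frozen for newly added tokens."""
    if finetuning_type != "lora" or not resize_vocab:
        return []  # the pitfall only applies to LoRA with a resized vocabulary
    # Split "embed_tokens,lm_head" into {"embed_tokens", "lm_head"}.
    targets = {t.strip() for t in (additional_target or "").split(",") if t.strip()}
    return [m for m in embedding_modules if m not in targets]


print(missing_embedding_targets("lora", True, None))  # ['embed_tokens', 'lm_head']
print(missing_embedding_targets("lora", True, "embed_tokens"))  # ['lm_head']
print(missing_embedding_targets("lora", True, "embed_tokens,lm_head"))  # []
```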

LoRA label_names fix from src/llamafactory/hparams/parser.py:397-399:

if finetuning_args.finetuning_type == "lora":
    # https://github.com/huggingface/transformers/blob/v4.50.0/src/transformers/trainer.py#L782
    training_args.label_names = training_args.label_names or ["labels"]
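
The workaround matters because the Trainer infers label names by inspecting the model's forward signature. The simplified sketch below is a stand-in, not transformers' own implementation, and BaseModel and Wrapper are hypothetical classes. It assumes a PEFT-style wrapper whose forward only accepts *args and **kwargs, and shows how such a wrapper hides a labels argument from signature inspection. The last lines apply the same `or ["labels"]` fallback as the LLaMA Factory snippet, which leaves a user-provided list untouched.

```python
"""Why signature-based label inference can miss `labels` on wrapped models."""

import inspect


def infer_label_names(model_class: type) -> list[str]:
    """Simplified stand-in: forward() parameters whose name contains "label"."""
    params = inspect.signature(model_class.forward).parameters
    return [name for name in params if "label" in name]


class BaseModel:
    def forward(self, input_ids, attention_mask=None, labels=None):
        return None


class Wrapper:
    """Hypothetical adapter wrapper that passes everything to a base model."""

    def __init__(self, base):
        self.base = base

    def forward(self, *args, **kwargs):
        return self.base.forward(*args, **kwargs)


print(infer_label_names(BaseModel))  # ['labels']
print(infer_label_names(Wrapper))  # [] because `labels` is hidden behind **kwargs

# Fallback as in the snippet above: keep user-provided names, else use ["labels"].
user_label_names = None
label_names = user_label_names or ["labels"]
print(label_names)  # ['labels']
```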
