# Implementation: lmsys/FastChat PEFT `get_peft_model`
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, Parameter-Efficient Fine-Tuning |
| Last Updated | 2026-02-07 14:00 GMT |
## Overview
Wrapper around LoraConfig, prepare_model_for_kbit_training(), and get_peft_model() from the PEFT library, as used in FastChat's LoRA training script to inject LoRA adapters into a pretrained causal language model.
## Description
In `fastchat/train/train_lora.py`, LoRA adapter injection occurs in three stages:

- A `LoraConfig` is constructed from the `LoraArguments` dataclass, specifying rank, alpha, target modules, dropout, bias handling, and task type.
- If `q_lora=True`, `prepare_model_for_kbit_training()` is called to freeze the base model, cast layer norms to float32, and optionally enable gradient checkpointing. For multi-GPU setups without DDP, the model is also marked as parallelizable.
- `get_peft_model()` wraps the base model with LoRA adapter layers, creating a `PeftModel` that routes gradients only to the adapter parameters.
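The gradient routing in the last stage can be illustrated with a toy, dependency-free sketch. This is not PEFT's implementation (the real bookkeeping lives inside `PeftModel`); the `inject_lora` helper and parameter names are hypothetical:

```python
def inject_lora(base_params, target_modules):
    """Toy stand-in for adapter injection: freeze every base parameter and
    add trainable lora_A / lora_B entries for each targeted module."""
    flags = {name: False for name in base_params}  # freeze the base model
    for name in base_params:
        if any(target in name for target in target_modules):
            flags[f"{name}.lora_A"] = True   # rank-r down-projection
            flags[f"{name}.lora_B"] = True   # rank-r up-projection
    return flags

base = [
    "layers.0.q_proj.weight",
    "layers.0.k_proj.weight",
    "layers.0.v_proj.weight",
]
flags = inject_lora(base, ["q_proj", "v_proj"])
trainable = sorted(name for name, on in flags.items() if on)
print(trainable)
# ['layers.0.q_proj.weight.lora_A', 'layers.0.q_proj.weight.lora_B',
#  'layers.0.v_proj.weight.lora_A', 'layers.0.v_proj.weight.lora_B']
```

Only the adapter entries end up trainable; untargeted modules such as `k_proj` stay frozen, which is the property `get_peft_model()` provides for the real model.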
After injection, when `flash_attn` is enabled, norm layers and embedding layers are explicitly cast to the compute dtype for compatibility. When gradient checkpointing is active, the model also calls `enable_input_require_grads()` to ensure gradients flow through the frozen base model to the LoRA adapters.
## Usage

Use this pattern when configuring and applying LoRA adapters to a pretrained model in the FastChat training pipeline.
## Code Reference

### Source Location

- Repository: FastChat
- File: `fastchat/train/train_lora.py` (lines 147-177, LoRA config and model wrapping)
- File: `fastchat/train/train_lora.py` (lines 55-65, `LoraArguments` dataclass)
### Signature

```python
# LoraArguments dataclass (lines 55-65)
@dataclass
class LoraArguments:
    lora_r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: typing.List[str] = field(
        default_factory=lambda: ["q_proj", "v_proj"]
    )
    lora_weight_path: str = ""
    lora_bias: str = "none"
    q_lora: bool = False
```
```python
# LoRA injection logic (lines 147-177)
lora_config = LoraConfig(
    r=lora_args.lora_r,
    lora_alpha=lora_args.lora_alpha,
    target_modules=lora_args.lora_target_modules,
    lora_dropout=lora_args.lora_dropout,
    bias=lora_args.lora_bias,
    task_type="CAUSAL_LM",
)
if lora_args.q_lora:
    model = prepare_model_for_kbit_training(
        model, use_gradient_checkpointing=training_args.gradient_checkpointing
    )
    if not ddp and torch.cuda.device_count() > 1:
        model.is_parallelizable = True
        model.model_parallel = True
model = get_peft_model(model, lora_config)
if training_args.flash_attn:
    for name, module in model.named_modules():
        if "norm" in name:
            module = module.to(compute_dtype)
        if "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight"):
                module = module.to(compute_dtype)
if training_args.gradient_checkpointing:
    model.enable_input_require_grads()
```
### Import

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
```
## I/O Contract

### Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| model | `PreTrainedModel` | Yes | Base causal language model (optionally quantized) to wrap with LoRA adapters |
| lora_args.lora_r | `int` | No | Rank of the LoRA decomposition matrices; default: `8` |
| lora_args.lora_alpha | `int` | No | Scaling factor for LoRA; effective scaling is `alpha / r`; default: `16` |
| lora_args.lora_target_modules | `List[str]` | No | Names of linear modules to apply LoRA to; default: `["q_proj", "v_proj"]` |
| lora_args.lora_dropout | `float` | No | Dropout probability for LoRA layers; default: `0.05` |
| lora_args.lora_bias | `str` | No | Bias training strategy: `"none"`, `"all"`, or `"lora_only"`; default: `"none"` |
| lora_args.q_lora | `bool` | No | Whether to prepare the model for k-bit training; default: `False` |
| training_args.gradient_checkpointing | `bool` | No | Enable gradient checkpointing for memory savings; default: `False` |
| training_args.flash_attn | `bool` | No | Enable the FlashAttention monkey patch; default: `False` |
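The `alpha / r` scaling noted for `lora_alpha` follows the standard LoRA formulation, where the weight update is ΔW = (alpha / r) · B·A. A minimal, dependency-free sketch with toy matrices (not the PEFT implementation):

```python
def lora_delta(B, A, lora_alpha, r):
    """Compute the LoRA weight update ΔW = (lora_alpha / r) * (B @ A).

    B and A are nested lists (rows); pure-Python matmul keeps the
    sketch dependency-free. In real LoRA the inner dimension equals r.
    """
    scale = lora_alpha / r
    inner = len(A)  # inner (rank) dimension of the factorization
    return [
        [
            scale * sum(B[i][k] * A[k][j] for k in range(inner))
            for j in range(len(A[0]))
        ]
        for i in range(len(B))
    ]

# Toy rank-1 factors for a 2x2 weight matrix.
B = [[1.0], [2.0]]   # shape (2, 1)
A = [[3.0, 4.0]]     # shape (1, 2)
print(lora_delta(B, A, lora_alpha=2, r=1))
# [[6.0, 8.0], [12.0, 16.0]]
```

Because only the ratio `alpha / r` enters the update, doubling both `lora_r` and `lora_alpha` together leaves the scaling unchanged.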
### Outputs

| Name | Type | Description |
|---|---|---|
| model | `PeftModel` | Model wrapped with LoRA adapters; only adapter parameters are trainable |
## Usage Examples

### Basic LoRA Injection

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06%
```
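The trainable-parameter count printed above can be checked by hand. A sketch assuming Llama-2-7B-scale shapes (32 decoder layers, hidden size 4096), where each targeted linear layer gets a pair of rank-`r` factors:

```python
# Assumed shapes for a Llama-2-7B-scale model (not read from a checkpoint).
layers, hidden, r = 32, 4096, 8
targets = ["q_proj", "v_proj"]

# Each targeted linear gains two factors: A (r x hidden) and B (hidden x r).
per_module = r * hidden + hidden * r
trainable = layers * len(targets) * per_module
print(trainable)  # 4194304, matching print_trainable_parameters() above
```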
### QLoRA Injection (with k-bit Preparation)

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare quantized model for training
model = prepare_model_for_kbit_training(
    model, use_gradient_checkpointing=True
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.enable_input_require_grads()
```
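The QLoRA snippet above assumes `model` was already loaded in a quantized format. A loading sketch using `transformers`' `BitsAndBytesConfig`; the 4-bit NF4 settings and the model id are illustrative choices, not necessarily FastChat's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization settings; adjust to your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed model id for illustration
    quantization_config=bnb_config,
)
```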
### CLI Invocation with Custom LoRA Parameters

```shell
deepspeed fastchat/train/train_lora.py \
    --model_name_or_path meta-llama/Llama-2-13b-hf \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_lora \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.1 \
    --deepspeed playground/deepspeed_config_s2.json
```