Implementation: Microsoft LoRA Run GLUE Training
Overview
Run GLUE Training documents the training pipeline in run_glue.py from the microsoft/LoRA repository. The script wraps the HuggingFace Trainer class to run LoRA fine-tuning on GLUE benchmark tasks, with support for distributed training, mixed precision, and several evaluation strategies.
Source File
| File | Lines | Description |
|---|---|---|
| examples/NLU/examples/text-classification/run_glue.py | 222-618 | main() function: full training pipeline |
| examples/NLU/examples/text-classification/run_glue.py | 537-545 | Trainer initialization |
| examples/NLU/examples/text-classification/run_glue.py | 548-569 | Training loop |
| examples/NLU/examples/text-classification/run_glue.py | 572-589 | Evaluation loop |
CLI Signature
```shell
python -m torch.distributed.launch --nproc_per_node=<N> \
    examples/text-classification/run_glue.py \
    --model_name_or_path roberta-base \
    --task_name mnli --do_train --do_eval \
    --apply_lora --lora_r 8 --lora_alpha 16 \
    --max_seq_length 512 --per_device_train_batch_size 16 \
    --learning_rate 5e-4 --num_train_epochs 30 \
    --output_dir ./output/roberta_base_mnli
```
Input / Output
| Direction | Description |
|---|---|
| Input | GLUE dataset (downloaded from HuggingFace Hub or local TSVs) + pretrained model + LoRA configuration |
| Output | Trained model checkpoint (full state dict) + evaluation metrics + training logs |
Pipeline Steps
The main() function (lines 222-618) executes the following sequence:
1. Argument Parsing (Lines 227-233)
Three argument groups are parsed via HfArgumentParser:
```python
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
```
2. Checkpoint Detection (Lines 239-251)
If output_dir exists and contains a previous checkpoint, training resumes from that checkpoint unless --overwrite_output_dir is set.
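This resume logic can be sketched in isolation. The sketch below is an assumption-laden stand-in, not the repository's exact code: it mimics what transformers' `get_last_checkpoint` helper does (scan for `checkpoint-<step>` subdirectories and pick the newest), and the function name is illustrative.

```python
import os


def detect_checkpoint(output_dir, overwrite_output_dir, do_train):
    """Sketch of the resume logic: if output_dir already holds
    checkpoint-* folders and --overwrite_output_dir is not set,
    return the newest checkpoint path; otherwise return None."""
    if not (do_train and os.path.isdir(output_dir) and not overwrite_output_dir):
        return None
    checkpoints = [
        d for d in os.listdir(output_dir)
        if d.startswith("checkpoint-") and os.path.isdir(os.path.join(output_dir, d))
    ]
    if not checkpoints:
        return None
    # Pick the checkpoint with the highest global step, e.g. checkpoint-1500.
    latest = max(checkpoints, key=lambda d: int(d.split("-")[-1]))
    return os.path.join(output_dir, latest)
```

The returned path is then fed to `trainer.train(resume_from_checkpoint=...)` in step 8.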
3. Dataset Loading (Lines 288-319)
GLUE datasets are loaded via the HuggingFace datasets library:
```python
datasets = load_dataset("glue", data_args.task_name)
```
The task_to_keys dictionary maps each GLUE task to its sentence column names:
```python
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}
```
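Downstream code unpacks this mapping for the selected task; for single-sentence tasks the second key is None. A minimal illustration (the dictionary is abridged here so the snippet runs standalone):

```python
# Abridged copy of task_to_keys from run_glue.py.
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "sst2": ("sentence", None),
}

# MNLI is a sentence-pair task: both keys are set.
sentence1_key, sentence2_key = task_to_keys["mnli"]
```

These two keys drive the tokenizer call in step 6.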
4. Model Construction (Lines 345-376)
The model is created with LoRA configuration injected into AutoConfig:
```python
config = AutoConfig.from_pretrained(
    model_args.model_name_or_path,
    num_labels=num_labels,
    apply_lora=model_args.apply_lora,
    lora_alpha=model_args.lora_alpha,
    lora_r=model_args.lora_r,
    ...
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path, config=config, ...
)
```
5. Parameter Freezing (Lines 378-418)
Backbone parameters are frozen; LoRA and classifier parameters remain trainable.
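A minimal sketch of this step, assuming (as in loralib and the RoBERTa classification head) that LoRA parameters carry a `lora_` infix and that head parameters are named under `classifier`; the helper name is illustrative, and the naming conventions should be verified against the actual model:

```python
def mark_only_lora_and_classifier_trainable(named_parameters):
    """Freeze everything except LoRA matrices and the classification head.

    Expects an iterable of (name, parameter) pairs, as returned by
    torch's model.named_parameters(). Returns the trainable names.
    """
    trainable = []
    for name, param in named_parameters:
        if "lora_" in name or name.startswith("classifier"):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False  # backbone weight: frozen
    return trainable
```

Only the surviving trainable parameters receive gradients, which is what makes LoRA fine-tuning cheap relative to full fine-tuning.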
6. Data Preprocessing (Lines 468-500)
Texts are tokenized via the preprocess_function applied with datasets.map():
```python
def preprocess_function(examples):
    args = (
        (examples[sentence1_key],)
        if sentence2_key is None
        else (examples[sentence1_key], examples[sentence2_key])
    )
    result = tokenizer(*args, padding=padding,
                       max_length=max_seq_length, truncation=True)
    return result
```
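The branching on `sentence2_key` can be isolated as a tiny helper to show how single- and pair-sentence batches reach the tokenizer (the helper name is illustrative):

```python
def build_tokenizer_args(examples, sentence1_key, sentence2_key):
    """Return the positional args passed to the tokenizer: one text list
    for single-sentence tasks, two for sentence-pair tasks."""
    if sentence2_key is None:
        return (examples[sentence1_key],)
    return (examples[sentence1_key], examples[sentence2_key])
```

The resulting tuple is splatted into the tokenizer call (`tokenizer(*args, ...)`), and `preprocess_function` itself is applied over the dataset with `datasets.map(preprocess_function, batched=True)`.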
7. Trainer Initialization (Lines 537-545)
```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset if training_args.do_train else None,
    eval_dataset=eval_dataset if training_args.do_eval else None,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
```
8. Training (Lines 548-569)
```python
train_result = trainer.train(resume_from_checkpoint=checkpoint)
metrics = train_result.metrics
trainer.save_model()
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
```
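The evaluation loop (lines 572-589 in the Source File table) is not excerpted here. The sketch below follows the standard HuggingFace run_glue.py pattern, in which MNLI additionally evaluates its mismatched validation split as a second task; the function name and the exact split/task names are assumptions to verify against the source:

```python
def run_evaluation(trainer, datasets, eval_dataset, task_name):
    """Evaluate each relevant validation split and log/save its metrics.

    For MNLI, the mismatched split ("validation_mismatched") is assumed
    to be evaluated as a second task named "mnli-mm", following the
    common run_glue.py convention.
    """
    tasks = [task_name]
    eval_datasets = [eval_dataset]
    if task_name == "mnli":
        tasks.append("mnli-mm")
        eval_datasets.append(datasets["validation_mismatched"])
    results = {}
    for ds, task in zip(eval_datasets, tasks):
        metrics = trainer.evaluate(eval_dataset=ds)
        trainer.log_metrics("eval", metrics)
        trainer.save_metrics("eval", metrics)
        results[task] = metrics
    return results
```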
Typical Shell Script Configs
RoBERTa-base MNLI (roberta_base_mnli.sh)
```shell
export num_gpus=8
export CUBLAS_WORKSPACE_CONFIG=":16:8"
export PYTHONHASHSEED=0
python -m torch.distributed.launch --nproc_per_node=$num_gpus \
    examples/text-classification/run_glue.py \
    --model_name_or_path roberta-base \
    --task_name mnli --do_train --do_eval \
    --max_seq_length 512 --per_device_train_batch_size 16 \
    --learning_rate 5e-4 --num_train_epochs 30 \
    --output_dir $output_dir/model --overwrite_output_dir \
    --logging_steps 10 --evaluation_strategy epoch \
    --save_strategy epoch --warmup_ratio 0.06 \
    --apply_lora --lora_r 8 --lora_alpha 16 \
    --seed 0 --weight_decay 0.1
```
DeBERTa V2 XXL MNLI (deberta_v2_xxlarge_mnli.sh)
```shell
export num_gpus=8
export CUBLAS_WORKSPACE_CONFIG=":16:8"
export PYTHONHASHSEED=0
python -m torch.distributed.launch --nproc_per_node=$num_gpus \
    examples/text-classification/run_glue.py \
    --model_name_or_path microsoft/deberta-v2-xxlarge \
    --task_name mnli --do_train --do_eval \
    --max_seq_length 256 --per_device_train_batch_size 8 \
    --learning_rate 1e-4 --num_train_epochs 5 \
    --output_dir $output_dir/model --overwrite_output_dir \
    --logging_steps 10 --fp16 \
    --evaluation_strategy steps --eval_steps 500 \
    --save_strategy steps --save_steps 500 \
    --warmup_steps 1000 --cls_dropout 0.15 \
    --apply_lora --lora_r 16 --lora_alpha 32 \
    --seed 0 --weight_decay 0 --use_deterministic_algorithms
```
Compute Metrics
Task-specific metrics are computed via the compute_metrics function (lines 515-526):
```python
def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.squeeze(preds) if is_regression else np.argmax(preds, axis=1)
    if data_args.task_name is not None:
        result = metric.compute(predictions=preds, references=p.label_ids)
        if len(result) > 1:
            result["combined_score"] = np.mean(list(result.values())).item()
        return result
    elif is_regression:
        return {"mse": ((preds - p.label_ids) ** 2).mean().item()}
    else:
        return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}
```
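For a classification task with no registered GLUE metric, the final branch reduces to plain accuracy; the arithmetic can be checked in isolation (logits and labels below are illustrative):

```python
import numpy as np

# Logits for three examples over two classes, plus gold labels.
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])

preds = np.argmax(logits, axis=1)  # -> [1, 0, 1]
accuracy = (preds == labels).astype(np.float32).mean().item()
# Two of three predictions are correct, so accuracy is 2/3.
```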