Implementation: Microsoft LoRA Run GLUE Training
Overview
Run GLUE Training documents the training pipeline in run_glue.py from the microsoft/LoRA repository. The script wraps the HuggingFace Trainer class to run LoRA fine-tuning on GLUE benchmark tasks, with support for distributed training, mixed precision, and several evaluation strategies.
Source File
| File | Lines | Description |
|---|---|---|
| examples/NLU/examples/text-classification/run_glue.py | 222-618 | main() function: full training pipeline |
| examples/NLU/examples/text-classification/run_glue.py | 537-545 | Trainer initialization |
| examples/NLU/examples/text-classification/run_glue.py | 548-569 | Training loop |
| examples/NLU/examples/text-classification/run_glue.py | 572-589 | Evaluation loop |
CLI Signature
```shell
python -m torch.distributed.launch --nproc_per_node=<N> \
    examples/text-classification/run_glue.py \
    --model_name_or_path roberta-base \
    --task_name mnli --do_train --do_eval \
    --apply_lora --lora_r 8 --lora_alpha 16 \
    --max_seq_length 512 --per_device_train_batch_size 16 \
    --learning_rate 5e-4 --num_train_epochs 30 \
    --output_dir ./output/roberta_base_mnli
```
Input / Output
| Direction | Description |
|---|---|
| Input | GLUE dataset (downloaded from HuggingFace Hub or local TSVs) + pretrained model + LoRA configuration |
| Output | Trained model checkpoint (full state dict) + evaluation metrics + training logs |
Pipeline Steps
The main() function (lines 222-618) executes the following sequence:
1. Argument Parsing (Lines 227-233)
Three argument groups are parsed via HfArgumentParser:
```python
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
```
2. Checkpoint Detection (Lines 239-251)
If output_dir exists and contains a previous checkpoint, training resumes from that checkpoint unless --overwrite_output_dir is set.
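This resume logic can be sketched in isolation. The sketch below is an assumption-laden stand-in, not the repository's exact code: it mimics what transformers' `get_last_checkpoint` helper does (scan for `checkpoint-<step>` subdirectories and pick the newest), and the function name is illustrative.

```python
import os


def detect_checkpoint(output_dir, overwrite_output_dir, do_train):
    """Sketch of the resume logic: if output_dir already holds
    checkpoint-* folders and --overwrite_output_dir is not set,
    return the newest checkpoint path; otherwise return None."""
    if not (do_train and os.path.isdir(output_dir) and not overwrite_output_dir):
        return None
    checkpoints = [
        d for d in os.listdir(output_dir)
        if d.startswith("checkpoint-") and os.path.isdir(os.path.join(output_dir, d))
    ]
    if not checkpoints:
        return None
    # Pick the checkpoint with the highest global step, e.g. checkpoint-1500.
    latest = max(checkpoints, key=lambda d: int(d.split("-")[-1]))
    return os.path.join(output_dir, latest)
```

The returned path is then fed to `trainer.train(resume_from_checkpoint=...)` in step 8.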
3. Dataset Loading (Lines 288-319)
GLUE datasets are loaded via the HuggingFace datasets library:
```python
datasets = load_dataset("glue", data_args.task_name)
```
The task_to_keys dictionary maps each GLUE task to its sentence column names:
```python
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}
```
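Downstream code unpacks this mapping for the selected task; for single-sentence tasks the second key is None. A minimal illustration (the dictionary is abridged here so the snippet runs standalone):

```python
# Abridged copy of task_to_keys from run_glue.py.
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "sst2": ("sentence", None),
}

# MNLI is a sentence-pair task: both keys are set.
sentence1_key, sentence2_key = task_to_keys["mnli"]
```

These two keys drive the tokenizer call in step 6.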
4. Model Construction (Lines 345-376)
The model is created with LoRA configuration injected into AutoConfig:
```python
config = AutoConfig.from_pretrained(
    model_args.model_name_or_path,
    num_labels=num_labels,
    apply_lora=model_args.apply_lora,
    lora_alpha=model_args.lora_alpha,
    lora_r=model_args.lora_r,
    ...
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path, config=config, ...
)
```
5. Parameter Freezing (Lines 378-418)
Backbone parameters are frozen; LoRA and classifier parameters remain trainable.
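A minimal sketch of this step, assuming (as in loralib and the RoBERTa classification head) that LoRA parameters carry a `lora_` infix and that head parameters are named under `classifier`; the helper name is illustrative, and the naming conventions should be verified against the actual model:

```python
def mark_only_lora_and_classifier_trainable(named_parameters):
    """Freeze everything except LoRA matrices and the classification head.

    Expects an iterable of (name, parameter) pairs, as returned by
    torch's model.named_parameters(). Returns the trainable names.
    """
    trainable = []
    for name, param in named_parameters:
        if "lora_" in name or name.startswith("classifier"):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False  # backbone weight: frozen
    return trainable
```

Only the surviving trainable parameters receive gradients, which is what makes LoRA fine-tuning cheap relative to full fine-tuning.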
6. Data Preprocessing (Lines 468-500)
Texts are tokenized via the preprocess_function applied with datasets.map():
```python
def preprocess_function(examples):
    args = (
        (examples[sentence1_key],)
        if sentence2_key is None
        else (examples[sentence1_key], examples[sentence2_key])
    )
    result = tokenizer(*args, padding=padding,
                       max_length=max_seq_length, truncation=True)
    return result
```
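The branching on `sentence2_key` can be isolated as a tiny helper to show how single- and pair-sentence batches reach the tokenizer (the helper name is illustrative):

```python
def build_tokenizer_args(examples, sentence1_key, sentence2_key):
    """Return the positional args passed to the tokenizer: one text list
    for single-sentence tasks, two for sentence-pair tasks."""
    if sentence2_key is None:
        return (examples[sentence1_key],)
    return (examples[sentence1_key], examples[sentence2_key])
```

The resulting tuple is splatted into the tokenizer call (`tokenizer(*args, ...)`), and `preprocess_function` itself is applied over the dataset with `datasets.map(preprocess_function, batched=True)`.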
7. Trainer Initialization (Lines 537-545)
```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset if training_args.do_train else None,
    eval_dataset=eval_dataset if training_args.do_eval else None,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
```
8. Training (Lines 548-569)
```python
train_result = trainer.train(resume_from_checkpoint=checkpoint)
metrics = train_result.metrics
trainer.save_model()
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
```
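The evaluation loop (lines 572-589 in the Source File table) is not excerpted here. The sketch below follows the standard HuggingFace run_glue.py pattern, in which MNLI additionally evaluates its mismatched validation split as a second task; the function name and the exact split/task names are assumptions to verify against the source:

```python
def run_evaluation(trainer, datasets, eval_dataset, task_name):
    """Evaluate each relevant validation split and log/save its metrics.

    For MNLI, the mismatched split ("validation_mismatched") is assumed
    to be evaluated as a second task named "mnli-mm", following the
    common run_glue.py convention.
    """
    tasks = [task_name]
    eval_datasets = [eval_dataset]
    if task_name == "mnli":
        tasks.append("mnli-mm")
        eval_datasets.append(datasets["validation_mismatched"])
    results = {}
    for ds, task in zip(eval_datasets, tasks):
        metrics = trainer.evaluate(eval_dataset=ds)
        trainer.log_metrics("eval", metrics)
        trainer.save_metrics("eval", metrics)
        results[task] = metrics
    return results
```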
Typical Shell Script Configs
RoBERTa-base MNLI (roberta_base_mnli.sh)
```shell
export num_gpus=8
export CUBLAS_WORKSPACE_CONFIG=":16:8"
export PYTHONHASHSEED=0
python -m torch.distributed.launch --nproc_per_node=$num_gpus \
    examples/text-classification/run_glue.py \
    --model_name_or_path roberta-base \
    --task_name mnli --do_train --do_eval \
    --max_seq_length 512 --per_device_train_batch_size 16 \
    --learning_rate 5e-4 --num_train_epochs 30 \
    --output_dir $output_dir/model --overwrite_output_dir \
    --logging_steps 10 --evaluation_strategy epoch \
    --save_strategy epoch --warmup_ratio 0.06 \
    --apply_lora --lora_r 8 --lora_alpha 16 \
    --seed 0 --weight_decay 0.1
```
DeBERTa V2 XXL MNLI (deberta_v2_xxlarge_mnli.sh)
```shell
export num_gpus=8
export CUBLAS_WORKSPACE_CONFIG=":16:8"
export PYTHONHASHSEED=0
python -m torch.distributed.launch --nproc_per_node=$num_gpus \
    examples/text-classification/run_glue.py \
    --model_name_or_path microsoft/deberta-v2-xxlarge \
    --task_name mnli --do_train --do_eval \
    --max_seq_length 256 --per_device_train_batch_size 8 \
    --learning_rate 1e-4 --num_train_epochs 5 \
    --output_dir $output_dir/model --overwrite_output_dir \
    --logging_steps 10 --fp16 \
    --evaluation_strategy steps --eval_steps 500 \
    --save_strategy steps --save_steps 500 \
    --warmup_steps 1000 --cls_dropout 0.15 \
    --apply_lora --lora_r 16 --lora_alpha 32 \
    --seed 0 --weight_decay 0 --use_deterministic_algorithms
```
Compute Metrics
Task-specific metrics are computed via the compute_metrics function (lines 515-526):
```python
def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.squeeze(preds) if is_regression else np.argmax(preds, axis=1)
    if data_args.task_name is not None:
        result = metric.compute(predictions=preds, references=p.label_ids)
        if len(result) > 1:
            result["combined_score"] = np.mean(list(result.values())).item()
        return result
    elif is_regression:
        return {"mse": ((preds - p.label_ids) ** 2).mean().item()}
    else:
        return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}
```
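For a classification task with no registered GLUE metric, the final branch reduces to plain accuracy; the arithmetic can be checked in isolation (logits and labels below are illustrative):

```python
import numpy as np

# Logits for three examples over two classes, plus gold labels.
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])

preds = np.argmax(logits, axis=1)  # -> [1, 0, 1]
accuracy = (preds == labels).astype(np.float32).mean().item()
# Two of three predictions are correct, so accuracy is 2/3.
```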