Implementation:Microsoft LoRA Run GLUE No Trainer
Overview
`run_glue_no_trainer.py` is a GLUE benchmark fine-tuning script that uses HuggingFace Accelerate instead of the Trainer API, implementing a manual training loop with explicit optimizer, scheduler, and gradient accumulation control.
Description
This script demonstrates how to fine-tune `AutoModelForSequenceClassification` on GLUE tasks without relying on the Trainer abstraction. It uses the `Accelerator` class from HuggingFace Accelerate for device placement, distributed training, and mixed precision support.
Key implementation details:
- Task mapping: Defines `task_to_keys`, mapping GLUE task names to their sentence column names:
  - Single-sentence: `cola` ("sentence"), `sst2` ("sentence")
  - Sentence-pair: `mnli` ("premise", "hypothesis"), `mrpc` ("sentence1", "sentence2"), `qnli` ("question", "sentence"), `qqp` ("question1", "question2"), `rte` ("sentence1", "sentence2"), `stsb` ("sentence1", "sentence2"), `wnli` ("sentence1", "sentence2")
- Manual training loop: Implements an explicit forward pass, loss computation, gradient accumulation (`loss / gradient_accumulation_steps`), backward pass via `accelerator.backward()`, optimizer step, and learning rate scheduler step.
- Optimizer setup: Uses AdamW with parameter group splitting; weight decay is applied to all parameters except biases and LayerNorm weights.
- Learning rate scheduling: Supports multiple scheduler types via `get_scheduler()`: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup.
- Argument parsing: Uses standard `argparse` instead of `HfArgumentParser`, with explicit parameter definitions for batch size, learning rate, epochs, etc.
- Evaluation: Per-epoch evaluation using GLUE task-specific metrics; predictions are gathered across processes via `accelerator.gather()`.
- MNLI special handling: After training, performs an additional evaluation on the mismatched validation set.
- Model saving: Uses `accelerator.unwrap_model()` and `accelerator.save()` for distributed-safe model saving.
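The task-to-columns mapping described above can be sketched as a plain dict; the `texts_for_task` helper below is illustrative and not part of the script itself:

```python
# Mapping of GLUE task names to their input column names, as listed above.
# Single-sentence tasks use (key, None); sentence-pair tasks use (key1, key2).
task_to_keys = {
    "cola": ("sentence", None),
    "sst2": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}

def texts_for_task(task_name, example):
    """Pick the text column(s) to tokenize for a given GLUE task."""
    key1, key2 = task_to_keys[task_name]
    return (example[key1],) if key2 is None else (example[key1], example[key2])
```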
The `stsb` task is handled as regression (`num_labels=1`); all other tasks are classification.
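The parameter-group splitting from the "Optimizer setup" bullet follows a common HuggingFace pattern; below is a minimal sketch over mock parameter names (the helper, the `no_decay` list, and the mock values are illustrative assumptions, not the script's exact code):

```python
# Split named parameters into decay / no-decay groups: no weight decay
# for biases and LayerNorm weights, as described in the optimizer bullet.
no_decay = ["bias", "LayerNorm.weight"]

def optimizer_grouped_parameters(named_params, weight_decay=0.01):
    return [
        {
            "params": [p for n, p in named_params
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
        {
            "params": [p for n, p in named_params
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]

# Demonstration with mock (name, parameter) pairs:
mock = [("encoder.weight", "W"), ("encoder.bias", "b"), ("LayerNorm.weight", "g")]
groups = optimizer_grouped_parameters(mock)
# groups[0] gets weight decay; groups[1] (bias + LayerNorm) does not.
```

In the real script these groups are passed straight to the AdamW constructor.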
Usage
Use this script when you need to:
- Fine-tune on GLUE benchmarks with full control over the training loop
- Customize gradient accumulation, optimizer groups, or learning rate scheduling
- Use HuggingFace Accelerate for distributed training without the Trainer abstraction
- Prototype or debug training behavior with explicit step-by-step control
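The gradient-accumulation control mentioned above reduces to scaling each micro-batch loss before the backward pass, so the accumulated gradient equals the gradient of the mean loss. A toy numeric check of that arithmetic (pure Python, no framework; illustrative only):

```python
# Each micro-batch contributes loss / gradient_accumulation_steps; summing
# those scaled contributions reproduces the mean over the accumulated batches.
def accumulated(per_batch_grads, accumulation_steps):
    grad = 0.0
    for g in per_batch_grads:
        grad += g / accumulation_steps  # scaled contribution per micro-batch
    return grad

micro_batch_grads = [4.0, 2.0, 6.0, 8.0]
g = accumulated(micro_batch_grads, len(micro_batch_grads))
# g equals the mean of the per-batch gradients: 5.0
```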
Code Reference
Source Location
| Property | Value |
|---|---|
| File | `examples/NLU/examples/text-classification/run_glue_no_trainer.py` |
| Lines | 441 |
| Module | `run_glue_no_trainer` |
| Entry Point | `main()` |
Signature/CLI
```shell
python run_glue_no_trainer.py \
  --model_name_or_path MODEL_NAME \
  --task_name TASK_NAME \
  [--train_file TRAIN_FILE] \
  [--validation_file VALIDATION_FILE] \
  [--max_length 128] \
  [--pad_to_max_length] \
  [--per_device_train_batch_size 8] \
  [--per_device_eval_batch_size 8] \
  [--learning_rate 5e-5] \
  [--weight_decay 0.0] \
  [--num_train_epochs 3] \
  [--max_train_steps NUM_STEPS] \
  [--gradient_accumulation_steps 1] \
  [--lr_scheduler_type linear] \
  [--num_warmup_steps 0] \
  [--output_dir OUTPUT_DIR] \
  [--seed SEED]
```
Import
```python
from accelerate import Accelerator
from transformers import (
    AdamW,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    PretrainedConfig,
    SchedulerType,
    default_data_collator,
    get_scheduler,
    set_seed,
)
from datasets import load_dataset, load_metric
from torch.utils.data.dataloader import DataLoader
```
I/O Contract
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--model_name_or_path` | str | Yes | - | Pretrained model name or path |
| `--task_name` | str | No | None | GLUE task: cola, mnli, mrpc, qnli, qqp, rte, sst2, stsb, wnli |
| `--train_file` | str | No | None | Custom CSV/JSON training file |
| `--validation_file` | str | No | None | Custom CSV/JSON validation file |
| `--max_length` | int | No | 128 | Max tokenized sequence length |
| `--per_device_train_batch_size` | int | No | 8 | Training batch size per device |
| `--per_device_eval_batch_size` | int | No | 8 | Evaluation batch size per device |
| `--learning_rate` | float | No | 5e-5 | Peak learning rate |
| `--weight_decay` | float | No | 0.0 | Weight decay for non-bias/LayerNorm parameters |
| `--num_train_epochs` | int | No | 3 | Number of training epochs |
| `--max_train_steps` | int | No | None | Max training steps (overrides epochs) |
| `--gradient_accumulation_steps` | int | No | 1 | Steps to accumulate before optimizer step |
| `--lr_scheduler_type` | SchedulerType | No | linear | Learning rate scheduler type |
| `--num_warmup_steps` | int | No | 0 | Warmup steps for learning rate scheduler |
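The interplay between `--max_train_steps`, `--num_train_epochs`, and `--gradient_accumulation_steps` can be sketched as follows. This mirrors the usual no-trainer arithmetic (one optimizer step per `gradient_accumulation_steps` batches); the exact helper is an assumption, not the script's literal code:

```python
import math

def resolve_train_steps(num_batches, num_train_epochs,
                        gradient_accumulation_steps, max_train_steps=None):
    """Derive the optimizer-step budget from the CLI arguments.

    If max_train_steps is given it overrides the epoch count (epochs are
    recomputed from it); otherwise it is derived from the epoch count.
    """
    steps_per_epoch = math.ceil(num_batches / gradient_accumulation_steps)
    if max_train_steps is None:
        max_train_steps = num_train_epochs * steps_per_epoch
    else:
        num_train_epochs = math.ceil(max_train_steps / steps_per_epoch)
    return max_train_steps, num_train_epochs

# e.g. 1000 batches/epoch with accumulation of 4 -> 250 optimizer steps/epoch
```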
Outputs
| Output | Location | Description |
|---|---|---|
| Trained model | `{output_dir}/` | Saved model weights via `save_pretrained()` |
| Epoch metrics | stdout/logs | Per-epoch evaluation metrics logged to console |
| MNLI-mm metrics | stdout/logs | Mismatched validation metrics (MNLI task only) |
Usage Examples
Fine-tune on SST-2
```shell
python examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --task_name sst2 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/sst2_no_trainer
```
Fine-tune on MNLI with gradient accumulation
```shell
python examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path roberta-base \
  --task_name mnli \
  --per_device_train_batch_size 16 \
  --gradient_accumulation_steps 2 \
  --learning_rate 1e-5 \
  --lr_scheduler_type cosine \
  --num_warmup_steps 500 \
  --num_train_epochs 3 \
  --output_dir /tmp/mnli_no_trainer
```
Distributed training with Accelerate
```shell
accelerate launch examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --task_name mrpc \
  --per_device_train_batch_size 16 \
  --learning_rate 5e-5 \
  --num_train_epochs 5 \
  --output_dir /tmp/mrpc_distributed
```
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Run_XNLI - Multilingual classification using Trainer API
- Implementation:Microsoft_LoRA_Run_TF_Text_Classification - TensorFlow-based text classification