Implementation:Microsoft LoRA Run GLUE No Trainer
Overview
`run_glue_no_trainer.py` is a GLUE benchmark fine-tuning script that uses HuggingFace Accelerate instead of the Trainer API, implementing a manual training loop with explicit optimizer, scheduler, and gradient accumulation control.
Description
This script demonstrates how to fine-tune `AutoModelForSequenceClassification` on GLUE tasks without relying on the Trainer abstraction. It uses the `Accelerator` class from HuggingFace Accelerate for device placement, distributed training, and mixed precision support.
Key implementation details:
- Task mapping: Defines `task_to_keys`, mapping GLUE task names to their sentence column names:
  - Single-sentence: `cola` ("sentence"), `sst2` ("sentence")
  - Sentence-pair: `mnli` ("premise", "hypothesis"), `mrpc` ("sentence1", "sentence2"), `qnli` ("question", "sentence"), `qqp` ("question1", "question2"), `rte` ("sentence1", "sentence2"), `stsb` ("sentence1", "sentence2"), `wnli` ("sentence1", "sentence2")
- Manual training loop: Implements an explicit forward pass, loss computation, gradient accumulation (`loss / gradient_accumulation_steps`), backward pass via `accelerator.backward()`, optimizer step, and learning rate scheduler step.
- Optimizer setup: Uses AdamW with parameter group splitting; weight decay is applied to all parameters except biases and LayerNorm weights.
- Learning rate scheduling: Supports multiple scheduler types via `get_scheduler()`: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup.
- Argument parsing: Uses standard `argparse` instead of `HfArgumentParser`, with explicit parameter definitions for batch size, learning rate, epochs, etc.
- Evaluation: Per-epoch evaluation using GLUE task-specific metrics; predictions are gathered across processes via `accelerator.gather()`.
- MNLI special handling: After training, performs an additional evaluation on the mismatched validation set.
- Model saving: Uses `accelerator.unwrap_model()` and `accelerator.save()` for distributed-safe model saving.
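The task-to-columns mapping described above can be sketched as a plain dict; the `texts_for_task` helper below is illustrative and not part of the script itself:

```python
# Mapping of GLUE task names to their input column names, as listed above.
# Single-sentence tasks use (key, None); sentence-pair tasks use (key1, key2).
task_to_keys = {
    "cola": ("sentence", None),
    "sst2": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}

def texts_for_task(task_name, example):
    """Pick the text column(s) to tokenize for a given GLUE task."""
    key1, key2 = task_to_keys[task_name]
    return (example[key1],) if key2 is None else (example[key1], example[key2])
```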
The `stsb` task is handled as regression (`num_labels=1`); all other tasks are classification.
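The parameter-group splitting from the "Optimizer setup" bullet follows a common HuggingFace pattern; below is a minimal sketch over mock parameter names (the helper, the `no_decay` list, and the mock values are illustrative assumptions, not the script's exact code):

```python
# Split named parameters into decay / no-decay groups: no weight decay
# for biases and LayerNorm weights, as described in the optimizer bullet.
no_decay = ["bias", "LayerNorm.weight"]

def optimizer_grouped_parameters(named_params, weight_decay=0.01):
    return [
        {
            "params": [p for n, p in named_params
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
        {
            "params": [p for n, p in named_params
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]

# Demonstration with mock (name, parameter) pairs:
mock = [("encoder.weight", "W"), ("encoder.bias", "b"), ("LayerNorm.weight", "g")]
groups = optimizer_grouped_parameters(mock)
# groups[0] gets weight decay; groups[1] (bias + LayerNorm) does not.
```

In the real script these groups are passed straight to the AdamW constructor.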
Usage
Use this script when you need to:
- Fine-tune on GLUE benchmarks with full control over the training loop
- Customize gradient accumulation, optimizer groups, or learning rate scheduling
- Use HuggingFace Accelerate for distributed training without the Trainer abstraction
- Prototype or debug training behavior with explicit step-by-step control
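The gradient-accumulation control mentioned above reduces to scaling each micro-batch loss before the backward pass, so the accumulated gradient equals the gradient of the mean loss. A toy numeric check of that arithmetic (pure Python, no framework; illustrative only):

```python
# Each micro-batch contributes loss / gradient_accumulation_steps; summing
# those scaled contributions reproduces the mean over the accumulated batches.
def accumulated(per_batch_grads, accumulation_steps):
    grad = 0.0
    for g in per_batch_grads:
        grad += g / accumulation_steps  # scaled contribution per micro-batch
    return grad

micro_batch_grads = [4.0, 2.0, 6.0, 8.0]
g = accumulated(micro_batch_grads, len(micro_batch_grads))
# g equals the mean of the per-batch gradients: 5.0
```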
Code Reference
Source Location
| Property | Value |
|---|---|
| File | `examples/NLU/examples/text-classification/run_glue_no_trainer.py` |
| Lines | 441 |
| Module | `run_glue_no_trainer` |
| Entry Point | `main()` |
Signature/CLI
```shell
python run_glue_no_trainer.py \
  --model_name_or_path MODEL_NAME \
  --task_name TASK_NAME \
  [--train_file TRAIN_FILE] \
  [--validation_file VALIDATION_FILE] \
  [--max_length 128] \
  [--pad_to_max_length] \
  [--per_device_train_batch_size 8] \
  [--per_device_eval_batch_size 8] \
  [--learning_rate 5e-5] \
  [--weight_decay 0.0] \
  [--num_train_epochs 3] \
  [--max_train_steps NUM_STEPS] \
  [--gradient_accumulation_steps 1] \
  [--lr_scheduler_type linear] \
  [--num_warmup_steps 0] \
  [--output_dir OUTPUT_DIR] \
  [--seed SEED]
```
Import
```python
from accelerate import Accelerator
from transformers import (
    AdamW,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    PretrainedConfig,
    SchedulerType,
    default_data_collator,
    get_scheduler,
    set_seed,
)
from datasets import load_dataset, load_metric
from torch.utils.data.dataloader import DataLoader
```
I/O Contract
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--model_name_or_path` | str | Yes | - | Pretrained model name or path |
| `--task_name` | str | No | None | GLUE task: cola, mnli, mrpc, qnli, qqp, rte, sst2, stsb, wnli |
| `--train_file` | str | No | None | Custom CSV/JSON training file |
| `--validation_file` | str | No | None | Custom CSV/JSON validation file |
| `--max_length` | int | No | 128 | Max tokenized sequence length |
| `--per_device_train_batch_size` | int | No | 8 | Training batch size per device |
| `--per_device_eval_batch_size` | int | No | 8 | Evaluation batch size per device |
| `--learning_rate` | float | No | 5e-5 | Peak learning rate |
| `--weight_decay` | float | No | 0.0 | Weight decay for non-bias/LayerNorm parameters |
| `--num_train_epochs` | int | No | 3 | Number of training epochs |
| `--max_train_steps` | int | No | None | Max training steps (overrides epochs) |
| `--gradient_accumulation_steps` | int | No | 1 | Steps to accumulate before optimizer step |
| `--lr_scheduler_type` | SchedulerType | No | linear | Learning rate scheduler type |
| `--num_warmup_steps` | int | No | 0 | Warmup steps for learning rate scheduler |
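The interplay between `--max_train_steps`, `--num_train_epochs`, and `--gradient_accumulation_steps` can be sketched as follows. This mirrors the usual no-trainer arithmetic (one optimizer step per `gradient_accumulation_steps` batches); the exact helper is an assumption, not the script's literal code:

```python
import math

def resolve_train_steps(num_batches, num_train_epochs,
                        gradient_accumulation_steps, max_train_steps=None):
    """Derive the optimizer-step budget from the CLI arguments.

    If max_train_steps is given it overrides the epoch count (epochs are
    recomputed from it); otherwise it is derived from the epoch count.
    """
    steps_per_epoch = math.ceil(num_batches / gradient_accumulation_steps)
    if max_train_steps is None:
        max_train_steps = num_train_epochs * steps_per_epoch
    else:
        num_train_epochs = math.ceil(max_train_steps / steps_per_epoch)
    return max_train_steps, num_train_epochs

# e.g. 1000 batches/epoch with accumulation of 4 -> 250 optimizer steps/epoch
```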
Outputs
| Output | Location | Description |
|---|---|---|
| Trained model | `{output_dir}/` | Saved model weights via `save_pretrained()` |
| Epoch metrics | stdout/logs | Per-epoch evaluation metrics logged to console |
| MNLI-mm metrics | stdout/logs | Mismatched validation metrics (MNLI task only) |
Usage Examples
Fine-tune on SST-2
```shell
python examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --task_name sst2 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/sst2_no_trainer
```
Fine-tune on MNLI with gradient accumulation
```shell
python examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path roberta-base \
  --task_name mnli \
  --per_device_train_batch_size 16 \
  --gradient_accumulation_steps 2 \
  --learning_rate 1e-5 \
  --lr_scheduler_type cosine \
  --num_warmup_steps 500 \
  --num_train_epochs 3 \
  --output_dir /tmp/mnli_no_trainer
```
Distributed training with Accelerate
```shell
accelerate launch examples/NLU/examples/text-classification/run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --task_name mrpc \
  --per_device_train_batch_size 16 \
  --learning_rate 5e-5 \
  --num_train_epochs 5 \
  --output_dir /tmp/mrpc_distributed
```
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Run_XNLI - Multilingual classification using Trainer API
- Implementation:Microsoft_LoRA_Run_TF_Text_Classification - TensorFlow-based text classification