Implementation:Microsoft LoRA Lightning Base
Overview
Base PyTorch Lightning module and training utilities for fine-tuning HuggingFace Transformer models across multiple NLP task types.
Description
lightning_base.py provides BaseTransformer, a pl.LightningModule subclass that serves as the foundation for training HuggingFace Transformer models with PyTorch Lightning. It handles three things. It initializes the model through AutoConfig, AutoTokenizer, and a task-specific AutoModel class. It configures the optimizer: AdamW or Adafactor, with weight decay disabled for bias and LayerNorm weights. It sets up learning-rate scheduling: linear, cosine, cosine with hard restarts, or polynomial decay, each with warmup. The module also includes three helpers. LoggingCallback logs learning rates and validation/test metrics. add_generic_args registers common CLI arguments. generic_train builds a standardized pl.Trainer with checkpointing on val_loss, optional early stopping, fp16, and multi-GPU (DDP) support.
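The weight-decay grouping can be illustrated without torch. This minimal sketch assumes that parameter names containing "bias" or "LayerNorm.weight" are exempt from weight decay, which is how we read the source. The helper group_parameter_names and the example parameter names are hypothetical. The real configure_optimizers builds the groups from model.named_parameters() and passes the parameter tensors, not names, to AdamW or Adafactor.

```python
from typing import Dict, List

# Name substrings exempt from weight decay (assumed to mirror the module's no_decay list)
NO_DECAY = ("bias", "LayerNorm.weight")


def group_parameter_names(names: List[str], weight_decay: float) -> List[Dict]:
    """Split parameter names into a decayed group and a non-decayed group."""
    decay = [n for n in names if not any(nd in n for nd in NO_DECAY)]
    no_decay = [n for n in names if any(nd in n for nd in NO_DECAY)]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.LayerNorm.weight",
    "classifier.weight",
]
groups = group_parameter_names(names, weight_decay=0.01)
for g in groups:
    print(g["weight_decay"], g["params"])
```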
The module maps task types to AutoModel classes via the MODEL_MODES dictionary, supporting base, sequence-classification, question-answering, pretraining, token-classification, language-modeling, summarization, and translation modes.
This is part of the HuggingFace Transformers legacy examples bundled in the Microsoft LoRA repository.
⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.
Usage
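Use this module as a base class when building PyTorch Lightning training scripts for NLP tasks.

Subclass BaseTransformer and implement the following for your task:
- get_dataloader(). The base implementation raises NotImplementedError.
- forward()
- training_step()
- validation_step()
- validation_end()

BaseTransformer's test_step() and test_epoch_end() delegate to validation_step() and validation_end(). With Lightning 1.x, also implement validation_epoch_end(), which is the hook Lightning actually calls after validation. Log val_loss there so the default checkpoint callback can monitor it.

Build the argument parser with add_generic_args() and BaseTransformer.add_model_specific_args(). Add a --gpus argument yourself: neither helper registers it, but generic_train() and total_steps() read it. Then call generic_train() to launch training.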
Code Reference
Source Location
| Property | Value |
|---|---|
| File path | examples/NLU/examples/legacy/pytorch-lightning/lightning_base.py |
| Lines | 391 |
| Module | lightning_base |
Key Classes and Functions
| Name | Type | Signature / Description |
|---|---|---|
| BaseTransformer | class | __init__(self, hparams: argparse.Namespace, num_labels=None, mode="base", config=None, tokenizer=None, model=None, **config_kwargs) |
| BaseTransformer.configure_optimizers | method | Returns [optimizer], [scheduler]. The optimizer is AdamW or Adafactor over two weight-decay parameter groups. The scheduler is the one selected by --lr_scheduler, updated every optimizer step. |
| BaseTransformer.total_steps | method | total_steps() -> int: computes total training steps from dataset size, per-device batch size, gradient accumulation, number of GPUs, and epochs (used by the LR scheduler) |
| BaseTransformer.on_save_checkpoint | method | Saves model and tokenizer to output_dir/best_tfmr (rank zero only) |
| BaseTransformer.add_model_specific_args | static method | add_model_specific_args(parser, root_dir): registers model-specific CLI arguments (model path, learning rate, scheduler, batch sizes, etc.) |
| LoggingCallback | class | pl.Callback that logs learning rates at the end of each batch and logs validation/test metrics when validation/testing ends |
| add_generic_args | function | add_generic_args(parser, root_dir) -> None: adds output_dir, fp16, seed, data_dir, etc. |
| generic_train | function | generic_train(model, args, early_stopping_callback=None, logger=True, extra_callbacks=[], checkpoint_callback=None, logging_callback=None, **extra_train_kwargs): builds a pl.Trainer, calls trainer.fit(model) only when args.do_train is set, and returns the trainer |
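The step count passed to the scheduler as num_training_steps can be reproduced with plain arithmetic. This sketch assumes the formula is the dataset size divided by the effective batch size (per-device batch size times accumulation steps times max(1, gpus)), multiplied by the number of epochs. The free function below is a hypothetical stand-in for the method, which reads these values from self.dataset_size and self.hparams.

```python
def total_steps(dataset_size: int, train_batch_size: int,
                accumulate_grad_batches: int, gpus: int, max_epochs: int) -> float:
    """Optimizer steps over the whole run, used as the scheduler's horizon."""
    num_devices = max(1, gpus)  # CPU-only runs (gpus=0) count as one device
    effective_batch_size = train_batch_size * accumulate_grad_batches * num_devices
    # Exact division: the result can be fractional when the dataset does not
    # split evenly into effective batches
    return dataset_size / effective_batch_size * max_epochs


print(total_steps(1000, 32, 1, 0, 3))  # 93.75
print(total_steps(1000, 32, 2, 2, 3))  # 23.4375
```

Because this sketch does not round, round up with math.ceil if you need an integer count.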
MODEL_MODES Dictionary
```python
MODEL_MODES = {
    "base": AutoModel,
    "sequence-classification": AutoModelForSequenceClassification,
    "question-answering": AutoModelForQuestionAnswering,
    "pretraining": AutoModelForPreTraining,
    "token-classification": AutoModelForTokenClassification,
    "language-modeling": AutoModelWithLMHead,
    "summarization": AutoModelForSeq2SeqLM,
    "translation": AutoModelForSeq2SeqLM,
}
```
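BaseTransformer looks up the mode argument directly in MODEL_MODES, so a misspelled mode surfaces as a bare KeyError during construction. A task script could validate the mode up front. resolve_model_class below is a hypothetical helper, not part of the module. The class names are strings so the sketch runs without transformers installed.

```python
# String placeholders stand in for the transformers Auto* classes
MODEL_MODES = {
    "base": "AutoModel",
    "sequence-classification": "AutoModelForSequenceClassification",
    "question-answering": "AutoModelForQuestionAnswering",
    "pretraining": "AutoModelForPreTraining",
    "token-classification": "AutoModelForTokenClassification",
    "language-modeling": "AutoModelWithLMHead",
    "summarization": "AutoModelForSeq2SeqLM",
    "translation": "AutoModelForSeq2SeqLM",
}


def resolve_model_class(mode: str) -> str:
    """Look up the model class for a mode, listing the valid modes on error."""
    if mode not in MODEL_MODES:
        valid = ", ".join(sorted(MODEL_MODES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}")
    return MODEL_MODES[mode]


print(resolve_model_class("translation"))  # AutoModelForSeq2SeqLM
```

Note that summarization and translation resolve to the same sequence-to-sequence class. They differ only in how a task script prepares its data.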
LR Scheduler Options
```python
arg_to_scheduler = {
    "linear": get_linear_schedule_with_warmup,
    "cosine": get_cosine_schedule_with_warmup,
    "cosine_w_restarts": get_cosine_with_hard_restarts_schedule_with_warmup,
    "polynomial": get_polynomial_decay_schedule_with_warmup,
}
```
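Each of these factories wraps the optimizer in a schedule whose multiplier scales the base learning rate at every step. The sketch below reimplements the "linear" and "cosine" multipliers in plain Python, with cosine using the default half cycle. It is based on our understanding of transformers.optimization and is an illustration, not the library's code. The function names are hypothetical.

```python
import math


def linear_with_warmup(step: int, num_warmup_steps: int, num_training_steps: float) -> float:
    """LR multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


def cosine_with_warmup(step: int, num_warmup_steps: int, num_training_steps: float,
                       num_cycles: float = 0.5) -> float:
    """LR multiplier: linear warmup, then a half cosine wave down to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


# Multipliers at a few steps, with 100 warmup steps out of 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_with_warmup(step, 100, 1000), round(cosine_with_warmup(step, 100, 1000), 4))
```

In the module, the factory is chosen with arg_to_scheduler[hparams.lr_scheduler]. It is called with num_warmup_steps=hparams.warmup_steps and num_training_steps=self.total_steps().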
Import Usage
```python
from lightning_base import BaseTransformer, add_generic_args, generic_train, LoggingCallback
```
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| hparams | argparse.Namespace | CLI arguments including model_name_or_path, output_dir, learning_rate, lr_scheduler, warmup_steps, train_batch_size, max_epochs, etc. |
| num_labels | Optional[int] | Number of output labels; forwarded to AutoConfig only when not None |
| mode | str | Key into the MODEL_MODES dict selecting the AutoModel class (default "base") |
| config | Optional[PretrainedConfig] | Pre-built config; if None, loaded from config_name, falling back to model_name_or_path |
| tokenizer | Optional[PreTrainedTokenizer] | Pre-built tokenizer; if None, loaded from tokenizer_name, falling back to model_name_or_path |
| model | Optional[PreTrainedModel] | Pre-built model; if None, loaded from model_name_or_path using the class selected by mode |
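The fallback rules for the config, tokenizer, and model can be summarized in a few lines. This sketch has several assumptions. First, an empty config_name or tokenizer_name means "not set". Second, num_labels reaches the config only when it is provided explicitly. resolve_sources is a hypothetical helper that only reports where each piece would be loaded from; it does not call transformers.

```python
from argparse import Namespace
from typing import Any, Dict, Optional


def resolve_sources(hparams: Namespace, num_labels: Optional[int] = None) -> Dict[str, Any]:
    """Report where config, tokenizer, and weights would be loaded from."""
    config_source = hparams.config_name or hparams.model_name_or_path
    tokenizer_source = hparams.tokenizer_name or hparams.model_name_or_path
    # num_labels is only added to the config kwargs when explicitly given
    config_kwargs = {"num_labels": num_labels} if num_labels is not None else {}
    return {
        "config": (config_source, config_kwargs),
        "tokenizer": tokenizer_source,
        "model": hparams.model_name_or_path,
    }


hparams = Namespace(model_name_or_path="bert-base-uncased", config_name="", tokenizer_name="")
print(resolve_sources(hparams, num_labels=2))
```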
Outputs
| Output | Type | Description |
|---|---|---|
| generic_train() return value | pl.Trainer | Configured Trainer; already fitted if --do_train was passed |
| Checkpoint directory | directory | output_dir/best_tfmr/, containing the model and tokenizer saved by on_save_checkpoint() |
| Test results file | file | test_results.txt, written to output_dir/ by LoggingCallback.on_test_end() |
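The test results file uses a simple line format. This sketch is based on our reading of on_test_end. It assumes the callback writes each entry of trainer.callback_metrics in sorted key order as "key = value" and skips the "log" and "progress_bar" entries. write_test_results is a hypothetical stand-alone helper, and the metric values are made up.

```python
import os
import tempfile
from typing import Dict


def write_test_results(metrics: Dict[str, object], output_dir: str) -> str:
    """Write metrics as sorted 'key = value' lines, skipping 'log' and 'progress_bar'."""
    path = os.path.join(output_dir, "test_results.txt")
    with open(path, "w") as writer:
        for key in sorted(metrics):
            if key not in ("log", "progress_bar"):
                writer.write(f"{key} = {metrics[key]}\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_test_results({"val_loss": 0.25, "log": {}, "acc": 0.9}, tmp)
    with open(path) as f:
        contents = f.read()
print(contents)
```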
Usage Examples
Subclassing BaseTransformer
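```python
import argparse

import torch

from lightning_base import BaseTransformer, add_generic_args, generic_train


class MyClassifier(BaseTransformer):
    mode = "sequence-classification"

    def __init__(self, hparams):
        super().__init__(hparams, num_labels=2, mode=self.mode)

    def forward(self, **inputs):
        return self.model(**inputs)

    def training_step(self, batch, batch_idx):
        # With labels in the batch, the first model output is the loss
        outputs = self(**batch)
        loss = outputs[0]
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        outputs = self(**batch)
        loss = outputs[0]
        return {"val_loss": loss}

    def validation_end(self, outputs):
        # Also reused by BaseTransformer.test_epoch_end for the test outputs
        avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        return {"val_loss": avg_loss}

    def validation_epoch_end(self, outputs):
        # Lightning 1.x calls this hook after validation; logging val_loss lets
        # the default ModelCheckpoint in generic_train (monitor="val_loss") use it
        result = self.validation_end(outputs)
        self.log("val_loss", result["val_loss"])

    def get_dataloader(self, type_path, batch_size, shuffle=False):
        # Implement dataset loading for type_path "train", "dev", or "test"
        ...


parser = argparse.ArgumentParser()
add_generic_args(parser, ".")
BaseTransformer.add_model_specific_args(parser, ".")
# Neither helper registers --gpus, but generic_train() and total_steps() read it
parser.add_argument("--gpus", type=int, default=1)
args = parser.parse_args()

model = MyClassifier(args)
trainer = generic_train(model, args)
```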
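generic_train translates a few CLI arguments into extra pl.Trainer keyword arguments before constructing the trainer. The sketch below reconstructs that mapping from our reading of the source. build_train_params is a hypothetical stand-alone function. The keyword names (precision, amp_level, distributed_backend) belong to the Lightning 1.0-era Trainer, and newer Lightning releases may rename or reject some of them.

```python
from argparse import Namespace
from typing import Any, Dict


def build_train_params(args: Namespace) -> Dict[str, Any]:
    """Collect the extra pl.Trainer kwargs derived from the CLI arguments."""
    train_params: Dict[str, Any] = {}
    if args.fp16:
        # 16-bit mixed precision at the requested Apex optimization level
        train_params["precision"] = 16
        train_params["amp_level"] = args.fp16_opt_level
    if args.gpus > 1:
        # Multi-GPU runs use distributed data parallel
        train_params["distributed_backend"] = "ddp"
    train_params["accumulate_grad_batches"] = args.accumulate_grad_batches
    return train_params


args = Namespace(fp16=True, fp16_opt_level="O2", gpus=2, accumulate_grad_batches=4)
print(build_train_params(args))
```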
CLI Usage
```bash
python my_task.py \
  --model_name_or_path bert-base-uncased \
  --data_dir /path/to/data \
  --output_dir /path/to/output \
  --do_train \
  --learning_rate 5e-5 \
  --lr_scheduler linear \
  --warmup_steps 500 \
  --num_train_epochs 3 \
  --train_batch_size 32 \
  --gpus 1
```
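Several CLI flags are stored under different attribute names. This lets pl.Trainer.from_argparse_args, which generic_train uses, pick them up directly. This stdlib sketch assumes the following mappings, taken from our reading of add_generic_args and add_model_specific_args:
- --num_train_epochs to max_epochs
- --gradient_accumulation_steps to accumulate_grad_batches
- --max_grad_norm to gradient_clip_val

It registers only this subset of flags.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed subset of the real flags; each dest matches a pl.Trainer argument name
parser.add_argument("--num_train_epochs", dest="max_epochs", default=3, type=int)
parser.add_argument("--gradient_accumulation_steps", dest="accumulate_grad_batches", default=1, type=int)
parser.add_argument("--max_grad_norm", dest="gradient_clip_val", default=1.0, type=float)
parser.add_argument("--gpus", type=int, default=1)

args = parser.parse_args(["--num_train_epochs", "3", "--gradient_accumulation_steps", "2", "--gpus", "1"])
print(args.max_epochs, args.accumulate_grad_batches, args.gradient_clip_val, args.gpus)
```

This renaming is also why hparams exposes max_epochs rather than num_train_epochs.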