Implementation:Microsoft LoRA Lightning Base

Overview

Base PyTorch Lightning module and training utilities for fine-tuning HuggingFace Transformer models across multiple NLP task types.

Description

lightning_base.py provides BaseTransformer, a pl.LightningModule subclass that serves as the foundation for training HuggingFace Transformer models with PyTorch Lightning. It handles model initialization (via AutoConfig, AutoTokenizer, and task-specific AutoModel classes), optimizer configuration (AdamW or Adafactor with grouped weight decay), and learning rate scheduling (linear, cosine, cosine with hard restarts, or polynomial decay, each with warmup). The module also includes LoggingCallback for logging learning rates and validation/test metrics, add_generic_args for CLI argument registration, and generic_train for standardized trainer initialization with checkpoint saving, early stopping, and distributed training support.
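
The grouped weight decay mentioned above usually means splitting parameters into two optimizer groups, one with decay and one without. The following is a minimal sketch of that idea, not the code from lightning_base.py: the `NO_DECAY` name fragments and the helper `group_parameters` are assumptions for illustration, and plain strings stand in for real parameter tensors.

```python
from typing import Dict, Iterable, List, Tuple

# Name fragments whose parameters are commonly excluded from weight decay.
# ASSUMPTION: this list is illustrative, not read from lightning_base.py.
NO_DECAY = ("bias", "LayerNorm.weight")


def group_parameters(
    named_params: Iterable[Tuple[str, object]], weight_decay: float
) -> List[Dict[str, object]]:
    """Split parameters into a decayed group and a non-decayed group."""
    decayed, not_decayed = [], []
    for name, param in named_params:
        if any(fragment in name for fragment in NO_DECAY):
            not_decayed.append(param)
        else:
            decayed.append(param)
    return [
        {"params": decayed, "weight_decay": weight_decay},
        {"params": not_decayed, "weight_decay": 0.0},
    ]


# Stand-in (name, parameter) pairs; with a real model these would come
# from model.named_parameters().
example = [
    ("encoder.layer.0.attention.self.query.weight", "W_q"),
    ("encoder.layer.0.attention.self.query.bias", "b_q"),
    ("encoder.layer.0.output.LayerNorm.weight", "ln_w"),
]
groups = group_parameters(example, weight_decay=0.01)
```

A list of dictionaries in this shape is what PyTorch optimizers accept as per-group options.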

The module maps task types to AutoModel classes via the MODEL_MODES dictionary, supporting base, sequence-classification, question-answering, pretraining, token-classification, language-modeling, summarization, and translation modes.

This is part of the HuggingFace Transformers legacy examples bundled in the Microsoft LoRA repository.

⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.

Usage

Use this module as the starting point for PyTorch Lightning training scripts for NLP tasks. Subclass BaseTransformer and implement get_dataloader(), training_step(), and validation_step()/validation_end() for your specific task, then call generic_train() to launch training with the standard arguments.

Code Reference

Source Location

Property	Value
File path	`examples/NLU/examples/legacy/pytorch-lightning/lightning_base.py`
Lines	391
Module	`lightning_base`

Key Classes and Functions

Name	Type	Signature / Description
`BaseTransformer`	class	`__init__(self, hparams: argparse.Namespace, num_labels=None, mode="base", config=None, tokenizer=None, model=None, **config_kwargs)`
`BaseTransformer.configure_optimizers`	method	Returns `[optimizer], [scheduler]` with AdamW or Adafactor and selected LR schedule
`BaseTransformer.total_steps`	method	`total_steps() -> int` -- computes total training steps from dataset size, batch size, accumulation, and epochs
`BaseTransformer.on_save_checkpoint`	method	Saves model and tokenizer to `output_dir/best_tfmr` (rank zero only)
`BaseTransformer.add_model_specific_args`	static method	Registers model-specific CLI arguments (model path, LR, scheduler, batch sizes, etc.)
`LoggingCallback`	class	`pl.Callback` that logs LR on batch end and validation/test metrics on epoch end
`add_generic_args`	function	`add_generic_args(parser, root_dir) -> None` -- adds output_dir, fp16, seed, data_dir, etc.
`generic_train`	function	`generic_train(model, args, early_stopping_callback=None, logger=True, extra_callbacks=[], checkpoint_callback=None, logging_callback=None, **extra_train_kwargs)` -- initializes and runs `pl.Trainer`
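
The `total_steps` row above describes a calculation over dataset size, batch size, gradient accumulation, and epochs. A minimal sketch of that arithmetic follows. The free function `estimate_total_steps`, the `num_devices` multiplier, and the choice to keep a fractional result are assumptions for illustration; the original method may differ in these details.

```python
def estimate_total_steps(
    dataset_size: int,
    train_batch_size: int,
    accumulate_grad_batches: int,
    max_epochs: int,
    num_devices: int = 1,
) -> float:
    """Estimate the number of optimizer steps in a training run."""
    # One optimizer step consumes this many examples across all devices.
    # ASSUMPTION: the device multiplier is illustrative.
    effective_batch_size = (
        train_batch_size * accumulate_grad_batches * max(1, num_devices)
    )
    return (dataset_size / effective_batch_size) * max_epochs


# 10,000 examples, batch size 32, 2 accumulation steps, 3 epochs, 1 device.
steps = estimate_total_steps(10_000, 32, 2, 3)
```

The estimate matters mainly for the LR scheduler, which needs to know how many steps the decay should span.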

MODEL_MODES Dictionary

MODEL_MODES = {
    "base": AutoModel,
    "sequence-classification": AutoModelForSequenceClassification,
    "question-answering": AutoModelForQuestionAnswering,
    "pretraining": AutoModelForPreTraining,
    "token-classification": AutoModelForTokenClassification,
    "language-modeling": AutoModelWithLMHead,
    "summarization": AutoModelForSeq2SeqLM,
    "translation": AutoModelForSeq2SeqLM,
}
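
Selecting a model class from this dictionary is a plain key lookup. The sketch below shows one way to wrap that lookup with a clearer error for unknown modes. The helper `resolve_model_class` is hypothetical, and the values are class names as strings (a stand-in so the sketch runs without transformers installed), not the real classes.

```python
# Stand-in for MODEL_MODES: values are class names as strings, not classes.
MODEL_MODE_NAMES = {
    "base": "AutoModel",
    "sequence-classification": "AutoModelForSequenceClassification",
    "question-answering": "AutoModelForQuestionAnswering",
    "pretraining": "AutoModelForPreTraining",
    "token-classification": "AutoModelForTokenClassification",
    "language-modeling": "AutoModelWithLMHead",
    "summarization": "AutoModelForSeq2SeqLM",
    "translation": "AutoModelForSeq2SeqLM",
}


def resolve_model_class(mode: str) -> str:
    """Return the class name for a mode, or raise a descriptive error."""
    try:
        return MODEL_MODE_NAMES[mode]
    except KeyError:
        valid = ", ".join(sorted(MODEL_MODE_NAMES))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}") from None


# Summarization and translation share the same sequence-to-sequence class.
same_class = resolve_model_class("summarization") == resolve_model_class("translation")
```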

LR Scheduler Options

arg_to_scheduler = {
    "linear": get_linear_schedule_with_warmup,
    "cosine": get_cosine_schedule_with_warmup,
    "cosine_w_restarts": get_cosine_with_hard_restarts_schedule_with_warmup,
    "polynomial": get_polynomial_decay_schedule_with_warmup,
}
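
Each of these schedules scales the base learning rate by a step-dependent multiplier. As an illustration of the `"linear"` option, the sketch below computes a linear warmup followed by a linear decay to zero in plain Python. The function name `linear_warmup_decay` is hypothetical; the real schedule is built by `get_linear_schedule_with_warmup` around an optimizer.

```python
def linear_warmup_decay(
    step: int, num_warmup_steps: int, num_training_steps: int
) -> float:
    """LR multiplier: ramps 0 -> 1 during warmup, then decays 1 -> 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Multipliers at a few steps for 500 warmup steps out of 2000 total.
multipliers = [linear_warmup_decay(s, 500, 2000) for s in (0, 250, 500, 1250, 2000)]

# The effective learning rate is the base rate times the multiplier.
base_lr = 5e-5
learning_rates = [base_lr * m for m in multipliers]
```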

Import Usage

from lightning_base import BaseTransformer, add_generic_args, generic_train, LoggingCallback

I/O Contract

Inputs

Input	Type	Description
`hparams`	`argparse.Namespace`	CLI arguments including `model_name_or_path`, `output_dir`, `learning_rate`, `lr_scheduler`, `warmup_steps`, `train_batch_size`, `max_epochs`, etc.
`num_labels`	`Optional[int]`	Number of output labels (passed to `AutoConfig`)
`mode`	`str`	Key into `MODEL_MODES` dict selecting the AutoModel class (default `"base"`)
`config`	`Optional[PretrainedConfig]`	Pre-built config (if None, loaded from `model_name_or_path`)
`tokenizer`	`Optional[PreTrainedTokenizer]`	Pre-built tokenizer (if None, loaded from `model_name_or_path`)
`model`	`Optional[PreTrainedModel]`	Pre-built model (if None, loaded from `model_name_or_path`)

Outputs

Output	Type	Description
`generic_train()` return	`pl.Trainer`	Configured PyTorch Lightning Trainer; fitted only when training is requested (for example with `--do_train`)
Checkpoint directory	directory	Model and tokenizer saved to `output_dir/best_tfmr/`
Test results file	`test_results.txt`	Written by `LoggingCallback.on_test_end()` to `output_dir/`
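
The test results file listed above is a plain-text dump of the final metrics. The sketch below shows one plausible way to write such a file. The helper `write_test_results`, the `key = value` line format, and the skipped `log` and `progress_bar` keys are assumptions for illustration, not a verified copy of `LoggingCallback.on_test_end()`.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

# ASSUMPTION: bookkeeping keys that should not appear in the results file.
SKIPPED_KEYS = ("log", "progress_bar")


def write_test_results(metrics: Dict[str, Any], output_dir: str) -> Path:
    """Write metrics to output_dir/test_results.txt, one 'key = value' per line."""
    path = Path(output_dir) / "test_results.txt"
    with path.open("w") as writer:
        for key in sorted(metrics):
            if key in SKIPPED_KEYS:
                continue
            writer.write(f"{key} = {metrics[key]}\n")
    return path


# Write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    results_path = write_test_results({"val_loss": 0.25, "acc": 0.9, "log": {}}, tmp_dir)
    written = results_path.read_text()
```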

Usage Examples

Subclassing BaseTransformer

import argparse

import torch  # needed for torch.stack in validation_end

from lightning_base import BaseTransformer, add_generic_args, generic_train

class MyClassifier(BaseTransformer):
    mode = "sequence-classification"

    def __init__(self, hparams):
        super().__init__(hparams, num_labels=2, mode=self.mode)

    def forward(self, **inputs):
        return self.model(**inputs)

    def training_step(self, batch, batch_idx):
        outputs = self(**batch)
        loss = outputs[0]
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        outputs = self(**batch)
        loss = outputs[0]
        return {"val_loss": loss}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        return {"val_loss": avg_loss}

    def get_dataloader(self, type_path, batch_size, shuffle=False):
        # Implement dataset loading
        ...

parser = argparse.ArgumentParser()
add_generic_args(parser, ".")
BaseTransformer.add_model_specific_args(parser, ".")
parser.add_argument("--gpus", type=int, default=1)
args = parser.parse_args()

model = MyClassifier(args)
trainer = generic_train(model, args)

CLI Usage

python my_task.py \
  --model_name_or_path bert-base-uncased \
  --data_dir /path/to/data \
  --output_dir /path/to/output \
  --do_train \
  --learning_rate 5e-5 \
  --lr_scheduler linear \
  --warmup_steps 500 \
  --num_train_epochs 3 \
  --train_batch_size 32 \
  --gpus 1
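
The command above passes `--num_train_epochs`, while the `hparams` inputs list `max_epochs`. One way such a rename can happen is argparse's `dest` parameter, sketched below with the standard library only. The specific argument definitions here are assumptions for illustration and are not copied from `add_model_specific_args` or `add_generic_args`.

```python
import argparse

parser = argparse.ArgumentParser()
# ASSUMPTION: the flag is stored under a different attribute name via `dest`.
parser.add_argument("--num_train_epochs", dest="max_epochs", default=3, type=int)
parser.add_argument("--train_batch_size", default=32, type=int)
parser.add_argument(
    "--lr_scheduler",
    default="linear",
    choices=["linear", "cosine", "cosine_w_restarts", "polynomial"],
)
parser.add_argument("--do_train", action="store_true")

# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = parser.parse_args(
    ["--num_train_epochs", "5", "--lr_scheduler", "cosine", "--do_train"]
)

# The value is reachable as args.max_epochs, not args.num_train_epochs.
epochs = args.max_epochs
```

Restricting `--lr_scheduler` with `choices` makes a misspelled scheduler name fail at parse time rather than at optimizer setup.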
