Implementation:Microsoft DeepSpeedExamples GLUE Classifier BERT Large
| Knowledge Sources | |
|---|---|
| Domains | NLP, Fine-tuning |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
BERT-large fine-tuning script for all nine GLUE benchmark tasks with DeepSpeed distributed training and checkpoint management for long-running training.
Description
run_glue_classifier_bert_large.py extends the BERT-base GLUE classifier with checkpoint management for the larger BERT-large model. Like the base version, it implements data processors for all nine GLUE tasks (MRPC, MNLI, CoLA, SST-2, STS-B, QQP, QNLI, RTE, WNLI) and supports DeepSpeed distributed training.
The key addition over the BERT-base version is checkpoint management through the checkpoint_model() and load_checkpoint() functions. checkpoint_model() saves model state through DeepSpeed's model.save_checkpoint() method and persists the current epoch, global step count, and global data sample count alongside it. load_checkpoint() restores that training state from an earlier checkpoint, so an interrupted run can resume where it stopped. This matters for BERT-large, where training runs take significantly longer.
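The save and restore round trip described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `StubEngine` is a hypothetical stand-in that only mimics the shape of a DeepSpeed engine's `save_checkpoint(save_dir, tag, client_state)` and `load_checkpoint(load_dir, tag)` calls, and the real script may reach the engine through a wrapper object rather than calling it directly.

```python
import json
import os
import tempfile


class StubEngine:
    """Hypothetical stand-in for a DeepSpeed engine (illustration only).

    It stores just the client state as JSON; a real engine also writes
    model and optimizer state.
    """

    def save_checkpoint(self, save_dir, tag, client_state):
        ckpt_dir = os.path.join(save_dir, str(tag))
        os.makedirs(ckpt_dir, exist_ok=True)
        with open(os.path.join(ckpt_dir, "client_state.json"), "w") as f:
            json.dump(client_state, f)
        return True

    def load_checkpoint(self, load_dir, tag):
        ckpt_dir = os.path.join(load_dir, str(tag))
        with open(os.path.join(ckpt_dir, "client_state.json")) as f:
            return ckpt_dir, json.load(f)


def checkpoint_model(PATH, ckpt_id, model, epoch, last_global_step,
                     last_global_data_samples, **kwargs):
    # Bundle the training counters so they travel with the checkpoint.
    state = {
        "epoch": epoch,
        "last_global_step": last_global_step,
        "last_global_data_samples": last_global_data_samples,
    }
    state.update(kwargs)  # any extra values the caller wants to persist
    return model.save_checkpoint(PATH, ckpt_id, state)


def load_checkpoint(model, PATH, ckpt_id):
    # The engine returns (load_path, client_state); only the state is needed.
    _, state = model.load_checkpoint(PATH, ckpt_id)
    return (state["epoch"],
            state["last_global_step"],
            state["last_global_data_samples"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt_root:
        engine = StubEngine()
        checkpoint_model(ckpt_root, "epoch_1", engine, 1, 500, 8000)
        print(load_checkpoint(engine, ckpt_root, "epoch_1"))
```

Returning the counters as a tuple lets the training loop restart its epoch, step, and sample bookkeeping from the restored values instead of from zero.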
The training pipeline follows the same structure as the base version: data loading with task-specific processors, feature extraction with WordPiece tokenization, DeepSpeed-wrapped training with the BertAdam optimizer and warmup scheduling, and task-specific evaluation metrics. The script also supports FocalLoss for class-imbalanced tasks and integrates with the pytorch_pretrained_bert library.
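The feature extraction step can be illustrated with a small sketch of the usual BERT input layout: `[CLS]` and `[SEP]` markers, segment ids that separate the two sentences, and zero padding up to max_seq_length. The function name `build_features` and the toy vocabulary are hypothetical; the script's convert_examples_to_features() works on WordPiece tokens from a real tokenizer and may differ in detail.

```python
def build_features(tokens_a, tokens_b, max_seq_length, vocab):
    """Turn already-tokenized text into (input_ids, input_mask, segment_ids)."""
    tokens_a = list(tokens_a)
    tokens_b = list(tokens_b) if tokens_b else []

    if tokens_b:
        # Pair input needs room for [CLS], [SEP], [SEP].
        # Trim the longer side one token at a time until the pair fits.
        while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
            longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
            longer.pop()
    else:
        # Single sentence needs room for [CLS] and [SEP].
        tokens_a = tokens_a[:max_seq_length - 2]

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    input_mask = [1] * len(input_ids)  # 1 marks real tokens, 0 marks padding

    padding = max_seq_length - len(input_ids)
    input_ids += [0] * padding
    input_mask += [0] * padding
    segment_ids += [0] * padding
    return input_ids, input_mask, segment_ids


if __name__ == "__main__":
    # Toy vocabulary, for illustration only.
    toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
                 "good": 4, "movie": 5}
    print(build_features(["good", "movie"], None, 6, toy_vocab))
```

All three lists always have length max_seq_length, which is what allows examples to be stacked into fixed-shape tensors for batching.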
Usage
Use this script to fine-tune BERT-large on GLUE benchmark tasks with DeepSpeed. The checkpoint management makes it suitable for long-running training jobs that may need to be interrupted and resumed.
Code Reference
Source Location
- Repository: Microsoft_DeepSpeedExamples
- File: training/BingBertGlue/run_glue_classifier_bert_large.py
- Lines: 1-1260
Signature
def checkpoint_model(PATH, ckpt_id, model, epoch, last_global_step,
                     last_global_data_samples, **kwargs):
    ...

def load_checkpoint(model, PATH, ckpt_id):
    ...

class InputExample(object):
    def __init__(self, guid, text_a, text_b=None, label=None):
        ...

class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        ...

class DataProcessor(object):
    def get_train_examples(self, data_dir): ...
    def get_dev_examples(self, data_dir): ...
    def get_labels(self): ...

def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode):
    ...

def compute_metrics(task_name, preds, labels):
    ...

def main():
    ...
Import
# This is a standalone training script, run via DeepSpeed launcher:
# deepspeed run_glue_classifier_bert_large.py --deepspeed_config ds_config.json ...
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --data_dir | str | Yes | Directory containing GLUE task TSV data files |
| --bert_model | str | Yes | Pretrained BERT model name (e.g., bert-large-uncased) |
| --task_name | str | Yes | GLUE task name: mrpc, mnli, cola, sst-2, sts-b, qqp, qnli, rte, wnli |
| --output_dir | str | Yes | Directory for model predictions and checkpoints |
| --max_seq_length | int | No | Maximum tokenized sequence length (default: 128) |
| --do_train | flag | No | Run training phase |
| --do_eval | flag | No | Run evaluation phase |
| --train_batch_size | int | No | Training batch size (default: 32) |
| --learning_rate | float | No | Initial learning rate for Adam (default: 5e-5) |
| --num_train_epochs | float | No | Number of training epochs (default: 3.0) |
| --local_rank | int | No | Local rank for distributed training (default: -1) |
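The table above maps naturally onto an argparse parser. The sketch below mirrors only the flags, types, and defaults listed in the table; the helper name `build_parser` is hypothetical, and the real script accepts further options (for example, the usage example passes --deepspeed_config and --do_lower_case), so this is not a copy of its parser.

```python
import argparse


def build_parser():
    """Build a parser covering only the arguments documented in the table."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented GLUE fine-tuning arguments")

    # Required arguments
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--bert_model", type=str, required=True)
    parser.add_argument("--task_name", type=str, required=True)
    parser.add_argument("--output_dir", type=str, required=True)

    # Optional arguments with the documented defaults
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--num_train_epochs", type=float, default=3.0)
    parser.add_argument("--local_rank", type=int, default=-1)
    return parser


if __name__ == "__main__":
    # parse_known_args tolerates flags this sketch does not define.
    args, unknown = build_parser().parse_known_args([
        "--data_dir", "/data/glue/SST-2",
        "--bert_model", "bert-large-uncased",
        "--task_name", "sst-2",
        "--output_dir", "/output/sst2",
        "--do_train",
        "--do_lower_case",
    ])
    print(args.task_name, args.train_batch_size, unknown)
```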
Outputs
| Name | Type | Description |
|---|---|---|
| eval_results.txt | file | Evaluation metrics per task (accuracy, F1, MCC, or correlation) |
| model checkpoint | directory | DeepSpeed checkpoint with model state, epoch, global step, and data sample count |
| training logs | stdout | Training loss, checkpoint status, and evaluation results |
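To make the metric names in eval_results.txt concrete, here is a dependency-free sketch of accuracy, binary F1, Matthews correlation (MCC), and Pearson correlation, with a per-task dispatch. The task-to-metric mapping follows the common GLUE convention and is an assumption about this script; `compute_metrics_sketch` is a hypothetical name, and the script's compute_metrics() most likely relies on library implementations and may report additional values such as Spearman correlation for STS-B.

```python
import math


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def _confusion(preds, labels):
    # Binary confusion counts with 1 as the positive class.
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return tp, fp, fn, tn


def f1_binary(preds, labels):
    tp, fp, fn, _ = _confusion(preds, labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(preds, labels):
    tp, fp, fn, tn = _confusion(preds, labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0


def compute_metrics_sketch(task_name, preds, labels):
    """Assumed mapping: CoLA -> MCC, MRPC/QQP -> acc + F1, STS-B -> Pearson."""
    if task_name == "cola":
        return {"mcc": mcc(preds, labels)}
    if task_name in ("mrpc", "qqp"):
        return {"acc": accuracy(preds, labels), "f1": f1_binary(preds, labels)}
    if task_name == "sts-b":
        return {"pearson": pearson(preds, labels)}
    if task_name in ("sst-2", "mnli", "qnli", "rte", "wnli"):
        return {"acc": accuracy(preds, labels)}
    raise KeyError(task_name)


if __name__ == "__main__":
    print(compute_metrics_sketch("mrpc", [1, 0, 1, 1], [1, 0, 0, 1]))
```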
Usage Examples
Fine-tune BERT-large on SST-2 with Checkpointing
# Launch with DeepSpeed for SST-2 sentiment analysis
deepspeed run_glue_classifier_bert_large.py \
    --deepspeed_config ds_config.json \
    --data_dir /data/glue/SST-2 \
    --bert_model bert-large-uncased \
    --task_name sst-2 \
    --output_dir /output/sst2 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --train_batch_size 16 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0