Implementation:Microsoft DeepSpeedExamples GLUE Classifier BERT Large

Knowledge Sources	Microsoft_DeepSpeedExamples
Domains	NLP, Fine-tuning
Last Updated	2026-02-07 12:00 GMT

Overview

BERT-large fine-tuning script covering all nine GLUE benchmark tasks, with DeepSpeed distributed training and checkpoint management for long-running jobs.

Description

run_glue_classifier_bert_large.py extends the BERT-base GLUE classifier with additional checkpoint management functionality specifically needed for the larger BERT-large model. Like the base version, it implements data processors for all nine GLUE tasks (MRPC, MNLI, CoLA, SST-2, STS-B, QQP, QNLI, RTE, WNLI) and supports DeepSpeed distributed training.

The key addition over the BERT-base version is checkpoint management through the checkpoint_model() and load_checkpoint() functions. checkpoint_model() saves model state through DeepSpeed's model.save_checkpoint() method and persists the current epoch, global step count, and global data sample count alongside it. load_checkpoint() restores that training state from a previous checkpoint, so an interrupted run can resume where it stopped. This matters for BERT-large, where training runs take significantly longer than with BERT-base.
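The save and restore round trip described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the `engine` argument behaves like a DeepSpeed engine, whose `save_checkpoint(save_dir, tag, client_state)` stores a user dictionary next to the model state and whose `load_checkpoint(load_dir, tag)` returns a `(load_path, client_state)` pair. The lowercase parameter names and the error handling are choices made for this sketch.

```python
def checkpoint_model(path, ckpt_id, engine, epoch, last_global_step,
                     last_global_data_samples, **kwargs):
    """Save engine state plus the counters needed to resume training."""
    client_state = {
        "epoch": epoch,
        "last_global_step": last_global_step,
        "last_global_data_samples": last_global_data_samples,
    }
    # Extra keyword arguments are persisted alongside the counters.
    client_state.update(kwargs)
    # Assumption: engine follows the DeepSpeed engine interface.
    engine.save_checkpoint(path, ckpt_id, client_state)


def load_checkpoint(engine, path, ckpt_id):
    """Restore engine state and return the saved training counters."""
    _, client_state = engine.load_checkpoint(path, ckpt_id)
    if client_state is None:
        # Sketch-level handling: treat a missing checkpoint as an error.
        raise FileNotFoundError(
            "no checkpoint {!r} under {!r}".format(ckpt_id, path))
    return (client_state["epoch"],
            client_state["last_global_step"],
            client_state["last_global_data_samples"])
```

A resumed run would call load_checkpoint() once before the training loop and use the returned epoch and step counters to decide where to continue.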

The training pipeline follows the same structure as the base version: data loading with task-specific processors, feature extraction with WordPiece tokenization, DeepSpeed-wrapped training with the BertAdam optimizer and warmup scheduling, and evaluation with task-specific metrics. The script also supports FocalLoss for class-imbalanced tasks and integrates with the pytorch_pretrained_bert library.
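To make the feature extraction step concrete, the sketch below packs already-tokenized text into the three fixed-length lists that InputFeatures carries (input_ids, input_mask, segment_ids). It follows the standard BERT input layout rather than quoting the script: the function name `pack_features`, the toy vocabulary in the usage note, and the padding id of 0 are assumptions for illustration, and the real script obtains tokens and ids from the WordPiece tokenizer.

```python
def pack_features(tokens_a, tokens_b, vocab, max_seq_length):
    """Build (input_ids, input_mask, segment_ids) for one example."""
    tokens_a = list(tokens_a)
    tokens_b = list(tokens_b) if tokens_b else []
    # [CLS] and one [SEP] for a single sentence, two [SEP]s for a pair.
    special = 3 if tokens_b else 2
    if max_seq_length < special:
        raise ValueError("max_seq_length too small for special tokens")

    # Trim the longer sequence one token at a time until everything fits.
    while len(tokens_a) + len(tokens_b) > max_seq_length - special:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)  # 1 marks real tokens, 0 marks padding

    # Zero-pad all three lists to the fixed length.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding
```

For example, with a hypothetical vocabulary `{"[CLS]": 101, "[SEP]": 102, "good": 1, "movie": 2}`, calling `pack_features(["good", "movie"], None, vocab, 6)` yields input ids `[101, 1, 2, 102, 0, 0]` with mask `[1, 1, 1, 1, 0, 0]`.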

Usage

Use this script to fine-tune BERT-large on GLUE benchmark tasks with DeepSpeed. The checkpoint management makes it suitable for long-running training jobs that may need to be interrupted and resumed.

Code Reference

Source Location

Repository: Microsoft_DeepSpeedExamples
File: training/BingBertGlue/run_glue_classifier_bert_large.py
Lines: 1-1260

Signature

def checkpoint_model(PATH, ckpt_id, model, epoch, last_global_step,
                     last_global_data_samples, **kwargs):
    ...

def load_checkpoint(model, PATH, ckpt_id):
    ...

class InputExample(object):
    def __init__(self, guid, text_a, text_b=None, label=None):
        ...

class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        ...

class DataProcessor(object):
    def get_train_examples(self, data_dir): ...
    def get_dev_examples(self, data_dir): ...
    def get_labels(self): ...

def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode):
    ...

def compute_metrics(task_name, preds, labels):
    ...

def main():
    ...

Import

# This is a standalone training script, run via DeepSpeed launcher:
# deepspeed run_glue_classifier_bert_large.py --deepspeed_config ds_config.json ...

I/O Contract

Inputs

Name	Type	Required	Description
--data_dir	str	Yes	Directory containing GLUE task TSV data files
--bert_model	str	Yes	Pretrained BERT model name (e.g., bert-large-uncased)
--task_name	str	Yes	GLUE task name: mrpc, mnli, cola, sst-2, sts-b, qqp, qnli, rte, wnli
--output_dir	str	Yes	Directory for model predictions and checkpoints
--max_seq_length	int	No	Maximum tokenized sequence length (default: 128)
--do_train	flag	No	Run training phase
--do_eval	flag	No	Run evaluation phase
--train_batch_size	int	No	Training batch size (default: 32)
--learning_rate	float	No	Initial learning rate for Adam (default: 5e-5)
--num_train_epochs	float	No	Number of training epochs (default: 3.0)
--local_rank	int	No	Local rank for distributed training (default: -1)

Outputs

Name	Type	Description
eval_results.txt	file	Evaluation metrics per task (accuracy, F1, MCC, or correlation)
model checkpoint	directory	DeepSpeed checkpoint with model state, epoch, global step, and data sample count
training logs	stdout	Training loss, checkpoint status, and evaluation results
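The classification metrics named in the eval_results.txt row can be illustrated with a small pure-Python sketch for binary labels. It is not the script's compute_metrics(): the function names here are hypothetical, the script may rely on library implementations instead, and the correlation metrics used for regression-style tasks are left out.

```python
import math


def _counts(preds, labels):
    """Return (tp, fp, fn, tn) for binary 0/1 predictions and labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, l in pairs if p == 1 and l == 1)
    fp = sum(1 for p, l in pairs if p == 1 and l == 0)
    fn = sum(1 for p, l in pairs if p == 0 and l == 1)
    tn = sum(1 for p, l in pairs if p == 0 and l == 0)
    return tp, fp, fn, tn


def accuracy(preds, labels):
    tp, fp, fn, tn = _counts(preds, labels)
    return (tp + tn) / (tp + fp + fn + tn)


def binary_f1(preds, labels):
    tp, fp, fn, _ = _counts(preds, labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_mcc(preds, labels):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    tp, fp, fn, tn = _counts(preds, labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Reporting F1 or MCC next to accuracy is useful on imbalanced tasks, where a classifier that always predicts the majority class can still score a high accuracy.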

Usage Examples

Fine-tune BERT-large on SST-2 with Checkpointing

# Launch with DeepSpeed for SST-2 sentiment analysis
deepspeed run_glue_classifier_bert_large.py \
    --deepspeed_config ds_config.json \
    --data_dir /data/glue/SST-2 \
    --bert_model bert-large-uncased \
    --task_name sst-2 \
    --output_dir /output/sst2 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --train_batch_size 16 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0
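The command above expects a ds_config.json file whose contents are not shown on this page. As a hedged sketch, the helper below builds a minimal configuration dictionary using standard DeepSpeed config keys. All values are placeholders, the helper name `build_ds_config` is hypothetical, and how the script reconciles these settings with --train_batch_size is not confirmed here.

```python
import json


def build_ds_config(micro_batch_per_gpu, grad_accum_steps, num_gpus, fp16=True):
    """Return a minimal DeepSpeed config dict (placeholder values)."""
    return {
        # DeepSpeed requires this to equal
        # micro batch per GPU * gradient accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * num_gpus,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "steps_per_print": 100,
        "fp16": {"enabled": fp16},
    }


if __name__ == "__main__":
    # Print the JSON; redirect the output to ds_config.json to use it.
    print(json.dumps(build_ds_config(4, 2, 2), indent=2))
```

Deriving train_batch_size from the other three values keeps the configuration internally consistent when the GPU count changes.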
