Implementation:Microsoft DeepSpeedExamples GLUE Classifier BERT Base
| Knowledge Sources | |
|---|---|
| Domains | NLP, Fine-tuning |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
BERT-base fine-tuning script for all nine GLUE benchmark tasks with DeepSpeed distributed training integration.
Description
run_glue_classifier_bert_base.py is a fine-tuning script that trains BERT-base models on the GLUE (General Language Understanding Evaluation) benchmark using DeepSpeed for distributed training. The script implements data processors for the nine GLUE tasks: MRPC (paraphrase detection), MNLI (natural language inference), CoLA (linguistic acceptability), SST-2 (sentiment analysis), STS-B (semantic similarity), QQP (question pair similarity), QNLI (question NLI), RTE (recognizing textual entailment), and WNLI (Winograd NLI). It also covers MNLI-MM, the mismatched evaluation set of MNLI, which is not counted as a separate task.
Each task processor inherits from DataProcessor and implements methods to read TSV data files, create InputExample objects, and define the label set. The convert_examples_to_features() function tokenizes text pairs using BertTokenizer, pads sequences to max_seq_length, and produces InputFeatures with input IDs, attention masks, segment IDs, and label IDs. Task-specific metrics are computed via compute_metrics(), which dispatches to accuracy, F1, Matthews correlation, or Pearson/Spearman correlation depending on the task.
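The feature layout described above can be illustrated with a minimal sketch. It is not the script's own code: the whitespace tokenizer, the tiny vocabulary, the `Features` class, and the pair-truncation step are assumptions standing in for BertTokenizer and InputFeatures, and label handling is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Features:
    """Hypothetical stand-in for the script's InputFeatures (no label_id)."""
    input_ids: List[int]
    input_mask: List[int]
    segment_ids: List[int]


def toy_tokenize(text: str) -> List[str]:
    """Whitespace tokenizer standing in for BertTokenizer (assumption)."""
    return text.lower().split()


def truncate_pair(tokens_a: List[str], tokens_b: List[str], max_len: int) -> None:
    """Drop tokens from the longer sequence until the pair fits in max_len."""
    while len(tokens_a) + len(tokens_b) > max_len:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()


def to_features(text_a: str, text_b: Optional[str],
                vocab: Dict[str, int], max_seq_length: int) -> Features:
    tokens_a = toy_tokenize(text_a)
    tokens_b = toy_tokenize(text_b) if text_b else []
    if tokens_b:
        # Leave room for [CLS], [SEP], [SEP].
        truncate_pair(tokens_a, tokens_b, max_seq_length - 3)
    else:
        # Leave room for [CLS] and [SEP].
        tokens_a = tokens_a[: max_seq_length - 2]

    # First segment gets segment ID 0, second segment gets segment ID 1.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    input_mask = [1] * len(input_ids)  # 1 marks real tokens, 0 marks padding

    # Zero-pad all three lists up to max_seq_length.
    pad = max_seq_length - len(input_ids)
    input_ids += [0] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return Features(input_ids, input_mask, segment_ids)


# Toy vocabulary, invented for this example.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "cat": 5, "sat": 6, "a": 7}
feats = to_features("The cat sat", "A cat", vocab, max_seq_length=10)
print(feats)
```

All three lists always have length max_seq_length, which is what lets the examples be stacked into fixed-shape tensors for batching.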
The training loop uses DeepSpeed for initialization and distributed training, with the BertAdam optimizer and linear warmup scheduling. The script supports FocalLoss as an alternative to CrossEntropyLoss for handling class imbalance. It relies on the pytorch_pretrained_bert library for the model architecture and tokenization.
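To show why focal loss helps with class imbalance, here is a small pure-Python sketch of the standard formulation, FL = -(1 - p_t)^gamma * log(p_t), next to plain cross-entropy. The script's own FocalLoss class and its hyperparameters are not shown on this page, so the function names and the `gamma=2.0` default below are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single example's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits: List[float], target: int, gamma: float = 2.0) -> float:
    """Cross-entropy scaled by (1 - p_t)^gamma; gamma is an assumed default."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [4.0, 0.0]  # confidently correct for target 0
hard = [0.0, 1.0]  # leaning the wrong way for target 0

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The modulating factor shrinks the loss of well-classified examples far more than that of misclassified ones, so the frequent, easy class contributes less to the gradient. With `gamma=0` the expression reduces to ordinary cross-entropy.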
Usage
Use this script to fine-tune BERT-base on any GLUE benchmark task with DeepSpeed distributed training. It is the primary entry point for running BERT-base GLUE experiments in the BingBertGlue training example.
Code Reference
Source Location
- Repository: Microsoft_DeepSpeedExamples
- File: training/BingBertGlue/run_glue_classifier_bert_base.py
- Lines: 1-1145
Signature
class InputExample(object):
    def __init__(self, guid, text_a, text_b=None, label=None):
        ...

class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        ...

class DataProcessor(object):
    def get_train_examples(self, data_dir): ...
    def get_dev_examples(self, data_dir): ...
    def get_labels(self): ...

class MrpcProcessor(DataProcessor): ...
class MnliProcessor(DataProcessor): ...
class ColaProcessor(DataProcessor): ...
class Sst2Processor(DataProcessor): ...
class StsbProcessor(DataProcessor): ...
class QqpProcessor(DataProcessor): ...
class QnliProcessor(DataProcessor): ...
class RteProcessor(DataProcessor): ...
class WnliProcessor(DataProcessor): ...

def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode):
    ...

def compute_metrics(task_name, preds, labels):
    ...

def main():
    ...
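As a rough illustration of how a task processor fits the DataProcessor interface in the signature above, the sketch below defines a hypothetical `ToyPairProcessor`. The `_read_tsv` and `_create_examples` helpers, the file names train.tsv and dev.tsv, and the three-column layout are assumptions for this example, not the layout of any real GLUE task.

```python
import csv
import os
import tempfile


class InputExample:
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor:
    def get_train_examples(self, data_dir):
        raise NotImplementedError

    def get_dev_examples(self, data_dir):
        raise NotImplementedError

    def get_labels(self):
        raise NotImplementedError

    @staticmethod
    def _read_tsv(path):
        """Assumed helper: read a tab-separated file into a list of rows."""
        with open(path, "r", encoding="utf-8", newline="") as f:
            return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


class ToyPairProcessor(DataProcessor):
    """Hypothetical sentence-pair processor with columns: label, text_a, text_b."""

    def get_train_examples(self, data_dir):
        rows = self._read_tsv(os.path.join(data_dir, "train.tsv"))
        return self._create_examples(rows, "train")

    def get_dev_examples(self, data_dir):
        rows = self._read_tsv(os.path.join(data_dir, "dev.tsv"))
        return self._create_examples(rows, "dev")

    def get_labels(self):
        return ["0", "1"]

    def _create_examples(self, rows, set_type):
        examples = []
        for i, row in enumerate(rows):
            if i == 0:
                continue  # skip the header row
            label, text_a, text_b = row
            examples.append(InputExample(f"{set_type}-{i}", text_a, text_b, label))
        return examples


# Write a one-row toy dataset and read it back through the processor.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "train.tsv"), "w", encoding="utf-8") as f:
        f.write("label\tsentence1\tsentence2\n1\tA man eats.\tSomeone is eating.\n")
    train_examples = ToyPairProcessor().get_train_examples(tmp)

print(train_examples[0].guid, train_examples[0].label)
```

Keeping the file reading in the base class means each task subclass only has to say which columns hold the text and the label.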
Import
# This is a standalone training script, run via DeepSpeed launcher:
# deepspeed run_glue_classifier_bert_base.py --deepspeed_config ds_config.json ...
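The launcher command above expects a DeepSpeed configuration file. The sketch below generates a minimal one from Python. The key names are standard DeepSpeed configuration keys, but every value (GPU count, batch sizes, fp16) is an assumption for illustration, and this page does not state which settings the script itself requires or how the file relates to the script's own `--train_batch_size` flag.

```python
import json

# Assumed values for illustration only.
micro_batch_per_gpu = 8
gradient_accumulation_steps = 1
num_gpus = 4

ds_config = {
    # DeepSpeed expects: train_batch_size ==
    #   micro batch per GPU * gradient accumulation steps * number of GPUs
    "train_batch_size": micro_batch_per_gpu * gradient_accumulation_steps * num_gpus,
    "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "steps_per_print": 100,
    "fp16": {"enabled": True},
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)  # save this output as ds_config.json
```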
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --data_dir | str | Yes | Directory containing GLUE task TSV data files |
| --bert_model | str | Yes | Pretrained BERT model name (e.g., bert-base-uncased) |
| --task_name | str | Yes | GLUE task name: mrpc, mnli, cola, sst-2, sts-b, qqp, qnli, rte, wnli |
| --output_dir | str | Yes | Directory for model predictions and checkpoints |
| --max_seq_length | int | No | Maximum tokenized sequence length (default: 128) |
| --do_train | flag | No | Run training phase |
| --do_eval | flag | No | Run evaluation phase |
| --train_batch_size | int | No | Training batch size (default: 32) |
| --learning_rate | float | No | Initial learning rate for Adam (default: 5e-5) |
| --num_train_epochs | float | No | Number of training epochs (default: 3.0) |
| --local_rank | int | No | Local rank for distributed training (default: -1) |
Outputs
| Name | Type | Description |
|---|---|---|
| eval_results.txt | file | Evaluation metrics (accuracy, F1, MCC, or correlation depending on task) |
| model checkpoint | directory | Saved model weights and config in output_dir |
| training logs | stdout | Training loss and evaluation results |
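The task-dependent metrics reported in eval_results.txt can be sketched in plain Python. The task-to-metric mapping below follows the common GLUE convention and is an assumption; this page only states that compute_metrics() dispatches on the task. The function name `compute_metrics_sketch` is hypothetical, binary labels are assumed for F1 and MCC, and Spearman correlation is left out for brevity.

```python
import math


def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def _confusion(preds, labels):
    """Binary confusion counts, treating 1 as the positive class."""
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    tn = sum(p == 0 and l == 0 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    return tp, tn, fp, fn


def f1(preds, labels):
    tp, _, fp, fn = _confusion(preds, labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews(preds, labels):
    tp, tn, fp, fn = _confusion(preds, labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def compute_metrics_sketch(task_name, preds, labels):
    """Assumed dispatch: one metric family per task."""
    if task_name == "cola":
        return {"mcc": matthews(preds, labels)}
    if task_name in ("mrpc", "qqp"):
        return {"acc": accuracy(preds, labels), "f1": f1(preds, labels)}
    if task_name == "sts-b":
        return {"pearson": pearson(preds, labels)}  # Spearman omitted here
    if task_name in ("sst-2", "mnli", "mnli-mm", "qnli", "rte", "wnli"):
        return {"acc": accuracy(preds, labels)}
    raise KeyError(task_name)


result = compute_metrics_sketch("mrpc", [1, 0, 1, 1], [1, 0, 0, 1])
print(result)
```

Matthews correlation suits CoLA because its labels are skewed, while STS-B is a regression task and therefore needs a correlation measure rather than accuracy.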
Usage Examples
Fine-tune BERT-base on MRPC
# Launch with DeepSpeed for the MRPC task
deepspeed run_glue_classifier_bert_base.py \
    --deepspeed_config ds_config.json \
    --data_dir /data/glue/MRPC \
    --bert_model bert-base-uncased \
    --task_name mrpc \
    --output_dir /output/mrpc \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0