Implementation:Microsoft DeepSpeedExamples GLUE Classifier BERT Base

From Leeroopedia


Knowledge Sources
Domains NLP, Fine-tuning
Last Updated 2026-02-07 12:00 GMT

Overview

BERT-base fine-tuning script for all nine GLUE benchmark tasks with DeepSpeed distributed training integration.

Description

run_glue_classifier_bert_base.py fine-tunes BERT-base models on the GLUE (General Language Understanding Evaluation) benchmark, using DeepSpeed for distributed training. The script implements data processors for all nine GLUE tasks: MRPC (paraphrase detection), MNLI (natural language inference), CoLA (linguistic acceptability), SST-2 (sentiment analysis), STS-B (semantic similarity), QQP (question pair similarity), QNLI (question NLI), RTE (recognizing textual entailment), and WNLI (Winograd NLI). It also covers MNLI-MM, the mismatched evaluation set of MNLI, which is a variant of MNLI rather than a tenth task.

Each task processor inherits from DataProcessor and implements methods to read the task's TSV data files, create InputExample objects, and define the label set. The convert_examples_to_features() function tokenizes single sentences or sentence pairs with BertTokenizer, truncates or pads each sequence to max_seq_length, and produces InputFeatures holding input IDs, an attention mask (input_mask), segment IDs, and a label ID. Task-specific metrics are computed by compute_metrics(), which dispatches to accuracy, F1, Matthews correlation, or Pearson/Spearman correlation depending on the task.
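The feature-conversion step described above can be sketched in plain Python. This is a minimal illustration, not the script's code: ToyTokenizer, Features, truncate_pair, and to_features are hypothetical names, the whitespace tokenizer stands in for BertTokenizer, and the [CLS]/[SEP] layout follows the usual BERT convention for classification inputs.

```python
from collections import namedtuple

# Hypothetical stand-in for the script's InputFeatures class.
Features = namedtuple("Features", "input_ids input_mask segment_ids label_id")


class ToyTokenizer:
    """Whitespace tokenizer used only to keep the sketch self-contained.
    The real script uses BertTokenizer (WordPiece)."""

    def __init__(self):
        self.vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}

    def tokenize(self, text):
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens):
        # Unknown tokens get the next free id (illustrative only).
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]


def truncate_pair(tokens_a, tokens_b, max_length):
    """Drop tokens from the longer sequence until the pair fits."""
    while len(tokens_a) + len(tokens_b) > max_length:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()


def to_features(text_a, text_b, label, label_map, max_seq_length, tokenizer):
    tokens_a = tokenizer.tokenize(text_a)
    tokens_b = tokenizer.tokenize(text_b) if text_b else []
    if tokens_b:
        # Reserve room for [CLS], [SEP], [SEP].
        truncate_pair(tokens_a, tokens_b, max_seq_length - 3)
    else:
        # Reserve room for [CLS] and [SEP].
        tokens_a = tokens_a[: max_seq_length - 2]

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1] * len(input_ids)  # 1 marks real tokens, 0 marks padding

    pad = max_seq_length - len(input_ids)
    input_ids += [0] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return Features(input_ids, input_mask, segment_ids, label_map[label])


feats = to_features("The cat sat", "A cat was sitting", "1",
                    {"0": 0, "1": 1}, 12, ToyTokenizer())
print(feats.input_mask)
print(feats.segment_ids)
```

All three lists always have length max_seq_length, which is what allows examples to be stacked into fixed-shape tensors for batching.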

The training loop uses DeepSpeed for initialization and distributed training, with BertAdam optimizer and linear warmup scheduling. The script supports FocalLoss as an alternative to CrossEntropyLoss for handling class imbalance. It integrates with the pytorch_pretrained_bert library for model architecture and tokenization.
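To illustrate why a focal loss can help with class imbalance, the sketch below compares it with cross-entropy for a single example, using only the standard library. It follows the commonly cited focal loss form, FL = -(1 - p_t)^gamma * log(p_t); the script's own FocalLoss class and its parameters are not shown here, so gamma=2.0 and the function names are assumptions.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    # gamma is an illustrative choice; gamma=0 recovers cross-entropy.
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [4.0, 0.0]  # true class 0, predicted confidently
hard = [0.0, 1.0]  # true class 0, predicted with low probability

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(name, "CE=%.4f" % ce, "FL=%.6f" % fl, "FL/CE=%.4f" % (fl / ce))
```

The (1 - p_t)^gamma factor shrinks the loss of well-classified examples far more than that of misclassified ones, so abundant easy examples contribute less to the gradient.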

Usage

Use this script to fine-tune BERT-base on any GLUE benchmark task with DeepSpeed distributed training. It is the primary entry point for running BERT-base GLUE experiments in the BingBertGlue training example.

Code Reference

Source Location

Signature

class InputExample(object):
    def __init__(self, guid, text_a, text_b=None, label=None):
        ...

class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        ...

class DataProcessor(object):
    def get_train_examples(self, data_dir): ...
    def get_dev_examples(self, data_dir): ...
    def get_labels(self): ...

class MrpcProcessor(DataProcessor): ...
class MnliProcessor(DataProcessor): ...
class ColaProcessor(DataProcessor): ...
class Sst2Processor(DataProcessor): ...
class StsbProcessor(DataProcessor): ...
class QqpProcessor(DataProcessor): ...
class QnliProcessor(DataProcessor): ...
class RteProcessor(DataProcessor): ...
class WnliProcessor(DataProcessor): ...

def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode):
    ...

def compute_metrics(task_name, preds, labels):
    ...

def main():
    ...
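As a rough illustration of how a task processor fits these signatures, the sketch below defines a processor for a made-up sentence-pair TSV file. ToyPairProcessor, read_tsv, and the column layout (label, sentence1, sentence2 with a header row) are hypothetical and do not reproduce any real GLUE file format or the script's own helpers.

```python
import csv
import os
import tempfile


class InputExample(object):
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor(object):
    def get_train_examples(self, data_dir):
        raise NotImplementedError()

    def get_dev_examples(self, data_dir):
        raise NotImplementedError()

    def get_labels(self):
        raise NotImplementedError()

    @staticmethod
    def read_tsv(path):
        # Hypothetical helper: read a tab-separated file without quote handling.
        with open(path, "r", encoding="utf-8", newline="") as f:
            return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


class ToyPairProcessor(DataProcessor):
    """Processor for an invented layout: label, sentence1, sentence2."""

    def get_train_examples(self, data_dir):
        rows = self.read_tsv(os.path.join(data_dir, "train.tsv"))
        return self._create_examples(rows, "train")

    def get_dev_examples(self, data_dir):
        rows = self.read_tsv(os.path.join(data_dir, "dev.tsv"))
        return self._create_examples(rows, "dev")

    def get_labels(self):
        return ["0", "1"]

    def _create_examples(self, rows, set_type):
        examples = []
        for i, row in enumerate(rows[1:]):  # skip the header row
            guid = "%s-%d" % (set_type, i)
            examples.append(InputExample(guid, row[1], row[2], row[0]))
        return examples


with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "train.tsv"), "w", encoding="utf-8") as f:
        f.write("label\tsentence1\tsentence2\n")
        f.write("1\tThe cat sat.\tA cat was sitting.\n")
        f.write("0\tIt rained.\tThe sun was out.\n")
    examples = ToyPairProcessor().get_train_examples(data_dir)

print(len(examples), examples[0].guid, examples[0].label)
```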

Import

# This is a standalone training script, run via DeepSpeed launcher:
# deepspeed run_glue_classifier_bert_base.py --deepspeed_config ds_config.json ...

I/O Contract

Inputs

Name                 Type    Required   Description
--data_dir           str     Yes        Directory containing GLUE task TSV data files
--bert_model         str     Yes        Pretrained BERT model name (e.g., bert-base-uncased)
--task_name          str     Yes        GLUE task name: mrpc, mnli, cola, sst-2, sts-b, qqp, qnli, rte, wnli
--output_dir         str     Yes        Directory for model predictions and checkpoints
--max_seq_length     int     No         Maximum tokenized sequence length (default: 128)
--do_train           flag    No         Run training phase
--do_eval            flag    No         Run evaluation phase
--train_batch_size   int     No         Training batch size (default: 32)
--learning_rate      float   No         Initial learning rate for Adam (default: 5e-5)
--num_train_epochs   float   No         Number of training epochs (default: 3.0)
--local_rank         int     No         Local rank for distributed training (default: -1)

Outputs

Name               Type        Description
eval_results.txt   file        Evaluation metrics (accuracy, F1, MCC, or correlation depending on task)
model checkpoint   directory   Saved model weights and config in output_dir
training logs      stdout      Training loss and evaluation results
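The task-dependent metrics written to eval_results.txt can be sketched with the standard library alone. The task-to-metric mapping below follows the usual GLUE convention and is an assumption here, as are compute_metrics_sketch and the helper names; the script itself may rely on scikit-learn and SciPy instead.

```python
import math


def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def confusion(preds, labels):
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    tn = sum(p == 0 and l == 0 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    return tp, tn, fp, fn


def f1_binary(preds, labels):
    tp, _, fp, fn = confusion(preds, labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(preds, labels):
    tp, tn, fp, fn = confusion(preds, labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom else 0.0


def average_ranks(values):
    # Rank from 1; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


def compute_metrics_sketch(task_name, preds, labels):
    # Assumed mapping, following the common GLUE convention.
    if task_name == "cola":
        return {"mcc": matthews_corrcoef(preds, labels)}
    if task_name in ("mrpc", "qqp"):
        return {"acc": accuracy(preds, labels), "f1": f1_binary(preds, labels)}
    if task_name == "sts-b":
        return {"pearson": pearson(preds, labels),
                "spearmanr": spearman(preds, labels)}
    if task_name in ("sst-2", "mnli", "mnli-mm", "qnli", "rte", "wnli"):
        return {"acc": accuracy(preds, labels)}
    raise KeyError(task_name)


print(compute_metrics_sketch("mrpc", [1, 0, 1, 1], [1, 0, 0, 1]))
```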

Usage Examples

Fine-tune BERT-base on MRPC

# Launch with DeepSpeed for MRPC task
deepspeed run_glue_classifier_bert_base.py \
    --deepspeed_config ds_config.json \
    --data_dir /data/glue/MRPC \
    --bert_model bert-base-uncased \
    --task_name mrpc \
    --output_dir /output/mrpc \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0
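The command above expects a ds_config.json, whose contents are not shown on this page. The sketch below generates a minimal candidate file in Python; the keys are standard DeepSpeed configuration fields, but every value, the build_ds_config helper, and the choice of fields are assumptions rather than the configuration shipped with the example. DeepSpeed requires train_batch_size to equal the per-GPU micro-batch size times the gradient accumulation steps times the number of GPUs, which the helper enforces by construction.

```python
import json


def build_ds_config(micro_batch_per_gpu, grad_accum_steps, world_size):
    """Hypothetical helper that returns a minimal DeepSpeed config dict."""
    return {
        # Global batch = micro batch per GPU * accumulation steps * GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "steps_per_print": 100,      # illustrative logging interval
        "fp16": {"enabled": False},  # illustrative; enable only if supported
    }


# Example: 4 GPUs, 8 samples per GPU, no accumulation -> global batch of 32.
config = build_ds_config(micro_batch_per_gpu=8, grad_accum_steps=1, world_size=4)
config_text = json.dumps(config, indent=2)
print(config_text)
# To use it, write config_text to the path given to --deepspeed_config.
```

How the --train_batch_size flag interacts with the batch size in the JSON file depends on the script and is not specified here, so keeping the two values consistent is the cautious choice.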
