Implementation:Microsoft LoRA Run XNLI
Overview
`run_xnli.py` is a multilingual natural language inference fine-tuning script for the XNLI dataset, using `AutoModelForSequenceClassification` and the HuggingFace `Trainer` API.
Description
This script fine-tunes multilingual transformer models (e.g., BERT multilingual, XLM-RoBERTa, DistilBERT multilingual) on the XNLI (Cross-lingual Natural Language Inference) benchmark. XNLI provides premise-hypothesis pairs labeled as entailment, contradiction, or neutral across 15 languages.
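The three-way label scheme can be sketched as follows (the example values are made up for illustration; the label ordering matches the dataset's `ClassLabel` feature, which the script reads via `train_dataset.features["label"].names`):

```python
# A minimal sketch of one XNLI record. Real records come from
# load_dataset("xnli", "<lang>"); the premise/hypothesis strings here
# are invented placeholders.
XNLI_LABELS = ["entailment", "neutral", "contradiction"]

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "Someone is performing music.",
    "label": 0,  # integer index into XNLI_LABELS
}

def label_name(record: dict) -> str:
    """Resolve the integer label to its name, mirroring how the script
    derives label names from the dataset features."""
    return XNLI_LABELS[record["label"]]
```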
Key implementation details:
- Language configuration: Uses `--language` for the evaluation language and optionally `--train_language` if training should use a different language (e.g., train on English, evaluate on French for cross-lingual transfer).
- Dataset loading: Directly loads from the HuggingFace Hub via `load_dataset("xnli", language)`. No custom file support -- exclusively uses the XNLI dataset.
- Label handling: Extracts label names from dataset features (`train_dataset.features["label"].names`) to determine `num_labels` automatically.
- Preprocessing: Tokenizes premise-hypothesis pairs using `tokenizer(examples["premise"], examples["hypothesis"], ...)` with configurable padding and truncation.
- Metrics: Uses the `xnli` metric from the datasets library for evaluation.
- Distant debugging support: Includes optional ptvsd-based remote debugging via `--server_ip` and `--server_port` arguments.
- Case handling: Supports the `--do_lower_case` flag passed to `AutoTokenizer.from_pretrained()`.
- Data collation: Selects between `default_data_collator` (when padding to max length), `DataCollatorWithPadding` with `pad_to_multiple_of=8` (for FP16), or `None` (default behavior).
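The collator selection in the last bullet can be sketched as a small decision function (a sketch for illustration only; it returns descriptive names, whereas the real script passes the chosen collator object to `Trainer`):

```python
def choose_collator(pad_to_max_length: bool, fp16: bool) -> str:
    """Mirror the script's data-collator selection logic."""
    if pad_to_max_length:
        # Samples were already padded to max_seq_length during tokenization,
        # so simple batching suffices.
        return "default_data_collator"
    if fp16:
        # Dynamic padding rounded up to a multiple of 8 for FP16 efficiency.
        return "DataCollatorWithPadding(pad_to_multiple_of=8)"
    # None lets the Trainer fall back to its default dynamic padding.
    return "None"
```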
The script enforces `check_min_version("4.4.0")` and follows the standard checkpoint-resumption pattern.
Usage
Use this script when you need to:
- Evaluate cross-lingual transfer learning on the XNLI benchmark
- Fine-tune multilingual models on natural language inference
- Train in one language and evaluate in another for zero-shot cross-lingual settings
Code Reference
Source Location
| Property | Value |
|---|---|
| File | `examples/NLU/examples/text-classification/run_xnli.py` |
| Lines | 351 |
| Module | `run_xnli` |
| Entry Point | `main()` |
Signature/CLI
python run_xnli.py \
--model_name_or_path MODEL_NAME \
--language LANG_CODE \
--output_dir OUTPUT_DIR \
--do_train \
--do_eval \
[--train_language TRAIN_LANG] \
[--max_seq_length 128] \
[--pad_to_max_length] \
[--per_device_train_batch_size BATCH_SIZE] \
[--learning_rate LR] \
[--num_train_epochs EPOCHS] \
[--max_train_samples N] \
[--max_val_samples N] \
[--do_lower_case]
Import
from transformers import (
AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer,
DataCollatorWithPadding,
EvalPrediction,
HfArgumentParser,
Trainer,
TrainingArguments,
default_data_collator,
set_seed,
)
from datasets import load_dataset, load_metric
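The evaluation hook built from these imports reduces to an argmax-plus-accuracy computation. A sketch of the core logic (the actual script wraps logits and labels in an `EvalPrediction` and delegates to the `xnli` metric, which reports accuracy):

```python
import numpy as np

def compute_metrics(logits: np.ndarray, labels: np.ndarray) -> dict:
    """Accuracy over argmax predictions -- the quantity the xnli metric reports."""
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": float((predictions == labels).mean())}
```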
I/O Contract
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--model_name_or_path` | str | Yes | - | Multilingual pretrained model (e.g., `bert-base-multilingual-cased`) |
| `--language` | str | Yes | - | Evaluation language code (e.g., `en`, `fr`, `de`, `zh`) |
| `--output_dir` | str | Yes | - | Directory for checkpoints and results |
| `--train_language` | str | No | None | Training language (defaults to `--language` if not set) |
| `--max_seq_length` | int | No | 128 | Max tokenized sequence length |
| `--pad_to_max_length` | flag | No | True | Pad all samples to max length |
| `--do_lower_case` | flag | No | False | Lowercase input during tokenization |
| `--max_train_samples` | int | No | None | Truncate training set for debugging |
| `--max_val_samples` | int | No | None | Truncate validation set for debugging |
| `--max_test_samples` | int | No | None | Truncate test set for debugging |
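The `--train_language` fallback described in the table amounts to a one-line selection; a sketch (function name is illustrative, not from the script):

```python
from typing import Optional

def effective_train_language(language: str, train_language: Optional[str] = None) -> str:
    """Return the language used for the training split: --train_language if
    given, otherwise fall back to --language."""
    return train_language if train_language is not None else language
```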
Outputs
| Output | Location | Description |
|---|---|---|
| Trained model | `{output_dir}/` | Saved model, config, and tokenizer |
| Training metrics | `{output_dir}/train_results.json` | Loss, runtime, samples per second |
| Evaluation metrics | `{output_dir}/eval_results.json` | XNLI accuracy for the target language |
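After a run, the JSON outputs above can be read back programmatically. A sketch (the `eval_accuracy` key name assumes the Trainer's usual `eval_` metric prefix and may differ across versions):

```python
import json
from pathlib import Path

def read_eval_accuracy(output_dir: str) -> float:
    """Load the accuracy recorded in eval_results.json.

    Assumes the Trainer's standard "eval_" metric-key prefix; adjust the key
    if your transformers version writes a different name.
    """
    metrics = json.loads((Path(output_dir) / "eval_results.json").read_text())
    return metrics["eval_accuracy"]
```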
Usage Examples
Fine-tune mBERT on English XNLI
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--language en \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--max_seq_length 128 \
--output_dir /tmp/xnli_en
Cross-lingual transfer: train English, evaluate French
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path xlm-roberta-base \
--language fr \
--train_language en \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 5 \
--output_dir /tmp/xnli_en_to_fr
Debug with limited samples
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--language de \
--do_train \
--do_eval \
--max_train_samples 100 \
--max_val_samples 50 \
--output_dir /tmp/xnli_debug
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Run_GLUE_No_Trainer - GLUE classification with manual training loop
- Implementation:Microsoft_LoRA_Run_TF_Text_Classification - TensorFlow-based text classification