Implementation:Microsoft LoRA Run XNLI
Overview
`run_xnli.py` is a multilingual natural language inference fine-tuning script for the XNLI dataset, using `AutoModelForSequenceClassification` and the HuggingFace `Trainer` API.
Description
This script fine-tunes multilingual transformer models (e.g., BERT multilingual, XLM-RoBERTa, DistilBERT multilingual) on the XNLI (Cross-lingual Natural Language Inference) benchmark. XNLI provides premise-hypothesis pairs labeled as entailment, contradiction, or neutral across 15 languages.
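The three-way label scheme can be sketched as follows (the example values are made up for illustration; the label ordering matches the dataset's `ClassLabel` feature, which the script reads via `train_dataset.features["label"].names`):

```python
# A minimal sketch of one XNLI record. Real records come from
# load_dataset("xnli", "<lang>"); the premise/hypothesis strings here
# are invented placeholders.
XNLI_LABELS = ["entailment", "neutral", "contradiction"]

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "Someone is performing music.",
    "label": 0,  # integer index into XNLI_LABELS
}

def label_name(record: dict) -> str:
    """Resolve the integer label to its name, mirroring how the script
    derives label names from the dataset features."""
    return XNLI_LABELS[record["label"]]
```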
Key implementation details:
- Language configuration: Uses `--language` for the evaluation language and optionally `--train_language` if training should use a different language (e.g., train on English, evaluate on French for cross-lingual transfer).
- Dataset loading: Directly loads from the HuggingFace Hub via `load_dataset("xnli", language)`. No custom file support -- exclusively uses the XNLI dataset.
- Label handling: Extracts label names from dataset features (`train_dataset.features["label"].names`) to determine `num_labels` automatically.
- Preprocessing: Tokenizes premise-hypothesis pairs using `tokenizer(examples["premise"], examples["hypothesis"], ...)` with configurable padding and truncation.
- Metrics: Uses the `xnli` metric from the datasets library for evaluation.
- Distant debugging support: Includes optional ptvsd-based remote debugging via `--server_ip` and `--server_port` arguments.
- Case handling: Supports the `--do_lower_case` flag passed to `AutoTokenizer.from_pretrained()`.
- Data collation: Selects between `default_data_collator` (when padding to max length), `DataCollatorWithPadding` with `pad_to_multiple_of=8` (for FP16), or `None` (default behavior).
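The collator selection in the last bullet can be sketched as a small decision function (a sketch for illustration only; it returns descriptive names, whereas the real script passes the chosen collator object to `Trainer`):

```python
def choose_collator(pad_to_max_length: bool, fp16: bool) -> str:
    """Mirror the script's data-collator selection logic."""
    if pad_to_max_length:
        # Samples were already padded to max_seq_length during tokenization,
        # so simple batching suffices.
        return "default_data_collator"
    if fp16:
        # Dynamic padding rounded up to a multiple of 8 for FP16 efficiency.
        return "DataCollatorWithPadding(pad_to_multiple_of=8)"
    # None lets the Trainer fall back to its default dynamic padding.
    return "None"
```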
The script enforces `check_min_version("4.4.0")` and follows the standard checkpoint-resumption pattern.
Usage
Use this script when you need to:
- Evaluate cross-lingual transfer learning on the XNLI benchmark
- Fine-tune multilingual models on natural language inference
- Train in one language and evaluate in another for zero-shot cross-lingual settings
Code Reference
Source Location
| Property | Value |
|---|---|
| File | `examples/NLU/examples/text-classification/run_xnli.py` |
| Lines | 351 |
| Module | `run_xnli` |
| Entry Point | `main()` |
Signature/CLI
python run_xnli.py \
--model_name_or_path MODEL_NAME \
--language LANG_CODE \
--output_dir OUTPUT_DIR \
--do_train \
--do_eval \
[--train_language TRAIN_LANG] \
[--max_seq_length 128] \
[--pad_to_max_length] \
[--per_device_train_batch_size BATCH_SIZE] \
[--learning_rate LR] \
[--num_train_epochs EPOCHS] \
[--max_train_samples N] \
[--max_val_samples N] \
[--do_lower_case]
Import
from transformers import (
AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer,
DataCollatorWithPadding,
EvalPrediction,
HfArgumentParser,
Trainer,
TrainingArguments,
default_data_collator,
set_seed,
)
from datasets import load_dataset, load_metric
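The evaluation hook built from these imports reduces to an argmax-plus-accuracy computation. A sketch of the core logic (the actual script wraps logits and labels in an `EvalPrediction` and delegates to the `xnli` metric, which reports accuracy):

```python
import numpy as np

def compute_metrics(logits: np.ndarray, labels: np.ndarray) -> dict:
    """Accuracy over argmax predictions -- the quantity the xnli metric reports."""
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": float((predictions == labels).mean())}
```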
I/O Contract
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--model_name_or_path` | str | Yes | - | Multilingual pretrained model (e.g., `bert-base-multilingual-cased`) |
| `--language` | str | Yes | - | Evaluation language code (e.g., `en`, `fr`, `de`, `zh`) |
| `--output_dir` | str | Yes | - | Directory for checkpoints and results |
| `--train_language` | str | No | None | Training language (defaults to `--language` if not set) |
| `--max_seq_length` | int | No | 128 | Max tokenized sequence length |
| `--pad_to_max_length` | flag | No | True | Pad all samples to max length |
| `--do_lower_case` | flag | No | False | Lowercase input during tokenization |
| `--max_train_samples` | int | No | None | Truncate training set for debugging |
| `--max_val_samples` | int | No | None | Truncate validation set for debugging |
| `--max_test_samples` | int | No | None | Truncate test set for debugging |
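The `--train_language` fallback described in the table amounts to a one-line selection; a sketch (function name is illustrative, not from the script):

```python
from typing import Optional

def effective_train_language(language: str, train_language: Optional[str] = None) -> str:
    """Return the language used for the training split: --train_language if
    given, otherwise fall back to --language."""
    return train_language if train_language is not None else language
```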
Outputs
| Output | Location | Description |
|---|---|---|
| Trained model | `{output_dir}/` | Saved model, config, and tokenizer |
| Training metrics | `{output_dir}/train_results.json` | Loss, runtime, samples per second |
| Evaluation metrics | `{output_dir}/eval_results.json` | XNLI accuracy for the target language |
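After a run, the JSON outputs above can be read back programmatically. A sketch (the `eval_accuracy` key name assumes the Trainer's usual `eval_` metric prefix and may differ across versions):

```python
import json
from pathlib import Path

def read_eval_accuracy(output_dir: str) -> float:
    """Load the accuracy recorded in eval_results.json.

    Assumes the Trainer's standard "eval_" metric-key prefix; adjust the key
    if your transformers version writes a different name.
    """
    metrics = json.loads((Path(output_dir) / "eval_results.json").read_text())
    return metrics["eval_accuracy"]
```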
Usage Examples
Fine-tune mBERT on English XNLI
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--language en \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--max_seq_length 128 \
--output_dir /tmp/xnli_en
Cross-lingual transfer: train English, evaluate French
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path xlm-roberta-base \
--language fr \
--train_language en \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 5 \
--output_dir /tmp/xnli_en_to_fr
Debug with limited samples
python examples/NLU/examples/text-classification/run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--language de \
--do_train \
--do_eval \
--max_train_samples 100 \
--max_val_samples 50 \
--output_dir /tmp/xnli_debug
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Run_GLUE_No_Trainer - GLUE classification with manual training loop
- Implementation:Microsoft_LoRA_Run_TF_Text_Classification - TensorFlow-based text classification