
Implementation:Microsoft LoRA Run XNLI

From Leeroopedia



Overview

run_xnli.py fine-tunes models for multilingual natural language inference on the XNLI dataset, using AutoModelForSequenceClassification with the HuggingFace Trainer API.

Description

This script fine-tunes multilingual transformer models (e.g., BERT multilingual, XLM-RoBERTa, DistilBERT multilingual) on the XNLI (Cross-lingual Natural Language Inference) benchmark. XNLI provides premise-hypothesis pairs labeled as entailment, contradiction, or neutral across 15 languages.

Key implementation details:

  • Language configuration: Uses --language for evaluation language and optionally --train_language if training should use a different language (e.g., train on English, evaluate on French for cross-lingual transfer).
  • Dataset loading: Loads directly from the HuggingFace Hub via load_dataset("xnli", language). There is no custom-file support; the script works exclusively with the XNLI dataset.
  • Label handling: Extracts label names from dataset features (train_dataset.features["label"].names) to determine num_labels automatically.
  • Preprocessing: Tokenizes premise-hypothesis pairs using tokenizer(examples["premise"], examples["hypothesis"], ...) with configurable padding and truncation.
  • Metrics: Uses the xnli metric from the datasets library for evaluation.
  • Distant debugging support: Includes optional ptvsd-based remote debugging via --server_ip and --server_port arguments.
  • Case handling: Supports --do_lower_case flag passed to AutoTokenizer.from_pretrained().
  • Data collation: Selects between default_data_collator (when padding to max length), DataCollatorWithPadding with pad_to_multiple_of=8 (for FP16), or None (default behavior).
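
The three-way data-collator choice in the last bullet can be sketched as plain decision logic. The names default_data_collator and DataCollatorWithPadding refer to the transformers objects; this minimal sketch only models the selection, returning a label for which collator would be chosen.

```python
# Sketch of the data-collator selection described above (not the actual
# script code): padding-to-max-length uses the default collator, FP16
# training pads dynamically to a multiple of 8, otherwise the Trainer's
# own default behavior applies.
def select_collator(pad_to_max_length: bool, fp16: bool) -> str:
    if pad_to_max_length:
        # samples were already padded to max_seq_length at tokenization time
        return "default_data_collator"
    if fp16:
        # dynamic padding rounded up to a multiple of 8 (tensor-core friendly)
        return "DataCollatorWithPadding(pad_to_multiple_of=8)"
    # None -> Trainer falls back to its default dynamic padding
    return "None"
```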

The script enforces check_min_version("4.4.0") and follows the standard checkpoint resumption pattern.

Usage

Use this script when you need to:

  • Evaluate cross-lingual transfer learning on the XNLI benchmark
  • Fine-tune multilingual models on natural language inference
  • Train in one language and evaluate in another for zero-shot cross-lingual settings

Code Reference

Source Location

Property     Value
File         examples/NLU/examples/text-classification/run_xnli.py
Lines        351
Module       run_xnli
Entry Point  main()

Signature/CLI

python run_xnli.py \
    --model_name_or_path MODEL_NAME \
    --language LANG_CODE \
    --output_dir OUTPUT_DIR \
    --do_train \
    --do_eval \
    [--train_language TRAIN_LANG] \
    [--max_seq_length 128] \
    [--pad_to_max_length] \
    [--per_device_train_batch_size BATCH_SIZE] \
    [--learning_rate LR] \
    [--num_train_epochs EPOCHS] \
    [--max_train_samples N] \
    [--max_val_samples N] \
    [--do_lower_case]

Import

from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    EvalPrediction,
    HfArgumentParser,
    Trainer,
    TrainingArguments,
    default_data_collator,
    set_seed,
)
from datasets import load_dataset, load_metric
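
EvalPrediction in the import list is the object handed to the Trainer's compute_metrics hook: logits plus gold labels. The real script delegates to load_metric("xnli"), which reports plain accuracy; the sketch below shows the same computation in pure Python (argmax over logits, then compare with labels) so the shape of the hook is clear.

```python
# Minimal sketch of an XNLI-style compute_metrics: argmax each row of
# logits to get the predicted class, then report accuracy against the
# gold labels. The real script gets this from load_metric("xnli").
def compute_accuracy(logits, labels):
    preds = [row.index(max(row)) for row in logits]  # argmax per example
    correct = sum(p == y for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}
```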

I/O Contract

Inputs

  • --model_name_or_path (str, required): Multilingual pretrained model (e.g., bert-base-multilingual-cased)
  • --language (str, required): Evaluation language code (e.g., en, fr, de, zh)
  • --output_dir (str, required): Directory for checkpoints and results
  • --train_language (str, default None): Training language (defaults to --language if not set)
  • --max_seq_length (int, default 128): Max tokenized sequence length
  • --pad_to_max_length (flag, default True): Pad all samples to max length
  • --do_lower_case (flag, default False): Lowercase input during tokenization
  • --max_train_samples (int, default None): Truncate training set for debugging
  • --max_val_samples (int, default None): Truncate validation set for debugging
  • --max_test_samples (int, default None): Truncate test set for debugging

Outputs

  • Trained model ({output_dir}/): Saved model, config, and tokenizer
  • Training metrics ({output_dir}/train_results.json): Loss, runtime, samples per second
  • Evaluation metrics ({output_dir}/eval_results.json): XNLI accuracy for the target language
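
The metrics files above are plain JSON, so downstream tooling can read them directly. A minimal sketch (the key names inside the file, such as "eval_accuracy", follow Trainer conventions and are an assumption here):

```python
import json
import os

# Hedged sketch: load the evaluation metrics the Trainer writes to
# {output_dir}/eval_results.json and return them as a dict.
def read_eval_results(output_dir: str) -> dict:
    with open(os.path.join(output_dir, "eval_results.json")) as f:
        return json.load(f)
```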

Usage Examples

Fine-tune mBERT on English XNLI

python examples/NLU/examples/text-classification/run_xnli.py \
    --model_name_or_path bert-base-multilingual-cased \
    --language en \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 3 \
    --max_seq_length 128 \
    --output_dir /tmp/xnli_en

Cross-lingual transfer: train English, evaluate French

python examples/NLU/examples/text-classification/run_xnli.py \
    --model_name_or_path xlm-roberta-base \
    --language fr \
    --train_language en \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 5 \
    --output_dir /tmp/xnli_en_to_fr

Debug with limited samples

python examples/NLU/examples/text-classification/run_xnli.py \
    --model_name_or_path bert-base-multilingual-cased \
    --language de \
    --do_train \
    --do_eval \
    --max_train_samples 100 \
    --max_val_samples 50 \
    --output_dir /tmp/xnli_debug
