Implementation:Microsoft LoRA Legacy Finetune Trainer Seq2Seq
Overview
Seq2seq fine-tuning script using Seq2SeqTrainer for summarization and translation tasks with BART, mBART, mT5, and other encoder-decoder models.
Description
finetune_trainer.py is a legacy HuggingFace Transformers seq2seq example script included in the Microsoft LoRA NLU example directory. It uses a custom Seq2SeqTrainer (from a co-located seq2seq_trainer module) and Seq2SeqTrainingArguments to fine-tune encoder-decoder models for conditional text generation tasks such as summarization and translation.
The script uses HfArgumentParser to parse three structured dataclasses: ModelArguments (model path, config, tokenizer, freeze options), DataTrainingArguments (data directory, task type, sequence lengths, beam search parameters, language IDs), and Seq2SeqTrainingArguments (extending standard TrainingArguments). It supports:
- Model freezing: Optional freezing of encoder parameters and/or embedding layers via freeze_embeds() and freeze_params()
- mBART language handling: Automatic decoder_start_token_id configuration for mBART models based on target language
- Task-specific parameters: Applies model config task-specific params via use_task_specific_params()
- Custom metrics: Builds ROUGE (summarization) or BLEU (translation) compute_metrics function via build_compute_metrics_fn()
- JSON config support: Can parse arguments from a JSON file when a single argument is a .json file path
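The JSON config behaviour in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the script's code: DemoArguments and load_arguments are hypothetical names, argparse stands in for HfArgumentParser, and only a small subset of options is modelled.

```python
import argparse
import json
from dataclasses import dataclass, fields


@dataclass
class DemoArguments:
    # Hypothetical subset of the script's options, for illustration only.
    model_name_or_path: str
    data_dir: str
    task: str = "summarization"
    max_source_length: int = 1024


def load_arguments(argv):
    """Build DemoArguments from a lone .json path or from CLI-style flags.

    argv excludes the program name, so a lone config path has length 1.
    """
    if len(argv) == 1 and argv[0].endswith(".json"):
        # Single argument ending in .json: treat it as a config file.
        with open(argv[0], encoding="utf-8") as handle:
            values = json.load(handle)
        known = {f.name for f in fields(DemoArguments)}
        return DemoArguments(**{k: v for k, v in values.items() if k in known})

    # Otherwise parse ordinary --flag value pairs.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name_or_path", required=True)
    parser.add_argument("--data_dir", required=True)
    parser.add_argument("--task", default="summarization")
    parser.add_argument("--max_source_length", type=int, default=1024)
    return DemoArguments(**vars(parser.parse_args(argv)))
```

Calling load_arguments(["args.json"]) reads every field from the file, while load_arguments(["--model_name_or_path", "facebook/bart-large", "--data_dir", "/path/to/data"]) falls back to flag parsing with the defaults shown in the dataclass.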
The pipeline supports train, evaluate, and predict phases, saving all metrics to JSON and optionally writing decoded test predictions to a text file.
This script is part of the HuggingFace Transformers library (legacy examples) bundled in the Microsoft LoRA repository.
⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.
Usage
Use this script to fine-tune BART, mBART, T5, mT5, Pegasus, or other seq2seq models for summarization or translation tasks. It expects data in the standard seq2seq format with .source and .target files for each split (train, val, test).
Code Reference
Source Location
| Property | Value |
|---|---|
| File path | examples/NLU/examples/legacy/seq2seq/finetune_trainer.py |
| Lines | 367 |
| Module | finetune_trainer |
Key Classes and Functions
| Name | Type | Signature / Description |
|---|---|---|
| ModelArguments | dataclass | Fields: model_name_or_path, config_name, tokenizer_name, cache_dir, freeze_encoder, freeze_embeds |
| DataTrainingArguments | dataclass | Fields: data_dir, task, max_source_length, max_target_length, val_max_target_length, test_max_target_length, n_train, n_val, n_test, src_lang, tgt_lang, eval_beams, ignore_pad_token_for_loss |
| handle_metrics | function | handle_metrics(split, metrics, output_dir) -- logs and saves metrics to JSON |
| main | function | Entry point: parses args, builds model/tokenizer/datasets/trainer, runs train/eval/predict |
| _mp_fn | function | TPU spawn entry point |
CLI Usage
    python finetune_trainer.py \
        --model_name_or_path facebook/bart-large \
        --data_dir /path/to/summarization_data \
        --output_dir /path/to/output \
        --do_train \
        --do_eval \
        --task summarization \
        --max_source_length 1024 \
        --max_target_length 128 \
        --per_device_train_batch_size 4
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| --model_name_or_path | str (required) | Pretrained seq2seq model name or path |
| --data_dir | str (required) | Directory containing {split}.source and {split}.target files |
| --task | str (default "summarization") | Task name: summarization, summarization_{dataset}, or translation |
| --max_source_length | int (default 1024) | Maximum source sequence length |
| --max_target_length | int (default 128) | Maximum target sequence length for training |
| --val_max_target_length | int (default 142) | Maximum target length for validation (also used for model.generate max_length) |
| --test_max_target_length | int (default 142) | Maximum target length for test prediction |
| --eval_beams | Optional[int] | Number of beams for evaluation generation (defaults to model.config.num_beams) |
| --freeze_encoder | flag | Freeze all encoder parameters |
| --freeze_embeds | flag | Freeze embedding layers |
| --src_lang | Optional[str] | Source language ID (required for mBART) |
| --tgt_lang | Optional[str] | Target language ID (required for mBART) |
| --n_train | int (default -1) | Number of training examples (-1 for all) |
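The --task value decides which metric family is computed. The sketch below shows one way that dispatch could look, assuming a substring check on the task name so that summarization_{dataset} variants also map to ROUGE. metric_family is a hypothetical helper, not a function from the script, and summarization_xsum is only an example task name.

```python
def metric_family(task: str) -> str:
    """Map a --task value to the metric family described on this page."""
    # Assumption: any task name containing "summarization" is scored with
    # ROUGE; everything else (for example "translation") is scored with BLEU.
    return "rouge" if "summarization" in task else "bleu"


for task in ("summarization", "summarization_xsum", "translation"):
    print(f"{task} -> {metric_family(task)}")
```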
Data Directory Structure
    data_dir/
        train.source    # One source document per line
        train.target    # One target summary/translation per line
        val.source
        val.target
        test.source
        test.target
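Because each line of a .source file pairs with the same line of the matching .target file, it can help to verify the layout before starting a long run. The following is a minimal sketch of such a check; check_data_dir is a hypothetical helper and is not part of the script.

```python
from pathlib import Path


def check_data_dir(data_dir, splits=("train", "val", "test")):
    """Return a list of problems found in a seq2seq data directory."""
    problems = []
    root = Path(data_dir)
    for split in splits:
        source = root / f"{split}.source"
        target = root / f"{split}.target"
        missing = [p.name for p in (source, target) if not p.is_file()]
        if missing:
            problems.append(f"missing: {', '.join(missing)}")
            continue
        # One example per line, so both files need the same line count.
        with source.open(encoding="utf-8") as handle:
            n_source = sum(1 for _ in handle)
        with target.open(encoding="utf-8") as handle:
            n_target = sum(1 for _ in handle)
        if n_source != n_target:
            problems.append(
                f"{split}: {n_source} source lines vs {n_target} target lines"
            )
    return problems
```

An empty list means every requested split has both files with matching line counts.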
Outputs
| Output | Type | Description |
|---|---|---|
| train_results.json | JSON | Training metrics |
| val_results.json | JSON | Validation metrics (loss, ROUGE or BLEU) |
| test_results.json | JSON | Test metrics |
| all_results.json | JSON | Combined metrics from all phases |
| test_generations.txt | text file | Decoded test predictions (when predict_with_generate is enabled) |
| trainer_state.json | JSON | Trainer state for resuming |
| Saved model | directory | Model, tokenizer, and config saved to output_dir |
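The per-split result files can be pictured with the sketch below. It approximates what handle_metrics(split, metrics, output_dir) is described as doing, using only the standard library. save_split_metrics is a hypothetical name, and the exact JSON formatting of the real script may differ.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def save_split_metrics(split, metrics, output_dir):
    """Log metrics for one split and write them to {split}_results.json."""
    logger.info("***** %s metrics *****", split)
    for key in sorted(metrics):
        logger.info("  %s = %s", key, metrics[key])
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{split}_results.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metrics, handle, indent=2, sort_keys=True)
    return path
```

Calling it once per phase with split set to train, val, or test yields the three per-split files listed in the table.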
Usage Examples
Summarization with BART
    python finetune_trainer.py \
        --model_name_or_path facebook/bart-large-cnn \
        --data_dir /data/cnn_dm/ \
        --output_dir /output/bart_summarization/ \
        --do_train \
        --do_eval \
        --do_predict \
        --task summarization \
        --max_source_length 1024 \
        --max_target_length 142 \
        --val_max_target_length 142 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --predict_with_generate \
        --eval_beams 4 \
        --overwrite_output_dir
Translation with mBART
    python finetune_trainer.py \
        --model_name_or_path facebook/mbart-large-cc25 \
        --data_dir /data/en_de_translation/ \
        --output_dir /output/mbart_translation/ \
        --do_train \
        --do_eval \
        --task translation \
        --src_lang en_XX \
        --tgt_lang de_DE \
        --max_source_length 512 \
        --max_target_length 128 \
        --per_device_train_batch_size 8 \
        --predict_with_generate \
        --overwrite_output_dir
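For mBART, the target language determines the token the decoder starts from. The sketch below illustrates that selection under two assumptions: the value is only filled in when the model config does not already define one, and the tokenizer exposes a mapping from language code to token ID. resolve_decoder_start_token_id is a hypothetical helper, and the IDs in demo_codes are placeholders rather than real vocabulary entries.

```python
def resolve_decoder_start_token_id(current_id, lang_code_to_id, tgt_lang):
    """Pick the decoder start token for an mBART-style model."""
    if current_id is not None:
        # The model config already specifies a start token; leave it alone.
        return current_id
    if tgt_lang is None:
        raise ValueError("mBART-style models need a target language (--tgt_lang)")
    return lang_code_to_id[tgt_lang]


# Placeholder IDs for illustration; real values come from the tokenizer.
demo_codes = {"en_XX": 1001, "de_DE": 1002}
print(resolve_decoder_start_token_id(None, demo_codes, "de_DE"))
```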
Training with Frozen Embeddings
    python finetune_trainer.py \
        --model_name_or_path facebook/bart-large \
        --data_dir /data/summarization/ \
        --output_dir /output/bart_frozen/ \
        --do_train \
        --do_eval \
        --task summarization \
        --freeze_embeds \
        --max_source_length 1024 \
        --max_target_length 128 \
        --overwrite_output_dir