Environment:Microsoft LoRA NLG Eval External Tools
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, NLG_Evaluation |
| Last Updated | 2026-02-10 05:30 GMT |
Overview
External tool dependencies (Perl, Java, and Python packages) required to compute the NLG metrics BLEU, METEOR, chrF++, TER, BERTScore, and BLEURT.
Description
The NLG evaluation pipeline depends on external tools in addition to standard Python packages. BLEU is computed by a Perl script (`multi-bleu-detok.perl`), METEOR requires Java 1.8+ and the METEOR JAR, and the remaining metrics use Python packages (`pyter` for TER, `bert_score` for BERTScore, `nltk` for tokenization). The `download_evalscript.sh` script clones the WebNLG GenerationEval and e2e-metrics repositories, which supply the evaluation scripts, including the BLEU Perl script and the METEOR JAR. The Python packages are installed separately with `pip`.
Usage
Use this environment when running the NLG Evaluation Metrics workflow step. It is required by the `eval.py` script, which computes BLEU, METEOR, chrF++, TER, BERTScore, and BLEURT on text generated by GPT-2 LoRA models.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | `perl` and `java` must be on the `PATH`; `eval.py` invokes both by name |
| Runtime | Perl 5+ | Required for the `multi-bleu-detok.perl` BLEU script |
| Runtime | Java 1.8+ | Required for the METEOR JAR (`meteor-1.5.jar`); launched with `-Xmx2G` (2 GB maximum heap) |
| Disk | ~500 MB | For the cloned evaluation repos and the METEOR JAR |
Dependencies
System Packages
- `perl` (installed and on the `PATH`)
- `java` >= 1.8 (installed and on the `PATH`; METEOR is launched with a 2 GB maximum heap)
- `git` (for cloning evaluation repos)
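A quick way to confirm that the three system tools are reachable is to look them up on the `PATH` before running the evaluation. The following is a minimal sketch, not part of `eval.py`; the function name `find_missing_tools` is hypothetical.

```python
import shutil

# Executables from the list above that are called by bare name.
REQUIRED_TOOLS = ("perl", "java", "git")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be resolved on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("perl, java and git are all on PATH")
```

This only checks that the executables exist. It does not verify the Perl or Java version.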
Python Packages
- `pyter` (TER computation)
- `bert_score` (BERTScore computation)
- `nltk` (tokenization)
- `razdel` (Russian tokenization)
- `tabulate` (result formatting)
- `codecs` (standard library, no installation needed)
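The packages above can be checked without importing them, which avoids triggering slow imports or model downloads. This is a sketch under the assumption that each package's import name matches the name listed here; `find_missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names taken from the package list above; codecs ships with Python.
REQUIRED_MODULES = ("pyter", "bert_score", "nltk", "razdel", "tabulate")


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    print("Missing modules: " + (", ".join(missing) if missing else "none"))
```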
External Repositories
- `https://github.com/tuetschek/e2e-metrics` (e2e evaluation scripts)
- `https://github.com/WebNLG/GenerationEval` (WebNLG/DART evaluation with BLEU Perl script and METEOR JAR)
Credentials
No credentials required.
Quick Install
# Install Python evaluation packages
pip install pyter bert_score nltk razdel tabulate
# Run the evaluation setup script
cd examples/NLG
bash eval/download_evalscript.sh
Code Evidence
Perl BLEU dependency from `examples/NLG/eval/eval.py:59`:
BLEU_PATH = 'metrics/multi-bleu-detok.perl'
Java METEOR dependency from `examples/NLG/eval/eval.py:60`:
METEOR_PATH = 'metrics/meteor-1.5/meteor-1.5.jar'
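Because both paths are relative, a missing file can look the same as a missing interpreter at run time. The sketch below checks that the two files exist. The `base_dir` parameter is an assumption standing in for whatever directory the relative paths resolve against, and `find_missing_assets` is a hypothetical name.

```python
from pathlib import Path

# Relative paths copied from the two constants quoted above.
BLEU_PATH = "metrics/multi-bleu-detok.perl"
METEOR_PATH = "metrics/meteor-1.5/meteor-1.5.jar"


def find_missing_assets(base_dir="."):
    """Return the metric files that do not exist under base_dir.

    base_dir is an assumption: the directory the relative paths resolve against.
    """
    base = Path(base_dir)
    return [rel for rel in (BLEU_PATH, METEOR_PATH) if not (base / rel).is_file()]
```

An empty list means both files are in place. Otherwise, rerunning the setup script from Quick Install is the likely fix.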
Perl invocation from `examples/NLG/eval/eval.py:112`:
command = 'perl {0} {1} < {2}'.format(BLEU_PATH, ' '.join(ref_files), hyps_path)
result = subprocess.check_output(command, shell=True)
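The quoted call builds a shell string and relies on the shell's `<` redirection. For comparison, here is a sketch of an equivalent call that passes an argument list and feeds the hypotheses file through `stdin`, which avoids quoting problems when paths contain spaces. This is an alternative formulation, not what `eval.py` does; `build_bleu_command` and `run_bleu` are hypothetical names.

```python
import subprocess

BLEU_PATH = "metrics/multi-bleu-detok.perl"  # same value as the constant quoted earlier


def build_bleu_command(ref_files, bleu_path=BLEU_PATH):
    """Return the argv list equivalent to 'perl <script> <ref files...>'."""
    return ["perl", bleu_path, *ref_files]


def run_bleu(ref_files, hyps_path, bleu_path=BLEU_PATH):
    """Run the BLEU script with the hypotheses on stdin and return its raw output."""
    with open(hyps_path, "rb") as hyps:
        # stdin=hyps replaces the shell's '< hyps_path' redirection.
        return subprocess.check_output(build_bleu_command(ref_files, bleu_path), stdin=hyps)
```

With this form, a missing `perl` executable raises `FileNotFoundError`, while a failing script raises `subprocess.CalledProcessError`, so the two cases can be told apart.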
Java invocation from `examples/NLG/eval/eval.py:155-156`:
command = 'java -Xmx2G -jar {0} '.format(METEOR_PATH)
command += '{0} {1} -l {2} -norm -r {3}'.format(hyps_tmp, refs_tmp, lng, num_refs)
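To make the assembled command easier to inspect, the same arguments can be written as a list. This is only a sketch: `build_meteor_command` and its `heap` parameter are hypothetical, and the flags are copied from the quoted string without change.

```python
METEOR_PATH = "metrics/meteor-1.5/meteor-1.5.jar"  # same value as the constant quoted earlier


def build_meteor_command(hyps_tmp, refs_tmp, lng, num_refs,
                         meteor_path=METEOR_PATH, heap="2G"):
    """Return the argv list matching the quoted java command string."""
    return [
        "java", f"-Xmx{heap}", "-jar", meteor_path,
        hyps_tmp, refs_tmp,
        "-l", lng,
        "-norm",
        "-r", str(num_refs),
    ]


if __name__ == "__main__":
    # Joining with spaces reproduces the string built in the snippet above.
    print(" ".join(build_meteor_command("hyps.txt", "refs.txt", "en", 3)))
```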
Error message for missing Perl from `examples/NLG/eval/eval.py:117-118` (the text says METEOR, although it concerns the Perl BLEU script):
logging.error('ERROR ON COMPUTING METEOR. MAKE SURE YOU HAVE PERL INSTALLED GLOBALLY ON YOUR MACHINE.')
Error message for missing Java from `examples/NLG/eval/eval.py:160-161`:
logging.error('ERROR ON COMPUTING METEOR. MAKE SURE YOU HAVE JAVA INSTALLED GLOBALLY ON YOUR MACHINE.')
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ERROR ON COMPUTING METEOR. MAKE SURE YOU HAVE PERL INSTALLED GLOBALLY ON YOUR MACHINE.` | BLEU step failed: Perl not installed or BLEU script missing (the message says METEOR but refers to the Perl BLEU step) | Install Perl and run `bash eval/download_evalscript.sh` from `examples/NLG` |
| `ERROR ON COMPUTING METEOR. MAKE SURE YOU HAVE JAVA INSTALLED GLOBALLY ON YOUR MACHINE.` | METEOR step failed: Java not installed or METEOR JAR missing | Install Java 1.8+ and run `bash eval/download_evalscript.sh` from `examples/NLG` |
| `ModuleNotFoundError: No module named 'pyter'` | pyter package not installed | `pip install pyter` |
| `ModuleNotFoundError: No module named 'bert_score'` | bert_score package not installed | `pip install bert_score` |
Compatibility Notes
- Russian evaluation: Uses `razdel` tokenizer instead of NLTK when `--language ru` is specified.
- BLEURT: Only available for English (`lng == 'en'`). Requires a BLEURT checkpoint at `metrics/bleurt/bleurt-base-128`.
- BERTScore: Falls back to 0 for precision, recall, and F1 on failure (e.g., if the model download fails), so an all-zero BERTScore may indicate a failed run rather than a poor score.
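The BERTScore fallback described above can be illustrated with a small wrapper. This is a sketch of the pattern, not the code in `eval.py`: `safe_bert_score` is a hypothetical name, and `scorer` is an assumed callable that takes hypotheses and references and returns three numbers.

```python
def safe_bert_score(hyps, refs, scorer):
    """Return (precision, recall, f1), or zeros if the scorer raises.

    scorer is a hypothetical callable taking (hyps, refs) and returning three numbers.
    """
    try:
        precision, recall, f1 = scorer(hyps, refs)
        return float(precision), float(recall), float(f1)
    except Exception as exc:  # broad on purpose: import, download and runtime errors
        print(f"BERTScore failed, reporting zeros: {exc}")
        return 0.0, 0.0, 0.0
```

Printing the exception makes a failed run distinguishable from a genuinely low score in the logs.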