Implementation:Microsoft LoRA Legacy Run NER
Overview
Fine-tuning script for named entity recognition (NER) on CoNLL-2003 formatted datasets using AutoModelForTokenClassification with seqeval-based evaluation metrics.
Description
run_ner.py is a legacy HuggingFace Transformers example script included in the Microsoft LoRA NLU example directory. It fine-tunes any model compatible with AutoModelForTokenClassification for token-level classification tasks such as named entity recognition (NER) and part-of-speech (POS) tagging on CoNLL-2003 formatted data.
The script uses HfArgumentParser with three dataclasses: ModelArguments (model path, config, tokenizer, task type), DataTrainingArguments (data directory, labels file, max sequence length), and the built-in TrainingArguments. Task types are dynamically loaded from a tasks module via importlib, allowing extensibility to custom token classification tasks beyond NER.
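The dynamic task lookup described above can be sketched as follows. This is a minimal illustration rather than the script's own code: the module name demo_tasks, the stub TokenClassificationTask base class with its get_labels method, and the load_task helper are all assumptions introduced so the example runs on its own. The script itself resolves the class from a module named tasks.

```python
import sys
import types
from importlib import import_module


class TokenClassificationTask:
    """Hypothetical minimal base class standing in for the script's task interface."""

    def get_labels(self):
        raise NotImplementedError


class NER(TokenClassificationTask):
    def get_labels(self):
        return ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]


# Register a stand-in module named "demo_tasks" so the example is self-contained.
# The real script imports a module called "tasks" from its own directory.
demo_tasks = types.ModuleType("demo_tasks")
demo_tasks.NER = NER
sys.modules["demo_tasks"] = demo_tasks


def load_task(task_type, module_name="demo_tasks"):
    """Resolve a task class by name and return an instance of it."""
    module = import_module(module_name)
    try:
        task_class = getattr(module, task_type)
    except AttributeError:
        raise ValueError(f"Task {task_type!r} not found in module {module_name!r}")
    return task_class()


task = load_task("NER")
print(type(task).__name__, task.get_labels())
```

Because the class is looked up by name at run time, supporting a new task only requires adding a class to the module and passing its name through --task_type.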
Evaluation uses the seqeval library to compute entity-level precision, recall, and F1-score, plus token-level accuracy. The align_predictions function maps predicted and true label indices back to string labels, skipping every position whose label id equals CrossEntropyLoss().ignore_index (-100 by default), which is how padding tokens are marked. The script supports train, evaluate, and predict phases; predictions are written to test_predictions.txt in the original CoNLL format.
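The alignment step can be sketched in plain Python as below. This is an illustrative stand-in, not the script's function: the script operates on NumPy arrays, whereas this version uses nested lists, takes label_map as an explicit argument, and hard-codes IGNORE_INDEX to -100 (the default ignore_index of PyTorch's CrossEntropyLoss) to avoid external dependencies.

```python
IGNORE_INDEX = -100  # default value of torch.nn.CrossEntropyLoss().ignore_index


def align_predictions(logits, label_ids, label_map):
    """Convert per-token scores and label ids into lists of label strings.

    logits:    [batch][seq_len][num_labels] nested lists of scores
    label_ids: [batch][seq_len] gold label ids, IGNORE_INDEX where masked
    label_map: dict mapping label id -> label string
    """
    preds_list, gold_list = [], []
    for sent_logits, sent_labels in zip(logits, label_ids):
        sent_preds, sent_gold = [], []
        for scores, gold_id in zip(sent_logits, sent_labels):
            if gold_id == IGNORE_INDEX:
                continue  # skip masked positions such as padding
            pred_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
            sent_preds.append(label_map[pred_id])
            sent_gold.append(label_map[gold_id])
        preds_list.append(sent_preds)
        gold_list.append(sent_gold)
    return preds_list, gold_list


label_map = {0: "O", 1: "B-PER", 2: "I-PER"}
logits = [[[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [3.0, 0.0, 0.0], [9.0, 0.0, 0.0]]]
label_ids = [[1, 2, 0, IGNORE_INDEX]]
preds, gold = align_predictions(logits, label_ids, label_map)
print(preds)  # [['B-PER', 'I-PER', 'O']]
print(gold)   # [['B-PER', 'I-PER', 'O']]
```

The fourth position carries the ignore index, so it is dropped from both lists and the two outputs stay the same length per sentence, which is what seqeval expects.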
The script also supports JSON configuration files as an alternative to command-line arguments, and includes a TPU spawn entry point for distributed training on TPU pods.
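The choice between a JSON file and command-line flags can be sketched with the standard library alone. This is a simplified assumption-laden illustration: the Args dataclass (holding only three of the script's options), parse_cli, and load_args are hypothetical names, and the script itself delegates this work to HfArgumentParser rather than argparse.

```python
import argparse
import json
import os
import tempfile
from dataclasses import MISSING, dataclass, fields


@dataclass
class Args:
    # Only three of the script's options, for illustration.
    model_name_or_path: str
    data_dir: str
    max_seq_length: int = 128


def parse_cli(argv):
    """Build an argparse parser from the dataclass fields and parse argv."""
    parser = argparse.ArgumentParser()
    for f in fields(Args):
        required = f.default is MISSING
        parser.add_argument(
            f"--{f.name}",
            type=f.type,
            required=required,
            default=None if required else f.default,
        )
    return Args(**vars(parser.parse_args(argv)))


def load_args(argv):
    """argv excludes the program name, i.e. it corresponds to sys.argv[1:]."""
    if len(argv) == 1 and argv[0].endswith(".json"):
        with open(argv[0], encoding="utf-8") as handle:
            return Args(**json.load(handle))
    return parse_cli(argv)


with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(
            {"model_name_or_path": "bert-base-cased", "data_dir": "/data/conll2003/"},
            handle,
        )
    from_json = load_args([config_path])

from_cli = load_args(
    ["--model_name_or_path", "bert-base-cased", "--data_dir", "/data/conll2003/"]
)
print(from_json == from_cli)  # True
```

Both paths fill the same dataclass, so defaults such as max_seq_length apply regardless of how the options were supplied.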
This script is part of the HuggingFace Transformers library (legacy examples) bundled in the Microsoft LoRA repository.
⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.
Usage
Use this script to fine-tune a pretrained transformer model for named entity recognition on CoNLL-2003 or similarly formatted datasets. It supports any token classification task that can be defined as a TokenClassificationTask subclass.
Code Reference
Source Location
| Property | Value |
|---|---|
| File path | examples/NLU/examples/legacy/token-classification/run_ner.py |
| Lines | 321 |
| Module | run_ner |
Key Classes and Functions
| Name | Type | Signature / Description |
|---|---|---|
| ModelArguments | dataclass | Fields: model_name_or_path, config_name, task_type (default "NER"), tokenizer_name, use_fast, cache_dir |
| DataTrainingArguments | dataclass | Fields: data_dir, labels, max_seq_length (default 128), overwrite_cache |
| align_predictions | function (nested) | align_predictions(predictions: np.ndarray, label_ids: np.ndarray) -> Tuple[List[int], List[int]] -- maps predictions and labels to string label names, excluding padding (the annotation says List[int], although the returned lists hold label strings) |
| compute_metrics | function (nested) | compute_metrics(p: EvalPrediction) -> Dict -- computes seqeval accuracy, precision, recall, F1 |
| main | function | Entry point: parses args, loads task class, builds model/tokenizer/datasets/trainer, runs train/eval/predict |
| _mp_fn | function | TPU spawn entry point |
CLI Usage
python run_ner.py \
    --model_name_or_path bert-base-cased \
    --data_dir /path/to/conll2003 \
    --output_dir /path/to/output \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --num_train_epochs 3
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| --model_name_or_path | str (required) | Pretrained model name or path |
| --data_dir | str (required) | Directory with CoNLL-2003 formatted train.txt, dev.txt, test.txt |
| --labels | Optional[str] | Path to labels file; if not specified, CoNLL-2003 labels are used |
| --task_type | str (default "NER") | Task class name to import from tasks module (e.g., "NER", "POS") |
| --max_seq_length | int (default 128) | Maximum input sequence length after tokenization |
| --use_fast | flag | Use fast tokenizer |
| --overwrite_cache | flag | Overwrite cached dataset files |
| Standard TrainingArguments | various | --output_dir, --do_train, --do_eval, --do_predict, --per_device_train_batch_size, --num_train_epochs, --fp16, etc. |
Data Format (CoNLL-2003)
The input text files use CoNLL column format with blank lines separating sentences:
EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O

Peter B-PER
Blackburn I-PER
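Reading this format amounts to grouping lines into sentences at each blank line. The sketch below is an illustration under stated assumptions, not the script's reader: read_conll is a hypothetical helper, the label is assumed to sit in the last whitespace-separated column, a missing label is assumed to default to "O", and -DOCSTART- lines are assumed to act as separators.

```python
def read_conll(lines):
    """Group 'token label' lines into (tokens, labels) sentences split on blank lines."""
    sentences, tokens, labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:  # a separator closes the current sentence
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])
        # Assumption: label is the last column; default to "O" if absent.
        labels.append(parts[-1] if len(parts) > 1 else "O")
    if tokens:  # flush the final sentence when the file lacks a trailing blank line
        sentences.append((tokens, labels))
    return sentences


sample = (
    "EU B-ORG\nrejects O\nGerman B-MISC\ncall O\nto O\nboycott O\n"
    "British B-MISC\nlamb O\n. O\n\nPeter B-PER\nBlackburn I-PER\n"
)
sentences = read_conll(sample.splitlines())
for tokens, labels in sentences:
    print(list(zip(tokens, labels)))
```

On the sample above this yields two sentences, the first with nine tokens and the second with two.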
Outputs
| Output | Type | Description |
|---|---|---|
| Saved model | directory | Model, tokenizer, and config saved to output_dir |
| eval_results.txt | text file | Evaluation metrics (accuracy, precision, recall, F1) |
| test_results.txt | text file | Test set metrics |
| test_predictions.txt | text file | Per-token NER predictions in CoNLL format |
| Return value | Dict[str, float] | Dictionary with evaluation metrics |
Evaluation Metrics
| Metric | Source | Description |
|---|---|---|
| accuracy_score | seqeval | Token-level accuracy |
| precision | seqeval | Entity-level precision (strict matching) |
| recall | seqeval | Entity-level recall (strict matching) |
| f1 | seqeval | Entity-level F1 score (strict matching) |
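To make the difference between entity-level and token-level scoring concrete, the following sketch computes entity-level precision, recall, and F1 by exact span matching over BIO tags. It is a simplified stand-in for seqeval, not its implementation: extract_entities and entity_scores are hypothetical helpers, and edge cases (for example an I- tag with no preceding B- tag, treated here as starting a new entity) may be handled differently by seqeval.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO-tagged sentence."""
    entities, start, ent_type = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if start is not None and not continues:
            entities.add((ent_type, start, i - 1))  # close the open span
            start, ent_type = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, ent_type = i, label  # open a new span
    return entities


def entity_scores(gold_sentences, pred_sentences):
    """An entity counts as correct only if its type and both boundaries match."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_ents, pred_ents = extract_entities(gold), extract_entities(pred)
        tp += len(gold_ents & pred_ents)
        n_pred += len(pred_ents)
        n_gold += len(gold_ents)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [["B-ORG", "O", "B-MISC", "O"], ["B-PER", "I-PER"]]
pred = [["B-ORG", "O", "O", "O"], ["B-PER", "O"]]
print(entity_scores(gold, pred))
```

In this example four of six tokens are tagged correctly, yet only one of three gold entities is matched: the truncated PER span earns no credit under exact matching, which is why entity-level F1 is usually lower than token accuracy.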
Usage Examples
Fine-tuning BERT for NER on CoNLL-2003
python run_ner.py \
    --model_name_or_path bert-base-cased \
    --data_dir /data/conll2003/ \
    --labels /data/conll2003/labels.txt \
    --output_dir /output/bert_ner/ \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 3 \
    --learning_rate 5e-5 \
    --overwrite_output_dir
NER with JSON Configuration
# config.json:
# {
# "model_name_or_path": "bert-base-cased",
# "data_dir": "/data/conll2003/",
# "output_dir": "/output/bert_ner/",
# "do_train": true,
# "do_eval": true,
# "max_seq_length": 128,
# "per_device_train_batch_size": 16,
# "num_train_epochs": 3
# }
python run_ner.py config.json
Fine-tuning for POS Tagging
python run_ner.py \
    --model_name_or_path bert-base-cased \
    --task_type POS \
    --data_dir /data/pos_tagging/ \
    --labels /data/pos_tagging/labels.txt \
    --output_dir /output/bert_pos/ \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --num_train_epochs 5 \
    --overwrite_output_dir