Implementation:Microsoft LoRA Run NER

Overview

run_ner.py is a fine-tuning script for Named Entity Recognition (NER) and other token classification tasks. It is built on AutoModelForTokenClassification, DataCollatorForTokenClassification, and the seqeval metric.

Description

This script fine-tunes transformer models on token-level classification tasks such as NER, part-of-speech tagging, and chunking. Its main complication is subword alignment: the datasets carry one label per word, while the tokenizer may split a word into several subword tokens, so each word-level label has to be mapped onto the tokens produced for that word.

Key implementation details:

Fast tokenizer requirement: The tokenizer must be a PreTrainedTokenizerFast, because the script relies on word_ids() to map each subword token back to its original word when assigning labels.
Label discovery: Supports two modes:
- If the label column uses ClassLabel from the datasets library, extracts label names directly from features[label_column_name].feature.names.
- Otherwise, iterates over the training data to discover unique labels, sorts them, and builds a label_to_id mapping.
Subword label alignment: The tokenize_and_align_labels() function:
- Uses is_split_into_words=True since inputs are pre-tokenized word lists.
- Assigns -100 (ignored in loss) to special tokens (word_id is None).
- For the first subword token of each word, assigns the corresponding label.
- For subsequent subword tokens of the same word, assigns either the label (if label_all_tokens=True) or -100.
Column detection: Looks for a tokens column for the text and a {task_name}_tags column for the labels (e.g., ner_tags, pos_tags).
Data collation: Uses DataCollatorForTokenClassification for dynamic padding of variable-length token sequences.
Metrics: Uses the seqeval metric for entity-level evaluation. Supports two reporting modes:
- Default: Overall precision, recall, F1, and accuracy.
- Entity-level (--return_entity_level_metrics): Per-entity-type metrics unpacked from nested dictionaries.
Three-phase pipeline: Supports do_train, do_eval, and do_predict. Test predictions are saved as space-separated label sequences in test_predictions.txt.
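
The subword label alignment rules above can be sketched without loading a tokenizer by working directly on a word_ids list. This is a minimal illustration rather than the script's own code: the helper name align_labels, the label ids, and the example word_ids are hypothetical, and the sample only mimics what a fast tokenizer might return for a two-word input wrapped in special tokens.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_tokens: bool = False,
) -> List[int]:
    """Map word-level label ids onto subword tokens.

    word_ids mimics the output of a fast tokenizer's word_ids():
    None for special tokens, otherwise the index of the source word.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: never contributes to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First subword of a word: gets the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subword of the same word: label or ignore, per the flag.
            aligned.append(word_labels[word_id] if label_all_tokens else IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Hypothetical tokenization: [CLS] Wash ##ington visited [SEP]
# Hypothetical label ids: 1 = B-LOC, 0 = O
example_word_ids = [None, 0, 0, 1, None]
print(align_labels(example_word_ids, [1, 0]))
print(align_labels(example_word_ids, [1, 0], label_all_tokens=True))
```

With the default setting only the first subword of "Washington" carries the label; with label_all_tokens=True the continuation subword carries it as well.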

Usage

Use this script when you need to:

Fine-tune models on NER, POS tagging, or chunking tasks
Handle subword-to-word label alignment for token classification
Train on standard datasets (CoNLL-2003, etc.) or custom CSV/JSON token-level data

Code Reference

Source Location

Property	Value
File	`examples/NLU/examples/token-classification/run_ner.py`
Lines	501
Module	`run_ner`
Entry Point	`main()`

Signature/CLI

python run_ner.py \
    --model_name_or_path MODEL_NAME \
    --dataset_name DATASET_NAME \
    --output_dir OUTPUT_DIR \
    --do_train \
    --do_eval \
    [--do_predict] \
    [--dataset_config_name CONFIG] \
    [--task_name ner] \
    [--train_file TRAIN_FILE] \
    [--validation_file VALIDATION_FILE] \
    [--test_file TEST_FILE] \
    [--pad_to_max_length] \
    [--label_all_tokens] \
    [--return_entity_level_metrics] \
    [--max_train_samples N] \
    [--max_val_samples N] \
    [--max_test_samples N]

Import

from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
    set_seed,
)
from datasets import ClassLabel, load_dataset, load_metric

I/O Contract

Inputs

Parameter	Type	Required	Default	Description
`--model_name_or_path`	str	Yes	-	Pretrained model name or path
`--output_dir`	str	Yes	-	Directory for checkpoints and results
`--dataset_name`	str	No	None	HuggingFace dataset name (e.g., `conll2003`)
`--task_name`	str	No	ner	Task name used for label column detection (`{task}_tags`)
`--train_file`	str	No	None	Custom CSV/JSON training file
`--validation_file`	str	No	None	Custom CSV/JSON validation file
`--test_file`	str	No	None	Custom CSV/JSON test file
`--pad_to_max_length`	flag	No	False	Pad every sample to the model max length instead of padding dynamically per batch (recommended on TPU)
`--label_all_tokens`	flag	No	False	Label all subword tokens (not just first)
`--return_entity_level_metrics`	flag	No	False	Report per-entity-type metrics
`--max_train_samples`	int	No	None	Truncate training set for debugging
`--max_val_samples`	int	No	None	Truncate validation set for debugging
`--max_test_samples`	int	No	None	Truncate test set for debugging

Outputs

Output	Location	Description
Trained model	`{output_dir}/`	Saved model, config, and tokenizer
Training metrics	`{output_dir}/train_results.json`	Loss, runtime, samples per second
Evaluation metrics	`{output_dir}/eval_results.json`	Precision, recall, F1, accuracy (overall or per-entity)
Test metrics	`{output_dir}/test_results.json`	Test set seqeval metrics
Test predictions	`{output_dir}/test_predictions.txt`	Predicted labels, space-separated per line
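
The test predictions file holds one line per sentence, with predicted labels separated by spaces. The sketch below shows one way such lines could be produced from per-token scores, assuming positions labelled -100 are dropped before writing. The function names, the scores, and the two-label set are hypothetical, and plain Python lists stand in for the arrays a trainer would return.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # marks special tokens and ignored subwords


def decode_predictions(
    logits: Sequence[Sequence[Sequence[float]]],
    label_ids: Sequence[Sequence[int]],
    label_list: List[str],
) -> List[List[str]]:
    """Pick the highest-scoring label per token, skipping ignored positions."""
    decoded = []
    for sentence_logits, sentence_labels in zip(logits, label_ids):
        sentence = []
        for token_logits, gold in zip(sentence_logits, sentence_labels):
            if gold == IGNORE_INDEX:
                continue  # no prediction is reported for this position
            best = max(range(len(token_logits)), key=lambda i: token_logits[i])
            sentence.append(label_list[best])
        decoded.append(sentence)
    return decoded


def format_prediction_lines(predictions: List[List[str]]) -> List[str]:
    """Join each sentence's labels with single spaces, one sentence per line."""
    return [" ".join(sentence) for sentence in predictions]


# Made-up scores for one sentence of five tokens and two labels.
label_list = ["O", "B-LOC"]
logits = [[[0.1, 0.2], [0.1, 2.0], [3.0, 0.0], [1.5, 0.2], [0.0, 0.0]]]
label_ids = [[-100, 1, -100, 0, -100]]

lines = format_prediction_lines(decode_predictions(logits, label_ids, label_list))
print("\n".join(lines))
```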

Usage Examples

Fine-tune on CoNLL-2003 NER

python examples/NLU/examples/token-classification/run_ner.py \
    --model_name_or_path bert-base-cased \
    --dataset_name conll2003 \
    --do_train \
    --do_eval \
    --do_predict \
    --per_device_train_batch_size 16 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --output_dir /tmp/ner_output

Fine-tune with entity-level metrics

python examples/NLU/examples/token-classification/run_ner.py \
    --model_name_or_path roberta-base \
    --dataset_name conll2003 \
    --do_train \
    --do_eval \
    --return_entity_level_metrics \
    --per_device_train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5 \
    --output_dir /tmp/ner_entity_metrics
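
With --return_entity_level_metrics, per-entity results arrive as nested dictionaries and are unpacked into flat keys. A minimal sketch of that unpacking follows, assuming a result layout with one dictionary per entity type plus overall_* scalars. The helper name flatten_entity_metrics and all numbers are illustrative only and are not results from any run.

```python
from typing import Any, Dict


def flatten_entity_metrics(results: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"LOC": {"f1": ...}} into {"LOC_f1": ...}; keep scalar entries as-is."""
    flat: Dict[str, Any] = {}
    for key, value in results.items():
        if isinstance(value, dict):
            for name, score in value.items():
                flat[f"{key}_{name}"] = score
        else:
            flat[key] = value
    return flat


# Illustrative values only.
nested = {
    "LOC": {"precision": 0.9, "recall": 0.8, "f1": 0.85, "number": 10},
    "PER": {"precision": 0.7, "recall": 0.6, "f1": 0.65, "number": 5},
    "overall_f1": 0.78,
}

for metric_name, metric_value in flatten_entity_metrics(nested).items():
    print(metric_name, metric_value)
```

Flat keys such as LOC_f1 are easier to log and serialize than nested dictionaries, which is presumably why the unpacking step exists.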

POS tagging with custom data

python examples/NLU/examples/token-classification/run_ner.py \
    --model_name_or_path bert-base-uncased \
    --task_name pos \
    --train_file /path/to/train.json \
    --validation_file /path/to/val.json \
    --test_file /path/to/test.json \
    --do_train \
    --do_eval \
    --do_predict \
    --label_all_tokens \
    --output_dir /tmp/pos_output
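
For custom files, the labels are typically plain strings, so the label set has to be discovered from the data. The sketch below writes a tiny training file and builds a sorted label_to_id mapping from it. It rests on assumptions: the records are invented, the one-JSON-object-per-line layout is assumed to be what the JSON loader accepts, and the column names follow the tokens / {task_name}_tags convention for --task_name pos. The helper discover_labels is hypothetical.

```python
import json
import os
import tempfile

# Invented records; column names follow the tokens / pos_tags convention.
records = [
    {"tokens": ["She", "runs", "fast"], "pos_tags": ["PRON", "VERB", "ADV"]},
    {"tokens": ["Dogs", "bark"], "pos_tags": ["NOUN", "VERB"]},
]


def discover_labels(label_sequences):
    """Collect unique labels, sort them, and map each one to an integer id."""
    unique = set()
    for sequence in label_sequences:
        unique.update(sequence)
    label_list = sorted(unique)
    return label_list, {label: i for i, label in enumerate(label_list)}


with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train.json")
    # One JSON object per line (assumed layout).
    with open(train_path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    # Read the file back to confirm it round-trips.
    with open(train_path, encoding="utf-8") as handle:
        loaded = [json.loads(line) for line in handle]

label_list, label_to_id = discover_labels(row["pos_tags"] for row in loaded)
print(label_list)
print(label_to_id)
```

Sorting the labels keeps the id assignment deterministic across runs, which matters when a saved model is reloaded for evaluation or prediction.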

Related Pages

Environment:Microsoft_LoRA_NLU_Conda_Environment
Implementation:Microsoft_LoRA_Run_GLUE_No_Trainer - Sentence-level classification counterpart
Implementation:Microsoft_LoRA_Run_XNLI - Multilingual sequence classification
