Implementation:Fastai Fastbook Language Model Learner
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Transfer Learning, Language Modeling |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tool for fine-tuning a pretrained language model on domain-specific text, provided by the fastai library.
Description
The language_model_learner function creates a Learner object configured for language model training. It:
- Instantiates the specified architecture (e.g., AWD_LSTM) with the vocabulary size derived from the DataLoaders.
- Loads weights pretrained on Wikitext-103, automatically handling mismatches between the pretrained vocabulary and the target vocabulary.
- Applies the specified dropout multiplier to all dropout layers.
- Accepts metrics such as accuracy and perplexity (via the metrics argument) for monitoring training progress.
The resulting Learner exposes the standard fastai training interface: fit_one_cycle for training with the one-cycle learning-rate policy, lr_find for learning rate discovery, and save_encoder for persisting the encoder weights.
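The vocabulary-matching step in the bullets above can be sketched as follows. This is a simplified, pure-Python illustration of the idea (fastai's real implementation operates on embedding weight tensors inside fastai.text.learner); the function name and list-based embedding rows are placeholders:

```python
def match_pretrained_rows(pretrained_rows, old_vocab, new_vocab):
    """Map pretrained embedding rows onto a new vocabulary.

    Tokens present in the pretrained vocab keep their row; unseen
    tokens are initialized to the mean of all pretrained rows.
    """
    old_idx = {tok: i for i, tok in enumerate(old_vocab)}
    dim = len(pretrained_rows[0])
    # Mean of the pretrained rows, used for out-of-vocabulary tokens
    mean_row = [sum(r[d] for r in pretrained_rows) / len(pretrained_rows)
                for d in range(dim)]
    new_rows = []
    for tok in new_vocab:
        if tok in old_idx:
            new_rows.append(list(pretrained_rows[old_idx[tok]]))
        else:
            new_rows.append(list(mean_row))
    return new_rows

old_vocab = ["the", "movie", "good"]
rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
# "movie" and "the" keep their pretrained rows; the unseen token
# gets the mean row
new = match_pretrained_rows(rows, old_vocab, ["movie", "xxunk", "the"])
print(new)
```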
Usage
Use language_model_learner as the central step in Stage 2 of the ULMFiT pipeline. It is called after creating language model DataLoaders and before training the classifier. The typical workflow involves training for 1-2 epochs with frozen pretrained layers, then unfreezing all layers and training for additional epochs.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/10_nlp.md (lines 551-611)
- Library module: fastai.text.learner
Signature
def language_model_learner(
    dls: DataLoaders,                       # Language model DataLoaders
    arch: callable = AWD_LSTM,              # Model architecture class
    config: dict = None,                    # Architecture config overrides
    drop_mult: float = 1.0,                 # Multiplier for all dropout rates
    backwards: bool = False,                # Train a backwards LM
    pretrained: bool = True,                # Use pretrained weights
    pretrained_fnames: list = None,         # Custom pretrained weight file names
    loss_func: callable = None,             # Loss function (default: CrossEntropyLossFlat)
    opt_func: callable = Adam,              # Optimizer
    lr: float = 0.001,                      # Base learning rate
    splitter: callable = awd_lstm_lm_split, # Parameter group splitter
    cbs: list = None,                       # Additional callbacks
    metrics: list = None,                   # Metrics to track
    path: Path = None,                      # Model save path
    model_dir: str = 'models',              # Subdirectory for saved models
    wd: float = None,                       # Weight decay
    wd_bn_bias: bool = False,               # Apply weight decay to batch norm and bias
    moms: tuple = (0.95, 0.85, 0.95)        # Momentum schedule
) -> Learner
# Key Learner methods:
Learner.fit_one_cycle(n_epoch, lr_max=None, div=25., div_final=1e5, pct_start=0.25, wd=None, moms=None, cbs=None, reset_opt=False)
Learner.save_encoder(name: str)
Learner.lr_find(start_lr=1e-7, end_lr=10, num_it=100, stop_div=True, show_plot=True)
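fit_one_cycle follows the one-cycle policy: the learning rate warms up from lr_max/div to lr_max over the first pct_start fraction of training, then anneals down to lr_max/div_final. A minimal sketch, assuming fastai v2's cosine-interpolated variant (the div, div_final, and pct_start names mirror the signature above):

```python
import math

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct (0..1) for a one-cycle
    schedule: cosine warm-up from lr_max/div to lr_max over pct_start,
    then cosine annealing down to lr_max/div_final."""
    def cos_interp(start, end, frac):
        return start + (end - start) * (1 - math.cos(math.pi * frac)) / 2
    if pct < pct_start:
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    frac = (pct - pct_start) / (1 - pct_start)
    return cos_interp(lr_max, lr_max / div_final, frac)

print(one_cycle_lr(0.0, 2e-2))   # starts at lr_max/div = 8e-4
print(one_cycle_lr(0.25, 2e-2))  # peaks at lr_max = 2e-2
```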
Import
from fastai.text.all import language_model_learner, AWD_LSTM, Perplexity
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dls | DataLoaders | Yes | Language model DataLoaders created by DataBlock with TextBlock(is_lm=True). |
| arch | callable | No | Model architecture class. Default: AWD_LSTM. The AWD-LSTM uses 3 LSTM layers with 1,150 hidden units and 400-dimensional embeddings. |
| drop_mult | float | No | Scales all dropout rates. Default: 1.0. Use lower values (0.3-0.5) for smaller datasets to reduce regularization. |
| pretrained | bool | No | Whether to load Wikitext-103 pretrained weights. Default: True. |
| metrics | list | No | List of metric functions to compute during training. Common choices: [accuracy, Perplexity()]. |
| n_epoch | int | Yes (for fit_one_cycle) | Number of training epochs. |
| lr_max | float | No (for fit_one_cycle) | Peak learning rate for the one-cycle schedule. |
| name | str | Yes (for save_encoder) | Filename for the saved encoder weights (without extension). |
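How drop_mult interacts with the architecture config can be illustrated with a small sketch. The config keys below follow fastai's awd_lstm_lm_config naming convention (dropout rates end in `_p`), but the exact values are illustrative, not authoritative defaults:

```python
def scale_dropouts(config, drop_mult):
    """Scale every dropout probability in an architecture config.

    Keys ending in '_p' are treated as dropout rates (the convention
    used by fastai's AWD-LSTM config); all other keys pass through.
    """
    return {k: v * drop_mult if k.endswith('_p') else v
            for k, v in config.items()}

# Illustrative AWD-LSTM-style config
config = {'emb_sz': 400, 'n_hid': 1152, 'n_layers': 3,
          'input_p': 0.6, 'output_p': 0.4, 'weight_p': 0.5,
          'embed_p': 0.1, 'hidden_p': 0.2}
scaled = scale_dropouts(config, drop_mult=0.3)
print(round(scaled['input_p'], 2))  # 0.18
```

With drop_mult=0.3 every dropout rate shrinks to 30% of its default, which is why lower values suit smaller datasets: less regularization is needed.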
Outputs
| Name | Type | Description |
|---|---|---|
| learn | Learner | A configured Learner object with the language model, optimizer, loss function, and metrics ready for training. |
| encoder file | .pth file | Saved encoder weights (embedding + LSTM layers, excluding the output projection). Stored in path/model_dir/name.pth. |
| training metrics | dict | Per-epoch training loss, validation loss, accuracy, and perplexity logged during fit_one_cycle. |
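The perplexity metric in the table above is simply the exponential of the mean per-token cross-entropy loss, which is why the logged valid_loss and perplexity move together (e.g. the 3.912788 / 50.038246 pair in the example run below). A one-line sketch:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token cross-entropy loss)."""
    return math.exp(sum(nlls) / len(nlls))

print(round(perplexity([3.912788]), 2))  # 50.04
```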
Usage Examples
Basic Usage
from fastai.text.all import *
path = untar_data(URLs.IMDB)
# Create language model DataLoaders
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
# Create the language model learner
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Fine-tune for 1 epoch
learn.fit_one_cycle(1, 2e-2)
Full Fine-tuning with Gradual Unfreezing
from fastai.text.all import *
path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Stage 1: Train with frozen pretrained layers (only train the head)
learn.fit_one_cycle(1, 2e-2)
# Expected output:
# epoch train_loss valid_loss accuracy perplexity
# 0 4.120410 3.912788 0.299565 50.038246
# Stage 2: Unfreeze all layers and train with discriminative LR
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
# Accuracy improves to ~0.35+, perplexity drops to ~25-30
# Save the encoder for classifier training
learn.save_encoder('finetuned')
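After unfreezing, earlier layer groups are typically trained with smaller learning rates than later ones (discriminative learning rates). A sketch of the idea: the 2.6 divisor is the ULMFiT paper's suggestion, while fastai expresses per-group rates via `slice(...)`; the function name here is hypothetical:

```python
def discriminative_lrs(lr_max, n_groups, factor=2.6):
    """Per-layer-group learning rates: each earlier group trains at
    the next group's rate divided by `factor`, so the earliest
    (most general) layers change the least."""
    return [lr_max / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(2e-3, n_groups=4)
print(lrs)  # smallest rate first, lr_max for the last group
```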
Learning Rate Finder
from fastai.text.all import *
path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Use lr_find to discover optimal learning rate
learn.lr_find()
# Plots a graph of loss vs. learning rate
# Choose lr_max where the loss is decreasing most steeply (typically 1e-2 to 2e-2)
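The mechanics behind lr_find can be sketched without fastai: sweep the learning rate exponentially across many mini-batches, record the loss, and stop once the loss diverges. The divergence threshold and toy loss surface below are illustrative assumptions, not fastai's exact internals:

```python
import math

def lr_range_test(step_loss, start_lr=1e-7, end_lr=10.0, num_it=100,
                  stop_div=True):
    """Sweep the learning rate exponentially from start_lr to end_lr,
    recording (lr, loss) pairs and stopping early once the loss
    diverges (here: exceeds 4x the best loss seen so far)."""
    mult = (end_lr / start_lr) ** (1 / num_it)
    lr, best, history = start_lr, float('inf'), []
    for _ in range(num_it):
        loss = step_loss(lr)       # stands in for one mini-batch step
        history.append((lr, loss))
        best = min(best, loss)
        if stop_div and loss > 4 * best:
            break
        lr *= mult
    return history

# Toy loss surface whose minimum sits near lr = 1e-2
history = lr_range_test(lambda lr: (math.log10(lr) + 2) ** 2 + 0.1)
best_lr = min(history, key=lambda p: p[1])[0]
print(f"lowest loss near lr={best_lr:.1g}")
```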
Generating Text with the Fine-tuned Model
from fastai.text.all import *
# After training, use the model to generate text
# This verifies the model has learned domain-specific language
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
print("\n".join(preds))
# Output: "I liked this movie because it was a great story about
# a family that had to deal with the loss of their father..."
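The temperature parameter controls how sharply generation favors high-probability tokens: logits are divided by the temperature before the softmax, so values below 1 make sampling more conservative. A self-contained sketch of the idea (not fastai's internal sampling code):

```python
import math, random

def sample_next_token(logits, temperature=0.75, rng=None):
    """Sample a token index from logits softened by a temperature.
    Lower temperature sharpens the distribution toward the argmax."""
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 0.5, 0.1]  # token 0 is the model's favorite
counts = [0, 0, 0]
rng = random.Random(42)
for _ in range(1000):
    counts[sample_next_token(logits, 0.75, rng)] += 1
print(counts)  # token 0 dominates at temperature 0.75
```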
Related Pages
Implements Principle
Requires Environment
- Environment:Fastai_Fastbook_Python_FastAI_Environment
- Environment:Fastai_Fastbook_CUDA_GPU_Environment
- Environment:Fastai_Fastbook_NLP_SpaCy_Environment