Implementation:Fastai Fastbook Language Model Learner
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Transfer Learning, Language Modeling |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tool for fine-tuning a pretrained language model on domain-specific text, provided by the fastai library.
Description
The language_model_learner function creates a Learner object configured for language model training. It:
- Instantiates the specified architecture (e.g., AWD_LSTM) with the vocabulary size derived from the DataLoaders.
- Loads weights pretrained on Wikitext-103, automatically handling mismatches between the pretrained vocabulary and the target vocabulary.
- Applies the specified dropout multiplier to all dropout layers.
- Accepts metrics such as accuracy and perplexity (via the metrics argument) for monitoring training progress.
The resulting Learner exposes the standard fastai training interface: fit_one_cycle for training with the one-cycle learning-rate policy, lr_find for learning rate discovery, and save_encoder for persisting the encoder weights.
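The vocabulary-matching step in the bullets above can be sketched as follows. This is a simplified, pure-Python illustration of the idea (fastai's real implementation operates on embedding weight tensors inside fastai.text.learner); the function name and list-based embedding rows are placeholders:

```python
def match_pretrained_rows(pretrained_rows, old_vocab, new_vocab):
    """Map pretrained embedding rows onto a new vocabulary.

    Tokens present in the pretrained vocab keep their row; unseen
    tokens are initialized to the mean of all pretrained rows.
    """
    old_idx = {tok: i for i, tok in enumerate(old_vocab)}
    dim = len(pretrained_rows[0])
    # Mean of the pretrained rows, used for out-of-vocabulary tokens
    mean_row = [sum(r[d] for r in pretrained_rows) / len(pretrained_rows)
                for d in range(dim)]
    new_rows = []
    for tok in new_vocab:
        if tok in old_idx:
            new_rows.append(list(pretrained_rows[old_idx[tok]]))
        else:
            new_rows.append(list(mean_row))
    return new_rows

old_vocab = ["the", "movie", "good"]
rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
# "movie" and "the" keep their pretrained rows; the unseen token
# gets the mean row
new = match_pretrained_rows(rows, old_vocab, ["movie", "xxunk", "the"])
print(new)
```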
Usage
Use language_model_learner as the central step in Stage 2 of the ULMFiT pipeline. It is called after creating language model DataLoaders and before training the classifier. The typical workflow involves training for 1-2 epochs with frozen pretrained layers, then unfreezing all layers and training for additional epochs.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/10_nlp.md (lines 551-611)
- Library module: fastai.text.learner
Signature
def language_model_learner(
    dls: DataLoaders,                       # Language model DataLoaders
    arch: callable = AWD_LSTM,              # Model architecture class
    config: dict = None,                    # Architecture config overrides
    drop_mult: float = 1.0,                 # Multiplier for all dropout rates
    backwards: bool = False,                # Train a backwards LM
    pretrained: bool = True,                # Use pretrained weights
    pretrained_fnames: list = None,         # Custom pretrained weight file names
    loss_func: callable = None,             # Loss function (default: CrossEntropyLossFlat)
    opt_func: callable = Adam,              # Optimizer
    lr: float = 0.001,                      # Base learning rate
    splitter: callable = awd_lstm_lm_split, # Parameter group splitter
    cbs: list = None,                       # Additional callbacks
    metrics: list = None,                   # Metrics to track
    path: Path = None,                      # Model save path
    model_dir: str = 'models',              # Subdirectory for saved models
    wd: float = None,                       # Weight decay
    wd_bn_bias: bool = False,               # Apply weight decay to batch norm and bias
    moms: tuple = (0.95, 0.85, 0.95)        # Momentum schedule
) -> Learner
# Key Learner methods:
Learner.fit_one_cycle(n_epoch, lr_max=None, div=25., div_final=1e5, pct_start=0.25, wd=None, moms=None, cbs=None, reset_opt=False)
Learner.save_encoder(name: str)
Learner.lr_find(start_lr=1e-7, end_lr=10, num_it=100, stop_div=True, show_plot=True)
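fit_one_cycle follows the one-cycle policy: the learning rate warms up from lr_max/div to lr_max over the first pct_start fraction of training, then anneals down to lr_max/div_final. A minimal sketch, assuming fastai v2's cosine-interpolated variant (the div, div_final, and pct_start names mirror the signature above):

```python
import math

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct (0..1) for a one-cycle
    schedule: cosine warm-up from lr_max/div to lr_max over pct_start,
    then cosine annealing down to lr_max/div_final."""
    def cos_interp(start, end, frac):
        return start + (end - start) * (1 - math.cos(math.pi * frac)) / 2
    if pct < pct_start:
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    frac = (pct - pct_start) / (1 - pct_start)
    return cos_interp(lr_max, lr_max / div_final, frac)

print(one_cycle_lr(0.0, 2e-2))   # starts at lr_max/div = 8e-4
print(one_cycle_lr(0.25, 2e-2))  # peaks at lr_max = 2e-2
```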
Import
from fastai.text.all import language_model_learner, AWD_LSTM, Perplexity
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dls | DataLoaders | Yes | Language model DataLoaders created by DataBlock with TextBlock(is_lm=True). |
| arch | callable | No | Model architecture class. Default: AWD_LSTM. The AWD-LSTM uses 3 LSTM layers with 1,150 hidden units and 400-dimensional embeddings. |
| drop_mult | float | No | Scales all dropout rates. Default: 1.0. Use lower values (0.3-0.5) for smaller datasets to reduce regularization. |
| pretrained | bool | No | Whether to load Wikitext-103 pretrained weights. Default: True. |
| metrics | list | No | List of metric functions to compute during training. Common choices: [accuracy, Perplexity()]. |
| n_epoch | int | Yes (for fit_one_cycle) | Number of training epochs. |
| lr_max | float | No (for fit_one_cycle) | Peak learning rate for the one-cycle schedule. |
| name | str | Yes (for save_encoder) | Filename for the saved encoder weights (without extension). |
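How drop_mult interacts with the architecture config can be illustrated with a small sketch. The config keys below follow fastai's awd_lstm_lm_config naming convention (dropout rates end in `_p`), but the exact values are illustrative, not authoritative defaults:

```python
def scale_dropouts(config, drop_mult):
    """Scale every dropout probability in an architecture config.

    Keys ending in '_p' are treated as dropout rates (the convention
    used by fastai's AWD-LSTM config); all other keys pass through.
    """
    return {k: v * drop_mult if k.endswith('_p') else v
            for k, v in config.items()}

# Illustrative AWD-LSTM-style config
config = {'emb_sz': 400, 'n_hid': 1152, 'n_layers': 3,
          'input_p': 0.6, 'output_p': 0.4, 'weight_p': 0.5,
          'embed_p': 0.1, 'hidden_p': 0.2}
scaled = scale_dropouts(config, drop_mult=0.3)
print(round(scaled['input_p'], 2))  # 0.18
```

With drop_mult=0.3 every dropout rate shrinks to 30% of its default, which is why lower values suit smaller datasets: less regularization is needed.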
Outputs
| Name | Type | Description |
|---|---|---|
| learn | Learner | A configured Learner object with the language model, optimizer, loss function, and metrics ready for training. |
| encoder file | .pth file | Saved encoder weights (embedding + LSTM layers, excluding the output projection). Stored in path/model_dir/name.pth. |
| training metrics | dict | Per-epoch training loss, validation loss, accuracy, and perplexity logged during fit_one_cycle. |
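The perplexity metric in the table above is simply the exponential of the mean per-token cross-entropy loss, which is why the logged valid_loss and perplexity move together (e.g. the 3.912788 / 50.038246 pair in the example run below). A one-line sketch:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token cross-entropy loss)."""
    return math.exp(sum(nlls) / len(nlls))

print(round(perplexity([3.912788]), 2))  # 50.04
```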
Usage Examples
Basic Usage
from fastai.text.all import *
path = untar_data(URLs.IMDB)
# Create language model DataLoaders
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
# Create the language model learner
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Fine-tune for 1 epoch
learn.fit_one_cycle(1, 2e-2)
Full Fine-tuning with Gradual Unfreezing
from fastai.text.all import *
path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Stage 1: Train with frozen pretrained layers (only train the head)
learn.fit_one_cycle(1, 2e-2)
# Expected output:
# epoch train_loss valid_loss accuracy perplexity
# 0 4.120410 3.912788 0.299565 50.038246
# Stage 2: Unfreeze all layers and train with discriminative LR
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
# Accuracy improves to ~0.35+, perplexity drops to ~25-30
# Save the encoder for classifier training
learn.save_encoder('finetuned')
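After unfreezing, earlier layer groups are typically trained with smaller learning rates than later ones (discriminative learning rates). A sketch of the idea: the 2.6 divisor is the ULMFiT paper's suggestion, while fastai expresses per-group rates via `slice(...)`; the function name here is hypothetical:

```python
def discriminative_lrs(lr_max, n_groups, factor=2.6):
    """Per-layer-group learning rates: each earlier group trains at
    the next group's rate divided by `factor`, so the earliest
    (most general) layers change the least."""
    return [lr_max / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(2e-3, n_groups=4)
print(lrs)  # smallest rate first, lr_max for the last group
```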
Learning Rate Finder
from fastai.text.all import *
path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
# Use lr_find to discover optimal learning rate
learn.lr_find()
# Plots a graph of loss vs. learning rate
# Choose lr_max where the loss is decreasing most steeply (typically 1e-2 to 2e-2)
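The mechanics behind lr_find can be sketched without fastai: sweep the learning rate exponentially across many mini-batches, record the loss, and stop once the loss diverges. The divergence threshold and toy loss surface below are illustrative assumptions, not fastai's exact internals:

```python
import math

def lr_range_test(step_loss, start_lr=1e-7, end_lr=10.0, num_it=100,
                  stop_div=True):
    """Sweep the learning rate exponentially from start_lr to end_lr,
    recording (lr, loss) pairs and stopping early once the loss
    diverges (here: exceeds 4x the best loss seen so far)."""
    mult = (end_lr / start_lr) ** (1 / num_it)
    lr, best, history = start_lr, float('inf'), []
    for _ in range(num_it):
        loss = step_loss(lr)       # stands in for one mini-batch step
        history.append((lr, loss))
        best = min(best, loss)
        if stop_div and loss > 4 * best:
            break
        lr *= mult
    return history

# Toy loss surface whose minimum sits near lr = 1e-2
history = lr_range_test(lambda lr: (math.log10(lr) + 2) ** 2 + 0.1)
best_lr = min(history, key=lambda p: p[1])[0]
print(f"lowest loss near lr={best_lr:.1g}")
```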
Generating Text with the Fine-tuned Model
from fastai.text.all import *
# After training, use the model to generate text
# This verifies the model has learned domain-specific language
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
print("\n".join(preds))
# Output: "I liked this movie because it was a great story about
# a family that had to deal with the loss of their father..."
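The temperature parameter controls how sharply generation favors high-probability tokens: logits are divided by the temperature before the softmax, so values below 1 make sampling more conservative. A self-contained sketch of the idea (not fastai's internal sampling code):

```python
import math, random

def sample_next_token(logits, temperature=0.75, rng=None):
    """Sample a token index from logits softened by a temperature.
    Lower temperature sharpens the distribution toward the argmax."""
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 0.5, 0.1]  # token 0 is the model's favorite
counts = [0, 0, 0]
rng = random.Random(42)
for _ in range(1000):
    counts[sample_next_token(logits, 0.75, rng)] += 1
print(counts)  # token 0 dominates at temperature 0.75
```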
Related Pages
Implements Principle
Requires Environment
- Environment:Fastai_Fastbook_Python_FastAI_Environment
- Environment:Fastai_Fastbook_CUDA_GPU_Environment
- Environment:Fastai_Fastbook_NLP_SpaCy_Environment