
Implementation:Fastai Fastbook Language Model Learner

From Leeroopedia


Knowledge Sources
Domains Natural Language Processing, Transfer Learning, Language Modeling
Last Updated 2026-02-09 17:00 GMT

Overview

A concrete tool, provided by the fastai library, for fine-tuning a pretrained language model on domain-specific text.

Description

The language_model_learner function creates a Learner object configured for language model training. It:

  • Instantiates the specified architecture (e.g., AWD_LSTM) with the vocabulary size derived from the DataLoaders.
  • Loads weights from a model pretrained on Wikitext-103, automatically handling mismatches between the pretrained vocabulary and the target vocabulary.
  • Applies the specified dropout multiplier to all dropout layers.
  • Configures accuracy and perplexity metrics for monitoring training progress.

The resulting Learner exposes the standard fastai training interface: fit_one_cycle for training with slanted triangular learning rates, lr_find for learning rate discovery, and save_encoder for persisting the encoder weights.
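The accuracy and perplexity metrics mentioned above are closely linked: perplexity is simply the exponential of the mean token-level cross-entropy loss, so the two numbers reported during training always move together. A minimal sketch of that relationship (plain Python, not fastai's metric class):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp(mean token-level cross-entropy loss)."""
    return math.exp(mean_cross_entropy)

# A validation loss of 3.912788 corresponds to a perplexity of ~50.04,
# which is why valid_loss and perplexity always pair up in fastai's output.
print(perplexity(3.912788))
```

A model with perplexity 50 is, loosely speaking, as uncertain as if it were choosing uniformly among 50 tokens at each step.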

Usage

Use language_model_learner as the central step in Stage 2 of the ULMFiT pipeline. It is called after creating language model DataLoaders and before training the classifier. The typical workflow involves training for 1-2 epochs with frozen pretrained layers, then unfreezing all layers and training for additional epochs.

Code Reference

Source Location

  • Repository: fastbook
  • File: translations/cn/10_nlp.md (lines 551-611)
  • Library module: fastai.text.learner

Signature

def language_model_learner(
    dls: DataLoaders,                 # Language model DataLoaders
    arch: callable = AWD_LSTM,        # Model architecture class
    config: dict = None,              # Architecture config overrides
    drop_mult: float = 1.0,           # Multiplier for all dropout rates
    backwards: bool = False,          # Train a backwards LM
    pretrained: bool = True,          # Use pretrained weights
    pretrained_fnames: list = None,   # Custom pretrained weight file names
    loss_func: callable = None,       # Loss function (default: CrossEntropyLossFlat)
    opt_func: callable = Adam,        # Optimizer
    lr: float = 0.001,                # Base learning rate
    splitter: callable = awd_lstm_lm_split,  # Parameter group splitter
    cbs: list = None,                 # Additional callbacks
    metrics: list = None,             # Metrics to track
    path: Path = None,                # Model save path
    model_dir: str = 'models',       # Subdirectory for saved models
    wd: float = None,                 # Weight decay
    wd_bn_bias: bool = False,         # Apply weight decay to batch norm and bias
    moms: tuple = (0.95, 0.85, 0.95) # Momentum schedule
) -> Learner

# Key Learner methods:
Learner.fit_one_cycle(n_epoch, lr_max=None, div=25., div_final=1e5, pct_start=0.25, wd=None, moms=None, cbs=None, reset_opt=False)
Learner.save_encoder(name: str)
Learner.lr_find(start_lr=1e-7, end_lr=10, num_it=100, stop_div=True, show_plot=True)
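fit_one_cycle warms the learning rate up from lr_max/div to lr_max over the first pct_start fraction of training, then anneals it down to lr_max/div_final; fastai interpolates both phases with a cosine curve. A simplified sketch of the learning-rate part of that schedule (the real scheduler also cycles momentum via moms, which is omitted here):

```python
import math

def one_cycle_lr(t, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress t in [0, 1] for a cosine one-cycle schedule."""
    if t <= pct_start:                  # warm-up: lr_max/div -> lr_max
        start, end, p = lr_max / div, lr_max, t / pct_start
    else:                               # annealing: lr_max -> lr_max/div_final
        start, end, p = lr_max, lr_max / div_final, (t - pct_start) / (1 - pct_start)
    return start + (end - start) * (1 - math.cos(math.pi * p)) / 2

print(one_cycle_lr(0.0, 2e-2))    # 8e-4: lr_max / div at the start
print(one_cycle_lr(0.25, 2e-2))   # 2e-2: peak at pct_start
```

This is why passing `2e-2` to `fit_one_cycle` does not mean training at 2e-2 throughout: that value is only the peak, reached a quarter of the way through by default.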

Import

from fastai.text.all import language_model_learner, AWD_LSTM, Perplexity

I/O Contract

Inputs

Name Type Required Description
dls DataLoaders Yes Language model DataLoaders created by DataBlock with TextBlock(is_lm=True).
arch callable No Model architecture class. Default: AWD_LSTM. The AWD-LSTM uses 3 LSTM layers with 1,150 hidden units and 400-dimensional embeddings.
drop_mult float No Scales all dropout rates. Default: 1.0. Lower values (0.3-0.5) reduce regularization and suit larger corpora; values closer to 1.0 give stronger regularization for small datasets prone to overfitting.
pretrained bool No Whether to load Wikitext-103 pretrained weights. Default: True.
metrics list No List of metric functions to compute during training. Common choices: [accuracy, Perplexity()].
n_epoch int Yes (for fit_one_cycle) Number of training epochs.
lr_max float No (for fit_one_cycle) Peak learning rate for the one-cycle schedule.
name str Yes (for save_encoder) Filename for the saved encoder weights (without extension).
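Mechanically, drop_mult multiplies every dropout probability in the architecture's config dict before the model is built. A sketch of the idea, where the default values are assumptions taken from fastai's awd_lstm_lm_config and may differ between versions:

```python
# Assumed AWD-LSTM default dropout probabilities (from fastai's awd_lstm_lm_config)
awd_lstm_dropouts = {
    'output_p': 0.1,   # dropout on the final LSTM output
    'hidden_p': 0.15,  # dropout between LSTM layers
    'input_p': 0.25,   # dropout on the embedding output
    'embed_p': 0.02,   # dropout on embedding matrix rows
    'weight_p': 0.2,   # DropConnect on hidden-to-hidden weights
}

def apply_drop_mult(config: dict, drop_mult: float) -> dict:
    """Scale every dropout key (ending in '_p') by drop_mult, leaving other keys alone."""
    return {k: v * drop_mult if k.endswith('_p') else v for k, v in config.items()}

print(apply_drop_mult(awd_lstm_dropouts, 0.3))  # e.g. output_p scales to 0.03
```

With drop_mult=0.3, as in the examples below, every dropout probability is cut to 30% of its default.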

Outputs

Name Type Description
learn Learner A configured Learner object with the language model, optimizer, loss function, and metrics ready for training.
encoder file .pth file Saved encoder weights (embedding + LSTM layers, excluding the output projection). Stored in path/model_dir/name.pth and later reloaded with Learner.load_encoder when building the classifier.
training metrics dict Per-epoch training loss, validation loss, accuracy, and perplexity logged during fit_one_cycle.

Usage Examples

Basic Usage

from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Create language model DataLoaders
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

# Create the language model learner
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)

# Fine-tune for 1 epoch
learn.fit_one_cycle(1, 2e-2)

Full Fine-tuning with Gradual Unfreezing

from fastai.text.all import *

path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)

# Stage 1: Train with frozen pretrained layers (only train the head)
learn.fit_one_cycle(1, 2e-2)
# Expected output:
# epoch  train_loss  valid_loss  accuracy  perplexity
# 0      4.120410    3.912788    0.299565  50.038246

# Stage 2: Unfreeze all layers and train with discriminative LR
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
# Accuracy improves to ~0.35+, perplexity drops to ~25-30

# Save the encoder for classifier training
learn.save_encoder('finetuned')

Learning Rate Finder

from fastai.text.all import *

path = untar_data(URLs.IMDB)
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)

# Use lr_find to discover optimal learning rate
learn.lr_find()
# Plots a graph of loss vs. learning rate
# Choose lr_max where loss is steepest (typically 1e-2 to 2e-2)

Generating Text with the Fine-tuned Model

from fastai.text.all import *

# After training, use the model to generate text
# This verifies the model has learned domain-specific language
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2

preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
print("\n".join(preds))
# Example output: "I liked this movie because it was a great story about
#                  a family that had to deal with the loss of their father..."
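The temperature argument controls how sharply predict samples from the next-token distribution: logits are divided by the temperature before the softmax, so values below 1.0 concentrate probability on the most likely tokens. A self-contained sketch of the mechanism (plain Python, not fastai's actual sampling code):

```python
import math
import random

def sample_next_token(logits, temperature=0.75, rng=None):
    """Sample an index from softmax(logits / temperature)."""
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cumulative = rng.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

# Low temperature concentrates samples on the highest logit:
counts = [0, 0, 0]
rng = random.Random(42)
for _ in range(1000):
    counts[sample_next_token([2.0, 1.0, 0.0], temperature=0.25, rng=rng)] += 1
print(counts)  # index 0 dominates
```

At temperature 1.0 the model samples from its raw distribution; as the temperature drops toward zero, sampling approaches greedy argmax decoding.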

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
