
Implementation: lm-sys FastChat Train Yuan2

From Leeroopedia


Knowledge Sources
Domains: Training, NLP
Last Updated: 2026-02-07 06:00 GMT

Overview

Supervised fine-tuning pipeline for Yuan2 causal language models with advanced loss masking modes and custom token support.

Description

Train Yuan2 implements a supervised fine-tuning pipeline for the Yuan2 family of causal language models, featuring three distinct loss masking strategies and support for custom special tokens. The module extends the standard FastChat training pattern with Yuan2-specific capabilities including RoPE (Rotary Position Embedding) scaling for handling longer contexts and registration of custom tokens (<eod>, <sep>, <pad>, <mask>, and others) into the tokenizer and model embedding layer.

The central preprocessing function preprocess(sources, tokenizer, data_args) supports three mutually exclusive loss computation modes controlled by flags in DataArguments:

  • split_example_loss: Splits each multi-turn conversation into individual examples, one per assistant response, so the loss is computed independently for each turn. This produces more training steps per conversation but uses more memory.
  • efficient_loss: Keeps the full conversation as a single sequence but applies fine-grained masking so loss is computed on all assistant tokens simultaneously. This is the most memory-efficient mode for multi-turn data.
  • last_response_loss: Only computes loss on the final assistant response in each conversation, ignoring all prior turns. This is useful when earlier turns serve only as context and the model should focus on the last reply.
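
The three modes above differ only in which label positions contribute to the loss. The following is an illustrative sketch (not the FastChat source) using plain lists; `build_examples` and the `(role, token_ids)` turn representation are our own simplifications, and `IGNORE_INDEX = -100` follows the HuggingFace convention for masked label positions.

```python
IGNORE_INDEX = -100  # HuggingFace convention: labels set to -100 are excluded from the loss

def build_examples(turns, mode):
    """Sketch of the three masking modes. `turns` is a list of
    (role, token_ids) pairs for one conversation; returns a list of
    (input_ids, labels) training examples."""
    input_ids = [tok for _, toks in turns for tok in toks]

    if mode == "split_example_loss":
        # One example per assistant turn: loss only on that turn's tokens.
        examples = []
        for i, (role, _) in enumerate(turns):
            if role != "assistant":
                continue
            labels = []
            for j, (_, toks) in enumerate(turns):
                labels.extend(toks if j == i else [IGNORE_INDEX] * len(toks))
            examples.append((input_ids, labels))
        return examples

    if mode == "efficient_loss":
        # One example: loss on every assistant token in a single pass.
        labels = [tok if role == "assistant" else IGNORE_INDEX
                  for role, toks in turns for tok in toks]
        return [(input_ids, labels)]

    if mode == "last_response_loss":
        # One example: loss only on the final assistant turn.
        last = max(i for i, (role, _) in enumerate(turns) if role == "assistant")
        labels = []
        for j, (_, toks) in enumerate(turns):
            labels.extend(toks if j == last else [IGNORE_INDEX] * len(toks))
        return [(input_ids, labels)]

    raise ValueError(f"unknown mode: {mode}")
```

For a two-assistant-turn conversation, split_example_loss yields two examples (each unmasking one turn), while the other two modes yield one example each.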

The make_supervised_data_module(tokenizer, data_args) factory loads the JSON data and constructs either a SupervisedDataset (eager) or LazySupervisedDataset (lazy) depending on configuration. The trainer_save_model_safe(trainer) utility handles safe model saving, particularly in distributed training settings where only the primary process should write to disk.

The train() entry point orchestrates the full workflow: it parses ModelArguments, DataArguments, and TrainingArguments, loads the Yuan2 model with optional RoPE scaling configuration, registers custom tokens, invokes the data module factory, instantiates the HuggingFace Trainer, runs training, and saves the final model.
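
The RoPE scaling step can be sketched as below. This is an assumption based on the common FastChat pattern (linear scaling with an integer factor), not a verbatim excerpt; `apply_rope_scaling` is a hypothetical helper, and `config` is assumed to look like a HuggingFace model config with a `max_position_embeddings` attribute.

```python
import math
from types import SimpleNamespace

def apply_rope_scaling(config, model_max_length):
    """If the desired training context exceeds the checkpoint's native
    window, enable linear RoPE scaling with a ceil'd integer factor."""
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    if orig_ctx_len and model_max_length > orig_ctx_len:
        config.rope_scaling = {
            "type": "linear",
            "factor": math.ceil(model_max_length / orig_ctx_len),
        }
    return config

# Example: a 2k-context checkpoint trained at 8k gets factor 4.
cfg = apply_rope_scaling(SimpleNamespace(max_position_embeddings=2048), 8192)
```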

Usage

Use this when fine-tuning Yuan2-series models (e.g., IEITYuan/Yuan2-2B-hf) on multi-turn conversation data. Choose the appropriate loss mode based on your training objective: last_response_loss for single-turn focus, split_example_loss for per-turn training, or efficient_loss for memory-efficient multi-turn training.

Code Reference

Source Location

Key Functions

  • train(): Main entry point; loads the Yuan2 model with optional RoPE scaling, registers custom tokens, and trains with the HuggingFace Trainer.
  • preprocess(sources, tokenizer, data_args): Tokenizes conversations and applies one of the three loss masking modes (split_example_loss, efficient_loss, last_response_loss).
  • make_supervised_data_module(tokenizer, data_args): Factory that loads the JSON data, constructs an eager or lazy dataset, and returns a data module dict.
  • trainer_save_model_safe(trainer): Safely saves the model in distributed training, ensuring only the primary process writes to disk.

Dataclasses

  • ModelArguments (model_name_or_path, trust_remote_code): Model checkpoint path and loading options.
  • DataArguments (data_path, lazy_preprocess, last_response_loss, split_example_loss, efficient_loss): Data path and loss masking mode flags.
  • TrainingArguments (extends the HuggingFace TrainingArguments): Standard training configuration with additional custom fields.
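
A sketch of the data arguments dataclass; the field names come from the table above, while the defaults and type annotations are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataArguments:
    """Illustrative sketch of the data arguments (defaults assumed)."""
    data_path: Optional[str] = None
    lazy_preprocess: bool = False
    last_response_loss: bool = False
    split_example_loss: bool = False
    efficient_loss: bool = False
```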

Loss Mode Flags

  • last_response_loss (default: False): Only compute loss on the final assistant response in each conversation.
  • split_example_loss (default: False): Split multi-turn conversations into individual per-turn examples.
  • efficient_loss (default: False): Compute loss on all assistant tokens in one pass with fine-grained masking.

Custom Tokens

The training script registers these special tokens into the Yuan2 tokenizer:

custom_tokens = ["<eod>", "<sep>", "<pad>", "<mask>", ...]
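
Conceptually, registration extends the vocabulary and grows the embedding table by one row per genuinely new token. The toy function below is our illustration of that bookkeeping; in the real script it is done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
def register_custom_tokens(vocab, embedding_rows, new_tokens, dim=4):
    """Toy sketch: add unseen tokens to the vocab and append one embedding
    row each (newly initialized in practice). Returns the number added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            embedding_rows.append([0.0] * dim)
            added += 1
    return added
```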

Signature

def train():
    ...

Import

from fastchat.train.train_yuan2 import train

I/O Contract

Inputs

  • --model_name_or_path (str, required): HuggingFace model path for a Yuan2 checkpoint (e.g., IEITYuan/Yuan2-2B-hf).
  • --data_path (str, required): Path to training JSON data in ShareGPT conversation format.
  • --output_dir (str, required): Directory for saving model checkpoints and final weights.
  • --last_response_loss (bool, optional; default False): Compute loss only on the last assistant response.
  • --split_example_loss (bool, optional; default False): Split conversations into per-turn examples for loss computation.
  • --efficient_loss (bool, optional; default False): Enable memory-efficient multi-turn loss masking.
  • --lazy_preprocess (bool, optional; default False): Use LazySupervisedDataset for deferred tokenization.
  • --num_train_epochs (int, optional): Number of training epochs.
  • --per_device_train_batch_size (int, optional): Batch size per GPU device during training.
  • --learning_rate (float, optional): Peak learning rate for the optimizer.

Outputs

  • checkpoints (files): Model checkpoints saved in output_dir at configured intervals.
  • final_model (files): Final Yuan2 model weights with custom token embeddings and the updated tokenizer.
  • trainer_state (JSON): Training state including loss curves, learning rate schedule, and metrics.

Usage Examples

# Fine-tune Yuan2-2B with efficient loss mode on 4 GPUs
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
    --model_name_or_path IEITYuan/Yuan2-2B-hf \
    --data_path data/dummy_conversation.json \
    --output_dir ./output_yuan2 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --learning_rate 2e-5 \
    --bf16 True \
    --efficient_loss True

# Fine-tune Yuan2 with last-response-only loss
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
    --model_name_or_path IEITYuan/Yuan2-2B-hf \
    --data_path data/dummy_conversation.json \
    --output_dir ./output_yuan2_last \
    --num_train_epochs 3 \
    --last_response_loss True
