Implementation:Lm_sys_FastChat_Train_Yuan2
| Knowledge Sources | |
|---|---|
| Domains | Training, NLP |
| Last Updated | 2026-02-07 06:00 GMT |
Overview
Supervised fine-tuning pipeline for Yuan2 causal language models with advanced loss masking modes and custom token support.
Description
Train Yuan2 implements a supervised fine-tuning pipeline for the Yuan2 family of causal language models, featuring three distinct loss-masking strategies and support for custom special tokens. The module extends the standard FastChat training pattern with Yuan2-specific capabilities: RoPE (Rotary Position Embedding) scaling for handling longer contexts, and registration of custom tokens (<eod>, <sep>, <pad>, <mask>, and others) into both the tokenizer and the model's embedding layer.
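RoPE scaling is typically applied by computing a scaling factor from the desired context length and the model's original maximum position count, then writing it into the model config before loading. A minimal sketch of that arithmetic, using the common transformers `rope_scaling` dict convention (`{"type": "linear", "factor": f}`); the script's exact handling may differ:

```python
def rope_scaling_config(model_max_length, orig_max_position_embeddings):
    """Return a linear rope_scaling dict if the target context exceeds the
    model's original positional range, else None (no scaling needed).

    Sketch of the usual pattern; train_yuan2.py's exact logic may differ.
    """
    factor = model_max_length / orig_max_position_embeddings
    if factor <= 1.0:
        return None
    # Common transformers convention for linear RoPE interpolation.
    return {"type": "linear", "factor": factor}
```

For example, training at 8192 tokens on a model pretrained with 4096 positions yields a linear scaling factor of 2.0, while a target at or below the original range yields no scaling entry.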
The central preprocessing function preprocess(sources, tokenizer, data_args) supports three mutually exclusive loss computation modes controlled by flags in DataArguments:
- split_example_loss: Splits each multi-turn conversation into individual examples, one per assistant response, so the loss is computed independently for each turn. This produces more training examples per conversation but uses more memory.
- efficient_loss: Keeps the full conversation as a single sequence but applies fine-grained masking so loss is computed on all assistant tokens simultaneously. This is the most memory-efficient mode for multi-turn data.
- last_response_loss: Only computes loss on the final assistant response in each conversation, ignoring all prior turns. This is useful when earlier turns serve only as context and the model should focus on the last reply.
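The three modes can be illustrated on a toy token sequence. The sketch below is schematic plain Python, not the script's actual implementation: it assumes conversations arrive as (role, token_ids) pairs and uses the HuggingFace convention of labeling masked positions with -100 (IGNORE_TOKEN_ID) so they are excluded from the loss.

```python
IGNORE_TOKEN_ID = -100  # positions with this label are excluded from the loss

def build_labels(turns, mode):
    """turns: list of (role, token_ids) pairs; returns (input_ids, labels) pairs.

    Illustrative sketch of the three loss-masking modes; the real
    preprocess() in train_yuan2.py operates on tokenized conversations.
    """
    flat = [tok for _, toks in turns for tok in toks]
    # Record the [start, end) span of every assistant turn in the flat sequence.
    assistant_spans, pos = [], 0
    for role, toks in turns:
        if role == "assistant":
            assistant_spans.append((pos, pos + len(toks)))
        pos += len(toks)

    if mode == "split_example_loss":
        # One training example per assistant turn: loss only on that turn.
        examples = []
        for start, end in assistant_spans:
            labels = [IGNORE_TOKEN_ID] * len(flat)
            labels[start:end] = flat[start:end]
            examples.append((flat, labels))
        return examples

    if mode == "efficient_loss":
        # Single example: loss on every assistant token at once.
        labels = [IGNORE_TOKEN_ID] * len(flat)
        for start, end in assistant_spans:
            labels[start:end] = flat[start:end]
        return [(flat, labels)]

    if mode == "last_response_loss":
        # Single example: loss only on the final assistant turn.
        labels = [IGNORE_TOKEN_ID] * len(flat)
        start, end = assistant_spans[-1]
        labels[start:end] = flat[start:end]
        return [(flat, labels)]

    raise ValueError(f"unknown mode: {mode}")
```

Note how split_example_loss returns one example per assistant turn (more examples, more memory), while the other two modes return a single example differing only in how many assistant spans are unmasked.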
The make_supervised_data_module(tokenizer, data_args) factory loads the JSON data and constructs either a SupervisedDataset (eager) or LazySupervisedDataset (lazy) depending on configuration. The trainer_save_model_safe(trainer) utility handles safe model saving, particularly in distributed training settings where only the primary process should write to disk.
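The eager/lazy distinction can be sketched as follows. This is a schematic stand-in, not the actual FastChat classes: the eager dataset tokenizes everything up front, while the lazy variant defers tokenization to `__getitem__` and caches the result, trading startup time for memory.

```python
class LazySupervisedDatasetSketch:
    """Schematic lazy dataset: tokenize each conversation on first access.

    Stand-in for FastChat's LazySupervisedDataset; `preprocess_fn` plays the
    role of preprocess(sources, tokenizer, data_args).
    """

    def __init__(self, raw_data, preprocess_fn):
        self.raw_data = raw_data
        self.preprocess_fn = preprocess_fn
        self.cache = {}

    def __len__(self):
        return len(self.raw_data)

    def __getitem__(self, i):
        if i not in self.cache:  # tokenize only when first requested
            self.cache[i] = self.preprocess_fn(self.raw_data[i])
        return self.cache[i]
```

With large conversation corpora, the lazy form avoids tokenizing the entire dataset before the first training step, which is why the script exposes it behind the --lazy_preprocess flag.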
The train() entry point orchestrates the full workflow: it parses ModelArguments, DataArguments, and TrainingArguments, loads the Yuan2 model with optional RoPE scaling configuration, registers custom tokens, invokes the data module factory, instantiates the HuggingFace Trainer, runs training, and saves the final model.
Usage
Use this when fine-tuning Yuan2-series models (e.g., IEITYuan/Yuan2-2B-hf) on multi-turn conversation data. Choose the appropriate loss mode based on your training objective: last_response_loss for single-turn focus, split_example_loss for per-turn training, or efficient_loss for memory-efficient multi-turn training.
Code Reference
Source Location
- Repository: Lm_sys_FastChat
- File: fastchat/train/train_yuan2.py
- Lines: 1-482
Key Functions
| Function | Description |
|---|---|
| train() | Main entry point: loads Yuan2 model with RoPE scaling, registers custom tokens, trains with HuggingFace Trainer |
| preprocess(sources, tokenizer, data_args) | Tokenizes conversations and applies one of three loss masking modes (split_example_loss, efficient_loss, last_response_loss) |
| make_supervised_data_module(tokenizer, data_args) | Factory that loads JSON data, constructs eager or lazy dataset, and returns data module dict |
| trainer_save_model_safe(trainer) | Safely saves model in distributed training, ensuring only the primary process writes to disk |
Dataclasses
| Dataclass | Key Fields | Description |
|---|---|---|
| ModelArguments | model_name_or_path, trust_remote_code | Model checkpoint path and loading options |
| DataArguments | data_path, lazy_preprocess, last_response_loss, split_example_loss, efficient_loss | Data path and loss masking mode flags |
| TrainingArguments | (extends HuggingFace TrainingArguments) | Standard training configuration with additional custom fields |
Loss Mode Flags
| Flag | Default | Description |
|---|---|---|
| last_response_loss | False | Only compute loss on the final assistant response in each conversation |
| split_example_loss | False | Split multi-turn conversations into individual per-turn examples |
| efficient_loss | False | Compute loss on all assistant tokens in one pass with fine-grained masking |
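Because the three flags are mutually exclusive, a validation step along these lines is a reasonable guard. This is a simplified stand-in, not the script's actual DataArguments, which may validate the flags differently:

```python
from dataclasses import dataclass

@dataclass
class DataArgumentsSketch:
    """Simplified stand-in for the script's DataArguments dataclass."""
    data_path: str = ""
    lazy_preprocess: bool = False
    last_response_loss: bool = False
    split_example_loss: bool = False
    efficient_loss: bool = False

    def loss_mode(self):
        """Return the single enabled loss mode, or raise if flags conflict."""
        enabled = [name for name in
                   ("split_example_loss", "efficient_loss", "last_response_loss")
                   if getattr(self, name)]
        if len(enabled) > 1:
            raise ValueError(f"loss mode flags are mutually exclusive: {enabled}")
        return enabled[0] if enabled else None
```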
Custom Tokens
The training script registers these special tokens into the Yuan2 tokenizer:
custom_tokens = ["<eod>", "<sep>", "<pad>", "<mask>", ...]
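Registration then typically adds the tokens to the tokenizer and resizes the model's embedding matrix so the new token ids get rows. A hedged sketch of the generic HuggingFace pattern (`add_tokens` / `resize_token_embeddings`); the script's exact call sequence and token list may differ:

```python
def register_custom_tokens(tokenizer, model, custom_tokens):
    """Add special tokens and grow the embedding table to match.

    Sketch of the usual transformers pattern; assumes `tokenizer` supports
    add_tokens(..., special_tokens=True) and `model` supports
    resize_token_embeddings(new_vocab_size).
    """
    num_added = tokenizer.add_tokens(custom_tokens, special_tokens=True)
    if num_added > 0:
        # New embedding rows are freshly initialized and learned during
        # fine-tuning; this is why the saved model includes the resized
        # embeddings alongside the updated tokenizer.
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```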
Signature
def train():
    ...
Import
from fastchat.train.train_yuan2 import train
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model_name_or_path | str | Yes | HuggingFace model path for a Yuan2 checkpoint (e.g., IEITYuan/Yuan2-2B-hf) |
| --data_path | str | Yes | Path to training JSON data in ShareGPT conversation format |
| --output_dir | str | Yes | Directory for saving model checkpoints and final weights |
| --last_response_loss | bool | No | Enable loss computation only on the last assistant response (default: False) |
| --split_example_loss | bool | No | Enable per-turn example splitting for loss computation (default: False) |
| --efficient_loss | bool | No | Enable memory-efficient multi-turn loss masking (default: False) |
| --lazy_preprocess | bool | No | If set, use LazySupervisedDataset for deferred tokenization (default: False) |
| --num_train_epochs | int | No | Number of training epochs |
| --per_device_train_batch_size | int | No | Batch size per GPU device during training |
| --learning_rate | float | No | Peak learning rate for the optimizer |
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | Files | Model checkpoints saved in output_dir at configured intervals |
| final_model | Files | Final Yuan2 model weights with custom token embeddings and tokenizer |
| trainer_state | JSON | Training state including loss curves, learning rate schedule, and metrics |
Usage Examples
# Fine-tune Yuan2-2B with efficient loss mode on 4 GPUs
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
--model_name_or_path IEITYuan/Yuan2-2B-hf \
--data_path data/dummy_conversation.json \
--output_dir ./output_yuan2 \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--learning_rate 2e-5 \
--bf16 True \
--efficient_loss True
# Fine-tune Yuan2 with last-response-only loss
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
--model_name_or_path IEITYuan/Yuan2-2B-hf \
--data_path data/dummy_conversation.json \
--output_dir ./output_yuan2_last \
--num_train_epochs 3 \
--last_response_loss True
Related Pages
- Implements: Principle:Lm_sys_FastChat_Distributed_SFT_Training
- Environment:Lm_sys_FastChat_SFT_Training_Environment
- Heuristic:Lm_sys_FastChat_Vicuna_SFT_Training_Hyperparameters
- Implementation:Lm_sys_FastChat_Trainer_Save_Model_Safe
- Implementation:Lm_sys_FastChat_Make_Supervised_Data_Module