Implementation:Lm_sys_FastChat_Train_Yuan2
| Knowledge Sources | |
|---|---|
| Domains | Training, NLP |
| Last Updated | 2026-02-07 06:00 GMT |
Overview
Supervised fine-tuning pipeline for Yuan2 causal language models with advanced loss masking modes and custom token support.
Description
Train Yuan2 implements a supervised fine-tuning pipeline for the Yuan2 family of causal language models, featuring three distinct loss-masking strategies and support for custom special tokens. The module extends the standard FastChat training pattern with Yuan2-specific capabilities: RoPE (Rotary Position Embedding) scaling for handling longer contexts, and registration of custom tokens (<eod>, <sep>, <pad>, <mask>, and others) into both the tokenizer and the model's embedding layer.
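RoPE scaling is typically applied by computing a scaling factor from the desired context length and the model's original maximum position count, then writing it into the model config before loading. A minimal sketch of that arithmetic, using the common transformers `rope_scaling` dict convention (`{"type": "linear", "factor": f}`); the script's exact handling may differ:

```python
def rope_scaling_config(model_max_length, orig_max_position_embeddings):
    """Return a linear rope_scaling dict if the target context exceeds the
    model's original positional range, else None (no scaling needed).

    Sketch of the usual pattern; train_yuan2.py's exact logic may differ.
    """
    factor = model_max_length / orig_max_position_embeddings
    if factor <= 1.0:
        return None
    # Common transformers convention for linear RoPE interpolation.
    return {"type": "linear", "factor": factor}
```

For example, training at 8192 tokens on a model pretrained with 4096 positions yields a linear scaling factor of 2.0, while a target at or below the original range yields no scaling entry.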
The central preprocessing function preprocess(sources, tokenizer, data_args) supports three mutually exclusive loss computation modes controlled by flags in DataArguments:
- split_example_loss: Splits each multi-turn conversation into individual examples, one per assistant response, so the loss is computed independently for each turn. This produces more training examples per conversation but uses more memory.
- efficient_loss: Keeps the full conversation as a single sequence but applies fine-grained masking so loss is computed on all assistant tokens simultaneously. This is the most memory-efficient mode for multi-turn data.
- last_response_loss: Only computes loss on the final assistant response in each conversation, ignoring all prior turns. This is useful when earlier turns serve only as context and the model should focus on the last reply.
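The three modes can be illustrated on a toy token sequence. The sketch below is schematic plain Python, not the script's actual implementation: it assumes conversations arrive as (role, token_ids) pairs and uses the HuggingFace convention of labeling masked positions with -100 (IGNORE_TOKEN_ID) so they are excluded from the loss.

```python
IGNORE_TOKEN_ID = -100  # positions with this label are excluded from the loss

def build_labels(turns, mode):
    """turns: list of (role, token_ids) pairs; returns (input_ids, labels) pairs.

    Illustrative sketch of the three loss-masking modes; the real
    preprocess() in train_yuan2.py operates on tokenized conversations.
    """
    flat = [tok for _, toks in turns for tok in toks]
    # Record the [start, end) span of every assistant turn in the flat sequence.
    assistant_spans, pos = [], 0
    for role, toks in turns:
        if role == "assistant":
            assistant_spans.append((pos, pos + len(toks)))
        pos += len(toks)

    if mode == "split_example_loss":
        # One training example per assistant turn: loss only on that turn.
        examples = []
        for start, end in assistant_spans:
            labels = [IGNORE_TOKEN_ID] * len(flat)
            labels[start:end] = flat[start:end]
            examples.append((flat, labels))
        return examples

    if mode == "efficient_loss":
        # Single example: loss on every assistant token at once.
        labels = [IGNORE_TOKEN_ID] * len(flat)
        for start, end in assistant_spans:
            labels[start:end] = flat[start:end]
        return [(flat, labels)]

    if mode == "last_response_loss":
        # Single example: loss only on the final assistant turn.
        labels = [IGNORE_TOKEN_ID] * len(flat)
        start, end = assistant_spans[-1]
        labels[start:end] = flat[start:end]
        return [(flat, labels)]

    raise ValueError(f"unknown mode: {mode}")
```

Note how split_example_loss returns one example per assistant turn (more examples, more memory), while the other two modes return a single example differing only in how many assistant spans are unmasked.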
The make_supervised_data_module(tokenizer, data_args) factory loads the JSON data and constructs either a SupervisedDataset (eager) or LazySupervisedDataset (lazy) depending on configuration. The trainer_save_model_safe(trainer) utility handles safe model saving, particularly in distributed training settings where only the primary process should write to disk.
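The eager/lazy distinction can be sketched as follows. This is a schematic stand-in, not the actual FastChat classes: the eager dataset tokenizes everything up front, while the lazy variant defers tokenization to `__getitem__` and caches the result, trading startup time for memory.

```python
class LazySupervisedDatasetSketch:
    """Schematic lazy dataset: tokenize each conversation on first access.

    Stand-in for FastChat's LazySupervisedDataset; `preprocess_fn` plays the
    role of preprocess(sources, tokenizer, data_args).
    """

    def __init__(self, raw_data, preprocess_fn):
        self.raw_data = raw_data
        self.preprocess_fn = preprocess_fn
        self.cache = {}

    def __len__(self):
        return len(self.raw_data)

    def __getitem__(self, i):
        if i not in self.cache:  # tokenize only when first requested
            self.cache[i] = self.preprocess_fn(self.raw_data[i])
        return self.cache[i]
```

With large conversation corpora, the lazy form avoids tokenizing the entire dataset before the first training step, which is why the script exposes it behind the --lazy_preprocess flag.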
The train() entry point orchestrates the full workflow: it parses ModelArguments, DataArguments, and TrainingArguments, loads the Yuan2 model with optional RoPE scaling configuration, registers custom tokens, invokes the data module factory, instantiates the HuggingFace Trainer, runs training, and saves the final model.
Usage
Use this when fine-tuning Yuan2-series models (e.g., IEITYuan/Yuan2-2B-hf) on multi-turn conversation data. Choose the appropriate loss mode based on your training objective: last_response_loss for single-turn focus, split_example_loss for per-turn training, or efficient_loss for memory-efficient multi-turn training.
Code Reference
Source Location
- Repository: Lm_sys_FastChat
- File: fastchat/train/train_yuan2.py
- Lines: 1-482
Key Functions
| Function | Description |
|---|---|
| train() | Main entry point: loads Yuan2 model with RoPE scaling, registers custom tokens, trains with HuggingFace Trainer |
| preprocess(sources, tokenizer, data_args) | Tokenizes conversations and applies one of three loss masking modes (split_example_loss, efficient_loss, last_response_loss) |
| make_supervised_data_module(tokenizer, data_args) | Factory that loads JSON data, constructs eager or lazy dataset, and returns data module dict |
| trainer_save_model_safe(trainer) | Safely saves model in distributed training, ensuring only the primary process writes to disk |
Dataclasses
| Dataclass | Key Fields | Description |
|---|---|---|
| ModelArguments | model_name_or_path, trust_remote_code | Model checkpoint path and loading options |
| DataArguments | data_path, lazy_preprocess, last_response_loss, split_example_loss, efficient_loss | Data path and loss masking mode flags |
| TrainingArguments | (extends HuggingFace TrainingArguments) | Standard training configuration with additional custom fields |
Loss Mode Flags
| Flag | Default | Description |
|---|---|---|
| last_response_loss | False | Only compute loss on the final assistant response in each conversation |
| split_example_loss | False | Split multi-turn conversations into individual per-turn examples |
| efficient_loss | False | Compute loss on all assistant tokens in one pass with fine-grained masking |
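Because the three flags are mutually exclusive, a validation step along these lines is a reasonable guard. This is a simplified stand-in, not the script's actual DataArguments, which may validate the flags differently:

```python
from dataclasses import dataclass

@dataclass
class DataArgumentsSketch:
    """Simplified stand-in for the script's DataArguments dataclass."""
    data_path: str = ""
    lazy_preprocess: bool = False
    last_response_loss: bool = False
    split_example_loss: bool = False
    efficient_loss: bool = False

    def loss_mode(self):
        """Return the single enabled loss mode, or raise if flags conflict."""
        enabled = [name for name in
                   ("split_example_loss", "efficient_loss", "last_response_loss")
                   if getattr(self, name)]
        if len(enabled) > 1:
            raise ValueError(f"loss mode flags are mutually exclusive: {enabled}")
        return enabled[0] if enabled else None
```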
Custom Tokens
The training script registers these special tokens into the Yuan2 tokenizer:
custom_tokens = ["<eod>", "<sep>", "<pad>", "<mask>", ...]
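Registration then typically adds the tokens to the tokenizer and resizes the model's embedding matrix so the new token ids get rows. A hedged sketch of the generic HuggingFace pattern (`add_tokens` / `resize_token_embeddings`); the script's exact call sequence and token list may differ:

```python
def register_custom_tokens(tokenizer, model, custom_tokens):
    """Add special tokens and grow the embedding table to match.

    Sketch of the usual transformers pattern; assumes `tokenizer` supports
    add_tokens(..., special_tokens=True) and `model` supports
    resize_token_embeddings(new_vocab_size).
    """
    num_added = tokenizer.add_tokens(custom_tokens, special_tokens=True)
    if num_added > 0:
        # New embedding rows are freshly initialized and learned during
        # fine-tuning; this is why the saved model includes the resized
        # embeddings alongside the updated tokenizer.
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```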
Signature
def train():
    ...
Import
from fastchat.train.train_yuan2 import train
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model_name_or_path | str | Yes | HuggingFace model path for a Yuan2 checkpoint (e.g., IEITYuan/Yuan2-2B-hf) |
| --data_path | str | Yes | Path to training JSON data in ShareGPT conversation format |
| --output_dir | str | Yes | Directory for saving model checkpoints and final weights |
| --last_response_loss | bool | No | Enable loss computation only on the last assistant response (default: False) |
| --split_example_loss | bool | No | Enable per-turn example splitting for loss computation (default: False) |
| --efficient_loss | bool | No | Enable memory-efficient multi-turn loss masking (default: False) |
| --lazy_preprocess | bool | No | If set, use LazySupervisedDataset for deferred tokenization (default: False) |
| --num_train_epochs | int | No | Number of training epochs |
| --per_device_train_batch_size | int | No | Batch size per GPU device during training |
| --learning_rate | float | No | Peak learning rate for the optimizer |
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | Files | Model checkpoints saved in output_dir at configured intervals |
| final_model | Files | Final Yuan2 model weights with custom token embeddings and tokenizer |
| trainer_state | JSON | Training state including loss curves, learning rate schedule, and metrics |
Usage Examples
# Fine-tune Yuan2-2B with efficient loss mode on 4 GPUs
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
--model_name_or_path IEITYuan/Yuan2-2B-hf \
--data_path data/dummy_conversation.json \
--output_dir ./output_yuan2 \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--learning_rate 2e-5 \
--bf16 True \
--efficient_loss True
# Fine-tune Yuan2 with last-response-only loss
torchrun --nproc_per_node=4 -m fastchat.train.train_yuan2 \
--model_name_or_path IEITYuan/Yuan2-2B-hf \
--data_path data/dummy_conversation.json \
--output_dir ./output_yuan2_last \
--num_train_epochs 3 \
--last_response_loss True
Related Pages
- Implements: Principle:Lm_sys_FastChat_Distributed_SFT_Training
- Environment:Lm_sys_FastChat_SFT_Training_Environment
- Heuristic:Lm_sys_FastChat_Vicuna_SFT_Training_Hyperparameters
- Implementation:Lm_sys_FastChat_Trainer_Save_Model_Safe
- Implementation:Lm_sys_FastChat_Make_Supervised_Data_Module