Principle: Volcengine Verl SFT Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Supervised_Learning, NLP |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
A dataset class that loads parquet files with prompt-response pairs and applies chat template tokenization to produce training batches for supervised fine-tuning.
Description
SFT Data Preparation handles the loading and tokenization of supervised fine-tuning data. Unlike RL data preparation which stores prompts and reward config, SFT data contains explicit prompt-response pairs where the model is trained to produce the response given the prompt.
Key features:
- Applies the model's chat template to format prompt and response
- Creates loss masks that only compute loss on response tokens (not prompt tokens)
- Supports truncation strategies (error, left, right) for sequences exceeding max length
- Handles both single-turn (prompt/response columns) and multi-turn (messages column) formats
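For the multi-turn format, the same masking idea applies per message: only assistant turns receive loss. A minimal sketch with a toy whitespace tokenizer (the function name and the split-based tokenization are illustrative stand-ins, not verl's actual API):

```python
def build_multi_turn_example(messages):
    """Concatenate tokenized turns; compute loss only on assistant turns."""
    input_ids, loss_mask = [], []
    for msg in messages:
        tokens = msg["content"].split()  # stand-in for real chat-template tokenization
        input_ids += tokens
        # 1 → token contributes to the loss; 0 → masked out
        flag = 1 if msg["role"] == "assistant" else 0
        loss_mask += [flag] * len(tokens)
    return input_ids, loss_mask

messages = [
    {"role": "user", "content": "hi there"},
    {"role": "assistant", "content": "hello friend pal"},
]
ids, mask = build_multi_turn_example(messages)
# mask is 0 over the 2 user tokens and 1 over the 3 assistant tokens
```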
Usage
Use SFT data preparation when running supervised fine-tuning with verl.trainer.fsdp_sft_trainer. The data should be in parquet format with either:
- `prompt` + `response` columns (single-turn)
- `messages` column (multi-turn, OpenAI format)
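The two record schemas can be illustrated with plain Python dicts (the dispatch helper below is hypothetical, shown only to make the distinction concrete):

```python
# Single-turn schema: explicit prompt/response columns
single_turn = {
    "prompt": "What is the capital of France?",
    "response": "Paris.",
}

# Multi-turn schema: one messages column in OpenAI chat format
multi_turn = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}

def detect_format(record: dict) -> str:
    """Hypothetical helper: report which SFT schema a parquet row uses."""
    if "messages" in record:
        return "multi-turn"
    if "prompt" in record and "response" in record:
        return "single-turn"
    raise ValueError("record matches neither SFT schema")
```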
Theoretical Basis
SFT training minimizes the cross-entropy loss only on response tokens:

$$\mathcal{L}(\theta) = -\sum_{t} m_t \log p_\theta(y_t \mid y_{<t}, x)$$

where the loss mask $m_t$ is 1 for response tokens and 0 for prompt tokens, ensuring prompt tokens do not contribute to the gradient:
```python
# Abstract SFT data preparation
prompt_tokens = tokenize(chat_template(prompt))
response_tokens = tokenize(response)
input_ids = prompt_tokens + response_tokens
loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
```
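The truncation strategies listed above (error, left, right) can be sketched as a single helper; the function name is illustrative, not verl's API:

```python
def truncate(input_ids, loss_mask, max_length, strategy="error"):
    """Apply one of the three truncation strategies to an over-long sequence.

    - "error": refuse to truncate, raise instead
    - "left":  drop tokens from the start (keeps the end of the response)
    - "right": drop tokens from the end
    """
    if len(input_ids) <= max_length:
        return input_ids, loss_mask
    if strategy == "error":
        raise ValueError(f"sequence length {len(input_ids)} exceeds {max_length}")
    if strategy == "left":
        return input_ids[-max_length:], loss_mask[-max_length:]
    if strategy == "right":
        return input_ids[:max_length], loss_mask[:max_length]
    raise ValueError(f"unknown truncation strategy: {strategy}")
```

Note that left truncation can drop prompt tokens while preserving the response (and its loss signal), whereas right truncation may cut off the very tokens the loss is computed on.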