
Principle:Volcengine Verl SFT Data Preparation

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Supervised_Learning, NLP
Last Updated 2026-02-07 14:00 GMT

Overview

A dataset class that loads parquet files with prompt-response pairs and applies chat template tokenization to produce training batches for supervised fine-tuning.

Description

SFT Data Preparation handles loading and tokenizing supervised fine-tuning data. Unlike RL data preparation, which stores only prompts plus a reward configuration, SFT data contains explicit prompt-response pairs: the model is trained to produce the response given the prompt.

Key features:

  • Applies the model's chat template to format prompt and response
  • Creates loss masks that only compute loss on response tokens (not prompt tokens)
  • Supports truncation strategies (error, left, right) for sequences exceeding max length
  • Handles both single-turn (prompt/response columns) and multi-turn (messages column) formats

Usage

Use SFT data preparation when running supervised fine-tuning with verl.trainer.fsdp_sft_trainer. The data should be in parquet format with either:

  • prompt + response columns (single-turn)
  • messages column (multi-turn, OpenAI format)

Theoretical Basis

SFT training minimizes the cross-entropy loss only on response tokens:

$$\mathcal{L}_{\mathrm{SFT}} = -\sum_{t \,\in\, \text{response}} \log \pi_\theta(y_t \mid x_{<t})$$

Where the loss mask ensures prompt tokens do not contribute to the gradient:

# Abstract SFT data preparation
prompt_tokens = tokenize(chat_template(prompt))  # chat-template-formatted prompt
response_tokens = tokenize(response)
input_ids = prompt_tokens + response_tokens
# Loss is computed only where the mask is 1, i.e. on response tokens
loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
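The sketch above can be made runnable with a toy character-level tokenizer standing in for the model's real tokenizer, including the three truncation strategies (error, left, right) mentioned earlier. The function name and defaults here are illustrative, not verl's actual API:

```python
def tokenize(text):
    # Toy tokenizer: one token per character, for illustration only
    return list(text)

def build_sft_example(prompt, response, max_length=32, truncation="right"):
    prompt_tokens = tokenize(prompt)
    response_tokens = tokenize(response)
    input_ids = prompt_tokens + response_tokens
    # Loss is computed only on response tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    if len(input_ids) > max_length:
        if truncation == "error":
            raise ValueError("sequence exceeds max_length")
        elif truncation == "left":
            # Keep the tail: drops early prompt tokens first
            input_ids = input_ids[-max_length:]
            loss_mask = loss_mask[-max_length:]
        else:  # "right"
            # Keep the head: may drop response tokens
            input_ids = input_ids[:max_length]
            loss_mask = loss_mask[:max_length]
    return input_ids, loss_mask

ids, mask = build_sft_example("Q: 2+2? ", "A: 4")
```

Note the trade-off between the strategies: left truncation preserves the response (where the loss lives) at the cost of prompt context, while right truncation can silently cut supervised tokens.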
