
Principle:Volcengine Verl SFT Data Preparation

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Supervised_Learning, NLP
Last Updated 2026-02-07 14:00 GMT

Overview

A dataset class that loads parquet files with prompt-response pairs and applies chat template tokenization to produce training batches for supervised fine-tuning.

Description

SFT Data Preparation handles loading and tokenizing supervised fine-tuning data. Unlike RL data preparation, which stores only prompts plus a reward configuration, SFT data contains explicit prompt-response pairs: the model is trained to produce the response given the prompt.

Key features:

  • Applies the model's chat template to format prompt and response
  • Creates loss masks that only compute loss on response tokens (not prompt tokens)
  • Supports truncation strategies (error, left, right) for sequences exceeding max length
  • Handles both single-turn (prompt/response columns) and multi-turn (messages column) formats

Usage

Use SFT data preparation when running supervised fine-tuning with verl.trainer.fsdp_sft_trainer. The data should be in parquet format with either:

  • prompt + response columns (single-turn)
  • messages column (multi-turn, OpenAI format)

Theoretical Basis

SFT training minimizes the cross-entropy loss only on response tokens:

$$\mathcal{L}_{\mathrm{SFT}} = -\sum_{t \,\in\, \text{response}} \log \pi_\theta(y_t \mid x_{<t})$$

Where the loss mask ensures prompt tokens do not contribute to the gradient:

# Abstract SFT data preparation
prompt_tokens = tokenize(chat_template(prompt))  # chat-template-formatted prompt
response_tokens = tokenize(response)
input_ids = prompt_tokens + response_tokens
# Loss is computed only where the mask is 1, i.e. on response tokens
loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
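The sketch above can be made runnable with a toy character-level tokenizer standing in for the model's real tokenizer, including the three truncation strategies (error, left, right) mentioned earlier. The function name and defaults here are illustrative, not verl's actual API:

```python
def tokenize(text):
    # Toy tokenizer: one token per character, for illustration only
    return list(text)

def build_sft_example(prompt, response, max_length=32, truncation="right"):
    prompt_tokens = tokenize(prompt)
    response_tokens = tokenize(response)
    input_ids = prompt_tokens + response_tokens
    # Loss is computed only on response tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    if len(input_ids) > max_length:
        if truncation == "error":
            raise ValueError("sequence exceeds max_length")
        elif truncation == "left":
            # Keep the tail: drops early prompt tokens first
            input_ids = input_ids[-max_length:]
            loss_mask = loss_mask[-max_length:]
        else:  # "right"
            # Keep the head: may drop response tokens
            input_ids = input_ids[:max_length]
            loss_mask = loss_mask[:max_length]
    return input_ids, loss_mask

ids, mask = build_sft_example("Q: 2+2? ", "A: 4")
```

Note the trade-off between the strategies: left truncation preserves the response (where the loss lives) at the cost of prompt context, while right truncation can silently cut supervised tokens.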
