
Principle:Alibaba ROLL Distillation Dataset Preparation

From Leeroopedia


Knowledge Sources
Domains: Data_Processing, Knowledge_Distillation
Last Updated: 2026-02-07 20:00 GMT

Overview

A data preprocessing principle for preparing instruction-response data for knowledge distillation with optional prompt-inclusive distillation.

Description

Distillation Dataset Preparation follows the SFT data pipeline with one additional option, distill_on_prompt. When enabled, prompt tokens are included in the distillation loss computation rather than masked out. This can improve distillation quality by teaching the student to match the teacher's token-level output distributions on the prompt as well as on the response.

Usage

Use when preparing data for knowledge distillation training.
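As a sketch, a data-preparation config might expose the flag as shown below. The surrounding keys and file layout are illustrative assumptions; only the distill_on_prompt option itself comes from this page.

```yaml
# Hypothetical distillation data config (structure is an assumption,
# not ROLL's actual schema).
data:
  dataset_path: path/to/instruction_response_data.jsonl
  # When true, prompt tokens also contribute to the distillation loss;
  # when false, they are masked with the ignore index (-100).
  distill_on_prompt: true
```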

Theoretical Basis

When distill_on_prompt is True, all tokens contribute to the distillation loss. When False, only response tokens contribute (prompt masked with -100).
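The masking behavior above can be sketched as a small label-building helper. This is a minimal illustration under the conventions stated on this page (ignore index -100), not ROLL's actual implementation; the function and variable names are hypothetical.

```python
# Sketch of distillation label construction for one tokenized example.
# With distill_on_prompt=True, every token carries loss; with False,
# prompt positions are masked with the conventional ignore index -100.

IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids, distill_on_prompt):
    """Return (input_ids, labels) for a single prompt/response pair."""
    input_ids = list(prompt_ids) + list(response_ids)
    if distill_on_prompt:
        # All tokens contribute to the distillation loss.
        labels = list(input_ids)
    else:
        # Only response tokens contribute; prompt positions are ignored.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```

For example, with a 3-token prompt and 2-token response and distill_on_prompt=False, the first three label positions are -100 and only the response tokens are supervised.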

Related Pages

Implemented By

Related Heuristics

No specific heuristics inform this principle.
