Implementation:Alibaba ROLL DistillConfig

Knowledge Sources
Domains: Knowledge_Distillation, Configuration
Last Updated: 2026-02-07 20:00 GMT

Overview

DistillConfig is the concrete configuration dataclass for knowledge distillation provided by the Alibaba ROLL library.

Description

The DistillConfig class extends BaseConfig with distillation-specific parameters for teacher-student training.

Usage

DistillConfig is loaded from a Hydra-managed YAML file when a distillation pipeline is set up.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/distill/distill_config.py
  • Lines: L12-143

Signature

@dataclass
class DistillConfig(BaseConfig):
    """
    Attributes:
        student_pretrain (str): student model path
        teacher_pretrain (str): teacher model path
        student (WorkerConfig): student worker configuration
        teacher (WorkerConfig): teacher worker configuration
        kd_objective (str): one of "forward_kl", "reverse_kl", "adaptive_kl",
            "skewed_forward_kl", "skewed_reverse_kl", "js"
        kd_temperature (float, default 1): student softmax temperature
        teacher_temperature (float, default 1): teacher softmax temperature
        distill_loss_weight (float, default 0.5): weight for distillation loss
        logits_topk (int, default 64): top-k teacher logits
        logits_transfer_backend (str, default "ipc+nccl"): transfer backend
        distill_on_prompt (bool, default False): include prompt in distillation
    """
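
To make the documented fields and defaults easier to experiment with outside ROLL, the following is a minimal stand-in sketch. The class name `DistillSettingsSketch`, the `KD_OBJECTIVES` tuple, the `"forward_kl"` default, and the `__post_init__` checks are all assumptions made for illustration; the real `DistillConfig` extends `BaseConfig`, carries `WorkerConfig` fields, and may validate differently or not at all.

```python
from dataclasses import dataclass

# Objective names copied from the docstring above.
KD_OBJECTIVES = (
    "forward_kl", "reverse_kl", "adaptive_kl",
    "skewed_forward_kl", "skewed_reverse_kl", "js",
)


@dataclass
class DistillSettingsSketch:
    """Hypothetical stand-in for the scalar fields; NOT ROLL's DistillConfig."""

    student_pretrain: str
    teacher_pretrain: str
    kd_objective: str = "forward_kl"  # assumption: the docstring lists no default
    kd_temperature: float = 1.0
    teacher_temperature: float = 1.0
    distill_loss_weight: float = 0.5
    logits_topk: int = 64
    logits_transfer_backend: str = "ipc+nccl"
    distill_on_prompt: bool = False

    def __post_init__(self):
        # Assumed sanity checks, added only for this sketch.
        if self.kd_objective not in KD_OBJECTIVES:
            raise ValueError(f"unknown kd_objective: {self.kd_objective!r}")
        if self.kd_temperature <= 0 or self.teacher_temperature <= 0:
            raise ValueError("softmax temperatures must be positive")


# Build from a plain mapping, similar in spirit to a parsed YAML file.
settings = DistillSettingsSketch(
    **{
        "student_pretrain": "path/to/student",  # placeholder path
        "teacher_pretrain": "path/to/teacher",  # placeholder path
        "kd_objective": "reverse_kl",
    }
)
```

Any field left out of the mapping falls back to the default listed in the docstring, which is the same behaviour one would expect when a YAML file omits a key.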

Import

from roll.pipeline.distill.distill_config import DistillConfig

I/O Contract

Inputs

Name | Type | Required | Description
YAML config file | str | Yes | Hydra-managed YAML

Outputs

Name | Type | Description
DistillConfig | DistillConfig | Config with student and teacher WorkerConfigs

Usage Examples

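import dacite
from hydra import compose, initialize
from omegaconf import OmegaConf

from roll.pipeline.distill.distill_config import DistillConfig

# config_path is resolved relative to the calling file and should point at
# the directory that contains distill_megatron.yaml.
initialize(config_path="examples/qwen2.5-7B-distill_megatron")
cfg = compose(config_name="distill_megatron")

# Resolve interpolations, then build the typed dataclass from the plain dict.
config = dacite.from_dict(data_class=DistillConfig, data=OmegaConf.to_container(cfg, resolve=True))
print(config.kd_objective)         # e.g. "forward_kl", depending on the YAML
print(config.distill_loss_weight)  # 0.5 unless the YAML overrides it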
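
To give a feel for what `kd_temperature`, `teacher_temperature`, and `distill_loss_weight` typically control, here is a minimal pure-Python sketch of a forward-KL distillation term for a single token position. It is not ROLL's implementation: the helper names `softmax`, `forward_kl`, and `combined_loss` are hypothetical, and the convex mix `(1 - w) * task + w * distill` is one common convention assumed here, not a confirmed detail of the library.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher_logits, student_logits,
               teacher_temperature=1.0, kd_temperature=1.0):
    """KL(teacher || student) for one token position."""
    p_teacher = softmax(teacher_logits, teacher_temperature)
    p_student = softmax(student_logits, kd_temperature)
    return sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )


def combined_loss(task_loss, distill_loss, distill_loss_weight=0.5):
    """Assumed weighting: convex mix of the task loss and the distillation loss."""
    return (1.0 - distill_loss_weight) * task_loss + distill_loss_weight * distill_loss


# Toy logits over a three-token vocabulary.
kd = forward_kl([2.0, 1.0, 0.1], [1.5, 1.2, 0.3])
loss = combined_loss(task_loss=1.0, distill_loss=kd, distill_loss_weight=0.5)
```

Raising either temperature flattens the corresponding distribution, so the student is pushed to match more of the teacher's probability mass on lower-ranked tokens. A real pipeline would compute this over batched tensors and, given `logits_topk`, likely only over the teacher's top-k entries.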

Related Pages

Heuristics Applied

No specific heuristics apply to this implementation.
