Implementation:Alibaba ROLL DistillConfig
| Knowledge Sources | |
|---|---|
| Domains | Knowledge_Distillation, Configuration |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete knowledge distillation configuration dataclass provided by the Alibaba ROLL library.
Description
The DistillConfig class extends BaseConfig with distillation-specific parameters for teacher-student training.
Usage
Distillation pipelines compose a YAML file with Hydra and then convert the resulting config into a DistillConfig instance.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/distill/distill_config.py
- Lines: L12-143
Signature
@dataclass
class DistillConfig(BaseConfig):
    """
    Attributes:
        student_pretrain: str - student model path
        teacher_pretrain: str - teacher model path
        student: WorkerConfig - student worker configuration
        teacher: WorkerConfig - teacher worker configuration
        kd_objective: str - "forward_kl"/"reverse_kl"/"adaptive_kl"/"skewed_forward_kl"/"skewed_reverse_kl"/"js"
        kd_temperature: float = 1 - student softmax temperature
        teacher_temperature: float = 1 - teacher softmax temperature
        distill_loss_weight: float = 0.5 - weight for distillation loss
        logits_topk: int = 64 - top-k teacher logits
        logits_transfer_backend: str = "ipc+nccl" - transfer backend
        distill_on_prompt: bool = False - include prompt in distillation
    """
Import
from roll.pipeline.distill.distill_config import DistillConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| YAML config file | str | Yes | Path to a Hydra-managed YAML file that defines the distillation settings |
Outputs
| Name | Type | Description |
|---|---|---|
| DistillConfig | DistillConfig | Config with student and teacher WorkerConfigs |
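The sketch below illustrates how fields such as kd_temperature, teacher_temperature, logits_topk, and distill_loss_weight might be consumed by a training loop. It is not taken from ROLL: the function names are hypothetical, the forward KL is renormalised over the teacher's top-k token ids, and the convex combination of losses is one common convention that is assumed here, so the library's actual computation may differ.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def topk_forward_kl(student_logits, teacher_logits, k,
                    kd_temperature=1.0, teacher_temperature=1.0):
    """KL(teacher || student) restricted to the teacher's top-k token ids.

    Both distributions are renormalised over those k ids (an assumption).
    """
    top_ids = sorted(range(len(teacher_logits)),
                     key=lambda i: teacher_logits[i], reverse=True)[:k]
    log_p = log_softmax([teacher_logits[i] for i in top_ids], teacher_temperature)
    log_q = log_softmax([student_logits[i] for i in top_ids], kd_temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def combined_loss(task_loss, distill_loss, distill_loss_weight=0.5):
    """Assumed convention: convex mix of task loss and distillation loss."""
    return (1.0 - distill_loss_weight) * task_loss + distill_loss_weight * distill_loss
```

With identical student and teacher logits the KL term is zero, and with a weight of 0.5 the two losses contribute equally to the total.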
Usage Examples
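from hydra import compose, initialize
import dacite
from omegaconf import OmegaConf
from roll.pipeline.distill.distill_config import DistillConfig

# config_path is resolved relative to the calling file
initialize(config_path="examples/qwen2.5-7B-distill_megatron")
cfg = compose(config_name="distill_megatron")
# Resolve interpolations, then convert the plain dict into the dataclass
config = dacite.from_dict(data_class=DistillConfig, data=OmegaConf.to_container(cfg, resolve=True))
print(config.kd_objective)  # e.g. "forward_kl", depending on the YAML
print(config.distill_loss_weight)  # e.g. 0.5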
Related Pages
Heuristics Applied
No specific heuristics apply to this implementation.