Implementation:Alibaba ROLL DPO Get Encode Function

From Leeroopedia


Knowledge Sources
Domains Data_Processing, Alignment
Last Updated 2026-02-07 20:00 GMT

Overview

Concrete tools for encoding and collating preference data for DPO training, provided by the Alibaba ROLL library.

Description

get_encode_function returns a callable that encodes chosen/rejected response pairs using a chat template. DataCollatorWithPaddingForDPO interleaves the chosen and rejected sequences and pads them into training batches.
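The interleave-and-pad step can be illustrated with a minimal sketch in plain Python. This is not ROLL's implementation: the function name `interleave_and_pad`, the keys `chosen_ids` and `rejected_ids`, right-side padding, and the chosen-before-rejected row order are all assumptions made for illustration.

```python
from typing import Dict, List


def interleave_and_pad(
    pairs: List[Dict[str, List[int]]], pad_token_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Toy collator: interleave chosen/rejected rows and right-pad them."""
    # Rows 2*i and 2*i+1 hold the chosen and rejected sequences of pair i.
    # This ordering is an assumption of the sketch, not a confirmed ROLL detail.
    sequences: List[List[int]] = []
    for pair in pairs:
        sequences.append(pair["chosen_ids"])
        sequences.append(pair["rejected_ids"])

    # Pad every row to the longest sequence in the batch.
    max_len = max(len(seq) for seq in sequences)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two preference pairs (B = 2) produce 2*B = 4 rows of equal length.
batch = interleave_and_pad(
    [
        {"chosen_ids": [5, 6, 7], "rejected_ids": [5, 8]},
        {"chosen_ids": [9], "rejected_ids": [9, 10, 11, 12]},
    ]
)
```

With two input pairs, the result has four rows, each padded to the longest sequence, and the attention mask marks padded positions with 0.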

Usage

Both components are used during DPO pipeline initialization to prepare the training dataloader.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/dpo/dpo_pipeline.py
  • Lines: L29-91

Signature

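from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

from transformers import PreTrainedTokenizerBase

def get_encode_function(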
    template_name: str,
    tokenizer,
    chosen_key: str,
    rejected_key: str
) -> Callable:
    """
    Create encoding function for DPO data.

    Args:
        template_name: Chat template name
        tokenizer: Tokenizer instance
        chosen_key: Dataset key for chosen responses
        rejected_key: Dataset key for rejected responses

    Returns:
        Callable that encodes chosen/rejected pairs
    """

@dataclass
class DataCollatorWithPaddingForDPO:
    tokenizer: PreTrainedTokenizerBase
    max_length: Optional[int] = None
    return_tensors: str = "pt"

    def __call__(self, batch: List[Dict]) -> Dict[str, Any]:
        """Collate batch with interleaved chosen/rejected sequences."""
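The factory pattern suggested by the get_encode_function signature (configure once, then return a per-example callable) might look roughly like the sketch below. Everything here is hypothetical: `make_encode_function`, the `render` and `tokenize` stand-ins, the `prompt` key, and the output keys are illustrative choices, and the real function takes a template name and tokenizer rather than these toy callables.

```python
from typing import Callable, Dict, List


def make_encode_function(
    render: Callable[[str, str], str],
    tokenize: Callable[[str], List[int]],
    chosen_key: str,
    rejected_key: str,
) -> Callable[[Dict[str, str]], Dict[str, List[int]]]:
    """Return a closure that encodes one preference example (hypothetical)."""

    def encode(example: Dict[str, str]) -> Dict[str, List[int]]:
        # The "prompt" key is an assumption of this sketch.
        prompt = example["prompt"]
        return {
            "chosen_ids": tokenize(render(prompt, example[chosen_key])),
            "rejected_ids": tokenize(render(prompt, example[rejected_key])),
        }

    return encode


def toy_render(prompt: str, response: str) -> str:
    # Stand-in for a chat template: wraps the two turns in role markers.
    return f"<user>{prompt}</user><assistant>{response}</assistant>"


def toy_tokenize(text: str) -> List[int]:
    # Stand-in for a tokenizer: one id per character.
    return [ord(ch) for ch in text]


encode_fn = make_encode_function(toy_render, toy_tokenize, "chosen", "rejected")
encoded = encode_fn({"prompt": "Hi", "chosen": "Hello!", "rejected": "No."})
```

Capturing the template and key names in a closure keeps the returned callable to a single-argument function, which is convenient for mapping over a dataset.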

Import

from roll.pipeline.dpo.dpo_pipeline import get_encode_function
from roll.datasets.collator import DataCollatorWithPaddingForDPO

I/O Contract

Inputs

Name       Type                 Required  Description
dataset    datasets.Dataset     Yes       Preference dataset with chosen/rejected pairs
tokenizer  PreTrainedTokenizer  Yes       Model tokenizer

Outputs

Name        Type        Description
DataLoader  DataLoader  Batches of shape (2*B, seq_len), with chosen and rejected sequences interleaved
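A consumer of a (2*B, seq_len) interleaved batch typically needs to separate the two halves again. The sketch below shows one way to do that with strided slicing. It assumes chosen sequences occupy even rows and rejected sequences odd rows, which the source does not confirm, and `split_interleaved` is a hypothetical helper rather than part of ROLL.

```python
from typing import List, Tuple


def split_interleaved(
    rows: List[List[int]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split 2*B interleaved rows into (chosen, rejected), B rows each."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (2*B)")
    # Assumed layout: even rows are chosen, odd rows are rejected.
    return rows[0::2], rows[1::2]


# B = 2 pairs, seq_len = 3, padded with 0.
rows = [[1, 2, 0], [1, 3, 4], [5, 0, 0], [5, 6, 0]]
chosen, rejected = split_interleaved(rows)
```

The same strided indexing applies to tensors, so per-pair quantities such as chosen and rejected log-probabilities stay aligned by index.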

Usage Examples

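from roll.pipeline.dpo.dpo_pipeline import get_encode_function, preprocess_dataset

# Assumes `tokenizer` and `dataset` (with "chosen" and "rejected" columns) are already loaded.
# Build the encoder for the "qwen2_5" chat template, then apply it across the dataset.
encode_fn = get_encode_function("qwen2_5", tokenizer, "chosen", "rejected")
processed = preprocess_dataset(dataset, prompt_len=1024, encode_function=encode_fn, num_proc=4)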

Related Pages

Implements Principle

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

No specific heuristics apply to this implementation.
