Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Hiyouga LLaMA Factory Pairwise Processor

From Leeroopedia
Revision as of 15:06, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Hiyouga_LLaMA_Factory_Pairwise_Processor.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Data Processing, Preference Learning
Last Updated 2026-02-06 19:00 GMT

Overview

Dataset processor for pairwise preference training (DPO/reward modeling) that encodes chosen and rejected response pairs against a shared prompt.

Description

The PairwiseDatasetProcessor class extends DatasetProcessor to prepare training data for DPO (Direct Preference Optimization) and reward model training. For each example, it encodes both the chosen response (index 0) and rejected response (index 1) against the same prompt, sharing the tokenized prompt IDs. Sequence truncation via infer_seqlen uses the longer of the two responses to determine the cutoff budget. Prompt tokens are masked with IGNORE_INDEX in both chosen and rejected label sequences, ensuring the loss is computed only on response tokens.

Usage

Use this processor when preparing datasets for DPO or reward model training stages. It is selected automatically by the data loading pipeline when the training configuration specifies a pairwise preference task. Each input example must contain at least two responses (chosen and rejected).

Code Reference

Source Location

Signature

class PairwiseDatasetProcessor(DatasetProcessor):
    def _encode_data_example(
        self,
        prompt: list[dict[str, str]],
        response: list[dict[str, str]],
        system: Optional[str],
        tools: Optional[str],
        images: list["ImageInput"],
        videos: list["VideoInput"],
        audios: list["AudioInput"],
    ) -> tuple[list[int], list[int], list[int], list[int]]

    def preprocess_dataset(self, examples: dict[str, list[Any]]) -> dict[str, list[Any]]

    def print_data_example(self, example: dict[str, list[int]]) -> None

Import

from llamafactory.data.processor.pairwise import PairwiseDatasetProcessor

I/O Contract

Inputs

Name Type Required Description
examples dict[str, list[Any]] Yes Batch of raw examples with keys _prompt, _response, _system, _tools, _images, _videos, _audios
_prompt[i] list[dict[str, str]] Yes Conversation prompt messages (must have odd length)
_response[i] list[dict[str, str]] Yes Response pair: index 0 is chosen, index 1 is rejected (must have at least 2 entries)

Outputs

Name Type Description
chosen_input_ids list[list[int]] Tokenized input sequences for the chosen response path
chosen_attention_mask list[list[int]] Attention masks for chosen sequences (all ones)
chosen_labels list[list[int]] Labels for chosen sequences with prompt tokens masked as IGNORE_INDEX
rejected_input_ids list[list[int]] Tokenized input sequences for the rejected response path
rejected_attention_mask list[list[int]] Attention masks for rejected sequences (all ones)
rejected_labels list[list[int]] Labels for rejected sequences with prompt tokens masked as IGNORE_INDEX

Usage Examples

from llamafactory.data.processor.pairwise import PairwiseDatasetProcessor

# Instantiate with required dependencies
processor = PairwiseDatasetProcessor(
    template=template,
    tokenizer=tokenizer,
    processor=None,
    data_args=data_args,
)

# Preprocess a batch of examples
model_inputs = processor.preprocess_dataset(examples)
# model_inputs keys: chosen_input_ids, chosen_labels, rejected_input_ids, rejected_labels, etc.

# Debug: print chosen and rejected for a single example
processor.print_data_example(model_inputs[0])

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment