Implementation:Hiyouga LLaMA Factory Pairwise Processor
| Knowledge Sources | |
|---|---|
| Domains | Data Processing, Preference Learning |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Dataset processor for pairwise preference training (DPO/reward modeling) that encodes chosen and rejected response pairs against a shared prompt.
Description
The PairwiseDatasetProcessor class extends DatasetProcessor to prepare training data for DPO (Direct Preference Optimization) and reward model training. For each example, it encodes both the chosen response (index 0) and rejected response (index 1) against the same prompt, sharing the tokenized prompt IDs. Sequence truncation via infer_seqlen uses the longer of the two responses to determine the cutoff budget. Prompt tokens are masked with IGNORE_INDEX in both chosen and rejected label sequences, ensuring the loss is computed only on response tokens.
Usage
Use this processor when preparing datasets for DPO or reward model training stages. It is selected automatically by the data loading pipeline when the training configuration specifies a pairwise preference task. Each input example must contain at least two responses (chosen and rejected).
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/data/processor/pairwise.py
- Lines: 1-118
Signature
class PairwiseDatasetProcessor(DatasetProcessor):
def _encode_data_example(
self,
prompt: list[dict[str, str]],
response: list[dict[str, str]],
system: Optional[str],
tools: Optional[str],
images: list["ImageInput"],
videos: list["VideoInput"],
audios: list["AudioInput"],
) -> tuple[list[int], list[int], list[int], list[int]]
def preprocess_dataset(self, examples: dict[str, list[Any]]) -> dict[str, list[Any]]
def print_data_example(self, example: dict[str, list[int]]) -> None
Import
from llamafactory.data.processor.pairwise import PairwiseDatasetProcessor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| examples | dict[str, list[Any]] |
Yes | Batch of raw examples with keys _prompt, _response, _system, _tools, _images, _videos, _audios |
| _prompt[i] | list[dict[str, str]] |
Yes | Conversation prompt messages (must have odd length) |
| _response[i] | list[dict[str, str]] |
Yes | Response pair: index 0 is chosen, index 1 is rejected (must have at least 2 entries) |
Outputs
| Name | Type | Description |
|---|---|---|
| chosen_input_ids | list[list[int]] |
Tokenized input sequences for the chosen response path |
| chosen_attention_mask | list[list[int]] |
Attention masks for chosen sequences (all ones) |
| chosen_labels | list[list[int]] |
Labels for chosen sequences with prompt tokens masked as IGNORE_INDEX |
| rejected_input_ids | list[list[int]] |
Tokenized input sequences for the rejected response path |
| rejected_attention_mask | list[list[int]] |
Attention masks for rejected sequences (all ones) |
| rejected_labels | list[list[int]] |
Labels for rejected sequences with prompt tokens masked as IGNORE_INDEX |
Usage Examples
from llamafactory.data.processor.pairwise import PairwiseDatasetProcessor
# Instantiate with required dependencies
processor = PairwiseDatasetProcessor(
template=template,
tokenizer=tokenizer,
processor=None,
data_args=data_args,
)
# Preprocess a batch of examples
model_inputs = processor.preprocess_dataset(examples)
# model_inputs keys: chosen_input_ids, chosen_labels, rejected_input_ids, rejected_labels, etc.
# Debug: print chosen and rejected for a single example
processor.print_data_example(model_inputs[0])
Related Pages
- Hiyouga_LLaMA_Factory_Processor_Utils - Provides the DatasetProcessor base class and infer_seqlen utility
- Hiyouga_LLaMA_Factory_Feedback_Processor - Alternative processor for KTO-style feedback training
- Hiyouga_LLaMA_Factory_Supervised_Processor - Processor for standard supervised fine-tuning
- Hiyouga_LLaMA_Factory_Data_Args - DataArguments controlling cutoff_len and processing parameters