Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Hiyouga LLaMA Factory Feedback Processor

From Leeroopedia


Knowledge Sources
Domains Data Processing, Preference Learning
Last Updated 2026-02-06 19:00 GMT

Overview

Dataset processor for KTO (Kahneman-Tversky Optimization) training that encodes examples into paired target and KL-reference sequences with desirable/undesirable preference tags.

Description

The FeedbackDatasetProcessor class extends DatasetProcessor to prepare training data for KTO-style preference learning. For each example, it determines whether the response is desirable or undesirable based on content presence, then encodes both the target sequence and a KL-reference sequence. The KL-reference is created by shifting responses by +1 across the batch to produce mismatched prompt-completion pairs. Prompt tokens are masked with IGNORE_INDEX in labels, and a boolean kto_tag tracks the preference direction per example.

Usage

Use this processor when preparing datasets for KTO training. It is selected automatically by the data loading pipeline when the training stage requires feedback-style preference data. The processor validates that each batch contains both desirable and undesirable examples, logging a warning if only one preference type is present.

Code Reference

Source Location

Signature

class FeedbackDatasetProcessor(DatasetProcessor):
    def _encode_data_example(
        self,
        prompt: list[dict[str, str]],
        response: list[dict[str, str]],
        kl_response: list[dict[str, str]],
        system: Optional[str],
        tools: Optional[str],
        images: list["ImageInput"],
        videos: list["VideoInput"],
        audios: list["AudioInput"],
    ) -> tuple[list[int], list[int], list[int], list[int], bool]

    def preprocess_dataset(self, examples: dict[str, list[Any]]) -> dict[str, list[Any]]

    def print_data_example(self, example: dict[str, list[int]]) -> None

Import

from llamafactory.data.processor.feedback import FeedbackDatasetProcessor

I/O Contract

Inputs

Name Type Required Description
examples dict[str, list[Any]] Yes Batch of raw examples with keys _prompt, _response, _system, _tools, _images, _videos, _audios
_prompt[i] list[dict[str, str]] Yes Conversation prompt messages (must have odd length)
_response[i] list[dict[str, str]] Yes Response pair: index 0 is the desired response, index 1 is the undesired response (must have at least 2 entries)

Outputs

Name Type Description
input_ids list[list[int]] Tokenized target input sequences
attention_mask list[list[int]] Attention masks for target sequences (all ones)
labels list[list[int]] Target labels with prompt tokens masked as IGNORE_INDEX
kl_input_ids list[list[int]] Tokenized KL-reference input sequences (mismatched pairs)
kl_attention_mask list[list[int]] Attention masks for KL-reference sequences
kl_labels list[list[int]] KL-reference labels with prompt tokens masked
kto_tags list[bool] True for desirable examples, False for undesirable

Usage Examples

from llamafactory.data.processor.feedback import FeedbackDatasetProcessor

# Instantiate with required dependencies
processor = FeedbackDatasetProcessor(
    template=template,
    tokenizer=tokenizer,
    processor=None,
    data_args=data_args,
)

# Preprocess a batch of examples
model_inputs = processor.preprocess_dataset(examples)
# model_inputs contains: input_ids, labels, kl_input_ids, kl_labels, kto_tags, etc.

# Debug: print a single example
processor.print_data_example(model_inputs[0])

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment