

Implementation:Hiyouga LLaMA Factory Data Formatter



Knowledge Sources
Domains: Data Processing, Template System
Last Updated: 2026-02-06 19:00 GMT

Overview

Data Formatter defines the abstract formatter hierarchy and concrete implementations for encoding chat messages, tool calls, and tool descriptions into template slot sequences.

Description

The module provides an abstract Formatter base class with two methods: apply(), which renders slots from the given inputs, and extract(), which extracts function calls from response text. Four concrete implementations build on it. EmptyFormatter returns its static slot sequence without any placeholder substitution, and rejects slots that contain placeholders when the formatter is constructed. StringFormatter replaces {{name}} placeholders with the matching keyword arguments, and requires at least one placeholder to exist. FunctionFormatter extends StringFormatter: it parses JSON function-call content, extracts thought and tool-call blocks using configurable delimiter words, and formats the result through the configured tool utilities. ToolFormatter formats tool-definition JSON into system-prompt text, and extracts function calls from model responses using the tool utilities' extractor.

Usage

Use these formatters when defining chat templates. Each template role (user, assistant, function, tool) uses a specific formatter to convert message content into the model's expected token slot format. The formatters are instantiated during template construction and called during encoding.
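A toy template can illustrate how formatters are attached to roles and then called during encoding. Everything in this sketch is hypothetical: the dictionary keyed by role, the RoleSlots formatter, and the encode_messages helper. LLaMA Factory's real template class handles more cases, and those are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class RoleSlots:
    """Hypothetical minimal formatter: slots with {{name}} substitution."""
    slots: list

    def apply(self, **kwargs) -> list:
        out = []
        for slot in self.slots:
            if isinstance(slot, str):
                for name, value in kwargs.items():
                    slot = slot.replace("{{" + name + "}}", value)
            out.append(slot)
        return out


# Hypothetical template: one formatter per role, chosen when the template is built.
TOY_TEMPLATE = {
    "user": RoleSlots(["<|user|>\n{{content}}<|end|>\n"]),
    "assistant": RoleSlots(["<|assistant|>\n{{content}}<|end|>\n"]),
}


def encode_messages(template: dict, messages: list[dict]) -> list:
    """Render each message through its role's formatter and concatenate the slots."""
    slots = []
    for message in messages:
        formatter = template[message["role"]]
        slots.extend(formatter.apply(content=message["content"]))
    return slots
```

Because the role-to-formatter lookup is resolved once when the template is built, the encoding loop only has to call apply() on each message.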

Code Reference

Source Location

Signature

@dataclass
class Formatter(ABC):
    slots: SLOTS = field(default_factory=list)
    tool_format: str | None = None

    @abstractmethod
    def apply(self, **kwargs) -> SLOTS: ...

    def extract(self, content: str) -> str | list["FunctionCall"]: ...

@dataclass
class EmptyFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class StringFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class FunctionFormatter(StringFormatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class ToolFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...
    def extract(self, content: str) -> str | list["FunctionCall"]: ...

Import

from llamafactory.data.formatter import (
    Formatter,
    EmptyFormatter,
    StringFormatter,
    FunctionFormatter,
    ToolFormatter,
)

I/O Contract

Inputs

Name | Type | Required | Description
slots | SLOTS | Yes | Template slot sequence (strings, sets, or dicts) defining the format pattern
tool_format | str | No | Tool format identifier (e.g., "default", "glm4") used to select the tool utilities
**kwargs (apply) | dict | Varies | Key-value pairs for placeholder substitution; "content" is required for FunctionFormatter and ToolFormatter
content (extract) | str | Yes, for extract() | Model response text from which function calls are extracted
thought_words | list[str] | No | Two-element list of start and end delimiters for thought blocks (FunctionFormatter)
tool_call_words | list[str] | No | Two-element list of start and end delimiters for tool-call blocks (FunctionFormatter)
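The sketch below makes the thought_words delimiters and the JSON function-call payload concrete. It approximates two parsing steps FunctionFormatter is described as performing: splitting off a thought block, then decoding the JSON call or calls. The helper names, the SketchFunctionCall tuple, the "name"/"arguments" keys, and accepting either a single object or a list are all assumptions. How the real class renders the parsed calls is delegated to the tool utilities and is not shown.

```python
import json
from typing import NamedTuple, Optional


class SketchFunctionCall(NamedTuple):
    """Hypothetical (name, arguments) pair; arguments are kept as a JSON string."""
    name: str
    arguments: str


def split_thought(content: str, thought_words: tuple[str, str]) -> tuple[Optional[str], str]:
    """Split a leading thought block, delimited by thought_words, from the rest."""
    start, end = thought_words
    stripped = content.lstrip()
    if stripped.startswith(start):
        end_idx = stripped.find(end, len(start))
        if end_idx != -1:
            cut = end_idx + len(end)
            return stripped[:cut], stripped[cut:].lstrip()
    return None, content  # no complete thought block: leave content untouched


def parse_function_calls(content: str) -> list[SketchFunctionCall]:
    """Decode one JSON call object, or a list of them, with "name" and "arguments" keys."""
    try:
        calls = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in function message: {content!r}") from exc
    if not isinstance(calls, list):  # a single call becomes a one-element list
        calls = [calls]
    return [
        SketchFunctionCall(call["name"], json.dumps(call["arguments"], ensure_ascii=False))
        for call in calls
    ]
```

Each call's arguments are re-serialized with json.dumps. This normalizes them to a single string form before any format-specific rendering.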

Outputs

Name | Type | Description
apply() return | SLOTS, i.e. list[str | set[str] | dict[str, str]] | Rendered template slots ready for tokenization
extract() return | str or list[FunctionCall] | Plain response text, or the structured function calls extracted from it
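Rendered slots are described above as ready for tokenization. The sketch below shows one plausible way to flatten a mixed slot list into token IDs. Several details are assumptions made for illustration: the SLOTS alias, the {"token": ...} dict convention, the "bos_token"/"eos_token" set markers, and the ToyTokenizer class. This is not the library's code.

```python
from typing import Union

# Assumed alias mirroring the Outputs table: strings, sets of special-token names, or dicts.
SLOTS = list[Union[str, set[str], dict[str, str]]]


class ToyTokenizer:
    """Hypothetical whitespace tokenizer with fixed special-token ids."""
    bos_token_id = 1
    eos_token_id = 2

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {"<|user|>": 10}

    def encode(self, text: str) -> list[int]:
        # Unknown words get fresh ids starting at 100 + current vocabulary size.
        return [self.vocab.setdefault(word, len(self.vocab) + 100) for word in text.split()]

    def convert_tokens_to_ids(self, token: str) -> int:
        return self.vocab[token]


def slots_to_ids(tokenizer: ToyTokenizer, slots: SLOTS) -> list[int]:
    """Flatten rendered slots into token ids (a sketch, not the library's code)."""
    ids: list[int] = []
    for elem in slots:
        if isinstance(elem, str):
            if elem:  # skip empty strings
                ids.extend(tokenizer.encode(elem))
        elif isinstance(elem, dict):
            ids.append(tokenizer.convert_tokens_to_ids(elem["token"]))
        elif isinstance(elem, set):
            if "bos_token" in elem:
                ids.append(tokenizer.bos_token_id)
            if "eos_token" in elem:
                ids.append(tokenizer.eos_token_id)
        else:
            raise TypeError(f"Unsupported slot element: {elem!r}")
    return ids
```

In this sketch, sets name special tokens symbolically, so a template does not need to know a particular tokenizer's BOS or EOS IDs.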

Usage Examples

from llamafactory.data.formatter import StringFormatter, EmptyFormatter, ToolFormatter

# StringFormatter with placeholder substitution
formatter = StringFormatter(slots=["<|user|>\n{{content}}<|end|>"])
result = formatter.apply(content="Hello, world!")
# result: ["<|user|>\nHello, world!<|end|>"]

# EmptyFormatter for static tokens
eos_formatter = EmptyFormatter(slots=["</s>"])
result = eos_formatter.apply()
# result: ["</s>"]

# ToolFormatter for tool descriptions
tool_formatter = ToolFormatter(slots=["{{content}}"], tool_format="default")
tools_json = '[{"name": "get_weather", "description": "Get weather", "parameters": {}}]'
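result = tool_formatter.apply(content=tools_json)
# result: a one-element list holding the tool description text rendered by the selected tool utilities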

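# Extract function calls from a response (the "default" format marks calls with Action / Action Input lines)
extracted = tool_formatter.extract("Action: get_weather\nAction Input: {\"location\": \"NYC\"}")
# extracted: a list of FunctionCall(name, arguments); text without a call is returned unchanged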
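The sketch below assumes the "default" tool format marks calls with ReAct-style "Action:" and "Action Input:" lines, and uses a regular expression to extract them. Three parts are assumptions, not a verified copy of the library's extractor: the pattern, the SketchFunctionCall tuple, and the rule that any malformed JSON makes the whole response fall back to plain text.

```python
import json
import re
from typing import NamedTuple, Union


class SketchFunctionCall(NamedTuple):
    """Hypothetical (name, arguments) pair; arguments are kept as a JSON string."""
    name: str
    arguments: str


# Assumed ReAct-style layout; the lookahead stops each input at the next "Action:" or at the end.
ACTION_PATTERN = re.compile(
    r"Action:\s*([a-zA-Z0-9_]+)\s*Action Input:\s*(.+?)(?=\s*Action:|\s*$)",
    re.DOTALL,
)


def sketch_extract(content: str) -> Union[str, list[SketchFunctionCall]]:
    """Return parsed calls, or the original text if no valid call is found."""
    matches = ACTION_PATTERN.findall(content)
    if not matches:
        return content  # ordinary reply: no tool call present
    calls = []
    for name, raw_args in matches:
        try:
            args = json.loads(raw_args.strip())
        except json.JSONDecodeError:
            return content  # malformed arguments: treat the whole reply as text
        calls.append(SketchFunctionCall(name, json.dumps(args, ensure_ascii=False)))
    return calls
```

Because the lookahead stops at the next "Action:", several calls in one response (parallel calls) come back as separate list entries.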
