Implementation:Hiyouga LLaMA Factory Data Formatter
| Knowledge Sources | |
|---|---|
| Domains | Data Processing, Template System |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
The Data Formatter module defines an abstract formatter hierarchy and concrete implementations that encode chat messages, tool calls, and tool descriptions into template slot sequences.
Description
The module provides an abstract Formatter base class with two methods, apply() (renders slots from inputs) and extract() (extracts function calls from response text), along with four concrete implementations. EmptyFormatter returns static slot sequences without placeholder substitution and validates that its slots contain no placeholders. StringFormatter replaces {{name}} placeholders with the supplied values and validates that its slots contain at least one placeholder. FunctionFormatter extends StringFormatter: it parses JSON function call content, extracts thought and tool_call blocks using configurable delimiter words, and formats them via the configured tool utilities. ToolFormatter formats tool definition JSON into system prompt text and extracts function calls from model responses using the tool utilities' extractor.
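The placeholder rules described above can be sketched without the library. This is a simplified stand-in, not the module's actual code: the regular expression and the helper names `has_placeholder`, `render_string_slots`, and `render_empty_slots` are assumptions made for illustration.

```python
import re

# Assumed placeholder syntax: {{name}} with an identifier-like name.
PLACEHOLDER = re.compile(r"\{\{([a-zA-Z_][a-zA-Z0-9_]*)\}\}")


def has_placeholder(slots):
    """Return True if any string slot contains a {{name}} placeholder."""
    return any(isinstance(s, str) and PLACEHOLDER.search(s) for s in slots)


def render_string_slots(slots, **kwargs):
    """Mimic a string formatter: require a placeholder, then substitute values."""
    if not has_placeholder(slots):
        raise ValueError("A string formatter needs at least one placeholder.")
    rendered = []
    for slot in slots:
        if isinstance(slot, str):
            for name, value in kwargs.items():
                if not isinstance(value, str):
                    raise TypeError(f"Expected a string value for {name!r}.")
                slot = slot.replace("{{" + name + "}}", value, 1)
            rendered.append(slot)
        elif isinstance(slot, (dict, set)):
            # Non-string slots (for example special-token markers) pass through.
            rendered.append(slot)
        else:
            raise TypeError(f"Unsupported slot type: {type(slot).__name__}")
    return rendered


def render_empty_slots(slots):
    """Mimic an empty formatter: reject placeholders, return slots unchanged."""
    if has_placeholder(slots):
        raise ValueError("An empty formatter must not contain placeholders.")
    return list(slots)


user_slots = render_string_slots(["<|user|>\n{{content}}<|end|>"], content="Hi")
eos_slots = render_empty_slots(["</s>"])
```

Validating the slots up front means a misconfigured template fails when it is built rather than silently emitting an unreplaced placeholder during encoding.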
Usage
Use these formatters when defining chat templates. Each template role (user, assistant, function, tool) uses a specific formatter to convert message content into the model's expected token slot format. The formatters are instantiated during template construction and called during encoding.
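To illustrate the role-to-formatter mapping, here is a minimal sketch of a template that holds one formatter per role and calls it during encoding. `ToyFormatter` and `ToyTemplate` are hypothetical stand-ins written for this page; the real template class has more fields and works on token IDs rather than plain strings.

```python
from dataclasses import dataclass


@dataclass
class ToyFormatter:
    """Hypothetical formatter with a single string slot containing {{content}}."""

    slot: str

    def apply(self, content: str) -> list[str]:
        return [self.slot.replace("{{content}}", content)]


@dataclass
class ToyTemplate:
    """Hypothetical template: one formatter per message role."""

    formatters: dict[str, ToyFormatter]

    def encode(self, messages: list[dict[str, str]]) -> list[str]:
        slots: list[str] = []
        for message in messages:
            # Pick the formatter registered for this message's role.
            formatter = self.formatters[message["role"]]
            slots.extend(formatter.apply(message["content"]))
        return slots


# Formatters are created once, when the template is constructed ...
template = ToyTemplate(
    formatters={
        "user": ToyFormatter("<|user|>\n{{content}}<|end|>\n"),
        "assistant": ToyFormatter("<|assistant|>\n{{content}}<|end|>\n"),
    }
)

# ... and called for every message during encoding.
rendered = template.encode(
    [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there"},
    ]
)
```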
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/data/formatter.py
- Lines: 1-159
Signature
@dataclass
class Formatter(ABC):
    slots: SLOTS = field(default_factory=list)
    tool_format: str | None = None

    @abstractmethod
    def apply(self, **kwargs) -> SLOTS: ...

    def extract(self, content: str) -> str | list["FunctionCall"]: ...

@dataclass
class EmptyFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class StringFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class FunctionFormatter(StringFormatter):
    def apply(self, **kwargs) -> SLOTS: ...

@dataclass
class ToolFormatter(Formatter):
    def apply(self, **kwargs) -> SLOTS: ...

    def extract(self, content: str) -> str | list["FunctionCall"]: ...
Import
from llamafactory.data.formatter import (
    Formatter,
    EmptyFormatter,
    StringFormatter,
    FunctionFormatter,
    ToolFormatter,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| slots | SLOTS | No (defaults to an empty list) | Template slot sequence (strings, sets, or dicts) defining the format pattern |
| tool_format | str or None | No (defaults to None) | Tool format identifier (e.g., "default", "glm4") for selecting tool utilities |
| **kwargs (apply) | dict | Varies | Key-value pairs for placeholder substitution; "content" is required for FunctionFormatter and ToolFormatter |
| content (extract) | str | Yes (for extract) | Model response text to extract function calls from |
| thought_words | list[str] | No | Two-element list of start/end delimiters for thought blocks in FunctionFormatter |
| tool_call_words | list[str] | No | Two-element list of start/end delimiters for tool call blocks in FunctionFormatter |
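The two-element delimiter lists can be pictured as a start/end pair wrapped around a block of text. The sketch below shows one way to separate a thought block from the remaining function call content; `split_thought` is a hypothetical helper, and the `<think>` delimiters and JSON payload are example values, not defaults taken from the library.

```python
import json
import re


def split_thought(content: str, thought_words: list[str]) -> tuple[str, str]:
    """Return (thought, remainder); thought is "" if no delimited block exists."""
    start, end = thought_words
    # Escape the delimiters so characters like "|" or "<" are matched literally.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    match = pattern.search(content)
    if match is None:
        return "", content
    remainder = content[: match.start()] + content[match.end():]
    return match.group(1).strip(), remainder.strip()


raw = '<think>Need the forecast.</think>{"name": "get_weather", "arguments": {"location": "NYC"}}'
thought, rest = split_thought(raw, ["<think>", "</think>"])

# The remainder is the JSON function call, which can now be parsed on its own.
call = json.loads(rest)
```

The same start/end approach would apply to tool_call_words, with the pair wrapping each rendered tool call instead of the thought.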
Outputs
| Name | Type | Description |
|---|---|---|
| apply() result | SLOTS: a list whose items are str, set[str], or dict[str, str] | Rendered template slots ready for tokenization |
| extract() result | str or list[FunctionCall] | Plain text when no function call is found, otherwise the structured function calls |
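The dual return type of extract() can be illustrated with a small self-contained sketch. It assumes a response layout with "Action:" and "Action Input:" lines; the regular expression, the `extract_calls` helper, and the `FunctionCall` namedtuple are stand-ins for illustration rather than the tool utilities' actual extractor.

```python
import json
import re
from collections import namedtuple

# Hypothetical stand-in for the FunctionCall type: a name plus JSON-encoded arguments.
FunctionCall = namedtuple("FunctionCall", ["name", "arguments"])

# Assumed layout: "Action: <name>" followed by "Action Input: <JSON object>".
ACTION_PATTERN = re.compile(
    r"Action:\s*([a-zA-Z0-9_]+)\s*Action Input:\s*(.+?)(?=\s*Action:|\s*$)",
    re.DOTALL,
)


def extract_calls(content):
    """Return a list of FunctionCall, or the original text if none can be parsed."""
    matches = ACTION_PATTERN.findall(content)
    if not matches:
        return content  # plain text response
    calls = []
    for name, raw_arguments in matches:
        try:
            arguments = json.loads(raw_arguments.strip())
        except json.JSONDecodeError:
            return content  # malformed arguments: fall back to plain text
        calls.append(FunctionCall(name, json.dumps(arguments, ensure_ascii=False)))
    return calls


structured = extract_calls('Action: get_weather\nAction Input: {"location": "NYC"}')
plain = extract_calls("It is sunny today.")
```

Callers can branch on `isinstance(result, str)` to decide whether the response is ordinary text or a set of tool invocations.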
Usage Examples
from llamafactory.data.formatter import StringFormatter, EmptyFormatter, ToolFormatter
# StringFormatter with placeholder substitution
formatter = StringFormatter(slots=["<|user|>\n{{content}}<|end|>"])
result = formatter.apply(content="Hello, world!")
# result: ["<|user|>\nHello, world!<|end|>"]
# EmptyFormatter for static tokens
eos_formatter = EmptyFormatter(slots=["</s>"])
result = eos_formatter.apply()
# result: ["</s>"]
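# ToolFormatter for tool descriptions
tool_formatter = ToolFormatter(slots=["{{content}}"], tool_format="default")
# Each tool carries a JSON-schema style "parameters" object with a "properties" map
tools_json = '[{"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "City name"}}, "required": ["location"]}}]'
result = tool_formatter.apply(content=tools_json)
# result: a single-element list holding the tool section of the system prompt
# Extract function calls from a response
# (assumes the "default" format marks calls with "Action:" / "Action Input:" lines)
extracted = tool_formatter.extract('Action: get_weather\nAction Input: {"location": "NYC"}')
# extracted: a list of FunctionCall entries, or the original string when no call is found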
Related Pages
- Hiyouga_LLaMA_Factory_Data_Utils - Defines the SLOTS type alias used by all formatters
- Hiyouga_LLaMA_Factory_API_Chat - API layer that uses tool extraction from formatters
- Hiyouga_LLaMA_Factory_Base_Engine - Engine interface whose template attribute uses these formatters