Implementation:Hiyouga LLaMA Factory Base Engine
| Knowledge Sources | |
|---|---|
| Domains | Inference, Architecture |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Base Engine defines the abstract base class and response dataclass that all LLaMA Factory inference backends must implement.
Description
The module provides the Response dataclass which encapsulates inference results (response_text, response_length, prompt_length, finish_reason) and the BaseEngine abstract base class. BaseEngine declares required attributes (name, model, tokenizer, can_generate, template, generating_args) and four abstract methods: __init__ for engine initialization, chat for batch response generation, stream_chat for token-by-token streaming via async generators, and get_scores for reward model scoring. All concrete engines (HuggingFace, vLLM, SGLang, KTransformers) must implement this interface.
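To make the Response fields concrete, here is a self-contained mirror of the dataclass (the real one is imported from llamafactory.chat.base_engine; field semantics as described above, with token-count interpretation of the length fields being an assumption based on typical engine implementations):

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative mirror of llamafactory.chat.base_engine.Response.
@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

# finish_reason is "stop" when generation ended naturally and
# "length" when the token budget was exhausted.
r = Response(
    response_text="Hello!",
    response_length=3,
    prompt_length=12,
    finish_reason="length",
)
print(r.finish_reason)  # length
```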
Usage
Use this module as the contract when implementing a new inference backend. Subclass BaseEngine and implement all abstract methods to integrate a new serving framework with the ChatModel facade.
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/chat/base_engine.py
- Lines: 1-98
Signature
@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

class BaseEngine(ABC):
    name: "EngineName"
    model: Union["PreTrainedModel", "AsyncLLMEngine"]
    tokenizer: "PreTrainedTokenizer"
    can_generate: bool
    template: "Template"
    generating_args: dict[str, Any]

    @abstractmethod
    def __init__(
        self,
        model_args: "ModelArguments",
        data_args: "DataArguments",
        finetuning_args: "FinetuningArguments",
        generating_args: "GeneratingArguments",
    ) -> None: ...

    @abstractmethod
    async def chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> list["Response"]: ...

    @abstractmethod
    async def stream_chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> AsyncGenerator[str, None]: ...

    @abstractmethod
    async def get_scores(
        self,
        batch_input: list[str],
        **input_kwargs,
    ) -> list[float]: ...
Import
from llamafactory.chat.base_engine import BaseEngine, Response
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| messages | list[dict[str, str]] | Yes | Chat messages with "role" and "content" keys |
| system | str | No | System prompt to prepend |
| tools | str | No | JSON-serialized tool definitions |
| images | list[ImageInput] | No | Image inputs for multimodal models |
| videos | list[VideoInput] | No | Video inputs for multimodal models |
| audios | list[AudioInput] | No | Audio inputs for multimodal models |
| **input_kwargs | dict | No | Additional generation parameters (temperature, top_p, max_new_tokens, etc.) |
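The inputs above are plain Python values; a short sketch of how a caller might assemble them (the specific keys in gen_kwargs are the common ones named in the table, but which keys a given backend honors depends on the concrete engine):

```python
# Generation parameters forwarded through **input_kwargs.
# Support for each key depends on the concrete backend.
gen_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 256,
}

# Chat messages are dicts with "role" and "content" keys.
messages = [
    {"role": "user", "content": "Summarize the repo in one sentence."},
]

# A concrete engine call would then look like (not executed here):
# responses = await engine.chat(messages, system="You are helpful.", **gen_kwargs)
print(sorted(gen_kwargs))  # ['max_new_tokens', 'temperature', 'top_p']
```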
Outputs
| Method | Return Type | Description |
|---|---|---|
| chat() | list[Response] | List of generated responses |
| stream_chat() | AsyncGenerator[str, None] | Token-by-token streaming output |
| get_scores() | list[float] | Reward model scores |
Usage Examples
from llamafactory.chat.base_engine import BaseEngine, Response
from llamafactory.extras.constants import EngineName

# Implementing a custom engine
class MyEngine(BaseEngine):
    def __init__(self, model_args, data_args, finetuning_args, generating_args):
        self.name = EngineName.HF
        # ... initialization ...

    async def chat(self, messages, system=None, tools=None, **kwargs):
        # ... generate responses ...
        return [Response(response_text="Hello!", response_length=2, prompt_length=5, finish_reason="stop")]

    async def stream_chat(self, messages, system=None, tools=None, **kwargs):
        yield "Hello"
        yield "!"

    async def get_scores(self, batch_input, **kwargs):
        return [0.95, 0.87]
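All three methods are coroutines (stream_chat an async generator), so callers drive them with asyncio. The sketch below is fully self-contained: EchoEngine is a hypothetical toy stand-in for a concrete backend, and Response is mirrored locally rather than imported, so the snippet runs without llamafactory installed.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Literal, Optional

# Local mirror of llamafactory.chat.base_engine.Response.
@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

# Hypothetical toy engine that echoes the last user message;
# real backends (HuggingFace, vLLM, ...) expose the same methods.
class EchoEngine:
    async def chat(self, messages: list[dict[str, str]],
                   system: Optional[str] = None, **kwargs) -> list[Response]:
        text = messages[-1]["content"]
        n = len(text.split())
        return [Response(response_text=text, response_length=n,
                         prompt_length=n, finish_reason="stop")]

    async def stream_chat(self, messages: list[dict[str, str]],
                          **kwargs) -> AsyncGenerator[str, None]:
        # Yield one "token" at a time, as a real engine streams output.
        for token in messages[-1]["content"].split():
            yield token

    async def get_scores(self, batch_input: list[str], **kwargs) -> list[float]:
        return [float(len(s)) for s in batch_input]

async def main() -> None:
    engine = EchoEngine()
    msgs = [{"role": "user", "content": "hello world"}]

    responses = await engine.chat(msgs)
    print(responses[0].response_text)              # hello world

    tokens = [t async for t in engine.stream_chat(msgs)]
    print(tokens)                                  # ['hello', 'world']

    print(await engine.get_scores(["abc"]))        # [3.0]

asyncio.run(main())
```

The same asyncio.run / async for pattern applies when driving a real engine through the ChatModel facade.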
Related Pages
- Hiyouga_LLaMA_Factory_Chat_Model - Facade that delegates to BaseEngine implementations
- Hiyouga_LLaMA_Factory_VLLM_Engine - vLLM implementation of BaseEngine
- Hiyouga_LLaMA_Factory_SGLang_Engine - SGLang implementation of BaseEngine
- Hiyouga_LLaMA_Factory_KT_Engine - KTransformers implementation of BaseEngine