
Implementation:Hiyouga LLaMA Factory Base Engine

From Leeroopedia


Knowledge Sources
Domains Inference, Architecture
Last Updated 2026-02-06 19:00 GMT

Overview

Base Engine defines the abstract base class and response dataclass that all LLaMA Factory inference backends must implement.

Description

The module provides the Response dataclass, which encapsulates a single inference result (response_text, response_length, prompt_length, finish_reason), and the BaseEngine abstract base class. BaseEngine declares the required attributes (name, model, tokenizer, can_generate, template, generating_args) and four abstract methods: __init__ for engine initialization, chat for batched response generation, stream_chat for token-by-token streaming via an async generator, and get_scores for reward-model scoring. Every concrete engine (HuggingFace, vLLM, SGLang, KTransformers) must implement this interface.

Usage

Use this module as the contract when implementing a new inference backend. Subclass BaseEngine and implement all abstract methods to integrate a new serving framework with the ChatModel facade.

Code Reference

Source Location

Signature

@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

class BaseEngine(ABC):
    name: "EngineName"
    model: Union["PreTrainedModel", "AsyncLLMEngine"]
    tokenizer: "PreTrainedTokenizer"
    can_generate: bool
    template: "Template"
    generating_args: dict[str, Any]

    @abstractmethod
    def __init__(
        self,
        model_args: "ModelArguments",
        data_args: "DataArguments",
        finetuning_args: "FinetuningArguments",
        generating_args: "GeneratingArguments",
    ) -> None: ...

    @abstractmethod
    async def chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> list["Response"]: ...

    @abstractmethod
    async def stream_chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> AsyncGenerator[str, None]: ...

    @abstractmethod
    async def get_scores(
        self,
        batch_input: list[str],
        **input_kwargs,
    ) -> list[float]: ...

Import

from llamafactory.chat.base_engine import BaseEngine, Response

I/O Contract

Inputs

Name Type Required Description
messages list[dict[str, str]] Yes Chat messages with "role" and "content" keys
system str No System prompt to prepend
tools str No JSON-serialized tool definitions
images list[ImageInput] No Image inputs for multimodal models
videos list[VideoInput] No Video inputs for multimodal models
audios list[AudioInput] No Audio inputs for multimodal models
**input_kwargs dict No Additional generation parameters (temperature, top_p, max_new_tokens, etc.)
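As a sketch of the input side of this contract, the arguments above can be assembled as plain Python data before being passed to an engine. The parameter values below are illustrative, not defaults from LLaMA Factory:

```python
# Build the inputs described in the contract above.
# Keys follow the messages format ("role"/"content"); values are examples.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

system = "You are a helpful assistant."  # optional system prompt

# Extra generation parameters travel through **input_kwargs.
input_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 128,
}

# An engine call would then look like:
#   responses = await engine.chat(messages, system=system, **input_kwargs)
```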

Outputs

Name Type Description
chat() list[Response] List of generated responses, one per completion
stream_chat() AsyncGenerator[str, None] Token-by-token streaming output
get_scores() list[float] Reward model scores, one per input string
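On the output side, callers typically inspect finish_reason to detect truncated generations. The sketch below mirrors the Response dataclass locally (so it runs without llamafactory installed); the summarize helper is hypothetical, not part of the library:

```python
from dataclasses import dataclass
from typing import Literal

# Local mirror of the Response dataclass from the Signature section,
# used here only so the sketch is self-contained.
@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

def summarize(responses: list[Response]) -> list[str]:
    """Flag responses that were cut off by the length limit."""
    out = []
    for r in responses:
        suffix = " [truncated]" if r.finish_reason == "length" else ""
        out.append(r.response_text + suffix)
    return out

results = summarize([
    Response("Paris.", 2, 9, "stop"),
    Response("The capital of", 3, 9, "length"),
])
# results == ["Paris.", "The capital of [truncated]"]
```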

Usage Examples

from llamafactory.chat.base_engine import BaseEngine, Response
from llamafactory.extras.constants import EngineName

# Implementing a custom engine
class MyEngine(BaseEngine):
    def __init__(self, model_args, data_args, finetuning_args, generating_args):
        self.name = EngineName.HF
        # ... load model, tokenizer, and template; set can_generate and generating_args ...

    async def chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs):
        # ... generate responses ...
        return [Response(response_text="Hello!", response_length=2, prompt_length=5, finish_reason="stop")]

    async def stream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs):
        yield "Hello"
        yield "!"

    async def get_scores(self, batch_input, **input_kwargs):
        return [0.95, 0.87]
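Since all three methods are coroutines or async generators, callers drive them from an event loop. The self-contained sketch below uses a hypothetical stand-in engine (not a real backend and not subclassing BaseEngine) purely to show the calling conventions:

```python
import asyncio

class EchoEngine:
    """Stand-in mirroring the BaseEngine method shapes, with toy behavior."""

    async def chat(self, messages, system=None, tools=None, **input_kwargs):
        # Toy "generation": upper-case the last user message.
        return [messages[-1]["content"].upper()]

    async def stream_chat(self, messages, system=None, tools=None, **input_kwargs):
        # Stream one whitespace-delimited token at a time.
        for token in messages[-1]["content"].split():
            yield token

    async def get_scores(self, batch_input, **input_kwargs):
        # Toy "reward": score each string by its length.
        return [float(len(s)) for s in batch_input]

async def main():
    engine = EchoEngine()
    msgs = [{"role": "user", "content": "hello world"}]
    full = await engine.chat(msgs)                        # batched generation
    tokens = [t async for t in engine.stream_chat(msgs)]  # streaming
    scores = await engine.get_scores(["a", "abc"])        # reward scoring
    return full, tokens, scores

full, tokens, scores = asyncio.run(main())
# full == ["HELLO WORLD"], tokens == ["hello", "world"], scores == [1.0, 3.0]
```

Note that stream_chat is consumed with `async for`, matching its AsyncGenerator[str, None] return type, while chat and get_scores are awaited directly.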
