
Implementation:Hiyouga LLaMA Factory Base Engine

From Leeroopedia


Knowledge Sources
Domains Inference, Architecture
Last Updated 2026-02-06 19:00 GMT

Overview

Base Engine defines the abstract base class and response dataclass that all LLaMA Factory inference backends must implement.

Description

The module provides the Response dataclass, which encapsulates a single inference result (response_text, response_length, prompt_length, finish_reason), and the BaseEngine abstract base class. BaseEngine declares the required attributes (name, model, tokenizer, can_generate, template, generating_args) and four abstract methods: __init__ for engine initialization, chat for batched response generation, stream_chat for token-by-token streaming via an async generator, and get_scores for reward-model scoring. Every concrete engine (HuggingFace, vLLM, SGLang, KTransformers) must implement this interface.

Usage

Use this module as the contract when implementing a new inference backend. Subclass BaseEngine and implement all abstract methods to integrate a new serving framework with the ChatModel facade.

Code Reference

Source Location

Signature

@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

class BaseEngine(ABC):
    name: "EngineName"
    model: Union["PreTrainedModel", "AsyncLLMEngine"]
    tokenizer: "PreTrainedTokenizer"
    can_generate: bool
    template: "Template"
    generating_args: dict[str, Any]

    @abstractmethod
    def __init__(
        self,
        model_args: "ModelArguments",
        data_args: "DataArguments",
        finetuning_args: "FinetuningArguments",
        generating_args: "GeneratingArguments",
    ) -> None: ...

    @abstractmethod
    async def chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> list["Response"]: ...

    @abstractmethod
    async def stream_chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> AsyncGenerator[str, None]: ...

    @abstractmethod
    async def get_scores(
        self,
        batch_input: list[str],
        **input_kwargs,
    ) -> list[float]: ...

Import

from llamafactory.chat.base_engine import BaseEngine, Response

I/O Contract

Inputs

Name Type Required Description
messages list[dict[str, str]] Yes Chat messages with "role" and "content" keys
system str No System prompt to prepend
tools str No JSON-serialized tool definitions
images list[ImageInput] No Image inputs for multimodal models
videos list[VideoInput] No Video inputs for multimodal models
audios list[AudioInput] No Audio inputs for multimodal models
**input_kwargs dict No Additional generation parameters (temperature, top_p, max_new_tokens, etc.)
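As a sketch of the input side of this contract, the arguments above can be assembled as plain Python data before being passed to an engine. The parameter values below are illustrative, not defaults from LLaMA Factory:

```python
# Build the inputs described in the contract above.
# Keys follow the messages format ("role"/"content"); values are examples.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

system = "You are a helpful assistant."  # optional system prompt

# Extra generation parameters travel through **input_kwargs.
input_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 128,
}

# An engine call would then look like:
#   responses = await engine.chat(messages, system=system, **input_kwargs)
```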

Outputs

Name Type Description
chat() list[Response] List of generated responses, one per completion
stream_chat() AsyncGenerator[str, None] Token-by-token streaming output
get_scores() list[float] Reward model scores, one per input string
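On the output side, callers typically inspect finish_reason to detect truncated generations. The sketch below mirrors the Response dataclass locally (so it runs without llamafactory installed); the summarize helper is hypothetical, not part of the library:

```python
from dataclasses import dataclass
from typing import Literal

# Local mirror of the Response dataclass from the Signature section,
# used here only so the sketch is self-contained.
@dataclass
class Response:
    response_text: str
    response_length: int
    prompt_length: int
    finish_reason: Literal["stop", "length"]

def summarize(responses: list[Response]) -> list[str]:
    """Flag responses that were cut off by the length limit."""
    out = []
    for r in responses:
        suffix = " [truncated]" if r.finish_reason == "length" else ""
        out.append(r.response_text + suffix)
    return out

results = summarize([
    Response("Paris.", 2, 9, "stop"),
    Response("The capital of", 3, 9, "length"),
])
# results == ["Paris.", "The capital of [truncated]"]
```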

Usage Examples

from llamafactory.chat.base_engine import BaseEngine, Response
from llamafactory.extras.constants import EngineName

# Implementing a custom engine
class MyEngine(BaseEngine):
    def __init__(self, model_args, data_args, finetuning_args, generating_args):
        self.name = EngineName.HF
        # ... load model, tokenizer, and template; set can_generate and generating_args ...

    async def chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs):
        # ... generate responses ...
        return [Response(response_text="Hello!", response_length=2, prompt_length=5, finish_reason="stop")]

    async def stream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs):
        yield "Hello"
        yield "!"

    async def get_scores(self, batch_input, **input_kwargs):
        return [0.95, 0.87]
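Since all three methods are coroutines or async generators, callers drive them from an event loop. The self-contained sketch below uses a hypothetical stand-in engine (not a real backend and not subclassing BaseEngine) purely to show the calling conventions:

```python
import asyncio

class EchoEngine:
    """Stand-in mirroring the BaseEngine method shapes, with toy behavior."""

    async def chat(self, messages, system=None, tools=None, **input_kwargs):
        # Toy "generation": upper-case the last user message.
        return [messages[-1]["content"].upper()]

    async def stream_chat(self, messages, system=None, tools=None, **input_kwargs):
        # Stream one whitespace-delimited token at a time.
        for token in messages[-1]["content"].split():
            yield token

    async def get_scores(self, batch_input, **input_kwargs):
        # Toy "reward": score each string by its length.
        return [float(len(s)) for s in batch_input]

async def main():
    engine = EchoEngine()
    msgs = [{"role": "user", "content": "hello world"}]
    full = await engine.chat(msgs)                        # batched generation
    tokens = [t async for t in engine.stream_chat(msgs)]  # streaming
    scores = await engine.get_scores(["a", "abc"])        # reward scoring
    return full, tokens, scores

full, tokens, scores = asyncio.run(main())
# full == ["HELLO WORLD"], tokens == ["hello", "world"], scores == [1.0, 3.0]
```

Note that stream_chat is consumed with `async for`, matching its AsyncGenerator[str, None] return type, while chat and get_scores are awaited directly.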
