Implementation:Hiyouga LLaMA Factory Chat Model

From Leeroopedia


Knowledge Sources
Domains: Inference, API
Last Updated: 2026-02-06 19:00 GMT

Overview

ChatModel is the primary user-facing inference class. It provides a unified synchronous and asynchronous interface over multiple inference backends.

Description

The ChatModel class is a facade over the engine layer. During initialization, it parses inference arguments via get_infer_args, selects the backend engine (HuggingFace, vLLM, SGLang, or KTransformers) according to the infer_backend setting, and starts an asyncio event loop in a background thread. It exposes three pairs of sync/async methods: chat/achat return complete responses, stream_chat/astream_chat stream output token by token, and get_scores/aget_scores return reward model scores. Each synchronous method submits its async counterpart to the background loop with asyncio.run_coroutine_threadsafe and blocks until the result is available. The module also provides run_chat(), an interactive CLI chat loop with history management.
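The sync-to-async bridge described above can be illustrated with a minimal sketch that uses only the standard library. EchoEngine and FacadeSketch are hypothetical stand-ins invented for this example, not LLaMA Factory classes; only the asyncio calls are real APIs.

```python
import asyncio
from threading import Thread


def _start_background_loop(loop: asyncio.AbstractEventLoop) -> None:
    # Runs in the worker thread: bind the loop to this thread and serve forever.
    asyncio.set_event_loop(loop)
    loop.run_forever()


class EchoEngine:
    """Hypothetical stand-in for a real backend engine."""

    async def chat(self, messages: list[dict[str, str]]) -> list[str]:
        await asyncio.sleep(0)  # placeholder for real asynchronous work
        return ["echo: " + messages[-1]["content"]]


class FacadeSketch:
    """Hypothetical facade exposing a sync and an async method over one engine."""

    def __init__(self) -> None:
        self.engine = EchoEngine()
        self._loop = asyncio.new_event_loop()
        # Daemon thread so the loop does not keep the interpreter alive on exit.
        self._thread = Thread(target=_start_background_loop, args=(self._loop,), daemon=True)
        self._thread.start()

    async def achat(self, messages: list[dict[str, str]]) -> list[str]:
        return await self.engine.chat(messages)

    def chat(self, messages: list[dict[str, str]]) -> list[str]:
        # Submit the coroutine to the background loop, then block on the result.
        future = asyncio.run_coroutine_threadsafe(self.achat(messages), self._loop)
        return future.result()


model = FacadeSketch()
result = model.chat([{"role": "user", "content": "hi"}])
```

Because the loop lives in its own thread, the synchronous method can be called from ordinary blocking code without the caller creating or managing an event loop.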

Usage

Use ChatModel as the primary entry point for inference in both programmatic and server contexts. It is used by the API server (app.py) for HTTP-based inference and by run_chat() for interactive command-line use.
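For the interactive case, history management amounts to keeping a growing list of role/content messages and feeding the whole list back on every turn. The sketch below illustrates that idea under stated assumptions: chat_session, fake_stream, and the "exit" and "clear" commands are hypothetical choices for this example and are not taken from run_chat() itself. In real use, stream_fn could be a ChatModel instance's stream_chat method.

```python
from collections.abc import Callable, Iterable, Iterator

Message = dict[str, str]


def chat_session(
    stream_fn: Callable[[list[Message]], Iterator[str]],
    user_inputs: Iterable[str],
) -> list[Message]:
    """Run a chat loop over user_inputs and return the final history."""
    messages: list[Message] = []
    for query in user_inputs:
        command = query.strip()
        if command == "exit":  # assumed command: stop the session
            break
        if command == "clear":  # assumed command: drop the history
            messages = []
            continue
        messages.append({"role": "user", "content": query})
        reply = ""
        for token in stream_fn(messages):
            print(token, end="", flush=True)  # show output as it arrives
            reply += token
        print()
        # Store the full reply so the next turn sees the whole conversation.
        messages.append({"role": "assistant", "content": reply})
    return messages


def fake_stream(messages: list[Message]) -> Iterator[str]:
    """Hypothetical token stream that repeats the last user message."""
    yield from ("You said: ", messages[-1]["content"])


history = chat_session(fake_stream, ["hello", "clear", "hi again", "exit", "ignored"])
```

Passing the inputs as an iterable rather than calling input() directly keeps the loop easy to test; an interactive version would read each query from the terminal instead.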

Code Reference

Signature

from collections.abc import AsyncGenerator, Generator
from typing import Any, Optional


class ChatModel:
    def __init__(self, args: Optional[dict[str, Any]] = None) -> None: ...

    def chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> list["Response"]: ...

    async def achat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> list["Response"]: ...

    def stream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> Generator[str, None, None]: ...

    async def astream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> AsyncGenerator[str, None]: ...

    def get_scores(self, batch_input: list[str], **input_kwargs) -> list[float]: ...

    async def aget_scores(self, batch_input: list[str], **input_kwargs) -> list[float]: ...

def run_chat() -> None: ...
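The stream_chat and astream_chat signatures differ only in the kind of generator they return. One way a synchronous generator can be layered over an asynchronous one is to request each item from a background event loop and yield it as it arrives. The following is a sketch of that technique, not the library's code; fake_token_stream and iterate_sync are hypothetical names.

```python
import asyncio
from collections.abc import AsyncGenerator, Generator
from threading import Thread


async def fake_token_stream(text: str) -> AsyncGenerator[str, None]:
    """Hypothetical async token source standing in for an engine's stream."""
    for token in text.split():
        await asyncio.sleep(0)  # placeholder for real asynchronous work
        yield token


def iterate_sync(
    agen: AsyncGenerator[str, None],
    loop: asyncio.AbstractEventLoop,
) -> Generator[str, None, None]:
    """Drive an async generator from blocking code, one item at a time."""
    while True:
        try:
            # Ask the background loop for the next item and wait for it.
            future = asyncio.run_coroutine_threadsafe(agen.__anext__(), loop)
            yield future.result()
        except StopAsyncIteration:
            break  # the async generator is exhausted


# Event loop running in a daemon thread, as in a sync/async facade.
loop = asyncio.new_event_loop()
Thread(target=loop.run_forever, daemon=True).start()

tokens = list(iterate_sync(fake_token_stream("streaming works token by token"), loop))
```

Requesting one item per round trip keeps the caller in control of pacing: nothing is generated ahead of what the consumer has asked for.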

Import

from llamafactory.chat import ChatModel
from llamafactory.chat.chat_model import run_chat

I/O Contract

Inputs

Name | Type | Required | Description
args | dict[str, Any] | No | Configuration dictionary passed to get_infer_args; if None, parsed from command line
messages | list[dict[str, str]] | Yes | Chat messages with "role" and "content" keys
system | str | No | System prompt
tools | str | No | JSON-serialized tool definitions
images | list[ImageInput] | No | Image inputs for multimodal models
videos | list[VideoInput] | No | Video inputs for multimodal models
audios | list[AudioInput] | No | Audio inputs for multimodal models
batch_input | list[str] | Yes (for scoring) | Text inputs for reward model scoring
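Since tools is typed as a string rather than a list, tool definitions have to be serialized before they are passed in. The sketch below shows one way to build the messages and tools arguments. The get_weather tool and the name/description/parameters layout are assumptions made for illustration; the schema a given template expects should be checked against the library itself.

```python
import json

# Hypothetical tool definition; the exact schema expected is an assumption.
tool_definitions = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    }
]

# `tools` must be a JSON string, not a Python list.
tools = json.dumps(tool_definitions)

# Each message is a dict with "role" and "content" keys.
messages = [{"role": "user", "content": "What is the weather in Paris?"}]
```

These values would then be passed as chat_model.chat(messages, tools=tools), following the signature listed earlier.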

Outputs

Method | Return type | Description
chat / achat | list[Response] | Generated responses
stream_chat | Generator[str, None, None] | Token stream (sync generator)
astream_chat | AsyncGenerator[str, None] | Token stream (async generator)
get_scores / aget_scores | list[float] | Reward model scores

Usage Examples

import asyncio

from llamafactory.chat import ChatModel

# Initialize with custom arguments
chat_model = ChatModel(args={
    "model_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
    "template": "llama2",
    "infer_backend": "huggingface",
})

# Synchronous chat: returns a list of Response objects
messages = [{"role": "user", "content": "What is machine learning?"}]
responses = chat_model.chat(messages)
print(responses[0].response_text)

# Streaming chat: print each piece of text as it arrives
for token in chat_model.stream_chat(messages):
    print(token, end="", flush=True)
print()

# Async chat: `await` is only valid inside a coroutine
async def main() -> None:
    responses = await chat_model.achat(messages)
    print(responses[0].response_text)

asyncio.run(main())
