Implementation:Hiyouga LLaMA Factory Chat Model
| Knowledge Sources | |
|---|---|
| Domains | Inference, API |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
ChatModel is the primary user-facing inference class. It provides a unified sync/async interface over multiple inference backends.
Description
The ChatModel class acts as a facade over the engine layer. During initialization, it parses inference arguments via get_infer_args, selects the backend engine (HuggingFace, vLLM, SGLang, or KTransformers) according to the infer_backend setting, and starts a background thread that runs an asyncio event loop. It exposes three pairs of sync/async methods: chat/achat return complete responses, stream_chat/astream_chat yield the response token by token, and get_scores/aget_scores return reward model scores. Each synchronous method submits its async counterpart to the background loop with asyncio.run_coroutine_threadsafe and blocks until the result is available. The module also provides run_chat(), an interactive CLI chat loop with history management.
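The sync-to-async bridge described above can be illustrated with a minimal sketch. ToyEngine and ToyChatFacade are hypothetical stand-ins invented for this example and are not part of LLaMA Factory; only the pattern (a daemon thread running an event loop, plus asyncio.run_coroutine_threadsafe) mirrors the description.

```python
import asyncio
from threading import Thread


class ToyEngine:
    """Hypothetical stand-in for a backend engine (not a LLaMA Factory class)."""

    async def chat(self, messages):
        await asyncio.sleep(0)  # simulate asynchronous work
        return ["echo: " + messages[-1]["content"]]

    async def stream_chat(self, messages):
        for token in messages[-1]["content"].split():
            await asyncio.sleep(0)
            yield token


def _start_background_loop(loop: asyncio.AbstractEventLoop) -> None:
    # Runs in the worker thread: bind the loop to this thread and serve forever.
    asyncio.set_event_loop(loop)
    loop.run_forever()


class ToyChatFacade:
    """Hypothetical facade exposing sync methods on top of async ones."""

    def __init__(self) -> None:
        self.engine = ToyEngine()
        self._loop = asyncio.new_event_loop()
        self._thread = Thread(target=_start_background_loop, args=(self._loop,), daemon=True)
        self._thread.start()

    async def achat(self, messages):
        return await self.engine.chat(messages)

    def chat(self, messages):
        # Submit the coroutine to the background loop and block on the result.
        future = asyncio.run_coroutine_threadsafe(self.achat(messages), self._loop)
        return future.result()

    async def astream_chat(self, messages):
        async for token in self.engine.stream_chat(messages):
            yield token

    def stream_chat(self, messages):
        # Pull one item at a time from the async generator via the background loop.
        generator = self.astream_chat(messages)
        while True:
            try:
                future = asyncio.run_coroutine_threadsafe(generator.__anext__(), self._loop)
                yield future.result()
            except StopAsyncIteration:
                break
```

Because the loop lives in its own daemon thread, callers without an event loop (scripts, a CLI) can use the blocking methods, while async callers can await the coroutine methods directly.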
Usage
Use ChatModel as the primary entry point for inference in both programmatic and server contexts. It is used by the API server (app.py) for HTTP-based inference and by run_chat() for interactive command-line use.
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/chat/chat_model.py
- Lines: 1-210
Signature
class ChatModel:
    def __init__(self, args: Optional[dict[str, Any]] = None) -> None: ...
    def chat(
        self,
        messages: list[dict[str, str]],
        system: Optional[str] = None,
        tools: Optional[str] = None,
        images: Optional[list["ImageInput"]] = None,
        videos: Optional[list["VideoInput"]] = None,
        audios: Optional[list["AudioInput"]] = None,
        **input_kwargs,
    ) -> list["Response"]: ...
    async def achat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> list["Response"]: ...
    def stream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> Generator[str, None, None]: ...
    async def astream_chat(self, messages, system=None, tools=None, images=None, videos=None, audios=None, **input_kwargs) -> AsyncGenerator[str, None]: ...
    def get_scores(self, batch_input: list[str], **input_kwargs) -> list[float]: ...
    async def aget_scores(self, batch_input: list[str], **input_kwargs) -> list[float]: ...

def run_chat() -> None: ...
Import
from llamafactory.chat import ChatModel
from llamafactory.chat.chat_model import run_chat
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | dict[str, Any] | No | Configuration dictionary passed to get_infer_args; if None, parsed from command line |
| messages | list[dict[str, str]] | Yes | Chat messages with "role" and "content" keys |
| system | str | No | System prompt |
| tools | str | No | JSON-serialized tool definitions |
| images | list[ImageInput] | No | Image inputs for multimodal models |
| videos | list[VideoInput] | No | Video inputs for multimodal models |
| audios | list[AudioInput] | No | Audio inputs for multimodal models |
| batch_input | list[str] | Yes (for scoring) | Text inputs for reward model scoring |
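As a sketch of how these inputs might be assembled, the snippet below builds a message list and serializes a tool list to a JSON string. The tool definition (get_weather and its field layout) is a hypothetical example; the exact schema a given template expects is an assumption here and should be checked against the template in use.

```python
import json

# Conversation history: each message carries "role" and "content" keys.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
]

# Hypothetical tool definition; the field layout is an assumption for illustration.
tool_definitions = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# `tools` is passed as a JSON string, not as a Python list.
tools = json.dumps(tool_definitions)
system = "You are a helpful assistant."
```

With a loaded model, these values would be passed as chat_model.chat(messages, system=system, tools=tools).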
Outputs
| Method | Return type | Description |
|---|---|---|
| chat / achat | list[Response] | Generated responses |
| stream_chat | Generator[str, None, None] | Synchronous token stream |
| astream_chat | AsyncGenerator[str, None] | Asynchronous token stream |
| get_scores / aget_scores | list[float] | Reward model scores |
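An async generator such as the one returned by astream_chat is consumed with async for. The sketch below uses fake_astream_chat, a hypothetical stand-in that yields fixed text pieces, so it runs without a model; only the consumption pattern is the point.

```python
import asyncio
from collections.abc import AsyncGenerator


async def fake_astream_chat(messages: list[dict[str, str]]) -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for an async token stream; yields fixed pieces."""
    for piece in ["Machine ", "learning ", "is ..."]:
        await asyncio.sleep(0)  # hand control back to the event loop between pieces
        yield piece


async def collect_stream(messages: list[dict[str, str]]) -> str:
    # Accumulate streamed pieces into the full response text.
    chunks = []
    async for piece in fake_astream_chat(messages):
        chunks.append(piece)
    return "".join(chunks)


full_text = asyncio.run(
    collect_stream([{"role": "user", "content": "What is machine learning?"}])
)
```

In a server context the pieces would typically be forwarded to the client as they arrive instead of being joined at the end.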
Usage Examples
import asyncio

from llamafactory.chat import ChatModel

# Initialize with custom arguments
chat_model = ChatModel(args={
    "model_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
    "template": "llama2",
    "infer_backend": "huggingface",
})

# Synchronous chat
messages = [{"role": "user", "content": "What is machine learning?"}]
responses = chat_model.chat(messages)
print(responses[0].response_text)

# Streaming chat
for token in chat_model.stream_chat(messages):
    print(token, end="", flush=True)

# Async chat: await is only valid inside a coroutine
async def main() -> None:
    responses = await chat_model.achat(messages)
    print(responses[0].response_text)

asyncio.run(main())
Related Pages
- Hiyouga_LLaMA_Factory_Base_Engine - Abstract interface implemented by all backends
- Hiyouga_LLaMA_Factory_VLLM_Engine - vLLM backend engine
- Hiyouga_LLaMA_Factory_SGLang_Engine - SGLang backend engine
- Hiyouga_LLaMA_Factory_KT_Engine - KTransformers backend engine
- Hiyouga_LLaMA_Factory_API_App - API server that uses ChatModel