Implementation:Hiyouga LLaMA Factory V1 CLI Sampler
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, CLI Tools |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
SyncSampler wraps the asynchronous BaseSampler in a synchronous interface, and run_chat provides an interactive command-line REPL for model inference.
Description
The SyncSampler class extends BaseSampler by starting a background asyncio event loop in a daemon thread. Its generate method wraps the parent's async generator using asyncio.run_coroutine_threadsafe, so callers consume tokens from an ordinary blocking generator. The batch_infer method likewise bridges async batch inference to a single blocking call.

The run_chat function orchestrates the full CLI experience. It parses arguments, initializes the ModelEngine and SyncSampler, and then either runs batch inference on a provided dataset or enters an interactive loop in which users type queries, receive streaming responses, and can clear the history or exit.
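The sync-over-async bridge in SyncSampler can be sketched with only the standard library. This is a minimal sketch, not the actual SyncSampler code: AsyncBackend is a hypothetical stand-in for BaseSampler, and AsyncBackend, SyncBridge, _anext_or_done and _DONE are hypothetical names. It assumes tokens are pulled one at a time by submitting one coroutine per step to the background loop.

```python
import asyncio
import threading
from collections.abc import AsyncGenerator, Generator

_DONE = object()  # hypothetical sentinel marking the end of the async stream


async def _anext_or_done(agen: AsyncGenerator[str, None]) -> object:
    """Await one item from an async generator, returning _DONE when it is exhausted."""
    try:
        return await agen.__anext__()
    except StopAsyncIteration:
        return _DONE


class AsyncBackend:
    """Hypothetical stand-in for BaseSampler's async API (not the real class)."""

    async def generate(self, prompt: str) -> AsyncGenerator[str, None]:
        for word in prompt.split():
            await asyncio.sleep(0)  # pretend to wait on the model
            yield word + " "

    async def batch_infer(self, prompts: list[str]) -> list[str]:
        return [prompt.upper() for prompt in prompts]


class SyncBridge(AsyncBackend):
    """Exposes blocking methods backed by an event loop running in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        threading.Thread(target=self._loop.run_forever, daemon=True).start()

    def generate(self, prompt: str) -> Generator[str, None, None]:
        agen = super().generate(prompt)
        while True:
            # Schedule one __anext__ step on the background loop and block for its result.
            item = asyncio.run_coroutine_threadsafe(_anext_or_done(agen), self._loop).result()
            if item is _DONE:
                break
            yield item

    def batch_infer(self, prompts: list[str]) -> list[str]:
        # Submit the whole batch coroutine and block until it finishes.
        future = asyncio.run_coroutine_threadsafe(super().batch_infer(prompts), self._loop)
        return future.result()
```

Keeping one long-lived loop in a daemon thread, rather than calling asyncio.run for every request, lets all calls share the same loop, and the daemon flag keeps the thread from blocking interpreter exit.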
Usage
Use run_chat as the entry point for command-line model interaction. It can be invoked directly as a script (python -m llamafactory.v1.samplers.cli_sampler) or called programmatically with a dictionary of arguments. Use SyncSampler when you need synchronous access to the async inference pipeline, for example in plain scripts or other code that does not run its own event loop.
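The interactive branch of run_chat can be pictured as a loop like the one below. This is a hypothetical sketch: chat_loop, its generate callback, and the command words clear and exit are assumptions, since the exact commands run_chat recognizes are not documented here. User turns use the Message layout from the usage examples; storing the reply as an assistant message in the same layout is also an assumption.

```python
from collections.abc import Callable, Iterable, Iterator


def chat_loop(
    generate: Callable[[list[dict]], Iterator[str]],
    lines: Iterable[str],
    out: Callable[[str], None] = lambda s: print(s, end="", flush=True),
) -> list[dict]:
    """Hypothetical REPL: stream replies; 'clear' resets history, 'exit' quits."""
    messages: list[dict] = []
    for line in lines:
        query = line.strip()
        if not query:
            continue  # ignore empty input
        if query == "exit":
            break
        if query == "clear":
            messages = []  # drop the whole conversation history
            continue
        messages.append({"role": "user", "content": [{"type": "text", "value": query}]})
        reply = ""
        for token in generate(messages):
            out(token)  # stream each token as soon as it arrives
            reply += token
        out("\n")
        # Keep the assistant turn so the next query sees the full history.
        messages.append({"role": "assistant", "content": [{"type": "text", "value": reply}]})
    return messages
```

In a terminal, lines could come from a small generator that calls input() until it raises EOFError, and passing a SyncSampler's bound generate method as the callback would stream real model output.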
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/v1/samplers/cli_sampler.py
- Lines: 1-125
Signature
```python
class SyncSampler(BaseSampler):
    def __init__(
        self,
        args: SampleArguments,
        model_args: ModelArguments,
        model: HFModel,
        renderer: Renderer,
    ) -> None: ...

    def generate(self, messages: list[Message], tools: str | None = None) -> Generator[str, None, None]: ...

    def batch_infer(self, dataset: TorchDataset) -> list[Sample]: ...


def run_chat(args: InputArgument = None): ...
```
Import
```python
from llamafactory.v1.samplers.cli_sampler import SyncSampler, run_chat
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args (constructor) | SampleArguments | Yes | Sampling configuration arguments (temperature, top_p, backend, etc.) |
| model_args (constructor) | ModelArguments | Yes | Model loading configuration (model path, dtype, etc.) |
| model (constructor) | HFModel | Yes | The loaded HuggingFace model instance |
| renderer (constructor) | Renderer | Yes | Message renderer for template formatting and parsing |
| messages (generate) | list[Message] | Yes | Chat message history to generate a response for |
| tools (generate) | str or None | No | JSON string of available tools for tool-calling |
| dataset (batch_infer) | TorchDataset | Yes | Dataset of samples for batch inference |
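To illustrate the generate inputs: messages is a list of Message dicts, and tools is a single JSON string rather than a Python list. In the sketch below, text_message is a hypothetical helper, and the tool schema (a list of function objects with name, description and JSON-Schema parameters) is an assumption; check which format the renderer's template actually expects.

```python
import json


def text_message(role: str, text: str) -> dict:
    """Hypothetical helper: build one Message using the content-part layout from the usage examples."""
    return {"role": role, "content": [{"type": "text", "value": text}]}


# Assumed tool schema; serialize it because generate expects `tools` as a str.
tools = json.dumps(
    [
        {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ]
)

messages = [text_message("user", "What is the weather in Paris?")]
# With a constructed SyncSampler: sampler.generate(messages, tools=tools)
```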
Outputs
| Name | Type | Description |
|---|---|---|
| generate | Generator[str, None, None] | Yields generated token strings one at a time |
| batch_infer | list[Sample] | List of inference results for all dataset samples |
| run_chat | None | Runs the interactive CLI loop (side effect: prints to stdout) |
Usage Examples
```python
# Running the CLI chat from the command line:
#   python -m llamafactory.v1.samplers.cli_sampler --model_name_or_path my_model

# Programmatic usage
from llamafactory.v1.samplers.cli_sampler import run_chat

run_chat({"model_name_or_path": "Qwen/Qwen2-7B", "sample_backend": "hf"})

# Using SyncSampler directly: sample_args, model_args, model and renderer
# must already be constructed (run_chat builds them internally).
from llamafactory.v1.samplers.cli_sampler import SyncSampler

sampler = SyncSampler(sample_args, model_args, model, renderer)
messages = [{"role": "user", "content": [{"type": "text", "value": "What is 2+2?"}]}]
for token in sampler.generate(messages):
    print(token, end="", flush=True)  # stream tokens as they arrive
print()
```
Related Pages
- Hiyouga_LLaMA_Factory_V1_Types - Type definitions for Message, Sample, HFModel, TorchDataset
- Hiyouga_LLaMA_Factory_V1_Rendering_Plugin - Rendering plugins used by the renderer for message formatting