Implementation:Hiyouga LLaMA Factory V1 Base Sampler

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Machine Learning, Text Generation
Last Updated	2026-02-06 19:00 GMT

Overview

BaseSampler is the base class for asynchronous text generation sampling that delegates to a configurable inference engine backend.

Description

The BaseSampler class provides the abstract sampling interface for the LLaMA-Factory v1 system. It initializes an inference engine based on the configured SampleBackend setting (currently supporting the HuggingFace backend) and exposes two async methods: generate for streaming token-by-token generation and batch_infer for batch inference over a dataset. Concrete samplers such as the CLI sampler extend this class to implement specific user-facing workflows while reusing the same inference backend.

Usage

Use BaseSampler when building a new sampler implementation that needs to perform asynchronous text generation or batch inference. Subclass it to add custom pre/post-processing logic around the core generation pipeline. This class is typically not instantiated directly but rather through its subclasses like CLISampler.

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/v1/core/base_sampler.py
Lines: 1-67

Signature

class BaseSampler:
    def __init__(
        self,
        args: SampleArguments,
        model_args: ModelArguments,
        model: HFModel,
        renderer: Renderer,
    ) -> None: ...

    async def generate(
        self, messages: list[Message], tools: str | None = None
    ) -> AsyncGenerator[str, None]: ...

    async def batch_infer(self, dataset: TorchDataset) -> list[Sample]: ...

Import

from llamafactory.v1.core.base_sampler import BaseSampler

I/O Contract

Inputs

Name	Type	Required	Description
args	SampleArguments	Yes	Sample configuration arguments including the sample backend setting.
model_args	ModelArguments	Yes	Model configuration arguments for initializing the inference engine.
model	HFModel	Yes	The HuggingFace model instance used for generation.
renderer	Renderer	Yes	The renderer used for template-based message formatting.
messages	list[Message]	Yes (generate)	List of conversation messages to generate a response for.
tools	str or None	No (generate)	Optional tools string for tool-augmented generation.
dataset	TorchDataset	Yes (batch_infer)	A PyTorch dataset for batch inference.

Outputs

Name	Type	Description
generate return	AsyncGenerator[str, None]	Asynchronous stream of generated tokens (strings).
batch_infer return	list[Sample]	List of inferred samples from the dataset.

Usage Examples

# Subclassing BaseSampler for a custom sampler
from llamafactory.v1.core.base_sampler import BaseSampler

class CustomSampler(BaseSampler):
    async def run_chat(self, messages):
        response = ""
        async for token in self.generate(messages):
            response += token
        return response

# Instantiation
sampler = CustomSampler(
    args=sample_args,
    model_args=model_args,
    model=model,
    renderer=renderer,
)

Related Pages

Hiyouga_LLaMA_Factory_V1_Inference_Engine - The underlying inference engine that BaseSampler delegates to.
Hiyouga_LLaMA_Factory_V1_Rendering - The Renderer class used for message formatting.
Hiyouga_LLaMA_Factory_V1_Model_Engine - The ModelEngine that provides the model and renderer.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment