
Implementation:Hpcaitech ColossalAI HuggingFaceModel Inference

From Leeroopedia


Knowledge Sources
Domains Evaluation, Distributed_Computing
Last Updated 2026-02-09 00:00 GMT

Overview

A model wrapper from ColossalEval for distributed, tensor-parallel inference with HuggingFace models across evaluation benchmarks.

Description

HuggingFaceModel wraps a HuggingFace model with ColossalAI's ShardFormer for tensor-parallel inference. The inference() method processes benchmark datasets in batches, computing logits, losses, and generated outputs based on the task type.
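To make the batch-processing behavior concrete, here is a minimal, simplified sketch of how such an inference loop can dispatch on its settings. This is an illustration only, not the library's actual implementation: the `run_batches` function and the `calculate_loss` flag are stand-ins assumed for this example.

```python
# Illustrative sketch of a batched inference loop that attaches per-sample
# results, in the spirit of HuggingFaceModel.inference(). All names here
# are hypothetical stand-ins, not ColossalEval's internal helpers.
from typing import Any, Dict, List


def run_batches(
    batches: List[List[Dict[str, Any]]],
    inference_kwargs: Dict[str, Any],
) -> List[Dict[str, Any]]:
    calculate_loss = inference_kwargs.get("calculate_loss", False)
    results = []
    for batch in batches:
        for sample in batch:
            record = dict(sample)  # keep the original sample fields
            if calculate_loss:
                # Placeholder for a per-sample loss over the reference answer.
                record["loss"] = 0.0
            # Placeholder for the model's generated text.
            record["output"] = ""
            results.append(record)
    return results
```

The real method additionally computes logits for multiple-choice tasks and runs sharded generation; the sketch only shows the shape of the per-sample result records.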

Usage

Create with a model path and shard config, then call inference() on each dataset's data loader.

Code Reference

Source Location

  • Repository: ColossalAI
  • File: applications/ColossalEval/colossal_eval/models/huggingface.py
  • Lines: 39-621

Signature

class HuggingFaceModel:
    def __init__(
        self,
        path: str,
        model_max_length: int = 2048,
        tokenizer_path: Optional[str] = None,
        tokenizer_kwargs: dict = {},
        peft_path: Optional[str] = None,
        model_kwargs: Dict = None,
        prompt_template: Conversation = None,
        batch_size: int = 1,
        logger: DistributedLogger = None,
        shard_config: ShardConfig = None,
    ):
        """
        Args:
            path: HuggingFace model path
            model_max_length: Maximum model context length
            shard_config: ShardConfig for tensor-parallel inference
        """

    def inference(
        self,
        data_loader: DataLoader,
        inference_kwargs: Dict[str, Any],
        debug: bool = False,
    ) -> List[Dict]:
        """Run inference on a dataset, returning results with outputs."""

Import

from colossal_eval.models import HuggingFaceModel

I/O Contract

Inputs

Name              Type         Required  Description
path              str          Yes       HuggingFace model path
shard_config      ShardConfig  No        Tensor-parallel sharding config
data_loader       DataLoader   Yes       Benchmark dataset batches
inference_kwargs  Dict         Yes       max_new_tokens, temperature, etc.
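For illustration, a plausible `inference_kwargs` dictionary might look as follows. Only `max_new_tokens` and `temperature` are named in this page; any further keys accepted by the method should be checked against the source file listed above.

```python
# Illustrative inference_kwargs; values chosen for deterministic,
# short-answer benchmark evaluation (an assumption, not a prescribed default).
inference_kwargs = {
    "max_new_tokens": 32,   # cap on generated tokens per sample
    "temperature": 0.0,     # greedy decoding for reproducible scores
}
```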

Outputs

Name     Type        Description
results  List[Dict]  Inference results with logits, loss, and generated output per sample

Usage Examples

from colossal_eval.models import HuggingFaceModel
from colossalai.shardformer import ShardConfig

model = HuggingFaceModel(
    path="meta-llama/Llama-2-7b-hf",
    model_max_length=4096,
    batch_size=8,
    shard_config=ShardConfig(tensor_parallel_size=2),
)

results = model.inference(
    data_loader=mmlu_dataloader,
    inference_kwargs={"max_new_tokens": 5},
)
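The returned List[Dict] can then be scored in ordinary Python. The snippet below is a hedged sketch: the `dataset`, `output`, and `target` keys are assumed for illustration, since the exact result fields depend on the task type.

```python
# Hypothetical post-processing of inference() results. Key names are
# assumptions for this example; real records vary by benchmark task.
results = [
    {"dataset": "mmlu", "output": " A", "target": "A"},
    {"dataset": "mmlu", "output": " B", "target": "A"},
]

correct = sum(
    1 for r in results
    if r.get("output", "").strip().startswith(r.get("target", ""))
)
accuracy = correct / len(results)
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.50
```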

Related Pages

Implements Principle

Requires Environment
