Implementation:Hpcaitech ColossalAI HuggingFaceModel Inference
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Distributed_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A model wrapper, provided by ColossalEval, for distributed tensor-parallel inference with HuggingFace models across evaluation benchmarks.
Description
HuggingFaceModel wraps a HuggingFace model with ColossalAI's ShardFormer for tensor-parallel inference. The inference() method processes benchmark datasets in batches, computing logits, losses, and generated outputs based on the task type.
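The per-task branching can be sketched in plain Python. This is a simplified stand-in, not the ColossalEval implementation: the helper names (`generate`, `score_loss`, `score_choices`) are hypothetical, and the `calculate_loss`/`all_classes` keys mirror the kind of flags passed via `inference_kwargs`.

```python
def generate(sample, max_new_tokens):
    # stand-in for the model's generate() call on a tokenized prompt
    return sample["input"] + " <generated>"

def score_loss(sample):
    # stand-in for the per-sample language-modeling loss
    return 0.0

def score_choices(sample, all_classes):
    # stand-in for logits over the candidate answer choices
    return {c: 0.0 for c in all_classes}

def route_batch(batch, inference_kwargs):
    """Illustrative dispatch: per batch, decide whether to compute a loss,
    choice logits, or a generated continuation, based on the task flags."""
    results = []
    for sample in batch:
        record = {"input": sample["input"]}
        if inference_kwargs.get("calculate_loss"):
            record["loss"] = score_loss(sample)
        if inference_kwargs.get("all_classes"):
            record["logits_over_choices"] = score_choices(
                sample, inference_kwargs["all_classes"]
            )
        else:
            record["output"] = generate(
                sample, inference_kwargs.get("max_new_tokens", 32)
            )
        results.append(record)
    return results

print(route_batch([{"input": "Q: 2+2=?"}], {"all_classes": ["A", "B", "C", "D"]}))
```

The real method additionally batches tensors, runs the sharded model, and gathers results across tensor-parallel ranks.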
Usage
Create with a model path and shard config, then call inference() on each dataset's data loader.
Code Reference
Source Location
- Repository: ColossalAI
- File: applications/ColossalEval/colossal_eval/models/huggingface.py
- Lines: 39-621
Signature
```python
class HuggingFaceModel:
    def __init__(
        self,
        path: str,
        model_max_length: int = 2048,
        tokenizer_path: Optional[str] = None,
        tokenizer_kwargs: dict = {},
        peft_path: Optional[str] = None,
        model_kwargs: Dict = None,
        prompt_template: Conversation = None,
        batch_size: int = 1,
        logger: DistributedLogger = None,
        shard_config: ShardConfig = None,
    ):
        """
        Args:
            path: HuggingFace model path
            model_max_length: Maximum model context length
            shard_config: ShardConfig for tensor-parallel inference
        """

    def inference(
        self,
        data_loader: DataLoader,
        inference_kwargs: Dict[str, Any],
        debug: bool = False,
    ) -> List[Dict]:
        """Run inference on a dataset, returning results with outputs."""
```
Import
```python
from colossal_eval.models import HuggingFaceModel
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | HuggingFace model path |
| shard_config | ShardConfig | No | Tensor parallel sharding config |
| data_loader | DataLoader | Yes | Benchmark dataset batches |
| inference_kwargs | Dict | Yes | max_new_tokens, temperature, etc. |
Outputs
| Name | Type | Description |
|---|---|---|
| results | List[Dict] | Inference results with logits, loss, generated output per sample |
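Downstream, the returned list is straightforward to post-process. As a hedged sketch, accuracy over the records could be computed like this (the `output` and `target` field names are illustrative assumptions, not the exact ColossalEval schema):

```python
def accuracy(results):
    """Fraction of records whose generated output matches the reference.
    The 'output'/'target' keys are assumed field names, for illustration."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r["output"].strip() == r["target"].strip())
    return hits / len(results)

results = [
    {"output": " B", "target": "B"},
    {"output": "C", "target": "D"},
]
print(accuracy(results))  # 0.5
```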
Usage Examples
```python
from colossal_eval.models import HuggingFaceModel
from colossalai.shardformer import ShardConfig

# Assumes the distributed process group has already been initialized
# (e.g. via torchrun / colossalai.launch) before constructing the model.
model = HuggingFaceModel(
    path="meta-llama/Llama-2-7b-hf",
    model_max_length=4096,
    batch_size=8,
    shard_config=ShardConfig(enable_tensor_parallelism=True),
)
results = model.inference(
    data_loader=mmlu_dataloader,
    inference_kwargs={"max_new_tokens": 5},
)
```
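`batch_size` controls how many prompts are tokenized and scored together. The chunking that the data loader performs can be sketched in plain Python (illustrative only; the real pipeline uses a PyTorch `DataLoader`):

```python
def chunk(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples, mirroring
    how a DataLoader feeds batches into inference() (illustrative sketch)."""
    for i in range(0, len(samples), batch_size):
        yield samples[i : i + batch_size]

print([len(b) for b in chunk(list(range(10)), 8)])  # [8, 2]
```

Larger batches improve GPU utilization at the cost of memory; the trailing partial batch is processed as-is.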
Related Pages
Implements Principle
Requires Environment