Implementation:Microsoft BIPIA AutoLLM
Overview
Concrete tool for loading LLM model classes provided by the BIPIA benchmark library.
Description
AutoLLM is a factory class that maps 21 model identifiers to their corresponding model wrapper classes. It supports three loading paths:
- Direct name lookup -- The provided string matches a key in the LLM_NAME_TO_CLASS dictionary (e.g., "gpt35"), and the corresponding class is returned immediately.
- YAML config file path -- The provided string ends with .yaml or .yml. The factory loads the YAML file, reads the model_name key, and resolves it through the same dictionary.
- ValueError -- If the name matches neither a known key nor a valid YAML path, a ValueError is raised with a message listing the supported model names.
The factory returns a class (not an instance). The caller must then construct the returned class with the appropriate arguments, which vary by backend type (see I/O Contract below).
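The three resolution paths can be sketched as follows. This is an illustrative reconstruction, not the actual BIPIA source: the mapping is a toy subset, the backend classes are empty stand-ins, and a trivial "key: value" parse stands in for a real YAML loader to keep the sketch dependency-free.

```python
from typing import Dict, Type


# Stand-in classes, not the real BIPIA wrappers
class GPTModel: ...
class LLMModel: ...
class vLLMModel: ...


# Toy subset of the 21-entry mapping
LLM_NAME_TO_CLASS: Dict[str, Type] = {
    "gpt35": GPTModel,
    "llama2_7b": LLMModel,
    "mistral": vLLMModel,
}


def from_name(name: str) -> Type:
    # Path 1: direct key lookup
    if name in LLM_NAME_TO_CLASS:
        return LLM_NAME_TO_CLASS[name]
    # Path 2: YAML config path -- read the model_name key, re-resolve.
    # (A minimal "key: value" parse substitutes for a YAML library here.)
    if name.endswith((".yaml", ".yml")):
        with open(name) as f:
            config = dict(line.split(":", 1) for line in f if ":" in line)
        return LLM_NAME_TO_CLASS[config["model_name"].strip()]
    # Path 3: unknown name
    raise ValueError(
        f"Unknown model {name!r}; supported: {sorted(LLM_NAME_TO_CLASS)}"
    )
```

Note that the function returns the class object itself, matching the factory's contract: instantiation is left to the caller.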
Usage
Import AutoLLM when you need to instantiate any supported LLM for benchmark inference within the BIPIA framework. The from_name classmethod accepts either a model name string or a YAML config file path and returns the appropriate model class for construction.
from bipia.model import AutoLLM
Code Reference
Source: BIPIA repo, File: bipia/model/__init__.py, Lines: L1-72
Signature:
@classmethod
def from_name(cls, name: str) -> Type[BaseModel]
The returned class constructors vary by backend:
- GPTModel(config=str|dict) -- For OpenAI API-based models.
- LLMModel(config=str|dict, accelerator=Accelerator) -- For HuggingFace Transformers models requiring a HuggingFace Accelerator instance.
- vLLMModel(config=str|dict, tensor_parallel_size=int) -- For vLLM-accelerated models with configurable tensor parallelism.
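Because the constructor arguments vary by backend, a caller that handles all three paths needs a small dispatch step. The sketch below is a hypothetical helper (not part of BIPIA); the class names mirror the documented wrappers, but the bodies are stand-ins that merely record their arguments.

```python
# Stand-in classes mirroring the documented constructor shapes
class GPTModel:
    def __init__(self, config):
        self.config = config


class LLMModel:
    def __init__(self, config, accelerator):
        self.config, self.accelerator = config, accelerator


class vLLMModel:
    def __init__(self, config, tensor_parallel_size):
        self.config, self.tensor_parallel_size = config, tensor_parallel_size


def build(model_cls, config, accelerator=None, tensor_parallel_size=1):
    """Construct a resolved class with backend-appropriate arguments."""
    if issubclass(model_cls, GPTModel):      # OpenAI API backend
        return model_cls(config=config)
    if issubclass(model_cls, LLMModel):      # HuggingFace Transformers backend
        return model_cls(config=config, accelerator=accelerator)
    return model_cls(config=config,          # vLLM backend
                     tensor_parallel_size=tensor_parallel_size)
```

This keeps the backend-specific keyword arguments in one place rather than scattered across call sites.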
Import:
from bipia.model import AutoLLM
LLM_NAME_TO_CLASS Mapping (21 entries):
LLM_NAME_TO_CLASS = {
"gpt35": GPTModel,
"gpt4": GPTModel,
"gpt35_0613": GPTModel,
"gpt4_0613": GPTModel,
"gpt4_1106": GPTModel,
"llama2_7b": LLMModel,
"llama2_13b": LLMModel,
"llama2_70b": LLMModel,
"vicuna_7b": LLMModel,
"vicuna_13b": LLMModel,
"vicuna_33b": LLMModel,
"falcon_7b": LLMModel,
"falcon_40b": LLMModel,
"mpt_7b": LLMModel,
"mpt_30b": LLMModel,
"mistral": vLLMModel,
"llama2_7b_vllm": vLLMModel,
"llama2_13b_vllm": vLLMModel,
"llama2_70b_vllm": vLLMModel,
"vicuna_7b_vllm": vLLMModel,
"vicuna_13b_vllm": vLLMModel,
}
I/O Contract
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | A model name key (e.g., "gpt35") or a path to a YAML config file (e.g., "config/vicuna_13b.yaml") |

| Return Type | Description |
|---|---|
| Type[BaseModel] | A model class (not an instance) that exposes process_fn() and generate() methods. Must be constructed by the caller with backend-specific arguments. |
Usage Examples
1. Basic usage with a direct model name:
llm_cls = AutoLLM.from_name("gpt35")
llm = llm_cls(config="config/gpt35.yaml")
output = llm.generate(llm.process_fn(prompt))
2. Loading from a YAML config file (HuggingFace backend):
llm_cls = AutoLLM.from_name("config/vicuna_13b.yaml")
llm = llm_cls(config="config/vicuna_13b.yaml", accelerator=accelerator)
output = llm.generate(llm.process_fn(prompt))
3. Using vLLM with tensor parallelism:
llm_cls = AutoLLM.from_name("mistral")
llm = llm_cls(config="config/mistral_7b.yaml", tensor_parallel_size=4)
output = llm.generate(llm.process_fn(prompt))
Related Pages
- Principle:Microsoft_BIPIA_Model_Loading
- Environment:Microsoft_BIPIA_Python_CUDA_GPU_Environment
- Heuristic:Microsoft_BIPIA_BF16_Compute_Capability_Check
- Heuristic:Microsoft_BIPIA_Torch_Compile_Platform_Guard
- Heuristic:Microsoft_BIPIA_LLAMA_Pad_Token_Workaround
- Heuristic:Microsoft_BIPIA_Delta_Weight_CPU_Loading