Implementation:Microsoft DeepSpeedExamples Get Model Config
Overview
Concrete tool for loading model configuration and tokenizer for ZeRO-Inference.
Description
The get_model_config and get_tokenizer functions provide the configuration and tokenizer loading step of the ZeRO-Inference pipeline. Together, they extract architectural metadata and text encoding capabilities from HuggingFace model identifiers, handling special cases where model artifacts are not directly available (e.g., OPT-175B).
The get_model_config function:
- Checks whether the model name contains `"175b"`, indicating the OPT-175B special case.
- For OPT-175B: loads the `facebook/opt-66b` configuration and overrides five architectural parameters to match the 175B variant.
- For all other models: loads the configuration directly via `AutoConfig.from_pretrained` with `trust_remote_code=True`.
- Explicitly sets `config.model_type = 'bloom'` for BLOOM models, ensuring consistent type detection.
The get_tokenizer function:
- For OPT models: substitutes `"175b"` with `"66b"` in the model name to load a publicly available tokenizer.
- For all other models: loads the tokenizer directly from the model name.
- Sets `tokenizer.pad_token = tokenizer.eos_token` for consistent padding during batch encoding.
- Configures `padding_side="left"` for OPT models (required for correct causal LM generation with padding).
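The left-padding requirement can be illustrated without loading a real tokenizer. The following toy sketch (the pad id and token sequences are invented for illustration) shows why a decoder-only model's batch must be padded on the left:

```python
# Toy illustration: why causal LM generation needs left padding.
PAD = 0  # hypothetical padding token id

def left_pad(seq, length):
    return [PAD] * (length - len(seq)) + seq

def right_pad(seq, length):
    return seq + [PAD] * (length - len(seq))

batch = [[5, 6], [7, 8, 9]]          # two prompts of unequal length
max_len = max(len(s) for s in batch)

left = [left_pad(s, max_len) for s in batch]
right = [right_pad(s, max_len) for s in batch]

# With left padding, every row ends on its last real token, so greedy
# decoding can append the next token directly after position -1.
assert [row[-1] for row in left] == [6, 9]

# With right padding, the shorter row ends on PAD, so naive generation
# would continue from a padding token instead of the prompt.
assert [row[-1] for row in right] == [PAD, 9]
```

This is the property `padding_side="left"` guarantees for OPT batches of mixed prompt lengths.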
Code Reference
Source
| Repository | File | Lines |
|---|---|---|
| DeepSpeedExamples | inference/huggingface/zero_inference/run_model.py | 30-59 |
Signature: get_model_config
```python
def get_model_config(model_name):
    """Load model configuration, with special-case handling for OPT-175B.

    For OPT-175B: loads opt-66b config then overrides
    hidden_size=12288, word_embed_proj_dim=12288, ffn_dim=49152,
    num_attention_heads=96, num_hidden_layers=96.
    For all other models: loads config directly via AutoConfig.
    """
    if "175b" in model_name:
        config = AutoConfig.from_pretrained("facebook/opt-66b")
        config.hidden_size = 12288
        config.word_embed_proj_dim = 12288
        config.ffn_dim = 12288 * 4
        config.num_attention_heads = 96
        config.num_hidden_layers = 96
    else:
        config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
    if 'bloom' in model_name:
        config.model_type = 'bloom'
    return config
```
Signature: get_tokenizer
```python
def get_tokenizer(model_name, config):
    """Load tokenizer with model-specific handling.

    For OPT models: substitutes opt-175b -> opt-66b for the tokenizer source,
    uses left padding for proper causal LM generation.
    For all other models: loads tokenizer directly.
    Sets pad_token = eos_token for all models.
    """
    if config.model_type == "opt":
        tokenizer = AutoTokenizer.from_pretrained(
            model_name.replace("175b", "66b"),
            padding_side="left",
        )
    else:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    return tokenizer
```
Import
```python
# These functions are defined directly in run_model.py.
# They rely on the following imports:
from transformers import AutoConfig, AutoTokenizer
```
I/O Contract
get_model_config
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | HuggingFace model identifier (e.g., "facebook/opt-66b", "bigscience/bloom", "meta-llama/Llama-2-70b-hf") |
Outputs:
| Name | Type | Description |
|---|---|---|
| config | PretrainedConfig (loaded via AutoConfig) | Model configuration object with hidden_size, num_hidden_layers, num_attention_heads, vocab_size, model_type, and torch_dtype attributes |
get_tokenizer
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | HuggingFace model identifier |
| config | PretrainedConfig | Yes | Model configuration (used to determine model_type for OPT special-casing) |
Outputs:
| Name | Type | Description |
|---|---|---|
| tokenizer | PreTrainedTokenizer (loaded via AutoTokenizer) | Tokenizer with pad_token set to eos_token; OPT models use left-side padding |
Usage Examples
Standard Model Configuration
```python
# Load configuration for LLaMA-2-70B
config = get_model_config("meta-llama/Llama-2-70b-hf")
# config.hidden_size == 8192
# config.num_hidden_layers == 80
# config.num_attention_heads == 64
# config.model_type == "llama"

tokenizer = get_tokenizer("meta-llama/Llama-2-70b-hf", config)
# tokenizer.pad_token == tokenizer.eos_token
```
OPT-175B Special Case
```python
# OPT-175B: loads from opt-66b and overrides dimensions
config = get_model_config("facebook/opt-175b")
# config.hidden_size == 12288 (overridden from opt-66b's 9216)
# config.num_hidden_layers == 96 (overridden from opt-66b's 64)
# config.num_attention_heads == 96 (overridden from opt-66b's 72)
# config.ffn_dim == 49152 (overridden from opt-66b's 36864)
# config.model_type == "opt"

tokenizer = get_tokenizer("facebook/opt-175b", config)
# Actually loads from "facebook/opt-66b" with padding_side="left"
```
BLOOM Model Configuration
```python
# BLOOM: explicit model_type override
config = get_model_config("bigscience/bloom")
# config.model_type == 'bloom' (explicitly set)

tokenizer = get_tokenizer("bigscience/bloom", config)
```
Design Decisions
- OPT-175B proxy configuration: Since OPT-175B weights and configuration are not publicly available on HuggingFace, the code derives its architecture from OPT-66B. This approach avoids hardcoding the full configuration while leveraging the published architectural scaling pattern.
- Left padding for OPT: OPT models require left-side padding because the causal language model generates tokens from left to right, and right-padded inputs would place padding tokens before the generation position.
- trust_remote_code=True: Enabled for non-OPT-175B models to support architectures (like Mixtral) that require custom modeling code hosted on HuggingFace.
- Explicit BLOOM model_type: The BLOOM model's configuration may not always set `model_type` correctly, so it is explicitly overridden whenever `'bloom'` appears in the model name.
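As a rough sanity check (not part of the source), the five overridden values are consistent with OPT-175B's nominal parameter count under the standard 12·L·h² transformer estimate plus embeddings. The vocabulary size below is an assumption based on OPT's published tokenizer, and biases and layer norms are ignored:

```python
# Back-of-envelope parameter estimate from the overridden config values.
hidden, layers, vocab = 12288, 96, 50272  # vocab size assumed from OPT's tokenizer

# Per decoder layer: ~4*h^2 for the attention projections plus 2*h*ffn_dim
# with ffn_dim = 4*h, i.e. ~12*h^2 total (biases/layernorms ignored).
per_layer = 12 * hidden ** 2
total = layers * per_layer + vocab * hidden  # embedding matrix added once

print(f"~{total / 1e9:.1f}B parameters")  # close to the nominal 175B
```

This is why loading the opt-66b config and overriding only these few fields is sufficient: the remaining fields (activation, dropout, vocab size) are shared across the OPT family.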