
Implementation:Microsoft DeepSpeedExamples Get Model Config

From Leeroopedia


Overview

Concrete tool for loading model configuration and tokenizer for ZeRO-Inference.

Description

The get_model_config and get_tokenizer functions provide the configuration and tokenizer loading step of the ZeRO-Inference pipeline. Together, they extract architectural metadata and text encoding capabilities from HuggingFace model identifiers, handling special cases where model artifacts are not directly available (e.g., OPT-175B).

The get_model_config function:

  1. Checks if the model name contains "175b", indicating the OPT-175B special case.
  2. For OPT-175B: loads facebook/opt-66b configuration and overrides five architectural parameters to match the 175B variant.
  3. For all other models: loads configuration directly via AutoConfig.from_pretrained with trust_remote_code=True.
  4. Explicitly sets config.model_type = 'bloom' for BLOOM models (ensuring consistent type detection).
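The branching in steps 1–3 can be sketched as pure string logic; `config_source` below is a hypothetical helper name for illustration, not part of the source file:

```python
# Hypothetical sketch of the config-source dispatch described in the steps above.
def config_source(model_name):
    if "175b" in model_name:       # step 1: detect the OPT-175B special case
        return "facebook/opt-66b"  # step 2: proxy config, overridden afterwards
    return model_name              # step 3: load directly via AutoConfig

assert config_source("facebook/opt-175b") == "facebook/opt-66b"
assert config_source("bigscience/bloom") == "bigscience/bloom"
```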

The get_tokenizer function:

  1. For OPT models: substitutes "175b" with "66b" in the model name to load from a publicly available tokenizer.
  2. For all other models: loads the tokenizer directly from the model name.
  3. Sets tokenizer.pad_token = tokenizer.eos_token for consistent padding during batch encoding.
  4. Configures padding_side="left" for OPT models (required for correct causal LM generation with padding).
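Step 1 is a plain string rewrite that only affects names containing "175b"; the sketch below (with a hypothetical helper `tokenizer_source`, not part of the source) shows which checkpoint the tokenizer actually comes from:

```python
# Hypothetical sketch of step 1: choosing the tokenizer checkpoint.
def tokenizer_source(model_name, model_type):
    if model_type == "opt":
        # "175b" -> "66b" so the tokenizer loads from a public checkpoint;
        # other OPT names are left unchanged by replace().
        return model_name.replace("175b", "66b")
    return model_name

assert tokenizer_source("facebook/opt-175b", "opt") == "facebook/opt-66b"
assert tokenizer_source("facebook/opt-66b", "opt") == "facebook/opt-66b"
assert tokenizer_source("bigscience/bloom", "bloom") == "bigscience/bloom"
```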

Code Reference

Source

Repository: DeepSpeedExamples
File: inference/huggingface/zero_inference/run_model.py
Lines: 30-59

Signature: get_model_config

def get_model_config(model_name):
    """Load model configuration, with special-case handling for OPT-175B.

    For OPT-175B: loads opt-66b config then overrides
    hidden_size=12288, word_embed_proj_dim=12288, ffn_dim=49152,
    num_attention_heads=96, num_hidden_layers=96.

    For all other models: loads config directly via AutoConfig.
    """
    if "175b" in model_name:
        config = AutoConfig.from_pretrained("facebook/opt-66b")
        config.hidden_size = 12288
        config.word_embed_proj_dim = 12288
        config.ffn_dim = 12288 * 4
        config.num_attention_heads = 96
        config.num_hidden_layers = 96
    else:
        config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)

    if 'bloom' in model_name:
        config.model_type = 'bloom'

    return config

Signature: get_tokenizer

def get_tokenizer(model_name, config):
    """Load tokenizer with model-specific handling.

    For OPT models: substitutes opt-175b -> opt-66b for tokenizer source,
    uses left padding for proper causal LM generation.

    For all other models: loads tokenizer directly.
    Sets pad_token = eos_token for all models.
    """
    if config.model_type == "opt":
        tokenizer = AutoTokenizer.from_pretrained(
            model_name.replace("175b", "66b"),
            padding_side="left"
        )
    else:
        tokenizer = AutoTokenizer.from_pretrained(model_name)

    tokenizer.pad_token = tokenizer.eos_token
    return tokenizer

Import

# These functions are defined directly in run_model.py
# They use the following imports:
from transformers import AutoConfig, AutoTokenizer

I/O Contract

get_model_config

Inputs:

  • model_name (str, required): HuggingFace model identifier (e.g., "facebook/opt-66b", "bigscience/bloom", "meta-llama/Llama-2-70b-hf")

Outputs:

  • config (PretrainedConfig): Model configuration object with hidden_size, num_hidden_layers, num_attention_heads, vocab_size, model_type, and torch_dtype attributes. (AutoConfig.from_pretrained returns a model-specific PretrainedConfig subclass, e.g. OPTConfig.)

get_tokenizer

Inputs:

  • model_name (str, required): HuggingFace model identifier
  • config (PretrainedConfig, required): Model configuration; its model_type attribute determines the OPT special-casing

Outputs:

  • tokenizer (PreTrainedTokenizer): Tokenizer with pad_token set to eos_token; OPT tokenizers are created with padding_side="left"

Usage Examples

Standard Model Configuration

# Load configuration for LLaMA-2-70B
config = get_model_config("meta-llama/Llama-2-70b-hf")
# config.hidden_size = 8192
# config.num_hidden_layers = 80
# config.num_attention_heads = 64
# config.model_type = "llama"

tokenizer = get_tokenizer("meta-llama/Llama-2-70b-hf", config)
# tokenizer.pad_token == tokenizer.eos_token

OPT-175B Special Case

# OPT-175B: loads from opt-66b and overrides dimensions
config = get_model_config("facebook/opt-175b")
# config.hidden_size = 12288  (overridden from opt-66b's 9216)
# config.num_hidden_layers = 96  (overridden from opt-66b's 64)
# config.num_attention_heads = 96  (overridden from opt-66b's 72)
# config.ffn_dim = 49152  (overridden from opt-66b's 36864)
# config.model_type = "opt"

tokenizer = get_tokenizer("facebook/opt-175b", config)
# Actually loads from "facebook/opt-66b" with padding_side="left"
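The override arithmetic can be sanity-checked without loading any model. This small check (not part of the source) confirms the 175B values follow OPT's published scaling pattern:

```python
# The five overrides applied on top of the opt-66b config.
overrides = {
    "hidden_size": 12288,
    "word_embed_proj_dim": 12288,
    "ffn_dim": 12288 * 4,  # 49152
    "num_attention_heads": 96,
    "num_hidden_layers": 96,
}

# OPT keeps ffn_dim = 4 * hidden_size across variants (66B: 36864 = 4 * 9216).
assert overrides["ffn_dim"] == 4 * overrides["hidden_size"]
# The per-head dimension stays 128, as in OPT-66B (9216 / 72 == 128).
assert overrides["hidden_size"] // overrides["num_attention_heads"] == 128
```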

BLOOM Model Configuration

# BLOOM: explicit model_type override
config = get_model_config("bigscience/bloom")
# config.model_type = 'bloom'  (explicitly set)

tokenizer = get_tokenizer("bigscience/bloom", config)

Design Decisions

  • OPT-175B proxy configuration: Since OPT-175B weights and configuration are not publicly available on HuggingFace, the code derives its architecture from OPT-66B. This approach avoids hardcoding the full configuration while leveraging the published architectural scaling pattern.
  • Left padding for OPT: OPT models require left-side padding because the causal language model generates tokens from left to right, and right-padded inputs would place padding tokens before the generation position.
  • trust_remote_code=True: Enabled for non-OPT-175B models to support architectures (like Mixtral) that require custom modeling code hosted on HuggingFace.
  • Explicit BLOOM model_type: The BLOOM model's configuration may not always set model_type correctly, so it is explicitly overridden when 'bloom' appears in the model name.
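The left-padding rationale can be illustrated with plain lists, no tokenizer required; `pad_batch` below is a hypothetical helper for illustration only:

```python
def pad_batch(seqs, pad_id, side):
    """Pad a batch of token-id lists to equal length on the given side."""
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        padded.append(pad + s if side == "left" else s + pad)
    return padded

batch = [[5, 6, 7], [8]]
left = pad_batch(batch, 0, "left")    # [[5, 6, 7], [0, 0, 8]]
right = pad_batch(batch, 0, "right")  # [[5, 6, 7], [8, 0, 0]]

# Left padding: every row ends in its last real token, so a causal LM
# appends new tokens directly after real content for the whole batch.
assert [row[-1] for row in left] == [7, 8]
# Right padding: the short row ends in pad tokens, so naive appending
# would place generated tokens after padding rather than after the prompt.
assert right[1][-1] == 0
```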
