Implementation:FMInference FlexLLMGen Get Opt Config

Field	Value
Sources	Repo: FlexLLMGen
Domains	Model_Architecture, Configuration
Last Updated	2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the FlexLLMGen library, for resolving OPT model names to architecture configurations.

Description

get_opt_config() takes a model name string, strips the organization prefix (e.g., "facebook/"), handles OPT-IML variant names, and returns an OptConfig frozen dataclass holding the architectural parameters. The function supports the full range of OPT models from OPT-125M through OPT-175B, plus Galactica-30B.
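
The name handling described above can be sketched roughly as follows. This is an illustration only, not the library's code: normalize_opt_name is a hypothetical helper, and both the lowercasing step and the "-iml" / "-iml-max" suffix convention are assumptions about how variant names are written.

```python
from typing import Tuple


def normalize_opt_name(name: str) -> Tuple[str, str]:
    """Return (normalized_name, arch_name) for a model name string.

    Hypothetical helper for illustration; not part of FlexLLMGen.
    """
    # Drop an organization prefix such as "facebook/".
    if "/" in name:
        name = name.split("/", 1)[1]
    # Assumption: names are compared case-insensitively.
    name = name.lower()

    # Assumption: IML checkpoints share the base OPT architecture, so the
    # marker is removed to find the architecture name. The longer marker is
    # checked first so "-iml-max" is not left half-stripped.
    for marker in ("-iml-max", "-iml"):
        if marker in name:
            return name, name.replace(marker, "")
    return name, name


print(normalize_opt_name("facebook/opt-iml-max-30b"))
# ('opt-iml-max-30b', 'opt-30b')
```

Returning both names keeps the variant name available for reporting while the architecture name selects the dimensions.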

The OptConfig dataclass contains all parameters needed to define the model architecture:

Structural parameters -- num_hidden_layers, hidden_size, n_head, input_dim, ffn_embed_dim define the Transformer dimensions.
Sequence parameters -- max_seq_len defines the maximum sequence length (default 2048).
Vocabulary parameters -- vocab_size and pad_token_id define the tokenizer interface.
Numerical parameters -- dtype (default np.float16) and layer_norm_eps control numerical precision.
Utility methods -- model_bytes(), cache_bytes(), hidden_bytes() compute memory requirements.
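
To give a feel for the quantities the utility methods report, here is a back-of-the-envelope sketch. It is not the library's implementation: the function names are hypothetical, and it assumes 2-byte (float16) elements and one key tensor plus one value tensor per layer. The dimensions used are those of OPT-6.7B (32 layers, hidden size 4096).

```python
def kv_cache_bytes(num_layers, hidden, batch_size, seq_len, bytes_per_elem=2):
    # Hypothetical estimate: one key and one value tensor per layer,
    # each of shape (batch_size, seq_len, hidden).
    return 2 * num_layers * batch_size * seq_len * hidden * bytes_per_elem


def hidden_state_bytes(hidden, batch_size, seq_len, bytes_per_elem=2):
    # Hypothetical estimate: a single activation tensor of shape
    # (batch_size, seq_len, hidden).
    return batch_size * seq_len * hidden * bytes_per_elem


kv = kv_cache_bytes(num_layers=32, hidden=4096, batch_size=8, seq_len=2048)
hs = hidden_state_bytes(hidden=4096, batch_size=8, seq_len=2048)

print(f"KV cache:      {kv / 1024**3:.1f} GiB")  # 8.0 GiB
print(f"Hidden states: {hs / 1024**2:.1f} MiB")  # 128.0 MiB
```

Under these assumptions the KV cache grows linearly in batch size, sequence length, and layer count, which is why it tends to dominate memory at long sequence lengths.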

The function also accepts **kwargs to override any config field, enabling custom configurations for testing or experimentation.
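
The override mechanism relies on dataclasses.replace, which builds a new instance rather than mutating a frozen one. A minimal sketch of that behavior, using a hypothetical stand-in class (DemoConfig) instead of the real OptConfig:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class DemoConfig:  # hypothetical stand-in for OptConfig
    name: str = "opt-125m"
    max_seq_len: int = 2048


base = DemoConfig()

# replace() returns a new object; the original is left untouched.
short = dataclasses.replace(base, max_seq_len=1024)

# Direct assignment on a frozen dataclass is rejected.
try:
    base.max_seq_len = 512
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# An override that names a non-existent field fails with TypeError.
try:
    dataclasses.replace(base, no_such_field=1)
    unknown_accepted = True
except TypeError:
    unknown_accepted = False

print(short.max_seq_len, base.max_seq_len, mutated, unknown_accepted)
# 1024 2048 False False
```

A misspelled override name therefore surfaces as an error at call time instead of being silently ignored.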

Usage

Call get_opt_config() before creating OptLM to get the model's architecture specification. The returned OptConfig is passed to OptLM's constructor along with the ExecutionEnv and Policy.

Code Reference

Field	Value
Repository	FlexLLMGen
File	flexllmgen/opt_config.py
Lines	17-125

Signature:

import dataclasses

import numpy as np

@dataclasses.dataclass(frozen=True)
class OptConfig:
    name: str = "opt-125m"
    num_hidden_layers: int = 12
    max_seq_len: int = 2048
    hidden_size: int = 768
    n_head: int = 12
    input_dim: int = 768
    ffn_embed_dim: int = 3072
    pad: int = 1
    activation_fn: str = 'relu'
    vocab_size: int = 50272
    layer_norm_eps: float = 0.00001
    pad_token_id: int = 1
    dtype: type = np.float16

def get_opt_config(name, **kwargs):
    # Resolves name -> OptConfig
    ...
    return dataclasses.replace(config, **kwargs)

Import:

from flexllmgen.opt_config import OptConfig, get_opt_config

I/O Contract

Inputs

Parameter	Type	Required	Description
name	str	Yes	Model name (e.g., "facebook/opt-30b" or "opt-6.7b")
**kwargs	Any	No	Override config fields (e.g., max_seq_len=1024)

Outputs

Output	Type	Description
OptConfig	frozen dataclass	Complete architecture specification
.name	str	Normalized model name
.num_hidden_layers	int	Number of Transformer decoder layers
.max_seq_len	int	Maximum sequence length
.hidden_size	int	Hidden representation dimensionality
.n_head	int	Number of attention heads
.input_dim	int	Input embedding dimension
.ffn_embed_dim	int	Feed-forward network intermediate dimension
.pad	int	Padding token index
.activation_fn	str	Activation function name
.vocab_size	int	Vocabulary size
.layer_norm_eps	float	Layer normalization epsilon
.pad_token_id	int	Padding token ID
.dtype	type	Data type for model parameters

Usage Examples

Example 1: Get configuration for OPT-6.7B

from flexllmgen.opt_config import get_opt_config

config = get_opt_config("facebook/opt-6.7b")

print(config.name)              # "opt-6.7b"
print(config.num_hidden_layers) # 32
print(config.hidden_size)       # 4096
print(config.n_head)            # 32
print(config.ffn_embed_dim)     # 16384

Example 2: Get configuration for OPT-175B

from flexllmgen.opt_config import get_opt_config

config_175b = get_opt_config("facebook/opt-175b")

print(config_175b.name)              # "opt-175b"
print(config_175b.num_hidden_layers) # 96
print(config_175b.hidden_size)       # 12288
print(config_175b.n_head)            # 96
print(config_175b.ffn_embed_dim)     # 49152

Example 3: Compute memory requirements with model_bytes()

from flexllmgen.opt_config import get_opt_config

config = get_opt_config("facebook/opt-30b")

# Compute total model weight size in bytes
total_weight_bytes = config.model_bytes()
print(f"Model weights: {total_weight_bytes / (1024**3):.1f} GiB")

# Compute KV cache size for a given batch and sequence length
batch_size = 8
seq_len = 2048
cache_bytes = config.cache_bytes(batch_size, seq_len)
print(f"KV cache: {cache_bytes / (1024**3):.1f} GiB")

# Compute hidden state activation size
hidden_bytes = config.hidden_bytes(batch_size, seq_len)
print(f"Hidden states: {hidden_bytes / (1024**3):.1f} GiB")

Example 4: Override config fields

from flexllmgen.opt_config import get_opt_config

# Get OPT-6.7B config but with shorter max sequence length
config = get_opt_config("facebook/opt-6.7b", max_seq_len=1024)
print(config.max_seq_len)  # 1024

Related Pages

Principle:FMInference_FlexLLMGen_Model_Configuration_Resolution
