Implementation:FMInference FlexLLMGen Get Opt Config
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Model_Architecture, Configuration |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the FlexLLMGen library, that resolves OPT model names to architecture configurations.
Description
get_opt_config() takes a model name string, strips the organization prefix (e.g., "facebook/"), maps OPT-IML variants onto their base architecture, and returns an OptConfig frozen dataclass holding the architectural parameters. The function supports the full range of OPT models from OPT-125M through OPT-175B, plus Galactica-30B.
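The name handling described above can be sketched as a small standalone function. This is a minimal illustration, not the library's code: normalize_model_name is a hypothetical helper, and both the lowercasing step and the "-iml" / "-iml-max" suffix spellings are assumptions about how IML variants are named.

```python
from typing import Tuple


def normalize_model_name(name: str) -> Tuple[str, str]:
    """Return (normalized_name, architecture_name) for an OPT-style model id.

    Hypothetical helper for illustration; not part of flexllmgen.
    """
    # Drop an organization prefix such as "facebook/".
    if "/" in name:
        name = name.split("/", 1)[1]
    # Assumption: names are compared case-insensitively.
    name = name.lower()
    # Assumption: IML variants are marked by "-iml" or "-iml-max" and
    # share the architecture of the base model of the same size.
    arch_name = name.replace("-iml-max", "").replace("-iml", "")
    return name, arch_name


print(normalize_model_name("facebook/opt-6.7b"))        # ('opt-6.7b', 'opt-6.7b')
print(normalize_model_name("facebook/opt-iml-max-30b")) # ('opt-iml-max-30b', 'opt-30b')
```

Returning both names keeps the variant visible in the reported name while letting a lookup table be keyed on the base architecture only.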
The OptConfig dataclass contains all parameters needed to define the model architecture:
- Structural parameters -- num_hidden_layers, hidden_size, n_head, input_dim, ffn_embed_dim define the Transformer dimensions.
- Sequence parameters -- max_seq_len defines the maximum sequence length (default 2048).
- Vocabulary parameters -- vocab_size and pad_token_id define the tokenizer interface.
- Numerical parameters -- dtype (default np.float16) and layer_norm_eps control numerical precision.
- Utility methods -- model_bytes(), cache_bytes(), and hidden_bytes() estimate, in bytes, the memory needed for the model weights, the KV cache, and the hidden-state activations.
The function also accepts **kwargs to override any OptConfig field, enabling custom configurations for testing or experimentation. The overrides are applied with dataclasses.replace(), so the result is a new frozen instance.
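The override mechanism relies only on standard-library dataclass behavior, so it can be tried without FlexLLMGen installed. The sketch below uses TinyConfig, a hypothetical two-field stand-in for OptConfig, and with_overrides, a hypothetical wrapper that mirrors the final dataclasses.replace(config, **kwargs) call.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class TinyConfig:
    """Hypothetical stand-in for OptConfig with just two fields."""
    name: str = "opt-125m"
    max_seq_len: int = 2048


def with_overrides(config, **kwargs):
    # dataclasses.replace builds a new instance; the original is untouched.
    return dataclasses.replace(config, **kwargs)


base = TinyConfig()
short = with_overrides(base, max_seq_len=1024)

# A keyword that is not a field is rejected by the dataclass constructor.
try:
    with_overrides(base, not_a_field=1)
    unknown_field_error = None
except TypeError as exc:
    unknown_field_error = exc

# A frozen dataclass cannot be modified in place.
try:
    base.max_seq_len = 512
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```

In practice this means a misspelled override fails with an error instead of being ignored, and a config shared between components cannot be changed by one of them.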
Usage
Call get_opt_config() before creating OptLM to get the model's architecture specification. The returned OptConfig is passed to OptLM's constructor along with the ExecutionEnv and Policy.
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | flexllmgen/opt_config.py |
| Lines | 17-125 |
Signature:
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class OptConfig:
    name: str = "opt-125m"
    num_hidden_layers: int = 12
    max_seq_len: int = 2048
    hidden_size: int = 768
    n_head: int = 12
    input_dim: int = 768
    ffn_embed_dim: int = 3072
    pad: int = 1
    activation_fn: str = 'relu'
    vocab_size: int = 50272
    layer_norm_eps: float = 0.00001
    pad_token_id: int = 1
    dtype: type = np.float16


def get_opt_config(name, **kwargs):
    # Resolves name -> OptConfig
    ...
    return dataclasses.replace(config, **kwargs)
Import:
from flexllmgen.opt_config import OptConfig, get_opt_config
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Model name (e.g., "facebook/opt-30b" or "opt-6.7b") |
| **kwargs | Any | No | Override config fields (e.g., max_seq_len=1024) |
Outputs
| Output | Type | Description |
|---|---|---|
| OptConfig | frozen dataclass | Complete architecture specification |
| .name | str | Normalized model name |
| .num_hidden_layers | int | Number of Transformer decoder layers |
| .max_seq_len | int | Maximum sequence length |
| .hidden_size | int | Hidden representation dimensionality |
| .n_head | int | Number of attention heads |
| .input_dim | int | Input embedding dimension |
| .ffn_embed_dim | int | Feed-forward network intermediate dimension |
| .pad | int | Padding token index |
| .activation_fn | str | Activation function name |
| .vocab_size | int | Vocabulary size |
| .layer_norm_eps | float | Layer normalization epsilon |
| .pad_token_id | int | Padding token ID |
| .dtype | type | Data type for model parameters |
Usage Examples
Example 1: Get configuration for OPT-6.7B
from flexllmgen.opt_config import get_opt_config
config = get_opt_config("facebook/opt-6.7b")
print(config.name) # "opt-6.7b"
print(config.num_hidden_layers) # 32
print(config.hidden_size) # 4096
print(config.n_head) # 32
print(config.ffn_embed_dim) # 16384
Example 2: Get configuration for OPT-175B
from flexllmgen.opt_config import get_opt_config
config_175b = get_opt_config("facebook/opt-175b")
print(config_175b.name) # "opt-175b"
print(config_175b.num_hidden_layers) # 96
print(config_175b.hidden_size) # 12288
print(config_175b.n_head) # 96
print(config_175b.ffn_embed_dim) # 49152
Example 3: Compute memory requirements with model_bytes()
from flexllmgen.opt_config import get_opt_config
config = get_opt_config("facebook/opt-30b")
# Compute total model weight size in bytes
total_weight_bytes = config.model_bytes()
print(f"Model weights: {total_weight_bytes / (1024**3):.1f} GiB")
# Compute KV cache size for a given batch and sequence length
batch_size = 8
seq_len = 2048
cache_bytes = config.cache_bytes(batch_size, seq_len)
print(f"KV cache: {cache_bytes / (1024**3):.1f} GiB")
# Compute hidden state activation size
hidden_bytes = config.hidden_bytes(batch_size, seq_len)
print(f"Hidden states: {hidden_bytes / (1024**3):.1f} GiB")
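As a sanity check on figures like these, the KV cache and hidden-state sizes can be estimated by hand. The sketch below uses the usual back-of-envelope formulas for a decoder-only Transformer in float16; it is an assumption that cache_bytes() and hidden_bytes() follow the same arithmetic, and kv_cache_bytes and hidden_state_bytes are hypothetical helpers, not library functions.

```python
GIB = 1024 ** 3


def kv_cache_bytes(batch_size, seq_len, num_layers, hidden_size, bytes_per_elem=2):
    # One key vector and one value vector of length hidden_size
    # per token, per layer (hence the leading factor of 2).
    return 2 * batch_size * seq_len * num_layers * hidden_size * bytes_per_elem


def hidden_state_bytes(batch_size, seq_len, hidden_size, bytes_per_elem=2):
    # One hidden vector per token for a single layer's activations.
    return batch_size * seq_len * hidden_size * bytes_per_elem


# Published OPT-30B dimensions: 48 decoder layers, hidden size 7168.
kv = kv_cache_bytes(batch_size=8, seq_len=2048, num_layers=48, hidden_size=7168)
hidden = hidden_state_bytes(batch_size=8, seq_len=2048, hidden_size=7168)

print(f"KV cache: {kv / GIB:.1f} GiB")           # KV cache: 21.0 GiB
print(f"Hidden states: {hidden / GIB:.2f} GiB")  # Hidden states: 0.22 GiB
```

If the library's numbers differ noticeably from this estimate, the difference most likely comes from a different dtype or from terms this simplified formula leaves out.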
Example 4: Override config fields
from flexllmgen.opt_config import get_opt_config
# Get OPT-6.7B config but with shorter max sequence length
config = get_opt_config("facebook/opt-6.7b", max_seq_len=1024)
print(config.max_seq_len) # 1024