Implementation:Turboderp org Exllamav2 ExLlamaV2Config
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture, Configuration, Deep_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool, provided by exllamav2, for parsing and initializing transformer model configuration from HuggingFace-format model directories.
Description
ExLlamaV2Config is the configuration class that reads a model's config.json and associated files to establish all architecture parameters needed for inference. It handles:
- Detecting the model architecture (Llama, Mistral, Qwen2, Gemma, Phi, DeepSeek, Cohere, etc.) from the architectures field
- Parsing hidden dimensions, layer counts, attention head configurations, and vocabulary sizes
- Reading RoPE (Rotary Position Embedding) settings including base frequency and scaling
- Scanning and mapping safetensors weight files to expected tensor names
- Handling EXL2 and GPTQ quantization metadata
- Applying attention backend compatibility overrides for flash-attn, xformers, and SDPA
The configuration lifecycle follows three steps: construction; prepare(), which reads and parses all configuration and tensor files; and optionally arch_compat_overrides(), which adjusts attention-backend settings for the current hardware.
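To make the parsing step concrete, here is a minimal, self-contained sketch of the kind of work prepare() does when it reads config.json. This is illustrative only, not exllamav2's actual implementation; the helper name read_model_config and the fallback defaults are assumptions for the example.

```python
import json
import os
import tempfile

def read_model_config(model_dir):
    # Illustrative sketch (not exllamav2's code): read config.json and
    # extract the core architecture parameters a config object exposes.
    with open(os.path.join(model_dir, "config.json")) as f:
        raw = json.load(f)
    return {
        "architecture": raw["architectures"][0],
        "hidden_size": raw["hidden_size"],
        "num_hidden_layers": raw["num_hidden_layers"],
        "num_attention_heads": raw["num_attention_heads"],
        # GQA models list fewer KV heads; fall back to MHA if absent
        "num_key_value_heads": raw.get("num_key_value_heads",
                                       raw["num_attention_heads"]),
        "vocab_size": raw["vocab_size"],
        "rope_theta": raw.get("rope_theta", 10000.0),
    }

# Write a tiny Llama-style config.json and parse it
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({
            "architectures": ["LlamaForCausalLM"],
            "hidden_size": 4096, "num_hidden_layers": 32,
            "num_attention_heads": 32, "num_key_value_heads": 8,
            "vocab_size": 32000, "rope_theta": 500000.0,
        }, f)
    cfg = read_model_config(d)

print(cfg["architecture"], cfg["num_key_value_heads"])  # LlamaForCausalLM 8
```

The real prepare() additionally resolves architecture-specific tensor naming, quantization metadata, and RoPE scaling modes, which this sketch omits.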
Usage
Use ExLlamaV2Config as the first step in any exllamav2 inference pipeline. Every model, cache, tokenizer, and generator depends on a properly initialized config.
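A typical loading pipeline built on a prepared config might look like the following. This is a sketch assuming the class names and loading pattern from the library's own examples (ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Tokenizer, ExLlamaV2DynamicGenerator); it requires a CUDA-capable GPU and a local model directory, so the path below is a placeholder.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Step 1: config is always constructed and prepared first
config = ExLlamaV2Config("/path/to/model")  # placeholder path
config.prepare()
config.arch_compat_overrides()

# Steps 2+: model, cache, tokenizer, and generator all consume the config
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(prompt="Hello,", max_new_tokens=32)
```

Note that the cache is created lazily and the model weights are only loaded by load_autosplit(), after the config has fully established the architecture parameters.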
Code Reference
Source Location
- Repository: exllamav2
- File: exllamav2/config.py
- Lines: L167-197 (__init__), L210-626 (prepare), L629-676 (arch_compat_overrides)
Signature
class ExLlamaV2Config:
    def __init__(self, model_dir: str | None = None):
        ...
    def prepare(self, no_tensors: bool = False):
        ...
    def arch_compat_overrides(self, quiet: bool = False, warn_only: bool = False):
        ...
Import
from exllamav2 import ExLlamaV2Config
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_dir | str or None | Yes (at init or before prepare) | Path to HuggingFace/EXL2/GPTQ model directory containing config.json and safetensors files |
| no_tensors | bool | No (default False) | If True, skip scanning tensor files; useful for inspecting config without loading weights |
| quiet | bool | No (default False) | Suppress compatibility override messages |
| warn_only | bool | No (default False) | Show warnings instead of raising errors on compatibility issues |
Outputs
| Name | Type | Description |
|---|---|---|
| config instance | ExLlamaV2Config | Fully initialized configuration object with the following key attributes: |
| config.hidden_size | int | Model hidden dimension (d_model) |
| config.num_hidden_layers | int | Number of transformer layers |
| config.num_attention_heads | int | Number of attention heads |
| config.num_key_value_heads | int | Number of key-value heads (for GQA) |
| config.vocab_size | int | Vocabulary size |
| config.max_seq_len | int | Maximum sequence length |
| config.architecture | str | Detected model architecture name |
| config.tensor_file_map | dict | Mapping of tensor names to file paths |
| config.rope_theta | float | RoPE base frequency |
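Several useful quantities are derived from these attributes rather than stored directly. A short sketch, using illustrative stand-in values (similar to a Llama-3-8B-class model, not read from a real config):

```python
# Stand-in values for the attributes in the table above (assumed, not real)
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8   # GQA: 8 KV heads shared across 32 query heads

# Per-head dimension and the number of query heads per KV head
head_dim = hidden_size // num_attention_heads
gqa_group = num_attention_heads // num_key_value_heads

print(head_dim, gqa_group)  # 128 4
```

A gqa_group greater than 1 indicates grouped-query attention, which shrinks the KV cache by that factor relative to standard multi-head attention.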
Usage Examples
Basic Configuration
from exllamav2 import ExLlamaV2Config
# Initialize and prepare config
config = ExLlamaV2Config("/path/to/model")
config.prepare()
# Access architecture parameters
print(f"Architecture: {config.architecture}")
print(f"Hidden size: {config.hidden_size}")
print(f"Layers: {config.num_hidden_layers}")
print(f"Vocab size: {config.vocab_size}")
Configuration with Compatibility Overrides
from exllamav2 import ExLlamaV2Config
config = ExLlamaV2Config("/path/to/model")
config.prepare()
# Apply attention backend overrides for current hardware
config.arch_compat_overrides(quiet=True)
# Optionally override max sequence length
config.max_seq_len = 4096
Inspect Config Without Loading Tensors
from exllamav2 import ExLlamaV2Config
config = ExLlamaV2Config("/path/to/model")
config.prepare(no_tensors=True)
print(f"Model has {config.num_hidden_layers} layers")
print(f"Head dim: {config.hidden_size // config.num_attention_heads}")
Related Pages
Implements Principle
Requires Environment
- Environment:Turboderp_org_Exllamav2_CUDA_GPU_Runtime
- Environment:Turboderp_org_Exllamav2_Build_Toolchain