Implementation: Mit_han_lab_Llm_awq NVILA Builder
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Loading |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Provides builder functions for constructing NVILA language models and tokenizers, with support for multiple quantization backends including QLlama, QMemLlama, and FP8 variants.
Description
This module contains the core construction utilities for the NVILA model family's language model and tokenizer components.
has_tokenizer checks whether a given path or HuggingFace repo contains a tokenizer by looking for "tokenizer_config.json" locally or via the HuggingFace Hub API. It gracefully handles HFValidationError exceptions.
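A minimal sketch of this check, assuming the `huggingface_hub` client library; the actual implementation may differ in detail:

```python
import os

def has_tokenizer(repo_id_or_path: str) -> bool:
    # Local path: the presence of the tokenizer config file is enough.
    if os.path.exists(os.path.join(repo_id_or_path, "tokenizer_config.json")):
        return True
    # Otherwise treat the string as a HuggingFace repo id and ask the Hub.
    # A malformed id raises HFValidationError, which maps to "no tokenizer".
    try:
        from huggingface_hub import file_exists
        from huggingface_hub.utils import HFValidationError
    except ImportError:
        return False
    try:
        return file_exists(repo_id_or_path, "tokenizer_config.json")
    except HFValidationError:
        return False
```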
context_length_extension scales RoPE (Rotary Position Embedding) when the model_max_length exceeds the original max_position_embeddings, applying linear scaling with a computed factor.
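The scaling step can be sketched as follows; this is an approximation of the described behavior (the exact rounding of the factor in the module may differ):

```python
import math

def context_length_extension(config):
    # Linearly scale RoPE when the requested max length exceeds the
    # model's original trained context window.
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    model_max_length = getattr(config, "model_max_length", None)
    if orig_ctx_len and model_max_length and model_max_length > orig_ctx_len:
        # Round up so the scaled window fully covers model_max_length.
        scaling_factor = float(math.ceil(model_max_length / orig_ctx_len))
        config.rope_scaling = {"type": "linear", "factor": scaling_factor}
    return config
```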
build_llm_and_tokenizer is the primary builder function. It loads an LLM configuration via AutoConfig, optionally modifies it for quantization, instantiates the model, and configures the tokenizer. Quantization support includes:
- "QLlamaForCausalLM": Quantized LLaMA using AWQ-style quantization with QLlamaConfig
- "QMemLlamaForCausalLM": Memory-efficient quantized LLaMA via QMemLlamaConfig
- "FP8LinearQwen2ForCausalLM": FP8 linear quantization for Qwen2 models
- "FP8ActivationQwen2ForCausalLM": FP8 activation quantization with checkpoint restoration
- "FP8ActivationResidualQwen2ForCausalLM": FP8 activation + residual quantization
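The class-to-config dispatch above can be sketched as a simple mapping. Only the QLlamaConfig and QMemLlamaConfig pairings are documented here, so this helper is hypothetical; the real builder instantiates the matching config class and threads model_args through rather than returning a name:

```python
# Hypothetical helper mirroring the dispatch described above.
_QUANT_CONFIGS = {
    "QLlamaForCausalLM": "QLlamaConfig",
    "QMemLlamaForCausalLM": "QMemLlamaConfig",
}

def quant_config_name(quantize_model_class: str) -> str:
    try:
        return _QUANT_CONFIGS[quantize_model_class]
    except KeyError:
        raise ValueError(
            f"unsupported quantize_model_class: {quantize_model_class!r}")
```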
The tokenizer setup includes chat template loading from Jinja files, stop token inference, and media token registration (for image/video tokens). The build_tokenizer function provides the same tokenizer construction without instantiating the LLM.
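The tokenizer setup steps can be sketched as a standalone helper; the helper name and the media token strings are illustrative assumptions, not the module's actual API:

```python
def configure_tokenizer(tokenizer, chat_template_path=None,
                        media_tokens=("<image>", "<video>")):
    # Load the chat template from a Jinja file, if one is provided.
    if chat_template_path is not None:
        with open(chat_template_path) as f:
            tokenizer.chat_template = f.read()
    # Register media placeholder tokens so image/video markers
    # tokenize as single units instead of being split into pieces.
    tokenizer.add_tokens(list(media_tokens), special_tokens=True)
    return tokenizer
```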
Usage
Import build_llm_and_tokenizer when constructing an NVILA model that requires both a quantized LLM and a properly configured tokenizer. Use build_tokenizer when only the tokenizer is needed (e.g., for the NVILA LlavaMetaModel init_vlm path).
Code Reference
Source Location
- Repository: Mit_han_lab_Llm_awq
- File: tinychat/models/nvila/builder.py
- Lines: 1-291
Signature
def has_tokenizer(repo_id_or_path: str) -> bool: ...
def context_length_extension(config) -> PretrainedConfig: ...
def build_llm_and_tokenizer(
model_name_or_path: str,
config: PretrainedConfig,
attn_implementation=None,
model_max_length=None,
*args,
**kwargs,
) -> Tuple[PreTrainedModel, PreTrainedTokenizer]: ...
def build_tokenizer(
model_name_or_path: str,
config: PretrainedConfig,
attn_implementation=None,
model_max_length=None,
*args,
**kwargs,
) -> PreTrainedTokenizer: ...
Import
from tinychat.models.nvila.builder import build_llm_and_tokenizer, build_tokenizer, has_tokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Path or HuggingFace model ID for the LLM |
| config | PretrainedConfig | Yes | Top-level model config containing model_dtype, chat_template, etc. |
| attn_implementation | str | No | Attention implementation type (e.g., "flash_attention_2") |
| model_max_length | int | No | Maximum sequence length; triggers RoPE scaling if it exceeds the original context window |
| quantize_model_class | str | No (kwarg) | Quantization class name (e.g., "QLlamaForCausalLM") |
| model_args | dataclass | No (kwarg) | Quantization-specific arguments |
| fp8_llm_cfg | str | No (kwarg) | Path for FP8 checkpoint restoration |
Outputs
| Name | Type | Description |
|---|---|---|
| llm | PreTrainedModel | Instantiated language model (possibly quantized) |
| tokenizer | PreTrainedTokenizer | Configured tokenizer with media tokens, stop tokens, and chat template |
Usage Examples
Building LLM and tokenizer
from tinychat.models.nvila.builder import build_llm_and_tokenizer
llm, tokenizer = build_llm_and_tokenizer(
model_name_or_path="meta-llama/Llama-2-7b-hf",
config=nvila_config,
attn_implementation="flash_attention_2",
model_max_length=4096,
)
Checking tokenizer availability
from tinychat.models.nvila.builder import build_tokenizer, has_tokenizer
if has_tokenizer("/path/to/model"):
    tokenizer = build_tokenizer("/path/to/model", config)