Implementation: Mit_han_lab_Llm_awq NVILA Builder
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Loading |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Provides builder functions for constructing NVILA language models and tokenizers, with support for multiple quantization backends including QLlama, QMemLlama, and FP8 variants.
Description
This module contains the core construction utilities for the NVILA model family's language model and tokenizer components.
has_tokenizer checks whether a given path or HuggingFace repo contains a tokenizer by looking for "tokenizer_config.json" locally or via the HuggingFace Hub API. It gracefully handles HFValidationError exceptions.
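A minimal sketch of this check, assuming the `huggingface_hub` client library; the actual implementation may differ in detail:

```python
import os

def has_tokenizer(repo_id_or_path: str) -> bool:
    # Local path: the presence of the tokenizer config file is enough.
    if os.path.exists(os.path.join(repo_id_or_path, "tokenizer_config.json")):
        return True
    # Otherwise treat the string as a HuggingFace repo id and ask the Hub.
    # A malformed id raises HFValidationError, which maps to "no tokenizer".
    try:
        from huggingface_hub import file_exists
        from huggingface_hub.utils import HFValidationError
    except ImportError:
        return False
    try:
        return file_exists(repo_id_or_path, "tokenizer_config.json")
    except HFValidationError:
        return False
```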
context_length_extension scales RoPE (Rotary Position Embedding) when the model_max_length exceeds the original max_position_embeddings, applying linear scaling with a computed factor.
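The scaling step can be sketched as follows; this is an approximation of the described behavior (the exact rounding of the factor in the module may differ):

```python
import math

def context_length_extension(config):
    # Linearly scale RoPE when the requested max length exceeds the
    # model's original trained context window.
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    model_max_length = getattr(config, "model_max_length", None)
    if orig_ctx_len and model_max_length and model_max_length > orig_ctx_len:
        # Round up so the scaled window fully covers model_max_length.
        scaling_factor = float(math.ceil(model_max_length / orig_ctx_len))
        config.rope_scaling = {"type": "linear", "factor": scaling_factor}
    return config
```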
build_llm_and_tokenizer is the primary builder function. It loads an LLM configuration via AutoConfig, optionally modifies it for quantization, instantiates the model, and configures the tokenizer. Quantization support includes:
- "QLlamaForCausalLM": Quantized LLaMA using AWQ-style quantization with QLlamaConfig
- "QMemLlamaForCausalLM": Memory-efficient quantized LLaMA via QMemLlamaConfig
- "FP8LinearQwen2ForCausalLM": FP8 linear quantization for Qwen2 models
- "FP8ActivationQwen2ForCausalLM": FP8 activation quantization with checkpoint restoration
- "FP8ActivationResidualQwen2ForCausalLM": FP8 activation + residual quantization
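The class-to-config dispatch above can be sketched as a simple mapping. Only the QLlamaConfig and QMemLlamaConfig pairings are documented here, so this helper is hypothetical; the real builder instantiates the matching config class and threads model_args through rather than returning a name:

```python
# Hypothetical helper mirroring the dispatch described above.
_QUANT_CONFIGS = {
    "QLlamaForCausalLM": "QLlamaConfig",
    "QMemLlamaForCausalLM": "QMemLlamaConfig",
}

def quant_config_name(quantize_model_class: str) -> str:
    try:
        return _QUANT_CONFIGS[quantize_model_class]
    except KeyError:
        raise ValueError(
            f"unsupported quantize_model_class: {quantize_model_class!r}")
```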
The tokenizer setup includes chat template loading from Jinja files, stop token inference, and media token registration (for image/video tokens). The build_tokenizer function provides the same tokenizer construction without instantiating the LLM.
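The tokenizer setup steps can be sketched as a standalone helper; the helper name and the media token strings are illustrative assumptions, not the module's actual API:

```python
def configure_tokenizer(tokenizer, chat_template_path=None,
                        media_tokens=("<image>", "<video>")):
    # Load the chat template from a Jinja file, if one is provided.
    if chat_template_path is not None:
        with open(chat_template_path) as f:
            tokenizer.chat_template = f.read()
    # Register media placeholder tokens so image/video markers
    # tokenize as single units instead of being split into pieces.
    tokenizer.add_tokens(list(media_tokens), special_tokens=True)
    return tokenizer
```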
Usage
Import build_llm_and_tokenizer when constructing an NVILA model that requires both a quantized LLM and a properly configured tokenizer. Use build_tokenizer when only the tokenizer is needed (e.g., for the NVILA LlavaMetaModel init_vlm path).
Code Reference
Source Location
- Repository: Mit_han_lab_Llm_awq
- File: tinychat/models/nvila/builder.py
- Lines: 1-291
Signature
def has_tokenizer(repo_id_or_path: str) -> bool: ...
def context_length_extension(config) -> PretrainedConfig: ...
def build_llm_and_tokenizer(
model_name_or_path: str,
config: PretrainedConfig,
attn_implementation=None,
model_max_length=None,
*args,
**kwargs,
) -> Tuple[PreTrainedModel, PreTrainedTokenizer]: ...
def build_tokenizer(
model_name_or_path: str,
config: PretrainedConfig,
attn_implementation=None,
model_max_length=None,
*args,
**kwargs,
) -> PreTrainedTokenizer: ...
Import
from tinychat.models.nvila.builder import build_llm_and_tokenizer, build_tokenizer, has_tokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Path or HuggingFace model ID for the LLM |
| config | PretrainedConfig | Yes | Top-level model config containing model_dtype, chat_template, etc. |
| attn_implementation | str | No | Attention implementation type (e.g., "flash_attention_2") |
| model_max_length | int | No | Maximum sequence length; triggers RoPE scaling if it exceeds the original context window |
| quantize_model_class | str | No (kwarg) | Quantization class name (e.g., "QLlamaForCausalLM") |
| model_args | dataclass | No (kwarg) | Quantization-specific arguments |
| fp8_llm_cfg | str | No (kwarg) | Path for FP8 checkpoint restoration |
Outputs
| Name | Type | Description |
|---|---|---|
| llm | PreTrainedModel | Instantiated language model (possibly quantized) |
| tokenizer | PreTrainedTokenizer | Configured tokenizer with media tokens, stop tokens, and chat template |
Usage Examples
Building LLM and tokenizer
from tinychat.models.nvila.builder import build_llm_and_tokenizer
llm, tokenizer = build_llm_and_tokenizer(
model_name_or_path="meta-llama/Llama-2-7b-hf",
config=nvila_config,
attn_implementation="flash_attention_2",
model_max_length=4096,
)
Checking tokenizer availability
from tinychat.models.nvila.builder import build_tokenizer, has_tokenizer
if has_tokenizer("/path/to/model"):
    tokenizer = build_tokenizer("/path/to/model", config)