
Implementation:Mit han lab Llm awq NVILA Builder

From Leeroopedia
Knowledge Sources
Domains NLP, Model_Loading
Last Updated 2026-02-15 00:00 GMT

Overview

Provides builder functions for constructing NVILA language models and tokenizers, with support for multiple quantization backends including QLlama, QMemLlama, and FP8 variants.

Description

This module contains the core construction utilities for the NVILA model family's language model and tokenizer components.

has_tokenizer checks whether a given path or HuggingFace repo contains a tokenizer by looking for "tokenizer_config.json" locally or via the HuggingFace Hub API. It gracefully handles HFValidationError exceptions.
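The check described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the module's exact code; the `file_exists` helper is from the real `huggingface_hub` client library, and the broad exception handler stands in for the HFValidationError handling mentioned above.

```python
import os


def has_tokenizer(repo_id_or_path: str) -> bool:
    # Local path: look for the tokenizer config file directly.
    if os.path.exists(os.path.join(repo_id_or_path, "tokenizer_config.json")):
        return True
    # Otherwise treat the argument as a HuggingFace repo ID and ask the Hub.
    try:
        from huggingface_hub import file_exists
        return file_exists(repo_id_or_path, "tokenizer_config.json")
    except Exception:
        # Covers HFValidationError for malformed repo IDs, plus network
        # or import failures; treated as "no tokenizer found".
        return False
```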

context_length_extension applies linear RoPE (Rotary Position Embedding) scaling when model_max_length exceeds the model's original max_position_embeddings, computing the scaling factor from the ratio of the two.
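A minimal sketch of this scaling logic, assuming HuggingFace-style config attribute names (`max_position_embeddings`, `model_max_length`, `rope_scaling`); the ceiling rounding of the factor is an assumption, not confirmed by the source.

```python
import math


def context_length_extension(config):
    orig_ctx = config.max_position_embeddings
    target_ctx = config.model_max_length
    if target_ctx is not None and target_ctx > orig_ctx:
        # Linear RoPE scaling: position indices are divided by this
        # factor at inference time, stretching the usable context window.
        # Rounding up to a whole factor is an assumption here.
        factor = float(math.ceil(target_ctx / orig_ctx))
        config.rope_scaling = {"type": "linear", "factor": factor}
        config.max_position_embeddings = target_ctx
    return config
```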

build_llm_and_tokenizer is the primary builder function. It loads an LLM configuration via AutoConfig, optionally modifies it for quantization, instantiates the model, and configures the tokenizer. Quantization support includes:

  • "QLlamaForCausalLM": Quantized LLaMA using AWQ-style quantization with QLlamaConfig
  • "QMemLlamaForCausalLM": Memory-efficient quantized LLaMA via QMemLlamaConfig
  • "FP8LinearQwen2ForCausalLM": FP8 linear quantization for Qwen2 models
  • "FP8ActivationQwen2ForCausalLM": FP8 activation quantization with checkpoint restoration
  • "FP8ActivationResidualQwen2ForCausalLM": FP8 activation + residual quantization

The tokenizer setup includes chat template loading from Jinja files, stop token inference, and media token registration (for image/video tokens). The build_tokenizer function provides the same tokenizer construction without instantiating the LLM.
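The post-load configuration steps can be sketched as below. This is a hypothetical outline operating on an already-loaded tokenizer; the media token strings ("<image>", "<video>") and the EOS-based stop-token inference are assumptions for illustration, not the module's exact values.

```python
def configure_tokenizer(tokenizer, chat_template=None,
                        media_tokens=("<image>", "<video>")):
    # Chat template: Jinja source text read from a template file.
    if chat_template is not None:
        tokenizer.chat_template = chat_template
    # Stop tokens: inferred from the EOS token when not set explicitly
    # (an assumption; the real inference rules may be richer).
    stop_tokens = [tokenizer.eos_token] if tokenizer.eos_token else []
    # Media tokens: registered as special tokens so image/video
    # placeholders tokenize to single IDs.
    tokenizer.add_tokens(list(media_tokens), special_tokens=True)
    return tokenizer, stop_tokens
```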

Usage

Import build_llm_and_tokenizer when constructing an NVILA model that requires both a quantized LLM and a properly configured tokenizer. Use build_tokenizer when only the tokenizer is needed (e.g., for the NVILA LlavaMetaModel init_vlm path).

Code Reference

Source Location

Signature

def has_tokenizer(repo_id_or_path: str) -> bool: ...

def context_length_extension(config) -> PretrainedConfig: ...

def build_llm_and_tokenizer(
    model_name_or_path: str,
    config: PretrainedConfig,
    attn_implementation=None,
    model_max_length=None,
    *args,
    **kwargs,
) -> Tuple[PreTrainedModel, PreTrainedTokenizer]: ...

def build_tokenizer(
    model_name_or_path: str,
    config: PretrainedConfig,
    attn_implementation=None,
    model_max_length=None,
    *args,
    **kwargs,
) -> PreTrainedTokenizer: ...

Import

from tinychat.models.nvila.builder import build_llm_and_tokenizer, build_tokenizer, has_tokenizer

I/O Contract

Inputs

Name Type Required Description
model_name_or_path str Yes Path or HuggingFace model ID for the LLM
config PretrainedConfig Yes Top-level model config containing model_dtype, chat_template, etc.
attn_implementation str No Attention implementation type (e.g., "flash_attention_2")
model_max_length int No Maximum sequence length; triggers RoPE scaling if it exceeds the original context window
quantize_model_class str No (kwarg) Quantization class name (e.g., "QLlamaForCausalLM")
model_args dataclass No (kwarg) Quantization-specific arguments
fp8_llm_cfg str No (kwarg) Path for FP8 checkpoint restoration

Outputs

Name Type Description
llm PreTrainedModel Instantiated language model (possibly quantized)
tokenizer PreTrainedTokenizer Configured tokenizer with media tokens, stop tokens, and chat template

Usage Examples

Building LLM and tokenizer

from tinychat.models.nvila.builder import build_llm_and_tokenizer

llm, tokenizer = build_llm_and_tokenizer(
    model_name_or_path="meta-llama/Llama-2-7b-hf",
    config=nvila_config,
    attn_implementation="flash_attention_2",
    model_max_length=4096,
)

Checking tokenizer availability

from tinychat.models.nvila.builder import build_tokenizer, has_tokenizer

if has_tokenizer("/path/to/model"):
    tokenizer = build_tokenizer("/path/to/model", config)
