
Environment: mit-han-lab/llm-awq VILA Multimodal Environment

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Multimodal
Last Updated 2026-02-15 01:00 GMT

Overview

An optional VILA framework dependency that enables multimodal (vision-language) model quantization and inference with NVILA, VILA 1.5, and InternVL3.

Description

VILA is an external framework from NVIDIA Labs that provides vision-language model architectures and utilities. AWQ uses VILA for multimodal model support including image/video processing, media extraction, and multimodal tokenization. Without VILA installed, AWQ is limited to text-only LLM quantization and inference. VILA provides the `llava` module with media processing, image constants, and multimodal utilities needed for quantizing and running vision-language models.
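
Because the dependency is optional, callers can probe whether the multimodal path is available before committing to it. A minimal sketch (the `multimodal_supported` helper is hypothetical, not part of AWQ or VILA):

```python
import importlib.util

# Probe for the VILA-provided `llava` package without importing it,
# so the check is cheap and has no import side effects.
HAS_VILA = importlib.util.find_spec("llava") is not None

def multimodal_supported() -> bool:
    """True when the VILA `llava` package is importable (hypothetical helper)."""
    return HAS_VILA
```

Code that needs image or video processing can branch on this flag and fall back to text-only quantization otherwise.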

Usage

Use this environment when quantizing or running inference on multimodal models including NVILA, VILA 1.0/1.5, InternVL3, or any LLaVA-based vision-language model. Without VILA, attempting to import these model classes will print a warning and the features will be unavailable.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| Hardware | NVIDIA GPU | Same as base AWQ requirements |
| Disk | Additional 10 GB+ | VILA framework and vision model weights |

Dependencies

Python Packages

  • `llava` (from VILA repository; not on PyPI)
    • Provides: `llava.media`, `llava.utils.media`, `llava.constants`, `llava.mm_utils`

Installation

VILA must be installed from source:

git clone https://github.com/NVlabs/VILA.git
cd VILA
pip install -e .
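
After installation, a quick sanity check can confirm that the `llava` submodules AWQ's multimodal code imports actually resolve. A sketch (the `check_vila` helper is illustrative, not part of either repository):

```python
import importlib

# The llava submodules that AWQ's multimodal code path relies on.
MODULES = [
    "llava",
    "llava.media",
    "llava.utils.media",
    "llava.constants",
    "llava.mm_utils",
]

def check_vila() -> dict:
    """Map each required module name to whether it imports cleanly."""
    status = {}
    for name in MODULES:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    for name, ok in check_vila().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```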

Credentials

No additional credentials required beyond the base Python runtime environment.

Code Evidence

Optional import with graceful fallback from `awq/quantize/smooth.py:5-12`:

try:
    import llava
    from llava.media import Image, Video
    from llava.utils.media import extract_media
    from llava.constants import DEFAULT_IMAGE_TOKEN
    from llava.mm_utils import process_image, process_images
except ImportError:
    print("VILA is not installed. Multimodal features will not be available. "
          "To activate, please install VILA at https://github.com/NVlabs/VILA.")

InternVL3 import guard from `tinychat/models/__init__.py:6-9`:

try:
    from .internvl3 import InternVL3
except ImportError as e:
    print("InternVL3 model import failure. To activate, please install VILA "
          "at https://github.com/NVlabs/VILA.")

LLaVA model optional import from `awq/quantize/pre_quant.py:12-15`:

try:
    from tinychat.models import LlavaLlamaForCausalLM
except ImportError as e:
    pass
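
These guards print a warning (or silently pass) but leave the imported names undefined, so a later reference fails with a `NameError` far from the cause. One common hardening, shown as a sketch (this variant is illustrative and not taken from the llm-awq codebase):

```python
# Defensive variant of the optional-import pattern above (illustrative only).
try:
    from llava.media import Image, Video  # provided by VILA
    VILA_AVAILABLE = True
except ImportError:
    Image = Video = None  # placeholders so misuse fails with a clear message
    VILA_AVAILABLE = False

def require_vila() -> None:
    """Raise a clear error when a multimodal code path runs without VILA."""
    if not VILA_AVAILABLE:
        raise RuntimeError(
            "VILA is not installed; install it from "
            "https://github.com/NVlabs/VILA to enable multimodal features."
        )
```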

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `VILA is not installed. Multimodal features will not be available.` | VILA package not found | Clone and install VILA: `git clone https://github.com/NVlabs/VILA && cd VILA && pip install -e .` |
| `InternVL3 model import failure` | VILA not installed | Install VILA as above; InternVL3 depends on VILA utilities |
| `AttributeError: module 'llava' has no attribute ...` | Incompatible VILA version | Update VILA to the latest version from GitHub |

Compatibility Notes

  • Fully Optional: AWQ works for text-only LLM quantization without VILA
  • NVILA/VILA/InternVL3: All multimodal model variants require VILA
  • Not on PyPI: Must be installed from source via `pip install -e .`
  • Jetson: VILA works on Jetson but requires compatible torchvision installation
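
On Jetson in particular, a mismatched torch/torchvision pair is the usual failure mode, so a small version report helps diagnose it. A sketch (the `version_report` helper is hypothetical):

```python
# Report installed torch / torchvision versions; VILA's image transforms
# break when the two builds are mismatched (common on Jetson).
def version_report() -> dict:
    report = {}
    for pkg in ("torch", "torchvision"):
        try:
            mod = __import__(pkg)
            report[pkg] = getattr(mod, "__version__", "unknown")
        except ImportError:
            report[pkg] = None  # not installed
    return report
```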
