Environment: mit-han-lab llm-awq VILA Multimodal Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Multimodal |
| Last Updated | 2026-02-15 01:00 GMT |
Overview
Optional VILA framework dependency for multimodal (vision-language) model quantization and inference with NVILA, VILA 1.5, and InternVL3.
Description
VILA is an external framework from NVIDIA Labs that provides vision-language model architectures and utilities. AWQ uses VILA for multimodal model support including image/video processing, media extraction, and multimodal tokenization. Without VILA installed, AWQ is limited to text-only LLM quantization and inference. VILA provides the `llava` module with media processing, image constants, and multimodal utilities needed for quantizing and running vision-language models.
Usage
Use this environment when quantizing or running inference on multimodal models, including NVILA, VILA 1.0/1.5, InternVL3, or any LLaVA-based vision-language model. Without VILA, attempting to import these model classes prints a warning and the multimodal features remain unavailable.
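The availability check can be sketched as follows. This is a hypothetical helper, not part of AWQ itself; it mirrors the optional-import pattern AWQ uses internally by probing for the VILA-provided `llava` package before selecting a pipeline.

```python
import importlib.util

def vila_available() -> bool:
    """Return True if the VILA-provided `llava` package can be imported."""
    return importlib.util.find_spec("llava") is not None

def select_pipeline() -> str:
    # Fall back to text-only quantization when VILA is absent.
    return "multimodal" if vila_available() else "text-only"
```

Using `find_spec` avoids actually importing the package, so the probe is cheap and raises nothing when VILA is missing.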
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU | Same as base AWQ requirements |
| Disk | Additional 10GB+ | VILA framework and vision model weights |
Dependencies
Python Packages
- `llava` (from VILA repository; not on PyPI)
- Provides: `llava.media`, `llava.utils.media`, `llava.constants`, `llava.mm_utils`
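A quick post-install check (a sketch, not an AWQ utility) can verify that each `llava` submodule listed above actually imports:

```python
import importlib

# The llava submodules AWQ's multimodal path relies on.
REQUIRED_MODULES = [
    "llava.media",
    "llava.utils.media",
    "llava.constants",
    "llava.mm_utils",
]

def missing_vila_modules() -> list:
    """Return the required llava submodules that fail to import."""
    missing = []
    for name in REQUIRED_MODULES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

An empty return value indicates the VILA installation exposes everything AWQ needs.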
Installation
VILA must be installed from source:
git clone https://github.com/NVlabs/VILA.git
cd VILA
pip install -e .
Credentials
No additional credentials required beyond the base Python runtime environment.
Quick Install
# Install VILA from source (required for multimodal features)
git clone https://github.com/NVlabs/VILA.git
cd VILA
pip install -e .
Code Evidence
Optional import with graceful fallback from `awq/quantize/smooth.py:5-12`:
try:
    import llava
    from llava.media import Image, Video
    from llava.utils.media import extract_media
    from llava.constants import DEFAULT_IMAGE_TOKEN
    from llava.mm_utils import process_image, process_images
except ImportError:
    print("VILA is not installed. Multimodal features will not be available. "
          "To activate, please install VILA at https://github.com/NVlabs/VILA.")
InternVL3 import guard from `tinychat/models/__init__.py:6-9`:
try:
    from .internvl3 import InternVL3
except ImportError as e:
    print("InternVL3 model import failure. To activate, please install VILA "
          "at https://github.com/NVlabs/VILA.")
LLaVA model optional import from `awq/quantize/pre_quant.py:12-15`:
try:
    from tinychat.models import LlavaLlamaForCausalLM
except ImportError as e:
    pass
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `VILA is not installed. Multimodal features will not be available.` | VILA package not found | Clone and install VILA: `git clone https://github.com/NVlabs/VILA && cd VILA && pip install -e .` |
| `InternVL3 model import failure` | VILA not installed | Install VILA as above; InternVL3 depends on VILA utilities |
| `AttributeError: module 'llava' has no attribute ...` | Incompatible VILA version | Update VILA to latest version from GitHub |
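For the `AttributeError` case, a hedged diagnostic sketch (assumed helper, not part of AWQ) can report which expected top-level attributes the installed `llava` package actually exposes, distinguishing a missing install from a version mismatch:

```python
import importlib

def probe_llava(names=("media", "constants", "mm_utils")):
    """Map each expected llava attribute name to whether it is present.

    Returns None when VILA is not installed at all.
    """
    try:
        llava = importlib.import_module("llava")
    except ImportError:
        return None  # VILA not installed; see the install steps above
    return {name: hasattr(llava, name) for name in names}
```

A dict with `False` entries points at an incompatible VILA version; `None` points at a missing install.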
Compatibility Notes
- Fully Optional: AWQ works for text-only LLM quantization without VILA
- NVILA/VILA/InternVL3: All multimodal model variants require VILA
- Not on PyPI: Must be installed from source via `pip install -e .`
- Jetson: VILA works on Jetson but requires a compatible torchvision installation