Environment:Mit han lab Llm awq Python Runtime Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-15 01:00 GMT |
Overview
Python 3.8+ runtime with PyTorch 2.3.0, Transformers 4.46.0, and pinned dependencies for AWQ quantization and evaluation.
Description
This environment provides the core Python runtime and library stack required for all AWQ operations: model loading, quantization, evaluation, and export. The project pins exact versions of its most critical dependencies (PyTorch, Transformers, Accelerate, lm-eval) to ensure reproducible quantization results. The environment includes HuggingFace ecosystem libraries for model management, lm-eval-harness for benchmark evaluation, and Gradio for the TinyChat serving UI.
Usage
Use this environment for all AWQ operations including model quantization (`awq/entry.py`), perplexity evaluation, lm-eval-harness benchmarks, HuggingFace model export, and TinyChat inference. This is the base prerequisite for every Implementation in the repository.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | Jetson requires JetPack-compatible Python |
| Hardware | NVIDIA GPU with CUDA support | CPU-only not supported for quantization or inference |
| RAM | 32GB+ recommended | Large models require significant host memory for loading |
| Disk | 50GB+ | Model checkpoints and calibration data |
Dependencies
Python Runtime
- `python` >= 3.8 (3.10 recommended; 3.8 for Jetson JetPack 5)
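As a quick sanity check, the `requires-python` floor from `pyproject.toml` can be asserted at startup (a minimal sketch; this guard is not part of the repository itself):

```python
import sys

# Fail fast if the interpreter is older than the pyproject.toml floor.
assert sys.version_info >= (3, 8), "AWQ requires Python >= 3.8"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK")
```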
Core Packages (Pinned Versions)
- `torch` == 2.3.0
- `torchvision` == 0.18.0
- `transformers` == 4.46.0
- `accelerate` == 0.34.2
- `lm_eval` == 0.3.0
- `gradio` == 3.35.2
- `gradio_client` == 0.2.9
- `pydantic` == 1.10.19
Core Packages (Flexible Versions)
- `tokenizers` >= 0.12.1
- `sentencepiece`
- `texttable`
- `toml`
- `attributedict`
- `protobuf`
- `fastapi`
- `uvicorn`
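Because the pins above are strict, a quick way to confirm the environment resolved correctly is to compare installed versions against a subset of the pins (an illustrative check, not a repository script):

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pinned packages from the dependency lists above.
pins = {"torch": "2.3.0", "transformers": "4.46.0", "accelerate": "0.34.2"}

for pkg, pinned in pins.items():
    try:
        installed = version(pkg)
        status = "OK" if installed == pinned else f"MISMATCH ({installed})"
    except PackageNotFoundError:
        status = "NOT INSTALLED"
    print(f"{pkg}=={pinned}: {status}")
```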
Credentials
The following environment variables may be needed depending on usage:
- `CUDA_VISIBLE_DEVICES`: Controls which GPUs are visible; if unset, auto-parallel assumes 8 GPUs are available
- `PYTORCH_CUDA_ALLOC_CONF`: Set to `expandable_segments:True` for InternVL3 inference
- `OPENAI_API_KEY`: Only required if using content moderation via `log_utils.violates_moderation()`
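For example, the first two variables can be set from Python before any CUDA initialization (values here are illustrative; both must be set before torch touches CUDA to take effect):

```python
import os

# Restrict the run to two GPUs and enable expandable segments.
# The specific values are examples, not requirements.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(f"{len(visible)} GPU(s) visible")  # 2 GPU(s) visible
```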
Quick Install
# Install AWQ and all dependencies
pip install -e .
# For Jetson devices: comment out torch==2.3.0 in pyproject.toml first,
# then install NVIDIA prebuilt PyTorch >= 2.0.0
Code Evidence
Version pinning from `pyproject.toml:10-24`:
requires-python = ">=3.8"
dependencies = [
"accelerate==0.34.2", "sentencepiece", "tokenizers>=0.12.1",
"torch==2.3.0", "torchvision==0.18.0",
"transformers==4.46.0",
"lm_eval==0.3.0", "texttable",
"toml", "attributedict",
"protobuf",
"gradio==3.35.2", "gradio_client==0.2.9",
"fastapi", "uvicorn",
"pydantic==1.10.19"
]
KV-cache disabling to prevent OOM with newer transformers, from `awq/entry.py:142`:
# Note (Haotian): To avoid OOM after huggingface transformers 4.36.2
config.use_cache = False
Multi-GPU auto-parallel from `awq/utils/parallel.py:19-27`:
cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", None)
if isinstance(cuda_visible_devices, str):
cuda_visible_devices = cuda_visible_devices.split(",")
else:
cuda_visible_devices = list(range(8))
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: No module named 'lm_eval'` | lm-eval not installed | `pip install lm_eval==0.3.0` |
| OOM with HuggingFace transformers >= 4.36.2 | KV cache consumes too much memory | Set `config.use_cache = False` (done automatically in entry.py) |
| `CUDA out of memory` during model loading | GPU VRAM insufficient | Use `--max_memory 0:10GiB cpu:30GiB` for device mapping |
| Version conflict with transformers | Mismatched transformers version | Pin to `transformers==4.46.0` |
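The `--max_memory` values in the table above follow a `device:size` convention. A sketch of how such entries map to the per-device dict that accelerate-style device mapping consumes (the helper name is hypothetical, not the repository's actual parser):

```python
def parse_max_memory(entries):
    """Turn CLI-style 'DEVICE:SIZE' entries (e.g. '0:10GiB', 'cpu:30GiB')
    into the {device: size} dict expected for device mapping.
    Integer device ids refer to CUDA devices."""
    max_memory = {}
    for entry in entries:
        device, _, size = entry.partition(":")
        key = int(device) if device.isdigit() else device
        max_memory[key] = size
    return max_memory

print(parse_max_memory(["0:10GiB", "cpu:30GiB"]))
# {0: '10GiB', 'cpu': '30GiB'}
```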
Compatibility Notes
- Jetson (Edge): Must remove `torch==2.3.0` pin from `pyproject.toml` and install NVIDIA prebuilt PyTorch >= 2.0.0. Use Python 3.8 for JetPack 5.
- Multi-GPU: Auto-parallel in `parallel.py` infers GPU count from model size: <20GB=1 GPU, 20-50GB=4 GPUs, >50GB=8 GPUs.
- lm-eval Version: Pinned to 0.3.0 which uses the `BaseLM` adapter interface. Newer versions (0.4+) have a different API.
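The size-to-GPU-count heuristic above can be sketched as follows (boundary handling at exactly 20 GB and 50 GB is an assumption here; check `awq/utils/parallel.py` for the authoritative thresholds):

```python
def infer_n_gpus(model_size_gb: float) -> int:
    # Mirrors the auto-parallel heuristic described above:
    # <20GB -> 1 GPU, 20-50GB -> 4 GPUs, >50GB -> 8 GPUs.
    if model_size_gb < 20:
        return 1
    if model_size_gb <= 50:
        return 4
    return 8

for size in (7, 33, 70):
    print(f"{size}GB model -> {infer_n_gpus(size)} GPU(s)")
```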
Related Pages
- Implementation:Mit_han_lab_Llm_awq_Get_calib_dataset
- Implementation:Mit_han_lab_Llm_awq_Run_awq
- Implementation:Mit_han_lab_Llm_awq_Auto_scale_block
- Implementation:Mit_han_lab_Llm_awq_Auto_clip_block
- Implementation:Mit_han_lab_Llm_awq_Apply_awq
- Implementation:Mit_han_lab_Llm_awq_Real_quantize_model_weight
- Implementation:Mit_han_lab_Llm_awq_Pseudo_quantize_model_weight
- Implementation:Mit_han_lab_Llm_awq_LMEvalAdaptor
- Implementation:Mit_han_lab_Llm_awq_Awq_config_export
- Implementation:Mit_han_lab_Llm_awq_Wikitext_eval_loop