
Implementation: mit-han-lab/llm-awq, AWQ config export

From Leeroopedia

Overview

A concrete tool, provided by the llm-awq library, for converting AWQ checkpoints to HuggingFace Hub format (Wrapper Doc type).

Source

examples/convert_to_hf.py, Lines 44-69

Doc Type

This is a Wrapper Doc: it documents how the repository uses external HuggingFace APIs (AwqConfig, AutoConfig, HfApi).

Key APIs Used

# Create quantization config
quantization_config = AwqConfig(
    bits=args.w_bit,
    group_size=args.q_group_size,
    zero_point=not args.no_zero_point,
    backend="llm-awq",
    version="gemv",
)

# Load and patch config
config = AutoConfig.from_pretrained(original_model_path)
config.quantization_config = quantization_config

# Push to hub
config.push_to_hub(quantized_model_hub_path)
tok.push_to_hub(quantized_model_hub_path)

# Upload weights
api.upload_file(
    path_or_fileobj=quantized_model_path,
    path_in_repo="pytorch_model.bin",
    repo_id=quantized_model_hub_path,
    repo_type="model",
)

Import

from transformers import AwqConfig, AutoConfig
from huggingface_hub import HfApi

I/O

Inputs:

  • original model path (str) - path to the original unquantized model
  • quantized model path (str) - path to the AWQ-quantized checkpoint file
  • hub repo path (str) - target HuggingFace Hub repository ID
  • w_bit (int) - quantization bit width (e.g., 4)
  • q_group_size (int) - quantization group size (e.g., 128)
  • no_zero_point (bool) - whether to disable zero-point quantization
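The inputs above map naturally onto a small argparse CLI. A minimal sketch follows; the flag names and sample values are assumptions for illustration, not necessarily those defined in examples/convert_to_hf.py:

```python
import argparse

# Hypothetical CLI mirroring the documented inputs; the actual flag names
# in examples/convert_to_hf.py may differ.
parser = argparse.ArgumentParser(
    description="Export an AWQ checkpoint to the HuggingFace Hub"
)
parser.add_argument("--original_model_path", type=str, required=True)
parser.add_argument("--quantized_model_path", type=str, required=True)
parser.add_argument("--quantized_model_hub_path", type=str, required=True)
parser.add_argument("--w_bit", type=int, default=4)
parser.add_argument("--q_group_size", type=int, default=128)
parser.add_argument("--no_zero_point", action="store_true")

# Parse an example invocation (paths are placeholders).
args = parser.parse_args([
    "--original_model_path", "meta-llama/Llama-2-7b-hf",
    "--quantized_model_path", "awq_cache/llama2-7b-w4-g128.pt",
    "--quantized_model_hub_path", "user/llama2-7b-awq",
])
print(args.w_bit, args.q_group_size, not args.no_zero_point)
```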

Outputs:

  • HuggingFace Hub repository containing:
    • config.json - model configuration with quantization metadata
    • Tokenizer files (tokenizer.json, tokenizer_config.json, etc.)
    • pytorch_model.bin - the quantized model weights
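The quantization metadata written into config.json is the serialized AwqConfig. The sketch below shows roughly what that block looks like for the values in this doc; the exact field names depend on how transformers serializes AwqConfig, so treat this as an assumption to verify against a real pushed config:

```python
import json

# Approximate quantization_config block as it might appear in config.json.
# Field names assume transformers' AwqConfig serialization conventions.
quantization_config = {
    "quant_method": "awq",
    "bits": 4,            # args.w_bit
    "group_size": 128,    # args.q_group_size
    "zero_point": True,   # not args.no_zero_point
    "backend": "llm-awq",
    "version": "gemv",
}

print(json.dumps(quantization_config, indent=2))
```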

Related Pages

Knowledge Sources

Domains

  • Deployment
  • Model_Distribution
