
Implementation: mit-han-lab/llm-awq, AWQ config export

From Leeroopedia

Overview

A concrete tool, provided by the llm-awq library, for converting AWQ checkpoints to HuggingFace Hub format (Wrapper Doc type).

Source

examples/convert_to_hf.py, Lines 44-69

Doc Type

This is a Wrapper Doc: it documents how the repository uses external HuggingFace APIs (AwqConfig, AutoConfig, HfApi).

Key APIs Used

# Create quantization config
quantization_config = AwqConfig(
    bits=args.w_bit,
    group_size=args.q_group_size,
    zero_point=not args.no_zero_point,
    backend="llm-awq",
    version="gemv",
)

# Load and patch config
config = AutoConfig.from_pretrained(original_model_path)
config.quantization_config = quantization_config

# Push to hub
config.push_to_hub(quantized_model_hub_path)
tok.push_to_hub(quantized_model_hub_path)

# Upload weights
api.upload_file(
    path_or_fileobj=quantized_model_path,
    path_in_repo="pytorch_model.bin",
    repo_id=quantized_model_hub_path,
    repo_type="model",
)

Import

from transformers import AwqConfig, AutoConfig
from huggingface_hub import HfApi

I/O

Inputs:

  • original model path (str) - path to the original unquantized model
  • quantized model path (str) - path to the AWQ-quantized checkpoint file
  • hub repo path (str) - target HuggingFace Hub repository ID
  • w_bit (int) - quantization bit width (e.g., 4)
  • q_group_size (int) - quantization group size (e.g., 128)
  • no_zero_point (bool) - whether to disable zero-point quantization
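The inputs above map naturally onto a small argparse CLI. A minimal sketch follows; the flag names and sample values are assumptions for illustration, not necessarily those defined in examples/convert_to_hf.py:

```python
import argparse

# Hypothetical CLI mirroring the documented inputs; the actual flag names
# in examples/convert_to_hf.py may differ.
parser = argparse.ArgumentParser(
    description="Export an AWQ checkpoint to the HuggingFace Hub"
)
parser.add_argument("--original_model_path", type=str, required=True)
parser.add_argument("--quantized_model_path", type=str, required=True)
parser.add_argument("--quantized_model_hub_path", type=str, required=True)
parser.add_argument("--w_bit", type=int, default=4)
parser.add_argument("--q_group_size", type=int, default=128)
parser.add_argument("--no_zero_point", action="store_true")

# Parse an example invocation (paths are placeholders).
args = parser.parse_args([
    "--original_model_path", "meta-llama/Llama-2-7b-hf",
    "--quantized_model_path", "awq_cache/llama2-7b-w4-g128.pt",
    "--quantized_model_hub_path", "user/llama2-7b-awq",
])
print(args.w_bit, args.q_group_size, not args.no_zero_point)
```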

Outputs:

  • HuggingFace Hub repository containing:
    • config.json - model configuration with quantization metadata
    • Tokenizer files (tokenizer.json, tokenizer_config.json, etc.)
    • pytorch_model.bin - the quantized model weights
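The quantization metadata written into config.json is the serialized AwqConfig. The sketch below shows roughly what that block looks like for the values in this doc; the exact field names depend on how transformers serializes AwqConfig, so treat this as an assumption to verify against a real pushed config:

```python
import json

# Approximate quantization_config block as it might appear in config.json.
# Field names assume transformers' AwqConfig serialization conventions.
quantization_config = {
    "quant_method": "awq",
    "bits": 4,            # args.w_bit
    "group_size": 128,    # args.q_group_size
    "zero_point": True,   # not args.no_zero_point
    "backend": "llm-awq",
    "version": "gemv",
}

print(json.dumps(quantization_config, indent=2))
```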

Related Pages

Knowledge Sources

Domains

  • Deployment
  • Model_Distribution
