Implementation:Mit han lab Llm awq LMEvalAdaptor
Overview
A concrete adapter class, provided by the llm-awq library, that wraps AWQ-quantized models so they can be evaluated as lm-eval-harness-compatible evaluators.
Source
awq/utils/lm_eval_adaptor.py, Lines 7-116
Signature
class LMEvalAdaptor(BaseLM):
def __init__(self, model_name, model, tokenizer, batch_size=1, max_length=-1):
Import
from awq.utils.lm_eval_adaptor import LMEvalAdaptor
I/O
Constructor
Inputs:
- model_name (str) - name of the model (used for identification)
- model (nn.Module) - the quantized model instance
- tokenizer (PreTrainedTokenizer) - the tokenizer for the model
- batch_size (int, default 1) - evaluation batch size
- max_length (int, default -1) - maximum sequence length; -1 means auto-detect from model config
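A minimal sketch of the constructor's max_length resolution when -1 is passed. The config field names checked below (n_positions, max_position_embeddings, n_ctx) are common Hugging Face conventions and are an assumption here, not the exact llm-awq lookup order:

```python
from types import SimpleNamespace

def resolve_max_length(model_config, max_length=-1, fallback=2048):
    """Sketch of auto-detecting max sequence length from a model config.

    If max_length is -1, probe common HF config fields; otherwise use
    the explicit value. Field names are assumptions, not the verified
    llm-awq logic.
    """
    if max_length != -1:
        return max_length
    for attr in ("n_positions", "max_position_embeddings", "n_ctx"):
        value = getattr(model_config, attr, None)
        if value is not None:
            return value
    return fallback

# Toy config resembling a GPT-2-style Hugging Face config
config = SimpleNamespace(n_positions=1024)
print(resolve_max_length(config))        # auto-detected: 1024
print(resolve_max_length(config, 512))   # explicit override: 512
```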
Key Methods
- _model_call(inps) - runs forward pass on input token IDs, returns logits tensor
- _model_generate(context, max_length, eos_token_id) - generates tokens autoregressively from a context, returns generated token tensor
- tok_encode(string) - encodes a string to a list of token IDs
- tok_decode(tokens) - decodes a list of token IDs back to a string
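To illustrate the tok_encode/tok_decode contract, here is a self-contained sketch using a toy whitespace tokenizer. The real adapter delegates to the Hugging Face tokenizer passed to the constructor; the ToyTokenizer and its vocabulary below are purely illustrative:

```python
class ToyTokenizer:
    """Stand-in for a PreTrainedTokenizer: maps words to integer IDs."""
    def __init__(self, vocab):
        self.vocab = vocab
        self.inv = {i: w for w, i in vocab.items()}

    def encode(self, string):
        return [self.vocab[w] for w in string.split()]

    def decode(self, tokens):
        return " ".join(self.inv[t] for t in tokens)


class ToyAdaptor:
    """Sketch of the adapter's tokenization surface (illustrative only)."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def tok_encode(self, string):
        # String -> list of token IDs, as lm-eval-harness expects.
        return self.tokenizer.encode(string)

    def tok_decode(self, tokens):
        # List of token IDs -> string (inverse of tok_encode).
        return self.tokenizer.decode(tokens)


adaptor = ToyAdaptor(ToyTokenizer({"hello": 0, "world": 1}))
ids = adaptor.tok_encode("hello world")
print(ids)                      # [0, 1]
print(adaptor.tok_decode(ids))  # hello world
```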
Properties
- eot_token_id - end-of-text token ID from the tokenizer
- max_length - maximum sequence length (auto-detected if -1)
- max_gen_toks - maximum generation tokens, fixed at 256
- batch_size - evaluation batch size
- device - fixed to "cuda"
Related Pages
- Principle:Mit_han_lab_Llm_awq_LM_Evaluation_Harness_Adaptation
- Environment:Mit_han_lab_Llm_awq_Python_Runtime_Environment
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- NLP
- Evaluation