Implementation:Sail sg LongSpec Glide Inference Init

Knowledge Sources	LongSpec
Domains	LLM_Inference, Model_Architecture
Last Updated	2026-02-14 05:00 GMT

Overview

Concrete tool for loading a trained GLIDE inference model (target LLM plus draft layer) through a model registry, for LongBench and AIME evaluation.

Description

The inference-side LlamaGlide and Qwen2Glide classes (in longspec/test/) share the training-side initialization pattern but add inference-specific methods. Loading is driven by a model registry (the model_names dict) that maps friendly names to the HuggingFace paths of each target model and its draft layer.
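A minimal sketch of the registry lookup, assuming the model_names structure shown under Signature (each entry holds a "target" and a "draft" path). The helper resolve_model_paths and the MODEL_NAMES copy are hypothetical and not part of LongSpec; the only thing they add over a plain dict lookup is an error message that lists the valid keys.

```python
from typing import Dict, Tuple

# Hypothetical copy of the registry; only the two entries shown on this page are included.
MODEL_NAMES: Dict[str, Dict[str, str]] = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
    "llama8b": {
        "target": "meta-llama/Llama-3.1-8B-Instruct",
        "draft": "sail/longspec-Llama-3.1-8B-Instruct",
    },
}


def resolve_model_paths(
    model_name: str, registry: Dict[str, Dict[str, str]] = MODEL_NAMES
) -> Tuple[str, str]:
    """Return (target_path, draft_path) for a friendly model name."""
    if model_name not in registry:
        # List the valid keys so a typo is easy to spot.
        valid = ", ".join(sorted(registry))
        raise KeyError(f"unknown model_name {model_name!r}; expected one of: {valid}")
    entry = registry[model_name]
    return entry["target"], entry["draft"]
```

Usage: target, draft = resolve_model_paths("vicuna7b") yields the same two paths the LongBench example below reads from the dict by hand.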

The initialization includes:

AutoConfig.from_pretrained() to load target model architecture config
AutoTokenizer.from_pretrained() to load the tokenizer
LlamaGlide/Qwen2Glide constructor to load target + draft models
Model-specific token configuration (e.g., QwQ requires specific pad/eos tokens)
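The steps listed above can be sketched as one function. This is an illustration under assumptions, not LongSpec code: init_glide is a hypothetical helper, and the config/tokenizer loaders are passed in so the ordering can be exercised without downloading weights. It follows the order used in the QwQ example on this page, where token ids are written to the config before the model constructor runs.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def init_glide(
    target: str,
    draft: str,
    model_cls: Callable[..., Any],
    config_loader: Callable[[str], Any],
    tokenizer_loader: Callable[[str], Any],
    token_overrides: Optional[Dict[str, int]] = None,
) -> Tuple[Any, Any]:
    """Load config and tokenizer, apply token ids, then build the GLIDE model.

    Returns (model, tokenizer).
    """
    # Architecture config of the target model (AutoConfig.from_pretrained in the scripts).
    config = config_loader(target)
    # The tokenizer is loaded from the target model path, not the draft path.
    tokenizer = tokenizer_loader(target)
    # Model-specific special-token ids go onto the config before construction.
    for attr, token_id in (token_overrides or {}).items():
        setattr(config, attr, token_id)
    # Positional order follows the documented signature: (config, target_model_path, glide_path).
    model = model_cls(config, target, draft)
    return model, tokenizer
```

With the real classes the call would look like init_glide(target, draft, LlamaGlide, AutoConfig.from_pretrained, AutoTokenizer.from_pretrained); for QwQ, pass Qwen2Glide and token_overrides={"pad_token_id": 151643, "eos_token_id": 151645}.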

Usage

Used at the beginning of inference scripts (inference_long-bench.py, inference_qwq.py) to set up the model before running evaluation.
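Because loading happens once at the top of each script, a common pattern is to expose the registry key on the command line. The sketch below is hypothetical: the flag name --model_name, its default, parse_args, and the MODEL_NAMES subset are assumptions for illustration and are not taken from inference_long-bench.py. Restricting choices to the registry keys makes a mistyped name fail before any weights are downloaded.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical subset of the registry; the real dict lives in inference_long-bench.py.
MODEL_NAMES: Dict[str, Dict[str, str]] = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
    "llama8b": {
        "target": "meta-llama/Llama-3.1-8B-Instruct",
        "draft": "sail/longspec-Llama-3.1-8B-Instruct",
    },
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="GLIDE inference (illustrative sketch)")
    # argparse rejects any value not in choices and exits with a usage error.
    parser.add_argument("--model_name", choices=sorted(MODEL_NAMES), default="vicuna7b")
    return parser.parse_args(argv)
```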

Code Reference

Source Location

Repository: LongSpec
File (Llama): longspec/test/llama_glide.py
Lines: L472-490
File (Qwen2): longspec/test/qwen2_glide.py
Lines: L475-493
File (LongBench loading): longspec/test/inference_long-bench.py
Lines: L82-112
File (QwQ loading): longspec/test/inference_qwq.py
Lines: L31-46

Signature

# Model classes (same __init__ as training side):
class LlamaGlide(LlamaForCausalLM):
    def __init__(
        self,
        config: LlamaConfig,
        target_model_path: str,
        glide_path: Optional[str] = None,
    ) -> None:
        """
        Args:
            config: LlamaConfig from AutoConfig.from_pretrained()
            target_model_path: HuggingFace path to target LLM
            glide_path: Path to trained GLIDE draft layer weights
        """

class Qwen2Glide(Qwen2ForCausalLM):
    def __init__(
        self,
        config: Qwen2Config,
        target_model_path: str,
        glide_path: Optional[str] = None,
    ) -> None:
        """Same interface as LlamaGlide for Qwen2 architecture."""

# Model registry (inference_long-bench.py):
model_names = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
    "llama8b": {
        "target": "meta-llama/Llama-3.1-8B-Instruct",
        "draft": "sail/longspec-Llama-3.1-8B-Instruct",
    },
    # ... more models
}

Import

from transformers import AutoConfig, AutoTokenizer
from longspec.test.llama_glide import LlamaGlide
from longspec.test.qwen2_glide import Qwen2Glide
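The examples on this page hard-code LlamaGlide for Llama-architecture targets and Qwen2Glide for QwQ. A minimal sketch, assuming you would rather pick the class from the loaded config: it keys on config.model_type, which the Hugging Face configs for Vicuna and Llama 3.1 are expected to report as "llama" and QwQ as "qwen2". The helper pick_glide_class is hypothetical and not part of LongSpec.

```python
from typing import Dict, Type


def pick_glide_class(model_type: str, classes: Dict[str, Type]) -> Type:
    """Return the GLIDE class registered for a Hugging Face config.model_type."""
    if model_type not in classes:
        known = ", ".join(sorted(classes))
        raise ValueError(f"no GLIDE class for model_type={model_type!r}; known: {known}")
    return classes[model_type]
```

Real usage would be model_cls = pick_glide_class(config.model_type, {"llama": LlamaGlide, "qwen2": Qwen2Glide}), followed by model_cls(config, target, draft).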

I/O Contract

Inputs

Name	Type	Required	Description
model_name	str	Yes	Key into model_names registry (e.g., "vicuna7b", "qwq")
config	LlamaConfig / Qwen2Config	Yes	Architecture config loaded from the target model via AutoConfig.from_pretrained()
target_model_path	str	Yes	HuggingFace path to target LLM
glide_path	Optional[str]	No (defaults to None)	HuggingFace path or local path to the trained GLIDE draft layer; the usage examples pass the registry's draft path

Outputs

Name	Type	Description
model	LlamaGlide / Qwen2Glide	Fully loaded inference model on CUDA with target + draft
tokenizer	AutoTokenizer	Configured tokenizer with proper special tokens

Usage Examples

LongBench Model Loading

from transformers import AutoConfig, AutoTokenizer
from longspec.test.llama_glide import LlamaGlide

model_names = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
}

# Select model
model_name = "vicuna7b"
target = model_names[model_name]["target"]
draft = model_names[model_name]["draft"]

# Load config and tokenizer
config = AutoConfig.from_pretrained(target)
tokenizer = AutoTokenizer.from_pretrained(target)

# Initialize model
model = LlamaGlide(config, target, draft)

QwQ Model Loading (with Token Configuration)

from transformers import AutoConfig, AutoTokenizer
from longspec.test.qwen2_glide import Qwen2Glide

target = "Qwen/QwQ-32B-Preview"
draft = "sail/longspec-QwQ-32B-Preview"

config = AutoConfig.from_pretrained(target)
config.pad_token_id = 151643  # QwQ-specific pad token
config.eos_token_id = 151645  # QwQ-specific EOS token

tokenizer = AutoTokenizer.from_pretrained(target)

model = Qwen2Glide(config, target, draft)
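If more models need special-token ids, one option is to keep them in a table next to the registry instead of inline. A minimal sketch, assuming a hypothetical TOKEN_OVERRIDES dict keyed by a hypothetical "qwq" name and an apply_token_overrides helper; only the QwQ ids themselves come from this page.

```python
from typing import Any, Dict

# Hypothetical table; only the QwQ ids are taken from the example above.
TOKEN_OVERRIDES: Dict[str, Dict[str, int]] = {
    "qwq": {"pad_token_id": 151643, "eos_token_id": 151645},
}


def apply_token_overrides(config: Any, model_name: str) -> Any:
    """Set model-specific special-token ids on config; models without an entry are unchanged."""
    for attr, token_id in TOKEN_OVERRIDES.get(model_name, {}).items():
        setattr(config, attr, token_id)
    return config
```

Apply it before constructing the model, e.g. config = apply_token_overrides(AutoConfig.from_pretrained(target), "qwq"). To double-check an id, tokenizer.convert_ids_to_tokens(151645) returns the token string that id maps to.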

Related Pages

Implements Principle

Principle:Sail_sg_LongSpec_Inference_Model_Loading

Requires Environment

Environment:Sail_sg_LongSpec_Inference_Environment
