Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Sail sg LongSpec Glide Inference Init

From Leeroopedia
Revision as of 13:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Sail_sg_LongSpec_Glide_Inference_Init.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains LLM_Inference, Model_Architecture
Last Updated 2026-02-14 05:00 GMT

Overview

Concrete tool for loading trained GLIDE inference models with target LLM, draft layer, and model registry for LongBench and AIME evaluation.

Description

The inference-side LlamaGlide and Qwen2Glide classes (in longspec/test/) provide the same initialization pattern as training but with inference-specific methods. The loading process uses a model registry (model_names dict) mapping friendly names to HuggingFace paths.

The initialization includes:

  • AutoConfig.from_pretrained() to load target model architecture config
  • AutoTokenizer.from_pretrained() to load the tokenizer
  • LlamaGlide/Qwen2Glide constructor to load target + draft models
  • Model-specific token configuration (e.g., QwQ requires specific pad/eos tokens)

Usage

Used at the beginning of inference scripts (inference_long-bench.py, inference_qwq.py) to set up the model before running evaluation.

Code Reference

Source Location

  • Repository: LongSpec
  • File (Llama): longspec/test/llama_glide.py
  • Lines: L472-490
  • File (Qwen2): longspec/test/qwen2_glide.py
  • Lines: L475-493
  • File (LongBench loading): longspec/test/inference_long-bench.py
  • Lines: L82-112
  • File (QwQ loading): longspec/test/inference_qwq.py
  • Lines: L31-46

Signature

# Model classes (same __init__ as training side):
class LlamaGlide(LlamaForCausalLM):
    def __init__(
        self,
        config: LlamaConfig,
        target_model_path: str,
        glide_path: Optional[str] = None,
    ) -> None:
        """
        Args:
            config: LlamaConfig from AutoConfig.from_pretrained()
            target_model_path: HuggingFace path to target LLM
            glide_path: Path to trained GLIDE draft layer weights
        """

class Qwen2Glide(Qwen2ForCausalLM):
    def __init__(
        self,
        config: Qwen2Config,
        target_model_path: str,
        glide_path: Optional[str] = None,
    ) -> None:
        """Same interface as LlamaGlide for Qwen2 architecture."""

# Model registry (inference_long-bench.py):
model_names = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
    "llama8b": {
        "target": "meta-llama/Llama-3.1-8B-Instruct",
        "draft": "sail/longspec-Llama-3.1-8B-Instruct",
    },
    # ... more models
}

Import

from transformers import AutoConfig, AutoTokenizer
from longspec.test.llama_glide import LlamaGlide
from longspec.test.qwen2_glide import Qwen2Glide

I/O Contract

Inputs

Name Type Required Description
model_name str Yes Key into model_names registry (e.g., "vicuna7b", "qwq")
config AutoConfig Yes Architecture config loaded from target model
target_model_path str Yes HuggingFace path to target LLM
glide_path str Yes HuggingFace path or local path to trained GLIDE draft layer

Outputs

Name Type Description
model LlamaGlide / Qwen2Glide Fully loaded inference model on CUDA with target + draft
tokenizer AutoTokenizer Configured tokenizer with proper special tokens

Usage Examples

LongBench Model Loading

from transformers import AutoConfig, AutoTokenizer
from longspec.test.llama_glide import LlamaGlide

model_names = {
    "vicuna7b": {
        "target": "lmsys/vicuna-7b-v1.5",
        "draft": "sail/longspec-vicuna-7b-v1.5",
    },
}

# Select model
model_name = "vicuna7b"
target = model_names[model_name]["target"]
draft = model_names[model_name]["draft"]

# Load config and tokenizer
config = AutoConfig.from_pretrained(target)
tokenizer = AutoTokenizer.from_pretrained(target)

# Initialize model
model = LlamaGlide(config, target, draft)

QwQ Model Loading (with Token Configuration)

from transformers import AutoConfig, AutoTokenizer
from longspec.test.qwen2_glide import Qwen2Glide

target = "Qwen/QwQ-32B-Preview"
draft = "sail/longspec-QwQ-32B-Preview"

config = AutoConfig.from_pretrained(target)
config.pad_token_id = 151643  # QwQ-specific pad token
config.eos_token_id = 151645  # QwQ-specific EOS token

tokenizer = AutoTokenizer.from_pretrained(target)

model = Qwen2Glide(config, target, draft)

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment