

Implementation:FlagOpen FlagEmbedding LLM Embedder LM Model

From Leeroopedia


Knowledge Sources
Domains Large_Language_Models, Text_Generation, Perplexity_Evaluation
Last Updated 2026-02-09 00:00 GMT

Overview

A language model wrapper that provides a unified interface for text generation and perplexity computation across causal and encoder-decoder LMs.

Description

The LM class wraps HuggingFace language models with utilities for common evaluation tasks:

Initialization: Automatically loads causal LMs (GPT, LLaMA) or encoder-decoder models (T5, BART) with configurable precision (fp16, bf16, fp32) and device mapping. Handles tokenizer setup including pad token configuration for models without predefined pad tokens.
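The precision selection and pad-token handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the wrapper's actual code: the helper names `resolve_dtype_name` and `ensure_pad_token` are hypothetical, reusing the EOS token as the pad token is an assumed (though common) fallback, and a `SimpleNamespace` stands in for a real tokenizer so the snippet runs without downloading a model.

```python
from types import SimpleNamespace

# Hypothetical mapping from the short precision names to torch dtype attribute names.
DTYPE_NAMES = {"bf16": "bfloat16", "fp16": "float16", "fp32": "float32"}


def resolve_dtype_name(dtype: str) -> str:
    """Return the torch dtype attribute name for a short precision string."""
    try:
        return DTYPE_NAMES[dtype]
    except KeyError:
        raise ValueError(
            f"unsupported dtype {dtype!r}; expected one of {sorted(DTYPE_NAMES)}"
        ) from None


def ensure_pad_token(tokenizer, padding_side: str = "left"):
    """Give the tokenizer a pad token if it has none (assumed fallback: reuse EOS)."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = padding_side
    return tokenizer


# Stand-in for a tokenizer that ships without a predefined pad token.
tok = SimpleNamespace(
    pad_token=None, pad_token_id=None,
    eos_token="</s>", eos_token_id=2, padding_side="right",
)
ensure_pad_token(tok)
```

With PyTorch installed, `getattr(torch, resolve_dtype_name("bf16"))` would yield `torch.bfloat16`, which can then be passed to the model loader.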

Generation: The generate() method performs batched text generation with optional new-token-only output and automatic decoding. It handles distributed generation via Accelerate, padding outputs across processes for gathering metrics.
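The two output-handling steps mentioned above (keeping only new tokens, and padding before a cross-process gather) can be illustrated with plain lists. This is a sketch, not the wrapper's code: `new_tokens_only` and `pad_to_length` are hypothetical helpers, and the slicing assumes a causal LM whose prompts are left-padded to a common length, so that every generated sequence starts with the same number of prompt positions.

```python
from typing import List


def new_tokens_only(sequences: List[List[int]], prompt_length: int) -> List[List[int]]:
    """Drop the prompt prefix; valid when every prompt is left-padded to prompt_length."""
    return [seq[prompt_length:] for seq in sequences]


def pad_to_length(sequences: List[List[int]], length: int, pad_id: int) -> List[List[int]]:
    """Right-pad each sequence so that all processes hold equally shaped outputs."""
    return [seq + [pad_id] * (length - len(seq)) for seq in sequences]


PAD = 0
# Two left-padded prompts of length 4, each followed by two generated tokens.
outputs = [[PAD, 11, 12, 13, 21, 22], [14, 15, 16, 17, 31, 32]]
generated = new_tokens_only(outputs, prompt_length=4)
# Another process may have produced longer outputs, so pad to a shared length.
padded = pad_to_length(generated, length=5, pad_id=PAD)
```

On tensors, Accelerate offers `Accelerator.pad_across_processes` and `Accelerator.gather_for_metrics` for the padding and gathering steps. For encoder-decoder models the generated sequence does not contain the encoder input, so slicing by prompt length would not apply there.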

Perplexity computation: The compute_nlls() method calculates negative log-likelihoods on labeled data, supporting:

  • Causal LMs: Shifts logits/labels by one position
  • Encoder-decoder models: Uses standard alignment
  • Masked tokens: Only computes loss on non-masked tokens (labels != -100)
  • Per-sample normalization: Divides total loss by valid token count
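The four points above can be sketched in pure Python for a single sample. This is an illustrative re-derivation, not the code of `compute_nlls()`: the function names are hypothetical, and the real method presumably operates on batched tensors. It shows the one-position shift for causal LMs, the skipping of labels equal to -100, and the division by the valid token count.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value excluded from the loss


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_nll(logits: List[List[float]], labels: List[int], causal: bool = True) -> float:
    """Mean negative log-likelihood over the non-masked tokens of one sample."""
    if causal:
        # The logits at position t predict the token at position t + 1.
        logits, labels = logits[:-1], labels[1:]
    total, count = 0.0, 0
    for step_logits, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked token: contributes neither loss nor count
        total -= log_softmax(step_logits)[label]
        count += 1
    return total / count if count else float("nan")


# Toy sample: 3 positions, a vocabulary of 2 tokens, uniform predictions.
logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [IGNORE_INDEX, 1, 0]  # the first position is prompt and carries no loss
nll = sample_nll(logits, labels)
perplexity = math.exp(nll)
```

With uniform predictions over two tokens, each valid token costs ln 2, so the per-sample NLL is ln 2 and the corresponding perplexity is 2.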

Both methods integrate with Accelerate for multi-GPU evaluation and handle edge cases like encoder-decoder position tracking.

Usage

Use this as a standard wrapper for evaluating any HuggingFace LM on generation or perplexity tasks with consistent APIs across model architectures.

Code Reference

Source Location

Signature

class LM(torch.nn.Module):
    def __init__(self, model_name_or_path=None, padding_side="left",
                 dtype="bf16", cache_dir="/share/LMs", device_map=None,
                 accelerator: Accelerator=None, generation_args: Dict=None)

    def compute_nlls(self, dataloader)
    def generate(self, dataloader, return_new_tokens_only=True, decode=True, **gen_kwargs)

Import

from research.llm_embedder.src.lm import LM

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model_name_or_path | str | Yes | HuggingFace model name or local path |
| padding_side | str | No | "left" or "right" padding (default: "left") |
| dtype | str | No | Precision: "bf16", "fp16", "fp32" (default: "bf16") |
| cache_dir | str | No | Model cache directory (default: "/share/LMs") |
| device_map | str | No | Device mapping strategy (e.g., "auto") |
| accelerator | Accelerator | No | Accelerate instance for distributed eval |
| generation_args | Dict | No | Generation config (max_new_tokens, temperature, etc.) |
| dataloader | DataLoader | Yes | Argument to compute_nlls() and generate(): DataLoader with input_ids, attention_mask, labels/query_id |

Outputs

| Name | Type | Description |
|------|------|-------------|
| query_ids | List | Query IDs (if provided in dataloader) |
| nlls | List[float] | Negative log-likelihoods per sample |
| generations | List[str] | Generated text strings (if decode=True) or token IDs |

Usage Examples

import numpy as np
from accelerate import Accelerator
from torch.utils.data import DataLoader
from research.llm_embedder.src.lm import LM

# Initialize the LM wrapper
accelerator = Accelerator()
lm = LM(
    model_name_or_path="meta-llama/Llama-2-7b-hf",
    dtype="bf16",
    device_map="auto",
    accelerator=accelerator,
    generation_args={"max_new_tokens": 50, "temperature": 0.7}
)

# Compute perplexity.
# `dataset` and `collator` are assumed to be defined elsewhere; each batch must
# provide input_ids, attention_mask, and labels (plus query_id if available).
dataloader = DataLoader(dataset, batch_size=4, collate_fn=collator)
dataloader = accelerator.prepare(dataloader)
query_ids, nlls = lm.compute_nlls(dataloader)
# NLLs are already normalized per sample, so this is exp of the mean per-sample NLL.
perplexity = np.exp(np.mean(nlls))
print(f"Perplexity: {perplexity:.2f}")

# Generate text (`gen_dataset` is likewise assumed to be defined elsewhere)
gen_dataloader = DataLoader(gen_dataset, batch_size=4)
gen_dataloader = accelerator.prepare(gen_dataloader)
query_ids, generations = lm.generate(gen_dataloader)
for qid, gen in zip(query_ids, generations):
    print(f"Query {qid}: {gen}")
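The usage example relies on a `collator` that is not shown. A minimal sketch of what such a collator might look like is given below, using the batch fields named in the I/O contract (input_ids, attention_mask, labels, query_id). The function `collate` and its exact output layout are assumptions for illustration; a real collator would typically return tensors rather than lists, and the format the wrapper expects may differ.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that is excluded from the loss


def collate(batch: List[Dict], pad_id: int = 0) -> Dict[str, list]:
    """Left-pad a batch and mask the padded positions out of the labels."""
    max_len = max(len(item["input_ids"]) for item in batch)
    out = {"input_ids": [], "attention_mask": [], "labels": [], "query_id": []}
    for item in batch:
        n_tokens = len(item["input_ids"])
        n_pad = max_len - n_tokens
        out["input_ids"].append([pad_id] * n_pad + item["input_ids"])
        out["attention_mask"].append([0] * n_pad + [1] * n_tokens)
        out["labels"].append([IGNORE_INDEX] * n_pad + item["labels"])
        out["query_id"].append(item["query_id"])
    return out


batch = collate([
    {"query_id": 0, "input_ids": [5, 6, 7], "labels": [IGNORE_INDEX, 6, 7]},
    {"query_id": 1, "input_ids": [8, 9], "labels": [IGNORE_INDEX, 9]},
])
```

Left padding matches the wrapper's default `padding_side="left"`, and setting padded label positions to -100 keeps them out of the per-sample token count.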

Related Pages
