Implementation:FlagOpen FlagEmbedding LLM Embedder LM Model
| Knowledge Sources | |
|---|---|
| Domains | Large_Language_Models, Text_Generation, Perplexity_Evaluation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A language model wrapper that provides a unified interface for text generation and perplexity computation across causal and encoder-decoder LMs.
Description
The LM class wraps HuggingFace language models with utilities for common evaluation tasks:
Initialization: Automatically loads causal LMs (GPT, LLaMA) or encoder-decoder models (T5, BART) with configurable precision (fp16, bf16, fp32) and device mapping. Handles tokenizer setup including pad token configuration for models without predefined pad tokens.
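The precision and pad-token handling described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: `resolve_dtype_name` and `ensure_pad_token` are hypothetical helpers, the fallback to the EOS token is a common convention assumed here rather than confirmed by the source, and a `SimpleNamespace` stands in for a real tokenizer so the snippet runs without downloading a model.

```python
from types import SimpleNamespace

# Map the wrapper's precision strings to torch dtype attribute names.
_DTYPE_NAMES = {"bf16": "bfloat16", "fp16": "float16", "fp32": "float32"}


def resolve_dtype_name(dtype: str) -> str:
    """Return the torch attribute name for a precision string such as "bf16"."""
    try:
        return _DTYPE_NAMES[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype!r}") from None


def ensure_pad_token(tokenizer) -> None:
    """Give the tokenizer a pad token if it has none (assumed fallback: EOS)."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token


# Stand-in for a tokenizer that ships without a pad token.
fake_tokenizer = SimpleNamespace(pad_token=None, eos_token="</s>")
ensure_pad_token(fake_tokenizer)
print(resolve_dtype_name("bf16"), fake_tokenizer.pad_token)
```

With PyTorch installed, `getattr(torch, resolve_dtype_name("bf16"))` yields `torch.bfloat16`, which can be passed as `torch_dtype` to `from_pretrained`.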
Generation: The generate() method performs batched text generation with optional new-token-only output and automatic decoding. It handles distributed generation via Accelerate, padding outputs across processes for gathering metrics.
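To illustrate the two post-processing steps mentioned above, here is a minimal sketch that uses plain Python lists in place of tensors. It assumes a decoder-only model whose generation output repeats the prompt before the new tokens; `strip_prompt` and `pad_to_length` are hypothetical helper names, and the pad ID of 0 is arbitrary.

```python
from typing import List


def strip_prompt(sequences: List[List[int]], prompt_len: int) -> List[List[int]]:
    """Keep only the tokens generated after a (padded) prompt of prompt_len tokens."""
    return [seq[prompt_len:] for seq in sequences]


def pad_to_length(sequences: List[List[int]], length: int, pad_id: int) -> List[List[int]]:
    """Right-pad every sequence to `length` so all processes hold equal shapes."""
    return [seq + [pad_id] * (length - len(seq)) for seq in sequences]


# Two prompts left-padded to 3 tokens (pad ID 0), each followed by 2 generated tokens.
outputs = [[0, 11, 12, 21, 22], [13, 14, 15, 23, 24]]
new_tokens = strip_prompt(outputs, prompt_len=3)

# Another process might have produced longer outputs; pad to the shared maximum.
global_max_len = 4
padded = pad_to_length(new_tokens, global_max_len, pad_id=0)
print(padded)  # [[21, 22, 0, 0], [23, 24, 0, 0]]
```

On tensors, Accelerate's `Accelerator.pad_across_processes` performs the equivalent padding step before a `gather`.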
Perplexity computation: The compute_nlls() method calculates negative log-likelihoods on labeled data, supporting:
- Causal LMs: Shifts logits/labels by one position
- Encoder-decoder models: Uses logits and labels as they are, with no shift
- Masked tokens: Only computes loss on non-masked tokens (labels != -100)
- Per-sample normalization: Divides total loss by valid token count
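The rules above can be sketched for a single sample in plain Python. This is an illustrative reimplementation under stated assumptions, not the code of `compute_nlls()`: `sample_nll` is a hypothetical name, logits are given as nested lists rather than tensors, and the cross-entropy is computed by hand with a log-softmax.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value excluded from the loss


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_nll(logits: List[List[float]], labels: List[int], is_encoder_decoder: bool) -> float:
    """Mean negative log-likelihood over the non-masked tokens of one sample."""
    if not is_encoder_decoder:
        # Causal LM: the logits at position t predict the token at position t + 1.
        logits, labels = logits[:-1], labels[1:]
    total, valid = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked token contributes nothing to the loss
        total -= log_softmax(row)[label]
        valid += 1
    if valid == 0:
        raise ValueError("sample has no unmasked labels")
    return total / valid


# Uniform logits over a 4-token vocabulary: every unmasked token costs log(4).
uniform = [[0.0] * 4 for _ in range(3)]
nll = sample_nll(uniform, labels=[IGNORE_INDEX, 2, 1], is_encoder_decoder=False)
print(nll, math.exp(nll))
```

Exponentiating the result gives the sample's perplexity, which here equals the vocabulary size of 4 because the predicted distribution is uniform.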
Both methods integrate with Accelerate for multi-GPU evaluation and handle edge cases like encoder-decoder position tracking.
Usage
Use this as a standard wrapper for evaluating any HuggingFace LM on generation or perplexity tasks with consistent APIs across model architectures.
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/llm_embedder/src/lm/modeling_lm.py
- Lines: 1-173
Signature
class LM(torch.nn.Module):
    def __init__(self, model_name_or_path=None, padding_side="left",
                 dtype="bf16", cache_dir="/share/LMs", device_map=None,
                 accelerator: Accelerator=None, generation_args: Dict=None)
    def compute_nlls(self, dataloader)
    def generate(self, dataloader, return_new_tokens_only=True, decode=True, **gen_kwargs)
Import
from research.llm_embedder.src.lm import LM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | HuggingFace model name or local path (the signature defaults to None, but a value is needed to load a model) |
| padding_side | str | No | "left" or "right" padding (default: "left") |
| dtype | str | No | Precision: "bf16", "fp16", "fp32" (default: "bf16") |
| cache_dir | str | No | Cache directory for downloaded models (default: "/share/LMs") |
| device_map | str | No | Device mapping strategy (e.g., "auto") |
| accelerator | Accelerator | No | Accelerate instance for distributed eval |
| generation_args | Dict | No | Generation config (max_new_tokens, temperature, etc.) |
| dataloader | DataLoader | Yes | Argument to compute_nlls() and generate(): DataLoader yielding input_ids, attention_mask, and labels and/or query_id |
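A batch with the fields listed for `dataloader` might be assembled as in the sketch below. The `collate` function and its input format are hypothetical (the source does not show the collator it uses); plain lists stand in for tensors, and left padding matches the default `padding_side`.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def collate(samples: List[Dict], pad_id: int) -> Dict[str, List]:
    """Left-pad samples to a common length and mask the padding in the labels."""
    max_len = max(len(s["input_ids"]) for s in samples)
    batch = {"input_ids": [], "attention_mask": [], "labels": [], "query_id": []}
    for s in samples:
        n_pad = max_len - len(s["input_ids"])
        batch["input_ids"].append([pad_id] * n_pad + s["input_ids"])
        batch["attention_mask"].append([0] * n_pad + [1] * len(s["input_ids"]))
        batch["labels"].append([IGNORE_INDEX] * n_pad + s["labels"])
        batch["query_id"].append(s["query_id"])
    return batch


samples = [
    {"query_id": 0, "input_ids": [5, 6, 7], "labels": [IGNORE_INDEX, 6, 7]},
    {"query_id": 1, "input_ids": [8, 9], "labels": [IGNORE_INDEX, 9]},
]
batch = collate(samples, pad_id=0)
print(batch["input_ids"])  # [[5, 6, 7], [0, 8, 9]]
print(batch["labels"])     # [[-100, 6, 7], [-100, -100, 9]]
```

In a real pipeline the lists would be converted to `torch.long` tensors before being passed to the model.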
Outputs
| Name | Type | Description |
|---|---|---|
| query_ids | List | Query IDs (if provided in dataloader) |
| nlls | List[float] | Negative log-likelihoods per sample |
| generations | List[str] | Generated text strings (if decode=True) or token IDs |
Usage Examples
import numpy as np
from accelerate import Accelerator
from torch.utils.data import DataLoader
from research.llm_embedder.src.lm import LM

# Initialize LM
accelerator = Accelerator()
lm = LM(
    model_name_or_path="meta-llama/Llama-2-7b-hf",
    dtype="bf16",
    device_map="auto",
    accelerator=accelerator,
    # Note: in HuggingFace generation, temperature only takes effect when do_sample=True
    generation_args={"max_new_tokens": 50, "temperature": 0.7}
)

# Compute perplexity
# (dataset and collator are placeholders for your own tokenized data and collate function)
dataloader = DataLoader(dataset, batch_size=4, collate_fn=collator)
dataloader = accelerator.prepare(dataloader)
# query_ids are returned when the batches carry a query_id field
query_ids, nlls = lm.compute_nlls(dataloader)
# nlls are per-sample averages, so this is the exponential of the mean per-sample NLL
perplexity = np.exp(np.mean(nlls))
print(f"Perplexity: {perplexity:.2f}")

# Generate text (gen_dataset is a placeholder as well)
gen_dataloader = DataLoader(gen_dataset, batch_size=4)
gen_dataloader = accelerator.prepare(gen_dataloader)
query_ids, generations = lm.generate(gen_dataloader)
for qid, gen in zip(query_ids, generations):
    print(f"Query {qid}: {gen}")