Principle:FlagOpen FlagEmbedding Evaluation Model Loading

From Leeroopedia


Sources: Repo: FlagOpen/FlagEmbedding
Domains: NLP, Information_Retrieval, Evaluation

Overview

A coordinated loading pattern that instantiates both an embedding model and an optional reranker model from evaluation arguments for benchmarking.

Description

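For evaluation, the embedder and the optional reranker must be loaded together with consistent configuration. The AbsEvalRunner.get_models() static method handles the loading itself (steps 1 and 2 below), and load_retriever_and_reranker() then wraps the results for evaluation (step 3):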

  1. Embedder loading: The embedding model is loaded via FlagAutoModel.from_finetuned(), which automatically detects the model architecture and instantiates the appropriate embedder class. Configuration parameters passed include: model path, model class, normalization settings, pooling method, precision (FP16), query instructions, device allocation, batch size, and max lengths.
  2. Reranker loading (optional): If reranker_name_or_path is provided in the model arguments, the reranker is loaded via FlagAutoReranker.from_finetuned(). This supports multiple reranker architectures including encoder-only, decoder-only, layerwise, and lightweight variants. Additional reranker-specific parameters include PEFT adapter path, BF16 precision, passage instructions, prompt, cutoff layers, and compression settings.
  3. Evaluation wrapping: The loaded models are wrapped in evaluation-specific classes. The embedder is wrapped in EvalDenseRetriever (which adds search_top_k and overwrite configuration), and the reranker is wrapped in EvalReranker (which adds rerank_top_k configuration).
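The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the FlagEmbedding implementation: StubModel, StubEvalRetriever, StubEvalReranker, and the argument fields other than reranker_name_or_path, search_top_k, rerank_top_k, and overwrite are hypothetical stand-ins, and no real model is loaded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ModelArgs:
    # Hypothetical subset of the model arguments; only
    # reranker_name_or_path is named in the text above.
    embedder_name_or_path: str
    reranker_name_or_path: Optional[str] = None
    use_fp16: bool = True
    devices: str = "cpu"


@dataclass
class EvalArgs:
    search_top_k: int = 1000
    rerank_top_k: int = 100
    overwrite: bool = False


@dataclass
class StubModel:
    # Stand-in for a loaded embedder or reranker; it only records its config.
    name_or_path: str
    use_fp16: bool
    devices: str


@dataclass
class StubEvalRetriever:
    # Stand-in for the evaluation wrapper around the embedder.
    embedder: StubModel
    search_top_k: int
    overwrite: bool


@dataclass
class StubEvalReranker:
    # Stand-in for the evaluation wrapper around the reranker.
    reranker: StubModel
    rerank_top_k: int


def get_models(args: ModelArgs) -> Tuple[StubModel, Optional[StubModel]]:
    # Step 1: the embedder is always loaded.
    embedder = StubModel(args.embedder_name_or_path, args.use_fp16, args.devices)
    # Step 2: the reranker is loaded only when a path is configured.
    reranker = None
    if args.reranker_name_or_path is not None:
        reranker = StubModel(args.reranker_name_or_path, args.use_fp16, args.devices)
    return embedder, reranker


def load_retriever_and_reranker(
    model_args: ModelArgs, eval_args: EvalArgs
) -> Tuple[StubEvalRetriever, Optional[StubEvalReranker]]:
    embedder, reranker = get_models(model_args)
    # Step 3: wrap both models in their evaluation-specific classes.
    retriever = StubEvalRetriever(embedder, eval_args.search_top_k, eval_args.overwrite)
    wrapped_reranker = None
    if reranker is not None:
        wrapped_reranker = StubEvalReranker(reranker, eval_args.rerank_top_k)
    return retriever, wrapped_reranker
```

Because both models are built from the same ModelArgs instance, shared settings such as precision and devices cannot drift apart, and a missing reranker path simply yields None for the second stage.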

The loading supports four embedder model classes (encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl) and four reranker model classes (encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight).
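As an illustration, the class names listed above could be checked before any weights are loaded. The helper below is a hypothetical sketch and not part of FlagEmbedding; only the eight class-name strings come from the text.

```python
# Model class names as listed in the description above.
EMBEDDER_MODEL_CLASSES = frozenset({
    "encoder-only-base",
    "encoder-only-m3",
    "decoder-only-base",
    "decoder-only-icl",
})
RERANKER_MODEL_CLASSES = frozenset({
    "encoder-only-base",
    "decoder-only-base",
    "decoder-only-layerwise",
    "decoder-only-lightweight",
})


def check_model_class(kind: str, model_class: str) -> str:
    """Return model_class if it is valid for kind, else raise ValueError.

    Hypothetical helper; kind must be "embedder" or "reranker".
    """
    allowed_by_kind = {
        "embedder": EMBEDDER_MODEL_CLASSES,
        "reranker": RERANKER_MODEL_CLASSES,
    }
    if kind not in allowed_by_kind:
        raise ValueError(f"unknown kind: {kind!r}")
    if model_class not in allowed_by_kind[kind]:
        allowed = ", ".join(sorted(allowed_by_kind[kind]))
        raise ValueError(f"{model_class!r} is not a valid {kind} class; expected one of: {allowed}")
    return model_class
```

Note that the two sets overlap only in encoder-only-base and decoder-only-base, so a class name that is valid for an embedder is not necessarily valid for a reranker.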

Usage

Use this pattern at the start of any evaluation run. The AbsEvalRunner.__init__ method calls load_retriever_and_reranker(), which internally invokes get_models() and wraps the results in evaluation classes.

Theoretical Basis

Two-stage retrieval (retrieve, then rerank) is a widely adopted pattern in information retrieval that balances efficiency and effectiveness. The first stage uses a fast dense retriever to narrow the candidate set from millions of documents to a manageable top-k (typically 1000). The second stage applies a more expensive cross-encoder reranker to re-score only the top candidates (typically 100). Because the two stages run as a single pipeline, the models must be loaded in a coordinated way so that both are configured consistently (same devices, compatible precision, aligned max lengths). The evaluation wrappers add the search_top_k and rerank_top_k configuration on top of the base models, enforcing the two-stage pipeline structure.
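A toy version of the retrieve-then-rerank flow makes the roles of search_top_k and rerank_top_k concrete. This is a sketch under simplifying assumptions: the dot-product retriever, the word-overlap scorer standing in for a cross-encoder, and the sample documents are all illustrative and do not reflect how FlagEmbedding computes scores.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    search_top_k: int,
) -> List[str]:
    # Stage 1: cheap vector similarity over the whole corpus.
    ranked = sorted(doc_vecs, key=lambda d: dot(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:search_top_k]


def rerank(
    query: str,
    candidates: List[str],
    texts: Dict[str, str],
    score_fn: Callable[[str, str], float],
    rerank_top_k: int,
) -> List[str]:
    # Stage 2: the expensive scorer sees only the first rerank_top_k candidates.
    head = candidates[:rerank_top_k]
    return sorted(head, key=lambda d: score_fn(query, texts[d]), reverse=True)


def word_overlap(query: str, passage: str) -> float:
    # Toy stand-in for a cross-encoder: counts shared words.
    return float(len(set(query.split()) & set(passage.split())))


# Illustrative data only.
TEXTS = {
    "d1": "dense retrieval with embeddings",
    "d2": "cooking pasta at home",
    "d3": "reranking retrieved passages with cross encoders",
    "d4": "gardening tips for spring",
}
DOC_VECS = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.8, 0.2], "d4": [0.1, 0.9]}

candidates = retrieve([1.0, 0.0], DOC_VECS, search_top_k=2)
final = rerank("reranking passages", candidates, TEXTS, word_overlap, rerank_top_k=2)
```

Here the first stage keeps d1 and d3 by vector similarity, and the second stage promotes d3 above d1 because its text matches the query more closely. The reranker never sees d2 or d4, which is where the efficiency gain comes from.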
