Principle:FlagOpen FlagEmbedding Evaluation Model Loading
| Sources | Repo: FlagOpen/FlagEmbedding |
|---|---|
| Domains | NLP, Information_Retrieval, Evaluation |
Overview
A coordinated loading pattern that instantiates both an embedding model and an optional reranker model from evaluation arguments for benchmarking.
Description
For evaluation, the embedder and the optional reranker must be loaded together with consistent configuration. The AbsEvalRunner.get_models() static method orchestrates this loading process:
- Embedder loading: The embedding model is loaded via FlagAutoModel.from_finetuned(), which automatically detects the model architecture and instantiates the appropriate embedder class. The configuration parameters passed include the model path, model class, normalization settings, pooling method, precision (FP16), query instructions, device allocation, batch size, and max lengths.
- Reranker loading (optional): If reranker_name_or_path is provided in the model arguments, the reranker is loaded via FlagAutoReranker.from_finetuned(). This supports multiple reranker architectures, including encoder-only, decoder-only, layerwise, and lightweight variants. Additional reranker-specific parameters include the PEFT adapter path, BF16 precision, passage instructions, prompt, cutoff layers, and compression settings.
- Evaluation wrapping: The loaded models are wrapped in evaluation-specific classes. The embedder is wrapped in EvalDenseRetriever (which adds search_top_k and overwrite configuration), and the reranker is wrapped in EvalReranker (which adds rerank_top_k configuration).
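The loading steps above can be sketched as follows. This is a minimal, self-contained illustration and not FlagEmbedding's code. EvalModelArgs, LoadedModel, shared_config, load_embedder and load_reranker are hypothetical stand-ins for the real argument class and for FlagAutoModel.from_finetuned() / FlagAutoReranker.from_finetuned(). Only a subset of the configuration is modeled, and a single max_length shared by both stages is assumed for brevity. The sketch shows two things: shared settings (devices, batch size, max length) are read from one argument object and passed to both loaders, and the reranker is loaded only when reranker_name_or_path is set.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EvalModelArgs:
    # Hypothetical subset of the evaluation model arguments (not the real class).
    embedder_name_or_path: str
    embedder_model_class: Optional[str] = None
    normalize_embeddings: bool = True
    pooling_method: str = "cls"
    use_fp16: bool = True
    query_instruction: Optional[str] = None
    devices: Optional[List[str]] = None
    batch_size: int = 256
    max_length: int = 512
    reranker_name_or_path: Optional[str] = None
    reranker_model_class: Optional[str] = None
    use_bf16: bool = False


@dataclass
class LoadedModel:
    # Stand-in for the model object a from_finetuned() call would return.
    path: str
    config: Dict[str, Any] = field(default_factory=dict)


def shared_config(args: EvalModelArgs) -> Dict[str, Any]:
    # Settings both stages must agree on, read from a single argument object.
    return {"devices": args.devices, "batch_size": args.batch_size,
            "max_length": args.max_length}


def load_embedder(args: EvalModelArgs) -> LoadedModel:
    # Hypothetical stand-in for FlagAutoModel.from_finetuned(...).
    cfg = shared_config(args)
    cfg.update(model_class=args.embedder_model_class,
               normalize_embeddings=args.normalize_embeddings,
               pooling_method=args.pooling_method,
               use_fp16=args.use_fp16,
               query_instruction=args.query_instruction)
    return LoadedModel(args.embedder_name_or_path, cfg)


def load_reranker(args: EvalModelArgs) -> LoadedModel:
    # Hypothetical stand-in for FlagAutoReranker.from_finetuned(...).
    cfg = shared_config(args)
    cfg.update(model_class=args.reranker_model_class, use_bf16=args.use_bf16)
    return LoadedModel(args.reranker_name_or_path, cfg)


def get_models(args: EvalModelArgs) -> Tuple[LoadedModel, Optional[LoadedModel]]:
    """Load the embedder, and load the reranker only if a reranker path is set."""
    embedder = load_embedder(args)
    reranker = load_reranker(args) if args.reranker_name_or_path else None
    return embedder, reranker


# Usage: with a reranker path, both models receive the same devices and lengths.
args = EvalModelArgs("path/to/embedder", devices=["cuda:0"],
                     reranker_name_or_path="path/to/reranker")
embedder, reranker = get_models(args)
```

Keeping the shared settings in one helper is what makes the two loaders consistent by construction. Each loader then adds only its stage-specific options on top.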
The loading supports four embedder model classes (encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl) and four reranker model classes (encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight).
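The two lists of supported model classes above can be expressed as a simple lookup for checking arguments before loading. This is a hedged sketch. The frozensets restate the class names given in this section, and validate_model_class is a hypothetical helper. The treatment of an unset class as "let the auto-loader detect the architecture" is an assumption modeled on the auto-detection described earlier, not a confirmed code path.

```python
from typing import Optional

# Model classes named in this section.
EMBEDDER_CLASSES = frozenset({
    "encoder-only-base", "encoder-only-m3",
    "decoder-only-base", "decoder-only-icl",
})
RERANKER_CLASSES = frozenset({
    "encoder-only-base", "decoder-only-base",
    "decoder-only-layerwise", "decoder-only-lightweight",
})


def validate_model_class(model_class: Optional[str], kind: str) -> Optional[str]:
    """Return model_class if it is allowed for `kind`.

    None is passed through unchanged, on the assumption that it means
    "let the auto-loader detect the architecture".
    """
    allowed = {"embedder": EMBEDDER_CLASSES, "reranker": RERANKER_CLASSES}[kind]
    if model_class is None:
        return None
    if model_class not in allowed:
        raise ValueError(
            f"unsupported {kind} model_class {model_class!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return model_class
```

The same string can be valid in one role and invalid in the other. For example, encoder-only-m3 is accepted for an embedder but rejected for a reranker.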
Usage
This pattern runs at the start of every evaluation run. The AbsEvalRunner.__init__ method calls load_retriever_and_reranker(), which internally invokes get_models() and wraps the results in the evaluation classes.
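The call chain described above (constructor, then load_retriever_and_reranker(), then get_models(), then wrapping) can be sketched as a small class. SketchEvalRunner, SketchEvalArgs, SketchModelArgs, DenseRetrieverWrapper and RerankerWrapper are hypothetical, simplified stand-ins for AbsEvalRunner, its argument classes, EvalDenseRetriever and EvalReranker. get_models() is stubbed to return strings instead of loading models. The defaults of 1000 and 100 are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class SketchEvalArgs:
    # Hypothetical evaluation arguments; defaults are illustrative only.
    search_top_k: int = 1000
    rerank_top_k: int = 100
    overwrite: bool = False


@dataclass
class SketchModelArgs:
    # Hypothetical model arguments; only the fields this sketch needs.
    embedder_name_or_path: str
    reranker_name_or_path: Optional[str] = None


@dataclass
class DenseRetrieverWrapper:
    # Plays the role of EvalDenseRetriever: base embedder plus search settings.
    embedder: Any
    search_top_k: int
    overwrite: bool


@dataclass
class RerankerWrapper:
    # Plays the role of EvalReranker: base reranker plus rerank depth.
    reranker: Any
    rerank_top_k: int


class SketchEvalRunner:
    def __init__(self, eval_args: SketchEvalArgs, model_args: SketchModelArgs):
        self.eval_args = eval_args
        self.model_args = model_args
        # Models are loaded and wrapped once, when the runner is constructed.
        self.retriever, self.reranker = self.load_retriever_and_reranker()

    @staticmethod
    def get_models(model_args: SketchModelArgs) -> Tuple[Any, Optional[Any]]:
        # Stub: returns labeled strings in place of real model objects.
        embedder = f"embedder@{model_args.embedder_name_or_path}"
        reranker = None
        if model_args.reranker_name_or_path is not None:
            reranker = f"reranker@{model_args.reranker_name_or_path}"
        return embedder, reranker

    def load_retriever_and_reranker(
        self,
    ) -> Tuple[DenseRetrieverWrapper, Optional[RerankerWrapper]]:
        embedder, reranker = self.get_models(self.model_args)
        retriever = DenseRetrieverWrapper(
            embedder, self.eval_args.search_top_k, self.eval_args.overwrite
        )
        # The reranker wrapper exists only if a reranker was actually loaded.
        wrapped_reranker = None
        if reranker is not None:
            wrapped_reranker = RerankerWrapper(reranker, self.eval_args.rerank_top_k)
        return retriever, wrapped_reranker
```

Keeping get_models() static separates pure model loading from the evaluation-specific wrapping, which depends on the runner's own evaluation arguments.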
Theoretical Basis
Two-stage retrieval (retrieve then rerank) is a widely adopted pattern in information retrieval that balances efficiency and effectiveness. The first stage uses a fast dense retriever to narrow down the candidate set from millions of documents to a manageable top-k (typically 1000). The second stage applies a more expensive cross-encoder reranker to re-score only the top candidates (typically 100). This requires coordinated model loading to ensure both models are configured consistently (same devices, compatible precision, aligned max lengths). The evaluation wrappers add the search_top_k and rerank_top_k configuration on top of the base models, enforcing the two-stage pipeline structure.
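The retrieve-then-rerank structure can be illustrated with a toy, dependency-free sketch. In stage 1, a cheap cosine similarity over normalized embeddings keeps search_top_k candidates. In stage 2, a pairwise scorer re-scores only the first rerank_top_k of them. The vectors, texts and overlap_score are invented for illustration. In particular, overlap_score is a hypothetical word-overlap stand-in for a cross-encoder, not a real reranking model. In this sketch the reranked output contains only the re-scored head.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def normalize(vec: Sequence[float]) -> List[float]:
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dense_search(query_vec: Sequence[float],
                 corpus_vecs: Dict[str, Sequence[float]],
                 search_top_k: int) -> Ranked:
    # Stage 1: cheap similarity against every document, keep search_top_k.
    q = normalize(query_vec)
    scored = [(doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
              for doc_id, vec in corpus_vecs.items()]
    return heapq.nlargest(search_top_k, scored, key=lambda item: item[1])


def rerank(query: str, candidates: Ranked, texts: Dict[str, str],
           score_pair: Callable[[str, str], float], rerank_top_k: int) -> Ranked:
    # Stage 2: the expensive pairwise scorer only sees the top rerank_top_k.
    head = candidates[:rerank_top_k]
    rescored = [(doc_id, score_pair(query, texts[doc_id])) for doc_id, _ in head]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def overlap_score(query: str, passage: str) -> float:
    # Hypothetical stand-in for a cross-encoder: Jaccard overlap of word sets.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0


# Toy 2-d "embeddings" and passage texts, invented for illustration.
corpus_vecs = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0], "d4": [-1.0, 0.0]}
texts = {"d1": "dense retrieval basics", "d2": "reranking with cross encoders",
         "d3": "cooking pasta", "d4": "gardening tips"}

candidates = dense_search([1.0, 0.2], corpus_vecs, search_top_k=3)
final = rerank("cross encoders for reranking", candidates, texts,
               overlap_score, rerank_top_k=2)
print([doc_id for doc_id, _ in candidates])  # ['d1', 'd2', 'd3']
print([doc_id for doc_id, _ in final])       # ['d2', 'd1']
```

Because rerank_top_k is smaller than search_top_k, the second-stage scorer runs on only two of the three retrieved documents. Its scores can still reorder them, which is why d2 overtakes d1 in the final ranking.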