Implementation:FlagOpen FlagEmbedding FlagDRESModel
| Knowledge Sources | |
|---|---|
| Domains | Dense Retrieval, Embedding, C-MTEB |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A dense retrieval model wrapper for evaluation on the Chinese Massive Text Embedding Benchmark (C-MTEB).
Description
The FlagDRESModel class provides a standardized interface for dense retrieval models compatible with the C-MTEB evaluation framework. It encodes queries with an optional instruction prefix for retrieval tasks, encodes corpus documents supplied as either dictionaries or plain strings, pools token representations with either the CLS token or mean pooling, and optionally normalizes embeddings so that inner products equal cosine similarities. When several GPUs are available, encoding is parallelized across them automatically. The class is designed to work with Hugging Face Transformers models and supports both CUDA and NPU devices.
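The input handling described above can be illustrated with a minimal sketch. The helper names `prepare_queries` and `prepare_corpus` are hypothetical and are not part of FlagDRESModel. The sketch also assumes that dictionary documents carry a `text` field plus an optional `title`, joined with a single space; the page does not specify the exact joining rule.

```python
from typing import Dict, List, Optional, Union


def prepare_queries(queries: List[str], instruction: Optional[str] = None) -> List[str]:
    """Prepend the retrieval instruction to every query, if one is set."""
    if instruction is None:
        return list(queries)
    return [f"{instruction}{query}" for query in queries]


def prepare_corpus(corpus: List[Union[Dict[str, str], str]]) -> List[str]:
    """Flatten documents to plain strings before encoding.

    Assumption: dict documents have a "text" field and an optional "title".
    """
    texts = []
    for doc in corpus:
        if isinstance(doc, dict):
            # Join title and text; strip() drops the leading space when no title exists.
            texts.append(f"{doc.get('title', '')} {doc['text']}".strip())
        else:
            texts.append(doc)
    return texts


query_texts = prepare_queries(
    ["什么是机器学习?"],
    instruction="为这个句子生成表示以用于检索相关文章:",
)
corpus_texts = prepare_corpus(
    [
        {"title": "机器学习", "text": "机器学习是人工智能的一个分支"},
        "深度学习是机器学习的一个子集",
    ]
)
print(query_texts)
print(corpus_texts)
```

Only queries receive the instruction; corpus documents are encoded without it, which matches the asymmetric setup the instruction-augmented retrieval use case implies.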
Usage
Use this class when evaluating Chinese text embedding models on the C-MTEB benchmark, implementing dense retrieval systems with instruction-augmented queries, or wrapping custom models for compatibility with MTEB evaluation protocols. The class is specifically designed for the C-MTEB research project.
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/C_MTEB/flag_dres_model.py
- Lines: 1-97
Signature
from typing import Dict, List, Union

import numpy as np


class FlagDRESModel:
    def __init__(
        self,
        model_name_or_path: str = None,
        pooling_method: str = 'cls',
        normalize_embeddings: bool = True,
        query_instruction_for_retrieval: str = None,
        batch_size: int = 256,
    ) -> None:
        pass

    def encode_queries(self, queries: List[str], **kwargs) -> np.ndarray:
        """Encode queries with optional instruction"""

    def encode_corpus(self, corpus: List[Union[Dict[str, str], str]], **kwargs) -> np.ndarray:
        """Encode corpus documents"""

    def encode(self, sentences: List[str], **kwargs) -> np.ndarray:
        """Core encoding function"""
Import
from flag_dres_model import FlagDRESModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Hugging Face model name or local path (the signature default is None, but a value is needed to load a model) |
| pooling_method | str | No | "cls" or "mean" (default: "cls") |
| normalize_embeddings | bool | No | Whether to normalize embeddings (default: True) |
| query_instruction_for_retrieval | str | No | Instruction prefix for queries |
| batch_size | int | No | Batch size for encoding (default: 256) |
| queries | List[str] | Yes | List of query strings |
| corpus | List[Union[Dict, str]] | Yes | List of documents (dicts or strings) |
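To make the `pooling_method` and `normalize_embeddings` options concrete, here is a minimal NumPy sketch of the two pooling strategies and of L2 normalization. It is an illustration rather than the class's actual code: the function names `pool` and `l2_normalize` are hypothetical, and the sketch assumes that mean pooling averages only over non-padding tokens as indicated by the attention mask.

```python
import numpy as np


def pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray, method: str = "cls") -> np.ndarray:
    """Reduce token vectors of shape (batch, seq_len, dim) to (batch, dim)."""
    if method == "cls":
        # The first token of each sequence is the [CLS] token.
        return last_hidden_state[:, 0]
    if method == "mean":
        # Zero out padding positions, then divide by the number of real tokens.
        mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
        summed = (last_hidden_state * mask).sum(axis=1)
        counts = mask.sum(axis=1)
        return summed / counts
    raise ValueError(f"unknown pooling method: {method}")


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarities."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


# One sentence, three tokens, dimension 2; the last token is padding.
hidden = np.array([[[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])

cls_vec = pool(hidden, mask, "cls")
mean_vec = pool(hidden, mask, "mean")
unit_vec = l2_normalize(mean_vec)
print(cls_vec, mean_vec, unit_vec)
```

In this toy input the padding token is excluded from the mean, so the padded vector `[9.0, 9.0]` has no influence on the pooled result.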
Outputs
| Name | Type | Description |
|---|---|---|
| embeddings | np.ndarray | Encoded embeddings (N, D) where D is embedding dimension |
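The (N, D) output shape follows from encoding the input in chunks of `batch_size` and stacking the per-batch results. The sketch below shows that pattern with a toy encoder; `encode_in_batches` and `toy_encode_batch` are hypothetical names, and the real class runs a Transformers model where the toy function appears. Empty input is not handled here.

```python
from typing import Callable, List

import numpy as np


def encode_in_batches(
    sentences: List[str],
    encode_batch: Callable[[List[str]], np.ndarray],
    batch_size: int = 256,
) -> np.ndarray:
    """Encode sentences chunk by chunk and stack the results into one (N, D) array."""
    chunks = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        chunks.append(encode_batch(batch))
    return np.concatenate(chunks, axis=0)


def toy_encode_batch(batch: List[str]) -> np.ndarray:
    """Hypothetical stand-in for a model: maps each string to a 2-dimensional vector."""
    return np.array([[len(s), 1.0] for s in batch], dtype=np.float32)


embeddings = encode_in_batches(
    ["a", "bb", "ccc", "dddd", "eeeee"],
    toy_encode_batch,
    batch_size=2,
)
print(embeddings.shape)
```

Row order in the result matches input order, which is what lets callers map each embedding back to its query or document.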
Usage Examples
# Example 1: Basic usage with Chinese model
from flag_dres_model import FlagDRESModel
model = FlagDRESModel(
    model_name_or_path="BAAI/bge-base-zh-v1.5",
    pooling_method="cls",
    normalize_embeddings=True,
    batch_size=256,
)
# Encode queries
# "What is machine learning?", "Applications of deep learning"
queries = ["什么是机器学习?", "深度学习的应用"]
query_embeddings = model.encode_queries(queries)
print(query_embeddings.shape) # (2, 768)
# Encode corpus
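# Titles: "Machine learning", "Deep learning"
# Texts: "Machine learning is a branch of artificial intelligence...",
#        "Deep learning is a subset of machine learning..."
corpus = [
    {"title": "机器学习", "text": "机器学习是人工智能的一个分支..."},
    {"title": "深度学习", "text": "深度学习是机器学习的一个子集..."},
]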
corpus_embeddings = model.encode_corpus(corpus)
print(corpus_embeddings.shape) # (2, 768)
# Compute similarities (cosine similarities, since the embeddings are normalized)
similarities = query_embeddings @ corpus_embeddings.T
print(similarities)
# Example 2: With retrieval instruction
# Instruction: "Generate a representation for this sentence to retrieve related articles:"
model_with_inst = FlagDRESModel(
    model_name_or_path="BAAI/bge-large-zh-v1.5",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
    normalize_embeddings=True,
)
# "History of artificial intelligence", "Natural language processing"
queries = ["人工智能的历史", "自然语言处理"]
query_emb = model_with_inst.encode_queries(queries)
# Example 3: Mean pooling
model_mean = FlagDRESModel(
    model_name_or_path="BAAI/bge-base-zh-v1.5",
    pooling_method="mean",
    normalize_embeddings=True,
)
# "This is the first sentence", "This is the second sentence"
texts = ["这是第一个句子", "这是第二个句子"]
embeddings = model_mean.encode(texts)
print(embeddings.shape)
# Example 4: C-MTEB evaluation compatibility
# The model follows MTEB interface for automatic evaluation
import mteb
model = FlagDRESModel(model_name_or_path="BAAI/bge-base-zh-v1.5")
tasks = mteb.get_tasks(tasks=["T2Retrieval"]) # Chinese task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
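Example 1 stops at the raw similarity matrix. As a possible follow-up, the sketch below ranks corpus documents for each query by descending similarity. The helper `top_k` is hypothetical and not part of FlagDRESModel, and the small hand-written unit vectors stand in for real model outputs so the snippet runs without downloading a model.

```python
import numpy as np


def top_k(similarities: np.ndarray, k: int):
    """Return the indices and scores of the k most similar documents per query."""
    # Negate so that argsort, which sorts ascending, yields descending similarity.
    order = np.argsort(-similarities, axis=1)[:, :k]
    scores = np.take_along_axis(similarities, order, axis=1)
    return order, scores


# Toy unit-length embeddings standing in for encode_queries / encode_corpus output.
query_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
corpus_embeddings = np.array([[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]])

similarities = query_embeddings @ corpus_embeddings.T
indices, scores = top_k(similarities, k=2)
print(indices)
print(scores)
```

A full sort is fine at this scale; for large corpora a partial sort such as `np.argpartition` or an approximate nearest-neighbor index would usually be preferable.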