

Implementation:FlagOpen FlagEmbedding FlagDRESModel

From Leeroopedia


Knowledge Sources
Domains: Dense Retrieval, Embedding, C-MTEB
Last Updated: 2026-02-09 00:00 GMT

Overview

A dense retrieval model wrapper for evaluation on C-MTEB, the Chinese Massive Text Embedding Benchmark.

Description

The FlagDRESModel class wraps a Hugging Face Transformers encoder in the dense retrieval interface expected by the C-MTEB evaluation framework. It provides query encoding with an optional instruction prefix for retrieval tasks, corpus encoding that accepts documents as either dictionaries or plain strings, a choice of pooling method (the CLS token or mean pooling), optional L2 normalization of embeddings so that dot products equal cosine similarities, and multi-GPU support through automatic data parallelism. It runs on both CUDA and NPU devices.
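The pooling and normalization options can be illustrated without loading a model. The NumPy sketch below assumes token states shaped (batch, seq_len, hidden) and a 0/1 attention mask shaped (batch, seq_len), the layout Hugging Face encoders usually return; `pool` and `l2_normalize` are hypothetical helper names, not FlagDRESModel methods.

```python
import numpy as np


def pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray, method: str = "cls") -> np.ndarray:
    """Reduce token states (batch, seq_len, hidden) to one vector per text."""
    if method == "cls":
        # CLS pooling: take the state of the first token in each sequence.
        return last_hidden_state[:, 0]
    if method == "mean":
        # Mean pooling: average only the non-padding tokens.
        mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
        summed = (last_hidden_state * mask).sum(axis=1)
        counts = mask.sum(axis=1)
        return summed / counts
    raise ValueError(f"unknown pooling method: {method!r}")


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


# Toy batch: 2 texts, 3 token positions, hidden size 2; the second text ends in one padding token.
hidden = np.array([
    [[1.0, 0.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 2.0], [2.0, 2.0], [9.0, 9.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

cls_vecs = pool(hidden, mask, "cls")    # rows [1, 0] and [0, 2]
mean_vecs = pool(hidden, mask, "mean")  # rows [3, 10/3] and [1, 2]; the padded [9, 9] is ignored
print(l2_normalize(cls_vecs))           # rows become [1, 0] and [0, 1]
```

Mean pooling has to exclude padding positions, which is why the mask enters both the sum and the count; CLS pooling ignores the mask entirely.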

Usage

Use this class to evaluate Chinese text embedding models on the C-MTEB benchmark, to build dense retrieval systems with instruction-augmented queries, or to wrap a custom model so that it follows the MTEB evaluation protocol. It was written specifically for the C-MTEB research project.

Code Reference

Source Location

Signature

from typing import Dict, List, Union

import numpy as np


class FlagDRESModel:
    def __init__(
        self,
        model_name_or_path: str = None,
        pooling_method: str = 'cls',
        normalize_embeddings: bool = True,
        query_instruction_for_retrieval: str = None,
        batch_size: int = 256,
    ) -> None:
        pass

    def encode_queries(self, queries: List[str], **kwargs) -> np.ndarray:
        """Encode queries, prepending the retrieval instruction if one is set"""

    def encode_corpus(self, corpus: List[Union[Dict[str, str], str]], **kwargs) -> np.ndarray:
        """Encode corpus documents given as dicts or plain strings"""

    def encode(self, sentences: List[str], **kwargs) -> np.ndarray:
        """Core encoding function"""
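The `batch_size` argument bounds how many texts go through the model at once. A sketch of that batching pattern, with a hypothetical `embed_batch` callable standing in for tokenization, the forward pass, pooling and normalization; it illustrates the pattern only and is not FlagDRESModel's actual `encode`.

```python
from typing import Callable, List

import numpy as np


def encode_in_batches(
    sentences: List[str],
    embed_batch: Callable[[List[str]], np.ndarray],
    batch_size: int = 256,
) -> np.ndarray:
    """Embed `sentences` in chunks of `batch_size` and stack the results as (N, D)."""
    chunks = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # Each call returns an array of shape (len(batch), D).
        chunks.append(embed_batch(batch))
    return np.concatenate(chunks, axis=0)


# Hypothetical stand-in encoder: a 2-d "embedding" of (character count, 1.0).
def fake_embed(batch: List[str]) -> np.ndarray:
    return np.array([[float(len(s)), 1.0] for s in batch])


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
out = encode_in_batches(texts, fake_embed, batch_size=2)  # 3 calls with sizes 2, 2, 1
print(out.shape)  # (5, 2)
```

Rows come back in input order, so row i of the result always belongs to `sentences[i]`. An empty input list would make `np.concatenate` raise, so a production version would need to guard that case.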

Import

from flag_dres_model import FlagDRESModel

I/O Contract

Inputs

Name | Type | Required | Description
model_name_or_path | str | Yes | Hugging Face model name or local path
pooling_method | str | No | "cls" or "mean" (default: "cls")
normalize_embeddings | bool | No | Whether to L2-normalize embeddings (default: True)
query_instruction_for_retrieval | str | No | Instruction prefixed to each query (default: None)
batch_size | int | No | Number of texts encoded per batch (default: 256)
queries | List[str] | Yes | Query strings passed to encode_queries
corpus | List[Union[Dict, str]] | Yes | Documents passed to encode_corpus, as dicts or strings
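The inputs above imply an asymmetry: the instruction applies to queries, while documents are encoded as given. The sketch below shows one way both input types could be turned into the strings that are actually encoded. The helper names `prepare_queries` and `prepare_corpus`, the verbatim prefixing with no separator, and the "title text" joining rule for dictionary documents are assumptions for illustration, not confirmed FlagDRESModel behaviour.

```python
from typing import Dict, List, Optional, Union


def prepare_queries(queries: List[str], instruction: Optional[str] = None) -> List[str]:
    """Prefix every query with the retrieval instruction, if one is configured."""
    if instruction is None:
        return list(queries)
    # Assumed: the instruction is prepended verbatim, with no separator added.
    return [f"{instruction}{q}" for q in queries]


def prepare_corpus(corpus: List[Union[Dict[str, str], str]]) -> List[str]:
    """Flatten documents to strings: dicts become "title text", strings pass through."""
    texts = []
    for doc in corpus:
        if isinstance(doc, dict):
            # Assumed: a missing title becomes "", and strip() drops the leftover space.
            texts.append(f"{doc.get('title', '')} {doc['text']}".strip())
        else:
            texts.append(doc)
    return texts


print(prepare_queries(["what is ML?"], instruction="Query: "))
# ['Query: what is ML?']
print(prepare_corpus([{"title": "ML", "text": "A branch of AI."}, {"text": "Untitled."}, "Plain string."]))
# ['ML A branch of AI.', 'Untitled.', 'Plain string.']
```

Checking each element, rather than only the first, keeps the sketch correct for lists that mix both document formats.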

Outputs

Name | Type | Description
embeddings | np.ndarray | Array of shape (N, D): one row per input text, where D is the embedding dimension
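Because query and corpus embeddings share the dimension D, they can be compared with a single matrix product, and ranking documents for each query is then a per-row sort. A NumPy sketch, assuming normalize_embeddings=True so the dot product equals cosine similarity; the `top_k` helper and the toy vectors are hypothetical stand-ins for real model output.

```python
import numpy as np


def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 1):
    """Return (indices, scores) of the k most similar documents for each query."""
    scores = query_emb @ corpus_emb.T  # (num_queries, num_docs)
    k = min(k, scores.shape[1])
    # argsort on negated scores gives descending order; keep the first k columns.
    idx = np.argsort(-scores, axis=1)[:, :k]
    return idx, np.take_along_axis(scores, idx, axis=1)


# Toy unit vectors standing in for normalized query and corpus embeddings.
toy_queries = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_corpus = np.array([[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]])

idx, vals = top_k(toy_queries, toy_corpus, k=2)
print(idx)   # rows [2, 1] and [0, 1]
print(vals)  # rows [1.0, 0.6] and [1.0, 0.8]
```

For large corpora, `np.argpartition` avoids fully sorting every row, at the cost of returning the top k in arbitrary order.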

Usage Examples

# Example 1: Basic usage with Chinese model
from flag_dres_model import FlagDRESModel

model = FlagDRESModel(
    model_name_or_path="BAAI/bge-base-zh-v1.5",
    pooling_method="cls",
    normalize_embeddings=True,
    batch_size=256
)

# Encode queries
queries = ["什么是机器学习?", "深度学习的应用"]  # "What is machine learning?", "Applications of deep learning"
query_embeddings = model.encode_queries(queries)
print(query_embeddings.shape)  # (2, 768)

# Encode corpus
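# Documents may be dicts with "title" and "text" keys, or plain strings
corpus = [
    {"title": "机器学习", "text": "机器学习是人工智能的一个分支..."},  # "Machine learning": "Machine learning is a branch of artificial intelligence..."
    {"title": "深度学习", "text": "深度学习是机器学习的一个子集..."}  # "Deep learning": "Deep learning is a subset of machine learning..."
]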
corpus_embeddings = model.encode_corpus(corpus)
print(corpus_embeddings.shape)  # (2, 768)

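# Embeddings are normalized, so the dot product equals cosine similarity
similarities = query_embeddings @ corpus_embeddings.T
print(similarities)  # (2, 2) matrix: rows are queries, columns are documents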

# Example 2: With retrieval instruction
model_with_inst = FlagDRESModel(
    model_name_or_path="BAAI/bge-large-zh-v1.5",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",  # "Generate a representation for this sentence to retrieve related articles:"
    normalize_embeddings=True
)

queries = ["人工智能的历史", "自然语言处理"]  # "History of artificial intelligence", "Natural language processing"
query_emb = model_with_inst.encode_queries(queries)

# Example 3: Mean pooling
model_mean = FlagDRESModel(
    model_name_or_path="BAAI/bge-base-zh-v1.5",
    pooling_method="mean",
    normalize_embeddings=True
)

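texts = ["这是第一个句子", "这是第二个句子"]  # "This is the first sentence", "This is the second sentence"
embeddings = model_mean.encode(texts)  # encode() applies no query instruction
print(embeddings.shape)  # (2, 768)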

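# Example 4: C-MTEB evaluation compatibility
# The model exposes the encode methods that MTEB calls during evaluation.
# get_tasks / MTEB / run follow the mteb 1.x API; newer mteb releases may differ.
import mteb
from flag_dres_model import FlagDRESModel

model = FlagDRESModel(model_name_or_path="BAAI/bge-base-zh-v1.5")
tasks = mteb.get_tasks(tasks=["T2Retrieval"])  # C-MTEB Chinese passage retrieval task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)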
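The evaluation relies on duck typing: the evaluator calls encode methods on whatever object it is given. A sketch of that contract as a `typing.Protocol`, assuming the three methods from the signature above are the ones an evaluator needs (the exact set depends on the mteb version). `RetrievalEncoder` and `ToyModel` are hypothetical names; the toy encoder uses letter counts purely so the example runs without model weights.

```python
from typing import Dict, List, Protocol, Union, runtime_checkable

import numpy as np


@runtime_checkable
class RetrievalEncoder(Protocol):
    """Hypothetical protocol mirroring the three methods in the signature above."""

    def encode_queries(self, queries: List[str], **kwargs) -> np.ndarray: ...

    def encode_corpus(self, corpus: List[Union[Dict[str, str], str]], **kwargs) -> np.ndarray: ...

    def encode(self, sentences: List[str], **kwargs) -> np.ndarray: ...


class ToyModel:
    """Hypothetical stand-in encoder: L2-normalized letter-count vectors over 'a'..'z'."""

    def encode(self, sentences: List[str], **kwargs) -> np.ndarray:
        vecs = np.zeros((len(sentences), 26))
        for row, text in enumerate(sentences):
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vecs[row, ord(ch) - ord("a")] += 1.0
        norms = np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs / np.clip(norms, 1e-12, None)

    def encode_queries(self, queries: List[str], **kwargs) -> np.ndarray:
        return self.encode(queries)

    def encode_corpus(self, corpus: List[Union[Dict[str, str], str]], **kwargs) -> np.ndarray:
        texts = [d["text"] if isinstance(d, dict) else d for d in corpus]
        return self.encode(texts)


toy = ToyModel()
print(isinstance(toy, RetrievalEncoder))  # True
scores = toy.encode_queries(["abc"]) @ toy.encode_corpus(["abc", {"text": "xyz"}]).T
print(scores)  # one row: 1.0 for "abc", 0.0 for "xyz"
```

`isinstance` against a `runtime_checkable` protocol only checks that the methods exist, not their signatures or return shapes, so it is a smoke test rather than a guarantee that an evaluator will accept the object.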
