Implementation:FlagOpen FlagEmbedding BGE Coder CoIR Eval

Knowledge Sources	FlagOpen_FlagEmbedding
Domains	Code Retrieval, Benchmark Evaluation, Information Retrieval
Last Updated	2026-02-09 00:00 GMT

Overview

Evaluation script for running CoIR (Code Information Retrieval) benchmark tasks with FlagEmbedding models.

Description

This module evaluates code retrieval models on the CoIR benchmark. It supports encoder-only and decoder-only embedding models through the FlagModel and FlagLLMModel wrappers, encodes queries and corpus entries with optional task-specific instructions, and computes NDCG@10 on each task, including the CodeSearchNet variants. Scores for the language-specific CodeSearchNet tasks are aggregated, and per-task and overall results are saved as JSON.
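To make the reported metric concrete, here is a minimal sketch of NDCG@10 for a single query. It is not the script's own code: metric computation is presumably delegated to the benchmark library. The function name `ndcg_at_k` and the linear-gain formulation are assumptions made for illustration.

```python
import math


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """Compute NDCG@k for one query (illustrative, linear gain assumed).

    ranked_doc_ids: document ids in the order the model retrieved them.
    relevance: dict mapping doc id -> graded relevance; missing ids count as 0.
    """
    def dcg(gains):
        # Rank positions are 1-based, so the discount is log2(position + 1).
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = dcg(ideal_gains)
    # A query with no relevant documents contributes 0 by convention here.
    return dcg(gains) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # The single relevant document is retrieved at rank 2.
    print(ndcg_at_k(["doc_b", "doc_a"], {"doc_a": 1}))
```

A benchmark score is then the mean of this per-query value over all queries in a task.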

Usage

Use this script to evaluate embedding models on the CoIR code retrieval benchmark, to compare encoder-only and decoder-only architectures on code search tasks, and to generate evaluation reports with NDCG metrics. It was written for the BGE-Coder model evaluation pipeline.

Code Reference

Source Location

Repository: FlagOpen_FlagEmbedding
File: research/BGE_Coder/evaluation/coir_eval/main.py
Lines: 1-167

Signature

def get_model(model_args: COIREvalModelArgs):
    """Initialize FlagModel or FlagLLMModel based on configuration"""

def main(
    eval_args: COIREvalArgs,
    model_args: COIREvalModelArgs
):
    """Run CoIR evaluation on specified tasks"""

class CustomFlagModel:
    """Wrapper for FlagModel with dict-based input handling"""
    def encode_queries(self, queries, show_progress_bar, convert_to_tensor, **kwargs):
        pass
    def encode_corpus(self, corpus, show_progress_bar, convert_to_tensor, **kwargs):
        pass
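
The "dict-based input handling" mentioned in the wrapper's docstring can be sketched as follows. This is an assumption-laden illustration, not the file's actual code: it assumes BEIR-style entries with `title` and `text` keys, and `StubEmbedder`, `to_text`, and `DictAwareWrapper` are hypothetical names. The stub stands in for a FlagModel-like object so the example runs without downloading a model.

```python
class StubEmbedder:
    """Hypothetical stand-in for an embedder; returns string lengths as fake embeddings."""

    def encode_queries(self, texts, **kwargs):
        return [len(t) for t in texts]

    def encode_corpus(self, texts, **kwargs):
        return [len(t) for t in texts]


def to_text(item):
    """Flatten a dict entry (assumed keys: 'title', 'text') into one string."""
    if isinstance(item, dict):
        return (item.get("title", "") + " " + item.get("text", "")).strip()
    return item  # already a plain string


class DictAwareWrapper:
    """Accepts strings or dicts and forwards plain strings to the wrapped model."""

    def __init__(self, model):
        self.model = model

    def encode_queries(self, queries, show_progress_bar=False,
                       convert_to_tensor=False, **kwargs):
        # The two flags are accepted for interface compatibility and ignored here.
        return self.model.encode_queries([to_text(q) for q in queries], **kwargs)

    def encode_corpus(self, corpus, show_progress_bar=False,
                      convert_to_tensor=False, **kwargs):
        return self.model.encode_corpus([to_text(d) for d in corpus], **kwargs)


if __name__ == "__main__":
    wrapper = DictAwareWrapper(StubEmbedder())
    print(wrapper.encode_corpus([{"title": "", "text": "def f(): pass"}]))
```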

Import

from main import get_model, main

I/O Contract

Inputs

Name	Type	Required	Description
embedder_name_or_path	str	Yes	Model name or path for embedding
embedder_model_class	str	Yes	"encoder-only-base" or "decoder-only-base"
tasks	List[str] or str	Yes	CoIR task name(s) to evaluate
output_dir	str	Yes	Directory to save evaluation results
use_special_instructions	bool	No	Use task-specific instructions
embedder_batch_size	int	No	Batch size for encoding
normalize_embeddings	bool	No	Whether to normalize embeddings

Outputs

Name	Type	Description
task_results	dict	Per-task NDCG@10 scores saved as JSON files
overall_results	dict	Aggregated results including CodeSearchNet averages
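
As a rough illustration of how the overall results might be assembled, the sketch below averages the scores of tasks sharing a `CodeSearchNet-` prefix and writes the result to JSON. The grouping rule, the `CodeSearchNet-avg` key, the helper names, and the example scores are all hypothetical; only the `OVERALL-results.json` file name comes from this page.

```python
import json
import os
from statistics import mean


def aggregate_results(task_scores, group_prefix="CodeSearchNet-"):
    """Return a copy of task_scores plus the mean over tasks with the given prefix."""
    overall = dict(task_scores)
    group = [score for name, score in task_scores.items()
             if name.startswith(group_prefix)]
    if group:
        # Hypothetical key name for the averaged entry.
        overall[group_prefix.rstrip("-") + "-avg"] = mean(group)
    return overall


def save_results(overall, output_dir, filename="OVERALL-results.json"):
    """Write the aggregated scores as pretty-printed JSON and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(overall, f, indent=2)
    return path


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    scores = {"CodeSearchNet-python": 0.8, "CodeSearchNet-java": 0.6, "apps": 0.3}
    print(save_results(aggregate_results(scores), "./eval_results_demo"))
```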

Usage Examples

# Example: Run CoIR evaluation from command line
# python main.py \
#   --embedder_name_or_path BAAI/bge-base-en-v1.5 \
#   --embedder_model_class encoder-only-base \
#   --tasks CodeSearchNet-python CodeSearchNet-java \
#   --output_dir ./results \
#   --embedder_batch_size 256 \
#   --use_special_instructions

# Example: Programmatic usage (run from the script's directory so that
# `arguments` and `main` are importable)
from arguments import COIREvalArgs, COIREvalModelArgs
from main import main

# Setup arguments
eval_args = COIREvalArgs(
    tasks=["CodeSearchNet-python"],
    output_dir="./eval_results",
    use_special_instructions=True
)

model_args = COIREvalModelArgs(
    embedder_name_or_path="BAAI/bge-base-en-v1.5",
    embedder_model_class="encoder-only-base",
    embedder_batch_size=256,
    normalize_embeddings=True
)

# Run evaluation
main(eval_args, model_args)

# Results will be saved to:
# - ./eval_results/bge-base-en-v1.5/CodeSearchNet-python.json
# - ./eval_results/bge-base-en-v1.5/OVERALL-results.json
