Implementation:FlagOpen FlagEmbedding BGE Coder CoIR Eval


Knowledge Sources
Domains Code Retrieval, Benchmark Evaluation, Information Retrieval
Last Updated 2026-02-09 00:00 GMT

Overview

Evaluation script for running CoIR (Code Information Retrieval) benchmark tasks with FlagEmbedding models.

Description

This module evaluates code retrieval models on the CoIR benchmark. It supports both encoder-only and decoder-only embedding models through the FlagModel and FlagLLMModel wrappers, encodes queries and corpus entries through a custom wrapper with special instruction handling, and computes NDCG@10 across multiple code retrieval tasks, including the CodeSearchNet variants. Task-specific instructions can be applied to improve retrieval performance. Results for the language-specific CodeSearchNet tasks are aggregated, and detailed per-task and overall results are saved in JSON format.
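As a reminder of what the reported metric measures, the following is a minimal sketch of NDCG@10 for a single query. It is not taken from the script, which may delegate scoring to the benchmark library; the linear gain (relevance grade divided by a log2 rank discount) and the helper names `dcg_at_k` and `ndcg_at_k` are assumptions made for illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades (linear gain)."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids ordered by descending model score.
    qrels: dict mapping document id -> relevance grade (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


# The relevant document ranked first versus ranked second.
perfect = ndcg_at_k(["d1", "d2", "d3"], {"d1": 1})
second = ndcg_at_k(["d2", "d1", "d3"], {"d1": 1})
```

A benchmark-level score would then be the mean of this value over all queries of a task.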

Usage

Use this script to evaluate embedding models on the CoIR code retrieval benchmark, to compare model architectures (encoder-only vs. decoder-only) on code search tasks, or to generate evaluation reports with NDCG metrics. It is designed for the BGE-Coder model evaluation pipeline.

Code Reference

Source Location

Signature

def get_model(model_args: COIREvalModelArgs):
    """Initialize FlagModel or FlagLLMModel based on configuration"""

def main(
    eval_args: COIREvalArgs,
    model_args: COIREvalModelArgs
):
    """Run CoIR evaluation on specified tasks"""

class CustomFlagModel:
    """Wrapper for FlagModel with dict-based input handling"""
    def encode_queries(self, queries, show_progress_bar, convert_to_tensor, **kwargs):
        pass
    def encode_corpus(self, corpus, show_progress_bar, convert_to_tensor, **kwargs):
        pass
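The method bodies are elided in the signature above. The sketch below illustrates one way a wrapper with dict-based input handling could behave. It assumes that corpus entries arrive as dicts with an optional "title" and a "text" field and that queries are plain strings. The class `DictInputWrapper`, the toy `EchoEmbedder`, and the instruction string are hypothetical and are not part of the script.

```python
class DictInputWrapper:
    """Hypothetical stand-in that flattens records to strings before encoding."""

    def __init__(self, embedder, query_instruction=""):
        self.embedder = embedder  # any object exposing encode(list_of_str)
        self.query_instruction = query_instruction

    @staticmethod
    def _to_text(item):
        # Plain strings pass through; dict records are joined as "title text".
        if isinstance(item, str):
            return item.strip()
        title = item.get("title", "")
        text = item.get("text", "")
        return f"{title} {text}".strip()

    def encode_queries(self, queries, **kwargs):
        # Prepend the (possibly empty) instruction to every query.
        texts = [self.query_instruction + self._to_text(q) for q in queries]
        return self.embedder.encode(texts)

    def encode_corpus(self, corpus, **kwargs):
        return self.embedder.encode([self._to_text(doc) for doc in corpus])


class EchoEmbedder:
    """Toy embedder that returns its inputs so the flattening is visible."""

    def encode(self, texts):
        return list(texts)


wrapper = DictInputWrapper(EchoEmbedder(), query_instruction="Find code for: ")
corpus_out = wrapper.encode_corpus(
    [{"title": "sort", "text": "def sort(xs): ..."}, {"text": "print(1)"}]
)
query_out = wrapper.encode_queries(["reverse a list"])
```

With a real embedder, the `encode` call would return vectors rather than strings; the echo version only shows which text reaches the model.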

Import

from main import get_model, main

I/O Contract

Inputs

Name | Type | Required | Description
embedder_name_or_path | str | Yes | Model name or path for embedding
embedder_model_class | str | Yes | "encoder-only-base" or "decoder-only-base"
tasks | List[str] or str | Yes | CoIR task name(s) to evaluate
output_dir | str | Yes | Directory to save evaluation results
use_special_instructions | bool | No | Use task-specific instructions
embedder_batch_size | int | No | Batch size for encoding
normalize_embeddings | bool | No | Whether to normalize embeddings
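Two of these inputs accept more than one form: embedder_model_class selects between the two wrappers, and tasks may be a single string or a list. The sketch below shows how such values could be validated and normalized before evaluation. The helper names `resolve_wrapper_name` and `normalize_tasks` are hypothetical, and the mapping only restates the two values listed in the table.

```python
# Maps each accepted embedder_model_class value to the wrapper it selects.
SUPPORTED_MODEL_CLASSES = {
    "encoder-only-base": "FlagModel",
    "decoder-only-base": "FlagLLMModel",
}


def resolve_wrapper_name(embedder_model_class):
    """Return the wrapper name for a model class, rejecting unknown values."""
    try:
        return SUPPORTED_MODEL_CLASSES[embedder_model_class]
    except KeyError:
        allowed = ", ".join(sorted(SUPPORTED_MODEL_CLASSES))
        raise ValueError(
            f"Unknown embedder_model_class {embedder_model_class!r}; expected one of: {allowed}"
        ) from None


def normalize_tasks(tasks):
    """Accept a single task name or a list of names and always return a list."""
    if isinstance(tasks, str):
        return [tasks]
    return list(tasks)


wrapper_name = resolve_wrapper_name("decoder-only-base")
task_list = normalize_tasks("CodeSearchNet-python")
```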

Outputs

Name | Type | Description
task_results | dict | Per-task NDCG@10 scores saved as JSON files
overall_results | dict | Aggregated results including CodeSearchNet averages
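The aggregation behind overall_results can be pictured as averaging the language-specific CodeSearchNet scores into one entry. The sketch below is an assumption about that step, not the script's code: the "CodeSearchNet-" prefix match, the "CodeSearchNet" key for the average, the placeholder task name "other-task", and the function name `aggregate_results` are all chosen for illustration.

```python
from statistics import mean


def aggregate_results(task_scores, prefix="CodeSearchNet-", group_key="CodeSearchNet"):
    """Collapse tasks sharing a prefix into a single averaged entry.

    task_scores: dict mapping task name -> NDCG@10 score.
    Tasks without the prefix are copied through unchanged.
    """
    grouped = [score for name, score in task_scores.items() if name.startswith(prefix)]
    overall = {
        name: score for name, score in task_scores.items() if not name.startswith(prefix)
    }
    if grouped:
        overall[group_key] = mean(grouped)
    return overall


# Illustrative numbers only; these are not measured results.
example = aggregate_results(
    {"CodeSearchNet-python": 0.5, "CodeSearchNet-java": 0.75, "other-task": 0.25}
)
```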

Usage Examples

# Example: Run CoIR evaluation from command line
# python main.py \
#   --embedder_name_or_path BAAI/bge-base-en-v1.5 \
#   --embedder_model_class encoder-only-base \
#   --tasks CodeSearchNet-python CodeSearchNet-java \
#   --output_dir ./results \
#   --embedder_batch_size 256 \
#   --use_special_instructions

# Example: Programmatic usage
from arguments import COIREvalArgs, COIREvalModelArgs
from main import main

# Setup arguments
eval_args = COIREvalArgs(
    tasks=["CodeSearchNet-python"],
    output_dir="./eval_results",
    use_special_instructions=True
)

model_args = COIREvalModelArgs(
    embedder_name_or_path="BAAI/bge-base-en-v1.5",
    embedder_model_class="encoder-only-base",
    embedder_batch_size=256,
    normalize_embeddings=True
)

# Run evaluation
main(eval_args, model_args)

# Results will be saved to:
# - ./eval_results/bge-base-en-v1.5/CodeSearchNet-python.json
# - ./eval_results/bge-base-en-v1.5/OVERALL-results.json
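Once a run has finished, the saved files can be collected for inspection. The sketch below loads every JSON file in a model's output directory without assuming anything about the structure inside each file. The function name `load_results` is hypothetical, and the directory in the usage note simply mirrors the paths listed above.

```python
import json
from pathlib import Path


def load_results(model_output_dir):
    """Return {file stem: parsed JSON} for every .json file in the directory."""
    results = {}
    for path in sorted(Path(model_output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.stem] = json.load(handle)
    return results
```

For the run above, `load_results("./eval_results/bge-base-en-v1.5")` would return a dict keyed by file stem, such as "CodeSearchNet-python" and "OVERALL-results".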
