Principle:FlagOpen FlagEmbedding Multi Task Retrieval Embedder

Knowledge Sources	FlagOpen_FlagEmbedding
Domains	Machine Learning, Large Language Models, Multi-Task Learning, Information Retrieval
Last Updated	2026-02-09 00:00 GMT

Overview

Multi-task retrieval training for LLM embedders jointly optimizes a single model across diverse downstream tasks, including semantic search, question answering, and in-context example retrieval, with the goal of producing a universal embedding model.

Description

This principle addresses the challenge of building a single LLM-based embedding model that performs well across a wide variety of retrieval scenarios. The approach trains on a mixture of datasets covering different retrieval types: asymmetric search (queries vs. documents), symmetric similarity (text-to-text), QA pairs, code search, and in-context example selection. The training framework applies a different loss function to each task type: dense retrieval losses, language modeling objectives for retrieval-augmented generation, and a sentence representation learning objective (SRLM). The system also includes task-specific preprocessing, task-specific evaluation metrics (MRR, Recall@k, nDCG), and batch construction that balances the tasks against each other. Compared with single-task training, this multi-task setup is intended to produce more robust embeddings that generalize better to unseen domains.

Usage

Use this principle when:

- Building universal embedding models for production systems
- Training embedders that handle diverse retrieval scenarios
- Developing LLM-based retrievers for RAG applications
- Creating embeddings that work across multiple domains without fine-tuning

Theoretical Basis

The multi-task retrieval framework consists of:

Task Taxonomy:

- Dense retrieval: Query-document matching with contrastive loss
- Symmetric similarity: Text pair similarity with symmetric loss
- In-context learning: Example retrieval for few-shot prompting
- QA retrieval: Question-answer pair matching

Multi-task Loss:

- Combined objective: L = Σ_t λ_t * L_t(θ)
- Where t indexes tasks, λ_t are task weights
- L_dense = InfoNCE loss for retrieval
- L_SRLM = Sentence representation loss
- L_LM = Language modeling loss for generation
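
The combined objective above can be illustrated with a small pure-Python sketch. It is not the FlagEmbedding implementation: the function names, the cosine similarity, and the temperature value of 0.05 are assumptions chosen for illustration. The sketch computes an InfoNCE loss for one query against a list of candidates and then forms the weighted sum L = Σ_t λ_t * L_t.

```python
import math


def info_nce(query, candidates, positive_index, temperature=0.05):
    """InfoNCE loss for one query: cross-entropy over scaled similarities.

    The temperature and the use of cosine similarity are illustrative
    assumptions, not values taken from the source.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    logits = [cosine(query, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[positive_index]


def combined_loss(task_losses, task_weights):
    """Weighted sum over tasks: L = sum_t lambda_t * L_t."""
    return sum(task_weights[task] * loss for task, loss in task_losses.items())


# Toy example: the first candidate is the positive document.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
dense_loss = info_nce(query, candidates, positive_index=0)

# The language modeling loss value here is a made-up placeholder.
total = combined_loss(
    {"dense": dense_loss, "lm": 2.0},
    {"dense": 1.0, "lm": 0.5},
)
```

In a real training loop each L_t would be computed on tensors by the framework in use; the weights λ_t are hyperparameters that control how strongly each task influences the shared parameters θ.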

Batch Construction:

- Sample batches from multiple datasets simultaneously
- Ensure task diversity within each batch
- Balance high-resource and low-resource tasks
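
One way to read the three points above is sketched below. This is an assumption about how such a sampler might look, not the batching code from the repository: `build_mixed_batches` and the dictionary layout of `datasets` are hypothetical. Each batch is filled round-robin across tasks so that every task appears, and small datasets are cycled, which upsamples low-resource tasks.

```python
import random
from itertools import cycle


def build_mixed_batches(datasets, batch_size, num_batches, seed=0):
    """Build batches that mix examples from every task.

    datasets: dict mapping a task name to a list of examples
    (a hypothetical layout used only for this sketch).
    """
    rng = random.Random(seed)
    streams = {}
    for task, examples in datasets.items():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        # cycle() repeats small datasets, upsampling low-resource tasks.
        streams[task] = cycle(shuffled)

    tasks = sorted(streams)
    batches = []
    for batch_index in range(num_batches):
        batch = []
        for slot in range(batch_size):
            # Rotate the starting task so no task is always favoured
            # when batch_size is not a multiple of the task count.
            task = tasks[(batch_index + slot) % len(tasks)]
            batch.append((task, next(streams[task])))
        batches.append(batch)
    return batches


toy_datasets = {
    "qa": [f"qa-{i}" for i in range(100)],      # high-resource task
    "icl": ["icl-0", "icl-1"],                  # low-resource task
}
mixed = build_mixed_batches(toy_datasets, batch_size=4, num_batches=3)
```

Strict round-robin gives every task an equal share of each batch. A weighted draw per slot would be the natural variation when some tasks should dominate.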

Training Strategy:

- Task sampling: Proportional to dataset size or uniform
- Gradient accumulation across tasks
- Task-specific learning rates via parameter groups
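
The two task sampling options named above (proportional to dataset size, or uniform) can be expressed with a single exponent. The sketch below is illustrative: the `alpha` parameter and both function names are assumptions, with `alpha=1.0` giving size-proportional sampling and `alpha=0.0` giving uniform sampling. Intermediate values are a common variation that the source does not describe.

```python
import random


def task_sampling_probs(dataset_sizes, alpha=1.0):
    """Return per-task sampling probabilities.

    alpha=1.0 -> proportional to dataset size
    alpha=0.0 -> uniform over tasks
    """
    scaled = {task: size ** alpha for task, size in dataset_sizes.items()}
    total = sum(scaled.values())
    return {task: value / total for task, value in scaled.items()}


def sample_task_schedule(probs, num_steps, seed=0):
    """Draw the task to train on at each step, according to probs."""
    rng = random.Random(seed)
    tasks = list(probs)
    weights = [probs[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=num_steps)


sizes = {"dense": 900, "icl": 100}   # made-up dataset sizes
proportional = task_sampling_probs(sizes, alpha=1.0)
uniform = task_sampling_probs(sizes, alpha=0.0)
schedule = sample_task_schedule(proportional, num_steps=10)
```

With gradient accumulation, several consecutive entries of such a schedule would contribute gradients before a single optimizer step, so one update can reflect more than one task.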

Evaluation Suite:

- BEIR benchmark: Zero-shot retrieval across 18 datasets
- MTEB: Massive Text Embedding Benchmark
- Task-specific metrics: MRR, Recall@k, nDCG
- In-context learning: Accuracy on downstream tasks
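
For reference, the three ranking metrics listed above can be computed as follows. This is a minimal sketch with hypothetical function names, not the evaluation code used by the project. It assumes each ranked list contains unique document IDs and uses linear gain for nDCG (some implementations use 2^rel - 1 instead).

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevance, k):
    """relevance: dict doc_id -> graded gain; missing IDs count as 0."""
    def dcg(gains):
        return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))

    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


ranking = ["d3", "d1", "d2"]
rr = reciprocal_rank(ranking, {"d1"})               # first hit at rank 2
recall = recall_at_k(ranking, {"d1", "d2"}, k=2)    # one of two found
ndcg = ndcg_at_k(["d2", "d1"], {"d1": 1.0}, k=2)    # relevant doc at rank 2
```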

Model Architecture:

- Base: LLM backbone (Llama, Mistral, etc.)
- Embedding extraction: Pooling over hidden states
- Optional: Task-specific projection heads
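
The source does not say which pooling is used, so the sketch below shows two common choices as assumptions: mean pooling over non-padding tokens and last-token pooling, the latter often used with decoder-only backbones. It operates on plain Python lists for one sequence; the function names are hypothetical, and a real model would apply the same logic to batched tensors.

```python
def mean_pool(hidden_states, attention_mask):
    """Average the hidden states of non-padding tokens.

    hidden_states: list of per-token vectors for one sequence.
    attention_mask: list of 1 (real token) or 0 (padding), same length.
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token."""
    positions = [i for i, keep in enumerate(attention_mask) if keep]
    if not positions:
        raise ValueError("attention_mask selects no tokens")
    return list(hidden_states[positions[-1]])


# Three token positions, the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
mean_embedding = mean_pool(hidden, mask)
last_embedding = last_token_pool(hidden, mask)
```

An optional task-specific projection head would then map the pooled vector to the final embedding space.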

The key insight is that training on diverse tasks creates more generalizable representations through implicit regularization and knowledge transfer across domains.
