

Principle:FlagOpen FlagEmbedding Hard Negative Mining

From Leeroopedia



Overview

Hard negative mining identifies challenging negative examples by retrieving passages that are similar to a query but not relevant to it. Training on these examples improves the discriminative ability of a model trained with contrastive learning.

Description

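Randomly sampled negatives are usually too easy for the model to distinguish from positives, so they contribute little to training. Hard negatives are passages that an embedding model ranks highly for a query but that are not in the query's positive set. The hn_mine.py script mines them in four steps: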

  1. Encodes all corpus passages with an embedder
  2. Builds a FAISS index
  3. Retrieves top-k candidates per query
  4. Filters out positives and samples from a specified rank range (e.g., 10-210) to get hard negatives

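Sampling from a middle rank range skips the top-ranked passages, which are the hardest negatives and likely to be false negatives (relevant passages that were never labeled as positives), and it also skips the low-ranked passages, which are too easy to be useful.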
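The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the hn_mine.py implementation: a brute-force inner-product ranking stands in for the embedder and the FAISS index, and the function name mine_hard_negatives, its parameters, and the toy one-dimensional vectors are all hypothetical.

```python
import random


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query_vec, corpus_vecs, positive_ids,
                        rank_range=(10, 210), num_negatives=15, seed=0):
    """Sample hard negatives for one query (hypothetical helper).

    Ranks every corpus passage by inner product with the query, keeps the
    ranks inside rank_range, drops known positives, then samples.
    """
    scores = [inner_product(query_vec, vec) for vec in corpus_vecs]
    # Passage ids ordered from most to least similar to the query.
    ranked = sorted(range(len(corpus_vecs)), key=lambda i: scores[i], reverse=True)

    start, end = rank_range
    window = ranked[start:end]  # skip the top ranks and the long tail
    candidates = [i for i in window if i not in positive_ids]

    if len(candidates) <= num_negatives:
        return candidates  # too few candidates: return all of them
    return random.Random(seed).sample(candidates, num_negatives)


# Toy data: passage i has the 1-d embedding [i], so higher ids rank higher.
corpus_vecs = [[float(i)] for i in range(20)]
query_vec = [1.0]
negatives = mine_hard_negatives(query_vec, corpus_vecs, positive_ids={16},
                                rank_range=(2, 8), num_negatives=3)
print(negatives)
```

In this toy run the two top-ranked passages (ids 19 and 18) fall outside the window and the positive (id 16) is filtered out, so the sample is drawn from ids 17, 15, 14, 13, and 12. What to do when the window holds too few candidates is a design choice; this sketch simply returns what is available.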

Usage

Run hard negative mining after preparing the initial training data and before training, to improve the quality of the negatives.

Theoretical Basis

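Hard negatives increase the gradient signal in contrastive learning, because negatives that score close to the positive contribute more to the loss than negatives the model already separates well. With a sampling range such as 10-210, mining skips ranks 1-10 (likely false negatives) and very low ranks (too easy). FAISS IndexFlatIP performs exact inner product search, with optional GPU acceleration.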
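The gradient claim can be checked numerically. The sketch below assumes an InfoNCE-style softmax loss with a temperature of 0.05; the source only says "contrastive learning", so the loss form, the temperature, the similarity values, and the function name infonce_negative_grads are all assumptions made for illustration. Under that loss, the gradient with respect to a negative's similarity score equals its softmax probability divided by the temperature.

```python
import math


def infonce_negative_grads(pos_sim, neg_sims, temperature=0.05):
    """Gradient of an InfoNCE-style loss w.r.t. each negative similarity.

    Assumed loss: L = -log(exp(s_pos / t) / sum_k exp(s_k / t)), so that
    dL/ds_j = softmax_j / t for every negative j.
    """
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    # Index 0 is the positive; the remaining entries are the negatives.
    return [e / total / temperature for e in exps[1:]]


# One hard negative (similarity close to the positive) and one easy negative.
hard_grad, easy_grad = infonce_negative_grads(pos_sim=0.8, neg_sims=[0.75, 0.1])
print(hard_grad, easy_grad)
```

With these illustrative values the hard negative receives a gradient several orders of magnitude larger than the easy one, which is the sense in which easy negatives add little training signal.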

Related Pages

Implementation:FlagOpen_FlagEmbedding_Hn_Mine_Script
