Implementation:FlagOpen FlagEmbedding LLM Reranker Instruction Modeling

Knowledge Sources	FlagOpen_FlagEmbedding
Domains	Reranking, Large_Language_Models, Instruction_Tuning
Last Updated	2026-02-09 00:00 GMT

Overview

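Cross-encoder reranker, implemented in a class named BiEncoderModel, that trains instruction-tuned LLMs to score each query-passage pair by the logit of the "Yes" token. Training uses a listwise cross-entropy loss over groups of passages.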

Description

BiEncoderModel adapts instruction-tuned LLMs for reranking:

Architecture:

Processes query-passage pairs formatted with instruction prompts
Extracts the logit for the "Yes" token at the answer position
Uses this single logit as the relevance score
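The following is a minimal sketch of the prompt layout described above. It uses the illustrative "Query / Passage / Is relevant? Yes" format from this page's usage example. The helper `format_pair` and its template are hypothetical, and FlagEmbedding's actual data pipeline may use a different prompt. The sketch shows that the query and passage share one sequence and that the scored answer word comes last.

```python
def format_pair(query: str, passage: str) -> tuple[str, int]:
    """Build the text for one query-passage pair (hypothetical template).

    Returns the full text, which ends in the answer word "Yes", and the
    character offset where that answer starts. Only the answer is scored.
    """
    prompt = f"Query: {query}\nPassage: {passage}\nIs relevant? "
    return prompt + "Yes", len(prompt)


text, answer_start = format_pair("what is AI", "AI is the study of intelligent agents.")
print(repr(text))
print(text[answer_start:])  # the answer word whose logit becomes the score
```

In real training, this text is tokenized with the model's tokenizer, usually with a BOS token prepended. How "Yes" splits into tokens depends on that tokenizer.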

Training:

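Groups the scores by query into train_batch_size groups of N passages each (1 positive followed by N-1 negatives)
Applies cross-entropy loss over each group, with the positive passage (index 0) as the target class
Trains the model to give relevant passages a higher "Yes" logit than the negatives in the same group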
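The grouped loss described above can be sketched as follows. This version uses NumPy in place of PyTorch tensors so that it runs without a model or GPU, and it is meant to mirror PyTorch's `nn.CrossEntropyLoss` with mean reduction and an all-zero target. The function name `grouped_ce_loss` is hypothetical. The sketch assumes a flat score array laid out query by query, with each query's positive passage first.

```python
import numpy as np


def grouped_ce_loss(scores: np.ndarray, train_batch_size: int) -> float:
    """Listwise cross-entropy over groups of passages, positive at index 0.

    scores: flat array of "Yes" logits of length train_batch_size * N,
    laid out query by query with each query's positive passage first.
    """
    grouped = scores.reshape(train_batch_size, -1)  # (queries, N)
    # Numerically stable log-softmax along the passage dimension.
    shifted = grouped - grouped.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The target class is 0 for every query: take column 0 and average.
    return float(-log_probs[:, 0].mean())


scores = np.array([2.0, 0.5, -1.0, 0.0,   # query 1: positive scored highest
                   0.3, 1.2, 0.1, -0.4])  # query 2: a negative outscores the positive
print(grouped_ce_loss(scores, train_batch_size=2))
```

When all scores in a group are equal, the loss for that group is log(N). The loss falls toward zero as the positive's "Yes" logit pulls ahead of the negatives.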

Scoring mechanism:

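Locates the answer token in each sequence: labels are -100 everywhere except the answer position, so the maximum label marks it
Reads the logits at the position immediately before the answer (answer position - 1), because a causal LM's logits at position t-1 predict the token at position t
Takes the "Yes" token's entry in that logit vector as the relevance score
Higher "Yes" logit = more relevant passage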
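The scoring steps above can be sketched in NumPy on toy logits. The function name `yes_scores`, the toy shapes, and the "Yes" id of 7 are all hypothetical. A real run would take the logits from the causal LM's output and the id from the tokenizer. The code locates the answer through the labels, steps back one position, and reads the "Yes" column.

```python
import numpy as np

IGNORE_INDEX = -100  # label value for positions that are not scored


def yes_scores(logits: np.ndarray, labels: np.ndarray, yes_id: int) -> np.ndarray:
    """Return one "Yes" logit per sequence.

    logits: (num_pairs, seq_len, vocab_size) output of a causal LM.
    labels: (num_pairs, seq_len), IGNORE_INDEX except at the answer token.
    """
    # Token ids are >= 0 > -100, so the argmax of each row is the answer position.
    answer_pos = labels.argmax(axis=1)
    # Logits at position t-1 are the model's prediction for the token at position t.
    predict_pos = answer_pos - 1
    rows = np.arange(logits.shape[0])
    return logits[rows, predict_pos, yes_id]


# Toy example: 2 pairs, sequence length 5, vocabulary of 10, "Yes" has id 7.
logits = np.zeros((2, 5, 10))
labels = np.full((2, 5), IGNORE_INDEX)
labels[0, 4] = 7        # answer token is the last position of pair 0
labels[1, 3] = 7        # pair 1 is one token shorter (position 4 is padding)
logits[0, 3, 7] = 2.5   # logit predicting the answer of pair 0
logits[1, 2, 7] = -1.0  # logit predicting the answer of pair 1
print(yes_scores(logits, labels, yes_id=7))
```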

This approach builds on the instruction-following abilities of LLMs. Relevance is posed as a natural-language "Yes"/"No" question, and the score is read from the model's existing language-modeling head rather than from a separately trained scoring head.

Usage

Use this to fine-tune instruction-tuned LLMs as rerankers while building on their instruction-following abilities. The relevance score stays tied to an interpretable "Yes"/"No" judgment.

Code Reference

Source Location

Repository: FlagOpen_FlagEmbedding
File: research/llm_reranker/finetune_for_instruction/modeling.py
Lines: 1-90

Signature

class BiEncoderModel(nn.Module):
    def __init__(self, model: None, tokenizer: AutoTokenizer = None,
                 train_batch_size: int = 4)

    def encode(self, features)
    def forward(self, pair: Union[Dict[str, Tensor], List[Dict[str, Tensor]]])

Import

from research.llm_reranker.finetune_for_instruction.modeling import BiEncoderModel

I/O Contract

Inputs

Name	Type	Required	Description
model	PreTrainedModel	Yes	Instruction-tuned LLM (LLaMA, Mistral, etc.)
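tokenizer	AutoTokenizer	Yes	Tokenizer used to look up the "Yes" token id
train_batch_size	int	No	Number of queries per training batch; scores are reshaped to (train_batch_size, group size) for the loss (default: 4)
pair	Dict/List[Dict]	Yes	Tokenized query-passage pairs with input_ids, attention_mask, labels (marking the answer token), position_ids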
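The following is a minimal sketch of how the `pair` fields could be assembled from already-tokenized sequences. The helper `build_pair_features` is hypothetical and is not FlagEmbedding's data pipeline. It makes three assumptions: right padding, each sequence already ends with the "Yes" token id, and padded positions get position id 0 because the attention mask hides them.

```python
IGNORE_INDEX = -100  # label value for unscored positions


def build_pair_features(token_ids: list[list[int]], pad_id: int) -> dict[str, list[list[int]]]:
    """Right-pad tokenized pairs (each ending in the answer token) and build
    attention_mask, labels and position_ids (hypothetical helper)."""
    max_len = max(len(ids) for ids in token_ids)
    feats: dict[str, list[list[int]]] = {
        "input_ids": [], "attention_mask": [], "labels": [], "position_ids": []
    }
    for ids in token_ids:
        pad = max_len - len(ids)
        feats["input_ids"].append(ids + [pad_id] * pad)
        feats["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Only the final real token (assumed to be "Yes") carries a label.
        feats["labels"].append([IGNORE_INDEX] * (len(ids) - 1) + [ids[-1]] + [IGNORE_INDEX] * pad)
        feats["position_ids"].append(list(range(len(ids))) + [0] * pad)
    return feats


feats = build_pair_features([[1, 5, 9], [1, 9]], pad_id=0)
for key, rows in feats.items():
    print(key, rows)
```

With a real tokenizer, these lists would be converted to tensors of shape [num_pairs, seq_len] before they are passed to the model.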

Outputs

Name	Type	Description
loss	Tensor	Listwise cross-entropy loss over each query's passage group (training mode only)
scores	Tensor	"Yes" token logits for each query-passage pair
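The flat `scores` output can be turned into per-query rankings. The sketch below is NumPy-based, and its function name `rank_passages` is hypothetical. It assumes every query contributes the same number of passages (`group_size`) and that scores are laid out query by query. Ranking depends only on the order of the raw "Yes" logits, so no normalization is applied.

```python
import numpy as np


def rank_passages(scores: np.ndarray, group_size: int) -> np.ndarray:
    """Reshape flat "Yes" logits to (num_queries, group_size) and return, for
    each query, passage indices ordered from most to least relevant."""
    grouped = scores.reshape(-1, group_size)
    # argsort is ascending, so sort the negated scores for descending order.
    return np.argsort(-grouped, axis=1, kind="stable")


scores = np.array([0.1, 2.0, -1.0,   # query 1
                   3.0, 0.0, 1.0])   # query 2
print(rank_passages(scores, group_size=3))
```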

Usage Examples

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from research.llm_reranker.finetune_for_instruction.modeling import BiEncoderModel

# Initialize model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

model = BiEncoderModel(
    model=base_model,
    tokenizer=tokenizer,
    train_batch_size=4
)

# Training forward pass
# Illustrative input format: "[BOS]Query: what is AI\nPassage: AI is...\nIs relevant? Yes"
# pair_ids, pair_mask, labels and position_ids are produced by the data pipeline (not shown).
pair_inputs = {
    "input_ids": pair_ids,          # [train_batch_size * group_size, seq_len], positive first in each group
    "attention_mask": pair_mask,
    "labels": labels,               # -100 everywhere except the final "Yes" token
    "position_ids": position_ids
}

model.train()  # the loss is only computed in training mode
outputs = model(pair=pair_inputs)
loss = outputs.loss  # Cross-entropy comparing each positive against its negatives
loss.backward()

# Inference
model.eval()
with torch.no_grad():
    scores = model.encode(pair_inputs)  # [num_pairs] "Yes" token logits
    # Higher score = more relevant
    print(f"Relevance scores: {scores}")
