Implementation:FlagOpen FlagEmbedding LLM Reranker Instruction Modeling
| Knowledge Sources | |
|---|---|
| Domains | Reranking, Large_Language_Models, Instruction_Tuning |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
BiEncoderModel wraps an instruction-tuned causal LLM for reranker training. Despite the class name, it scores each query-passage pair jointly and uses the logit of the "Yes" token as the relevance score.
Description
BiEncoderModel adapts instruction-tuned LLMs for reranking:
Architecture:
- Processes query-passage pairs formatted with instruction prompts
- Extracts the logit for the "Yes" token at the answer position
- Uses this single logit as the relevance score
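To make the input format concrete, here is a minimal sketch of how one query-passage pair might be turned into an instruction prompt. The helper name `build_pair_prompt` and the exact wording of the template are assumptions modelled on the example input string in the usage example on this page; the actual data pipeline may use a different instruction.

```python
def build_pair_prompt(query: str, passage: str) -> str:
    """Format one query-passage pair as an instruction prompt.

    Hypothetical template: the wording is an assumption, not the
    repository's confirmed prompt.
    """
    return f"Query: {query}\nPassage: {passage}\nIs relevant? "


prompt = build_pair_prompt("what is AI", "AI is the study of intelligent agents.")

# During training the answer token is appended, so that the model's
# prediction at the end of the prompt can be compared against "Yes".
training_text = prompt + "Yes"
print(training_text)
```

The tokenizer would normally add the BOS token itself, so the sketch leaves it out of the string.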
Training:
- Groups passages by query (1 positive + N-1 negatives)
- Applies cross-entropy loss over each group's scores, treating the positive passage (index 0 in the group) as the target class
- Trains the model to assign higher "Yes" probability to relevant passages
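The grouped loss described above can be illustrated without any deep learning framework. The sketch below is a dependency-free rendering of the arithmetic, not the module's own code: it assumes the scores arrive as a flat list with each query's positive passage first, and the function name `grouped_cross_entropy` and its `num_queries` parameter are hypothetical.

```python
import math
from typing import List


def grouped_cross_entropy(scores: List[float], num_queries: int) -> float:
    """Mean cross-entropy over query groups, positive passage at index 0."""
    if num_queries <= 0 or len(scores) % num_queries != 0:
        raise ValueError("scores must split evenly into num_queries groups")
    group_size = len(scores) // num_queries
    total = 0.0
    for q in range(num_queries):
        group = scores[q * group_size:(q + 1) * group_size]
        # Numerically stable log-sum-exp over the group's "Yes" logits.
        peak = max(group)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in group))
        # Target class 0: negative log-probability of the positive passage.
        total += log_norm - group[0]
    return total / num_queries


# Two queries, each with 1 positive followed by 2 negatives.
loss_good = grouped_cross_entropy([5.0, 0.0, 0.0, 4.0, -1.0, 0.5], num_queries=2)
loss_bad = grouped_cross_entropy([0.0, 5.0, 0.0, -1.0, 4.0, 0.5], num_queries=2)
print(f"positive ranked first: {loss_good:.4f}, positive ranked low: {loss_bad:.4f}")
```

The loss is small when the positive passage has the highest "Yes" logit in its group and grows when a negative outscores it, which is what pushes relevant passages above their in-group negatives.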
Scoring mechanism:
- Identifies the answer position in each sequence from the labels
- Reads the logits one position earlier (the last non-label token), because a causal LM's output at position t predicts the token at position t+1
- Takes the "Yes" token logit at that position as the relevance score
- A higher "Yes" logit means a more relevant passage
This approach leverages instruction-following capabilities of LLMs, teaching them to judge relevance through natural language ("Yes"/"No") rather than arbitrary scoring functions.
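The scoring mechanism listed above can be sketched on plain nested lists. This is an illustration under stated assumptions rather than the module's implementation: `yes_logit_scores` is a hypothetical helper, `logits` is indexed as `[sequence][position][vocab_id]`, and unsupervised positions are assumed to carry the label -100.

```python
from typing import List

IGNORE_INDEX = -100  # label value assumed for positions without supervision


def yes_logit_scores(
    logits: List[List[List[float]]],
    labels: List[List[int]],
    yes_token_id: int,
) -> List[float]:
    """Return the "Yes" logit read one step before each answer position."""
    scores = []
    for seq_logits, seq_labels in zip(logits, labels):
        # The answer position is the first label that is not ignored.
        answer_pos = next(
            i for i, label in enumerate(seq_labels) if label != IGNORE_INDEX
        )
        if answer_pos == 0:
            raise ValueError("the answer token needs at least one preceding token")
        # A causal LM's output at position t predicts token t + 1,
        # so the prediction for the answer lives at answer_pos - 1.
        scores.append(seq_logits[answer_pos - 1][yes_token_id])
    return scores


# Toy vocabulary of size 3 where id 2 stands in for "Yes".
toy_logits = [
    [[0.1, 0.2, 0.3], [0.0, 0.0, 1.5], [9.0, 9.0, 9.0]],
    [[0.4, 0.1, -0.7], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]],
]
toy_labels = [
    [IGNORE_INDEX, IGNORE_INDEX, 2],  # answer at position 2
    [IGNORE_INDEX, 2, IGNORE_INDEX],  # answer at position 1, then padding
]
scores = yes_logit_scores(toy_logits, toy_labels, yes_token_id=2)
print(scores)
```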
Usage
Use this for training instruction-tuned LLMs as rerankers while preserving their instruction-following abilities and using interpretable relevance judgments.
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/llm_reranker/finetune_for_instruction/modeling.py
- Lines: 1-90
Signature
class BiEncoderModel(nn.Module):
    def __init__(self, model: None, tokenizer: AutoTokenizer = None,
                 train_batch_size: int = 4)
    def encode(self, features)
    def forward(self, pair: Union[Dict[str, Tensor], List[Dict[str, Tensor]]])
Import
from research.llm_reranker.finetune_for_instruction.modeling import BiEncoderModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel | Yes | Instruction-tuned LLM (LLaMA, Mistral, etc.) |
| tokenizer | AutoTokenizer | Yes | Tokenizer whose vocabulary includes the "Yes" token |
| train_batch_size | int | No | Batch size for grouping passages (default: 4) |
| pair | Dict/List[Dict] | Yes | Tokenized inputs with input_ids, attention_mask, labels, position_ids |
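Since the labels field is what marks the answer position, a small sketch of how such labels could be built may help. It assumes an unpadded token sequence whose final token is the answer; the helper name `mask_all_but_answer` and the token ids are made up for illustration.

```python
from typing import List

IGNORE_INDEX = -100  # label value assumed for positions without supervision


def mask_all_but_answer(input_ids: List[int], answer_len: int = 1) -> List[int]:
    """Build labels that supervise only the trailing answer tokens."""
    if not 0 < answer_len <= len(input_ids):
        raise ValueError("answer_len must be between 1 and len(input_ids)")
    prompt_len = len(input_ids) - answer_len
    # Prompt positions are ignored; answer positions keep their token ids.
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Made-up ids: four prompt tokens followed by one answer token (42).
example_ids = [1, 501, 502, 503, 42]
example_labels = mask_all_but_answer(example_ids)
print(example_labels)
```

With padded batches the answer is no longer at the very end of every row, so a real collator would need to account for the padding side.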
Outputs
| Name | Type | Description |
|---|---|---|
| loss | Tensor | Cross-entropy loss (training only) |
| scores | Tensor | "Yes" token logits for each query-passage pair |
Usage Examples
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from research.llm_reranker.finetune_for_instruction.modeling import BiEncoderModel

# Initialize model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = BiEncoderModel(
    model=base_model,
    tokenizer=tokenizer,
    train_batch_size=4
)

# Training forward pass
# Input format: "[BOS]Query: what is AI\nPassage: AI is...\nIs relevant? Yes"
# pair_ids, pair_mask, labels and position_ids are placeholders for tensors
# prepared elsewhere (for example by a data collator).
pair_inputs = {
    "input_ids": pair_ids,  # [batch_size * group_size, seq_len]
    "attention_mask": pair_mask,
    "labels": labels,  # -100 everywhere except last "Yes" token
    "position_ids": position_ids
}
outputs = model(pair=pair_inputs)
loss = outputs.loss  # Cross-entropy comparing positive vs negatives
loss.backward()

# Inference
model.eval()
with torch.no_grad():
    scores = model.encode(pair_inputs)  # [num_pairs] "Yes" token logits
# Higher score = more relevant
print(f"Relevance scores: {scores}")
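Because a higher score means a more relevant passage, reranking reduces to sorting candidates by score. The following sketch assumes the scores have already been converted to plain floats (for a tensor, `scores.tolist()` would do this); the `rerank` helper and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def rerank(passages: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each passage with its score and sort by descending relevance."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


candidates = ["a: unrelated text", "b: directly answers the query", "c: loosely related"]
candidate_scores = [-2.1, 3.4, 0.2]  # made-up "Yes" logits
ranked = rerank(candidates, candidate_scores)
for passage, score in ranked:
    print(f"{score:+.1f}  {passage}")
```

The scores are raw logits, so they are useful for ordering candidates of the same query but should not be read as calibrated probabilities.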