Implementation:FlagOpen FlagEmbedding Embedding Similarity Scoring
| Field | Value |
|---|---|
| Type | Pattern Doc (user-side computation patterns) |
| Source | User-side numpy/torch + FlagEmbedding/inference/embedder/encoder_only/m3.py:L129-177 for sparse and ColBERT matching |
Interface
Three scoring methods are available after encoding queries and passages with an M3 embedder:
1. Dense Similarity
Direct matrix multiplication of query and passage embeddings:
scores = embeddings_q @ embeddings_p.T
When embeddings are L2-normalized (as returned by the default encode methods), this produces cosine similarity scores.
| Parameter | Type | Description |
|---|---|---|
| embeddings_q | np.ndarray (shape: [num_queries, dim]) | Dense query embeddings from model.encode_queries() or model.encode() |
| embeddings_p | np.ndarray (shape: [num_passages, dim]) | Dense passage embeddings from model.encode_corpus() or model.encode() |
| Returns | np.ndarray (shape: [num_queries, num_passages]) | Cosine similarity matrix |
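The equivalence between the matrix product and cosine similarity can be checked on toy data. The sketch below uses made-up 3-dimensional vectors rather than real model output, and `l2_normalize` is a hypothetical helper written for this illustration, not part of FlagEmbedding.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper for this sketch)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy vectors standing in for dense embeddings; real ones come from the model.
raw_q = np.array([[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]])
raw_p = np.array([[2.0, 4.0, 4.0], [4.0, 0.0, -3.0], [0.0, 1.0, 0.0]])

embeddings_q = l2_normalize(raw_q)
embeddings_p = l2_normalize(raw_p)

# Dot products of unit vectors are cosine similarities.
scores = embeddings_q @ embeddings_p.T

# Reference: explicit cosine formula applied to the unnormalized vectors.
cosine = (raw_q @ raw_p.T) / (
    np.linalg.norm(raw_q, axis=1)[:, None] * np.linalg.norm(raw_p, axis=1)[None, :]
)
```

Here `scores` and `cosine` agree up to floating-point error, which is why no explicit normalization step is needed when the embeddings are already unit length.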
2. Sparse Lexical Matching
M3Embedder.compute_lexical_matching_score(lexical_weights_1, lexical_weights_2)
| Parameter | Type | Description |
|---|---|---|
| lexical_weights_1 | Union[Dict[str, float], List[Dict[str, float]]] | Lexical weights for queries. Each dict maps tokens to learned weights. |
| lexical_weights_2 | Union[Dict[str, float], List[Dict[str, float]]] | Lexical weights for passages. Each dict maps tokens to learned weights. |
| Returns | Union[float, np.ndarray] | Single float for dict-dict input; 2D array (shape: [num_queries, num_passages]) for list-list input. |
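To make the two input shapes concrete, here is a minimal pure-Python sketch. It assumes the score is the sum, over tokens present in both dicts, of the product of the two weights; `lexical_matching_score` is a hypothetical name for this illustration and is not the library's code, and the weights are made up.

```python
from typing import Dict

def lexical_matching_score(weights_q: Dict[str, float], weights_p: Dict[str, float]) -> float:
    """Sum of weight products over tokens present in both dicts (illustrative sketch)."""
    return sum(w * weights_p[token] for token, w in weights_q.items() if token in weights_p)

# Made-up weights for illustration only.
query_weights = {"capital": 0.5, "france": 0.25, "what": 0.125}
passage_weights = {"paris": 0.5, "capital": 0.5, "france": 0.5}

# Dict-dict input: one score.
single = lexical_matching_score(query_weights, passage_weights)

# List-list input: a [num_queries, num_passages] grid of pairwise scores.
batch = [
    [lexical_matching_score(q, p) for p in [passage_weights, {"plants": 0.75}]]
    for q in [query_weights]
]
```

Tokens that appear on only one side contribute nothing, so a passage with no shared tokens scores zero.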
3. ColBERT Token-Level Interaction
M3Embedder.colbert_score(q_reps, p_reps)
| Parameter | Type | Description |
|---|---|---|
| q_reps | np.ndarray | Multi-vector (token-level) embeddings for a single query. Shape: [num_query_tokens, dim]. |
| p_reps | np.ndarray | Multi-vector (token-level) embeddings for a single passage. Shape: [num_passage_tokens, dim]. |
| Returns | torch.Tensor | Scalar ColBERT score: average of per-query-token maximum similarities. |
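The "average of per-query-token maximum similarities" can be written out in a few lines of NumPy. This is a sketch of the MaxSim computation as described in the table, not the library's implementation; `maxsim_score` is a hypothetical name, it returns a Python float rather than a torch.Tensor, and the token embeddings are toy values.

```python
import numpy as np

def maxsim_score(q_reps: np.ndarray, p_reps: np.ndarray) -> float:
    """Mean over query tokens of the best-matching passage-token similarity."""
    token_scores = q_reps @ p_reps.T  # [num_query_tokens, num_passage_tokens]
    best_per_query_token = token_scores.max(axis=1)
    return float(best_per_query_token.mean())

# Toy token embeddings: 2 query tokens, 3 passage tokens, dim 2.
q_reps = np.array([[1.0, 0.0], [0.0, 1.0]])
p_reps = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]])

score = maxsim_score(q_reps, p_reps)
```

The first query token matches the second passage token best (1.0) and the second query token matches the first passage token best (0.5), giving a mean of 0.75.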
I/O
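Input: Embeddings produced by AbsEmbedder.encode(), encode_queries(), or encode_corpus(). For M3 models, the encode methods return dictionaries with keys "dense_vecs", "lexical_weights", and "colbert_vecs"; the sparse and ColBERT outputs are requested with return_sparse=True and return_colbert_vecs=True, as in the examples below.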
Output: Similarity scores as a float (single pair), an np.ndarray (batch), or a scalar torch.Tensor (colbert_score). Higher scores indicate greater relevance.
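Because higher scores mean greater relevance, a batch score matrix can be turned into a per-query ranking by sorting each row in descending order. The sketch below uses a made-up score matrix and only NumPy; the variable names are illustrative.

```python
import numpy as np

# Made-up score matrix of shape [num_queries, num_passages].
scores = np.array([
    [0.82, 0.10, 0.55],
    [0.05, 0.91, 0.12],
])

# Passage indices per query, ordered from highest to lowest score.
ranking = np.argsort(-scores, axis=1)

# Index of the best-scoring passage for each query.
top1 = ranking[:, 0]
```

The same pattern applies to any of the three scoring methods once the scores are arranged as a [num_queries, num_passages] array.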
Examples
Example 1: Dense Scoring
import numpy as np
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
queries = ["What is the capital of France?", "How does photosynthesis work?"]
passages = [
    "Paris is the capital and largest city of France.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The Eiffel Tower is located in Paris.",
]
# Encode with dense output
q_embeddings = model.encode(queries)["dense_vecs"]
p_embeddings = model.encode(passages)["dense_vecs"]
# Compute dense similarity (cosine similarity for normalized embeddings)
scores = q_embeddings @ p_embeddings.T
print(scores)
# Output shape: (2, 3) - each query scored against each passage
Example 2: Sparse Lexical Matching
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
queries = ["What is the capital of France?"]
passages = ["Paris is the capital and largest city of France."]
# Encode with sparse output
q_output = model.encode(queries, return_sparse=True)
p_output = model.encode(passages, return_sparse=True)
q_lexical_weights = q_output["lexical_weights"]
p_lexical_weights = p_output["lexical_weights"]
# Compute sparse lexical matching score
sparse_score = model.compute_lexical_matching_score(
    q_lexical_weights[0], p_lexical_weights[0]
)
print(f"Sparse score: {sparse_score}")
# Returns a single float for dict-dict input
# Batch scoring: pass lists of dicts
sparse_scores = model.compute_lexical_matching_score(
    q_lexical_weights, p_lexical_weights
)
print(f"Sparse scores shape: {sparse_scores.shape}")
# Returns np.ndarray of shape (num_queries, num_passages)
Example 3: ColBERT Scoring
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
query = "What is the capital of France?"
passage = "Paris is the capital and largest city of France."
# Encode with ColBERT output
q_output = model.encode([query], return_colbert_vecs=True)
p_output = model.encode([passage], return_colbert_vecs=True)
q_colbert_vecs = q_output["colbert_vecs"][0] # single query token embeddings
p_colbert_vecs = p_output["colbert_vecs"][0] # single passage token embeddings
# Compute ColBERT score via MaxSim
colbert_score = model.colbert_score(q_colbert_vecs, p_colbert_vecs)
print(f"ColBERT score: {colbert_score.item()}")
Example 4: Combined Multi-Method Scoring
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
sentence_pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["How does photosynthesis work?", "Plants convert light to energy."],
]
# compute_score returns each mode's score plus their weighted combinations
scores = model.compute_score(
    sentence_pairs,
    weights_for_different_modes=[1.0, 1.0, 1.0],  # [dense, sparse, colbert]
)
print(scores)
# Returns dict with keys: 'colbert', 'sparse', 'dense', 'sparse+dense', 'colbert+sparse+dense'
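To illustrate how per-mode weights can shape a combined score, the sketch below computes a weight-normalized average of three scores. This is an assumption about the general shape of such a combination, not a statement of the library's exact formula (check m3.py for that); `combine_scores` is a hypothetical helper and the input scores are made up.

```python
from typing import Sequence

def combine_scores(dense: float, sparse: float, colbert: float,
                   weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of three scores; weights are ordered [dense, sparse, colbert]."""
    w_dense, w_sparse, w_colbert = weights
    total = w_dense + w_sparse + w_colbert
    return (w_dense * dense + w_sparse * sparse + w_colbert * colbert) / total

# Equal weights: plain average of the three scores.
combined_equal = combine_scores(0.8, 0.2, 0.6)

# Doubling the dense weight pulls the result toward the dense score.
combined_dense_heavy = combine_scores(0.8, 0.2, 0.6, weights=(2.0, 1.0, 1.0))
```

Dividing by the weight total keeps the combined score on the same scale as its inputs, so changing the weights shifts emphasis without inflating the result.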