# Implementation:Speechbrain Speechbrain Get Verification Scores
| Property | Value |
|---|---|
| Implementation Name | Get Verification Scores |
| Type | API Doc |
| Repository | speechbrain/speechbrain |
| Source File | recipes/VoxCeleb/SpeakerRec/speaker_verification_cosine.py:L82-155 (scores), L54-79 (embedding loop) |
| Import | Recipe-specific. Uses `torch.nn.CosineSimilarity` |
| Related Principle | Principle:Speechbrain_Speechbrain_Speaker_Verification_Scoring |
## API Signatures

### get_verification_scores

```python
def get_verification_scores(veri_test):
    """Computes positive and negative scores given the verification split.

    Arguments
    ---------
    veri_test : list
        List of verification trial strings, each formatted as
        "label enrol_id test_id".

    Returns
    -------
    positive_scores : list
        Cosine similarity scores for same-speaker trials (label=1).
    negative_scores : list
        Cosine similarity scores for different-speaker trials (label=0).
    """
```

### compute_embedding_loop

```python
def compute_embedding_loop(data_loader):
    """Computes the embeddings of all the waveforms specified in the dataloader.

    Arguments
    ---------
    data_loader : DataLoader
        DataLoader yielding batches with .id and .sig attributes.

    Returns
    -------
    embedding_dict : dict
        Dictionary mapping segment IDs (str) to embedding tensors.
    """
```
## Description

These functions implement the speaker verification scoring pipeline. `compute_embedding_loop` pre-computes embeddings for all utterances in a given DataLoader. `get_verification_scores` then iterates over a list of verification trial pairs, computes the cosine similarity between enrollment and test embeddings, optionally applies score normalization, and returns separate lists of positive (same-speaker) and negative (different-speaker) scores.
## Parameters

### get_verification_scores

| Parameter | Type | Description |
|---|---|---|
| veri_test | list of str | Verification trial list. Each string has the format `"label enrol_id test_id"`, where label is 1 (same speaker) or 0 (different speaker). |

The function also depends on module-level variables:

- `enrol_dict`: Dictionary of enrollment embeddings (from `compute_embedding_loop`)
- `test_dict`: Dictionary of test embeddings
- `train_dict`: Dictionary of training embeddings (for the score-normalization cohort)
- `params`: Hyperparameters dictionary
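The trial-string format above can be illustrated with a minimal parsing sketch. The helper name and the segment IDs below are hypothetical placeholders, not real VoxCeleb paths or recipe code:

```python
# Sketch: parse one verification trial line of the form "label enrol_id test_id".
# parse_trial is a hypothetical helper; IDs are made-up placeholders.
def parse_trial(line):
    label_str, enrol_id, test_id = line.strip().split(" ")
    return int(label_str), enrol_id, test_id

label, enrol_id, test_id = parse_trial("1 spk1/utt_a spk1/utt_b")
print(label, enrol_id, test_id)  # 1 spk1/utt_a spk1/utt_b
```

The recipe itself does the same split per field (see the scoring loop below in this page), additionally stripping file extensions from the IDs.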
### compute_embedding_loop

| Parameter | Type | Description |
|---|---|---|
| data_loader | DataLoader | A SpeechBrain DataLoader yielding PaddedBatch objects with `.id` (list of segment IDs) and `.sig` (waveforms, lengths). |
## Inputs
- Verification pairs file: Loaded as a list of strings, one trial per line.
- Pre-computed embedding dictionaries: Enrollment, test, and (optionally) training embeddings stored in memory.
- DataLoader objects: For enrollment, test, and (optionally) training data.
## Outputs
- positive_scores (list of float): Cosine similarity scores for target (same-speaker) trials.
- negative_scores (list of float): Cosine similarity scores for non-target (different-speaker) trials.
- scores.txt (file): Written to `params["output_folder"]/scores.txt` with format: `enrol_id test_id label score`.
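The scores.txt line format can be sketched with plain string formatting. The trials and score values below are made-up illustrative data, and an in-memory buffer stands in for the real output file:

```python
import io

# Sketch: write trial results in the "enrol_id test_id label score" format.
# Trials and scores are made-up values; io.StringIO stands in for
# open(scores_path, "w") in the real recipe.
trials = [
    ("spk1/utt_a", "spk1/utt_b", 1, 0.83),
    ("spk1/utt_a", "spk2/utt_c", 0, 0.12),
]

buf = io.StringIO()
for enrol_id, test_id, label, score in trials:
    buf.write(f"{enrol_id} {test_id} {label} {score}\n")

print(buf.getvalue(), end="")
```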
## Implementation Details

### Embedding Loop

```python
def compute_embedding_loop(data_loader):
    embedding_dict = {}
    with torch.no_grad():
        for batch in tqdm(data_loader, dynamic_ncols=True):
            batch = batch.to(run_opts["device"])
            seg_ids = batch.id
            wavs, lens = batch.sig
            # Skip if all segments already computed
            found = False
            for seg_id in seg_ids:
                if seg_id not in embedding_dict:
                    found = True
            if not found:
                continue
            wavs, lens = wavs.to(run_opts["device"]), lens.to(run_opts["device"])
            emb = compute_embedding(wavs, lens).unsqueeze(1)
            for i, seg_id in enumerate(seg_ids):
                embedding_dict[seg_id] = emb[i].detach().clone()
    return embedding_dict
```
Key behaviors:

- Processes all batches under `torch.no_grad()` for efficiency.
- Checks whether all segment IDs in a batch are already computed, skipping redundant computation.
- Each embedding is detached and cloned to prevent memory leaks from the computation graph.
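The batch-skip check can be isolated as a small predicate. The helper name below is hypothetical, not part of the recipe, and the cache contents are placeholder strings:

```python
# Sketch of the batch-skip logic: a batch needs recomputation only if at
# least one of its segment IDs is missing from the cache.
# needs_computation is a hypothetical helper name.
def needs_computation(seg_ids, embedding_dict):
    return any(seg_id not in embedding_dict for seg_id in seg_ids)

cache = {"seg1": "emb1", "seg2": "emb2"}
print(needs_computation(["seg1", "seg2"], cache))  # False (fully cached)
print(needs_computation(["seg2", "seg3"], cache))  # True (seg3 missing)
```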
### Scoring with Cosine Similarity

```python
similarity = torch.nn.CosineSimilarity(dim=-1, eps=1e-6)
for i, line in enumerate(veri_test):
    lab_pair = int(line.split(" ")[0].rstrip().split(".")[0].strip())
    enrol_id = line.split(" ")[1].rstrip().split(".")[0].strip()
    test_id = line.split(" ")[2].rstrip().split(".")[0].strip()
    enrol = enrol_dict[enrol_id]
    test = test_dict[test_id]
    score = similarity(enrol, test)[0]
    if lab_pair == 1:
        positive_scores.append(score)
    else:
        negative_scores.append(score)
```
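Per trial, `torch.nn.CosineSimilarity` computes a normalized dot product. A dependency-free sketch of that math (with made-up embedding vectors) shows why same-direction embeddings score near 1 and near-orthogonal ones score near 0:

```python
import math

# Plain-Python sketch of the cosine score used per trial.
# (The recipe applies torch.nn.CosineSimilarity(dim=-1, eps=1e-6) to tensors;
# the vectors below are made-up stand-ins for speaker embeddings.)
def cosine(u, v, eps=1e-6):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

enrol = [0.2, 0.9, 0.1]
test_same = [0.4, 1.8, 0.2]   # same direction -> score 1.0
test_diff = [0.9, -0.2, 0.0]  # orthogonal -> score 0.0
print(round(cosine(enrol, test_same), 4))  # 1.0
print(round(cosine(enrol, test_diff), 4))  # 0.0
```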
### Score Normalization

When `"score_norm"` is set in `params`, the function applies normalization using a training cohort:

```python
# Z-norm: normalize by enrollment impostor statistics
if params["score_norm"] == "z-norm":
    enrol_rep = enrol.repeat(train_cohort.shape[0], 1, 1)
    score_e_c = similarity(enrol_rep, train_cohort)
    if "cohort_size" in params:
        score_e_c = torch.topk(score_e_c, k=params["cohort_size"], dim=0)[0]
    mean_e_c = torch.mean(score_e_c, dim=0)
    std_e_c = torch.std(score_e_c, dim=0)
    score = (score - mean_e_c) / std_e_c
# T-norm: normalize by test impostor statistics
elif params["score_norm"] == "t-norm":
    score = (score - mean_t_c) / std_t_c
# S-norm: symmetric (average of z-norm and t-norm)
elif params["score_norm"] == "s-norm":
    score_e = (score - mean_e_c) / std_e_c
    score_t = (score - mean_t_c) / std_t_c
    score = 0.5 * (score_e + score_t)
```
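The normalization arithmetic can be checked with plain numbers. The raw score and the impostor-cohort scores below are made up; `statistics.stdev` matches the unbiased (n-1) estimator that `torch.std` uses by default:

```python
import statistics

# Sketch of z-/t-/s-norm arithmetic on one raw trial score, using made-up
# impostor cohort scores for the enrollment and test sides.
raw = 0.6
enrol_cohort = [0.1, 0.2, 0.3]  # similarity(enrol, cohort) scores
test_cohort = [0.0, 0.2, 0.4]   # similarity(test, cohort) scores

mean_e_c, std_e_c = statistics.mean(enrol_cohort), statistics.stdev(enrol_cohort)
mean_t_c, std_t_c = statistics.mean(test_cohort), statistics.stdev(test_cohort)

z_norm = (raw - mean_e_c) / std_e_c        # (0.6 - 0.2) / 0.1 = 4.0
t_norm = (raw - mean_t_c) / std_t_c        # (0.6 - 0.2) / 0.2 = 2.0
s_norm = 0.5 * (z_norm + t_norm)           # 3.0
print(z_norm, t_norm, s_norm)
```

A wider, well-matched cohort makes these statistics more stable, which is why the recipe optionally restricts the cohort to the top-k most similar impostors via `torch.topk`.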
## Usage Example

```python
import os
import sys

import torch
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml
from speechbrain.utils.metric_stats import EER, minDCF

# Load params and pretrained model
params_file, run_opts, overrides = sb.core.parse_arguments(sys.argv[1:])
with open(params_file) as fin:
    params = load_hyperpyyaml(fin, overrides)
params["pretrainer"].collect_files()
params["pretrainer"].load_collected()
params["embedding_model"].eval()

# Create dataloaders
train_dataloader, enrol_dataloader, test_dataloader = dataio_prep(params)

# Compute embeddings
enrol_dict = compute_embedding_loop(enrol_dataloader)
test_dict = compute_embedding_loop(test_dataloader)

# Load verification trials
with open(veri_file_path) as f:
    veri_test = [line.rstrip() for line in f]

# Compute scores
positive_scores, negative_scores = get_verification_scores(veri_test)

# Evaluate
eer, th = EER(torch.tensor(positive_scores), torch.tensor(negative_scores))
min_dcf, th = minDCF(torch.tensor(positive_scores), torch.tensor(negative_scores))
print(f"EER: {eer * 100:.2f}%")
print(f"minDCF: {min_dcf:.4f}")
```
## See Also
- Principle:Speechbrain_Speechbrain_Speaker_Verification_Scoring
- Implementation:Speechbrain_Speechbrain_Compute_Embeddings
- Implementation:Speechbrain_Speechbrain_EER_And_MinDCF