# Implementation: Cleanlab `get_label_quality_multiannotator`
| | |
|---|---|
| API | `multiannotator.get_label_quality_multiannotator` |
| Source | `cleanlab/multiannotator.py:L46-58` |
| Domains | Machine Learning, Data Quality, Crowdsourcing |
| Last Updated | 2026-02-09 |
## Overview
Implementation of the CROWDLAB algorithm for estimating consensus labels, quality scores, and annotator statistics from crowdsourced annotations combined with model predictions.
## Description
This function takes a matrix of annotations from multiple annotators (with NaN for missing annotations) and model predicted probabilities, then returns a comprehensive dictionary containing:
- Consensus labels: Best estimate of the true label for each example, computed using either the CROWDLAB algorithm or majority voting.
- Label quality scores: Per-example confidence in the consensus label.
- Detailed quality information: Per-example, per-annotator agreement details (optional).
- Annotator statistics: Reliability scores for each annotator (optional).
- Learned weights: Model weight and per-annotator weights from the CROWDLAB algorithm (optional).
The function supports multiple consensus methods that can be requested simultaneously, and offers optional temperature scaling for calibrating model probabilities.
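For reference, the intuition behind the `"majority_vote"` consensus baseline can be sketched in a few lines. This is a simplification for illustration only: cleanlab's own implementation handles ties and other edge cases differently.

```python
import numpy as np

def majority_vote_consensus(labels: np.ndarray) -> np.ndarray:
    """Per example, pick the class chosen by the most annotators,
    ignoring NaN (missing) entries. Ties break toward the lowest
    class index here; cleanlab's own tie-breaking is more involved."""
    consensus = np.empty(labels.shape[0], dtype=int)
    for i, row in enumerate(labels):
        votes = row[~np.isnan(row)].astype(int)
        consensus[i] = int(np.argmax(np.bincount(votes)))
    return consensus

labels = np.array([
    [0, 0, 1],       # two annotators say 0, one says 1
    [1, 1, np.nan],  # third annotator did not label this example
    [2, 1, 2],
])
print(majority_vote_consensus(labels))  # [0 1 2]
```

The CROWDLAB (`"best_quality"`) method improves on this baseline by additionally weighting each annotator and the model's predictions.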
## Usage
This function is the primary entry point for multiannotator label quality analysis. It is used after collecting crowd annotations and training a classifier. The results drive downstream decisions about label quality, annotator management, and data cleaning.
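For instance, a common downstream step is to flag low-confidence examples for re-annotation. The sketch below assumes a `label_quality` DataFrame shaped like this function's output (the `consensus_quality_score` values and the 0.5 threshold are arbitrary illustrations):

```python
import pandas as pd

# Hypothetical stand-in for results["label_quality"]; the real
# DataFrame is produced by get_label_quality_multiannotator.
label_quality = pd.DataFrame({
    "consensus_label": [0, 1, 2, 1, 0],
    "consensus_quality_score": [0.95, 0.88, 0.41, 0.92, 0.37],
})

# Flag examples whose consensus label is uncertain enough to
# warrant re-annotation (threshold chosen for illustration).
needs_review = label_quality[label_quality["consensus_quality_score"] < 0.5]
print(needs_review.index.tolist())  # [2, 4]
```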
## Code Reference

### Source Location

`cleanlab/multiannotator.py`, lines 46-58.
### Signature

```python
def get_label_quality_multiannotator(
    labels_multiannotator: Union[pd.DataFrame, np.ndarray],
    pred_probs: np.ndarray,
    *,
    consensus_method: Union[str, List[str]] = "best_quality",
    quality_method: str = "crowdlab",
    calibrate_probs: bool = False,
    return_detailed_quality: bool = True,
    return_annotator_stats: bool = True,
    return_weights: bool = False,
    verbose: bool = True,
    label_quality_score_kwargs: dict = {},
) -> Dict[str, Any]
```
### Import

```python
from cleanlab.multiannotator import get_label_quality_multiannotator
```
## I/O Contract

### Inputs

| Parameter | Type | Description |
|---|---|---|
| `labels_multiannotator` | `Union[pd.DataFrame, np.ndarray]` | Matrix of shape `(N, M)`, where `N` is the number of examples and `M` is the number of annotators. Each entry is an integer class label, or `NaN` if the annotator did not label that example. |
| `pred_probs` | `np.ndarray` | Array of shape `(N, K)` containing the model's predicted class probabilities for each example, where `K` is the number of classes. |
| `consensus_method` | `Union[str, List[str]]` | Method(s) for computing consensus labels. Options: `"best_quality"` (CROWDLAB) and `"majority_vote"`. Accepts a single string, or a list to compute multiple consensus methods. Defaults to `"best_quality"`. |
| `quality_method` | `str` | Method for computing label quality scores. Options: `"crowdlab"` (uses annotator weights and the model) and `"agreement"` (uses inter-annotator agreement only). Defaults to `"crowdlab"`. |
| `calibrate_probs` | `bool` | If `True`, applies temperature scaling to calibrate the model's predicted probabilities before use. Defaults to `False`. |
| `return_detailed_quality` | `bool` | If `True`, includes per-example, per-annotator agreement details in the output. Defaults to `True`. |
| `return_annotator_stats` | `bool` | If `True`, includes per-annotator reliability statistics in the output. Defaults to `True`. |
| `return_weights` | `bool` | If `True`, includes the learned model weight and annotator weights in the output. Defaults to `False`. |
| `verbose` | `bool` | If `True`, prints progress information. Defaults to `True`. |
| `label_quality_score_kwargs` | `dict` | Additional keyword arguments passed to the label quality scoring function. Defaults to an empty dict. |
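For intuition on `calibrate_probs`: temperature scaling rescales the model's probabilities so they better reflect true confidence. The sketch below applies a fixed temperature to show the mechanics; cleanlab fits the temperature internally, so the function and `temperature` value here are illustrative assumptions, not the library's API.

```python
import numpy as np

def temperature_scale(pred_probs: np.ndarray, temperature: float) -> np.ndarray:
    """Generic temperature scaling: T > 1 softens probabilities toward
    uniform, T < 1 sharpens them. The class ranking is unchanged."""
    logits = np.log(np.clip(pred_probs, 1e-12, None))
    scaled = np.exp(logits / temperature)
    return scaled / scaled.sum(axis=1, keepdims=True)

probs = np.array([[0.8, 0.1, 0.1]])
softened = temperature_scale(probs, temperature=2.0)
# softened still sums to 1 and keeps class 0 on top,
# but with less extreme confidence than the raw 0.8
```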
### Outputs

| Key | Type | Description |
|---|---|---|
| `"label_quality"` | `pd.DataFrame` | DataFrame with columns for the consensus label, consensus quality score, and related metrics. One row per example. |
| `"detailed_label_quality"` | `pd.DataFrame` | (When `return_detailed_quality=True`) DataFrame with per-annotator agreement details for each example. |
| `"annotator_stats"` | `pd.DataFrame` | (When `return_annotator_stats=True`) DataFrame with reliability statistics for each annotator, including quality scores and agreement rates. |
| `"model_weight"` | `float` | (When `return_weights=True`) The learned weight for the model's predictions. |
| `"annotator_weight"` | `np.ndarray` | (When `return_weights=True`) Array of learned weights for each annotator. |
## Usage Examples

```python
import numpy as np
import pandas as pd

from cleanlab.multiannotator import get_label_quality_multiannotator

# 5 examples labeled by 3 annotators (NaN = not labeled)
labels_multiannotator = pd.DataFrame({
    "annotator_0": [0, 1, 2, 1, 0],
    "annotator_1": [0, 1, 1, 1, np.nan],
    "annotator_2": [1, 1, 2, np.nan, 0],
})

# Model predicted probabilities (K=3 classes)
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.05, 0.9, 0.05],
    [0.1, 0.3, 0.6],
    [0.1, 0.85, 0.05],
    [0.7, 0.2, 0.1],
])

# Full multiannotator analysis
results = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method="best_quality",
    quality_method="crowdlab",
    return_detailed_quality=True,
    return_annotator_stats=True,
    return_weights=True,
)

# Access consensus labels and quality scores
label_quality = results["label_quality"]
consensus_labels = label_quality["consensus_label"]
quality_scores = label_quality["consensus_quality_score"]

# Access annotator statistics
annotator_stats = results["annotator_stats"]

# Access learned weights
model_weight = results["model_weight"]
annotator_weights = results["annotator_weight"]

# Compute multiple consensus methods in one call
results_multi = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method=["best_quality", "majority_vote"],
)
```
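The `"agreement"` quality method is grounded in inter-annotator agreement. Its core intuition, the fraction of annotators who agree with the consensus label, can be sketched as below; cleanlab's actual score incorporates additional factors, so treat this as a conceptual approximation.

```python
import numpy as np

def annotator_agreement(labels: np.ndarray, consensus: np.ndarray) -> np.ndarray:
    """Per-example fraction of annotators (among those who labeled it)
    that agree with the consensus label. NaN entries never count as
    agreement and are excluded from the denominator."""
    labeled = ~np.isnan(labels)
    agree = (labels == consensus[:, None]) & labeled
    return agree.sum(axis=1) / labeled.sum(axis=1)

labels = np.array([
    [0.0, 0.0, 1.0],     # two of three annotators agree with consensus 0
    [1.0, 1.0, np.nan],  # both labeling annotators agree with consensus 1
])
consensus = np.array([0, 1])
scores = annotator_agreement(labels, consensus)  # ~[0.667, 1.0]
```

Low agreement on an example suggests either genuine ambiguity or annotator error, which is exactly the signal the `"crowdlab"` method refines with model predictions and per-annotator weights.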