# Implementation: Cleanlab `get_label_quality_multiannotator`
| | |
|---|---|
| API | `multiannotator.get_label_quality_multiannotator` |
| Source | `cleanlab/multiannotator.py:L46-58` |
| Domains | Machine Learning, Data Quality, Crowdsourcing |
| Last Updated | 2026-02-09 |
## Overview
Implementation of the CROWDLAB algorithm for estimating consensus labels, quality scores, and annotator statistics from crowdsourced annotations combined with model predictions.
## Description
This function takes a matrix of annotations from multiple annotators (with NaN for missing annotations) and model predicted probabilities, then returns a comprehensive dictionary containing:
- Consensus labels: Best estimate of the true label for each example, computed using either the CROWDLAB algorithm or majority voting.
- Label quality scores: Per-example confidence in the consensus label.
- Detailed quality information: Per-example, per-annotator agreement details (optional).
- Annotator statistics: Reliability scores for each annotator (optional).
- Learned weights: Model weight and per-annotator weights from the CROWDLAB algorithm (optional).
The function supports multiple consensus methods that can be requested simultaneously, and offers optional temperature scaling for calibrating model probabilities.
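For reference, the intuition behind the `"majority_vote"` consensus baseline can be sketched in a few lines. This is a simplification for illustration only: cleanlab's own implementation handles ties and other edge cases differently.

```python
import numpy as np

def majority_vote_consensus(labels: np.ndarray) -> np.ndarray:
    """Per example, pick the class chosen by the most annotators,
    ignoring NaN (missing) entries. Ties break toward the lowest
    class index here; cleanlab's own tie-breaking is more involved."""
    consensus = np.empty(labels.shape[0], dtype=int)
    for i, row in enumerate(labels):
        votes = row[~np.isnan(row)].astype(int)
        consensus[i] = int(np.argmax(np.bincount(votes)))
    return consensus

labels = np.array([
    [0, 0, 1],       # two annotators say 0, one says 1
    [1, 1, np.nan],  # third annotator did not label this example
    [2, 1, 2],
])
print(majority_vote_consensus(labels))  # [0 1 2]
```

The CROWDLAB (`"best_quality"`) method improves on this baseline by additionally weighting each annotator and the model's predictions.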
## Usage
This function is the primary entry point for multiannotator label quality analysis. It is used after collecting crowd annotations and training a classifier. The results drive downstream decisions about label quality, annotator management, and data cleaning.
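For instance, a common downstream step is to flag low-confidence examples for re-annotation. The sketch below assumes a `label_quality` DataFrame shaped like this function's output (the `consensus_quality_score` values and the 0.5 threshold are arbitrary illustrations):

```python
import pandas as pd

# Hypothetical stand-in for results["label_quality"]; the real
# DataFrame is produced by get_label_quality_multiannotator.
label_quality = pd.DataFrame({
    "consensus_label": [0, 1, 2, 1, 0],
    "consensus_quality_score": [0.95, 0.88, 0.41, 0.92, 0.37],
})

# Flag examples whose consensus label is uncertain enough to
# warrant re-annotation (threshold chosen for illustration).
needs_review = label_quality[label_quality["consensus_quality_score"] < 0.5]
print(needs_review.index.tolist())  # [2, 4]
```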
## Code Reference

### Source Location

`cleanlab/multiannotator.py`, lines 46-58.
### Signature

```python
def get_label_quality_multiannotator(
    labels_multiannotator: Union[pd.DataFrame, np.ndarray],
    pred_probs: np.ndarray,
    *,
    consensus_method: Union[str, List[str]] = "best_quality",
    quality_method: str = "crowdlab",
    calibrate_probs: bool = False,
    return_detailed_quality: bool = True,
    return_annotator_stats: bool = True,
    return_weights: bool = False,
    verbose: bool = True,
    label_quality_score_kwargs: dict = {},
) -> Dict[str, Any]
```
### Import

```python
from cleanlab.multiannotator import get_label_quality_multiannotator
```
## I/O Contract

### Inputs

| Parameter | Type | Description |
|---|---|---|
| `labels_multiannotator` | `Union[pd.DataFrame, np.ndarray]` | Matrix of shape `(N, M)`, where `N` is the number of examples and `M` is the number of annotators. Each entry is an integer class label, or `NaN` if the annotator did not label that example. |
| `pred_probs` | `np.ndarray` | Array of shape `(N, K)` containing the model's predicted class probabilities for each example, where `K` is the number of classes. |
| `consensus_method` | `Union[str, List[str]]` | Method(s) for computing consensus labels. Options: `"best_quality"` (CROWDLAB) and `"majority_vote"`. Accepts a single string, or a list to compute multiple consensus methods. Defaults to `"best_quality"`. |
| `quality_method` | `str` | Method for computing label quality scores. Options: `"crowdlab"` (uses annotator weights and the model) and `"agreement"` (uses inter-annotator agreement only). Defaults to `"crowdlab"`. |
| `calibrate_probs` | `bool` | If `True`, applies temperature scaling to calibrate the model's predicted probabilities before use. Defaults to `False`. |
| `return_detailed_quality` | `bool` | If `True`, includes per-example, per-annotator agreement details in the output. Defaults to `True`. |
| `return_annotator_stats` | `bool` | If `True`, includes per-annotator reliability statistics in the output. Defaults to `True`. |
| `return_weights` | `bool` | If `True`, includes the learned model weight and annotator weights in the output. Defaults to `False`. |
| `verbose` | `bool` | If `True`, prints progress information. Defaults to `True`. |
| `label_quality_score_kwargs` | `dict` | Additional keyword arguments passed to the label quality scoring function. Defaults to an empty dict. |
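For intuition on `calibrate_probs`: temperature scaling rescales the model's probabilities so they better reflect true confidence. The sketch below applies a fixed temperature to show the mechanics; cleanlab fits the temperature internally, so the function and `temperature` value here are illustrative assumptions, not the library's API.

```python
import numpy as np

def temperature_scale(pred_probs: np.ndarray, temperature: float) -> np.ndarray:
    """Generic temperature scaling: T > 1 softens probabilities toward
    uniform, T < 1 sharpens them. The class ranking is unchanged."""
    logits = np.log(np.clip(pred_probs, 1e-12, None))
    scaled = np.exp(logits / temperature)
    return scaled / scaled.sum(axis=1, keepdims=True)

probs = np.array([[0.8, 0.1, 0.1]])
softened = temperature_scale(probs, temperature=2.0)
# softened still sums to 1 and keeps class 0 on top,
# but with less extreme confidence than the raw 0.8
```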
### Outputs

| Key | Type | Description |
|---|---|---|
| `"label_quality"` | `pd.DataFrame` | DataFrame with columns for the consensus label, consensus quality score, and related metrics. One row per example. |
| `"detailed_label_quality"` | `pd.DataFrame` | (When `return_detailed_quality=True`) DataFrame with per-annotator agreement details for each example. |
| `"annotator_stats"` | `pd.DataFrame` | (When `return_annotator_stats=True`) DataFrame with reliability statistics for each annotator, including quality scores and agreement rates. |
| `"model_weight"` | `float` | (When `return_weights=True`) The learned weight for the model's predictions. |
| `"annotator_weight"` | `np.ndarray` | (When `return_weights=True`) Array of learned weights for each annotator. |
## Usage Examples

```python
import numpy as np
import pandas as pd

from cleanlab.multiannotator import get_label_quality_multiannotator

# 5 examples labeled by 3 annotators (NaN = not labeled)
labels_multiannotator = pd.DataFrame({
    "annotator_0": [0, 1, 2, 1, 0],
    "annotator_1": [0, 1, 1, 1, np.nan],
    "annotator_2": [1, 1, 2, np.nan, 0],
})

# Model predicted probabilities (K=3 classes)
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.05, 0.9, 0.05],
    [0.1, 0.3, 0.6],
    [0.1, 0.85, 0.05],
    [0.7, 0.2, 0.1],
])

# Full multiannotator analysis
results = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method="best_quality",
    quality_method="crowdlab",
    return_detailed_quality=True,
    return_annotator_stats=True,
    return_weights=True,
)

# Access consensus labels and quality scores
label_quality = results["label_quality"]
consensus_labels = label_quality["consensus_label"]
quality_scores = label_quality["consensus_quality_score"]

# Access annotator statistics
annotator_stats = results["annotator_stats"]

# Access learned weights
model_weight = results["model_weight"]
annotator_weights = results["annotator_weight"]

# Compute multiple consensus methods in one call
results_multi = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method=["best_quality", "majority_vote"],
)
```
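The `"agreement"` quality method is grounded in inter-annotator agreement. Its core intuition, the fraction of annotators who agree with the consensus label, can be sketched as below; cleanlab's actual score incorporates additional factors, so treat this as a conceptual approximation.

```python
import numpy as np

def annotator_agreement(labels: np.ndarray, consensus: np.ndarray) -> np.ndarray:
    """Per-example fraction of annotators (among those who labeled it)
    that agree with the consensus label. NaN entries never count as
    agreement and are excluded from the denominator."""
    labeled = ~np.isnan(labels)
    agree = (labels == consensus[:, None]) & labeled
    return agree.sum(axis=1) / labeled.sum(axis=1)

labels = np.array([
    [0.0, 0.0, 1.0],     # two of three annotators agree with consensus 0
    [1.0, 1.0, np.nan],  # both labeling annotators agree with consensus 1
])
consensus = np.array([0, 1])
scores = annotator_agreement(labels, consensus)  # ~[0.667, 1.0]
```

Low agreement on an example suggests either genuine ambiguity or annotator error, which is exactly the signal the `"crowdlab"` method refines with model predictions and per-annotator weights.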