
Implementation:Cleanlab Get Label Quality Multiannotator

From Leeroopedia


API multiannotator.get_label_quality_multiannotator
Source cleanlab/multiannotator.py:L46-58
Domains Machine_Learning, Data_Quality, Crowdsourcing
Last Updated 2026-02-09

Overview

Implementation of the CROWDLAB algorithm for estimating consensus labels, quality scores, and annotator statistics from crowdsourced annotations combined with model predictions.

Description

This function takes a matrix of annotations from multiple annotators (with NaN for missing annotations) and model predicted probabilities, then returns a comprehensive dictionary containing:

  1. Consensus labels: Best estimate of the true label for each example, computed using either the CROWDLAB algorithm or majority voting.
  2. Label quality scores: Per-example confidence in the consensus label.
  3. Detailed quality information: Per-example, per-annotator agreement details (optional).
  4. Annotator statistics: Reliability scores for each annotator (optional).
  5. Learned weights: Model weight and per-annotator weights from the CROWDLAB algorithm (optional).

The function supports multiple consensus methods that can be requested simultaneously, and offers optional temperature scaling for calibrating model probabilities.
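As a rough illustration of the weighted-vote idea behind CROWDLAB, consider the following simplified sketch. This is not cleanlab's actual implementation: the `weighted_consensus` function, its weight values, and the use of -1 for missing annotations are all hypothetical, chosen only to show how model predictions and annotator labels can vote with different weights.

```python
import numpy as np

def weighted_consensus(labels, pred_probs, annotator_weights, model_weight):
    """Toy weighted-vote consensus (illustrative only, not cleanlab's code).

    labels: (N, M) int array of annotator labels, -1 marking a missing annotation.
    pred_probs: (N, K) array of model class probabilities.
    Returns an (N,) array of consensus labels.
    """
    n, k = pred_probs.shape
    # Start each example's class scores with the model's weighted vote.
    scores = model_weight * pred_probs.copy()
    # Add each annotator's weighted vote for the class they chose.
    for m, w in enumerate(annotator_weights):
        for i in range(n):
            if labels[i, m] >= 0:  # annotator m labeled example i
                scores[i, labels[i, m]] += w
    return scores.argmax(axis=1)

labels = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [2, 1, 2]])
pred_probs = np.array([[0.80, 0.10, 0.10],
                       [0.05, 0.90, 0.05],
                       [0.10, 0.30, 0.60]])
consensus = weighted_consensus(labels, pred_probs,
                               annotator_weights=[0.5, 0.8, 0.3],
                               model_weight=1.0)
# consensus is [0, 1, 2]: the model plus two agreeing annotators outvote
# the single disagreeing annotator on each example.
```

In the real algorithm the model weight and per-annotator weights are learned from the data (and returned when return_weights=True), rather than supplied by hand as above.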

Usage

This function is the primary entry point for multiannotator label quality analysis. It is used after collecting crowd annotations and training a classifier. The results drive downstream decisions about label quality, annotator management, and data cleaning.
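The pred_probs passed in should ideally be out-of-sample predictions; one common way to obtain them is cross-validation. The sketch below assumes scikit-learn is available and that `y` holds one provisional label per example (for instance a majority vote over the annotators); the classifier choice is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))          # features for 60 examples
y = rng.integers(0, 3, size=60)       # provisional labels, e.g. majority vote

# Out-of-sample predicted probabilities: each example is scored by a model
# that never saw it during training.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=3, method="predict_proba"
)
# pred_probs has shape (60, 3) and can be passed to
# get_label_quality_multiannotator alongside the annotation matrix.
```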

Code Reference

Source Location

cleanlab/multiannotator.py, lines 46-58.

Signature

def get_label_quality_multiannotator(
    labels_multiannotator: Union[pd.DataFrame, np.ndarray],
    pred_probs: np.ndarray,
    *,
    consensus_method: Union[str, List[str]] = "best_quality",
    quality_method: str = "crowdlab",
    calibrate_probs: bool = False,
    return_detailed_quality: bool = True,
    return_annotator_stats: bool = True,
    return_weights: bool = False,
    verbose: bool = True,
    label_quality_score_kwargs: dict = {},
) -> Dict[str, Any]

Import

from cleanlab.multiannotator import get_label_quality_multiannotator

I/O Contract

Inputs

labels_multiannotator (Union[pd.DataFrame, np.ndarray]): Matrix of shape (N, M), where N is the number of examples and M is the number of annotators. Each entry is an integer class label, or NaN if the annotator did not label that example.
pred_probs (np.ndarray): Array of shape (N, K) containing the model's predicted class probabilities for each example, where K is the number of classes.
consensus_method (Union[str, List[str]]): Method(s) for computing consensus labels. Options: "best_quality" (CROWDLAB) and "majority_vote". Can be a single string, or a list to compute multiple consensus methods at once. Defaults to "best_quality".
quality_method (str): Method for computing label quality scores. Options: "crowdlab" (uses annotator weights and the model) and "agreement" (uses inter-annotator agreement only). Defaults to "crowdlab".
calibrate_probs (bool): If True, applies temperature scaling to calibrate the model's predicted probabilities before use. Defaults to False.
return_detailed_quality (bool): If True, includes per-example, per-annotator agreement details in the output. Defaults to True.
return_annotator_stats (bool): If True, includes per-annotator reliability statistics in the output. Defaults to True.
return_weights (bool): If True, includes the learned model weight and annotator weights in the output. Defaults to False.
verbose (bool): If True, prints progress information. Defaults to True.
label_quality_score_kwargs (dict): Additional keyword arguments passed to the label quality scoring function. Defaults to an empty dict.

Outputs

"label_quality" (pd.DataFrame): One row per example, with columns for the consensus label, consensus quality score, and related metrics.
"detailed_label_quality" (pd.DataFrame; present when return_detailed_quality=True): Per-annotator agreement details for each example.
"annotator_stats" (pd.DataFrame; present when return_annotator_stats=True): Reliability statistics for each annotator, including quality scores and agreement rates.
"model_weight" (float; present when return_weights=True): The learned weight for the model's predictions.
"annotator_weight" (np.ndarray; present when return_weights=True): Array of learned weights for each annotator.

Usage Examples

import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_label_quality_multiannotator

# 5 examples labeled by 3 annotators (NaN = not labeled)
labels_multiannotator = pd.DataFrame({
    "annotator_0": [0, 1, 2, 1, 0],
    "annotator_1": [0, 1, 1, 1, np.nan],
    "annotator_2": [1, 1, 2, np.nan, 0],
})

# Model predictions (K=3 classes)
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.05, 0.9, 0.05],
    [0.1, 0.3, 0.6],
    [0.1, 0.85, 0.05],
    [0.7, 0.2, 0.1],
])

# Get full multiannotator analysis
results = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method="best_quality",
    quality_method="crowdlab",
    return_detailed_quality=True,
    return_annotator_stats=True,
    return_weights=True,
)

# Access consensus labels and quality scores
label_quality = results["label_quality"]
consensus_labels = label_quality["consensus_label"]
quality_scores = label_quality["consensus_quality_score"]

# Access annotator statistics
annotator_stats = results["annotator_stats"]

# Access learned weights
model_weight = results["model_weight"]
annotator_weights = results["annotator_weight"]

# Use multiple consensus methods simultaneously
results_multi = get_label_quality_multiannotator(
    labels_multiannotator,
    pred_probs,
    consensus_method=["best_quality", "majority_vote"],
)
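The quality scores returned above can drive the data-cleaning decisions mentioned earlier, for example by reviewing the lowest-scoring examples first. The sketch below uses a small stand-in DataFrame in place of results["label_quality"] (its values are made up, and the 0.5 review threshold is an arbitrary illustration, not a library default).

```python
import pandas as pd

# Stand-in for results["label_quality"] with illustrative values.
label_quality = pd.DataFrame({
    "consensus_label": [0, 1, 2, 1, 0],
    "consensus_quality_score": [0.55, 0.97, 0.62, 0.91, 0.35],
})

# Rank examples by consensus quality, lowest first, to prioritize review.
worst_first = label_quality.sort_values("consensus_quality_score")

# Flag examples below an arbitrary review threshold.
needs_review = worst_first[worst_first["consensus_quality_score"] < 0.5]
# Here only the example with score 0.35 is flagged for re-annotation.
```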
