
Implementation:Cleanlab Get Active Learning Scores

From Leeroopedia


API multiannotator.get_active_learning_scores
Source cleanlab/multiannotator.py:L564-568
Domains Machine_Learning, Data_Quality, Crowdsourcing
Last Updated 2026-02-09

Overview

Implementation of the ActiveLab algorithm for computing active learning priority scores that determine which examples most need additional annotations. Returns comparable scores for both already-labeled and unlabeled examples.

Description

This function computes active learning scores for two pools of data:

  1. Labeled pool: Examples that already have one or more annotator labels. The score reflects how much additional annotation would improve consensus quality, considering both annotator agreement and model confidence.
  2. Unlabeled pool: Examples with no annotations. The score reflects model uncertainty, with lower scores indicating examples where the model is most uncertain.

Both score types are calibrated to be directly comparable, enabling a unified ranking across both pools. Lower scores indicate higher priority for additional labeling effort.
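Because the two score arrays share a scale, they can be merged into a single priority ranking while keeping track of which pool each example came from. A minimal sketch using illustrative score values (not actual outputs of the function):

```python
import numpy as np

# Illustrative scores; in practice these come from get_active_learning_scores
scores_labeled = np.array([0.62, 0.31, 0.85])  # N = 3 labeled examples
scores_unlabeled = np.array([0.22, 0.74])      # U = 2 unlabeled examples

# Tag each score with its pool and within-pool index before merging
pools = ["labeled"] * len(scores_labeled) + ["unlabeled"] * len(scores_unlabeled)
indices = list(range(len(scores_labeled))) + list(range(len(scores_unlabeled)))
all_scores = np.concatenate([scores_labeled, scores_unlabeled])

# Lower score = higher priority, so sort ascending
order = np.argsort(all_scores)
ranking = [(pools[i], indices[i], all_scores[i]) for i in order]
# ranking[0] is the single highest-priority example across both pools
```

With these values, the top-priority example is the first unlabeled one (score 0.22), followed by the second labeled one (score 0.31).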

All inputs are optional, so the function can be used with only labeled data, only unlabeled data, or both pools at once.

Usage

This function is used in active learning loops to select which examples to annotate next. It is typically called after each round of annotation and model retraining, producing an updated priority ranking that guides the next batch of annotation tasks.
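The loop described above can be sketched as follows. To keep the sketch self-contained and runnable, `train_model` and `score_examples` are hypothetical stand-ins (not part of cleanlab): a real loop would retrain a classifier and call `get_active_learning_scores` at those points.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_model(n_labeled, n_unlabeled):
    # Hypothetical stand-in: a real loop would retrain a classifier here
    # and return its predicted probabilities for each pool (K = 3 classes)
    return (rng.dirichlet(np.ones(3), size=n_labeled),
            rng.dirichlet(np.ones(3), size=n_unlabeled))

def score_examples(pred_probs, pred_probs_unlabeled):
    # Placeholder for cleanlab's get_active_learning_scores: here we use
    # the max predicted probability, so lower still means "more uncertain"
    return pred_probs.max(axis=1), pred_probs_unlabeled.max(axis=1)

n_labeled, n_unlabeled, batch_size, n_rounds = 5, 8, 2, 3
for round_idx in range(n_rounds):
    pred_probs, pred_probs_unlabeled = train_model(n_labeled, n_unlabeled)
    scores_l, scores_u = score_examples(pred_probs, pred_probs_unlabeled)
    # Pick the lowest-scoring unlabeled examples as the next annotation batch
    next_batch = np.argsort(scores_u)[:batch_size]
    # A real loop would send next_batch to annotators, then move those
    # examples into the labeled pool before the next round
    n_labeled += batch_size
    n_unlabeled -= batch_size
```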

Code Reference

Source Location

cleanlab/multiannotator.py, lines 564-568.

Signature

def get_active_learning_scores(
    labels_multiannotator: Optional[Union[pd.DataFrame, np.ndarray]] = None,
    pred_probs: Optional[np.ndarray] = None,
    pred_probs_unlabeled: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]

Import

from cleanlab.multiannotator import get_active_learning_scores

I/O Contract

Inputs

Parameter Type Description
labels_multiannotator Optional[Union[pd.DataFrame, np.ndarray]] Matrix of shape (N, M) where N is the number of labeled examples and M is the number of annotators. Each entry is an integer class label or NaN if the annotator did not label that example. None if there are no labeled examples.
pred_probs Optional[np.ndarray] Array of shape (N, K) containing the model's predicted class probabilities for each labeled example. Required when labels_multiannotator is provided.
pred_probs_unlabeled Optional[np.ndarray] Array of shape (U, K) containing the model's predicted class probabilities for each unlabeled example, where U is the number of unlabeled examples. None if there are no unlabeled examples.

Outputs

Type Description
Tuple[np.ndarray, np.ndarray] A tuple of (active_learning_scores, active_learning_scores_unlabeled). active_learning_scores is a np.ndarray of shape (N,) with scores for labeled examples. active_learning_scores_unlabeled is a np.ndarray of shape (U,) with scores for unlabeled examples. Lower scores indicate higher priority for additional labeling. Both arrays are on the same scale and directly comparable.
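The shape constraints in the tables above can be verified before calling the function. A small sketch (the `check_shapes` helper is our own illustration, not part of cleanlab):

```python
import numpy as np
import pandas as pd

def check_shapes(labels_multiannotator, pred_probs, pred_probs_unlabeled):
    # N must match between the annotation matrix and labeled predictions
    if labels_multiannotator is not None:
        assert pred_probs is not None, "pred_probs is required with labels"
        assert len(labels_multiannotator) == len(pred_probs)
    # K must agree across both prediction arrays
    if pred_probs is not None and pred_probs_unlabeled is not None:
        assert pred_probs.shape[1] == pred_probs_unlabeled.shape[1]

labels = pd.DataFrame({"a0": [0, 1], "a1": [0, np.nan]})   # N=2, M=2
probs = np.array([[0.9, 0.1], [0.2, 0.8]])                 # (N=2, K=2)
probs_u = np.array([[0.5, 0.5]])                           # (U=1, K=2)
check_shapes(labels, probs, probs_u)  # passes silently
```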

Usage Examples

import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_active_learning_scores

# 5 labeled examples with annotations from 3 annotators
labels_multiannotator = pd.DataFrame({
    "annotator_0": [0, 1, 2, 1, 0],
    "annotator_1": [0, 1, 1, 1, np.nan],
    "annotator_2": [1, 1, 2, np.nan, 0],
})

# Model predictions for labeled examples (K=3 classes)
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.05, 0.9, 0.05],
    [0.1, 0.3, 0.6],
    [0.1, 0.85, 0.05],
    [0.7, 0.2, 0.1],
])

# Model predictions for 3 unlabeled examples
pred_probs_unlabeled = np.array([
    [0.4, 0.3, 0.3],   # uncertain - high priority
    [0.05, 0.05, 0.9],  # confident - low priority
    [0.35, 0.35, 0.3],  # uncertain - high priority
])

# Get active learning scores for both pools
scores_labeled, scores_unlabeled = get_active_learning_scores(
    labels_multiannotator=labels_multiannotator,
    pred_probs=pred_probs,
    pred_probs_unlabeled=pred_probs_unlabeled,
)

# scores_labeled: np.ndarray of shape (5,) - scores for labeled examples
# scores_unlabeled: np.ndarray of shape (3,) - scores for unlabeled examples

# Combine and rank all examples (lower score = higher priority)
all_scores = np.concatenate([scores_labeled, scores_unlabeled])
all_indices = np.argsort(all_scores)  # ascending: highest priority first

# Select top-k examples for next annotation batch
batch_size = 3
next_batch = all_indices[:batch_size]

# Only labeled examples (no unlabeled pool)
scores_labeled_only, _ = get_active_learning_scores(
    labels_multiannotator=labels_multiannotator,
    pred_probs=pred_probs,
)

# Only unlabeled examples (no labeled pool)
_, scores_unlabeled_only = get_active_learning_scores(
    pred_probs_unlabeled=pred_probs_unlabeled,
)
