Principle: Cleanlab Active Learning Prioritization
| Knowledge Sources | ActiveLab, Cleanlab |
|---|---|
| Domains | Machine_Learning, Data_Quality, Crowdsourcing |
| Last Updated | 2026-02-09 |
Overview
ActiveLab is a method for prioritizing which examples to collect additional labels for in active learning settings with multiple annotators.
Description
Active learning prioritization scores each example by how informative additional labels would be for improving classifier performance. It addresses the question: given a limited annotation budget, which examples should we label next?
The method handles two distinct pools of data:
- Already-labeled examples: Examples that have been annotated by one or more annotators but where additional annotations from more annotators could help resolve disagreements and improve consensus quality.
- Unlabeled examples: Examples that have not been annotated by any annotator, where any label would provide new information.
For each example, the method computes an active learning score where lower scores indicate higher priority for additional labeling. The scores from both pools are directly comparable, enabling a unified ranking that allocates annotation effort efficiently across labeled and unlabeled data.
This is particularly valuable in crowdsourcing settings where the cost of obtaining each label is non-trivial and must be allocated efficiently.
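To make the two pools concrete, here is a hypothetical toy setup in NumPy. The `labels_multiannotator` matrix (one row per labeled example, one column per annotator, `NaN` for missing annotations) and the index array `unlabeled_pool` are illustrative names, not part of any fixed API:

```python
import numpy as np

# Hypothetical toy setup: 4 labeled examples rated by up to 3 annotators
# (NaN marks an annotator who did not label that example), plus
# 2 unlabeled examples with no annotations at all.
labels_multiannotator = np.array([
    [0.0,    0.0,    np.nan],   # unanimous -> likely low priority
    [1.0,    0.0,    1.0   ],   # disagreement -> likely high priority
    [np.nan, 1.0,    np.nan],   # single label -> consensus is fragile
    [2.0,    2.0,    2.0   ],   # unanimous
])
unlabeled_pool = np.array([4, 5])  # indices of examples with no labels yet

n_labeled = labels_multiannotator.shape[0]
n_annotations = np.sum(~np.isnan(labels_multiannotator), axis=1)
print(n_annotations)  # number of labels per already-labeled example
```

Any label on an example in `unlabeled_pool` is new information, whereas a further label on row 3 above would add little.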
Usage
Active learning prioritization is used in iterative annotation workflows where the goal is to maximize label quality improvements per annotation dollar. Typical applications include:
- Annotation budget allocation: Deciding whether to get more labels for already-labeled examples (to improve consensus) or to label new examples.
- Crowdsourcing optimization: Routing annotation tasks to maximize the information gained per annotation.
- Iterative model improvement: In each round of active learning, selecting the batch of examples most likely to improve the model when labeled.
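One round of such a workflow can be sketched as follows. Everything here is a stand-in: `train_model` and `collect_labels` are stubs, and the score is a crude confidence-only proxy rather than the actual ActiveLab formula, which additionally weighs annotator agreement:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_model(X, y):
    """Stub: stands in for fitting any probabilistic classifier."""
    n_classes = int(y.max()) + 1
    def predict_proba(X_new):
        p = rng.random((len(X_new), n_classes))
        return p / p.sum(axis=1, keepdims=True)  # rows sum to 1
    return predict_proba

def collect_labels(indices):
    """Stub: stands in for routing examples to annotators."""
    return rng.integers(0, 3, size=len(indices))

# One round: train -> score -> select batch -> annotate -> (retrain next round)
X = rng.random((6, 4))
y = rng.integers(0, 3, size=6)
predict_proba = train_model(X, y)
pred_probs = predict_proba(X)
scores = pred_probs.max(axis=1)    # crude proxy: model self-confidence
batch = np.argsort(scores)[:2]     # lowest scores = highest priority
new_labels = collect_labels(batch) # send the selected batch for annotation
```

In a real deployment, the classifier would be retrained on the updated labels before the next round's scores are computed.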
Theoretical Basis
The ActiveLab scoring method assigns a priority score to each example based on how much additional labeling would reduce uncertainty.
For labeled examples (already have some annotations):
The score reflects confidence in the current consensus. When annotators largely agree and the model is confident, the score is high (low priority); when there is disagreement or model uncertainty, the score is low (high priority):
score_labeled[x] = f(annotator_agreement[x], model_confidence[x], annotator_weights, model_weight)
Where:
- annotator_agreement measures the level of agreement among existing annotations for example x.
- model_confidence is the model's predicted probability for the consensus class.
- annotator_weights and model_weight are the learned reliability weights from CROWDLAB.
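A minimal sketch of one plausible form of f, assuming it is a weighted average of annotator agreement with the consensus label and the model's confidence in that label. The exact ActiveLab formula differs, and the weights here are illustrative placeholders rather than learned CROWDLAB weights:

```python
import numpy as np

def score_labeled(annotator_agreement, model_confidence,
                  annotator_weight=0.7, model_weight=0.3):
    """Illustrative f: weighted blend of agreement and model confidence.

    High agreement + high model confidence -> high score (low priority).
    Disagreement or model uncertainty -> low score (high priority).
    """
    total = annotator_weight + model_weight
    return (annotator_weight * annotator_agreement
            + model_weight * model_confidence) / total

# Example: 2 of 3 annotators chose the consensus class; model gives it p=0.9
print(score_labeled(np.array([2 / 3]), np.array([0.9])))
```

With fully unanimous annotators and a confident model, the score approaches 1 and the example drops to the bottom of the priority list.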
For unlabeled examples (no annotations yet):
The score reflects model confidence alone, since there are no annotator labels to consider:
score_unlabeled[x] = g(model_confidence[x])
Where model_confidence is derived from the model's predicted class probabilities. Highly confident predictions receive high scores (low priority), while uncertain predictions receive low scores (high priority).
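A sketch of a simple choice for g, assuming model confidence is taken as the top predicted class probability (self-confidence); other uncertainty measures such as entropy would also fit the description above:

```python
import numpy as np

def score_unlabeled(pred_probs):
    """Illustrative g: the model's top predicted probability per example."""
    return pred_probs.max(axis=1)

pred_probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> high score, low priority
    [0.40, 0.35, 0.25],   # uncertain -> low score, high priority
])
print(score_unlabeled(pred_probs))  # second example ranks first for labeling
```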
Key property: Both scoring formulas are calibrated to be directly comparable, so the scores from labeled and unlabeled pools can be merged into a single ranked list. This enables the unified decision of whether it is more valuable to re-annotate a contentious labeled example or to annotate a new unlabeled example.
Ranking: Examples are sorted by score in ascending order. The examples with the lowest scores are the highest priority for additional annotation.
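The merge-and-rank step above can be sketched directly. The score values are made up, but they assume the key property holds, i.e. both pools are already on a comparable scale:

```python
import numpy as np

# Illustrative scores for each pool (comparable scale by construction):
scores_labeled = np.array([0.74, 0.31, 0.88])   # examples 0..2 (labeled)
scores_unlabeled = np.array([0.98, 0.40])       # examples 3..4 (unlabeled)

all_scores = np.concatenate([scores_labeled, scores_unlabeled])
priority_order = np.argsort(all_scores)  # ascending: lowest score first
print(priority_order)  # -> [1 4 0 2 3]
```

Here the contentious labeled example 1 outranks everything, and the uncertain unlabeled example 4 outranks the remaining labeled examples, so the annotation budget flows across both pools in a single ranking.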