Active Learning

  1. If you need to start somewhere, start here: types of AL, the methodology, examples, and sample selection functions.

  2. A thorough review paper about AL

  3. Choose your model first, then do AL, from LightTag

    1. The alternative is Query by Committee - Importantly, the active learning method we presented above is the most naive form of what is called "uncertainty sampling", where we chose to sample based on how uncertain our model was. An alternative approach, called Query by Committee, maintains a collection of models (the committee) and selects the most "controversial" data point to label next, that is, one on which the models disagree. Using such a committee may allow us to overcome the restricted hypothesis a single model can express, though at the outset of a task we still have no way of knowing what hypothesis we should be using.

    2. Paper: warning against transferring actively sampled datasets to other models

  4. Using weak and strong oracle in AL, paper.

  5. The pitfalls of AL:

    1. How to choose (cost-effectively) the active learning technique when one starts without the labeled data needed for methods like cross-validation.

    2. How to choose (cost-effectively) the base learning technique when one starts without the labeled data needed for methods like cross-validation, given that we know that learning curves cross, and given possible interactions between active learning technique and base learner.

    3. How to deal with highly skewed class distributions, where active learning strategies find few (or no) instances of rare classes.

    4. How to deal with concepts including very small subconcepts (“disjuncts”), which are hard enough to find with random sampling (because of their rarity), but which active learning strategies can actually avoid finding if they are misclassified strongly to begin with.

    5. How best to address the cold-start problem.

    6. And, especially, whether and what alternatives exist for using human resources to improve learning that may be more cost-efficient than using humans simply for labeling selected cases, such as guided learning [3], active dual supervision [2], guided feature labeling [1], etc.

  6. A great tutorial

  7. AWS SageMaker active learning: annotation consolidation finds outlier annotations and weights them accordingly. A model is then trained on the consolidated annotations plus the training data. Items the model labels with high probability keep those labels; the rest are sent back for re-annotation.
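
The Query by Committee idea from item 3 can be sketched in a few lines. This is a minimal illustration, not LightTag's implementation: it assumes each committee member has already produced a hard label for every unlabeled item, and it uses vote entropy as the disagreement measure (one common choice among several). The names `vote_entropy` and `most_controversial` are made up for this example.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels voted by the committee for one item.

    0.0 means every model agreed; higher values mean more disagreement.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_controversial(committee_votes):
    """Return the item id whose committee votes disagree the most.

    committee_votes maps an item id to the list of labels predicted
    by each model in the committee (hypothetical input format).
    """
    return max(committee_votes, key=lambda item: vote_entropy(committee_votes[item]))


if __name__ == "__main__":
    votes = {
        "doc_a": ["pos", "pos", "pos"],  # full agreement
        "doc_b": ["pos", "neg", "pos"],  # mild disagreement
        "doc_c": ["pos", "neg", "neu"],  # every model disagrees
    }
    print(most_controversial(votes))
```

In an actual loop, the selected item would be sent to an annotator, added to the training set, and the committee retrained before the next query.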

Human-in-the-Loop ML book by Robert Munro

  1. Uncertainty sampling

    1. Least Confidence: difference between the most confident prediction and 100% confidence

    2. Margin of Confidence: difference between the top two most confident predictions

    3. Ratio of Confidence: ratio between the top two most confident predictions

    4. Entropy: difference between all predictions, as defined by information theory
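
The four uncertainty scores above can be written down directly. The sketch below assumes the input is a single probability distribution over at least two labels (for example, softmax output) and normalizes each score to the 0 to 1 range, with 1 meaning most uncertain. That normalization is a convention chosen for this example so the scores are comparable; the function names are illustrative.

```python
import math


def least_confidence(probs):
    """Gap between the top prediction and 100% confidence, scaled to [0, 1]."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)


def margin_of_confidence(probs):
    """1 minus the gap between the two most confident predictions."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_of_confidence(probs):
    """Second most confident prediction divided by the most confident one."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy_score(probs):
    """Shannon entropy of the distribution, divided by its maximum, log2(n)."""
    raw = -sum(p * math.log2(p) for p in probs if p > 0)
    return raw / math.log2(len(probs))


if __name__ == "__main__":
    confident = [0.90, 0.05, 0.05]
    unsure = [0.40, 0.35, 0.25]
    for score in (least_confidence, margin_of_confidence,
                  ratio_of_confidence, entropy_score):
        print(score.__name__, round(score(confident), 3), round(score(unsure), 3))
```

To sample, score every unlabeled item with one of these functions and send the highest-scoring items to annotators.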

Diversity sampling - you want to make sure that the data you select for labeling covers as diverse a set of data and real-world demographics as possible.

  1. Model-based Outliers: sampling for low activation in your logits and hidden layers to find items that are confusing to your model because of lack of information

  2. Cluster-based Sampling: using Unsupervised Machine Learning to sample data from all the meaningful trends in your data’s feature-space

  3. Representative Sampling: sampling items that are the most representative of the target domain for your model, relative to your current training data

  4. Real-world diversity: using sampling strategies that increase fairness when trying to support real-world diversity
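
As one concrete case, cluster-based sampling (item 2) might look like the sketch below. It assumes items are already represented as numeric feature tuples, uses a small hand-rolled k-means so the example has no dependencies, and picks the item nearest each centroid. Picking only the centroid-nearest item is a simplification made here; one could also add outliers or random members from each cluster. All names are hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        assignment = _assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, _assign(points, centroids)


def cluster_based_sample(points, k, seed=0):
    """Return one item index per cluster: the item closest to the centroid."""
    centroids, assignment = kmeans(points, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            picks.append(min(members,
                             key=lambda i: _dist2(points[i], centroids[c])))
    return sorted(picks)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(cluster_based_sample(data, k=2))
```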

Combine uncertainty sampling and diversity sampling

  1. Least Confidence Sampling with Clustering-based Sampling: sample items that are confusing to your model and then cluster those items to ensure a diverse sample (see diagram below).

  2. Uncertainty Sampling with Model-based Outliers: sample items that are confusing to your model and within those find items with low activation in the model.

  3. Uncertainty Sampling with Model-based Outliers and Clustering: combine methods 1 and 2.

  4. Representative Cluster-based Sampling: cluster your data to capture multimodal distributions and sample items that are most like your target domain (see diagram below).

  5. Sampling from the Highest Entropy Cluster: cluster your unlabeled data and find the cluster with the highest average confusion for your model.

  6. Uncertainty Sampling and Representative Sampling: sample items that are both confusing to your current model and the most like your target domain.

  7. Model-based Outliers and Representative Sampling: sample items that have low activation in your model but are relatively common in your target domain.

  8. Clustering with itself for hierarchical clusters: recursively cluster to maximize the diversity.

  9. Sampling from the Highest Entropy Cluster with Margin of Confidence Sampling: find the cluster with the most confusion and then sample for the maximum pairwise label confusion within that cluster.

  10. Combining Ensemble Methods and Dropouts with individual strategies: aggregate results that come from multiple models or multiple predictions from one model via Monte-Carlo Dropouts aka Bayesian Deep Learning.
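
To show how the combinations compose, here is a sketch of item 5, sampling from the highest entropy cluster. It assumes clustering has already been done (one cluster id per item) and that the model's predicted probability distribution is available for each item. Sampling randomly inside the chosen cluster is an assumption of this example, and the function names are illustrative.

```python
import math
import random
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (bits) of one predicted distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def highest_entropy_cluster(cluster_ids, predictions):
    """Return the cluster id whose items have the highest mean entropy."""
    by_cluster = defaultdict(list)
    for cid, probs in zip(cluster_ids, predictions):
        by_cluster[cid].append(entropy(probs))
    return max(by_cluster,
               key=lambda cid: sum(by_cluster[cid]) / len(by_cluster[cid]))


def sample_from_highest_entropy_cluster(cluster_ids, predictions, n, seed=0):
    """Randomly pick up to n item indices from the most confused cluster."""
    target = highest_entropy_cluster(cluster_ids, predictions)
    members = [i for i, cid in enumerate(cluster_ids) if cid == target]
    rng = random.Random(seed)
    return sorted(rng.sample(members, min(n, len(members))))


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 1]
    preds = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.6, 0.4], [0.55, 0.45]]
    print(highest_entropy_cluster(clusters, preds))
    print(sample_from_highest_entropy_cluster(clusters, preds, n=2))
```

Item 9 would follow the same shape, replacing the random pick inside the cluster with a ranking by margin of confidence.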

Active transfer learning.

Machine in the loop
