
A Way To Make Labeling Tasks Easier - Active Learning

What’s Active Learning?

Imagine you have a large pool of unlabeled data for a classification task and you need to label it efficiently to save time. Active learning is what you are looking for.

Active learning lets you label only the most informative data in each labeling round. It finds that data by computing how uncertain the active learning model's output is for each unlabeled sample.

Workflow

https://github.com/modAL-python/modAL
  • Gather Data: Label a small amount of data from the unlabeled dataset.

  • Build a model: Use the data labeled so far to build the model.

  • Model Evaluation: Evaluate whether the model reaches the required performance standard.

  • Measure uncertainty of the prediction: If the model has not reached the standard, compute the model's uncertainty for all the unlabeled data.

  • Query for labels: Pick the most informative data to label, then go back to building the model.

  • Employ: Once the model reaches our standard, the process is done and the model can be put to use.
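The steps above can be sketched as a small loop. This is only a minimal illustration under stated assumptions, not the modAL API: the 1-D data, the `oracle` function standing in for a human annotator, the toy threshold model, and the `target_accuracy` value are all hypothetical.

```python
import math

def oracle(x):
    """Hypothetical human annotator: the true label is 1 when x > 0.7."""
    return int(x > 0.7)

def fit(labeled):
    """Toy model: threshold halfway between the largest 0 and the smallest 1."""
    largest_zero = max(x for x, y in labeled if y == 0)
    smallest_one = min(x for x, y in labeled if y == 1)
    return (largest_zero + smallest_one) / 2

def predict_proba(threshold, x, sharpness=20.0):
    """Probability of class 1, from a logistic curve centred on the threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))

def uncertainty(threshold, x):
    """Least-confident score: 1 minus the probability of the predicted class."""
    p = predict_proba(threshold, x)
    return 1.0 - max(p, 1.0 - p)

def accuracy(threshold, data):
    return sum(int(x > threshold) == y for x, y in data) / len(data)

# Evenly spaced toy data so the run is deterministic.
pool = [i / 200 for i in range(1, 200)]                      # unlabeled pool
test_set = [((i + 0.5) / 200, oracle((i + 0.5) / 200)) for i in range(200)]

# Gather data: start with one labeled example per class.
labeled = [(0.0, 0), (1.0, 1)]
target_accuracy = 0.97

for round_number in range(20):
    threshold = fit(labeled)                   # build a model
    score = accuracy(threshold, test_set)      # model evaluation
    if score >= target_accuracy or not pool:
        break                                  # employ: the model is good enough
    # Measure uncertainty, then query the most uncertain point for a label.
    query = max(pool, key=lambda x: uncertainty(threshold, x))
    pool.remove(query)
    labeled.append((query, oracle(query)))

print(f"queries: {len(labeled) - 2}, threshold: {threshold}, accuracy: {score}")
```

On this toy pool the loop stops after three queries, because each query lands next to the current decision boundary and so narrows it quickly. Labeling randomly chosen points would usually need more labels to reach the same accuracy.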

Most Informative (Uncertain) Data

Here are three ways to select the most informative unlabeled data by looking at the model's outputs.

Shannon entropy

Sum the entropy contribution of every output probability. The higher the entropy, the more uncertain the model is about the data point.

$$H(x)=-\sum_{k}p_k\log(p_k)$$

Personal understanding: Shannon entropy fits multilabel classification, where several outputs can be selected at once.
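As a minimal sketch of the entropy formula, the snippet below scores a few made-up probability vectors and picks the one with the highest entropy. The sample names and probabilities are hypothetical, and the natural logarithm is assumed since the formula does not fix a base.

```python
import math

def shannon_entropy(probs):
    """H(x) = -sum_k p_k * log(p_k) for one predicted distribution."""
    # Terms with p_k == 0 are skipped: p * log(p) tends to 0 as p tends to 0.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical model outputs for three unlabeled samples (three classes each).
predictions = {
    "sample_a": [0.98, 0.01, 0.01],    # confident prediction
    "sample_b": [0.50, 0.30, 0.20],
    "sample_c": [1 / 3, 1 / 3, 1 / 3], # uniform: maximally uncertain
}

entropies = {name: shannon_entropy(p) for name, p in predictions.items()}

# The sample with the highest entropy is the one to send for labeling.
most_informative = max(entropies, key=entropies.get)
print(most_informative, entropies[most_informative])
```

The uniform distribution gives the largest possible entropy for three classes, log(3), so `sample_c` is selected.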

Least confident

Take the highest predicted probability and apply the formula. The higher the resulting value, the lower the confidence of the prediction, which means the data point is more informative.

$$U(x)=1-P(\hat{x}|x)$$
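A short sketch of this score, using hypothetical probability vectors: the sample whose top probability is lowest gets the highest score and would be queried first.

```python
def least_confident(probs):
    """U(x) = 1 - (highest predicted probability)."""
    return 1.0 - max(probs)

# Hypothetical model outputs for three unlabeled samples.
predictions = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
]

scores = [least_confident(p) for p in predictions]

# Index of the sample with the highest uncertainty score.
query_index = max(range(len(scores)), key=scores.__getitem__)
print(query_index, scores)
```

Here the second sample (index 1) is selected, since its top probability is only 0.40.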

Margin Sampling

Subtract the second-highest probability from the highest probability. The lower the result, the more uncertain the model is about the data point.

$$M(x)=P(\hat{x_1}|x)-P(\hat{x_2}|x)$$

Personal understanding: Margin Sampling fits multiclass classification, where only one of the outputs can be selected.
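The margin can be sketched the same way. The probability vectors are hypothetical; note that, unlike the two scores above, the sample to query is the one with the smallest value.

```python
def margin(probs):
    """M(x) = highest probability minus second-highest probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

# Hypothetical model outputs for three unlabeled samples.
predictions = [
    [0.90, 0.05, 0.05],  # clear winner, large margin
    [0.40, 0.35, 0.25],  # top two classes nearly tied
    [0.50, 0.10, 0.40],
]

margins = [margin(p) for p in predictions]

# The smallest margin marks the most uncertain sample.
query_index = min(range(len(margins)), key=margins.__getitem__)
print(query_index, margins)
```

The second sample (index 1) is selected because its two most likely classes are almost tied.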

Open Source Framework

Reference

[1] Measuring the uncertainty of predictions

[2] How to measure uncertainty in uncertainty sampling for active learning

[3] modAL