A Way To Make Labeling Tasks Easier - Active Learning
What’s Active Learning?
Imagine you have a large pool of unlabeled data for a classification task and you need to label it efficiently to save time. Active learning is what you are looking for.
Active learning lets you label only the most informative data in each labeling round. It finds that data by computing how uncertain the current model's outputs are on the unlabeled data.
Workflow
Gather Data: Label a small amount of data from the unlabeled dataset.
Build a model: Train a model on the data labeled so far.
Model Evaluation: Evaluate whether the model reaches the required performance standard.
Measure uncertainty of the prediction: If the model has not reached the standard, compute the model's uncertainty for every unlabeled data point.
Query for labels: Pick the most informative (most uncertain) data points and label them.
Employ: Retrain with the new labels and repeat. Once the model reaches our standard, the process is done and the model can be put to use.
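The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy nearest-centroid model on 1-D inputs, the `oracle` callable that stands in for a human labeler, and every function and parameter name here are hypothetical. Uncertainty is scored as one minus the highest predicted probability.

```python
import math


def fit_centroids(xs, ys):
    """Toy model (assumption): one centroid (mean) per class for 1-D inputs."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_proba(centroids, x):
    """Class probabilities from normalized exp(-distance to centroid)."""
    labels = sorted(centroids)
    scores = [math.exp(-abs(x - centroids[label])) for label in labels]
    total = sum(scores)
    return labels, [s / total for s in scores]


def accuracy(centroids, xs, ys):
    """Fraction of points whose most probable class matches the true label."""
    hits = 0
    for x, y in zip(xs, ys):
        labels, probs = predict_proba(centroids, x)
        hits += labels[probs.index(max(probs))] == y
    return hits / len(xs)


def active_learning_loop(pool, oracle, test_xs, test_ys, seed_idx,
                         target=0.95, batch=1, max_rounds=20):
    # Gather data: label a small seed set.
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(max_rounds):
        # Build a model on everything labeled so far.
        model = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
        # Model evaluation: stop once the standard is reached.
        score = accuracy(model, test_xs, test_ys)
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if score >= target or not unlabeled:
            break
        # Measure uncertainty of every unlabeled point.
        uncertainty = {i: 1.0 - max(predict_proba(model, pool[i])[1])
                       for i in unlabeled}
        # Query for labels: ask the oracle about the most uncertain points.
        ranked = sorted(unlabeled, key=uncertainty.get, reverse=True)
        for i in ranked[:batch]:
            labeled[i] = oracle(pool[i])
    return model, score, len(labeled)


# Usage on a toy pool where the true class is 1 for x >= 5.
pool = [i * 0.5 for i in range(21)]
oracle = lambda x: int(x >= 5.0)
model, score, n_labeled = active_learning_loop(
    pool, oracle, pool, [oracle(x) for x in pool], seed_idx=[0, 20], target=1.0)
print(score, n_labeled)
```

The seed set here contains one example of each class; with a single-class seed this toy model would report zero uncertainty everywhere, so a real setup would need a better starting sample.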
Most Informative (Uncertain) Data
Here are three ways to select the most informative unlabeled data by looking at the model's outputs.
Shannon entropy
Sum the entropy terms over all of the model's output probabilities, where $p_k$ is the predicted probability of class $k$. The higher the entropy, the more uncertain the model is about the data point.
$$H(x)=-\sum_{k}p_k\log(p_k)$$
Personal take: Shannon entropy fits multilabel classification, where several outputs can be selected at once.
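As a small sketch of the entropy formula above, the snippet below scores a few made-up probability vectors and picks the one with the highest entropy. The function names and the `pool_probs` values are illustrative assumptions, and the natural logarithm is used; another base only rescales the scores.

```python
import math


def shannon_entropy(probs):
    """H(x) = -sum_k p_k * log(p_k); zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain_by_entropy(pool_probs):
    """Index of the prediction with the highest entropy."""
    return max(range(len(pool_probs)),
               key=lambda i: shannon_entropy(pool_probs[i]))


# Hypothetical model outputs for three unlabeled points.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # close to uniform, high entropy
    [0.60, 0.30, 0.10],
]
query_index = most_uncertain_by_entropy(pool_probs)
```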
Least confident
Take the highest predicted probability, $P(\hat{x}|x)$, and apply the formula. The higher the resulting value, the lower the model's confidence in its prediction, which means the data point is informative.
$$U(x)=1-P(\hat{x}|x)$$
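A minimal sketch of this score, assuming each prediction is a plain list of class probabilities. The function names and sample values are hypothetical.

```python
def least_confident(probs):
    """U(x) = 1 - P(x_hat | x), where x_hat is the most probable class."""
    return 1.0 - max(probs)


def most_uncertain_by_confidence(pool_probs):
    """Index of the prediction with the highest least-confident score."""
    return max(range(len(pool_probs)),
               key=lambda i: least_confident(pool_probs[i]))


# Hypothetical model outputs for three unlabeled points.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],  # top probability is only 0.40
    [0.60, 0.30, 0.10],
]
query_index = most_uncertain_by_confidence(pool_probs)
```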
Margin Sampling
Subtract the second-highest probability from the highest probability. The lower the result, the more uncertain the model is about the data point.
$$M(x)=P(\hat{x_1}|x)-P(\hat{x_2}|x)$$
Personal take: Margin Sampling fits multiclass classification, where only one of the outputs can be selected.
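The margin can be sketched the same way, assuming each prediction is a list of at least two class probabilities. The names and sample values are illustrative only. Unlike the other two scores, the point to query is the one with the smallest value.

```python
def margin(probs):
    """M(x) = highest probability minus second-highest probability."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def most_uncertain_by_margin(pool_probs):
    """Index of the prediction with the smallest margin."""
    return min(range(len(pool_probs)),
               key=lambda i: margin(pool_probs[i]))


# Hypothetical model outputs for three unlabeled points.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],  # top two classes are nearly tied
    [0.60, 0.30, 0.10],
]
query_index = most_uncertain_by_margin(pool_probs)
```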
Open Source Framework
Reference
[1] Measuring the uncertainty of predictions
[2] How to measure uncertainty in uncertainty sampling for active learning
[3] modAL