You Don’t Need Many Labels to Learn

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

A Gaussian Mixture Variational Autoencoder (GMVAE) can be trained entirely without labels and then converted into a classifier using as little as 0.2% labeled data — 35x less than XGBoost needs for comparable accuracy. The key insight is that unsupervised training already discovers the data's cluster structure; labels are only needed to name the clusters, not to build representations. Two decoding strategies are compared: hard decoding (assign each point to its most likely cluster) and soft decoding (use the full posterior distribution over clusters). Soft decoding provides an 18 percentage point accuracy gain when labeled data is extremely scarce, because it leverages model uncertainty and handles impure clusters. Experiments use the EMNIST Letters dataset with 145,600 images across 26 classes.

#machine-learning

#unsupervised-learning

Apr 17•10m read time•From towardsdatascience.com

Table of contents

Introduction Turning Clusters Into a Classifier How Much Supervision Do We Need in Practice?Conclusion

Comment

Bookmark

Copy

Sort: