3% of Google’s new code is written by a Language Model, DeepSpeed for making inferences with a Trillion-scale Mixture of Experts. The paper explores a technique — data pruning — that can improve the learning efficiency of NNs “beating They perform k-means clustering on the embeddings of a pre-trained ImageNet model. To be fair, the results are not jaw-dropping.
Sort: