MosaicBERT is an optimized framework that improves the pretraining speed and accuracy of the BERT architecture. It incorporates speed optimizations such as FlashAttention, Attention with Linear Biases (ALiBi), low-precision LayerNorm, and Gated Linear Units. MosaicBERT reached the same accuracy as BERT in significantly less pretraining time on the same hardware.
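Of the components listed above, ALiBi is the easiest to illustrate in a few lines. The following is a minimal sketch, not MosaicBERT's actual code: the function names `alibi_slopes` and `alibi_bias` are hypothetical, the slope formula assumes the number of heads is a power of two, and the symmetric distance `abs(i - j)` is an assumption suited to a bidirectional encoder. The idea is that, instead of learned position embeddings, each attention head adds a fixed penalty to its attention scores that grows linearly with the distance between query and key.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumes num_heads is a power of two (illustrative simplification).
    """
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build a [num_heads][seq_len][seq_len] nested list of additive biases.

    Entry [h][i][j] is -slope_h * |i - j|, to be added to the raw attention
    score between query position i and key position j before the softmax.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-s * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for s in slopes
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # First head has the steepest slope, so distant tokens are penalized most.
    for row in bias[0]:
        print(row)
```

Because the bias depends only on relative distance, the same function works for any sequence length, which is the property that makes ALiBi attractive when training and inference lengths differ.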
•3m read time• From marktechpost.com