MosaicBERT is an optimized framework that improves the pretraining speed and accuracy of the BERT architecture. It incorporates speed optimizations such as FlashAttention, ALiBi (Attention with Linear Biases), low-precision LayerNorm, and Gated Linear Units. MosaicBERT reaches the same accuracy as a standard BERT in significantly less pretraining time on the same hardware.
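Of the optimizations listed, ALiBi is the easiest to illustrate: instead of learned position embeddings, it adds a per-head linear penalty to attention scores that grows with query-key distance. The sketch below is a minimal NumPy illustration of that idea (with a symmetric distance penalty, since BERT attends bidirectionally), not MosaicBERT's actual implementation; the function names are hypothetical.

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence of slopes: head i gets 2^(-8(i+1)/n_heads),
    # so earlier heads penalize distance more strongly.
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(seq_len, n_heads):
    # Bias matrix added to raw attention scores before softmax.
    # Symmetric |distance| penalty (bidirectional, BERT-style),
    # shape (n_heads, seq_len, seq_len).
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    slopes = np.array(alibi_slopes(n_heads))[:, None, None]
    return slopes * -np.abs(distance)
```

Because the bias depends only on position distance, it costs no learned parameters and extrapolates to sequence lengths longer than those seen in pretraining.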

From marktechpost.com (3 min read)