Meta AI researchers, in collaboration with Stanford University, have introduced Mixture-of-Transformers (MoT), a novel sparse multi-modal transformer architecture designed to significantly reduce pretraining computational costs. MoT incorporates modality-specific parameters, optimizing text, image, and speech processing without compromising performance.
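The core idea can be sketched in a few lines of PyTorch. The sketch below assumes a simplified reading of the architecture: the non-embedding parameter groups (attention projections, feed-forward layers, layer norms) are duplicated per modality and selected per token, while self-attention itself still runs globally over the full mixed-modality sequence. Names such as `MoTBlock`, `_route`, and `modality_ids` are illustrative, not from Meta's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoTBlock(nn.Module):
    """One transformer block with modality-specific weights and global attention."""

    def __init__(self, d_model: int, n_heads: int, n_modalities: int = 3):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One copy of every non-embedding weight per modality (text/image/speech).
        self.qkv = nn.ModuleList([nn.Linear(d_model, 3 * d_model) for _ in range(n_modalities)])
        self.out = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_modalities)])
        self.ffn = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_modalities)
        ])
        self.norm1 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_modalities)])
        self.norm2 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_modalities)])

    def _route(self, layers, x, modality_ids, out_dim):
        # Apply each modality's layer only to that modality's tokens.
        out = x.new_zeros(*x.shape[:-1], out_dim)
        for m, layer in enumerate(layers):
            mask = modality_ids == m
            if mask.any():
                out[mask] = layer(x[mask])
        return out

    def forward(self, x, modality_ids):
        # x: (B, T, d_model); modality_ids: (B, T) ints in [0, n_modalities).
        B, T, D = x.shape
        h = self._route(self.norm1, x, modality_ids, D)
        q, k, v = self._route(self.qkv, h, modality_ids, 3 * D).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Global self-attention: every token attends across all modalities.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(B, T, D)
        x = x + self._route(self.out, attn, modality_ids, D)
        x = x + self._route(self.ffn,
                            self._route(self.norm2, x, modality_ids, D),
                            modality_ids, D)
        return x

if __name__ == "__main__":
    blk = MoTBlock(d_model=512, n_heads=8)
    x = torch.randn(2, 16, 512)
    ids = torch.randint(0, 3, (2, 16))  # 0=text, 1=image, 2=speech
    print(blk(x, ids).shape)            # torch.Size([2, 16, 512])
```

The sparsity here is in parameter activation: each token uses only its own modality's weights, so total capacity grows with the number of modalities while per-token compute stays close to that of a dense transformer of the same width. On this reading, the pretraining savings the researchers report come from reaching comparable quality in fewer training FLOPs, not from cheaper individual forward passes.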