Researchers propose Vision Mamba (Vim), a new generic vision backbone built from bidirectional Mamba blocks. Vim combines position embeddings, which make visual identification location-aware, with bidirectional state space models (SSMs), which provide data-dependent global visual context modeling. It achieves the same modeling power as ViT without requiring attention and outperforms the DeiT model.
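To make the idea of a bidirectional scan concrete, here is a minimal toy sketch in plain Python. It is not the Vim implementation. The function names (`ssm_scan`, `bidirectional_block`), the scalar parameters `decay`, `b`, and `c`, and the choice to sum the two directions are all assumptions made for illustration. The recurrence uses fixed parameters, whereas Mamba's selective SSM makes them depend on the input.

```python
def ssm_scan(tokens, decay, b, c):
    """Toy linear state-space recurrence, applied per channel:
    h_t = decay * h_{t-1} + b * x_t,  y_t = c * h_t."""
    dim = len(tokens[0])
    state = [0.0] * dim
    outputs = []
    for x in tokens:
        state = [decay * h + b * xi for h, xi in zip(state, x)]
        outputs.append([c * h for h in state])
    return outputs


def bidirectional_block(patches, pos_embed, decay=0.5, b=1.0, c=1.0):
    """Add position embeddings, scan the sequence in both directions,
    and sum the two results (an illustrative way to merge them)."""
    # Position embeddings make each token location-aware.
    tokens = [[p + e for p, e in zip(patch, pos)]
              for patch, pos in zip(patches, pos_embed)]
    forward = ssm_scan(tokens, decay, b, c)
    # Scan the reversed sequence, then restore the original order.
    backward = ssm_scan(tokens[::-1], decay, b, c)[::-1]
    return [[f + g for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(forward, backward)]


# An impulse at the middle patch reaches both neighbours.
patches = [[0.0], [1.0], [0.0]]
pos_embed = [[0.0], [0.0], [0.0]]
print(bidirectional_block(patches, pos_embed))  # [[0.5], [2.0], [0.5]]
```

A forward-only scan would leave the first patch at zero in this example. The backward pass is what lets earlier positions see later ones, giving every token context from the whole sequence without attention.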

From marktechpost.com