moonshotai/MoonViT-SO-400M · Hugging Face

Debajyati Dey · 2026-05-10T07:25:07.885Z

MoonViT is a native-resolution vision encoder extracted from Moonshot AI's Kimi-VL-A3B multimodal model for standalone use. It was initialized from SigLIP-SO-400M and then continually pre-trained, and it processes images at their native resolution instead of resizing them to one fixed input size. The model is available on Hugging Face with example code that uses the Transformers library. Full training details are in the Kimi-VL Technical Report.
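To make "native resolution" concrete, here is a minimal sketch of how the number of patch tokens could vary with image size. It is not taken from the model card: the helper names `patch_grid` and `token_count` are hypothetical, the patch size of 14 is an assumption based on the SigLIP-SO-400M family, and dropping leftover pixels by flooring is a simplification (the real image processor may resize or pad instead).

```python
def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int]:
    """Return (rows, cols) of patches for an image kept at its own resolution.

    patch_size=14 is an assumed value, not confirmed by the summary above.
    Leftover pixels that do not fill a whole patch are ignored (floor division).
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if height < patch_size or width < patch_size:
        raise ValueError("image is smaller than a single patch")
    return height // patch_size, width // patch_size


def token_count(height: int, width: int, patch_size: int = 14) -> int:
    """Number of patch tokens the encoder would see for one image."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


if __name__ == "__main__":
    # Different image sizes give different grids and token counts,
    # unlike a fixed-resolution encoder that always sees the same grid.
    for h, w in [(224, 224), (448, 896), (1080, 1920)]:
        print((h, w), patch_grid(h, w), token_count(h, w))
```

Under these assumptions, a wide image produces a wide patch grid and a longer token sequence, which is the practical difference from an encoder that squashes every input to a single square size.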
