NVIDIA’s New AI Is Fast For A Strange Reason


NVIDIA has released a 30-billion-parameter open multimodal AI model that can process nearly 10 hours of video per hour, almost 10x real-time speed. The efficiency gains come from five architectural innovations: memory that scales linearly (not quadratically) with context length; a lightweight audio tokenizer that preserves emotion and tone without a separate speech model; native aspect-ratio preservation with 3D convolutions for video; a single distilled encoder that replaces three separate CLIP models; and video frame sampling that discards redundant frames. Running the model locally requires about 25 GB of VRAM. Its license permits commercial use and derivative works but is not Apache 2.0. The main limitation is that the model is not optimized for pure text reasoning or coding tasks; its strength is high-throughput, cost-efficient multimodal workloads.
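To make the frame-sampling idea concrete, here is a minimal Python sketch of dropping redundant frames. The summary does not say how the model decides that a frame is redundant, so everything here is an assumption for illustration: frames are flattened lists of grayscale values in [0, 1], redundancy is measured by mean absolute pixel difference against the last kept frame, and the names `mean_abs_diff`, `sample_frames`, and `threshold` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: one frame = flattened grayscale pixels in [0, 1].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sample_frames(frames: List[Frame], threshold: float = 0.05) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    Assumed heuristic, not NVIDIA's method: a frame is treated as redundant
    when its mean absolute difference from the most recently kept frame is
    at or below `threshold`.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    clip = [
        [0.0, 0.0],   # kept: first frame
        [0.0, 0.01],  # near-duplicate of frame 0, dropped
        [1.0, 1.0],   # scene change, kept
        [1.0, 1.0],   # exact duplicate of frame 2, dropped
        [0.0, 0.0],   # scene change, kept
    ]
    print(sample_frames(clip))
```

Comparing against the last kept frame rather than the immediately preceding one means slow drift still eventually triggers a new sample, which is one reasonable design choice among several.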

5m watch time
