Grab built a custom 1B-parameter Vision LLM to extract information from Southeast Asian documents for eKYC verification. Starting with Qwen2-VL 2B, they progressed from LoRA fine-tuning to full parameter training, then built a lightweight model from scratch combining Qwen2-VL's vision encoder with Qwen2.5's compact language

12m read time
From blog.bytebytego.com
Table of contents
Understanding Vision LLMs
Selecting the Base Model
Training Dataset Generation
The Experimentation Journey
Four-Stage Training Process
Results and Performance
Key Technical Insights
Conclusion
