Roblox built a single unified transformer-based translation model with a Mixture of Experts (MoE) architecture to handle all 256 translation directions across 16 languages (the full 16×16 grid), replacing the need for 256 separate models. To meet a 100 ms latency ceiling at 5,000+ chats per second, they applied knowledge distillation to compress the model from roughly 1B parameters to under 650M, combined with quantization and model compilation (a standard distillation objective is sketched below). The serving pipeline includes a translation cache, dynamic batching, and an embedding cache that avoids re-encoding the same source message when it fans out to multiple target languages. For quality measurement, they built a reference-free quality estimation model that scores translations at word-level granularity without needing human reference translations. Low-resource language pairs were improved via iterative back-translation, and domain-specific slang was handled through human-labeled platform terminology.
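The article doesn't show the distillation objective itself, but the standard formulation pairs a soft-target KL term against the teacher's logits with ordinary cross-entropy on the gold labels. A minimal PyTorch sketch, with hypothetical `temperature` and `alpha` values:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2  # rescale so gradients keep their original magnitude
    # Hard targets: ordinary cross-entropy against the gold tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```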
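The embedding-cache idea is concrete enough to sketch as well: encode a source message once, then reuse that encoding for every target language it must reach. Here `encoder` and `decoder` are hypothetical stand-ins for the encoder and decoder halves of the unified model:

```python
from typing import Callable, Dict, List, Tuple

class EmbeddingCache:
    """Encode each (source_lang, text) pair once; reuse the encoding
    for every target language the message fans out to."""

    def __init__(self,
                 encoder: Callable[[str, str], object],
                 decoder: Callable[[object, str], str]):
        self.encoder = encoder
        self.decoder = decoder
        self._cache: Dict[Tuple[str, str], object] = {}

    def translate(self, text: str, src: str, targets: List[str]) -> Dict[str, str]:
        key = (src, text)
        if key not in self._cache:  # cache miss: run the encoder once
            self._cache[key] = self.encoder(text, src)
        encoding = self._cache[key]
        # One decoder pass per target language, all sharing the encoding.
        return {tgt: self.decoder(encoding, tgt) for tgt in targets}
```

Keyed on (source language, text), a single encoder pass can serve up to 15 decoder passes when a chat room spans all 16 languages.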
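Iterative back-translation follows a similarly simple loop: a reverse-direction model turns plentiful monolingual target-side text into synthetic source sentences, and the resulting pairs augment training for the low-resource forward direction. The `train_step` callable below is a hypothetical stand-in for a fine-tuning routine:

```python
from typing import Callable, List, Tuple

def iterative_back_translation(
    forward: Callable[[str], str],    # low-resource direction, e.g. src -> tgt
    backward: Callable[[str], str],   # reverse direction, tgt -> src
    monolingual_target: List[str],    # real, human-written target-side text
    train_step: Callable[[Callable, List[Tuple[str, str]]], Callable],
    rounds: int = 2,
) -> Callable[[str], str]:
    for _ in range(rounds):
        # Back-translate real target sentences into synthetic sources,
        # pairing noisy inputs with clean, human-written outputs.
        synthetic_pairs = [(backward(t), t) for t in monolingual_target]
        forward = train_step(forward, synthetic_pairs)
    return forward
```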
Table of contents

- One Model Versus Many
- Making a Billion Parameters Fast Enough for a Conversation
- Measuring Quality
- Conclusion