Vimeo's engineering team encountered a 'blank screen bug' when using LLMs for subtitle translation: the model produced fluent, condensed translations whose line count didn't match the required number of timed subtitle slots. The root cause was the LLM optimizing for linguistic quality while ignoring structural timing constraints. To fix this, Vimeo split the pipeline into three phases: smart chunking (grouping source lines into logical thought blocks), creative translation (the LLM translates freely without structural constraints), and line mapping (a separate LLM call re-splits the translated block to match the original timing). This achieved ~95% accuracy on the first pass. For the remaining ~5%, the team built a graduated fallback chain: a correction loop with explicit error feedback, then a simplified LLM prompt, and finally a rule-based algorithm that pads or truncates lines. As a result, 100% of subtitle chunks reach viewers in a valid state, which eliminates blank screens. The multi-pass approach adds 4-8% processing time and 6-10% token cost but eliminates ~20 hours of manual QA per 1,000 videos. Key lessons: separate creative from structural LLM work, design fallback chains before happy paths, and account for the 'infrastructure tax of intelligence.'
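The graduated fallback chain can be illustrated with a minimal Python sketch. It is not Vimeo's implementation: the `Mapper` signature, the function names, the retry count, and the exact pad and truncate rules are all assumptions chosen for illustration, and the LLM calls are stood in for by plain callables supplied by the caller.

```python
from typing import Callable, List, Optional

# Hypothetical signature for a line-mapping call: (translated block,
# required slot count, optional error feedback) -> candidate subtitle lines.
Mapper = Callable[[str, int, Optional[str]], List[str]]


def pad_or_truncate(lines: List[str], n_slots: int) -> List[str]:
    """Rule-based last resort: always returns exactly n_slots lines.

    Assumed rules: overflow is folded into the final slot so no text is
    dropped, and a shortfall is padded by repeating the last line so that
    no slot is left empty.
    """
    if len(lines) > n_slots:
        return lines[: n_slots - 1] + [" ".join(lines[n_slots - 1:])]
    return lines + [lines[-1]] * (n_slots - len(lines))


def map_lines(
    translated: str,
    n_slots: int,
    mapper: Mapper,
    simple_mapper: Mapper,
    max_corrections: int = 2,
) -> List[str]:
    """Try each stage in order and return the first structurally valid result."""
    feedback: Optional[str] = None
    # Stage 1: normal mapping call, then a correction loop that tells the
    # model exactly what was wrong with its previous attempt.
    for _ in range(1 + max_corrections):
        lines = mapper(translated, n_slots, feedback)
        if len(lines) == n_slots:
            return lines
        feedback = f"Expected {n_slots} lines but got {len(lines)}."
    # Stage 2: a simplified prompt, modeled here as a second callable.
    lines = simple_mapper(translated, n_slots, None)
    if len(lines) == n_slots:
        return lines
    # Stage 3: deterministic repair, which cannot fail the count check.
    return pad_or_truncate(lines or [translated], n_slots)
```

Because the final stage is deterministic, every chunk leaves `map_lines` with the required number of lines, which is the property that rules out blank screens. The cost is that the rule-based output may read less naturally than a successful LLM mapping.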