In October 2024, Canva reached 200 million monthly active users, which makes effective search across users' own designs increasingly important. Traditional search evaluation relies on sampled user queries labeled by experts, but Canva's privacy commitments rule out using real user data. Instead, the team uses generative AI to create synthetic content, so search improvements can be tested without compromising privacy. The post covers building a labeled evaluation dataset with GPT-4, the issues encountered along the way, and an evaluation tool for rapid, reproducible offline testing. With that tool, engineers can assess changes locally before running online experiments, which speeds up iteration and makes results more reliable.
Table of contents
Original state
Ideal state
Generating realistic private search datasets
Test cases to measure recall
Test cases to measure precision
Issues encountered using LLMs
Running the evaluation
Visualizing the results
Impact and future plans
Acknowledgements
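
To make the idea of offline recall and precision test cases concrete, here is a minimal sketch in Python. It is not Canva's tool: the names `EvalCase`, `toy_search`, and `evaluate`, the tiny hand-written corpus, and the choice of recall@k and precision@k as metrics are all assumptions made for illustration. In the post's setting the corpus and labels would be synthetic content generated with GPT-4 rather than typed by hand.

```python
from dataclasses import dataclass

# Hypothetical synthetic corpus: design id -> title. No real user data.
DESIGNS = {
    "d1": "birthday party invitation",
    "d2": "birthday card for mum",
    "d3": "quarterly sales report",
    "d4": "team offsite agenda",
}


@dataclass(frozen=True)
class EvalCase:
    query: str
    relevant_ids: frozenset  # designs labeled as correct for this query


def toy_search(query):
    """Stand-in search: ids whose title contains every query word, sorted by id."""
    words = query.lower().split()
    return sorted(i for i, title in DESIGNS.items() if all(w in title for w in words))


def recall_at_k(results, relevant_ids, k):
    """Fraction of labeled relevant designs found in the top k results."""
    if not relevant_ids:
        return 0.0
    return len(set(results[:k]) & relevant_ids) / len(relevant_ids)


def precision_at_k(results, relevant_ids, k):
    """Fraction of the top k results that are labeled relevant."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for i in top_k if i in relevant_ids) / len(top_k)


def evaluate(search_fn, cases, k=3):
    """Average recall@k and precision@k over all test cases."""
    recalls, precisions = [], []
    for case in cases:
        results = search_fn(case.query)
        recalls.append(recall_at_k(results, case.relevant_ids, k))
        precisions.append(precision_at_k(results, case.relevant_ids, k))
    n = len(cases)
    return {"recall": sum(recalls) / n, "precision": sum(precisions) / n}


CASES = [
    EvalCase("birthday invitation", frozenset({"d1"})),
    EvalCase("birthday", frozenset({"d1"})),
    EvalCase("revenue report", frozenset({"d3"})),  # synonym the toy search misses
]

if __name__ == "__main__":
    print(evaluate(toy_search, CASES))
```

Because the corpus, labels, and search function are all fixed, the same run gives the same numbers every time, which is the property that makes this kind of check useful locally before an online experiment. Swapping `toy_search` for a candidate search implementation would show whether, for example, the missed synonym case improves.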