A graduate student shares how they used Claude to generate synthetic practice exams when real past exams weren't available at their new university. Drawing a parallel between machine learning concepts (synthetic data, overfitting, dataset pollution) and human studying, the author describes two scenarios: replicating a known exam template and constructing mock exams from scratch. About 60% of questions on the actual exam matched their practice material, but a blind spot emerged from over-relying on personal assumptions about what would be tested. Key lessons include using separate chat sessions to avoid context rot, keeping an open mind about edge-case topics, and supplementing synthetic data with real exam questions when possible. The piece concludes with a broader reflection on LLMs as learning tools that can personalize education when used responsibly.

17 min read. From towardsdatascience.com
Table of contents
Practice Makes Passing
The Human Training Data Problem
Synthetic Training Data for Humans
Easy Mode: Replicating a Template
Hard Mode: Construction from Scratch
Generalizing to Test Data and Preventing Dataset Pollution
Overcoming Overfitting: How to Make the Best of Synthetic Human Training Data
Afterword: My Thoughts on LLMs as a Learning Aid
Footnotes
References
