A graduate student shares how they used Claude to generate synthetic practice exams when real past exams weren't available at their new university. Drawing a parallel between machine learning concepts (synthetic data, overfitting, dataset pollution) and human studying, the author describes two scenarios: replicating a known exam template and constructing mock exams from scratch. About 60% of questions on the actual exam matched their practice material, but a blind spot emerged from over-relying on personal assumptions about what would be tested. Key lessons include using separate chat sessions to avoid context rot, keeping an open mind about edge-case topics, and supplementing synthetic data with real exam questions when possible. The piece concludes with a broader reflection on LLMs as learning tools that can personalize education when used responsibly.
Table of contents
Practice Makes Passing
The Human Training Data Problem
Synthetic Training Data for Humans
Easy Mode: Replicating a Template
Hard Mode: Construction from Scratch
Generalizing to Test Data and Preventing Dataset Pollution
Overcoming Overfitting: How to Make the Best of Synthetic Human Training Data
Afterword: My Thoughts on LLMs as a Learning Aid
Footnotes
References