Common hackathon judging systems that rely on absolute numerical scores are fundamentally flawed due to judge bias, lack of global context, and order-dependence. A better approach uses pairwise comparisons, where judges compare two entries head-to-head rather than assigning scores. Using Thurstone's probabilistic model and a maximum likelihood estimator (with Tikhonov regularization), pairwise comparison data can be converted into a high-quality global ranking. The method was successfully implemented at HackMIT's Blueprint high school hackathon using a Twilio-based data collection system and a Python/MATLAB optimization pipeline, and has since been open-sourced as Gavel and adopted at dozens of events.
Table of contents
Designing a Better Judging System
Averaging: A First Attempt
Normalization: A Step in the Right Direction
A Fundamentally Flawed System
A Better Approach
A Preliminary Algorithm
A Robust Model
Implementation
Conclusion
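
To make the summary's core idea concrete, here is a minimal sketch of turning pairwise comparisons into a global ranking with a Thurstone-style model and a Tikhonov-regularized maximum likelihood estimate. It is an illustration under stated assumptions, not the pipeline used at Blueprint or in Gavel: it assumes unit-variance noise, so that the probability that entry i beats entry j is Phi(s_i - s_j), and it swaps in plain gradient ascent for a real optimizer. The function names, the `reg` strength, the step size, and the iteration count are all hypothetical choices.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def fit_thurstone(n_items, comparisons, reg=0.1, lr=0.1, steps=2000):
    """Estimate one latent score per entry from (winner, loser) index pairs.

    Maximizes  sum(log Phi(s[w] - s[l])) - reg * sum(s[i] ** 2)
    by gradient ascent. The reg term is the Tikhonov penalty; it keeps
    scores finite even when an entry wins or loses every comparison.
    """
    s = [0.0] * n_items
    for _ in range(steps):
        # Gradient of the penalty term.
        grad = [-2.0 * reg * v for v in s]
        for w, l in comparisons:
            d = s[w] - s[l]
            p = max(normal_cdf(d), 1e-12)  # guard against division by zero
            g = normal_pdf(d) / p          # derivative of log Phi(d) w.r.t. d
            grad[w] += g
            grad[l] -= g
        s = [v + lr * gv for v, gv in zip(s, grad)]
    return s


def rank(scores):
    """Entry indices ordered from highest to lowest estimated score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Three entries; each tuple is (winner, loser) from one judge decision.
    comparisons = [(0, 1), (1, 2), (0, 2)]
    scores = fit_thurstone(3, comparisons)
    print(rank(scores))
```

In this toy input, entry 0 wins both of its comparisons and entry 2 loses both, so entry 0 is ranked first and entry 2 last. Without the penalty term the likelihood would keep pushing those two scores apart indefinitely, which is the practical reason for regularizing.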