Coming up with fair methods to produce rankings is difficult, and designing judging methods that work for large-scale events is especially hard.

Anish Athalye

Common hackathon judging systems that rely on absolute numerical scores are fundamentally flawed due to judge bias, lack of global context, and order-dependence. A better approach uses pairwise comparisons, where judges compare two entries head-to-head rather than assigning scores. Using Thurstone's probabilistic model and a maximum likelihood estimator (with Tikhonov regularization), pairwise comparison data can be converted into a high-quality global ranking. The method was successfully implemented at HackMIT's Blueprint high school hackathon using a Twilio-based data collection system and a Python/MATLAB optimization pipeline, and has since been open-sourced as Gavel and adopted at dozens of events.
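Below is a minimal sketch of the ranking step. It assumes Thurstone's Case V model in the unit-variance form P(i beats j) = Φ(s_i − s_j), together with an L2 (Tikhonov) penalty whose weight `reg` is chosen arbitrarily. It fits the scores by plain gradient descent in pure Python. The function names (`thurstone_scores`, `ranking`), the step size, and the sample votes are all hypothetical. This is not the Blueprint Python/MATLAB pipeline or Gavel's code.

```python
import math
from typing import List, Sequence, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed with the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def thurstone_scores(
    n_items: int,
    comparisons: Sequence[Tuple[int, int]],
    reg: float = 0.1,   # hypothetical Tikhonov weight
    lr: float = 0.05,   # hypothetical gradient-descent step size
    iters: int = 5000,
) -> List[float]:
    """Fit one latent score per item from (winner, loser) index pairs.

    Assumed model (Thurstone Case V, unit-variance form): P(i beats j) = Phi(s_i - s_j).
    Minimizes the Tikhonov-regularized negative log-likelihood
        -sum over (w, l) of log Phi(s_w - s_l)  +  reg * ||s||^2
    by plain gradient descent. The objective is convex, and reg > 0 keeps
    every score finite even for an item that won (or lost) every comparison.
    """
    scores = [0.0] * n_items
    for _ in range(iters):
        # Gradient of the regularizer reg * ||s||^2.
        grad = [2.0 * reg * s for s in scores]
        for w, l in comparisons:
            d = scores[w] - scores[l]
            # d/dd [-log Phi(d)] = -phi(d) / Phi(d); clamp Phi to avoid dividing by 0.
            ratio = normal_pdf(d) / max(normal_cdf(d), 1e-300)
            grad[w] -= ratio  # raising the winner's score lowers the loss
            grad[l] += ratio  # raising the loser's score raises the loss
        scores = [s - lr * g for s, g in zip(scores, grad)]
    return scores


def ranking(scores: Sequence[float]) -> List[int]:
    """Item indices ordered from highest to lowest estimated score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical judge votes as (winner, loser); note the one upset (2, 1).
    votes = [(0, 1), (0, 2), (1, 2), (1, 2), (2, 1), (0, 1)]
    print(ranking(thurstone_scores(3, votes)))  # [0, 1, 2]
```

The penalty is what makes the estimate usable. Without it, an item that wins every one of its comparisons has no finite maximum likelihood score, because the likelihood keeps rising as its score grows. The L2 term pulls all scores toward zero. It trades a small bias for a unique, finite answer, and it gives sparsely compared entries sensible values.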

Designing a Better Judging System

Normalization: A Step in the Right Direction