A technique using bagging (bootstrap aggregating) as a regularizer is explored: instead of training a single Gradient Boosted Decision Tree, 100 smaller GBDTs are trained on heavily subsampled data. The spread of predictions across these models approximates the uncertainty of each prediction, and penalizing uncertain predictions acts as a form of regularization.
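A minimal sketch of the idea, assuming scikit-learn's `GradientBoostingRegressor` as the base GBDT; the subsample fraction, tree sizes, and synthetic dataset are illustrative assumptions, not values from the text:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

n_models = 100          # 100 smaller GBDTs, as described in the text
subsample_frac = 0.1    # assumption: "heavily subsampled" taken as 10% per model

models = []
for i in range(n_models):
    # Bootstrap sample: draw a small subset with replacement for each model.
    idx = rng.choice(len(X), size=int(subsample_frac * len(X)), replace=True)
    gbdt = GradientBoostingRegressor(n_estimators=20, max_depth=3, random_state=i)
    gbdt.fit(X[idx], y[idx])
    models.append(gbdt)

# Stack per-model predictions: shape (n_models, n_samples).
preds = np.stack([m.predict(X) for m in models])

mean_pred = preds.mean(axis=0)     # bagged point prediction
uncertainty = preds.std(axis=0)    # spread across models ~ per-sample uncertainty
```

The `uncertainty` array can then be used to down-weight or penalize predictions the ensemble disagrees on.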