This post discusses a subtle bias that can affect decision trees and random forests. The bias stems from the choice of conditioning operator used at split nodes, and it can be eliminated by integrating that choice out. It is most likely to occur when a feature domain contains highly probable equidistant values and when relatively deep trees are built. Integrating out the bias involves building two forests of half the size, fitting one to the original data and the other to the mirrored data, and averaging their predictions.
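The averaging scheme described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, assuming that "mirroring" means negating the feature values (which flips the direction of every threshold comparison); the dataset, forest sizes, and random seeds are placeholders, not the post's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A toy binary classification dataset standing in for the real data.
X, y = make_classification(n_samples=200, random_state=0)

# Two half-size forests (50 + 50 trees) instead of one forest of 100 trees:
# one fit to the original data, one fit to the mirrored (negated) data.
forest_orig = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
forest_mirr = RandomForestClassifier(n_estimators=50, random_state=1).fit(-X, y)

# At inference time, mirror the inputs for the second forest as well,
# then average the two predicted probability estimates.
proba = (forest_orig.predict_proba(X) + forest_mirr.predict_proba(-X)) / 2.0
pred = proba.argmax(axis=1)
```

Averaging the two forests' probability estimates integrates out the conditioning choice, since each split direction is represented equally often across the combined ensemble.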
Table of contents
- Introduction
- A motivating example
- Binary decision tree induction and inference
- The conditioning and the threshold
- The relation of conditioning and mirroring
- When can this happen?
- It is a bias, model selection cannot help!
- Mitigating the bias in random forests
- Conclusion
- Further reading