This post discusses a subtle bias that can affect decision trees and random forests. The bias stems from the choice of conditioning operator used at split nodes, and it can be eliminated by integrating that choice out. It is most likely to occur when a feature domain contains highly probable equidistant values and when relatively deep trees are built. Integrating out the bias involves building two forests of half the size, fitting one to the original data and the other to the mirrored data, and averaging their predictions.
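The averaging scheme described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, assuming that "mirroring" means negating the feature values (which flips the direction of every threshold comparison); the dataset, forest sizes, and random seeds are placeholders, not the post's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A toy binary classification dataset standing in for the real data.
X, y = make_classification(n_samples=200, random_state=0)

# Two half-size forests (50 + 50 trees) instead of one forest of 100 trees:
# one fit to the original data, one fit to the mirrored (negated) data.
forest_orig = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
forest_mirr = RandomForestClassifier(n_estimators=50, random_state=1).fit(-X, y)

# At inference time, mirror the inputs for the second forest as well,
# then average the two predicted probability estimates.
proba = (forest_orig.predict_proba(X) + forest_mirr.predict_proba(-X)) / 2.0
pred = proba.argmax(axis=1)
```

Averaging the two forests' probability estimates integrates out the conditioning choice, since each split direction is represented equally often across the combined ensemble.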
Table of contents
- Introduction
- A motivating example
- Binary decision tree induction and inference
- The conditioning and the threshold
- The relation of conditioning and mirroring
- When can this happen?
- It is a bias, model selection cannot help!
- Mitigating the bias in random forests
- Conclusion
- Further reading